- Should have 4-7 years of total experience in Machine Learning Engineering
- Strong in programming languages such as Python and Java
- Must have hands-on experience with at least one cloud platform (GCP preferred)
- Must have: Experience working with Docker
- Must have: Environment management (e.g. venv, pip, Poetry)
- Must have: Experience with orchestrators such as Vertex AI Pipelines and Airflow
- Must have: Understanding of the full end-to-end ML lifecycle
- Must have: Data engineering and feature engineering techniques
- Must have: Experience with ML modelling and evaluation metrics
- Must have: Experience with TensorFlow, PyTorch, or another framework
- Must have: Experience with model monitoring
- Good to have: Hyperparameter tuning experience
- Proficient in at least one of Apache Spark, Apache Beam, or Apache Flink
- Must have: Advanced SQL knowledge
- Must be aware of streaming concepts such as windowing, late-arriving data, and triggers
- Should have hands-on experience with distributed computing
- Should have working experience in data architecture design
- Should be aware of storage and compute options and when to choose each
- Should have a good understanding of cluster optimisation and pipeline optimisation strategies
- Should have exposure to GCP tools for developing end-to-end data pipelines for various scenarios (including ingesting data from traditional databases and integrating API-based data sources)
- Should have a business mindset to understand data and how it will be used for BI and analytics purposes
- Should have working experience with CI/CD pipelines, deployment methodologies, and Infrastructure as Code (e.g. Terraform)
- Good to have: Hands-on experience with Kubernetes
- Good to have: Experience with vector databases such as Qdrant
- Good to have: LLM experience (embedding generation, embedding indexing, RAG, agents, etc.)

Experience working with GCP tools such as:
- Storage: Cloud SQL, Cloud Storage, Cloud Bigtable, BigQuery, Cloud Spanner, Cloud Datastore, vector databases
- Ingest: Pub/Sub, Cloud Functions, App Engine, Kubernetes Engine, Kafka, microservices
- Schedule: Cloud Composer, Airflow
- Processing: Cloud Dataproc, Cloud Dataflow, Apache Spark, Apache Flink
- CI/CD: Bitbucket + Jenkins, or GitLab
- Infrastructure as Code: Terraform
GCP, PyTorch, Apache Beam, TensorFlow, Machine Learning, Apache Spark, Python, Java, Docker
Skills Required
Apache Spark, Python, Machine Learning, GCP, TensorFlow, Docker