We are considering immediate joiners only. Interested candidates may send their resumes to resume@volgainfotech.com, including their current CTC, expected CTC, notice period, and a short overview of their relevant ML and Data Engineering work.
About the Role
We are seeking a Data Engineer with 5+ years of experience and strong, hands-on expertise in ML/AI data workflows. You will play a critical role in building the feature pipelines, model-serving data flows, and end-to-end orchestration that power our production ML systems.
This is a fully remote, ownership-driven position.
Key Responsibilities
ML/AI Data Engineering
- Build and maintain feature engineering pipelines for ML model training and inference.
- Develop and optimize model-serving data pipelines, ensuring low latency and reliable delivery.
- Design and orchestrate end-to-end ML workflows (Airflow, Prefect, Dagster, Kubeflow, etc.).
- Work closely with Data Scientists and ML Engineers to productionize ML models.
- Implement automated dataset versioning, feature stores, and reproducibility frameworks.
- Build the scalable data foundations required for MLOps: monitoring, retraining triggers, and model data validation.
Data Pipelines & ETL
- Design and build high-performance ETL/ELT pipelines for structured and unstructured data.
- Manage ingestion from APIs, databases, files, event streams, and cloud storage.
- Ensure pipelines are fault-tolerant, well-monitored, and automated.
Data Modelling & Data Warehousing
- Build and maintain data models, marts, and warehouse layers to support analytics and ML pipelines.
- Translate ML feature requirements into clean and optimized data structures.
Data Quality & Governance
- Implement schema validation, data quality checks, and automated monitoring.
- Maintain metadata, lineage, and documentation for all data flows.
Cloud & Infrastructure
- Develop cloud-native data workflows (AWS/Azure/GCP).
- Work with data storage and compute systems such as S3, BigQuery, Snowflake, Databricks, and Redshift.
- Optimize workloads for performance, scalability, and cost efficiency.
DevOps, CI/CD & Automation
- Build CI/CD pipelines for data and ML workflows.
- Containerize pipelines using Docker and manage deployments via Git-based workflows.
- Automate scheduling, builds, and monitoring for data and ML systems.
Required Skills & Experience
- 5+ years of experience as a Data Engineer.
- Major Requirement (Non-Negotiable): Strong experience working on ML/AI projects, including:
  - ML feature pipelines
  - Model-serving data workflows
  - ML orchestration (Airflow, Prefect, Dagster, Kubeflow, etc.)
- Strong proficiency in Python, SQL, and ETL frameworks.
- Experience with big data technologies (Spark, PySpark, Databricks).
- Hands-on with cloud platforms (AWS/Azure/GCP).
- Experience with CI/CD, Docker, Git, and APIs.
- Ability to work independently and in cross-functional remote teams.
- Excellent communication and documentation skills.
Nice-to-Have Skills
- Tools: MLflow, Vertex AI, SageMaker, Azure ML
- Streaming: Kafka, Kinesis, Pub/Sub
- Data quality frameworks: Great Expectations, Soda, Pandera
Why Join Us
- Fully remote with a flexible schedule
- Work on real-world ML/AI production systems
- High ownership + direct architectural influence
- Opportunity to collaborate with advanced Data Science & ML teams