Design, develop, and maintain high-performance ETL and real-time data pipelines using Apache Kafka and Apache Flink.
Build scalable and automated MLOps pipelines for training, validation, and deployment of models using AWS SageMaker and associated services.
Implement and manage Infrastructure as Code (IaC) using Terraform to provision and manage AWS environments.
Collaborate with data scientists, ML engineers, and DevOps teams to streamline model deployment workflows and ensure reliable production delivery.
Optimize data storage and retrieval strategies for large-scale structured and unstructured datasets.
Develop data transformation logic and integrate data from various internal and external sources into data lakes and warehouses.
Monitor, troubleshoot, and enhance the performance of data systems in a fast-evolving, cloud-native production environment.
Ensure adherence to data governance, privacy, and security standards across all data handling activities.
Document data engineering solutions and workflows to facilitate cross-functional understanding and ongoing maintenance.
Required Skills and Qualifications:
Extensive experience in building data pipelines and streaming applications using Apache Kafka and Apache Flink.
Strong experience in ETL development, data modeling, and managing data in large-scale environments.
Proficient in AWS services including SageMaker, S3, Glue, and Lambda.
Hands-on expertise with MLOps best practices, including model versioning, monitoring, and CI/CD for ML pipelines.
Proficiency in Python and SQL; experience with Java is a plus for streaming jobs.
Deep understanding of cloud infrastructure automation using Terraform or similar IaC tools.
Excellent problem-solving skills with the ability to troubleshoot data processing and deployment issues.
Experience in fast-paced, agile development environments with frequent delivery cycles.
Strong communication and collaboration skills to work effectively with cross-functional teams.