AI Data Engineer (4 to 5 Years' Experience, Immediate Joiners)
The AI Data Engineer designs, develops, and maintains the data pipelines and infrastructure essential for AI and machine learning projects. The role bridges traditional data engineering with the specific requirements of AI: ensuring that models are trained on high-quality, well-prepared data and that data flows efficiently from diverse sources into AI and GenAI applications. This is a full-time, on-site position based at our office in Infopark, Kochi.
Key Responsibilities
- Build, test, and maintain scalable data pipelines for AI and machine learning workflows.
- Design and manage data architectures (warehouses, lakes, streaming platforms) to support generative and predictive AI use cases.
- Automate data acquisition, transformation, integration, cleansing, and validation from structured and unstructured sources.
- Collaborate with data scientists, AI / ML engineers, and business teams to understand requirements, provision data assets, and ensure model readiness.
- Optimise ETL / ELT processes for scalability, reliability, and performance.
- Manage data quality frameworks, monitor pipelines, and address data drift, schema changes, or pipeline failures.
- Deploy and monitor real-time and batch pipelines that support AI model training and inference.
- Implement security, privacy, and compliance procedures for all AI data operations.
- Document infrastructure, data flows, and operational playbooks related to AI solutions.
Required Skills and Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- Strong expertise with data pipeline orchestration tools (e.g., Apache Airflow, Luigi, Prefect).
- Proficiency in SQL and Python, and experience with big data frameworks (Spark, Hadoop).
- Familiarity with ML / AI frameworks (TensorFlow, PyTorch, Scikit-learn) and MLOps practices.
- Experience with cloud data solutions / platforms (AWS, GCP, Azure).
- In-depth understanding of data modelling, storage, governance, and performance optimisation.
- Ability to manage both batch and streaming data processes and to work with unstructured data (images, text, etc.).
- Excellent troubleshooting, analytical, and communication skills.