Job Role: AI Engineer (Data Pipelines & RAG)
Job Type: Full-time
Work Mode: Remote (6-day work week)
We are looking for a hands-on AI / Data Engineer (4–7 years of experience) to build and scale the data pipelines powering GenAI and agentic applications. You'll architect data models, build ETL / ELT workflows, and integrate pipelines with RAG-based systems in a fast-paced startup environment.
What You’ll Do
Data Pipelines & Modeling
- Build scalable ETL / ELT pipelines (batch & streaming) using Python + Spark
- Automate ingestion from databases, APIs, files, SharePoint and other document sources
- Process & structure unstructured files (PDFs, tables, charts, drawings, etc.)
- Own chunking, indexing & embedding strategies for RAG / LLM use cases (see the sketch after this list)
- Design logical & physical data models, schema mappings & data dictionaries
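
To give a flavor of the chunking work, here is a minimal sketch of a fixed-size, overlapping character chunker of the kind used to prepare documents for a RAG index. `embed_texts` is a hypothetical placeholder rather than a real client, and production pipelines would typically chunk on token or structural boundaries instead of raw characters:

```python
# Minimal sketch: fixed-size chunking with overlap for RAG ingestion.
# `embed_texts` is a hypothetical stand-in for a real embedding client.

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries, at the cost of some duplicated storage.
    """
    assert overlap < chunk_size, "overlap must be smaller than chunk_size"
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start : start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks


def embed_texts(texts: list[str]) -> list[list[float]]:
    """Hypothetical placeholder: call the embedding model of your stack here."""
    raise NotImplementedError


if __name__ == "__main__":
    doc = "lorem ipsum " * 1000  # stand-in for extracted PDF text
    chunks = chunk_text(doc)
    # vectors = embed_texts(chunks)  # then upsert (chunk, vector) pairs into the index
    print(f"{len(chunks)} chunks ready for embedding")
```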
GenAI & RAG Integration
- Feed real-time data into LLM prompts
- Build retrieval workflows for downstream agent / RAG systems in RE / Construction (see the sketch below)
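
As an illustration of one retrieval step, the sketch below does a brute-force top-k cosine-similarity lookup over precomputed chunk embeddings (numpy only; in production this would usually be a vector store, and the query vector is assumed to come from the same embedding model as the chunks):

```python
import numpy as np

def top_k_chunks(query_vec: np.ndarray,
                 chunk_vecs: np.ndarray,
                 chunks: list[str],
                 k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query by cosine similarity."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]

# The retrieved chunks are then stitched into the LLM prompt as context.
```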
Observability & Governance
- Implement monitoring, alerting & logging for pipeline reliability
- Apply IAM, Unity Catalog and other data privacy / security controls

CI / CD & Automation
- Use DevOps workflows with GitHub Actions / Azure DevOps / CircleCI
- Build reproducible infra using Terraform / ARM templates
- Use Prefect / Airflow for orchestration (see the sketch below)
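
For a sense of the orchestration style, a skeletal daily ingestion flow using Prefect's 2.x decorator API might look like the following; the task bodies are placeholders, and `extract_documents`, `transform`, and `load` are illustrative names rather than anything prescribed:

```python
from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def extract_documents(source: str) -> list[dict]:
    """Pull raw documents from a source (API, SharePoint, file share, ...)."""
    ...  # placeholder: real connector logic goes here

@task
def transform(docs: list[dict]) -> list[dict]:
    """Clean, chunk, and embed documents for the RAG index."""
    ...  # placeholder

@task
def load(records: list[dict]) -> None:
    """Upsert processed records into the vector store / warehouse."""
    ...  # placeholder

@flow(log_prints=True)
def ingestion_pipeline(source: str = "sharepoint") -> None:
    docs = extract_documents(source)
    records = transform(docs)
    load(records)

if __name__ == "__main__":
    ingestion_pipeline()
```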
What You'll Need
- 5+ years in data engineering, with 1–2 years working on pipelines for unstructured data & RAG systems
- Strong Python + SQL; experience with dlt, DuckDB, DVC
- Azure cloud experience in production
- Experience with chunking / indexing strategies for RAG
- Strong Git / CI / CD workflows
- Familiarity with Prefect or equivalent

Good to Have: MLflow, Docker / K8s, Computer Vision, agentic AI concepts, governance & privacy frameworks (GDPR)