iMerit is a leading AI data solutions company that transforms unstructured data into structured intelligence for advanced machine learning and analytics. Our customers span autonomous mobility, medical AI, agriculture, and more; we deliver high-quality data services that power next-generation AI systems.
About the Role
We are looking for a seasoned Engineering Lead to architect, scale, and continuously evolve our analytics and observability platform—a system deeply integrated with annotation tools and ML pipelines. This platform powers real-time visibility, operational insights, and automation across large-scale data operations.
In this role, you will not only lead and mentor a team but also set the technical vision for high-throughput streaming systems and modern data lake / warehouse architectures. You will bring proven expertise in high-velocity, high-volume data engineering, driving innovation in how we process, curate, and surface data to support mission-critical AI workflows.
Key Responsibilities
Lead & Inspire: Build and mentor a high-performing data engineering team, fostering innovation, accountability, and technical excellence
Architect at Scale: Design and implement high-volume batch and real-time data pipelines across structured and unstructured sources
Build and maintain real-time data lakes with streaming ingestion, ensuring data quality, lineage, and availability
Curate, transform, and optimize datasets into high-performance data warehouses (e.g., Redshift, Snowflake) for downstream analytics
Deep Streaming Expertise: Drive adoption and optimization of Kafka for messaging, event streaming, and system integration, ensuring high throughput and low latency
Advanced Processing: Leverage PySpark for distributed data processing and complex transformations, delivering scalable ETL / ELT pipelines
Orchestration & Automation: Utilize AWS Glue and related cloud services to orchestrate data workflows, automate schema management, and scale pipelines seamlessly
Continuous Improvement: Oversee platform upgrades, schema evolution, and performance tuning, ensuring the platform meets growing data and user demands
Observability & Insights: Implement metrics, dashboards, and alerting for key KPIs (annotation throughput, quality, latency), ensuring operational excellence
Cross-Functional Collaboration: Work closely with product, platform, and customer teams to define event models, data contracts, and integration strategies
Innovation and R&D: Research emerging technologies in data streaming, lakehouse architectures, and observability, bringing forward new approaches and prototypes
Minimum Qualifications
10+ years of experience in data engineering or backend engineering, with at least 2–3 years in a leadership or team-lead role
Proven track record in building and operating data pipelines at scale—including both batch ETL / ELT and real-time streaming
Expert-level experience with Kafka for high-throughput data ingestion, streaming transformations, and integrations
Strong hands-on experience with PySpark for distributed data processing and advanced transformations
In-depth knowledge of AWS Glue (or similar) for orchestrating workflows, managing metadata, and automating ETL pipelines
Demonstrated success in upgrading and maintaining real-time data lakes, curating and transforming datasets into performant data warehouses
Familiarity with lakehouse and warehouse patterns (e.g., Delta Lake, Redshift, Snowflake) and schema versioning
Experience with cloud-native data services (S3, Kinesis, Lambda, RDS) and infrastructure-as-code deployments
Preferred Qualifications
Experience with Databricks and Snowflake solutions, including developing on lakehouse architectures and optimizing warehouse performance
Exposure to annotation platforms, ML workflows, or model validation pipelines
Experience with observability tools (Prometheus, Grafana, OpenTelemetry)
Knowledge of data governance, RBAC, and compliance in large-scale analytics environments
Comfort working in Agile, distributed teams with Git, JIRA, and Slack
Why Join Us?
At iMerit, you will lead a team at the cutting edge of AI data infrastructure—building and evolving platforms that are explainable, auditable, and scalable. You will play a key role in upgrading and maintaining our streaming data lake and transforming it into analytics-ready warehouses, directly shaping how AI systems are built and trusted at scale.
Manager Data Engineering • India