iMerit is a leading AI data solutions company that transforms unstructured data into structured intelligence for advanced machine learning and analytics. Our customers span autonomous mobility, medical AI, agriculture, and more; we deliver high-quality data services that power next-generation AI systems.
About the Role
We are looking for a seasoned Engineering Lead to architect, scale, and continuously evolve our analytics and observability platform—a system deeply integrated with annotation tools and ML pipelines. This platform powers real-time visibility, operational insights, and automation across large-scale data operations.
In this role, you will not only lead and mentor a team but also set the technical vision for high-throughput streaming systems and modern data lake/warehouse architectures. You will bring proven expertise in high-velocity, high-volume data engineering, driving innovation in how we process, curate, and surface data to support mission-critical AI workflows.
Key Responsibilities
- Lead & Inspire: Build and mentor a high-performing data engineering team, fostering innovation, accountability, and technical excellence
- Architect at Scale: Design and implement high-volume batch and real-time data pipelines across structured and unstructured sources
- Real-Time Data Lakes: Build and maintain real-time data lakes with streaming ingestion, ensuring data quality, lineage, and availability
- Warehouse Curation: Curate, transform, and optimize datasets into high-performance data warehouses (e.g., Redshift, Snowflake) for downstream analytics
- Deep Streaming Expertise: Drive adoption and optimization of Kafka for messaging, event streaming, and system integration, ensuring high throughput and low latency
- Advanced Processing: Leverage PySpark for distributed data processing and complex transformations, delivering scalable ETL/ELT pipelines
- Orchestration & Automation: Utilize AWS Glue and related cloud services to orchestrate data workflows, automate schema management, and scale pipelines seamlessly
- Continuous Improvement: Oversee platform upgrades, schema evolution, and performance tuning, ensuring the platform meets growing data and user demands
- Observability & Insights: Implement metrics, dashboards, and alerting for key KPIs (annotation throughput, quality, latency), ensuring operational excellence
- Cross-Functional Collaboration: Work closely with product, platform, and customer teams to define event models, data contracts, and integration strategies
- Innovation & R&D: Research emerging technologies in data streaming, lakehouse architectures, and observability, bringing forward new approaches and prototypes
Minimum Qualifications
- 10+ years of experience in data engineering or backend engineering, with at least 2–3 years in a leadership or team-lead role
- Proven track record in building and operating data pipelines at scale, including both batch ETL/ELT and real-time streaming
- Expert-level experience with Kafka for high-throughput data ingestion, streaming transformations, and integrations
- Strong hands-on experience with PySpark for distributed data processing and advanced transformations
- In-depth knowledge of AWS Glue (or similar) for orchestrating workflows, managing metadata, and automating ETL pipelines
- Demonstrated success in upgrading and maintaining real-time data lakes, curating and transforming datasets into performant data warehouses
- Familiarity with lakehouse and warehouse patterns (e.g., Delta Lake, Redshift, Snowflake) and schema versioning
- Experience with cloud-native data services (S3, Kinesis, Lambda, RDS) and infrastructure-as-code deployments
Preferred Qualifications
- Experience with Databricks and Snowflake solutions, including developing on lakehouse architectures and optimizing warehouse performance
- Exposure to annotation platforms, ML workflows, or model validation pipelines
- Experience with observability tools (Prometheus, Grafana, OpenTelemetry)
- Knowledge of data governance, RBAC, and compliance in large-scale analytics environments
- Comfort working in Agile, distributed teams with Git, JIRA, and Slack
Why Join Us?
At iMerit, you will lead a team at the cutting edge of AI data infrastructure—building and evolving platforms that are explainable, auditable, and scalable. You will play a key role in upgrading and maintaining our streaming data lake and transforming it into analytics-ready warehouses, directly shaping how AI systems are built and trusted at scale.