Responsibilities :
- Design, develop, and maintain scalable data pipelines using Apache Spark and Databricks.
- Implement data warehouse solutions on AWS, leveraging services like Redshift, Athena, and Glue.
- Lead the development of data models and schemas for both SQL and NoSQL databases.
- Implement and manage data governance and quality processes.
- Collaborate with data scientists and analysts to support their data needs.
- Implement CI / CD pipelines for data and ML workflows.
- Mentor and guide junior data engineers.

Requirements :
- 8+ years of hands-on experience in data engineering, with at least 4 years in a lead or architect-level role.
- Deep expertise in Apache Spark, with proven experience developing large-scale distributed data processing pipelines.
- Strong experience with the Databricks platform and its ecosystem (e.g., Delta Lake, Unity Catalog, MLflow, job orchestration, Workspaces, Clusters, Lakehouse architecture).
- Extensive experience with workflow orchestration using Apache Airflow.
- Proficiency in both SQL and NoSQL databases (e.g., Postgres, DynamoDB, MongoDB, Cassandra), with a deep understanding of schema design, query tuning, and data partitioning.
- Proven background in building data warehouse / data mart architectures using AWS services like Redshift, Athena, Glue, Lambda, DMS, and S3.
- Familiarity with MLflow, Feature Store, and Databricks-native ML tooling is a plus.
- Strong grasp of CI / CD for data and ML pipelines, automated testing, and infrastructure-as-code (Terraform, CDK, etc.).
(ref : hirist.tech)