Role: Sr. Data Engineer – AWS Databricks / Data Tech Lead
Type: Contract
Location: Chennai / Hyderabad
Experience: 5+ years
Duration: 3 Months+ (Extendable)
Key Responsibilities:
- Data Pipeline Development: Design and implement robust ETL/ELT pipelines using Databricks, PySpark, and Delta Lake to process structured and unstructured data efficiently (see the pipeline sketch after this list).
- Orchestration & Scheduling: Manage job orchestration, scheduling, and workflow automation through Databricks Workflows or Airflow (see the Airflow sketch after this list).
- Performance Optimization: Tune and optimize Databricks clusters and notebooks for performance, scalability, and cost-efficiency.
- Data Governance: Implement data governance and lineage using Unity Catalog and other platform-native features (see the Unity Catalog sketch after this list).
- Collaboration: Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver solutions that meet business needs.
- Cloud Integration: Leverage cloud platforms (AWS) to build and deploy data solutions, ensuring seamless integration with existing infrastructure.
- Data Modeling: Develop and maintain data models that support analytics and machine learning workflows.
- Automation & Monitoring: Implement automated testing, monitoring, and alerting mechanisms to ensure data pipeline reliability and data quality (see the quality-check sketch after this list).
- Documentation & Best Practices: Maintain comprehensive documentation of data workflows and adhere to best practices in coding, version control, and data governance.
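For illustration, a minimal sketch of the kind of ETL/ELT pipeline described above, written in PySpark against Delta Lake. The S3 paths, column names, and schema are hypothetical placeholders, not part of the role description:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical S3 locations, for illustration only.
RAW_PATH = "s3://example-bucket/raw/orders/"
DELTA_PATH = "s3://example-bucket/delta/orders/"

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw JSON files landed in S3.
raw = spark.read.json(RAW_PATH)

# Transform: deduplicate, type the timestamp, drop invalid amounts.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Load: write a Delta table, partitioned for downstream queries.
(clean.write.format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .save(DELTA_PATH))
```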
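Orchestration of such jobs can live in Databricks Workflows or, as sketched below, in Airflow via the Databricks provider. The DAG id, job IDs, and connection id are assumptions for the example, and Airflow 2.4+ syntax (`schedule`) is assumed:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Hypothetical Databricks job IDs; real values come from the workspace.
with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = DatabricksRunNowOperator(
        task_id="ingest_orders",
        databricks_conn_id="databricks_default",
        job_id=101,
    )
    transform = DatabricksRunNowOperator(
        task_id="transform_orders",
        databricks_conn_id="databricks_default",
        job_id=102,
    )

    ingest >> transform  # transform runs only after ingestion succeeds
```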
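On the governance side, Unity Catalog objects and grants can be managed with standard SQL from a notebook. The catalog, schema, and group names below are placeholders, and this assumes a Databricks notebook where `spark` is predefined:

```python
# Hypothetical catalog, schema, and principal names, for illustration.
spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.sales")

# Register the Delta data as a governed table...
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.sales.orders
    USING DELTA LOCATION 's3://example-bucket/delta/orders/'
""")

# ...and grant least-privilege, group-level access.
spark.sql("GRANT SELECT ON TABLE analytics.sales.orders TO `data_analysts`")
```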
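Finally, a minimal data-quality gate of the kind mentioned under Automation & Monitoring: the job fails loudly, so the scheduler's alerting picks it up, when basic expectations are violated. The table path and rules carry over the assumptions from the first sketch:

```python
from pyspark.sql import functions as F

# Load the table produced by the pipeline sketch above.
df = spark.read.format("delta").load("s3://example-bucket/delta/orders/")

# Hypothetical expectations: keys present, amounts positive.
bad_rows = df.filter(F.col("order_id").isNull() | (F.col("amount") <= 0)).count()

if bad_rows > 0:
    # Raising aborts the task; the orchestrator's alerting takes over.
    raise ValueError(f"Data quality check failed: {bad_rows} invalid rows in orders")
```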
Required Qualifications:
- Experience: 5+ years in data engineering, with hands-on experience using Databricks and Apache Spark.
- Programming Skills: Proficiency in Python and SQL.
- Cloud Platforms: Strong experience with cloud services such as AWS (e.g., S3, Glue, Redshift).
- Data Engineering Tools: Familiarity with tools like Airflow, Kafka, and dbt.
- Data Modeling: Experience in designing data models for analytics and machine learning applications.
- Collaboration: Proven ability to work in cross-functional teams and communicate effectively with non-technical stakeholders.
Primary Skill Set:
Databricks, Apache Spark / PySpark, Python, SQL, ETL/ELT development, Delta Lake, Cloud platforms (AWS), Data modeling, Cross-functional collaboration, Communication
Secondary Skill Set:
Airflow, dbt, Kafka, Hadoop, MLflow, Unity Catalog, Delta Live Tables, Cluster optimization, Data governance, Security and compliance, Databricks certifications.