We are looking for a skilled Data Engineer with hands-on experience in PySpark and Databricks to design, develop, and optimize large-scale data pipelines. The ideal candidate should have strong problem-solving skills, experience working in cloud-based environments, and the ability to collaborate effectively with cross-functional teams.
Key Responsibilities:
- Develop and maintain ETL/ELT pipelines using PySpark on Databricks (an illustrative sketch follows this list).
- Optimize data processing workflows for performance and scalability.
- Work with structured, semi-structured, and unstructured data from multiple sources.
- Collaborate with data scientists, analysts, and business teams to deliver clean, reliable datasets.
- Implement data quality checks, validations, and monitoring.
- Support migration of existing pipelines to cloud platforms (Azure/AWS/GCP).
- Document technical solutions, workflows, and best practices.
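To give a sense of the day-to-day work, the following is a minimal, hypothetical sketch of an ETL step with a basic data quality check, written in PySpark with a Delta Lake target; the paths, table names, and columns are illustrative assumptions, not part of this role's actual systems.

```python
# Hypothetical PySpark ETL sketch on Databricks; paths and schemas are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw semi-structured data from an assumed landing zone.
raw = spark.read.json("/mnt/landing/orders/")

# Transform: deduplicate, normalize types, and drop invalid rows.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount") > 0)
)

# Simple data quality check before loading.
null_ids = clean.filter(F.col("order_id").isNull()).count()
if null_ids > 0:
    raise ValueError(f"Data quality check failed: {null_ids} rows with null order_id")

# Load: write to a Delta table (Databricks-managed storage assumed).
clean.write.format("delta").mode("overwrite").saveAsTable("analytics.orders_clean")
```

In practice a pipeline like this would typically run as a scheduled Databricks Job or Workflow rather than as a standalone script.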
Required Skills:
- 3–5 years of experience in data engineering roles.
- Strong expertise in PySpark (DataFrames, RDDs, Spark SQL).
- Hands-on experience with Databricks (Delta Lake, Workflows, Jobs, Clusters).
- Solid understanding of data warehousing concepts and SQL.
- Experience with cloud platforms (Azure Data Lake, AWS S3, or GCP BigQuery).
- Familiarity with CI/CD pipelines and version control (Git).
- Good understanding of performance tuning and optimization in Spark (a small illustrative example follows this list).
- Strong problem-solving and debugging skills.
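As a small illustration of Spark performance tuning (a sketch only, with hypothetical table and column names), two common techniques are broadcasting a small dimension table to avoid a shuffle and partitioning output so downstream readers can prune partitions:

```python
# Hypothetical tuning example: broadcast join plus partitioned Delta output.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_example").getOrCreate()

orders = spark.read.table("analytics.orders_clean")    # large fact table (assumed)
countries = spark.read.table("reference.countries")    # small dimension table (assumed)

# Broadcasting the small table avoids shuffling the large one during the join.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Partitioning by date enables partition pruning for date-filtered queries.
(enriched.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.orders_enriched"))
```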