About the Role
We are looking for a Data Engineer with strong experience in Spark (PySpark), SQL, and data pipeline architecture. You will play a critical role in designing, building, and optimizing data workflows that enable scalable analytics and real-time insights. The ideal candidate is hands-on, detail-oriented, and passionate about crafting reliable data solutions while collaborating with cross-functional teams.
Responsibilities
- Design and architect scalable, efficient data pipelines for batch and real-time processing.
- Develop and optimize solutions using Spark (PySpark) and SQL.
- Ensure data pipelines are reliable, maintainable, and well-tested.
- Work with stakeholders to understand business requirements and translate them into data-driven solutions.
- Collaborate with cross-functional teams to ensure data quality, availability, and performance.
- Stay current with emerging data engineering tools and practices.
Must-Have Skills
- Strong expertise in Spark (PySpark).
- Proficiency in SQL (query optimization, performance tuning, complex joins).
- Hands-on experience designing and architecting data pipelines.
- Excellent communication and collaboration skills.
Good to Have
- Experience with data streaming platforms (Kafka, Kinesis, etc.).
- Proficiency in Python for data engineering tasks.
- Exposure to Databricks and Azure cloud services.
- Knowledge of HTAP systems, Debezium, and the 3PL/logistics domain (experience with most of these is a plus).
- Familiarity with orchestration frameworks such as Apache Airflow or Apache NiFi.