About the Role:
We are looking for a Data Engineer with strong experience in Spark (PySpark), SQL, and data pipeline architecture.
You will play a critical role in designing, building, and optimizing data workflows that enable scalable analytics and real-time insights.
The ideal candidate is hands-on, detail-oriented, and passionate about crafting reliable data solutions while collaborating with cross-functional teams.
Responsibilities:
- Design and architect scalable and efficient data pipelines for batch and real-time processing.
- Develop and optimize solutions using Spark (PySpark) and SQL.
- Ensure data pipelines are reliable, maintainable, and well-tested.
- Work with stakeholders to understand business requirements and translate them into data-driven solutions.
- Collaborate with cross-functional teams to ensure data quality, availability, and performance.
- Stay updated with emerging data engineering tools and practices.
Must-Have Skills:
- Strong expertise in Spark (PySpark).
- Proficiency in SQL (query optimization, performance tuning, complex joins).
- Hands-on experience in designing and architecting data pipelines.
- Excellent communication and collaboration skills.
Good to Have:
- Experience with data streaming platforms (Kafka, Kinesis, etc.).
- Proficiency in Python for data engineering tasks.
- Exposure to Databricks and Azure cloud services.
- Knowledge of HTAP systems, Debezium, or the 3PL/logistics domain (having most of these skills would be a plus).
- Familiarity with orchestration frameworks such as Apache Airflow or Apache NiFi.