We are seeking a talented and experienced Data Engineer with strong skills in Python, PySpark, SQL, and SparkSQL to join our growing data engineering team. The ideal candidate will have a deep understanding of data processing frameworks, excellent coding skills, and the ability to handle large-scale data transformations in a distributed environment.
Key Responsibilities:
- Design, develop, and maintain robust data pipelines using PySpark and SparkSQL.
- Write efficient, reusable, and reliable code in Python.
- Perform data extraction, transformation, and loading (ETL) from various structured and unstructured data sources.
- Work with large-scale datasets and implement scalable data processing solutions.
- Collaborate with data scientists, analysts, and other stakeholders to understand data needs and deliver high-quality data solutions.
- Optimize and troubleshoot performance issues in data pipelines and queries.
- Ensure data quality, integrity, and security across all pipelines.
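As an illustration of the ETL work described above, here is a minimal extract-transform-load sketch. To stay self-contained it uses plain Python with the standard-library `sqlite3` module rather than a Spark cluster; in practice the same steps would be expressed with PySpark DataFrames and a SparkSQL query run via `spark.sql()`. All table and column names here are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a CSV source (an in-memory string stands
# in for a file or object-store path; the schema is hypothetical).
raw = io.StringIO("order_id,amount,region\n1,120.5,EU\n2,80.0,US\n3,45.25,EU\n")
rows = list(csv.DictReader(raw))

# Transform: cast types and filter out invalid records.
clean = [
    (int(r["order_id"]), float(r["amount"]), r["region"])
    for r in rows
    if float(r["amount"]) > 0
]

# Load: write into a SQL table, then aggregate with a query -- with
# PySpark, the same SQL could run against a temp view via spark.sql().
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INT, amount REAL, region TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)

totals = dict(
    con.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
print(totals)  # {'EU': 165.75, 'US': 80.0}
```

The same shape scales to distributed data: extraction becomes `spark.read`, the transform becomes DataFrame operations, and the aggregation becomes a SparkSQL statement over a registered view.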
Required Skills:
- Strong hands-on experience in Python and PySpark.
- Proficiency in SQL and SparkSQL.
- Solid understanding of distributed computing and big data processing frameworks (Apache Spark).
- Experience with cloud data platforms (AWS, Azure, or GCP) is a plus.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.

Work Mode:
Hybrid model: 3 days work-from-office, 2 days remote
Locations: Pune, Bangalore, Noida, Mumbai, Hyderabad
(ref: hirist.tech)