Job Summary:
We are seeking a skilled Data Engineer with strong expertise in Python and PySpark to design, develop, and optimize large-scale data pipelines. The ideal candidate will work closely with data scientists, analysts, and business stakeholders to ensure the delivery of high-quality, reliable, and scalable data.

Responsibilities:
- Design, develop, and maintain ETL/ELT pipelines for structured and unstructured data.
- Develop and optimize PySpark jobs for large-scale data processing.
- Integrate data from multiple sources into a unified data platform.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions.
- Perform data cleansing, transformation, and validation to ensure accuracy and reliability.
- Monitor, troubleshoot, and improve data pipeline performance.
- Implement best practices in data governance, security, and compliance.
- Work with cloud-based big data platforms (AWS, Azure, GCP) and distributed computing frameworks.

Skills & Qualifications:
- Strong proficiency in Python programming for data engineering tasks.
- Hands-on experience with PySpark for big data processing.
- Solid understanding of data structures, algorithms, and distributed computing concepts.
- Experience with SQL and relational databases (e.g., MySQL, PostgreSQL).
- Knowledge of data warehousing concepts and tools (e.g., Snowflake, Redshift, BigQuery).
- Familiarity with workflow orchestration tools (Airflow, Luigi, etc.).
- Experience working with cloud services for data engineering.
- Strong problem-solving and debugging skills.
(ref: hirist.tech)