Job Title : PySpark Developer
Location : Chennai, Hyderabad, Kolkata
Work Mode : Monday - Friday (5 days WFO)
Experience : 5+ Years in Backend Development
Notice Period : Immediate to 15 days
Must-Have Experience : Python, PySpark, Amazon Redshift, PostgreSQL
About the Role :
We are looking for an experienced PySpark Developer with strong data engineering capabilities to design, develop, and optimize scalable data pipelines for large-scale data processing. The ideal candidate must possess in-depth knowledge of PySpark, SQL, and cloud-based data ecosystems, along with strong problem-solving skills and the ability to work with cross-functional teams.
Roles & Responsibilities :
- Design and develop robust, scalable ETL / ELT pipelines using PySpark to process data from various sources such as databases, APIs, logs, and files.
- Transform raw data into analysis-ready datasets for data hubs and analytical data marts.
- Build reusable, parameterized Spark jobs for batch and micro-batch processing.
- Optimize PySpark job performance to handle large and complex datasets efficiently.
- Ensure data quality, consistency, and lineage, and maintain thorough documentation across all ingestion workflows.
- Collaborate with Data Architects, Data Modelers, and Data Scientists to implement ingestion logic aligned with business requirements.
- Work with AWS-based data platforms (S3, Glue, EMR, Redshift) for data movement and storage.
- Support version control, CI / CD processes, and infrastructure-as-code practices as required.
Must-Have Skills :
- Minimum 5+ years of data engineering experience, with a strong focus on PySpark / Spark.
- Proven experience building data pipelines and ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).
- Strong knowledge of Python and related data processing libraries.
- Advanced SQL proficiency (Amazon Redshift, PostgreSQL, or similar).
- Hands-on expertise with distributed computing frameworks such as Spark on EMR or similar.
- Familiarity with workflow orchestration tools like AWS Step Functions or similar.
- Good understanding of data lake and data warehouse architectures, including fundamental data modeling concepts.
Good-to-Have Skills :
- Experience with AWS data services : Glue, S3, Redshift, Lambda, CloudWatch.
- Exposure to Delta Lake or similar large-scale storage technologies.
- Experience with real-time streaming tools such as Spark Structured Streaming or Kafka.
- Understanding of data governance, lineage, and cataloging tools (AWS Glue Catalog, Apache Atlas).
- Knowledge of DevOps / CI-CD pipelines using Git and Jenkins.