Job Title : PySpark Developer
Locations : Chennai, Hyderabad, Kolkata
Work Mode : Monday to Friday (5 days WFO)
Experience : 5+ years in Backend / Data Engineering
Notice Period : Immediate to 15 days
Must-Have : Python, PySpark, Amazon Redshift, PostgreSQL
About the Role :
We are seeking an experienced PySpark Developer with strong data engineering expertise to design, develop, and optimize scalable data pipelines for large-scale data processing. The role involves working across distributed systems, ETL / ELT frameworks, cloud data platforms, and analytics-driven architecture. You will collaborate closely with cross-functional teams to ensure efficient ingestion, transformation, and delivery of high-quality data.
Key Responsibilities :
- Design and develop robust, scalable ETL / ELT pipelines using PySpark to process data from databases, APIs, logs, and file-based sources.
- Convert raw data into analysis-ready datasets for data hubs and analytical data marts.
- Build reusable, parameterized Spark jobs for batch and micro-batch processing (a brief sketch follows this list).
- Optimize PySpark performance to handle large and complex datasets.
- Ensure data quality, consistency, lineage, and maintain detailed documentation for all ingestion workflows.
- Collaborate with Data Architects, Data Modelers, and Data Scientists to implement data ingestion logic aligned with business requirements.
- Work with AWS services (S3, Glue, EMR, Redshift) for data ingestion, storage, and processing.
- Support version control, CI / CD practices, and infrastructure-as-code workflows as needed.
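As a rough illustration of the reusable, parameterized PySpark work described above, the minimal sketch below reads a semi-structured source, applies a simple cleaning transform, and writes a partitioned, analysis-ready dataset. It is illustrative only: the paths, column names, and command-line parameters are placeholder assumptions, not part of this posting.

# Illustrative, parameterized PySpark batch job (placeholder paths/columns).
import argparse

from pyspark.sql import SparkSession, functions as F


def run(input_path: str, output_path: str, run_date: str) -> None:
    spark = SparkSession.builder.appName("daily-ingest").getOrCreate()

    # Read semi-structured JSON input (e.g. logs landed on S3).
    raw = spark.read.json(input_path)

    # Basic cleansing: drop records missing an id, normalise the timestamp,
    # and tag each row with the processing date for downstream lineage.
    cleaned = (
        raw.dropna(subset=["event_id"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("run_date", F.lit(run_date))
    )

    # Write partitioned Parquet so downstream marts can prune by date.
    cleaned.write.mode("overwrite").partitionBy("run_date").parquet(output_path)
    spark.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    parser.add_argument("--run-date", required=True)
    args = parser.parse_args()
    run(args.input_path, args.output_path, args.run_date)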
Must-Have Skills :
- 5+ years of data engineering experience, with a strong focus on PySpark / Spark.
- Proven experience building ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).
- Strong Python knowledge along with key data processing libraries.
- Advanced SQL proficiency (Redshift, PostgreSQL, or similar).
- Hands-on experience with distributed computing platforms (Spark on EMR, Databricks, etc.).
- Familiarity with workflow orchestration tools (AWS Step Functions or similar).
- Strong understanding of data lake and data warehouse architectures, including core data modeling concepts.
Good-to-Have Skills :
- Experience with AWS services : Glue, S3, Redshift, Lambda, CloudWatch, etc.
- Exposure to Delta Lake or similar large-scale storage frameworks.
- Experience with real-time streaming tools : Spark Structured Streaming, Kafka.
- Understanding of data governance, lineage, and cataloging tools (Glue Catalog, Apache Atlas).
- Knowledge of DevOps and CI / CD pipelines (Git, Jenkins, etc.).
(ref : hirist.tech)