Job Summary
We are looking for an experienced Senior Data Engineer with strong expertise in AWS Glue, PySpark, and cloud-based data engineering. The ideal candidate will design, develop, and optimize scalable data pipelines and ETL workflows on AWS, ensuring high performance, reliability, and data quality.
Key Responsibilities
- Design, build, and maintain scalable ETL pipelines using AWS Glue, PySpark, and Lambda.
- Develop and optimize PySpark scripts for large-scale data processing and transformation.
- Implement data ingestion from multiple sources (S3, RDS, APIs, Streaming sources).
- Work with AWS services such as S3, Glue, Athena, EMR, Redshift, IAM, and CloudWatch.
- Ensure data quality, validation, and governance across all pipelines.
- Optimize performance of Glue jobs, Spark clusters, and query execution.
- Collaborate with data architects, analysts, and cross-functional teams to understand requirements.
- Participate in code reviews, best-practice enforcement, and CI/CD pipeline implementation.
- Troubleshoot production issues, monitor pipeline performance, and deliver timely fixes.
- Manage data workflows, job orchestration, and scheduling using AWS Glue Workflows, Step Functions, or Airflow.
Required Skills
- Strong hands-on experience with AWS Glue, Glue Studio, and Glue Catalog.
- Expertise in PySpark and distributed data processing.
- Proficiency in Python, data structures, and Spark optimization techniques.
- Strong understanding of S3, IAM, Athena, Redshift, DynamoDB, Kinesis, etc.
- Experience with ETL/ELT pipeline design, debugging, and performance tuning.
- Good understanding of data warehousing concepts and SQL.
- Experience working in Agile environments and with version control (Git).