Description:
Key Responsibilities:
- Design, develop, and maintain ETL/ELT data pipelines using PySpark, Spark, and Python
- Build and optimize data ingestion, transformation, and integration workflows on AWS
- Work extensively with AWS Glue, Athena, Redshift, and S3 for data processing
- Implement schema design, fact/dimension modelling, and data partitioning
- Write optimized SQL queries for analytics and processing at scale
- Develop and maintain data streaming pipelines using AWS Kinesis or Apache Kafka
- Conduct performance tuning and troubleshooting for data pipelines
- Ensure data quality, consistency, and high reliability throughout the pipeline lifecycle
Required Technical Skills:
- Graduate with a specialization in Computer Science, Data Science, Engineering, or a related field, with 3 to 5 years of hands-on experience in data engineering
- Minimum 3+ years working with PySpark, Python, and Spark
- Strong SQL skills, including complex queries and performance tuning
- Proven practical knowledge of AWS data services (Glue, Athena, Redshift, S3)
- Solid understanding of data warehousing methodologies and modelling
- Expertise in ETL/ELT pipeline development, performance optimization, and workflow orchestration
- Experience with data streaming technologies (Kinesis/Kafka preferred)
- Must be able to write production-grade code (not only a theoretical or oversight role)
Good to Have:
- Knowledge of Terraform/CloudFormation for infrastructure automation
- Experience working in Agile environments
- Understanding of data governance, cataloguing, and lineage tools
Preferred Qualifications:
- Experience working in Agile development environments
- Familiarity with version control systems and CI/CD practices
- Exposure to large-scale data systems and cloud-native architectures
Who Should Apply:
- Data Engineers experienced in developing and supporting scalable data pipelines
- Candidates ready to work on modern cloud data engineering projects
- Professionals available to join immediately or within a short notice period
(ref: hirist.tech)