Senior AWS Data Engineer (PySpark & Python) — On-site, India
Industry & Sector: IT services and cloud data engineering, focused on end-to-end data platforms, analytics, and enterprise-scale ETL / ELT solutions. We deliver production-grade data pipelines, real-time streaming, and analytics integrations for large enterprise customers across the finance, retail, and SaaS domains.
We are hiring an experienced Data Engineer to join an on-site team in India that builds, optimizes, and operates scalable AWS-based data platforms using Python and PySpark. This role requires 5+ years of hands-on data engineering experience and a strong operational mindset.
Role & Responsibilities
- Design, develop, and maintain robust ETL / ELT pipelines on AWS (S3 → Glue / EMR → Redshift / Snowflake) using Python and PySpark (a minimal PySpark sketch follows this list).
- Implement efficient Spark jobs, optimize query performance, and reduce pipeline latency for batch and near-real-time workflows.
- Build and manage orchestration with Apache Airflow (DAGs, sensors, SLA alerts) and integrate with monitoring / alerting (see the DAG sketch after this list).
- Author reusable data models, enforce data quality checks, and implement observability (logs, metrics, lineage).
- Collaborate with data consumers, analytics and ML teams to translate requirements into scalable data contracts and schemas.
- Apply infrastructure-as-code and CI / CD practices to deploy data platform components and automate testing / rollouts.
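By way of illustration, here is a minimal sketch of the kind of batch pipeline described above: read raw data from S3, clean and de-duplicate it, apply a basic quality gate, and write partitioned Parquet. The bucket names, paths, and column names are hypothetical assumptions, not a prescribed design.

```python
# Minimal PySpark batch ETL sketch. Buckets, paths, and columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/orders/2024-01-01/")  # hypothetical source

cleaned = (
    raw.dropDuplicates(["order_id"])                        # de-duplicate on the business key
       .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize types
       .filter(F.col("amount") > 0)                         # drop invalid rows
       .withColumn("order_date", F.to_date("order_ts"))     # derive the partition column
)

# Simple data-quality gate: fail the job rather than publish bad data.
null_keys = cleaned.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"{null_keys} rows with null order_id; aborting load")

(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")                        # partition for downstream pruning
        .parquet("s3://example-curated-bucket/orders/"))  # hypothetical target
```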
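And a companion Airflow sketch for the orchestration side: wait for an upstream S3 partition, run the load, and surface late runs via an SLA. This assumes Airflow 2.x with the Amazon provider installed; the DAG id, S3 key, and callable are hypothetical.

```python
# Minimal Airflow DAG sketch: sensor -> load, with retries and an SLA.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

def run_spark_load(**context):
    # In practice this would submit the PySpark job (e.g. to EMR or Glue).
    print("submitting Spark load for", context["ds"])

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    wait_for_raw = S3KeySensor(
        task_id="wait_for_raw",
        bucket_key="s3://example-raw-bucket/orders/{{ ds }}/_SUCCESS",  # hypothetical marker
        poke_interval=300,
        timeout=60 * 60,
    )

    load = PythonOperator(
        task_id="load_orders",
        python_callable=run_spark_load,
        sla=timedelta(hours=2),  # surface late runs via SLA-miss alerts
    )

    wait_for_raw >> load
```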
Skills & Qualifications
Must-Have
- 5+ years of professional data engineering experience building production pipelines with Python and PySpark / Spark.
- Proven AWS experience: S3, Glue or EMR, Redshift (or an equivalent data warehouse), Lambda, and IAM best practices.
- Strong SQL skills: query tuning, partitioning, indexing, and working knowledge of data warehouse architectures.
- Hands-on with orchestration tools (Apache Airflow) and experience implementing monitoring and retry / alert strategies.
- Solid software engineering fundamentals: unit testing, code reviews, Git-based workflows, and CI / CD for data apps (see the test sketch after this list).
- Ability to work on-site in India and collaborate cross-functionally in fast-paced delivery cycles.
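To illustrate the testing expectation, here is a minimal pytest sketch for a PySpark transform, using the local-SparkSession pattern common in CI for data apps. The clean_orders function and sample data are hypothetical.

```python
# Unit-test sketch for a PySpark transform; all names here are illustrative.
import pytest
from pyspark.sql import SparkSession, functions as F

@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[2]")   # small local Spark for tests
            .appName("etl-tests")
            .getOrCreate())

def clean_orders(df):
    """Hypothetical transform under test: de-duplicate, drop non-positive amounts."""
    return df.dropDuplicates(["order_id"]).filter(F.col("amount") > 0)

def test_clean_orders_drops_dupes_and_bad_rows(spark):
    df = spark.createDataFrame(
        [("o1", 10.0), ("o1", 10.0), ("o2", -5.0)],
        ["order_id", "amount"],
    )
    out = clean_orders(df)
    assert out.count() == 1                    # duplicate collapsed, bad row dropped
    assert out.first()["order_id"] == "o1"
```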
Preferred
- Experience with streaming platforms (Kafka / Kinesis), schema management, and low-latency processing.
- Familiarity with Terraform / CloudFormation, containerization (Docker), and Kubernetes for data workloads.
- Background in data modeling, columnar formats (Parquet / ORC), and data governance tools.
Benefits & Culture Highlights
- Collaborative, delivery-driven culture with a strong focus on technical mentorship and upskilling.
- Opportunity to work on large-scale AWS data platforms and cross-domain analytics projects.
- Competitive compensation, professional development, and a stable on-site engineering environment in India.

If you are a pragmatic, hands-on Data Engineer who thrives on building reliable AWS data platforms with Python and PySpark, we want to hear from you. Apply to join a high-performing team delivering measurable business impact.
Skills Required
S3, CloudFormation, PySpark, EMR, Redshift, SQL, Apache Airflow, Docker, Terraform, Glue, Kubernetes, Python, AWS