We are looking for a highly skilled Data Engineer with strong expertise in Python, PySpark, and AWS cloud services. The ideal candidate will design, develop, and maintain scalable data pipelines while ensuring seamless data integration and transformation across systems.
Your future duties and responsibilities
- Design and implement ETL pipelines using PySpark and AWS Glue (a brief illustrative sketch follows this list).
- Develop and optimize data processing frameworks on large-scale datasets.
- Work extensively with AWS services such as Glue, Lambda, ECS, S3, DynamoDB, and CloudWatch.
- Build and maintain data ingestion and transformation workflows.
- Develop and manage Python-based automation and data transformation scripts.
- Collaborate with cross-functional teams to ensure data availability, quality, and performance.
- (Good to have) Develop and integrate RESTful APIs for data access and service communication.
- Troubleshoot and optimize data solutions for performance and cost efficiency.
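To give candidates a concrete sense of this pipeline work, below is a minimal, illustrative PySpark sketch that extracts raw data from S3, applies a simple transformation, and loads the result back as partitioned Parquet. The bucket names, paths, and column names are hypothetical placeholders, not details of any actual pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative sketch only; buckets, paths, and columns are hypothetical.
spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Extract: read raw CSV records from an S3 landing zone.
raw = spark.read.option("header", True).csv("s3://example-raw-bucket/orders/")

# Transform: cast types, drop incomplete rows, and add a load timestamp.
cleaned = (
    raw.withColumn("order_total", F.col("order_total").cast("double"))
       .dropna(subset=["order_id", "order_total"])
       .withColumn("load_ts", F.current_timestamp())
)

# Load: write partitioned Parquet to a curated zone.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)

spark.stop()

In an AWS Glue job, similar logic would typically run inside a Glue job script (often via GlueContext and DynamicFrames, though plain Spark DataFrames also work); the extract-transform-load pattern is the same.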
Required qualifications to be successful in this role
Must-Have Skills:
- Strong proficiency in Python programming.
- Hands-on experience with PySpark for distributed data processing.
- Deep understanding and hands-on exposure to AWS services such as:
  - AWS Glue (ETL development)
  - AWS Lambda (serverless data processing; a brief sketch follows this list)
  - ECS / EKS (containerized workloads)
  - DynamoDB (NoSQL database)
  - S3 (data storage and management)
- Experience with data ingestion, transformation, and orchestration.
- Familiarity with API concepts: request/response models, RESTful design, JSON handling.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
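As a rough sketch of the serverless data-processing pattern referenced above, the Lambda handler below reads a newly landed JSON object from S3 and writes its records to DynamoDB with boto3. The bucket, table name, event shape, and record fields are hypothetical placeholders.

import json
import boto3

# Illustrative sketch only; the table name and event shape are hypothetical.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-orders")
s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 put event; locate the new object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    items = json.loads(body)  # expected to be a list of DynamoDB-compatible dicts

    # Write each record; batch_writer handles batching and retries.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)

    return {"statusCode": 200, "itemsWritten": len(items)}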
Good-to-Have Skills:
- Experience with CI/CD pipelines and Infrastructure as Code (IaC), e.g., CloudFormation or Terraform.
- Exposure to API development frameworks such as Flask or FastAPI (a brief sketch follows this list).
- Knowledge of data lake and data warehouse architecture.
- Basic knowledge of Docker and Kubernetes.
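Purely to illustrate the kind of Flask/FastAPI exposure mentioned above, here is a minimal FastAPI sketch exposing a read-only data-access endpoint. The route, model fields, and in-memory store are hypothetical and stand in for a real data backend.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Hypothetical in-memory store standing in for DynamoDB or another backend.
ORDERS = {"o-1001": {"order_id": "o-1001", "total": 42.50}}

class Order(BaseModel):
    order_id: str
    total: float

@app.get("/orders/{order_id}", response_model=Order)
def get_order(order_id: str):
    # Return the order as JSON, or a 404 if it is unknown.
    order = ORDERS.get(order_id)
    if order is None:
        raise HTTPException(status_code=404, detail="order not found")
    return order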
Skills Required
Python, PySpark, AWS