Responsibilities :
- Design, develop, and maintain scalable, efficient ETL/ELT pipelines using AWS services such as Glue, Lambda, Step Functions, Apache Airflow, and Amazon AppFlow.
- Build and optimize data models and data warehouse solutions on AWS Redshift and other relevant data stores.
- Implement and manage data lakes on AWS S3, ensuring data quality, security, and accessibility.
- Develop and deploy data processing applications using Python (advanced), PySpark, and PySQL.
- Utilize AWS Athena and other query services to enable data exploration and analysis.
- Work with relational databases (SQL, PL/SQL) for data extraction, transformation, and loading.
- Implement and enforce data governance policies and procedures.
- Monitor and optimize the performance of data pipelines and data warehouse systems.
- Troubleshoot and resolve data-related issues in a timely manner.
- Collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and deliver solutions.
- Stay up to date with the latest advancements in AWS data services and big data technologies.
- Contribute to the documentation of data pipelines, data models, and best practices.
Required Skills and Experience :
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 7 to 10 years of proven experience in data engineering roles.
- Strong knowledge and hands-on experience with AWS cloud services :
- S3, EC2, Athena, Lambda, CloudWatch
- Apache Airflow, Amazon AppFlow
- EMR, Glue
- RDS, DMS
- Redshift
- Strong proficiency in programming languages :
- Advanced Python (including relevant data manipulation libraries)
- PySQL
- PySpark (experience with Spark ecosystem)
- Solid understanding and practical application of Data Warehousing and ETL/ELT concepts.
- Strong knowledge of at least one database system :
- SQL (ability to write complex and optimized queries)
- Experience with PL/SQL is a plus.
- Demonstrable understanding of Data & Analytics principles and how data pipelines support them.
- Familiarity with Data Governance best practices and implementation.
- Proven ability in Performance Tuning and Optimization of data pipelines and database queries.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
Skills Required
Airflow, PySpark, AWS Glue, Redshift, Python, SQL