Responsibilities
- Data Pipeline Development: Design, implement, and maintain scalable data pipelines using AWS services such as S3, Glue, Lambda, EMR, and Kinesis.
- Data Processing and Transformation: Use PySpark, Spark, and SQL to perform complex data transformations and aggregations on large datasets within the AWS ecosystem.
- Data Storage and Management: Design and implement data storage solutions using Amazon S3, RDS, DynamoDB, and Redshift, ensuring data quality, integrity, and accessibility.
- Data Modeling and Warehousing: Develop and maintain data models to support analytics and reporting, leveraging Redshift as the data warehousing solution.
- Infrastructure and Cloud Technologies: Provision and manage scalable data infrastructure using EC2, VPC, IAM, CloudFormation, and other relevant AWS services.
- Performance Optimization and Monitoring: Continuously monitor and optimize data pipelines and systems using CloudWatch, identifying and resolving performance bottlenecks.
- Collaboration and Knowledge Sharing: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and provide technical guidance.

Qualifications
- Proficiency in Python, SQL, PySpark, and Spark.
- Strong expertise in AWS data services (S3, Glue, Lambda, EMR, Redshift, DynamoDB, RDS).
- Experience with data warehousing, ETL processes, and data modeling.
- Excellent problem-solving, analytical, and communication skills.
Skills Required
Amazon Redshift, PySpark, Cloud Services, DynamoDB, AWS Glue