Primary skill set: AWS SageMaker, Power BI, and Python
Secondary skill set: Any ETL tool, GitHub, DevOps (CI/CD)
Mandatory skill set:
- Python, PySpark, SQL, and AWS, with experience designing, developing, testing, and supporting data pipelines and applications
- Strong understanding of and hands-on experience with AWS services such as EC2, S3, EMR, Glue, and Redshift
- Strong experience developing and maintaining applications using Python and PySpark for data manipulation, transformation, and analysis
- Design and implement robust ETL pipelines using PySpark, focusing on performance, scalability, and data quality (see the PySpark sketch after this list)
- Lead and manage projects, including planning, execution, testing, and documentation, and serve as the key point of contact for customer interaction
- Translate business requirements into technical solutions using AWS cloud services and Python/PySpark
- Deep understanding of Python and its data science libraries, along with PySpark for distributed data processing
- Proficiency in PL/SQL and T-SQL for data querying, manipulation, and database interactions
- Excellent written and verbal communication skills to collaborate with team members and stakeholders
- Experience leading and mentoring teams in a technical environment and providing solution and design proposals
- 3+ years of experience using SQL in the development of data warehouse projects/applications (Oracle & SQL Server)
- Strong hands-on experience in Python development, especially PySpark in an AWS Cloud environment
- Strong experience with SQL and NoSQL databases such as MySQL, Postgres, DynamoDB, and Elasticsearch
- Experience with workflow management tools such as Airflow (see the Airflow sketch below)
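For illustration, here is a minimal sketch of the kind of PySpark ETL pipeline described above. The S3 paths, column names (order_date, amount), and aggregation logic are assumptions for the example, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV data from S3 (bucket and path are illustrative).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: enforce types and drop bad rows -- a basic data-quality step.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
)

# Aggregate daily totals as a sample transformation.
daily_totals = clean.groupBy("order_date").agg(
    F.sum("amount").alias("total_amount")
)

# Load: write partitioned Parquet back to S3 for downstream consumers
# (e.g. Redshift Spectrum or Glue catalog tables).
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_totals/"
)
```

Writing partitioned Parquet keeps downstream scans cheap and is a common pattern when Redshift or Athena sits on top of the curated layer.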
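And a minimal sketch of an Airflow DAG orchestrating such a job, assuming Airflow 2.4+; the DAG id, schedule, and the run_etl placeholder are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl():
    # Placeholder for submitting the PySpark job (e.g. to EMR or Glue).
    print("ETL job submitted")


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",       # run once per day
    catchup=False,           # skip backfilling past runs
) as dag:
    etl_task = PythonOperator(task_id="run_etl", python_callable=run_etl)
```

In practice the PythonOperator would be replaced by an EMR or Glue operator that triggers the PySpark pipeline shown earlier.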