Job Responsibilities
- Collaborate with project stakeholders (clients) to identify product and technical requirements.
- Develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data.
- Write clean, maintainable, and testable code for data workflows.
- Troubleshoot data issues and perform root cause analysis.
Must have:
- 3+ years of hands-on coding experience in PySpark and SQL.
- Excellent verbal and business communication skills.
- Experience writing complex SQL queries and optimizing query performance.
Good to have:
- Experience working on large-scale data warehouse projects; Teradata experience is a plus.
- Experience with ETL tools.
- Experience with workflow scheduler tools; Apache Airflow experience is a plus.
- Working experience with Kubernetes, Unix, and GitHub.