Skills: Big Data, PySpark, Python, Hadoop/HDFS, Spark
Good to have: GCP or any other cloud platform
Roles / Responsibilities:
- Develops and maintains scalable data pipelines to support continuing increases in data volume and complexity.
- Collaborates with analytics and business teams to improve data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organization.
- Implements processes and systems to monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
- Writes unit and integration tests, contributes to the engineering wiki, and documents work.
- Performs the data analysis required to troubleshoot and resolve data-related issues.
- Works closely with a team of frontend and backend engineers, product managers, and analysts.
- Defines company data assets (data models) and the Spark, Spark SQL, and Hive SQL jobs that populate them.
- Designs data integrations and a data quality framework.
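As a small illustration of the data quality monitoring responsibility above, here is a minimal sketch in plain Python (stdlib only, so it stands alone; in this role the equivalent check would typically run as a PySpark job). The column names, the sample batch, and the 10% null-rate threshold are hypothetical, chosen only for the example.

```python
# Minimal data-quality check sketch: flag columns whose null rate
# exceeds a threshold before a batch is published to stakeholders.
# Column names and the 10% threshold are illustrative assumptions.

def null_rates(rows, columns):
    """Return the fraction of missing (None) values per column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }

def failing_columns(rows, columns, threshold=0.10):
    """Return the columns whose null rate exceeds the allowed threshold."""
    rates = null_rates(rows, columns)
    return sorted(col for col, rate in rates.items() if rate > threshold)

if __name__ == "__main__":
    # Hypothetical batch: each column has 1 null out of 4 rows (25% > 10%).
    batch = [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 2, "amount": None},
        {"order_id": None, "amount": 7.5},
        {"order_id": 4, "amount": 3.2},
    ]
    print(failing_columns(batch, ["order_id", "amount"]))
```

In a Spark pipeline the same idea maps to aggregating `count(when(col(c).isNull(), 1))` per column over a DataFrame and alerting when the ratio breaches the threshold.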