Open Locations - Indore, Noida, Gurgaon, Bangalore, Hyderabad, Pune
Immediate Joiners are preferred.
Qualifications
- 4 years of hands-on experience with Big Data technologies: PySpark (DataFrame and SparkSQL; see the sketch after this list), Hadoop, and Hive
- Good hands-on experience with Python and Bash scripting
- Good understanding of SQL and data warehouse concepts
- Strong analytical, problem-solving, data analysis and research skills
- Demonstrable ability to think outside of the box and not be dependent on readily available tools
- Excellent communication, presentation and interpersonal skills are a must
- Hands-on experience with Big Data services offered by cloud platforms
- Experience with orchestration using Airflow or any other job scheduler
- Experience migrating workloads from on-premises to cloud and between clouds
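Illustrative example (not part of the formal requirements): a minimal PySpark sketch of the DataFrame and SparkSQL usage referenced above, assuming a Spark session with Hive support; the table and column names (raw_orders, order_total, analytics.daily_order_totals) are hypothetical.

# A minimal sketch, assuming Spark with Hive support is available;
# all table and column names below are placeholders for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").enableHiveSupport().getOrCreate()

# DataFrame API: read a Hive table, filter completed orders, aggregate per day
orders = spark.table("raw_orders")
daily_totals = (
    orders.filter(F.col("status") == "COMPLETED")
    .groupBy("order_date")
    .agg(F.sum("order_total").alias("daily_total"))
)

# Equivalent SparkSQL: register a temporary view and query it
orders.createOrReplaceTempView("orders_v")
daily_totals_sql = spark.sql(
    "SELECT order_date, SUM(order_total) AS daily_total "
    "FROM orders_v WHERE status = 'COMPLETED' GROUP BY order_date"
)

# Persist the result to a Hive-managed table
daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_order_totals")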
Roles & Responsibilities
- Develop efficient ETL pipelines per business requirements, following development standards and best practices (see the orchestration sketch after this list)
- Perform integration testing of the pipelines created in the cloud environment
- Provide estimates for development, testing, and deployment across different environments
- Participate in peer code reviews to ensure our applications comply with best practices
- Develop and maintain scalable data pipelines to support continuing increases in data volume and complexity
- Collaborate with analytics and business teams to improve the data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organization
- Write unit/integration tests, contribute to the engineering wiki, and document work
- Perform the data analysis required to troubleshoot data-related issues and assist in their resolution
- Work closely with a team of frontend and backend engineers, product managers, and analysts
- Define company data assets (data models) and the Spark, SparkSQL, and HiveSQL jobs that populate them
- Design data integrations and the data quality framework

Mandatory Skills - Any Cloud, Python Programming, SQL, PySpark
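Illustrative example (not part of the formal responsibilities): a minimal Airflow sketch for orchestrating a PySpark ETL job, assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, schedule, and script path are placeholders.

# A minimal sketch, assuming Airflow 2.4+ and the Apache Spark provider;
# the DAG id, schedule, and application path are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_order_totals",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_pyspark_etl",
        application="/opt/jobs/daily_order_totals.py",  # hypothetical PySpark script
        conn_id="spark_default",
    )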