Location: Pune (preferred) or remote
Core Competencies (Must-Have)
| Category | Skills / Tools |
| --- | --- |
| Programming | Python, PySpark/Spark |
| Operating Systems | Linux (Red Hat) |
| Databases | Hive, MSSQL, MySQL, PostgreSQL |
| Big Data | Hadoop ecosystem (HDFS, YARN, Sqoop) |
| Orchestration | Apache Airflow |
| Version Control & CI/CD | Git, Jenkins |
| Cloud | Azure (basic knowledge) |
Additional Competencies (Nice-to-Have)
| Category | Skills / Tools |
| --- | --- |
| Containerization | Docker |
| Automation | Ansible |
| Analytics | Databricks |
| Visualization | Power BI |
Key Responsibilities
- Design, develop, and optimize data pipelines using Python and PySpark/Spark.
- Manage and maintain Hadoop ecosystem components (HDFS, YARN, Sqoop).
- Implement and monitor ETL workflows using Airflow and Databricks.
- Perform data modeling and query optimization across SQL databases.
- Containerize applications with Docker and manage deployments via CI/CD pipelines (Jenkins, Git).
- Automate infrastructure provisioning and configuration using Ansible.
- Work with Azure cloud services for storage, compute, and security.
- Integrate data solutions with Power BI for reporting and visualization.
- Ensure data quality, security, and compliance.