Role Overview
We are seeking a skilled Big Data Engineer to join our team. The ideal candidate has strong experience building, maintaining, and optimizing large-scale data pipelines and distributed data processing systems. The role involves working closely with cross-functional teams to ensure the reliability, scalability, and performance of our data solutions.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines and ETL processes.
- Work with large datasets using Hadoop ecosystem tools (Hive, Spark).
- Build and optimize real-time and batch data processing solutions using Kafka and Spark Streaming.
- Write efficient, high-performance SQL queries to extract, transform, and load data.
- Develop reusable data frameworks and utilities in Python.
- Collaborate with data scientists, analysts, and product teams to deliver reliable data solutions.
- Monitor, troubleshoot, and optimize big data workflows for performance and cost efficiency.
Must-Have Skills
- Strong hands-on experience with Hive and SQL for querying and data transformation.
- Proficiency in Python for data manipulation and automation.
- Expertise in Apache Spark (batch and streaming).
- Experience working with Kafka for streaming data pipelines.
Good-to-Have Skills
- Experience with workflow orchestration tools (e.g., Airflow).
- Knowledge of cloud-based big data platforms (AWS EMR, GCP Dataproc, Azure HDInsight).
- Familiarity with CI/CD pipelines and version control (Git).