Immediate joiners preferred, or a maximum notice period of 3-4 weeks.
Seeking a skilled Data Engineer with strong experience in SQL coding, Spark development, and Python or Scala programming. The role involves building and optimizing data pipelines, working with large-scale distributed systems, and managing data workflows across Hadoop and HDFS environments. The ideal candidate is hands-on, detail-oriented, and comfortable working in a hybrid offshore setup.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL workflows.
- Write efficient and optimized SQL code for data extraction, transformation, and analysis.
- Develop Spark applications using Python or Scala for large-scale data processing.
- Work with Hadoop ecosystem components including HDFS, Hive, and related tools.
- Perform data validation, quality checks, and troubleshooting of pipeline issues.
- Collaborate with onshore and offshore teams to understand data requirements and deliver solutions.
- Optimize data storage, retrieval, and performance across distributed systems.
- Maintain clear documentation and follow best practices for coding, version control, and deployment.
Required Skills & Experience:
- 5 years of experience as a Data Engineer or in a similar role.
- Strong SQL coding and query optimization expertise.
- Hands-on experience with Apache Spark (Python or Scala).
- Solid understanding of the Hadoop ecosystem: HDFS, Hive, YARN, MapReduce, etc.
- Experience with distributed computing, data pipelines, and ETL frameworks.
- Familiarity with CI/CD, version control (Git), and agile development.
- Strong analytical and problem-solving skills.
Preferred Qualifications:
- Experience with cloud platforms (AWS, Azure, or GCP) is a plus.
- Knowledge of Kafka, Airflow, or similar orchestration/streaming tools.
- Exposure to data warehousing concepts.