Key Responsibilities:
- Design, develop, and optimize large-scale data processing workflows using Hadoop components such as HDFS, MapReduce, Hive, Pig, and HBase.
- Build and maintain ETL pipelines to ingest and transform data from various sources into Hadoop clusters.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver high-quality data solutions.
- Ensure data quality, integrity, and security within big data environments.
- Monitor and troubleshoot Hadoop cluster performance and resolve issues proactively.
- Implement best practices for data storage, processing, and retrieval in Hadoop ecosystems.
- Develop automation scripts for data pipeline orchestration and workflow management (using tools like Apache Oozie or Airflow).
- Participate in capacity planning, cluster management, and Hadoop upgrades.
- Document data architecture, workflows, and operational procedures.
Qualifications and Requirements:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 3+ years of experience as a Big Data Engineer or Hadoop Developer.
- Hands-on experience with core Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume.
- Strong proficiency in Java, Scala, or Python for big data processing.
- Experience with data modeling and query optimization in Hive or HBase.
- Familiarity with data ingestion techniques and tools (Sqoop, Flume, Kafka).
- Understanding of cluster management and resource schedulers like YARN.
- Knowledge of Linux/Unix environments and shell scripting.
- Experience with version control (Git) and CI/CD pipelines.
- Strong analytical and problem-solving skills.
Skills Required:
Java, Scala, Python, Linux, Git