Big Data Engineer:
- Design, build, and optimize robust, scalable data pipelines for both batch and real-time data ingestion using Apache Spark (batch and streaming), Apache NiFi, and Apache Kafka.
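As a framework-agnostic illustration of the ingestion pattern this role works with, the sketch below groups an unbounded event stream into fixed-size micro-batches, the same pattern Spark Structured Streaming applies to a Kafka topic. All names (`micro_batches`, `load_batch`, the event fields) are illustrative assumptions, not part of any of the listed tools' APIs.

```python
# Minimal, framework-agnostic sketch of micro-batch stream ingestion.
# In a real pipeline the source would be a Kafka topic and the sink
# would write Parquet to HDFS; here both are stand-ins.
from itertools import islice
from typing import Iterable, Iterator, List

def micro_batches(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group an unbounded event stream into fixed-size batches."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def load_batch(batch: List[dict], store: List[dict]) -> None:
    """Stand-in sink: a real job would persist the batch to HDFS."""
    store.extend(batch)

# Simulated event source and sink.
stream = ({"id": i, "value": i * i} for i in range(7))
store: List[dict] = []
for batch in micro_batches(stream, batch_size=3):
    load_batch(batch, store)

print(len(store))  # prints 7: all events landed across 3 micro-batches
```

The same loop shape applies whether the batch interval is defined by count (as here) or by time, which is how micro-batch engines trade latency against per-batch overhead.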
Data Storage and Management:
- Manage and maintain data storage solutions on Hadoop/HDFS.
- Implement data models and schemas in Hive for the data warehouse and reporting layer.
- Work with HBase for use cases requiring fast, random access to large datasets, leveraging Phoenix/SQLLine for SQL-based access.
Workflow Orchestration:
- Develop and manage complex data workflows and dependencies using Apache Oozie.
ETL and Data Integration:
- Use Informatica for traditional ETL workflows and Apache Sqoop to efficiently transfer data between RDBMS and the Hadoop ecosystem.
Resource Management:
- Work with YARN to manage cluster resources, monitor job execution, and ensure high availability and fault tolerance.
Monitoring and Maintenance:
- Monitor the health and performance of the data platform using tools such as Hue and New Relic; proactively identify and resolve issues.
Collaboration:
- Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.
Cloud and Advanced Skills (Good to Have):
- Experience with containerization and cloud-native solutions, particularly Anthos, for deploying and managing applications.
- Familiarity with data observability and logging platforms such as Cribl for advanced data collection and routing.
Qualifications:
Experience:
- Proven experience as a Big Data Engineer or in a similar role.
Technical Skills:
- Strong expertise in Hadoop and HDFS.
- Proficiency in Apache Spark for both batch and stream processing.
- Hands-on experience with Apache Hive and HBase.
- Knowledge of data ingestion tools such as Apache Kafka and Apache NiFi.
- Experience with Apache Oozie or other workflow schedulers.
- Familiarity with Apache Sqoop for RDBMS integration.
- Understanding of YARN for cluster resource management.
- Proficiency in at least one scripting language (e.g., Python, Scala).
- Familiarity with monitoring tools such as Hue and New Relic.
- Experience with Informatica is a plus.
Soft Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
- Ability to work in a fast-paced, agile environment.
Education:
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
(ref: hirist.tech)