Description :
Experience : Level 3 : 6-8 years of experience
Location : Hyderabad
Skill : Python, Spark, HDFS, MongoDB
About the Role :
We are seeking a highly skilled Data Engineer to join our team to design, build, and optimize scalable data pipelines and platforms.
The ideal candidate will have hands-on experience with Python, Spark, HDFS, and MongoDB, and a proven ability to work with large-scale datasets in a distributed environment.
Responsibilities :
- Design, develop, and maintain end-to-end data pipelines for batch and real-time processing.
- Work with Apache Spark to process and transform large datasets efficiently.
- Manage and optimize HDFS storage, ensuring data availability, reliability, and performance.
- Develop scripts and data orchestration workflows using Python.
- Build and maintain NoSQL data solutions using MongoDB, including data modeling and performance tuning.
- Collaborate with Data Scientists, Analysts, and Platform Engineering teams to deliver high-quality data solutions.
- Implement data quality, validation, and monitoring frameworks to ensure accuracy and consistency.
- Participate in design reviews, code reviews, and performance optimization initiatives.
- Contribute to the continuous improvement of data engineering standards and best practices.
Required Skills & Qualifications :
- Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.
- 3+ years of hands-on experience in Data Engineering or a related domain.
- Strong proficiency in Python programming for data processing and automation.
- Expertise in Apache Spark (PySpark preferred) for large-scale data processing.
- Solid experience with HDFS (Hadoop Distributed File System) and distributed data architecture.
- Hands-on experience with MongoDB, including schema design, queries, and performance optimization.
- Good understanding of ETL concepts, data warehousing, and data modeling.
- Proficiency in Linux / Unix environments and shell scripting.
- Experience with version control tools such as Git.
Good to Have (Optional) :
- Experience with workflow orchestration tools (Airflow, Luigi, Oozie, etc.)
- Knowledge of cloud platforms (AWS, Azure, GCP) and cloud-native data services
- Exposure to CI / CD and DevOps practices for data engineering
- Experience with streaming systems (Kafka, Flink, etc.)
(ref : hirist.tech)