About the Role:
We are actively hiring a Data Engineer (SDE2 level) with strong expertise in Core Python, PySpark, and Big Data technologies to join a high-performance data engineering team working on large-scale, real-time data platforms.
This role is ideal for professionals with solid foundational knowledge of object-oriented programming (OOP) in Python, hands-on experience with distributed data processing using PySpark, and familiarity with Hadoop ecosystem tools such as Hive, HDFS, Oozie, and YARN.
As part of a dynamic and collaborative engineering team, you will build robust data pipelines, optimize big data workflows, and work on scalable solutions that support analytics, data science, and downstream applications.
Key Responsibilities:
Data Engineering & Development:
- Design, develop, and maintain scalable and efficient ETL pipelines using Core Python and PySpark (see the illustrative sketch after this list).
- Work with structured and semi-structured data from various sources and design pipelines to process large datasets in batch and near real-time.
- Build and optimize Hive queries, manage data storage in HDFS, and schedule and run workflows with Oozie on YARN.
- Integrate various data sources and ensure clean, high-quality data availability for downstream systems (analytics, BI, ML models, etc.).
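To give candidates a concrete feel for this work, here is a minimal, illustrative PySpark batch-ETL sketch; every path, table, and column name is a hypothetical placeholder rather than an actual project asset:

```python
# Minimal batch ETL sketch in PySpark. All paths, table names, and
# columns below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-etl")   # hypothetical job name
    .enableHiveSupport()           # required to write Hive tables
    .getOrCreate()
)

# Extract: semi-structured JSON events from HDFS (hypothetical path)
raw = spark.read.json("hdfs:///data/raw/events/dt=2024-01-01/")

# Transform: basic cleaning and enrichment for downstream consumers
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Load: persist as a partitioned Hive table for analytics and ML
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("analytics.events_clean"))
```

Partitioning the output by date keeps downstream Hive queries cheap, which ties directly into the query-tuning responsibilities below.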
Object-Oriented Programming in Python:
- Implement clean, modular, and reusable Python code with a strong grasp of OOP principles (see the illustrative sketch below).
- Debug, test, and optimize existing code, and actively participate in peer reviews and design discussions.
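As a flavor of the OOP style we value, below is a small, hypothetical sketch of a reusable pipeline-step abstraction (class and method names are invented for illustration):

```python
# Hypothetical sketch of an OOP pipeline-step abstraction; names are
# illustrative, not an existing codebase API.
from abc import ABC, abstractmethod
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


class TransformStep(ABC):
    """One small, testable unit of a data pipeline."""

    @abstractmethod
    def apply(self, df: DataFrame) -> DataFrame:
        """Return a transformed DataFrame; steps stay side-effect free."""


class DropNulls(TransformStep):
    def __init__(self, columns):
        self.columns = columns

    def apply(self, df: DataFrame) -> DataFrame:
        return df.dropna(subset=self.columns)


class AddIngestDate(TransformStep):
    def apply(self, df: DataFrame) -> DataFrame:
        return df.withColumn("ingest_date", F.current_date())


def run_pipeline(df: DataFrame, steps) -> DataFrame:
    # Composing small steps keeps each one reusable and easy to unit test.
    for step in steps:
        df = step.apply(df)
    return df
```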
Design & Architecture:
- Participate in design and architectural discussions related to big data platform enhancements.
- Apply software engineering principles such as modularity, reusability, and scalability.
- Write well-documented, maintainable, and testable code aligned with best practices and performance standards.
Database & Query Optimization:
- Work with SQL-based tools (Hive / Presto / SparkSQL) to write and optimize complex queries (see the partition-pruning example below).
- Experience in data modeling, partitioning strategies, and query performance tuning is required.
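For illustration, the hypothetical snippet below shows the kind of tuning this involves: filtering on a partition column so Hive/Spark prunes partitions instead of scanning the full table (table and column names are placeholders):

```python
# Hypothetical partition-pruning example in SparkSQL; the table and
# its partition column (event_date) are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

daily_totals = spark.sql("""
    SELECT event_date,
           country,
           COUNT(*) AS events
    FROM   analytics.events_clean           -- hypothetical table
    WHERE  event_date = DATE '2024-01-01'   -- filtering on the partition
                                            -- column enables pruning
    GROUP  BY event_date, country
""")

# The physical plan shows whether pruning applied (see PartitionFilters)
daily_totals.explain()
```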
Cloud Integration (Bonus):
- Exposure to cloud platforms such as Azure or AWS is a plus.
- Understanding of cloud storage, data lakes, and cloud-based ETL workflows is advantageous.
Required Skills:
- Hands-on expertise with PySpark and distributed data processing
- Good working knowledge of SQL (HiveQL, SparkSQL)
- Experience with the Hadoop ecosystem:
  - Hive
  - HDFS
  - Oozie
  - YARN
- Experience with data ingestion, transformation, and optimization techniques
Good to Have (Bonus Skills):
- Familiarity with CI/CD pipelines and version control (Git)
- Experience with Airflow, Kafka, or other orchestration/streaming tools
- Exposure to containerization (Docker) and job scheduling tools
- Cloud experience with AWS (S3, Glue, EMR) or Azure (ADF, Blob, Synapse)
What We're Looking For:
- 6-10 years of relevant experience in data engineering or backend development
- Strong problem-solving skills with keen attention to detail
- Ability to work independently and within a collaborative team environment
- Passion for clean, maintainable code and scalable design
- Excellent communication and interpersonal skills
Why Join Us?
- Work on real-time big data platforms at scale
- Be part of a fast-growing team solving complex data challenges
- Opportunities to grow into architectural or lead roles
- Hybrid work culture with flexibility and ownership