Data Engineer - ETL / PySpark

Acquire Bright Minds, Bangalore
Job description

About the Role :

We are actively hiring a Data Engineer (SDE2 level) with strong expertise in Core Python, PySpark, and Big Data technologies to join a high-performance data engineering team working on large-scale, real-time data platforms.

This role is ideal for professionals with solid foundational knowledge of object-oriented programming (OOP) in Python, hands-on experience with distributed data processing using PySpark, and familiarity with Hadoop ecosystem tools such as Hive, HDFS, Oozie, and YARN.

As part of a dynamic and collaborative engineering team, you will build robust data pipelines, optimize big data workflows, and work on scalable solutions that support analytics, data science, and downstream applications.

Key Responsibilities :

Data Engineering & Development :

  • Design, develop, and maintain scalable and efficient ETL pipelines using Core Python and PySpark (a brief sketch follows this list).
  • Work with structured and semi-structured data from various sources and design pipelines to process large datasets in batch and near real-time.
  • Build and optimize Hive queries, manage HDFS data storage, and schedule workflows using Oozie and YARN.
  • Integrate various data sources and ensure clean, high-quality data availability for downstream systems (analytics, BI, ML models, etc.).
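
As a purely illustrative example, the snippet below sketches the kind of batch ETL pipeline this role involves. It is a minimal sketch only: the paths, table names, and columns are hypothetical assumptions, not taken from this posting.

    # Minimal PySpark batch ETL sketch. Paths, table names, and columns
    # are hypothetical -- they are not taken from this posting.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("events-etl")       # hypothetical job name
        .enableHiveSupport()         # allows writing to Hive-managed tables
        .getOrCreate()
    )

    # Read semi-structured source data from a hypothetical HDFS path.
    raw = spark.read.json("hdfs:///data/raw/events/")

    # Basic cleaning: drop malformed rows, normalize types, deduplicate.
    clean = (
        raw.dropna(subset=["event_id", "event_ts"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Write partitioned Parquet to a Hive table for downstream consumers.
    (
        clean.write
             .mode("overwrite")
             .partitionBy("event_date")
             .saveAsTable("analytics.events_clean")  # hypothetical table
    )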

Object-Oriented Programming in Python :

  • Implement clean, modular, and reusable Python code with a strong understanding of OOP principles (see the sketch below).
  • Debug, test, and optimize existing code, and actively participate in peer reviews and design discussions.
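
Hedged illustration only: one common way to keep pipeline code modular, reusable, and testable is to model each step as a small class behind a shared interface. All names here are hypothetical.

    # Sketch of OOP-structured pipeline steps (all names are hypothetical).
    from abc import ABC, abstractmethod
    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F

    class Transform(ABC):
        """A single, testable pipeline step: DataFrame in, DataFrame out."""

        @abstractmethod
        def apply(self, df: DataFrame) -> DataFrame: ...

    class DropNulls(Transform):
        """Drops rows with nulls in the given columns."""

        def __init__(self, columns: list[str]):
            self.columns = columns

        def apply(self, df: DataFrame) -> DataFrame:
            return df.dropna(subset=self.columns)

    class AddIngestDate(Transform):
        """Stamps each row with the ingestion date."""

        def apply(self, df: DataFrame) -> DataFrame:
            return df.withColumn("ingest_date", F.current_date())

    def run_pipeline(df: DataFrame, steps: list[Transform]) -> DataFrame:
        """Composes steps; each class can be unit-tested in isolation."""
        for step in steps:
            df = step.apply(df)
        return df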

Design & Architecture :

  • Participate in design and architectural discussions related to big data platform enhancements.
  • Apply software engineering principles such as modularity, reusability, and scalability.
  • Write well-documented, maintainable, and testable code aligned with best practices and performance standards.

Database & Query Optimization :

  • Work with SQL-based tools (Hive / Presto / SparkSQL) to write and optimize complex queries.
  • Experience in data modeling, partitioning strategies, and query performance tuning is required (illustrated briefly below).
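
As a hedged illustration of partition-aware query tuning (the table and column names are hypothetical, not from this posting): filtering on the partition column lets Hive / Spark prune partitions instead of scanning the whole table.

    # Sketch: partition pruning with Spark SQL (names are hypothetical).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Filtering on the partition column (event_date) lets the engine
    # read only the matching partitions rather than the full table.
    daily = spark.sql("""
        SELECT user_id, COUNT(*) AS events
        FROM analytics.events_clean
        WHERE event_date = DATE '2024-01-01'  -- partition filter => pruning
        GROUP BY user_id
    """)

    # Inspect the physical plan to confirm the filter was pushed down.
    daily.explain()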

Cloud Integration (Bonus) :

  • Exposure to cloud platforms such as Azure or AWS is a plus.
  • Understanding of cloud storage, data lakes, and cloud-based ETL workflows is advantageous.

Required Skills :

  • Hands-on expertise with PySpark and distributed data processing
  • Good working knowledge of SQL (HiveQL, SparkSQL)
  • Experience with the Hadoop ecosystem : Hive, HDFS, Oozie, and YARN
  • Experience with data ingestion, transformation, and optimization techniques

Good to Have (Bonus Skills) :

  • Familiarity with CI / CD pipelines and version control (Git)
  • Experience with Airflow, Kafka, or other orchestration / streaming tools
  • Exposure to containerization (Docker) and job scheduling tools
  • Cloud experience with AWS (S3, Glue, EMR) or Azure (ADF, Blob, Synapse)

What We're Looking For :

  • 6-10 years of relevant experience in data engineering or backend development
  • Strong problem-solving skills with a keen attention to detail
  • Ability to work independently and within a collaborative team environment
  • Passion for clean, maintainable code and scalable design
  • Excellent communication and interpersonal skills

Why Join Us ?

  • Work on real-time big data platforms at scale
  • Be part of a fast-growing team solving complex data challenges
  • Opportunities to grow into architectural or lead roles
  • Hybrid work culture with flexibility and ownership

(ref : hirist.tech)
