About the Role:
We are actively hiring a Data Engineer (SDE2 level) with strong expertise in Core Python, PySpark, and Big Data technologies to join a high-performance data engineering team working on large-scale, real-time data platforms.
This role is ideal for professionals with solid foundational knowledge of object-oriented programming (OOP) in Python, hands-on experience with distributed data processing using PySpark, and familiarity with Hadoop ecosystem tools such as Hive, HDFS, Oozie, and YARN.
As part of a dynamic and collaborative engineering team, you will build robust data pipelines, optimize big data workflows, and work on scalable solutions that support analytics, data science, and downstream applications.
Key Responsibilities:
Data Engineering & Development:
- Design, develop, and maintain scalable and efficient ETL pipelines using Core Python and PySpark (see the illustrative sketch after this list).
- Work with structured and semi-structured data from various sources and design pipelines to process large datasets in batch and near real-time.
- Build and optimize Hive queries, manage data storage in HDFS, and schedule and run workflows with Oozie on YARN.
- Integrate various data sources and ensure clean, high-quality data availability for downstream systems (analytics, BI, ML models, etc.).
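To give candidates a concrete feel for this work, here is a minimal, illustrative PySpark batch-ETL sketch; every path, table, and column name is a hypothetical placeholder rather than an actual project asset:

```python
# Minimal batch ETL sketch in PySpark. All paths, table names, and
# columns below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-etl")   # hypothetical job name
    .enableHiveSupport()           # required to write Hive tables
    .getOrCreate()
)

# Extract: semi-structured JSON events from HDFS (hypothetical path)
raw = spark.read.json("hdfs:///data/raw/events/dt=2024-01-01/")

# Transform: basic cleaning and enrichment for downstream consumers
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Load: persist as a partitioned Hive table for analytics and ML
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("analytics.events_clean"))
```

Partitioning the output by date keeps downstream Hive queries cheap, which ties directly into the query-tuning responsibilities below.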
Object-Oriented Programming in Python:
- Implement clean, modular, and reusable Python code with a strong grasp of OOP principles (see the illustrative sketch below).
- Debug, test, and optimize existing code, and actively participate in peer reviews and design discussions.
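As a flavor of the OOP style we value, below is a small, hypothetical sketch of a reusable pipeline-step abstraction (class and method names are invented for illustration):

```python
# Hypothetical sketch of an OOP pipeline-step abstraction; names are
# illustrative, not an existing codebase API.
from abc import ABC, abstractmethod
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


class TransformStep(ABC):
    """One small, testable unit of a data pipeline."""

    @abstractmethod
    def apply(self, df: DataFrame) -> DataFrame:
        """Return a transformed DataFrame; steps stay side-effect free."""


class DropNulls(TransformStep):
    def __init__(self, columns):
        self.columns = columns

    def apply(self, df: DataFrame) -> DataFrame:
        return df.dropna(subset=self.columns)


class AddIngestDate(TransformStep):
    def apply(self, df: DataFrame) -> DataFrame:
        return df.withColumn("ingest_date", F.current_date())


def run_pipeline(df: DataFrame, steps) -> DataFrame:
    # Composing small steps keeps each one reusable and easy to unit test.
    for step in steps:
        df = step.apply(df)
    return df
```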
Design & Architecture:
- Participate in design and architectural discussions related to big data platform enhancements.
- Apply software engineering principles such as modularity, reusability, and scalability.
- Write well-documented, maintainable, and testable code aligned with best practices and performance standards.
Database & Query Optimization:
- Work with SQL-based tools (Hive / Presto / SparkSQL) to write and optimize complex queries (see the partition-pruning example below).
- Experience in data modeling, partitioning strategies, and query performance tuning is required.
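For illustration, the hypothetical snippet below shows the kind of tuning this involves: filtering on a partition column so Hive/Spark prunes partitions instead of scanning the full table (table and column names are placeholders):

```python
# Hypothetical partition-pruning example in SparkSQL; the table and
# its partition column (event_date) are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

daily_totals = spark.sql("""
    SELECT event_date,
           country,
           COUNT(*) AS events
    FROM   analytics.events_clean           -- hypothetical table
    WHERE  event_date = DATE '2024-01-01'   -- filtering on the partition
                                            -- column enables pruning
    GROUP  BY event_date, country
""")

# The physical plan shows whether pruning applied (see PartitionFilters)
daily_totals.explain()
```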
Cloud Integration (Bonus):
- Exposure to cloud platforms such as Azure or AWS is a plus.
- Understanding of cloud storage, data lakes, and cloud-based ETL workflows is advantageous.
Required Skills:
- Hands-on expertise with PySpark and distributed data processing
- Good working knowledge of SQL (HiveQL, SparkSQL)
- Experience with the Hadoop ecosystem:
  - Hive
  - HDFS
  - Oozie
  - YARN
- Experience with data ingestion, transformation, and optimization techniques
Good to Have (Bonus Skills):
- Familiarity with CI/CD pipelines and version control (Git)
- Experience with Airflow, Kafka, or other orchestration/streaming tools
- Exposure to containerization (Docker) and job scheduling tools
- Cloud experience with AWS (S3, Glue, EMR) or Azure (ADF, Blob, Synapse)
What We're Looking For:
- 6-10 years of relevant experience in data engineering or backend development
- Strong problem-solving skills with keen attention to detail
- Ability to work independently and within a collaborative team environment
- Passion for clean, maintainable code and scalable design
- Excellent communication and interpersonal skills
Why Join Us?
- Work on real-time big data platforms at scale
- Be part of a fast-growing team solving complex data challenges
- Opportunities to grow into architectural or lead roles
- Hybrid work culture with flexibility and ownership