Job Title: Data Engineer
Contract Period: 12 Months
Location: Offshore candidates accepted (Singapore-based company)
Work Timing: 6.30 AM to 3.30 PM or 7.00 AM to 4.00 PM (IST, India Standard Time)
Experience
Minimum 4 years of experience as a Data Engineer or in a similar role (please do not apply with less than 4 years of Data Engineer experience).
Proven experience in Python, Spark, and PySpark (mandatory, non-negotiable).
Hands-on experience building ETL pipelines, real-time streaming, and data transformations.
Experience with data warehouses, cloud platforms (AWS/Azure/GCP), and databases.
✅ Technical Skills
Spark Core API: RDDs, transformations/actions, DAG execution.
Spark SQL: DataFrames, schema optimization, UDFs (see the sketch after this list).
Streaming: Structured Streaming, Kafka integration.
Data Handling: S3, HDFS, JDBC, Parquet, Avro, ORC.
Orchestration: Airflow/Prefect.
Performance Tuning: Partitioning, caching, broadcast joins.
Cloud Deployment: Databricks, AWS EMR, Azure HDInsight, GCP Dataproc.
CI/CD: pytest/unittest for Spark jobs, Jenkins, GitHub Actions.
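To give a concrete flavour of the Spark SQL and performance-tuning items above, here is a minimal PySpark sketch; the bucket paths, column names, and the orders/countries datasets are hypothetical placeholders, not part of any actual project:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("skills-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    orders = spark.read.parquet("s3://example-bucket/orders/")
    countries = spark.read.parquet("s3://example-bucket/countries/")

    # A simple UDF (prefer built-in functions where one exists).
    normalize = F.udf(lambda s: s.strip().upper() if s else None, StringType())

    # Broadcasting the small table avoids shuffling the large one in the join.
    enriched = (
        orders
        .withColumn("country_code", normalize(F.col("country_code")))
        .join(F.broadcast(countries), on="country_code", how="left")
    )

    # Partitioning the output by date enables downstream partition pruning.
    enriched.write.mode("overwrite").partitionBy("order_date") \
        .parquet("s3://example-bucket/orders_enriched/")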
✅ Education & Soft Skills
Bachelor’s/Master’s in Computer Science, Computer Engineering, or equivalent.
Strong analytical, problem-solving, and communication skills.
About the Role
We are seeking an experienced Data Engineer to join our team and support data-driven initiatives. This role involves building scalable pipelines, working with streaming data, and collaborating with data scientists and business stakeholders to deliver high-quality solutions.
Key Responsibilities
Design, build, and optimize data pipelines and ETL workflows.
Manage and process large datasets using Spark, PySpark, and SQL.
Build and maintain real-time streaming applications with Spark Streaming/Kafka (a streaming sketch follows this list).
Collaborate with data scientists and product teams to integrate AI/ML models into production.
Ensure data quality, scalability, and performance in all pipelines.
Deploy and manage Spark workloads on cloud platforms (AWS, Azure, GCP, Databricks).
Automate testing and deployment of Spark jobs via CI/CD pipelines.
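As a rough illustration of the streaming responsibility above, a minimal Structured Streaming job reading from Kafka might look like the following; the broker address, topic name, and output paths are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    # Subscribe to a Kafka topic (placeholder broker and topic).
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers the payload as bytes; cast it to string before parsing.
    parsed = events.select(F.col("value").cast("string").alias("raw"))

    # Write to Parquet with a checkpoint so the query can recover on failure.
    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "s3://example-bucket/events/")
        .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()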
Requirements
Bachelor’s/Master’s degree in Computer Science, Computer Engineering, or a related field.
Minimum 4 years of professional experience as a Data Engineer.
Strong expertise in Python, Spark, and PySpark.
Hands-on experience with Spark SQL, DataFrames, UDFs, and DAG execution.
Knowledge of data ingestion tools (Kafka, Flume, Kinesis) and data formats (Parquet, Avro, ORC).
Proficiency in Airflow/Prefect for scheduling and orchestration.
Familiarity with performance tuning in Spark (partitioning, caching, broadcast joins).
Experience deploying on Databricks, AWS EMR, Azure HDInsight, or GCP Dataproc.
Exposure to testing and CI/CD for data pipelines (pytest, Jenkins, GitHub Actions); a test sketch follows this list.
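To illustrate the kind of pipeline testing mentioned above, here is a minimal pytest sketch for a PySpark transformation; the add_revenue function and its columns are invented for the example:

    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def add_revenue(df):
        # Invented transformation under test: revenue = quantity * unit_price.
        return df.withColumn("revenue", F.col("quantity") * F.col("unit_price"))

    @pytest.fixture(scope="session")
    def spark():
        # A small local session is enough for unit tests.
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def test_add_revenue(spark):
        df = spark.createDataFrame([(2, 5.0), (3, 1.5)], ["quantity", "unit_price"])
        result = add_revenue(df).collect()
        assert [row["revenue"] for row in result] == [10.0, 4.5]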
Budget
Onshore: As per market standards.