PySpark Developer

Confidential • Mumbai
Job description

Key Skills & Responsibilities

  • Strong expertise in PySpark and Apache Spark for batch and real-time data processing.
  • Experience in designing and implementing ETL pipelines, including data ingestion, transformation, and validation (a PySpark sketch follows this list).
  • Proficiency in Python for scripting, automation, and building reusable components.
  • Hands-on experience with scheduling tools like Airflow or Control-M to orchestrate workflows (an Airflow example follows this list).
  • Familiarity with AWS ecosystem, especially S3 and related file system operations.
  • Strong understanding of Unix/Linux environments and shell scripting.
  • Experience with Hadoop, Hive, and platforms like Cloudera or Hortonworks.
  • Ability to handle CDC (Change Data Capture) operations on large datasets (a CDC sketch follows this list).
  • Experience in performance tuning, optimizing Spark jobs, and troubleshooting.
  • Strong knowledge of data modeling, data validation, and writing unit test cases.
  • Exposure to real-time and batch integration with downstream/upstream systems.
  • Working knowledge of Jupyter Notebook, Zeppelin, or PyCharm for development and debugging.
  • Understanding of Agile methodologies, with experience in CI/CD and version-control tools (e.g., Jenkins, Git).
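
For the ETL pipeline bullet above, a minimal PySpark sketch: ingest raw CSV from S3, transform, validate, and write curated Parquet. The bucket, paths, column names, and validation rule are illustrative assumptions, not details from this posting.

```python
# A minimal PySpark ETL sketch. Bucket, paths, columns, and the
# validation rule are illustrative assumptions, not posting details.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Ingestion: raw CSV landed in S3 (hypothetical bucket/prefix).
raw = spark.read.option("header", True).csv("s3a://example-bucket/landing/orders/")

# Transformation: cast types, parse dates, drop rows missing the key.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .filter(F.col("order_id").isNotNull())
)

# Validation: fail fast if any amount is negative.
bad = clean.filter(F.col("amount") < 0).count()
if bad:
    raise ValueError(f"validation failed: {bad} rows with negative amount")

# Load: partitioned Parquet back to S3.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/curated/orders/"
)
```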
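
For the orchestration bullet, a minimal Airflow sketch showing how a job like the one above could be scheduled daily; the DAG id, schedule, and spark-submit command are hypothetical, and the `schedule` argument assumes Airflow 2.4+.

```python
# A minimal Airflow DAG sketch (assumes Airflow 2.4+ for the `schedule`
# argument). DAG id and spark-submit command are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_etl",             # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # run once per day
    catchup=False,                   # skip backfill of past runs
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        bash_command="spark-submit --master yarn etl_orders.py",  # hypothetical script
    )
```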
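
For the CDC bullet, a sketch of one common apply pattern in plain PySpark: rank each key's changes by timestamp, keep the newest, and drop keys whose last change was a delete. The change-feed columns (id, op, updated_at) are assumptions about the feed's shape, not something this posting specifies.

```python
# A CDC-apply sketch in plain PySpark: keep the newest change per key,
# then drop keys whose latest operation is a delete. The feed columns
# (id, op, updated_at) are assumptions about the change feed's shape.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-apply-sketch").getOrCreate()

changes = spark.read.parquet("s3a://example-bucket/cdc/customers/")  # hypothetical feed

# Rank each key's changes newest-first.
latest_first = Window.partitionBy("id").orderBy(F.col("updated_at").desc())

current = (
    changes.withColumn("rn", F.row_number().over(latest_first))
           .filter(F.col("rn") == 1)      # newest change per key wins
           .filter(F.col("op") != "D")    # drop deleted keys
           .drop("rn")
)

current.write.mode("overwrite").parquet("s3a://example-bucket/current/customers/")
```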

Preferred Skills

  • Experience in building or integrating APIs for data provisioning.
  • Exposure to ETL or reporting tools such as Informatica, Tableau, Jasper, or QlikView.
  • Familiarity with AI/ML model development using PySpark in cloud environments.
  • Skills: PySpark, Apache Spark, Python, SQL, ETL pipelines and tools (Informatica), Airflow, Control-M, CI/CD (Jenkins, Git), AWS (S3), Unix/Linux, shell scripting, Hadoop, Hive, Cloudera, Hortonworks, CDC, data modeling, data validation, unit test cases, performance tuning, API integration, batch and real-time integration, Agile methodologies, AI/ML model development, Jupyter Notebook, Zeppelin, PyCharm, Tableau, QlikView, Jasper.
  • Mandatory Key Skills: PySpark, Apache Spark, Python, ETL, Unix/Linux, data engineering, Agile methodologies, CI/CD, data modeling, data validation.
  • Skills Required

    PySpark, Hadoop, Spark SQL, AWS, Scala
