PySpark Developer

Confidential • Mumbai
Job description

Key Skills & Responsibilities

  • Strong expertise in PySpark and Apache Spark for batch and real-time data processing.
  • Experience in designing and implementing ETL pipelines, including data ingestion, transformation, and validation (a PySpark sketch follows this list).
  • Proficiency in Python for scripting, automation, and building reusable components.
  • Hands-on experience with scheduling tools like Airflow or Control-M to orchestrate workflows (an Airflow example follows this list).
  • Familiarity with AWS ecosystem, especially S3 and related file system operations.
  • Strong understanding of Unix/Linux environments and shell scripting.
  • Experience with Hadoop, Hive, and platforms like Cloudera or Hortonworks.
  • Ability to handle CDC (Change Data Capture) operations on large datasets (a CDC sketch follows this list).
  • Experience in performance tuning, optimizing Spark jobs, and troubleshooting.
  • Strong knowledge of data modeling, data validation, and writing unit test cases.
  • Exposure to real-time and batch integration with downstream/upstream systems.
  • Working knowledge of Jupyter Notebook, Zeppelin, or PyCharm for development and debugging.
  • Understanding of Agile methodologies, with experience in CI/CD and version-control tools (e.g., Jenkins, Git).
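
For the ETL pipeline bullet above, a minimal PySpark sketch: ingest raw CSV from S3, transform, validate, and write curated Parquet. The bucket, paths, column names, and validation rule are illustrative assumptions, not details from this posting.

```python
# A minimal PySpark ETL sketch. Bucket, paths, columns, and the
# validation rule are illustrative assumptions, not posting details.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Ingestion: raw CSV landed in S3 (hypothetical bucket/prefix).
raw = spark.read.option("header", True).csv("s3a://example-bucket/landing/orders/")

# Transformation: cast types, parse dates, drop rows missing the key.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .filter(F.col("order_id").isNotNull())
)

# Validation: fail fast if any amount is negative.
bad = clean.filter(F.col("amount") < 0).count()
if bad:
    raise ValueError(f"validation failed: {bad} rows with negative amount")

# Load: partitioned Parquet back to S3.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/curated/orders/"
)
```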
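
For the orchestration bullet, a minimal Airflow sketch showing how a job like the one above could be scheduled daily; the DAG id, schedule, and spark-submit command are hypothetical, and the `schedule` argument assumes Airflow 2.4+.

```python
# A minimal Airflow DAG sketch (assumes Airflow 2.4+ for the `schedule`
# argument). DAG id and spark-submit command are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_etl",             # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # run once per day
    catchup=False,                   # skip backfill of past runs
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        bash_command="spark-submit --master yarn etl_orders.py",  # hypothetical script
    )
```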
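
For the CDC bullet, a sketch of one common apply pattern in plain PySpark: rank each key's changes by timestamp, keep the newest, and drop keys whose last change was a delete. The change-feed columns (id, op, updated_at) are assumptions about the feed's shape, not something this posting specifies.

```python
# A CDC-apply sketch in plain PySpark: keep the newest change per key,
# then drop keys whose latest operation is a delete. The feed columns
# (id, op, updated_at) are assumptions about the change feed's shape.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-apply-sketch").getOrCreate()

changes = spark.read.parquet("s3a://example-bucket/cdc/customers/")  # hypothetical feed

# Rank each key's changes newest-first.
latest_first = Window.partitionBy("id").orderBy(F.col("updated_at").desc())

current = (
    changes.withColumn("rn", F.row_number().over(latest_first))
           .filter(F.col("rn") == 1)      # newest change per key wins
           .filter(F.col("op") != "D")    # drop deleted keys
           .drop("rn")
)

current.write.mode("overwrite").parquet("s3a://example-bucket/current/customers/")
```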

Preferred Skills

  • Experience in building or integrating APIs for data provisioning.
  • Exposure to ETL or reporting tools such as Informatica, Tableau, Jasper, or QlikView.
  • Familiarity with AI/ML model development using PySpark in cloud environments.
  • Skills: PySpark, Apache Spark, Python, SQL, ETL pipelines and tools (Informatica), Airflow, Control-M, CI/CD (Jenkins, Git), AWS (S3), Unix/Linux, shell scripting, Hadoop, Hive, Cloudera, Hortonworks, CDC, data modeling, data validation, unit test cases, performance tuning, API integration, batch and real-time integration, Agile methodologies, AI/ML model development, Jupyter Notebook, Zeppelin, PyCharm, Tableau, QlikView, Jasper.
  • Mandatory Key Skills: PySpark, Apache Spark, Python, ETL, Unix/Linux, data engineering, Agile methodologies, CI/CD, data modeling, data validation.
  • Skills Required

    PySpark, Hadoop, Spark SQL, AWS, Scala
