PySpark Developer

Corpxcel Consulting, Pune
Job Description

Location: Chennai / Bangalore / Hyderabad / Coimbatore / Pune

WFO: 3 days mandatory from the above-mentioned locations.

Role Summary:

We are seeking a highly skilled PySpark Developer with hands-on Databricks experience to join the company's IT Systems Development unit in an offshore capacity. The role focuses on designing, building, and optimizing large-scale data pipelines and processing solutions on the Databricks Unified Analytics Platform. The ideal candidate has expertise in big data frameworks, distributed computing, and cloud platforms, along with a deep understanding of Databricks architecture. This is an excellent opportunity to work with cutting-edge technologies in a dynamic, fast-paced environment.

Role Responsibilities:

Data Engineering and Processing:

  • Develop and manage data pipelines using PySpark on Databricks.
  • Implement ETL/ELT processes over structured and unstructured data at scale (see the sketch after this list).
  • Optimize data pipelines for performance, scalability, and cost-efficiency in Databricks.
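
For candidates who want a concrete picture of this work, the following is a minimal, illustrative sketch of such a pipeline; the storage paths and field names (event_id, event_ts, event_type) are hypothetical, not taken from an actual codebase:

```python
# Minimal sketch of an extract-transform-load pipeline on Databricks.
# Paths and field names are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Extract: read raw JSON events (a production pipeline would
# typically pin an explicit schema instead of inferring one).
raw = spark.read.json("/mnt/raw/events/")  # hypothetical mount point

# Transform: basic deduplication, derivation, and filtering.
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_type").isNotNull())
)

# Load: write a Delta table, partitioned for downstream query performance.
# (The Delta format is preconfigured on Databricks.)
(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("/mnt/curated/events/"))
```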

Databricks Platform Expertise:

  • Design, develop, and deploy solutions using Azure services (Data Factory, Databricks, PySpark, SQL).
  • Develop and maintain scalable data pipelines and build new data source integrations to support increasing data volume and complexity.
  • Leverage the Databricks Lakehouse architecture for advanced analytics and machine learning workflows.
  • Manage Delta Lake for ACID transactions and data versioning (see the sketch after this list).
  • Develop notebooks and workflows for end-to-end data solutions.
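
As a flavor of the Delta Lake work mentioned above, here is an illustrative sketch of an ACID upsert via MERGE and a versioned (time travel) read; the table path and the event_id key are hypothetical:

```python
# Illustrative only: Delta Lake upsert (ACID) and data versioning.
# Table paths and column names are hypothetical examples.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forPath(spark, "/mnt/curated/events/")
updates = spark.read.json("/mnt/raw/events_incremental/")

# MERGE performs an atomic upsert: matched rows are updated and
# unmatched rows are inserted, all within one ACID transaction.
(target.alias("t")
       .merge(updates.alias("u"), "t.event_id = u.event_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# Data versioning: read the table as it existed at an earlier version.
previous = (spark.read.format("delta")
                 .option("versionAsOf", 0)
                 .load("/mnt/curated/events/"))
```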

Cloud Platforms and Deployment:

  • Deploy and manage Databricks on Azure (e.g., Azure Databricks).
  • Use Databricks Jobs, Clusters, and Workflows to orchestrate data pipelines.
  • Optimize resource utilization and troubleshoot performance issues on the Databricks platform.

CI/CD and Testing:

  • Build and maintain CI/CD pipelines for Databricks workflows using tools like Azure DevOps, GitHub Actions, or Jenkins.
  • Write unit and integration tests for PySpark code using frameworks like pytest or unittest (as sketched below).
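
A minimal sketch of what such a unit test can look like with pytest; the transformation under test (add_event_date) and its columns are invented for illustration:

```python
# Illustrative only: a pytest-style unit test for a PySpark transformation.
import pytest
from pyspark.sql import SparkSession, functions as F

def add_event_date(df):
    """Hypothetical transformation: derive event_date from event_ts."""
    return df.withColumn("event_date", F.to_date("event_ts"))

@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for unit-testing transformations.
    return (SparkSession.builder
            .master("local[1]")
            .appName("unit-tests")
            .getOrCreate())

def test_add_event_date(spark):
    df = spark.createDataFrame([("e1", "2024-01-15")],
                               ["event_id", "event_ts"])
    result = add_event_date(df)
    assert result.collect()[0]["event_date"].isoformat() == "2024-01-15"
```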

Collaboration and Documentation:

  • Work closely with data scientists, data analysts, and IT teams to deliver robust data solutions.
  • Document Databricks workflows, configurations, and best practices for internal use.

Technical Qualifications:

Experience:

  • 4+ years of experience in data engineering or distributed systems development.
  • Strong programming skills in Python and PySpark.
  • Hands-on experience with Databricks and its ecosystem, including Delta Lake and Databricks SQL.
  • Knowledge of big data frameworks like Hadoop, Spark, and Kafka.

Databricks Expertise:

  • Proficiency in setting up and managing Databricks Workspaces, Clusters, and Jobs.
  • Familiarity with Databricks MLflow for machine learning workflows is a plus.

Cloud Platforms:

  • Expertise in deploying Databricks solutions on Azure (e.g., Data Lake, Synapse).
  • Knowledge of Kubernetes for managing containerized workloads is advantageous.

Database Knowledge:

  • Experience with both SQL (e.g., PostgreSQL, SQL Server) and NoSQL databases (e.g., MongoDB, Cosmos DB).

General Qualifications:

  • Strong analytical and problem-solving skills.
  • Ability to manage multiple tasks in a high-intensity, deadline-driven environment.
  • Excellent communication and organizational skills.
  • Experience in regulated industries like insurance is a plus.

Education Requirements:

  • A Bachelor's degree in Computer Science, Data Engineering, or a related field is preferred.
  • Relevant certifications in Databricks, PySpark, or cloud platforms are highly desirable.
(ref: hirist.tech)
