Big Data Engineer - PySpark / SQL

PRONEXUS CONSULTING PRIVATE LIMITED, Pune
Posted 15 days ago
Job description

About the Role:

We are seeking a highly skilled Senior Data Engineer with expertise in designing, building, and optimizing large-scale data pipelines and platforms.

The ideal candidate will have strong hands-on experience with big data technologies, AWS cloud services, and modern CI/CD automation frameworks.

You will play a pivotal role in architecting robust data solutions, ensuring scalability, performance, and reliability, while collaborating closely with cross-functional teams across engineering, product, and operations.

Key Responsibilities:

Data Platform Engineering:

  • Design, develop, and enhance data ingestion, transformation, and orchestration pipelines using open-source frameworks, AWS cloud services, and GitLab automation.
  • Implement best practices in distributed data processing using PySpark, Python, and SQL.

Collaboration & Solutioning:

  • Partner with product managers, data scientists, and technology stakeholders to design and validate scalable data platform capabilities.
  • Translate business requirements into technical specifications and implement data-driven solutions.

Optimization & Automation:

  • Identify, design, and implement process improvements including automation of manual processes, pipeline optimization, and system scalability enhancements.
  • Drive adoption of infrastructure-as-code and automated CI/CD pipelines for data workloads.

Monitoring & Reliability:

  • Define, implement, and maintain robust monitoring, logging, and alerting mechanisms for data pipelines and services.
  • Ensure data quality, availability, and reliability across the production environment.

Technical Enablement:

  • Provide platform usage guidance, technical support, and best practices to teams consuming the data platform.
  • Contribute to internal knowledge bases, playbooks, and engineering documentation.

Required Qualifications:

  • Proven experience in building, maintaining, and optimizing large-scale data pipelines in distributed computing environments.
  • Strong programming experience in Python and PySpark, with advanced working knowledge of SQL (4+ years).
  • Expertise in working within Linux environments for data development and operations.
  • Strong knowledge and experience with AWS services such as S3, EMR, Glue, Redshift, Lambda, and Step Functions.
  • Hands-on experience with DevOps/CI/CD tools such as Git, Bitbucket, Jenkins, AWS CodeBuild, and CodePipeline.
  • Familiarity with monitoring and alerting platforms (CloudWatch, Prometheus, Grafana, or equivalent).
  • Knowledge of Palantir is a strong plus.
  • Experience collaborating with cross-functional teams (engineering, product, operations) in a fast-paced environment.

Preferred Skills:

  • Experience with containerized environments (Docker, Kubernetes).
  • Exposure to data governance, lineage, and metadata management tools.
  • Working knowledge of infrastructure-as-code tools (Terraform, CloudFormation).
  • Familiarity with streaming technologies such as Kafka or Kinesis.

(ref: hirist.tech)
