Big Data Engineer - PySpark / SQL

PRONEXUS CONSULTING PRIVATE LIMITED, Pune
Posted 15 days ago
Job description

About the Role:

We are seeking a highly skilled Senior Data Engineer with expertise in designing, building, and optimizing large-scale data pipelines and platforms.

The ideal candidate will have strong hands-on experience with big data technologies, AWS cloud services, and modern CI/CD automation frameworks.

You will play a pivotal role in architecting robust data solutions, ensuring scalability, performance, and reliability, while collaborating closely with cross-functional teams across engineering, product, and operations.

Key Responsibilities:

Data Platform Engineering:

  • Design, develop, and enhance data ingestion, transformation, and orchestration pipelines using open-source frameworks, AWS cloud services, and GitLab automation.
  • Implement best practices in distributed data processing using PySpark, Python, and SQL.

Collaboration & Solutioning:

  • Partner with product managers, data scientists, and technology stakeholders to design and validate scalable data platform capabilities.
  • Translate business requirements into technical specifications and implement data-driven solutions.

Optimization & Automation:

  • Identify, design, and implement process improvements including automation of manual processes, pipeline optimization, and system scalability enhancements.
  • Drive adoption of infrastructure-as-code and automated CI/CD pipelines for data workloads.

Monitoring & Reliability:

  • Define, implement, and maintain robust monitoring, logging, and alerting mechanisms for data pipelines and services.
  • Ensure data quality, availability, and reliability across the production environment.

Technical Enablement:

  • Provide platform usage guidance, technical support, and best practices to teams consuming the data platform.
  • Contribute to internal knowledge bases, playbooks, and engineering documentation.

Required Qualifications:

  • Proven experience in building, maintaining, and optimizing large-scale data pipelines in distributed computing environments.
  • Strong programming experience in Python and PySpark, with advanced working knowledge of SQL (4+ years).
  • Expertise in working within Linux environments for data development and operations.
  • Strong knowledge and experience with AWS services such as S3, EMR, Glue, Redshift, Lambda, and Step Functions.
  • Hands-on experience with DevOps/CI/CD tools such as Git, Bitbucket, Jenkins, AWS CodeBuild, and CodePipeline.
  • Familiarity with monitoring and alerting platforms (CloudWatch, Prometheus, Grafana, or equivalent).
  • Knowledge of Palantir is a strong plus.
  • Experience collaborating with cross-functional teams (engineering, product, operations) in a fast-paced environment.

Preferred Skills:

  • Experience with containerized environments (Docker, Kubernetes).
  • Exposure to data governance, lineage, and metadata management tools.
  • Working knowledge of infrastructure-as-code tools (Terraform, CloudFormation).
  • Familiarity with streaming technologies such as Kafka or Kinesis.

(ref: hirist.tech)
