About the Role:
We are seeking a highly skilled Senior Data Engineer with expertise in designing, building, and optimizing large-scale data pipelines and platforms.
The ideal candidate will have strong hands-on experience with big data technologies, AWS cloud services, and modern CI/CD automation frameworks.
You will play a pivotal role in architecting robust data solutions, ensuring scalability, performance, and reliability, while collaborating closely with cross-functional teams across engineering, product, and operations.
Key Responsibilities:
Data Platform Engineering:
- Design, develop, and enhance data ingestion, transformation, and orchestration pipelines using open-source frameworks, AWS cloud services, and GitLab automation.
- Implement best practices in distributed data processing using PySpark, Python, and SQL.
Collaboration & Solutioning:
- Partner with product managers, data scientists, and technology stakeholders to design and validate scalable data platform capabilities.
- Translate business requirements into technical specifications and implement data-driven solutions.
Optimization & Automation:
- Identify, design, and implement process improvements, including automation of manual processes, pipeline optimization, and system scalability enhancements.
- Drive adoption of infrastructure-as-code and automated CI/CD pipelines for data workloads.
Monitoring & Reliability:
- Define, implement, and maintain robust monitoring, logging, and alerting mechanisms for data pipelines and services.
- Ensure data quality, availability, and reliability across the production environment.
Technical Enablement:
- Provide platform usage guidance, technical support, and best practices to teams consuming the data platform.
- Contribute to internal knowledge bases, playbooks, and engineering documentation.
Required Qualifications:
- Proven experience building, maintaining, and optimizing large-scale data pipelines in distributed computing environments.
- Strong programming experience in Python and PySpark, with advanced working knowledge of SQL (4+ years).
- Expertise working within Linux environments for data development and operations.
- Strong knowledge of and experience with AWS services such as S3, EMR, Glue, Redshift, Lambda, and Step Functions.
- Hands-on experience with DevOps/CI/CD tools such as Git, Bitbucket, Jenkins, AWS CodeBuild, and CodePipeline.
- Familiarity with monitoring and alerting platforms (CloudWatch, Prometheus, Grafana, or equivalent).
- Knowledge of Palantir is a strong plus.
- Experience collaborating with cross-functional teams (engineering, product, operations) in a fast-paced environment.
Preferred Skills:
- Experience with containerized environments (Docker, Kubernetes).
- Exposure to data governance, lineage, and metadata management tools.
- Working knowledge of infrastructure-as-code tools (Terraform, CloudFormation).
- Familiarity with streaming technologies such as Kafka or Kinesis.

(ref: hirist.tech)