About the Role:
We are seeking a highly skilled Senior Data Engineer to design, build, and optimize scalable data pipelines and platforms that power analytics, reporting, and machine learning use cases.
This role requires strong technical expertise in distributed data systems, cloud-native architectures, and automation frameworks.
The ideal candidate is passionate about building reliable, high-performance data infrastructure and enabling data-driven decision-making across the enterprise.
Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable ETL/ELT pipelines for ingesting, transforming, and processing large volumes of structured and unstructured data.
- Platform Engineering: Enhance data-processing frameworks, orchestration workflows, monitoring systems, and CI/CD pipelines leveraging AWS, GitLab, and open-source technologies.
- Optimization & Automation: Identify opportunities to automate manual processes, optimize workflows for efficiency, and re-architect solutions for improved scalability, availability, and usability.
- Collaboration: Partner with product managers, data scientists, and application teams to understand requirements, define data models, and ensure reliable data delivery for analytical use cases.
- Platform Support: Provide guidance, training, and technical support to internal stakeholders consuming platform services.
- Monitoring & Reliability: Establish metrics, implement monitoring tools, and configure alerting mechanisms to proactively track system health, detect anomalies, and ensure SLA adherence.
- Best Practices: Enforce coding standards, data governance policies, and DevOps practices for secure and compliant data solutions.
Qualifications & Technical Skills:
Core Expertise:
- Proven experience in building and optimizing data pipelines in distributed environments.
- Strong programming expertise in Python and PySpark (4+ years).
- Advanced proficiency in SQL for querying, modeling, and performance tuning.
- Hands-on experience with Linux environments and shell scripting.
Cloud & Tools:
- Experience with AWS services such as S3, Glue, EMR, Redshift, Lambda, and Athena.
- Familiarity with CI/CD and version control tools: Git/Bitbucket, Jenkins, AWS CodeBuild, and CodePipeline.
- Exposure to monitoring and alerting tools (e.g., CloudWatch, Prometheus, Grafana, ELK).
Additional Skills:
- Working knowledge of the Palantir platform is a strong plus.
- Experience collaborating with cross-functional teams (data scientists, analysts, DevOps, application developers).
- Strong problem-solving and analytical skills, with the ability to debug complex data issues.
- Knowledge of distributed computing concepts, data partitioning, and performance tuning.
Preferred Attributes:
- Experience with modern data lakehouse and streaming platforms (Databricks, Kafka, Delta Lake).
- Understanding of data security, governance, and compliance in regulated environments.
- Ability to design highly available, cost-optimized, and production-ready solutions.
- Strong communication skills to engage with stakeholders and present technical solutions clearly.