Job Title
We are building the future of healthcare analytics. Our goal is to create data pipelines that are reliable, observable and continuously improving in production.
- Design, build and maintain scalable ETL pipelines using Python (Pandas, PySpark) and SQL, orchestrated with Airflow.
- Develop and maintain the SAIVA Data Lake / Lakehouse on AWS, ensuring quality, governance, scalability and accessibility.
- Run and optimize distributed data processing jobs with Spark on AWS EMR and/or EKS.
- Implement batch and streaming ingestion frameworks (APIs, databases, files, event streams).
- Enforce validation and quality checks to ensure reliable analytics and ML readiness.
- Monitor and troubleshoot pipelines with CloudWatch, integrating observability tools like Grafana, Prometheus or Datadog.
- Automate infrastructure provisioning with Terraform, following AWS best practices.
- Manage SQL Server, PostgreSQL and Snowflake integrations into the Lakehouse.
- Participate in an on-call rotation to support pipeline health and resolve incidents quickly.
- Write production-grade code and contribute to design and code reviews and engineering best practices.
Requirements
We require strong technical skills from our team members:
- 5+ years of experience in data engineering, ETL pipeline development or data platform roles.
- Experience designing and operating data lakes or Lakehouse architectures on AWS.
- Strong SQL skills with PostgreSQL, SQL Server and at least one AWS cloud warehouse.
- Proficiency in Python (Pandas, PySpark); Scala or Java a plus.
- Hands-on experience with Spark on AWS EMR and/or EKS for distributed processing.
- Strong background in Airflow for workflow orchestration.
- Expertise with AWS services: S3, Glue, Lambda, Athena, Step Functions, ECS, CloudWatch.
- Proficiency with Terraform for IaC; familiarity with Docker, ECS and CI/CD pipelines.
- Experience building monitoring, validation and alerting into pipelines with CloudWatch, Grafana, Prometheus or Datadog.
- Strong communication skills and the ability to collaborate with data scientists, analysts and product teams.
- A track record of delivering production-ready, scalable AWS pipelines.