Description : Big Data Functions :
- Design and implement data pipelines for migration from HDFS / Hive to cloud object storage (e.g., S3, Ceph); a minimal migration sketch follows this list.
- Optimize Spark (and optionally Flink) jobs for performance and scalability in a Kubernetes environment.
- Ensure data consistency, schema evolution, and governance with Apache Iceberg or equivalent table formats.
- Support migration strategy definition by providing technical input and identifying risks.
- Mentor junior developers and review their code / design decisions.
- Collaborate with platform engineers, cloud architects, and product stakeholders to align technical implementation with project goals.
- Troubleshoot complex distributed system issues in data pipelines or storage layers.
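As an illustration of the migration and table-format responsibilities above, here is a minimal sketch, assuming Spark 3.x with the Iceberg Spark runtime on the classpath; the catalog name (lake), the database / table names, and the S3 bucket are hypothetical.

```scala
// Minimal sketch of one HDFS/Hive -> Iceberg-on-S3 migration step.
import org.apache.spark.sql.SparkSession

object HiveToIcebergMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-iceberg-migration")
      // Register an Iceberg catalog backed by S3 (names are illustrative).
      .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.lake.type", "hadoop")
      .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // Read the legacy Hive table from HDFS.
    val src = spark.table("legacy_db.events")

    // Rewrite it as an Iceberg table; Iceberg manages file layout,
    // snapshots, and schema metadata on the object store.
    src.writeTo("lake.analytics.events").createOrReplace()

    spark.stop()
  }
}
```

Writing through an Iceberg catalog rather than to raw Parquet paths is what preserves snapshots, schema history, and ACID guarantees on the object store.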
Requirements :
- Experience : 7 to 12 years.
- Scala and Python.
- Apache Spark (batch & streaming) is a must.
- Deep knowledge of HDFS internals and migration strategies.
- Experience with Apache Iceberg (or similar table formats like Delta Lake / Apache Hudi) for schema evolution, ACID transactions, and time travel; see the Iceberg sketch after this list.
- Running Spark and / or Flink jobs on Kubernetes (e.g., Spark-on-K8s operator, Flink-on-K8s); see the Kubernetes sketch after this list.
- Experience with distributed object storage such as Ceph or AWS S3.
- Building ingestion, transformation, and enrichment pipelines for large-scale datasets.
- Infrastructure-as-Code (Terraform, Helm) for provisioning data infrastructure.
- Ability to work independently while guiding others.
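To illustrate the Iceberg requirement above, here is a short sketch of schema evolution and time travel through Spark, reusing the hypothetical lake catalog from the earlier sketch; the snapshot id is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

object IcebergEvolutionDemo {
  def main(args: Array[String]): Unit = {
    // Assumes the same "lake" Iceberg catalog settings as the migration sketch.
    val spark = SparkSession.builder()
      .appName("iceberg-evolution-demo")
      .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.lake.type", "hadoop")
      .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/warehouse")
      .getOrCreate()

    // Schema evolution: Iceberg records this as a metadata-only change;
    // no data files are rewritten.
    spark.sql("ALTER TABLE lake.analytics.events ADD COLUMN region STRING")

    // Time travel: read the table as of an earlier snapshot
    // (the snapshot id below is a placeholder).
    spark.read
      .option("snapshot-id", 1234567890L)
      .table("lake.analytics.events")
      .show()

    spark.stop()
  }
}
```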
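For the Spark-on-Kubernetes requirement, one hedged sketch of a client-mode session pointed at a K8s master; the API server URL and container image are hypothetical, and production jobs are more commonly launched via spark-submit or the Spark Operator's SparkApplication manifests.

```scala
import org.apache.spark.sql.SparkSession

object SparkOnK8sSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-on-k8s-sketch")
      // Hypothetical API server address; executors are scheduled as pods.
      .master("k8s://https://kubernetes.default.svc:443")
      // Hypothetical image containing Spark and this application's jar.
      .config("spark.kubernetes.container.image", "registry.example.com/spark:3.5.0")
      .config("spark.executor.instances", "4")
      .config("spark.executor.memory", "4g")
      .getOrCreate()

    // Minimal job to confirm executors start and can do work.
    println(spark.range(1000000L).count())

    spark.stop()
  }
}
```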
Would be a plus :
- Experience with Apache Flink; a tiny sketch follows this list.
- Prior experience in migration projects or large-scale data platform modernization.
- Apple experience preferred (to get up to speed on our tooling quickly and more independently).
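As a small illustration of the Flink nice-to-have, a toy streaming job using Flink's Java DataStream API from Scala (the dedicated Scala API is deprecated in recent Flink releases); all names are illustrative.

```scala
// A toy Flink streaming job: bounded source -> map -> stdout sink.
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

object FlinkSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      .fromElements("hdfs", "hive", "iceberg", "s3")
      // An explicit MapFunction keeps Flink's type extraction happy from Scala.
      .map(new MapFunction[String, String] {
        override def map(value: String): String = s"migrating: $value"
      })
      .print()

    env.execute("flink-sketch")
  }
}
```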
(ref : hirist.tech)