As part of our team, you will be responsible for developing and operating our big data platform, using open source or other solutions, to support critical applications such as analytics, reporting, and AI / ML apps. This includes optimizing performance and cost, automating operations, and identifying and resolving production errors and issues to ensure the best data platform experience.
- 3 years of professional software engineering experience with large-scale big data platforms, including strong programming skills in Java, Scala, Python, or Go.
- Proven expertise in designing, building, and operating large-scale distributed data processing systems, with a strong focus on Apache Spark.
- Hands-on experience with table formats and data lake technologies such as Apache Iceberg, ensuring scalability, reliability, and optimized query performance.
- Skilled at coding for distributed systems and developing resilient data pipelines.
- Strong background in incident management, including troubleshooting, root cause analysis, and performance optimization in complex production environments.
- Proficient with Unix / Linux systems and command-line tools for debugging and operational support.
- Expertise in designing, building, and operating critical large-scale distributed systems, with a focus on low latency, fault tolerance, and high availability.
- Experience contributing to open source projects is a plus.
- Experience with multiple public cloud infrastructures, managing multi-tenant Kubernetes clusters at scale, and debugging Kubernetes / Spark issues.
- Experience with workflow and data pipeline orchestration tools (e.g. Airflow, dbt).
- Understanding of data modeling and data warehousing concepts.
- Familiarity with the AI / ML stack, including GPUs, MLflow, or large language models (LLMs).
- A learning attitude and a drive to continuously improve oneself, the team, and the organization.
- Solid understanding of software engineering best practices, including the full development lifecycle and secure coding, and experience building reusable frameworks or libraries.
Key Skills
Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting
Employment Type : Full-Time
Experience : years
Vacancy : 1