ABOUT THE ROLE
Our international team is responsible for the architecture, design, and implementation of central, business-critical platforms for API handling, system integration, and the automation of rule-based business decisions. Advising our specialist departments and IT product teams is an essential part of this work: we advise on and support the implementation of new use cases and the optimization of existing solutions. In addition, we work with service providers for operation and support to guarantee the highest level of reliability. We see your role as IT Operations Engineer as that of a flexible all-rounder and dynamic team player who enjoys operating and optimizing our cloud-based platforms. We offer a variety of tasks to expand and improve our solution, from design to implementation and the management of operations-related challenges.
KEY RESPONSIBILITIES & TASKS
Monitoring & Observability :
- Continuous observation of our systems regarding availability, performance, system usage and costs
- Definition, design and implementation of observability / monitoring regarding Service Levels (SLIs / SLOs / SLAs)
- Integration into central observability solutions (e.g. Datadog, Elastic, …)
- Reporting of availability, performance, system usage and costs on a regular basis
Maintenance :
- Planning, coordination and implementation of system updates in collaboration with our vendors and suppliers
- Keep our system secure by fixing vulnerabilities in collaboration with our CISO department
- Take care of housekeeping tasks
Automation :
- Drive automation based on paradigms such as CaC / IaC (Configuration as Code / Infrastructure as Code) to keep error-prone manual work to a minimum
- Optimize our CI / CD pipeline
Incident & Problem Management :
- Take responsibility for coordinating and resolving incidents to keep mean time to repair (MTTR) and user impact as low as possible
- Drive and support problem management to ensure system reliability and prevent recurring incidents
Service Management :
- Take responsibility for service request handling
Continuous Improvement :
- Drive continuous improvement of our platform with regard to scalability, reliability and cost-efficiency
BEHAVIOURS & APPROACH
- Strong analytical and problem-solving skills
- Team-oriented with excellent communication and collaboration skills
- Ability to build proactive, cooperative working relationships with customers, peers and key stakeholders based on respect and teamwork
- Ability to act under pressure and to manage crisis situations efficiently
- Able to evaluate information, identify key issues and formulate conclusions based on sound, practical judgment, experience, and common sense
WORK EXPERIENCE
- Extensive experience in operating business-critical, cloud-based platforms (monitoring, maintenance, improvement, troubleshooting, …) at enterprise scale
- Extensive experience with the AWS cloud and container runtimes such as ROSA (Red Hat OpenShift Service on AWS)
- Good knowledge of end-to-end monitoring of applications and systems with enterprise observability tools (e.g. Datadog, Elastic, Prometheus, Grafana)
- Experience with automation tools such as Terraform or Ansible is an advantage
- Experience in software development and the tools used, such as version management, CI / CD, planning and collaboration tools (e.g. Git, Jenkins, Jira, Confluence, ...)
- Excellent communication, problem-solving, and stakeholder management skills
EDUCATION & QUALIFICATIONS
- Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline
- English language: expert proficiency (additional languages are beneficial)
Skills Required
CISO, CI / CD, Datadog, Elasticsearch, AWS