- As a Senior Site Reliability Engineer, you will be responsible for developing sophisticated systems and software based on the customer's business goals, needs, and general business environment.
- You will work with product management, other engineering teams, customer success, and support to develop cutting-edge product features and enhancements across Boomi's offerings.
You will:
- Be an active member of an Agile team, collaboratively realizing features through the software development lifecycle.
- Design, build, and maintain infrastructure as code that enables provisioning and maintenance of Boomi's infrastructure.
- Participate actively in detecting, remediating and reporting on Production incidents, ensuring the SLAs / SLOs are defined and met.
- Participate in an on-call rotation to ensure coverage for planned / unplanned events.
- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
- Work with your SRE and other engineering counterparts to build more scalable, resilient, and reliable systems.
- Collaborate with Engineering organizations to build and automate tooling.
- Implement best practices on Observability and build monitoring that alerts on symptoms rather than on outages.
- Improve operational processes (such as deployments and upgrades) to make them as simple as possible.
- Plan the growth of Boomi's infrastructure.
- Work independently with a minimal level of guidance from technical leadership.
- Mentor other Boomi engineers, including design collaboration and code reviews.
What you'll need to succeed in this role
- Passionate about SRE, DevOps, Automation and infrastructure platforms.
- Expert in developing Ansible playbooks and automation for Infrastructure as code using CloudFormation templates.
- A grasp of Cloud Native concepts, containerization best practices and security awareness in Cloud will be a strong plus.
- Expert in defining, measuring, and improving Reliability Metrics.
- Strong understanding of observability practices (monitoring, logging, distributed tracing, etc.) and experience implementing them, preferably using Splunk and New Relic.
- Strong understanding and working experience with AWS / Azure.
- Ability to design and implement APIs for use by internal teams.
- Strong understanding of CI / CD workflows.
- Experience with agile collaboration tools, such as JIRA and Confluence.
- Experience with Web Services technologies including REST, SOAP, and WSDL.
Additional experience desired
- 7+ years of experience in the software engineering industry, including supporting large-scale SaaS and cloud-based software solutions in production.
- Cloud certification (AWS / Azure / GCP) and experience using services such as virtual machines, containers, and databases.
- Experience with Ansible, Terraform, Python, and JavaScript.
- Familiarity with AWS technologies such as CloudFormation, S3, ECS, EKS, and EC2.
- Security awareness in the Cloud will be a strong plus.
- Experience in Observability, creating dashboards for SLA / SLI / SLO.
- Basic understanding of Application Integration and / or Data Integration (ETL).
Role: Site Reliability Engineer
Industry Type: IT Services & Consulting
Department: Engineering - Software & QA
Employment Type: Full Time, Permanent
Role Category: DevOps
Education
UG: Any Graduate
PG: Any Postgraduate
Skills Required
Product Management, Agile, Software Development Life Cycle, JavaScript, Splunk, Operations, Automation, Jira, Python, Monitoring