Job Title : SRE2
Location : Bengaluru, Karnataka
What you will do :
- Design, write and build tools to improve reliability, latency, availability and scalability.
- Engender reliability and availability starting with metrics and measurements
- Enable scaling by providing tools, developing training and / or augmenting processes
- Build tools and automation to prevent recurrence of problems in mission-critical products / services.
- Dynamically manage workload of the SRE team, drive and deliver on multiple priorities simultaneously
- Provide thought leadership in architecture, design, product features and provide feedback on products built on a variety of platforms
- Design, code, test, and deliver software to automate manual operational work
- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
- Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
- Identify application patterns and analytics in support of better service level objectives
- Design self-healing and resiliency patterns
- Design automated software and product upgrades, change management, and release management solutions
- Coach or manage teams as applicable
- Participate in the 24x7 support coverage as needed
- Should be self-motivated and willing to work with minimal supervision
Who you are :
- Bachelor's degree or equivalent experience in a software engineering discipline.
- 5 to 7 years of experience.
- Software development experience in one or more of the following programming languages is a must : Python / Go.
- Expertise in at least one technology stack for designing, coding, testing, and delivering software.
- Experience in distributed computing.
- Strong experience in designing and building highly available, high-volume messaging infrastructure with Apache Kafka on AWS and on-prem (e.g. stretch cluster, active / active or active / passive) using MirrorMaker or other replication tools.
- Good experience with Schema Registry, Kafka connectors (source and sink) and KSQL; hands-on work with Kafka brokers, ZooKeeper, topics and connectors for setup and administration.
- Strong experience in Enterprise Redis: cluster setup, administration, reliability and observability.
- Strong experience in setting up monitoring and management tooling.
- Working knowledge of monitoring and management tools and data growth management.
- DevOps tools experience in Jenkins / Ansible / Git workflows / CI/CD.
- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm.
- Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks).
- Excellent debugging and troubleshooting skills.
- Experience with infrastructure provisioning tools like Terraform or Ansible.
- Hands-on experience deploying and operating applications using IaaS and PaaS on Amazon AWS.
(ref : hirist.tech)