Site Reliability Engineering, Sr Staff

Confidential • Bengaluru / Bangalore
30+ days ago
Job description
  • Lead SRE work, with the ability to execute the SRE lifecycle and automation processes.
  • Discover, design, and implement changes to existing IT infrastructure with a focus on improved reliability, performance, and standardization.
  • Collaborate with Engineering and business units to translate customer, business, and technical requirements into SRE architectural designs and enhancements.
  • Develop and analyze various business and technical scenarios to drive the highest levels of executive decision-making around infrastructure resources. Drive consensus and decisions with stakeholders.
  • Troubleshoot production issues, provide root cause analysis, and design solutions to prevent recurrence.
  • Build automated, scalable, and rigorous solutions to infrastructure problems by leveraging or developing state-of-the-art automation, mathematical optimization, and/or AI models.
  • Monitor services and create intelligent alarming for quicker incident detection and resolution.
  • Identify opportunities to invent and simplify processes, identify business risks, and implement resolutions and scalable mechanisms.
  • Ensure efficient resource utilization and continuously improve processes by leveraging automation and internal tools, resulting in enhanced service delivery, maturity, and scalability.
  • Mentor and coach other SRE team members.
The Impact You Will Have:

    • Enhance the reliability and performance of Synopsys IT infrastructure.
    • Standardize and automate processes to increase operational efficiency.
    • Translate complex requirements into actionable SRE designs and solutions.
    • Provide critical insights and drive decision-making for infrastructure improvements.
    • Prevent future production issues through meticulous root cause analysis and proactive solutions.
    • Contribute to the scalability and robustness of our infrastructure through innovative solutions.
    • Improve incident detection and shorten resolution times, ensuring minimal disruption.
    • Streamline processes to mitigate business risks and improve scalability.
    • Optimize resource utilization, ensuring cost-effective and efficient operations.
    • Develop the next generation of SRE talent through mentorship and coaching.
What You'll Need:

    • Extensive experience with a wide range of infrastructure technologies, such as Linux, Windows, high-performance computing (HPC), storage platforms, networking, cloud computing, cloud services (IaaS, PaaS, SaaS), virtualization, OpenStack, containerization, and orchestration technologies (e.g., Docker, Kubernetes).
    • Expertise in HPC components such as NFS / shared file systems and grid schedulers (IBM Spectrum LSF / Univa Grid / Slurm).
    • Deep understanding of IT infrastructure-related services and their dependencies required to troubleshoot issues and define mitigations.
    • Strong command of statistical concepts, models, and analysis, and how they relate to product reliability and life cycle analysis.
    • Experience developing quantitative and qualitative analysis and metrics to solve business problems.
    • Experience with developing service level indicators and objectives, instrumenting software, and building alerts.
    • Hands-on experience with one or more of Java, Python, Go, AngularJS, or Node.js.
    • Implementation experience with infrastructure automation tools and frameworks such as GitHub, Maven / Gradle, Jenkins, Terraform (IaC), Ansible, and shell scripting.
Skills Required

      Maven, Cloud Computing, Linux, Networking, Shell Scripting, Virtualization, Site Reliability Engineering