Talent.com
SRE Architect

Confidential • Chennai, Hyderabad / Secunderabad, Telangana
30+ days ago
Job description
  • The SRE Architect will play a critical role in designing and implementing observable, scalable, reliable, and resilient systems that ensure the highest levels of availability and performance for our applications and services. This role requires a deep understanding of software engineering, system architecture, and operations, along with a passion for automating repetitive tasks with GenAI tools and scripts.
  • Key Responsibilities

    • System Design and Architecture: Lead the design and architecture of scalable and reliable systems that meet the needs of our growing user base and business requirements.
    • Automation and Tooling: Develop and maintain automation tools and frameworks that streamline operations and improve system reliability.
    • Monitoring and Observability: Implement and enhance monitoring, logging, and alerting systems to ensure proactive detection and resolution of issues.
    • Capacity Planning: Conduct capacity planning and performance tuning to ensure systems can handle current and future demands.
    • Incident Management: Lead incident response efforts, perform root cause analysis, and implement corrective actions to prevent recurrence.
    • Collaboration and Mentorship: Work closely with software engineers, DevOps, and other stakeholders to promote best practices in reliability engineering and provide mentorship to junior team members.
    • Continuous Improvement: Identify areas for improvement in existing systems and processes, and drive initiatives to enhance system reliability and performance.
    • Skillset:

    • Experience: 16-22 years of overall experience, including at least 9 years in site reliability engineering, DevOps, or a related field, with a proven track record of designing and implementing reliable systems at scale.
    • Technical Skills:

    • Strong programming skills in languages such as Python, Go, Java, or .NET.
    • In-depth knowledge of cloud platforms (AWS, GCP, Azure), containers (Docker), and container orchestration (Kubernetes).
    • Experience with infrastructure as code and configuration management (Terraform, Ansible, Puppet).
    • Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk, AppDynamics, Dynatrace, ELK stack).
    • Solid understanding of networking, security, and system performance tuning.
    • Soft Skills:

    • Strong problem-solving and analytical skills.
    • Excellent communication and collaboration abilities.
    • Ability to work in a fast-paced environment and manage multiple priorities.
    • Passion for continuous learning and staying up-to-date with industry trends and technologies.
    • Preferred Skillset:

    • Experience with chaos engineering and resilience testing.
    • Familiarity with service mesh architectures (Istio, Linkerd).
    • Cloud platform certifications (Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.).
    • Skills Required

      Azure, Python, AWS, GCP, Go, Java

