Talent.com
Site Reliability Engineer - DevOps

Whitefield Careers, Bangalore
30+ days ago
Job description

Key Responsibilities:

  • Troubleshoot complex issues in Linux environments and conduct application-level debugging.
  • Manage and provision infrastructure using Terraform and configuration management tools.
  • Orchestrate and manage containers using Kubernetes in a production-grade environment.
  • Design and maintain CI/CD pipelines to enable seamless deployments and continuous delivery.
  • Script automation tools and processes to enhance operational efficiency and reliability.
  • Monitor system health and performance using tools such as Grafana, Prometheus, and Loki.
  • Set up alerts and dashboards for proactive system monitoring and issue detection.
  • Collaborate with development, QA, and operations teams to improve application and system performance.
  • Lead incident response efforts, perform root cause analysis, and ensure timely resolution.
  • Perform API and load testing using Gatling and JMeter to validate system resilience.
  • Administer and support Finacle operations and its integration within the infrastructure.
  • Apply deep knowledge of TCP/IP, HTTP, DNS, and load balancing to maintain highly available services.
  • Document system configurations, processes, and troubleshooting guides for internal use.
  • Work across Linux and Windows systems, providing support and implementing improvements.

Key Skills & Qualifications:

  • 4+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
  • Proven expertise in Linux system administration and debugging complex application issues.
  • Strong experience with Terraform, Kubernetes, and container orchestration.
  • Hands-on experience in managing CI/CD pipelines and version control systems.
  • Proficiency in scripting languages (e.g., Bash, Python, or similar).
  • Sound knowledge of Finacle operations is highly desirable.
  • Familiarity with system architecture, configuration management, and automation tools.
  • Deep understanding of networking fundamentals, including TCP/IP, HTTP, DNS, and load balancing.
  • Experience with Grafana, Prometheus, Loki, and other observability tools.
  • Ability to define alerts and dashboards and to troubleshoot performance issues using system metrics.
  • Proficient in incident management, root cause analysis, and writing postmortems.
  • Skilled in API testing and load testing using Gatling and JMeter.
  • Strong interpersonal skills and ability to communicate complex technical topics clearly and concisely.
  • Strong documentation skills and ability to collaborate in cross-functional teams.
  • (ref: hirist.tech)
