Service Reliability Engineers

Confidential
Hyderabad / Secunderabad, Telangana, India

Posted 5 days ago

Job description

Our Site Reliability Engineers (SREs) play a crucial role in ensuring our systems are reliable, scalable, and efficient. We are looking for an experienced SRE to join our team and help us maintain and improve our infrastructure.

Responsibilities

Monitor and Maintain Systems: Ensure the availability, performance, and reliability of our production environment by monitoring system health and responding to incidents.
Automation: Develop and implement automation tools to reduce manual intervention and improve system efficiency.
Collaboration: Work closely with development teams to design and implement scalable, reliable systems.
Performance Tuning: Analyze system metrics to identify performance bottlenecks and optimize performance.
Incident Management: Lead incident response efforts, conduct root cause analysis, and implement preventive measures.
Documentation: Create and maintain comprehensive documentation of system architecture, processes, and procedures.
Capacity Planning: Forecast resource needs and ensure systems can handle future growth.

Qualifications

Experience: 6+ years in site reliability engineering, operations, or software engineering.

Education: Bachelor's degree in Computer Science, Engineering, or a related field.

Technical Skills: Proficiency in scripting languages (e.g., Python, Ruby), experience with containerization (Docker, Kubernetes), and familiarity with cloud platforms (AWS, GCP, Azure).

System Knowledge: Strong understanding of Linux/Unix systems, networking, and infrastructure components.

Problem-Solving: Excellent troubleshooting and problem-solving skills.

Communication: Strong communication and collaboration skills for working effectively with cross-functional teams.

Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator) are a plus.

Preferred Skills

Experience with configuration management tools (e.g., Ansible, Chef, Puppet).

Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).

Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).

Why Join Us

Innovative Environment: Work on cutting-edge technologies and projects.

Growth Opportunities: Opportunities for professional development and career advancement.

Collaborative Culture: Join a team that values collaboration, diversity, and inclusion.

Competitive Benefits: Comprehensive benefits package including health insurance, retirement plans, and more.

Skills Required

Unix, Chef, Prometheus, ELK Stack, Grafana, Jenkins, GCP, Linux, Docker, Ansible, Ruby, Puppet, Azure, Kubernetes, Python, AWS