Must-Have Requirements
Experience:
5–8 years in SRE and/or DevOps roles
Programming Skills:
Proficiency in at least one coding language, preferably Python or C++
Platform Support:
Experience supporting and enhancing AI Platform services
Automation:
Proven ability to implement automation solutions
Troubleshooting:
Strong skills in diagnosing and resolving issues in distributed systems
Agile Methodologies:
Comfortable working in Scrum or Kanban environments
Soft Skills:
Team-oriented mindset
Strong interpersonal and communication skills
High ownership and accountability
Work Ethic:
Independent and proactive
Comfortable in fast-paced, ambiguous environments
Energetic, self-directed, and self-motivated
Ray.io Expertise:
Hands-on experience with Ray.io for:
Workload management
Cluster deployment
Distributed task scheduling
Technical Stack Familiarity:
Kubernetes
Docker
Linux fundamentals
CI/CD pipelines
Good-to-Have Skills
Exposure to AI/ML infrastructure or platform engineering
Experience in cross-functional collaboration across service lines
Familiarity with cloud-native tools and observability frameworks
Senior Site Reliability Engineer • India