Job Title : Site Reliability Engineer (SRE)
Company : Talent Worx
Location : Pune, Maharashtra, India
About Talent Worx :
Talent Worx is a dynamic and innovative company at the forefront of digital transformation. We pride ourselves on building robust, scalable, and highly available systems that empower our clients and drive technological advancement. We foster a collaborative environment where continuous learning and operational excellence are key.
About the Role :
We are seeking a highly experienced and technically proficient Site Reliability Engineer (SRE) to join our growing team in Pune. The ideal candidate will have a strong background in ensuring the stability, performance, and scalability of complex, multi-architecture applications. You will be instrumental in bridging the gap between development and operations, implementing best practices for reliability, and driving automation to enhance our systems' resilience. This role requires an individual with exceptional troubleshooting skills, a deep understanding of cloud environments, and a proactive approach to incident management and problem prevention.
Key Responsibilities :
Incident Management & Resolution :
- Lead the resolution of critical incidents and complex issues in Java / .NET based applications, ensuring minimal downtime and impact.
- Perform thorough Root Cause Analysis (RCA) for all major incidents, identifying systemic issues and implementing preventative measures.
- Collaborate cross-functionally with development, QA, and product teams to implement permanent fixes and improvements.
System Reliability & Performance :
- Proactively identify potential reliability issues and performance bottlenecks in multi-architecture systems, including microservices.
- Implement and manage robust alerting and monitoring solutions using tools like ThousandEyes, Splunk, and Google Cloud Monitoring to ensure comprehensive visibility.
- Define and track key reliability metrics (SLIs, SLOs) and work towards continuous improvement.
DevOps & Cloud Operations :
- Leverage expertise in DevOps methodologies to promote automation, efficiency, and cultural collaboration.
- Manage and optimize CI / CD pipeline deployments using tools like Harness and Bamboo, ensuring smooth and reliable releases.
- Maintain and manage version control systems, primarily Git.
- Operate and troubleshoot applications deployed on leading cloud platforms such as Google Cloud Platform (GCP) and Amazon Web Services (AWS).
Containerization & Orchestration :
- Work extensively with containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes, Cloud Foundry) to manage application deployments and scaling.
Network & Web Technologies :
- Apply deep knowledge of Internet protocols (HTTP, DNS, TCP / UDP) and web services technologies (SOAP, JSON, REST) to diagnose and resolve connectivity and communication issues.
Automation & Scripting :
- Develop and maintain robust automation scripts using Unix Shell Scripting or other programming languages (e.g., Python, Go) to streamline operational tasks, enhance system reliability, and improve efficiency.
Stakeholder Communication :
- Exhibit strong communication and stakeholder coordination skills, providing clear, concise, and timely updates during incidents and collaborating effectively with internal and external teams.
Required Skills & Experience :
- Minimum of 8 years of extensive experience in application support, specifically with Java / .NET based systems, including issue resolution and incident management.
- Proven expertise in conducting and documenting Root Cause Analysis (RCA).
- Demonstrated strong troubleshooting and debugging skills for complex, multi-architecture systems.
- Solid understanding of and practical experience with microservices architecture patterns.
- Hands-on experience with DevOps practices and principles.
- Proficiency in cloud computing platforms, particularly GCP and / or AWS.
- Mandatory experience in setting up and managing alerting and monitoring systems, including ThousandEyes, Splunk, and Google Cloud Monitoring alerts.
- Direct experience with managing CI / CD pipeline deployments using tools like Harness and Bamboo.
- Proficiency in using Git for version control.
- Experience working with containerization technologies such as Docker, and orchestration platforms like Kubernetes or Cloud Foundry.
- Deep knowledge of Internet protocols (HTTP, DNS, TCP / UDP) and web services technologies (SOAP, JSON, REST).
- Mandatory proficiency in Unix Shell Scripting or at least one programming language (e.g., Python, Go, Ruby).
- Exceptional communication (written and verbal) and stakeholder coordination skills.
Preferred Qualifications :
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Certifications in cloud platforms (e.g., GCP Professional Cloud Architect / DevOps Engineer, AWS Certified DevOps Engineer).
- Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible.
- Familiarity with ITIL frameworks for incident and problem management.
- Prior experience in a consulting or service-oriented role.
What We Offer :
- A challenging and rewarding role within a fast-paced and innovative environment.
- Opportunity to work with cutting-edge technologies and complex systems.
- A collaborative and supportive team culture focused on continuous improvement.
- Competitive compensation and benefits package.
- Opportunities for professional growth and career advancement.
If you are a passionate SRE with a drive for operational excellence and a knack for solving complex problems, we encourage you to apply!
(ref : hirist.tech)