About the Role:
We are seeking an experienced Java Production Support Engineer with strong expertise in IT operations, system reliability, and application support. The ideal candidate will have hands-on experience with Java-based systems, site reliability engineering (SRE) practices, and cloud-native environments. This role requires a proactive approach to monitoring, troubleshooting, and ensuring high system availability.
Responsibilities:
- Monitor and manage production systems to ensure high availability, stability, and performance.
- Diagnose and resolve incidents, problems, and performance bottlenecks in production environments.
- Participate in release management, change control, and post-deployment reviews to ensure smooth operations.
- Collaborate with development and infrastructure teams to identify and implement platform enhancements and stabilization measures.
- Leverage observability tools to proactively detect anomalies and prevent incidents before they impact users.
- Create and maintain documentation for incident management, root cause analysis (RCA), and operational procedures.
- Participate in an on-call rotation to handle production issues and support mission-critical systems.
Skills & Expertise:
- IT Operations: System monitoring, performance tuning, troubleshooting, and incident management.
- Site Reliability Engineering (SRE): 4+ years of experience in production environment monitoring and reliability practices.
- Development: Strong hands-on experience with Java, JavaScript, Spring Boot, microservices, and SQL.
- Observability & Event Management Tools: Experience with AppDynamics, Splunk, Prometheus, and Grafana.
- Cloud Platforms: Working knowledge of AWS, Azure, or GCP.
(ref : hirist.tech)