Job Description:
Incident Response & On-Call Support
Serve as primary responder for production incidents; participate in on-call rotation for Java platform services.
Investigate and diagnose application-level problems (e.g., memory leaks, GC pauses, thread deadlocks, CPU bottlenecks).
Execute short-term fixes such as restarting services, modifying configurations, or managing deployment rollbacks.
Escalate critical issues to development teams or dependent stakeholders when needed.
Operational Maintenance
Conduct recurring system maintenance: monthly framework upgrades, dependency patching, and configuration validation.
Monitor and audit application health, performance, and availability using internal tools and dashboards.
Maintain and improve runbooks, response procedures, and documentation.
Collaboration & Observability
Collaborate with engineering teams during production deployments or rollouts.
Analyze application metrics, logs, and traces to identify system issues or inefficiencies.
Partner with infrastructure, database, and observability teams to tune systems for performance and reliability.
Required Qualifications
Preferred Qualifications
Experience working with Apache Flink, Apache Spark, or other distributed data processing frameworks in production.
Familiarity with operational patterns for diverse data systems, including:
Oracle databases
Key-Value stores (e.g., Redis, RocksDB)
Document databases (e.g., MongoDB or similar)
Graph databases (e.g., Neo4j, JanusGraph)
Understanding of production concerns such as data consistency, latency, availability, and failure handling.
Exposure to container-based environments (Docker, Kubernetes).
Application Support Engineer • India