Title: Data Platform ProdOps Engineer
Bengaluru, KA
Role Summary
We are seeking a proactive and detail-oriented Production Operations Engineer to join our Data Platforms team. This role plays a critical part in maintaining the stability and reliability of our large-scale data platform by providing hands-on operational support, system monitoring, and first-line incident triage.
The ideal candidate is comfortable working in containerized environments, has solid Linux fundamentals, and demonstrates strong discipline in executing standard operating procedures and documenting outcomes. This position requires close collaboration across time zones with developers and product teams based in Shanghai and San Jose.
Key Responsibilities
- Monitor production systems and job pipelines; respond promptly to alerts and anomalies
- Troubleshoot operational issues in collaboration with the development team
- Investigate incidents using logs, metrics, and observability tools (e.g., Grafana, Kibana)
- Perform recovery actions such as restarting pods, rerunning jobs, or applying known mitigations
- Operate in Kubernetes environments to inspect, debug, and manage components
- Support deployment activities through post-release validations and basic checks
- Validate data quality and flag anomalies to the relevant engineering teams
- Maintain clear documentation of incidents, actions taken, and resolution outcomes
- Communicate effectively with remote teams for operational handoffs and follow-ups
Required Qualifications
- 3 years of experience in production operations, system support, or devops roles
- Solid Linux skills (e.g., file system navigation, log analysis, process / network troubleshooting)
- Hands-on experience with Kubernetes and Docker in production environments
- Familiarity with observability tools (e.g., Grafana, Kibana, Prometheus)
- English proficiency for reading, writing, and asynchronous communication
- Strong execution discipline and ability to follow structured operational procedures
Preferred Qualifications
- Scripting ability (Python or Shell) for log parsing and automation
- Basic SQL skills for data verification or debugging
- Experience with Hadoop and Flink pipelines for batch and stream processing is a strong plus
- Experience with large-scale distributed data systems or job scheduling frameworks
What We Offer
- Opportunity to work with a highly experienced global engineering team
- Exposure to enterprise-scale data systems and platform operations
- Structured onboarding and mentoring support
- Long-term growth potential in devops, platform, or data infrastructure domains