Experience : 14+ Yrs
Job Location : Bangalore / Hyderabad (currently remote)
Notice Period : Immediate to 15 Days
SRE : Observability, Reliability, Monitoring, Scalability, Change Management
Cloud Providers : AWS, GCP or Azure
Tools : Datadog, plus any one of Dynatrace, Splunk, Prometheus, or Grafana
Automation : Terraform, Ansible
Programming Languages : Python, Golang
Docker / Kubernetes experience is a plus
Key Responsibilities :
- CI / CD Pipeline Setup : Configure and maintain CI / CD pipelines using Jenkins and Google Cloud Build. Integrate these pipelines with GitHub for source code management and with ServiceNow for change management orchestration.
- Core Logging & Monitoring : Implement centralized logging and metrics collection using GCP Cloud Logging, OpenTelemetry, Prometheus, and Grafana. Ensure deep, real-time visibility into the health of the data platform.
- Pipeline Development : Develop and maintain data pipelines using Apache Flink and the Bootstrap ETL Library. Implement data lineage and audit logging to ensure compliance with SOX requirements.
- CI / CD Automated Testing : Establish and maintain automated testing frameworks for data pipelines. Integrate these tests into the CI / CD pipeline to ensure high code quality and pipeline correctness.
- Performance Monitoring : Configure and maintain monitoring dashboards and alerting mechanisms using Prometheus and Grafana. Conduct baseline performance testing and ensure the platform is operationally ready for production use.
Required Skills :
- DevOps Tools : Jenkins, Google Cloud Build, Docker, Kubernetes, GCP, OpenTelemetry, Prometheus, Grafana
- Programming Languages : Python, with experience developing ETL pipelines and using testing frameworks such as pytest
- Monitoring and Observability : Experience setting up and maintaining monitoring and observability solutions using Prometheus, Grafana, and OpenTelemetry