About us
ConglomerateIT is a certified pioneer in providing premium end-to-end Global Workforce Solutions and IT Services to diverse clients across various domains. Visit us at http://www.conglomerateit.com
Our mission is to establish global, cross-cultural human connections that further the careers of our employees and strengthen the businesses of our clients. We are driven to use the power of our global network to connect businesses with the right people without bias. We provide Global Workforce Solutions with affability.
About the job
Job Title: Cloud Engineering Ops Lead (AWS + Application Support)
Location: Hyderabad (onsite)
Experience Level: 10+ years
Role Overview
We are seeking a Cloud Engineering Lead to drive reliability, performance, and operational excellence across complex AWS environments and production applications. This hybrid role combines the disciplines of Site Reliability Engineering, Cloud Operations, Application Support, and DevOps to ensure seamless, secure, and cost-efficient delivery of business-critical services.
The ideal candidate will bring deep AWS expertise, automation proficiency, and a strong focus on observability, incident management, and continuous improvement.
Key Responsibilities
1. AWS Cloud & Infrastructure Operations
Design, operate, and optimize AWS environments, including EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, and VPC/TGW/SGs.
Implement Infrastructure as Code (IaC) using Terraform and configuration management via Ansible.
Maintain system hygiene, patching, and OS-level administration across cloud workloads.
Drive cost optimization through tagging, right-sizing, and lifecycle management.
2. Site Reliability Engineering (SRE)
Establish and maintain SLIs, SLOs, and error budgets to ensure service reliability.
Lead incident management and post-mortems, and drive systemic improvements.
Develop and maintain automated runbooks and resiliency playbooks for predictable recovery.
Measure and continuously improve MTTR and change failure rates.
3. Application & Production Support
Own production readiness through deployment validation, rollback planning, and performance baselines.
Support application deployments and lead post-deployment smoke testing and validation.
Troubleshoot production issues end-to-end across the infrastructure, middleware, and application layers.
Partner with development teams to ensure smooth CI/CD integrations and controlled releases.
4. Observability & Monitoring
Build and maintain comprehensive observability using CloudWatch, Prometheus, Grafana, Datadog, or equivalent.
Ensure actionable alerts, clear dashboards, and proper alert routing to responders.
Improve logging, tracing, and metrics coverage to drive proactive issue detection.
5. Backup, DR & Security
Define and validate backup, retention, and restore policies with measurable RPO/RTO objectives.
Implement cross-region replication and disaster recovery strategies.
Maintain strong security posture via IAM policies, OIDC integrations, and role-based access controls.
6. DevOps Enablement
Collaborate with DevOps teams to improve pipeline efficiency, deployment reliability, and release governance.
Automate operational workflows and reduce manual toil using Python, Bash, and IaC tools.
Integrate reliability metrics into CI/CD pipelines to ensure operational readiness before release.
7. Leadership & Mentoring
Lead Sev-1/Sev-2 incident bridges with structured communication and post-resolution follow-ups.
Mentor engineers in SRE best practices, automation, and cloud operations maturity.
Foster a culture of reliability, transparency, and continuous improvement across teams.
Success Metrics
95% tagging compliance.
Required Skills & Experience