We are seeking a Cloud Operations Lead to support a leading IT R&D organization in Kolkata. This role ensures the stability, performance, and security of cloud-based systems while driving operational excellence through proactive monitoring, incident management, automation, and capacity planning. You will lead cross-functional teams, optimize cloud resources for cost efficiency, and champion automation to reduce manual effort and improve reliability.
Key Responsibilities
Cloud Operations & Reliability
Manage day-to-day operations across production, staging, and development cloud environments within an R&D context.
Ensure high availability of services through robust monitoring, alerting, and incident response processes.
Lead root cause analyses (RCA) and post-mortem reviews to drive continuous improvement.
Implement observability practices including logging, tracing, and metrics for proactive issue detection.
Oversee patch management and maintenance to ensure systems remain secure and up to date.
Automation & Optimization
Develop and maintain automation scripts for provisioning, scaling, and monitoring cloud resources.
Optimize cloud usage through rightsizing, reserved instances, and cost governance (FinOps).
Standardize operational runbooks and playbooks to streamline processes and reduce manual effort.
Security & Compliance
Enforce security baselines, including IAM, encryption, and network segmentation across cloud services.
Collaborate with security teams to implement cloud-native security tools and respond to threats.
Ensure compliance with regulatory standards and support audits (SOC 2, ISO 27001, GDPR, and HIPAA where applicable).
Team Leadership & Collaboration
Lead, mentor, and develop a team of cloud operations engineers.
Promote a culture of SRE/DevOps best practices, automation, and operational reliability.
Partner with application, DevOps, and networking teams to support business-critical R&D initiatives.
Act as the escalation point for critical incidents and operational challenges.
Vendor & Stakeholder Management
Manage relationships with cloud providers (AWS, Azure, GCP) and monitoring tool vendors.
Provide operational metrics and status updates to senior leadership.
Collaborate with finance to align cloud cost forecasts and budget planning.
Required Qualifications
Education & Experience
Bachelor’s degree in Computer Science, IT, or a related field.
5–8 years of experience in cloud operations, SRE, or IT infrastructure.
2+ years in a leadership role managing operational teams, preferably in an R&D environment.
Technical Skills
Expertise in at least one major cloud platform (AWS, Azure, GCP).
Hands-on experience with monitoring and observability tools (CloudWatch, Datadog, New Relic, Prometheus).
Strong knowledge of Infrastructure as Code (Terraform, CloudFormation, ARM templates).
Experience with incident management frameworks and tooling (ITIL, SRE principles, PagerDuty, on-call rotations).
Familiarity with container orchestration (Kubernetes, ECS, AKS, GKE) and CI/CD pipelines.
Understanding of cloud security best practices and compliance frameworks.
Soft Skills
Proven ability to lead and inspire teams in a fast-paced R&D environment.
Strong problem-solving, decision-making, and communication skills.
Collaborative mindset to work effectively with technical and business stakeholders.
Preferred Qualifications
Cloud certifications (AWS SysOps, Azure Administrator, Google Cloud DevOps Engineer, or equivalent).
Experience managing multi-cloud environments.
Knowledge of FinOps and cost governance frameworks.
Familiarity with ITIL processes or formal service management frameworks.
Key Success Metrics
System Uptime: Meet or exceed availability SLAs (> 99.9%).
Incident Response: Reduce MTTR (Mean Time to Resolution) for critical incidents.
Cost Efficiency: Optimize resource utilization and achieve measurable cloud cost savings.
Automation: Increase automation coverage for operational tasks year over year.
Team Performance: Maintain high team engagement and continued professional development.
Lead Cloud • Panchkula, Haryana, India