Talent.com
Lead – Cloud Operations & Reliability
No longer accepting applications
Elevarae • India
2 days ago
Job description

We are seeking a Cloud Operations Lead to support a leading IT R&D organization in Kolkata. This role ensures the stability, performance, and security of cloud-based systems while driving operational excellence through proactive monitoring, incident management, automation, and capacity planning. You will lead cross-functional teams, optimize cloud resources for cost efficiency, and champion automation to reduce manual effort and improve reliability.

Key Responsibilities

Cloud Operations & Reliability

  • Manage day-to-day operations across production, staging, and development cloud environments within an R&D context.
  • Ensure high availability of services through robust monitoring, alerting, and incident response processes.
  • Lead root cause analyses (RCA) and post-mortem reviews to drive continuous improvement.
  • Implement observability practices including logging, tracing, and metrics for proactive issue detection.
  • Oversee patch management and maintenance to ensure systems remain secure and up-to-date.

Automation & Optimization

  • Develop and maintain automation scripts for provisioning, scaling, and monitoring cloud resources.
  • Optimize cloud usage through rightsizing, reserved instances, and cost governance (FinOps).
  • Standardize operational runbooks and playbooks to streamline processes and reduce manual effort.

Security & Compliance

  • Enforce security baselines, including IAM, encryption, and network segmentation across cloud services.
  • Collaborate with security teams to implement cloud-native security tools and respond to threats.
  • Ensure compliance with regulatory standards and audits (SOC 2, ISO 27001, GDPR, HIPAA where applicable).

Team Leadership & Collaboration

  • Lead, mentor, and develop a team of cloud operations engineers.
  • Promote a culture of SRE/DevOps best practices, automation, and operational reliability.
  • Partner with application, DevOps, and networking teams to support business-critical R&D initiatives.
  • Act as the escalation point for critical incidents and operational challenges.

Vendor & Stakeholder Management

  • Manage relationships with cloud providers (AWS, Azure, GCP) and monitoring tool vendors.
  • Provide operational metrics and status updates to senior leadership.
  • Collaborate with finance to align cloud cost forecasts and budget planning.

Required Qualifications

Education & Experience

  • Bachelor’s degree in Computer Science, IT, or a related field.
  • 5–8 years of experience in cloud operations, SRE, or IT infrastructure.
  • 2+ years in a leadership role managing operational teams, preferably in an R&D environment.

Technical Skills

  • Expertise in at least one major cloud platform (AWS, Azure, GCP).
  • Hands-on experience with monitoring and observability tools (CloudWatch, Datadog, New Relic, Prometheus).
  • Strong knowledge of Infrastructure as Code (Terraform, CloudFormation, ARM templates).
  • Experience with incident management frameworks and tooling (ITIL, SRE principles, PagerDuty, on-call rotations).
  • Familiarity with container orchestration (Kubernetes, ECS, AKS, GKE) and CI/CD pipelines.
  • Understanding of cloud security best practices and compliance frameworks.

Soft Skills

  • Proven ability to lead and inspire teams in a fast-paced R&D environment.
  • Strong problem-solving, decision-making, and communication skills.
  • Collaborative mindset to work effectively with technical and business stakeholders.

Preferred Qualifications

  • Cloud certifications (AWS SysOps, Azure Administrator, Google Cloud DevOps Engineer, or equivalent).
  • Experience managing multi-cloud environments.
  • Knowledge of FinOps and cost governance frameworks.
  • Familiarity with ITIL processes or formal service management frameworks.

Key Success Metrics

  • System Uptime: Meet or exceed availability SLAs (> 99.9%).
  • Incident Response: Reduce MTTR (Mean Time to Resolution) for critical incidents.
  • Cost Efficiency: Optimize resource utilization and achieve measurable cloud cost savings.
  • Automation: Increase automation coverage for operational tasks year over year.
  • Team Performance: Maintain high team engagement and development.