Talent.com
Cloud Infrastructure Reliability Engineer

Sails Software Inc • Vizag, Andhra Pradesh, India
16 hours ago
Job description

SRE - AWS

Job Summary

We are looking for an experienced and driven Senior Site Reliability Engineer (SRE) to architect, implement, and maintain robust cloud infrastructure. This role demands a deep understanding of AWS, Kubernetes, ECS, and the ability to build scalable, secure, and highly available infrastructure from scratch. The ideal candidate will be a strong advocate for DevOps principles, automation, and reliability, and will possess the skills to support and optimize complex microservices-based architectures.

Key Responsibilities

Infrastructure Design & Implementation

  • Design and build highly scalable, fault-tolerant, and secure cloud infrastructure using AWS, Kubernetes, and ECS.
  • Lead efforts in infrastructure as code (IaC) using tools like Terraform or CloudFormation.
  • Develop and enforce best practices for infrastructure provisioning, security, and cost optimization.

System Reliability & Performance

  • Ensure availability, performance, scalability, and security of production systems.
  • Implement observability strategies including monitoring, logging, and alerting using tools such as Prometheus, Grafana, ELK, or Datadog.
  • Analyse system performance metrics and proactively identify potential issues and bottlenecks.

DevOps & Automation

  • Build and maintain CI/CD pipelines to streamline code deployments across environments.
  • Drive automation in infrastructure provisioning, configuration management, and operational tasks.
  • Ensure repeatable and reliable deployments using containers and orchestration tools like Kubernetes and ECS.

Service Management

  • Own the SRE lifecycle, including incident management, postmortems, root cause analysis, and runbook creation.
  • Collaborate closely with development and QA teams to ensure seamless microservices integration, deployment, and lifecycle management.
  • Maintain service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.

Security & Compliance

  • Implement and enforce cloud security best practices for networking, identity and access management, and data protection.
  • Support audits, compliance assessments, and vulnerability remediation.
  • Monitor for security anomalies and work with security teams to respond to threats.

Technical Skills

  • 6+ years of hands-on experience in Site Reliability Engineering, DevOps, or Cloud Engineering.
  • Expertise in AWS services such as EC2, S3, RDS, IAM, VPC, Lambda, CloudWatch, etc.
  • Strong knowledge of Kubernetes and container orchestration best practices.
  • Experience managing services on Amazon ECS (Fargate or EC2).
  • Proficient in infrastructure-as-code tools like Terraform, CloudFormation, or Pulumi.
  • Skilled in scripting languages such as Python, Bash, or Go.
  • Solid grasp of networking, load balancing, DNS, and firewall rules in cloud environments.
  • Deep understanding of microservices architectures, API gateways, and service meshes.

Soft Skills

  • Proven leadership and cross-functional collaboration skills.
  • Strong problem-solving and incident-resolution mindset.
  • Clear communication, documentation, and stakeholder reporting abilities.
  • Passion for continuous improvement and automation.

Preferred Qualifications

  • AWS certifications such as AWS Certified DevOps Engineer, Solutions Architect – Professional, or equivalent.
  • Familiarity with service meshes like Istio or Linkerd.
  • Experience with serverless architectures and event-driven systems.
  • Knowledge of regulatory compliance (SOC2, ISO 27001, GDPR) in cloud environments.

Skills – AWS Cloud, CI/CD, EC2, Kubernetes, Grafana, Datadog, Python

Key Responsibilities

Cloud Platform: GCP

  • Infrastructure Automation: Design, implement, and manage infrastructure as code using Terraform to provision and manage GCP resources.
  • Container Orchestration: Deploy and manage Kubernetes clusters, ensuring efficient operation of containerized applications.
  • Continuous Integration / Continuous Deployment (CI/CD): Develop and maintain CI/CD pipelines using Jenkins to automate application build, test, and deployment processes.
  • Containerization: Collaborate with development teams to containerize applications using Docker and manage deployments with Helm charts.
  • Code Quality Assurance: Integrate and manage SonarQube to ensure code quality and security standards are met.
  • Monitoring and Logging: Implement and manage monitoring solutions using Datadog to ensure system health, performance, and security.
  • Collaboration: Work closely with cross-functional teams, including developers, QA, and operations, to streamline processes and improve productivity.

Requirements

  • Experience: 5+ years in DevOps or cloud engineering roles, with at least 3 years of relevant experience in the specified technologies.
  • Technical Proficiency:
    o Hands-on experience with GCP services and architecture.
    o Proficiency in Terraform for infrastructure as code implementations.
    o Strong understanding and experience with Kubernetes and Docker.
    o Experience in setting up and managing CI/CD pipelines using Jenkins.
    o Familiarity with Helm charts for application deployment.
    o Experience with SonarQube for code quality analysis.
    o Proficiency in monitoring and logging tools, particularly Datadog.
  • Scripting Skills: Proficiency in scripting languages such as Bash or Python is an added advantage.
  • Strong problem-solving abilities and analytical thinking.
  • Excellent communication skills, both verbal and written.
  • Ability to work collaboratively in a team environment.
  • Strong organizational and time management skills.

Skills – Terraform, Kubernetes, Cluster, Docker, GCP, Sonar
