Cloud Operations L2 Support Engineer

Rakuten Symphony, India

18 days ago

Job description

Job Summary :

We are seeking a highly skilled and experienced Cloud Engineer with a strong Site Reliability Engineering (SRE) mindset to join our team. This role will be critical in ensuring the availability, reliability, and performance of our platform services and applications, particularly those supporting our Radio Access Network (RAN) and Core Network functions deployed on cloud infrastructure. The ideal candidate will possess deep expertise in Kubernetes, cloud operations, and a passion for optimizing complex distributed systems. You will be instrumental in running our production environment, responding to critical incidents, and driving continuous improvement in system reliability and efficiency across both RAN and Core cloud deployments.

Key Responsibilities :

Platform Reliability & Availability (SRE Focus) :

Run the production environment by proactively monitoring availability and taking a holistic view of system health for our cloud-based RAN and Core Network platforms.

Improve the reliability and quality of the system through automation, process refinement, and best practices for both RAN and Core cloud components.

Measure and optimize system performance to ensure efficient resource utilization and optimal user experience for network services.

Ensure that services are available and the underlying infrastructure is functioning properly, and monitor critical applications and related services to guarantee system availability for RAN and Core functions.

Cloud Operations & Kubernetes Management :

Design, deploy, and manage Kubernetes clusters and related cloud infrastructure for both RAN and Core Network application deployments.

Implement and maintain containerization strategies and orchestration best practices for telecom workloads.

Manage and troubleshoot Robin storage solutions within the Kubernetes environment, supporting the unique storage needs of RAN and Core applications.

Implement and manage CI / CD pipelines for cloud-native RAN and Core applications.

Take responsibility for cloud resource provisioning, scaling, and cost optimization for all deployed network functions.

Incident & Problem Management :

Collaborate on high-priority incident tickets (e.g., MIC Reported Incident, Serious / Medium / Small Network Incidents, RIUD Faults), ensuring rapid system recovery for impacted RAN and Core services.

Be on standby to interface with developers when issues arise and are escalated, providing immediate technical insights and support for cloud-native network functions.

Lead Problem Management efforts, including Root Cause Analysis (RCA), for complex incidents affecting RAN and Core cloud deployments.

Identify bugs and work with development teams to prioritize and implement fixes for cloud-native network elements.

Monitoring & Alerting :

Implement and maintain robust monitoring, logging, and alerting solutions for cloud infrastructure and applications supporting RAN and Core services.

Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical RAN and Core services running in the cloud.

Automation & Tooling :

Develop and implement automation scripts and tools to streamline operational tasks, deployments, and incident response for cloud-native RAN and Core components.

Evaluate and integrate new tools and technologies to enhance operational efficiency.

Collaboration & Knowledge Sharing :

Support Governance Reports by providing technical data and insights on cloud platform performance for RAN and Core.

Handle customer queries with technical expertise and provide timely resolutions related to cloud-deployed network services.

Provide training and mentorship to junior team members on cloud technologies and SRE practices, specifically in the context of telecom networks.

Work closely with development, network, and security teams to ensure seamless service delivery across the entire network architecture.

Technical Requirements (Most Visible) :

Deep expertise in Kubernetes :

Cluster deployment, management, and troubleshooting for high-performance telecom workloads.

Container orchestration, Pod lifecycle, Deployments, Services, Ingress.

Helm charts, Kustomize.

Advanced networking within Kubernetes (CNI, CoreDNS, service mesh concepts).

Security best practices in Kubernetes, especially for critical network functions.

Proficiency in Cloud Platforms :

Experience with at least one major cloud provider (e.g., AWS, Azure, GCP) with focus on enterprise-grade infrastructure.

Containerization Technologies :

Docker, containerd.

Robin Storage :

Hands-on experience with Robin.io or similar distributed persistent storage solutions for Kubernetes, particularly for stateful RAN and Core applications.

Infrastructure as Code (IaC) :

Terraform, Ansible, or similar tools for automating cloud and Kubernetes deployments.

Scripting & Automation :

Strong proficiency in Python, Go, Bash, or similar for developing automation and tooling.

Monitoring & Logging Tools :

Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, or similar, with experience in large-scale data ingestion and analysis.

CI / CD Tools :

Jenkins, GitLab CI / CD, Argo CD, or similar, for continuous deployment of network functions.

Operating Systems :

Expert-level knowledge of Linux (e.g., CentOS, Ubuntu, RHEL).

Networking Fundamentals :

Deep understanding of TCP / IP, DNS, Load Balancing, Firewalls, VPNs, and advanced network concepts relevant to telecom (e.g., SRv6, Segment Routing, GTP-U / C).

Telecommunications Network Knowledge :

Strong understanding of Radio Access Network (RAN) architecture, components, and interfaces (e.g., O-RAN, vRAN concepts).

Strong understanding of Core Network (EPC / 5GC) architecture, functions (e.g., AMF, SMF, UPF, MME, SGW, PGW), and protocols.

Familiarity with network function virtualization (NFV) and software-defined networking (SDN) principles.

Qualifications :

Education :

Bachelor’s degree in Computer Science, Engineering, or a related field.

Experience :

Minimum of 5-6 years of experience in a Cloud Engineering, DevOps, or SRE role, with a significant focus on Kubernetes and cloud operations, ideally within a telecommunications or high-availability environment.

Problem-Solving :

Exceptional analytical and problem-solving skills, with a methodical approach to debugging complex distributed systems.

Communication :

Excellent verbal and written communication skills, capable of effectively collaborating with technical and non-technical stakeholders.

Proactive Mindset :

Ability to anticipate issues, identify risks, and propose preventative solutions.

Incident Response :

Proven experience in responding to and resolving critical production incidents in a fast-paced environment.

Continuous Improvement :

A strong desire to learn, adapt, and drive continuous improvement in processes and systems.
