Sr. DB SRE - Specialist

Sony UK Technology Centre

Bangalore

30+ days ago

Job type

Full-time

Job description

We look for the risk-takers, the collaborators, the inspired and the inspirational. We want the people who are brave enough to work at the cutting edge and create solutions that will enrich and improve the lives of people across the globe. So, if you want to make the world say wow, let's talk.

The conversation starts here. If this role matches your ambitions and skillset, let's get started with your application. Take a look at our other open positions too. Our many opportunities can lead to infinite possibilities.

JOB DESCRIPTION - DevOps Engineer - EEng

About SISC:

Job Title:

Sr. DB-SRE Specialist

Project Details:

The PDCS team in PlayStation Network Services handles databases and technology enablement across all engineering teams.

Technology and Sub-technology:

Kubernetes, Docker

AWS, GCP

Python, shell scripting

CI/CD, Jenkins

Terraform

Monitoring tools such as Datadog, Splunk, Prometheus, and Grafana

Relational and non-relational databases

Base Location:

Bangalore

Type:

Hybrid

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Information Science, or Electronics and Communication.

Job Overview:

Primary Skills:

Minimum 6-7 years of DevOps / SRE experience.

3+ years of hands-on experience with AWS or GCP, EC2 (GCE), IAM, S3 (GCS), Docker, Kubernetes pods, Jenkins, Prometheus, CloudWatch (Stackdriver), Linux, and Ansible.

3+ years of experience deploying code and infrastructure in AWS or GCP using continuous integration / continuous delivery (CI/CD) tools in production environments.

3+ years of automation experience using Python, Golang, and/or shell scripting.

4+ years of prior experience developing metrics to monitor the health of infrastructure and applications.

3+ years of experience managing SaaS application infrastructure, with REST-based test automation experience using Python.

The candidate should have a thorough understanding of networking fundamentals (TCP/IP, UDP, DHCP, DNS, ICMP, ARP, routing and switching).

General understanding of distributed systems.

Understanding of data management technologies including relational and non-relational databases.

Good-to-have Skills:

AWS certification (or similar) is a big plus.

Knowledge of build pipelines and infrastructure such as Jenkins, GitHub, and CI/CD would be an added advantage.

Work in an agile and highly collaborative environment with our globally distributed engineering teams, architecture, product management, and operations.

Maintain excellent written and verbal communications with clients, employees, and management chain, including status reports, project plans, presentations, etc.

Basic understanding of Terraform, CloudFormation, or any other IaC tool is preferred.

Ideally, a detailed understanding of IP routing, security, and cloud services such as CGNAT, IPsec, IDP, and SD-WAN/SDN for different customer use cases.

Responsibilities and Duties:

Engage, influence, and promote SRE practices with development, operational, and product groups to align technology service/solution delivery.

Drive quality accountability within the organization with well-defined processes, metrics, and goals.

Manage availability, latency, scalability, and efficiency of Shared Services development by instilling engineering reliability into our development life cycle with a focus on fault-tolerant approaches.

Must be able to define and report progress on strategic initiatives and project-level tasks to all stakeholders, including senior executives and clients, and use practical communication approaches with each constituency.

Implement metrics-driven processes to ensure service quality targets are met.

Manage system availability, health, and service levels (SLAs, SLOs) of the large-scale cloud infrastructure running in AWS and GCP.

Proactively monitor, diagnose, and analyze failures, and support software engineers in debugging production issues across microservices and distributed platforms. Work with the development team to resolve the issues found.

Participate in the on-call rotation and resolve issues in a multi-cloud (AWS/GCP) environment.

Monitor metrics and performance of applications and cloud infrastructure.

Manage code releases, i.e., push code and patches to the cloud.

Own the entire incident lifecycle (incident management), including reporting, analyzing, and handling incidents through to closure, and writing RCAs.