Dev Ops Engineering III-SUPPORT SERVICES-Applications-CTB

Confidential • Bengaluru / Bangalore, India
9 days ago
Job description

Title : Observability Platforms and SRE Engg.

The Company : The World of Kotak product suite encompasses a powerful range of cross-banking assets, one-stop banking services, securities, and investment banking insights across a wide spectrum of major financial and banking markets.

The Team : You will work with a team of highly seasoned Observability Platform and Site Reliability Engineers who are part of the Run-The-Bank initiative, delivering engineering and technology operations excellence for the Kotak Banking Product Suite and its associated delivery platform.

The Observability Platforms and SRE team is a group of experts who develop, maintain, and scale Observability Platform solutions and drive engineering and automation within the Banking Solutions platform and its operations, both on-premises and in the cloud.

We are looking for a highly motivated individual to take on the role of Observability Platform Engineer and SRE, helping to implement our platforms using open-source and enterprise solutions through IaC, automated operations, and configuration management, and bringing together observability and engineering for architectural and operational excellence.

In this role, you will develop, test, and validate the software and hardware systems that enable our Observability Platform, and coordinate the processes and tools that support the stability, resilience, and performance of a banking system capable of supporting multiple business requirements across an array of technologies. The engineer will work across architecture, development, infrastructure, and vendor teams to deliver and support the Observability Platform and the SRE-guided processes and tools that support the banking systems.

Impactfulness : The team has an opportunity to advocate for and participate in building engineering services that are resilient, optimally monitored and alerted, and capable of self-healing through reliability engineering practices, using software and runbook automation tools to deliver world-class banking and related content globally.

  • Observability Platform engineers will implement site-wide observability solutions for metrics, logs, traces, alerting, and monitoring, to be used by development and business teams across the organization to monitor their systems and applications. Site Reliability Engineers (SREs) are responsible for keeping all user-facing services, user journeys, and other Kotak production systems running smoothly.
  • These engineers should combine the skills of software engineers and pragmatic systems engineers, embedding operational discipline and engineering principles, and bringing mature automation and documentation to our operating environments and the associated Kotak code base.
  • These engineers should have expertise in systems (networking, operating systems, storage, etc.) and implement best-practice guidelines for stability, availability, reliability, and scalability while keeping compute and cost optimal.
  • Kotak platforms are critical applications with unique use cases and associated challenges that will need to be optimized over time through re-engineering and revised tools and practices.

What's in it for you / Role : An Observability Platform Engineer and SRE is ultimately accountable for building, maintaining, and scaling an Observability Platform that can be used by systems across the organization. They are also accountable for system reliability, resiliency, and scalability, and for reducing time to market by continually improving end-to-end service and reducing technical debt. We seek leaders who are passionate about observability and system reliability to influence and drive the strategic platform mission and maturity.

Your mission will be to ensure our services are fast, highly available, and run efficiently, scaling optimally during peak business traffic and load. Your focus will be solving production problems across the stack, up to the edge. You will gain critical domain knowledge to effectively troubleshoot symptoms that impair system health and lead to performance degradation or service outages. The position requires the flexibility to take a holistic approach to troubleshooting and the ability to deep-dive into core technical details with development, infrastructure, and vendor teams. You will build automation tools and processes for system health checks and acceptance tests that validate changes in lower environments before they are promoted to production. The Site Reliability Engineer will ensure the system is well instrumented and highly fault tolerant, with proper metrics to report on.

Key Leadership Responsibilities :

  • Influence and drive engagement on Observability and SRE practices with development, engineering, and product groups to align solution delivery with technology services.
  • Build quality engineering practices around automation through well-defined processes and monitoring metrics that demonstrate process quality.
  • Conduct transparent and effective blameless post-mortems, ensuring Post Incident Reviews have clear root causes and actions, tracked through problem tickets to closure.
  • Deliver on the availability, latency, performance, and scalability of Kotak applications by evangelizing engineering principles in the development lifecycle, with a template for fault tolerance at each level.
  • Drive non-functional requirement reviews, including capacity planning, cost analysis, and instrumentation integration, to provide a complete delivery cycle.
  • Define Observability and SRE initiatives and tasks, report on them to all stakeholders and the business, and build an onboarding template for new and future applications.
  • Implement a metrics-driven approach towards service quality targets.

Basic Qualifications :

  • 7+ years of system and solutions engineering, software development, or technology operations background, with 3+ years of work experience in Systems Engineer, DevOps, and / or SRE roles.
  • Experience automating infrastructure, testing, and deployments using tools such as Terraform, CFT with Jenkins, Ansible, Chef, and other industry-recognized tools to deliver Infrastructure as Code.
  • Relevant work experience with, or familiarity with, languages / web technologies (Python, Java, C, C++, ASP.NET, JavaScript, Go, etc.).
  • Experience with two or more scripting languages, such as Python, Perl, Unix shell, PowerShell, Groovy, etc.
  • Experience with AWS technologies : VPC, EC2, EKS, ELB, RDS, Lambda, SES, SNS, containers, etc.
  • Experience with identity management systems (e.g., SAML / OAuth), MFA, etc.
  • CI / CD delivery using code and configuration management automation tools such as GitHub, VSTS, Ansible, DSC, Puppet, Ambari, Chef, Salt, Jenkins, Maven, etc.
  • Delivery using modern methodologies, especially SAFe Agile, Lean, etc.
  • Experience with networking protocols, CDN, application acceleration, load balancers, DNS, VPN, PaaS, IaaS, etc.
  • Experience troubleshooting networking protocols such as TCP / IP, HTTPS / TLS / WebSockets, and multicast and broadcast messaging.
  • Experience with cloud infrastructure, storage, platforms, and data, and with containers (Kubernetes, containers, Docker, virtualization).
  • Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, Splunk, AppDynamics, New Relic, Nagios, etc.

Preferred Qualifications :

  • Bachelor's / Master's degree in Computer Science, Information Systems, or equivalent.
  • AWS Certified Solutions Architect - Professional / Associate.
  • Good leadership skills; capable of leading a team.
  • Good communication skills and a sense of ownership and drive.
  • A software-centric mindset and an understanding of the full software stack, and beyond.
  • Embrace automation over manual effort, enjoy debugging complex problems, and view problems as opportunities to improve.
  • Experience designing, building, and operating large-scale production systems.
  • Experience working on enterprise-scale internal or customer-centric projects.
  • Experience working closely with development and engineering teams.
  • Good understanding of the software development lifecycle (SDLC) and software testing in an Agile / Scrum framework.
  • Strong analytical thinking, problem-solving, and oral and written communication skills.
  • Experience working with multiple stakeholders and vendors at various levels.
  • Understanding of SQL and databases; comfortable writing SQL queries.
  • Hands-on operational automation experience using any automation framework.
  • Good knowledge of SOAP and REST services and service-oriented architecture (SOA).
  • Knowledge of testing in continuous integration / DevOps models is a plus.
  • Understanding of cloud technologies such as AWS / Azure, microservices, and containers.
  • Experience in DevOps, Big Data testing, IoT, or cloud is an added advantage.
  • Experience automating infrastructure, testing, and deployments using Terraform, CFT with Ansible, Rundeck, Autosys, and Jenkins to deliver Infrastructure as Code.
  • Experience working with Rundeck (design, setup, deployment, automation, and integration).
  • Terraform / Kubernetes / Ansible expertise is a plus.

Responsibilities :

  • Experience maintaining a 99.99% SLA for the banking platform and applications.
  • Experience troubleshooting and resolving incidents, and using problem management and automation to drive service improvement, resiliency, and stability.
  • Experience restoring service through standard automated tools and engineering processes to reduce downtime and improve SLA / SLI / SLO metrics.
  • Create production and migration schedules for large projects, with timelines / milestones.
  • Develop and leverage AWS tools and services to manage and automate key operational capabilities.
  • Proactively ensure the highest levels of system and infrastructure availability.
  • Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement the fixes.
  • Write and maintain custom scripts to increase system efficiency and reduce human intervention time on tasks.
  • Improve alerting and monitoring quality, reduce alarm noise, and close observability gaps.
  • Optimize cloud costs and analyze capacity planning.
  • Reduce operational exposure, speed up incident recovery, and implement resiliency and remediation plans.
  • Identify and correct problems stemming from audit and compliance.
  • Liaise with vendors and other IT personnel for problem resolution.

Performance Indicators : Observability Platform and Site Reliability Engineers have the following performance indicators :

  • Platform adoption, availability, scalability, and performance
  • Tech dashboard
  • Site availability and performance
  • Mean Time to Detection
  • Mean Time to Resolution
  • Mean Time Between Failures
  • Mean Time to Production
  • Disaster Recovery : Time to Recovery
  • Change success / failure metrics

Soft Skills : Communication is core to the success of this role.

  • Evangelize the adoption and use of tools, processes, and technologies.
  • Lead engagements to encourage collaboration within and across teams.
  • Showcase the roadmap and engagement model to relevant stakeholders through write-ups, Teams groups, and webinars.
  • Maintain up-to-date documentation on the use of tools, processes, and methodologies (e.g., wiki posts, Confluence write-ups); documentation is core to this role.
  • Create internal training programs for new staff and for upskilling the existing team.
  • Demonstrate humility, trust, and transparency in the way we interact with individuals.
