Sr. Site Reliability Engineer [T500-20179]

Delta Air Lines
Bengaluru, Karnataka, India
17 days ago
Job description

About Delta Tech Hub :

Delta Air Lines (NYSE : DAL) is the U.S. global airline leader in safety, innovation, reliability and customer experience. Powered by our employees around the world, Delta has for a decade led the airline industry in operational excellence while maintaining our reputation for award-winning customer service. With our mission of connecting the people and cultures of the globe, Delta strives to foster understanding across a diverse world and serve as a force for social good. Delta has fast emerged as a customer-oriented, innovation-led, technology-driven business. The Delta Technology Hub will contribute directly to these objectives. It will sustain our long-term aspirations of delivering niche, IP-intensive, high-value, and innovative solutions. It supports various teams and functions across Delta and is an integral part of our transformation agenda, working seamlessly with a global team to create memorable experiences for customers.

Key Responsibilities :

  • Execute the Incident, Change, and Problem Management processes
  • Build and support a reliable application suite for the environment to meet the development and maintenance requirements of systems / platforms
  • Provide consultation and direct technical support in life cycle planning, problem management, integration, and systems programming
  • Ensure platform performance and availability meet enterprise objectives through monitoring, timely service restoration, and tuning
  • Continuously improve and implement automation of application tasks
  • Provide technical support for systems / platforms according to application SLAs
  • Design and develop resiliency in the application code, troubleshoot incidents, engage with squads to address failure patterns, and participate in incident management
  • Strong troubleshooting ability required
  • Lead calls or contribute in a logical fashion
  • Focus on resolving issues before they become incidents
  • Identify and articulate severity of impacts using provided monitoring tools and escalate as needed
  • Able to understand architecture and design of applications and identify or narrow focus for an incident based on symptoms
  • Perform root cause analysis to quickly recover from service interruptions and to prevent recurring problems
  • Monitor, manage, and tune platforms to ensure expected availability and performance levels are achieved
  • Identify gaps in monitoring or documentation and reach out to appropriate teams to fill those gaps
  • Implement changes to platforms with minimal impact to the business by following enterprise standards and procedures
  • Design and document enterprise standards and procedures

Minimum Qualifications :

  • Bachelor’s degree or industry certification in an applicable IT field, in addition to 3 years of applicable experience in the design / administration / support of one or more platforms, or
  • Bachelor's degree in an IT field, in addition to two years of applicable experience in the design / administration / support of one or more platforms
  • 3+ years of experience as a Systems Engineer or Site Reliability Engineer
  • 3+ years of experience with ops automation using a scripting language such as Python or Ansible (must have experience with at least one)
  • Site Reliability Engineering : Knowledge of the theories and methodologies of reliability engineering; ability to design, develop and support various tools, services and applications to maintain a reliable site environment.
  • Performance Measurement and Tuning : Knowledge of system performance, testing and programming; ability to monitor, measure, and optimize system performance and network communication.
  • CI / CD Pipeline (must have) : Knowledge of concepts, values and tools applied in building Continuous Integration (CI), Continuous Delivery and Continuous Deployment (CD) pipelines; ability to design, build, implement and maintain CI / CD pipelines to automate the software delivery process.
  • Kubernetes and AWS (must have)
  • Docker (good to have)
  • Software Release Management : Knowledge of strategies, practices and tools for managing versions and distribution of software products and enhancements; ability to evaluate and improve release management practices and tools
  • Application Maintenance : Knowledge of production applications; ability to monitor application functions and resolve issues to maintain optimal conditions for system applications.
  • Software Engineering : Knowledge of software engineering; ability to deliver new or enhanced software products.
  • Agile Development : Knowledge of agile methodologies and the agile development lifecycle; ability to utilize formal agile methodologies, disciplines, practices and techniques for the delivery of new and enhanced applications.
  • Embraces diverse people, thinking and styles

Preferred Qualifications :

  • Master’s degree in Computer Science, Information Technology or related field is preferred
  • Experience with and exposure to VMware VDI implementations is a huge plus
  • Experience with Dynatrace APM and synthetic monitoring
  • Experience with airline applications and infrastructure technology is a plus