Talent.com
Senior Analyst, Site Reliability Engineering

Hudson's Bay Company, Bangalore, India
20 hours ago
Job description

Saks Cloud Services is looking for a Senior Analyst to join the Site Reliability Engineering (SRE) team. The ideal candidate for this role is outgoing, obsessed with customer service, and has strong analytical and communication skills. The candidate should also strive for continuous improvement, be enthusiastic about new ideas, and enjoy opportunities to “think outside the box”.

Position: Sr. Analyst, SRE

What this position is all about

The successful candidate will primarily identify and analyze technical problems in systems and applications across all supported divisions, work closely with cross-functional IT teams to troubleshoot and resolve application-related issues, and play a key role in implementing new solutions that improve the efficiency and effectiveness of the team and the organization. The ideal candidate has a strong technical background and communicates effectively with both technical and non-technical stakeholders.

Role description:

  • 5+ years of experience working within DevOps or SRE teams.
  • 3+ years of experience with any cloud platform (preferably AWS or Azure).
  • Ability to program (structured and OO) in one or more high-level languages, such as JavaScript, Java, Python, and Bash.
  • Participate in on-call rotations (PagerDuty / Opsgenie) and respond to incidents outside of regular hours.
  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Help build and implement services that make IT and support teams more effective.
  • Improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and continually innovate.
  • Validate non-functional requirements (NFRs) and service-level targets (SLx) against production logs or business analytics.
  • Conduct proofs of concept to demonstrate the benefits of recommendations.
  • Instrument the target environment to capture relevant monitoring metrics for analysis.
  • Help coach the team on core SRE concepts and build a knowledge repository by contributing point-of-view documents and blog posts.
  • Document engineering strategy and analysis reports.
  • Document every action so your findings become repeatable procedures, and then automation.
  • Hands-on experience with distributed version control systems such as Git, AWS CodeCommit, or equivalent.
  • Must have experience with Docker, Kubernetes, Terraform, and Ansible.
  • Know your way around Linux and the Unix shell.
  • Experience with, or at least familiarity with, the ELK stack.
  • Balance feature development speed and reliability with well-defined service level objectives.
  • Monitor systems and telemetry of Salesforce Commerce Cloud and Salesforce Service Cloud for operational health in terms of site stability, reliability, and performance.
  • Prioritize and automate administrative and operational tasks to continuously improve site stability, capacity, reliability, and performance.
  • Provide active incident response support, investigate major problems, and ensure a timely and effective return to normal operations of the Digital Commerce and CRM platforms during major incidents.
  • Provide periodic on-call support based on established 24/7/365 support schedules.
  • Collaborate with Digital Development and QA teams to ensure that production environments are deployment-ready in line with Change Management processes and the Digital release schedules.
  • Support Development teams in the provisioning and configuration of lower environments, including CI/CD pipeline support.
  • Support incident and problem management with root cause analysis, identifying and resolving issues related to platform reliability, stability, and performance through careful analysis of telemetry data and system logs.
  • Collaborate with Engineering and Project teams to perform production readiness assessments and ensure that proper controls and processes are in place.
  • Support or execute production change management requests on behalf of the Digital Engineering teams.
  • Evaluate and propose tools and techniques to improve operational activities.

Job Qualifications

Key Qualifications:

  • 5+ years of related work experience, preferably in SRE or DevOps-related fields.
  • Understand customer business processes and transactions.
  • Understand application architecture and design; analyze non-functional requirements and SLIs/SLOs.
  • Independently troubleshoot performance, scalability, capacity, resilience, and reliability issues and correlate them to application code and configuration.
  • Participate in code, design, and architecture reviews to ensure application reliability goals are met.
  • Strong troubleshooting, analytical, and problem-solving skills.
  • Strong verbal and written communication skills.
  • Experience in the administration and support of digital retail platforms (e.g., Salesforce Commerce Cloud, Shopify, Magento, IBM WebSphere Commerce) is an asset.
  • Experience with monitoring, logging, and telemetry tools such as New Relic, mPulse, Splunk, Nagios, SolarWinds, Prometheus, AWS CloudWatch, Datadog, etc.
  • Experience with cloud infrastructure administration (e.g., AWS, GCP, Azure).
  • Basic understanding of networking, content delivery networks (CDNs, e.g., Akamai, Cloudflare), and SaaS solutions.
  • Hands-on experience with scripting languages (PowerShell, Python, Ruby, AWK, sed, shell, etc.) and with maintaining automation frameworks that run health checks and provide self-healing capabilities for the platforms.
  • Experience with automation tools such as (but not limited to) GitHub Actions, Chef, Terraform, and Ansible.
  • Experience with web development technologies (e.g., JavaScript, Node.js, React, HTML, XML, CSS, REST).
  • Experience with ticketing and collaboration tools (e.g., Jira Service Management (JSM), Jira Work Management, ServiceNow).
  • 3+ years of SRE experience working on telemetry, observability, self-healing solutions, and platform automation.

Your Life and Career at Saks Cloud Services

  • Be part of a world-class team; work adventurously; think and act like an owner-operator!
  • Exposure to rewarding career advancement opportunities across retail, supply chain, digital, and corporate.
  • A culture that promotes a healthy, fulfilling work/life balance.
  • Benefits package for all eligible full-time employees (including medical, vision, and dental).
  • An amazing employee discount.