Site Reliability Engineer

Confidential • Navi Mumbai, Mumbai, India
5 days ago
Job description

Key Responsibilities

Technical & Product Support

  • Serve as the first line of support for customer-reported technical issues related to our SaaS platform.
  • These issues include data connectivity problems, report errors, performance concerns, access problems, data inconsistencies, software bugs, and integration challenges.
  • Understand and empathize with the challenges ThoughtSpot users face, offering tailored solutions to improve their user experience.
  • Provide prompt and accurate updates, meet SLAs, and resolve customer issues in a timely manner via tickets and calls.
  • Create knowledge-base articles that document solutions and help customers self-serve.

System Reliability & Monitoring

  • Maintain, monitor, and troubleshoot ThoughtSpot cloud infrastructure.
  • Monitor system health and performance through metrics, logs, and dashboards, using tools such as Prometheus and Grafana, to detect and prevent issues early.
  • Work with Engineering teams to define and implement tools that enhance debuggability, supportability, availability, scalability, and performance.
  • Build expertise in cloud and on-premises infrastructure by developing automation and best practices.
  • Participate in the on-call rotation for critical SRE systems, and lead incident reviews and root cause analysis.

Required Skills & Experience

  • Exceptional communication skills, both written and verbal, to effectively engage with cross-functional teams, customers, and stakeholders.
  • Relevant work experience troubleshooting complex Linux systems and managing distributed systems.
  • Experience with virtualization and cloud technologies.
  • Experience with enterprise customer support, on-call rotations for critical SRE systems, and leading incident reviews and root cause analysis.
  • Ability to diagnose technical problems and work with Engineering on escalated issues.
  • Strong problem-solving skills, algorithmic thinking, and a strong foundation in how systems should work.
  • Understanding of the tools and frameworks required to operate and manage cloud infrastructure.
  • Strong customer service skills.
  • Solid communication skills and ability to work independently.
  • Ability to leverage automation, monitoring, and data analysis to ensure high availability.
  • Familiarity with scripting languages such as Python, JavaScript, or Bash.
  • Exposure to infrastructure and service monitoring tools.

(ref: hirist.tech)

Skills Required

Data Analysis, JavaScript, Cloud Technologies, Bash, Automation, Python, Monitoring
