Site Reliability Engineer

SFS Group India Pvt. Ltd. • Pune, Maharashtra, India
Job description

Objectives

  • Act as the Site Reliability Engineer for global operations, ensuring system stability, scalability, and efficiency through advanced automation, observability, and proactive infrastructure management.
  • Provide expertise in Kubernetes, Linux, networking, and automation practices to support reliable deployments and resilient services.
  • Maintain a reliability-first mindset, with clear awareness of the risks and impact that infrastructure and application changes can have.

Principal duties

  • Applies strong knowledge of Kubernetes (including Talos) to deploy, scale, and maintain containerized applications.
  • Provides Linux administration expertise and ensures secure, efficient system operations.
  • Implements and maintains GitOps workflows using Flux for consistent, automated deployments.
  • Designs and manages infrastructure automation using Puppet and Terraform.
  • Ensures reliable operation of databases such as MySQL/MariaDB, Yugabyte, and MongoDB, supporting data integrity and availability.
  • Operates and integrates streaming platforms (Confluent, Strimzi) for event-driven and real-time processing.
  • Develops automation scripts and tools using Python to improve operational efficiency.
  • Oversees edge device management, ensuring secure connectivity and smooth lifecycle operations.
  • Supports and integrates solutions with Azure and hybrid/multi-cloud environments.
  • Builds and operates monitoring and observability systems (Datadog, Prometheus, Grafana) to ensure system health and transparency.
  • Designs for scalability and high availability, including disaster recovery and failover strategies.
  • Applies security best practices across infrastructure, applications, and data.
  • Evaluates risks carefully before changes, ensuring reliable rollout strategies and minimizing downtime or service disruption.
  • Monitors system reliability, identifies risks, and implements proactive improvements.
  • Collaborates with global teams to share best practices and ensure consistency across environments.
  • Defines and standardizes developer tooling (e.g., IDEs, code quality tools, CI/CD integrations) to ensure consistent development environments and maintain high software quality.
  • Manages developer workstations and operating system standards (currently Ubuntu-based), ensuring performance, security, and compatibility across the engineering organization, with a focus on the Asia team.
  • Promotes a documentation culture, ensuring clear processes, runbooks, and troubleshooting guides.
  • Reports to the offshore Digital Manufacturing team based in Switzerland.