Validation & Performance Automation Eng

LTIMindtree, Mangalore, IN
7 days ago
Job description

Senior Infrastructure Test & Validation Engineer (Zero-Touch GPU Cloud – GitOps Validation & Certification)

We are seeking a Senior Infrastructure Test & Validation Engineer with 10+ years of experience to lead the automation of Zero-Touch validation, upgrade, and certification for our on-prem GPU cloud platform. This role focuses on ensuring the stability, performance, and conformance of the entire stack, from hardware to Kubernetes, using automated, GitOps-based validation pipelines. The ideal candidate has a strong infrastructure background with deep hands-on skills in Sonobuoy, LitmusChaos, k6, and pytest, and is passionate about automated test orchestration, platform resilience, and continuous conformance.

Key Responsibilities

  • Design and implement automated, GitOps-compliant pipelines for validation and certification of the GPU cloud stack across hardware, OS, Kubernetes, and platform layers.
  • Integrate Sonobuoy for Kubernetes conformance and certification testing.
  • Design and orchestrate chaos engineering workflows using LitmusChaos to validate system resilience across failure scenarios.
  • Implement performance testing suites using k6 and system-level benchmarks, integrated into CI/CD pipelines.
  • Develop and maintain end-to-end test frameworks using pytest and/or Go, focusing on cluster lifecycle events, upgrade paths, and GPU workloads.
  • Ensure test coverage and validation across multiple dimensions: conformance, performance, fault injection, and post-upgrade validation.
  • Build and maintain dashboards and reporting for automated test results, including traceability, drift detection, and compliance tracking.
  • Collaborate with infrastructure, SRE, and platform teams to embed testing and validation early in the deployment lifecycle.
  • Own quality assurance gates for all automation-driven deployments.

Required Skills & Experience

  • 10+ years of hands-on experience in infrastructure engineering, systems validation, or SRE roles.
  • Primary skills required: pytest, Go, k6 scripting, automation framework integration (Sonobuoy, LitmusChaos), and CI integration.
  • Strong experience with:
    • Sonobuoy for Kubernetes conformance and diagnostics
    • LitmusChaos for fault injection and resilience validation
    • k6 for performance/load testing in distributed environments
    • pytest or Go-based test frameworks for automation and validation scripting
  • Deep understanding of Kubernetes architecture, upgrade patterns, and operational risks.
  • Experience validating infrastructure components (GPU drivers, kernel modules, CNI, CRI, etc.) across lifecycle events.
  • Proficient in GitOps workflows and integrating tests into declarative, Git-backed pipelines (e.g., with Argo CD, Flux).
  • Hands-on experience with CI/CD systems (e.g., GitHub Actions, GitLab CI, Jenkins) to automate test orchestration.
  • Solid scripting and automation experience (Python, Bash, or Go).
  • Familiarity with GPU-based infrastructure and its performance characteristics is a strong plus.
  • Strong debugging, root cause analysis, and incident investigation skills.