Director of Engineering - Cloud Infrastructure
About The Role:
We are seeking a Platform Engineering Leader to head our Cloud Control Plane and Observability initiatives. This leader will be responsible for building and scaling the foundational platform services that power all of ThoughtSpot’s SaaS offerings. This role is ideal for someone who thrives on solving complex infrastructure problems at scale, loves enabling engineering velocity, and is deeply passionate about resilience, performance, and observability in multi-cloud environments.
What You'll Do:
- Lead and Scale Platform Teams: Manage and grow high-performing engineering teams working on core control plane services and observability infrastructure.
- Own the Cloud Control Plane: Architect and operate scalable control plane services, including service orchestration, feature flag systems, configuration propagation, tenancy-aware deployments, and health monitoring.
- Build a World-Class Observability Stack: Own logging, metrics, tracing, alerting, and visualization to support both developer productivity and system reliability.
- Drive Operational Excellence: Establish SLOs, improve MTTR/MTTD, and embed resilience across the platform.
- Partner Across the Org: Collaborate with SRE, Security, Application, and Product Engineering teams to ensure platform services meet evolving business needs.
- Architect for Multi-Tenant SaaS: Enable secure and efficient scaling across tenants in AWS and GCP, with attention to cost, compliance, and observability.
- Contribute Hands-On: Participate in architecture reviews, deep dive into production issues, and mentor engineers on best practices in system design and debugging.
What You Bring:
- 14+ years of engineering experience, with at least 5 years in platform/infrastructure leadership roles.
- Expertise in Kubernetes, service meshes, CI/CD pipelines, and cloud-native architecture.
- Proven experience with control plane engineering, including service discovery, dynamic config, scaling orchestration, and policy enforcement.
- Deep understanding of observability tooling (e.g., Prometheus, Grafana, OpenTelemetry, Datadog, Elastic).
- Familiarity with distributed systems concepts like CAP theorem, consensus, and leader election.
- Experience operating multi-tenant systems in AWS and/or GCP environments.
- Hands-on experience with at least one major programming language (Go, Java, Python).
- Strong stakeholder management and the ability to influence architectural direction across orgs.