Get To Know Us First!
Who We Are
At interface.ai, we’re redefining the future of banking with AI. Our cutting-edge Generative AI-powered platform serves over 100 banks and credit unions, delivering hyper-personalized customer interactions across voice, chat, and employee-assisting solutions.
Our mission:
To make banking effortless, intelligent, and profitable—enhancing user experience while boosting revenue and efficiency for financial institutions.
We’re not just another AI company. Our proprietary AI, built 100% in-house, is designed for zero-shot learning, achieving 90%+ accuracy on Day 1. With a world-class team from Microsoft, ISB, and IIMs, and a 1,800% growth rate in the last year, we’re shaping the future of AI in financial services.
Join us to build something transformative.
Careers - https://interface.ai/open-positions
LinkedIn - https://www.linkedin.com/company/interface-ai/
Role – DevOps Engineer III
Location: India (Remote)
Function: Engineering – Product Engineering
Level: Senior
Reports to: Engineering Manager – Product Engineering
About the Role
At interface.ai, we are building BankGPT – the world’s first AI-powered digital banking platform that leverages large language models, multi-agent orchestration, real-time streaming, and voice AI. To support this mission, we are seeking a DevOps Engineer III who will own infrastructure end-to-end, design systems from scratch, and enable highly resilient AI workloads at scale.
This is a senior, hands-on role that requires deep expertise in cloud-native DevOps, infrastructure automation, observability, and security. You’ll not only build and optimize systems but also influence best practices, mentor peers, and contribute to critical decision-making around platform reliability and scalability.
What You’ll Do
Infrastructure Ownership – Design, implement, and scale infrastructure across AWS, GCP, or Azure; drive high availability, multi-AZ, and DR/BCP strategies.
Cloud-Native Enablement – Build and manage Kubernetes clusters (EKS/GKE), service mesh (Istio/Linkerd), and ingress controllers for secure and resilient workloads.
CI/CD & Automation – Architect CI/CD pipelines (ArgoCD/GitOps, Jenkins) and build custom deployment portals and automation tools to accelerate developer productivity.
AI/LLM Reliability – Define and track key metrics (latency, cost, throughput, containment) for AI/LLMs and agent workflows.
Observability & Tracing – Implement end-to-end tracing for multi-turn queries and real-time pipelines using OpenTelemetry, Prometheus, and Grafana.
Vector Databases – Manage and tune vector DBs (Pinecone, Weaviate, Milvus, etc.) for high concurrency, hybrid retrieval, reranking, and resilience.
Resilience & Scaling – Design autoscaling, failover, and health-check–based routing strategies for workloads like WebSockets, RAG pipelines, and voice (STT/TTS).
Scripting & Tooling – Write Bash/Python/Go scripts for operational tooling, log rotation, API integrations, and rollout automation.
Collaboration – Partner with AI and engineering teams to support complex workflows, while driving DevOps best practices across the organization.
What You’ll Bring
5–8 years of core DevOps experience with a strong track record of building infra from scratch (not just maintaining existing systems).
Deep expertise in Docker, Kubernetes, Helm, and container orchestration.
Hands-on with Terraform, Crossplane, and declarative infra management.
Strong experience in CI/CD pipelines (ArgoCD, Jenkins, GitOps workflows) and building custom automation.
Proven ability to deploy AI/LLMs and agent workflows reliably in production.
Expertise in defining and tracking AI workflow metrics and observability of multi-turn queries.
Mandatory expertise with vector databases – tuning, scaling, and optimizing retrieval performance.
Proficiency in monitoring and logging tools (Prometheus, Grafana, OpenTelemetry, ELK/OpenSearch).
Familiarity with service mesh (Istio/Linkerd), networking, and multi-cluster workloads.
Proficiency in scripting/programming (Python, Bash, Go preferred).
Knowledge of security best practices in cloud environments (IAM, secrets, secure networking).
Bonus Points:
Experience working on AI-enabled or ML-integrated platforms
Understanding of compliance, security, and auditability requirements in regulated environments
Prior experience working in fast-paced, high-growth product teams
Why Join Us?
Remote-first culture – Work from anywhere, with top-tier colleagues.
High ownership, high impact – Your work will define the future of banking.
Comprehensive Benefits – We take care of our people.