Job Description
We are seeking a highly skilled AI Platform Engineer to design, build, and operate our next-generation AI application platform. In this role, you will work on advanced AI systems, including Retrieval-Augmented Generation (RAG) pipelines, multi-model gateways, Model Context Protocol (MCP) tools, agentic workflow automations (e.g., n8n), and secure chat interfaces such as OpenWebUI or custom-built solutions.
You will be responsible for the production readiness of all platform components, covering Infrastructure as Code (IaC), CI/CD pipelines, Kubernetes orchestration, observability, cost optimization, and governance. You will work closely with product, data, and security teams to deliver scalable, secure, and high-performing AI-driven applications.
Key Responsibilities
- Design, develop, and operate the company’s AI application platform, including RAG pipelines, model gateways, MCP integrations, and agent workflows.
- Build and maintain secure chat interfaces using OpenWebUI or custom UI components.
- Own IaC (Terraform, Helm, etc.), CI/CD automation, and Kubernetes-based deployments for all AI platform components.
- Implement robust observability solutions (monitoring, logging, tracing) across the platform.
- Ensure platform reliability, uptime, and scalability using SRE and DevOps best practices.
- Optimize compute, storage, and inference costs while maintaining performance and quality.
- Establish strong governance, access control, and compliance processes across AI workloads.
- Collaborate cross-functionally with product, data science, engineering, and security to deliver high-impact AI features and integrations.
- Troubleshoot production issues and continuously improve the platform’s architecture and performance.
Required Skills & Qualifications
- 4–10+ years of experience in software engineering, DevOps, or AI/ML platform engineering.
- Strong hands-on programming experience in Python, Go, or TypeScript.
- Expertise with Kubernetes, container orchestration, and cloud-native tooling.
- Proficiency with Terraform, Helm, or other IaC frameworks.
- Experience building RAG pipelines, LLM integrations, or similar AI workflows.
- Familiarity with n8n, LangChain, OpenWebUI, or custom chat interface frameworks.
- Solid understanding of observability tools (Prometheus, Grafana, ELK, OpenTelemetry).
- Experience with secure, production-grade deployments of AI or distributed systems.
Preferred Qualifications
- Experience with multi-model routing, inference gateways, or vector databases.
- Knowledge of MCP (Model Context Protocol) and related developer tooling.
- Prior experience scaling AI or distributed systems in cloud environments (AWS, GCP, Azure).
- Understanding of SRE principles, access governance, and security best practices.