We are seeking an AI Platform Engineer to build and scale the infrastructure that powers our production AI services. You will take cutting-edge models, ranging from speech recognition (ASR) to large language models (LLMs), and deploy them as highly available, developer-friendly APIs.
You will be responsible for creating the bridge between the R&D team, who train models, and the applications that consume them. This means developing robust APIs, deploying and optimising models on Triton Inference Server (or similar frameworks), and ensuring real-time, scalable inference.
Responsibilities
API Development
- Design, build, and maintain production-ready APIs for speech, language, and other AI models.
- Provide SDKs and documentation to enable easy developer adoption.
Model Deployment
- Deploy models (ASR, LLM, and others) using Triton Inference Server or similar systems.
- Optimise inference pipelines for low-latency, high-throughput workloads.
Scalability & Reliability
- Architect infrastructure for handling large-scale, concurrent inference requests.
- Implement monitoring, logging, and auto-scaling for deployed services.
Collaboration
- Work with research teams to productionise new models.
- Partner with application teams to deliver AI functionality seamlessly through APIs.
DevOps & Infrastructure
- Automate CI/CD pipelines for models and APIs.
- Manage GPU-based infrastructure in cloud or hybrid environments.
Requirements
Core Skills
- Strong programming experience in Python (FastAPI, Flask) and/or Go/Node.js for API services.
- Hands-on experience with model deployment using Triton Inference Server, TorchServe, or similar.
- Familiarity with both ASR and LLM frameworks (Hugging Face Transformers, TensorRT-LLM, vLLM, etc.).
Infrastructure
- Experience with Docker, Kubernetes, and managing GPU-accelerated workloads.
- Deep knowledge of real-time inference systems (REST, gRPC, WebSockets, streaming).
- Cloud experience (AWS, GCP, Azure).
Bonus
- Experience with model optimisation (quantisation, distillation, TensorRT, ONNX).
- Exposure to MLOps tools for deployment and monitoring.