AI Infrastructure Architecture:
- Design and implement asynchronous multi-agent orchestration
- Own end-to-end latency from user message to AI response
- Build resilient inference pipelines that gracefully degrade under load
- Implement intelligent request routing and load balancing for AI workloads
- Migrate critical AI conversation flow from monolith to dedicated services
- Implement WebSocket/streaming infrastructure for real-time chat
- Design circuit breakers and fallback strategies for AI model failures
- Build comprehensive observability for AI system performance
- Optimize credit data retrieval and caching strategies
Requirements
Must-Have Experience:
- 6+ years building production systems handling >10k concurrent users
- Proven experience with async/event-driven architectures (not just REST APIs)
- Hands-on experience scaling ML/AI inference in production
- Deep understanding of caching strategies (Redis, in-memory, CDN)
- Experience with message queues and real-time communication protocols
AI-Specific Expertise:
- Built systems integrating multiple LLM/AI models in production
- Experience with AI model serving frameworks (TensorFlow Serving, Triton, etc.)
- Understanding of AI inference optimization (batching, caching, model quantization)
- Knowledge of conversation state management and context handling
- Has debugged production issues under high AI inference load
Skills Required
AI inference optimization, message queues, conversation state management, real-time communication protocols, async/event-driven architectures, WebSocket streaming infrastructure, resilient inference pipelines, asynchronous multi-agent orchestration, load balancing for AI workloads, intelligent request routing, caching strategies, circuit breakers and fallback strategies, observability for AI system performance, scaling ML/AI inference