About Us
Shunya Labs is building the Voice AI Infrastructure Layer for Enterprises, powering speech intelligence, conversational agents, and domain-specific voice applications across industries. Born from deep work in mental-health AI and built for global enterprise scale, our stack combines state-of-the-art ASR/TTS models with an open-weights philosophy, driving accuracy, privacy, and scalability.
About the Role
We're seeking an AI Systems Engineer who thrives at the intersection of AI model optimization, infrastructure engineering, and applied research.
You will evaluate, host, and optimize a wide range of AI models—spanning ASR, LLMs, and multimodal systems—and build the orchestration layer that powers scalable, low-latency deployments.
This is a role for someone who's comfortable navigating ambiguity, researching emerging AI methods, and translating client requirements into robust, production-ready solutions.
You'll work across the full stack—from GPU inference tuning to React-based control dashboards—building a resilient and scalable AI delivery platform.
Key Responsibilities
AI Model Evaluation & Optimization
- Evaluate, benchmark, and optimize AI models (speech, text, vision, multimodal) for latency, throughput, and accuracy.
- Implement advanced inference optimizations using ONNX Runtime, TensorRT, quantization, and GPU batching.
- Continuously research and experiment with the latest AI runtimes, serving frameworks, and model architectures.
- Develop efficient caching and model loading strategies for multi-tenant serving.
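To illustrate the kind of caching and model-loading strategy the bullets above describe, here is a minimal sketch of an in-process LRU model cache for multi-tenant serving. This is an illustrative example only, not part of the Shunya Labs stack: the `loader` callable and `max_models` limit are hypothetical stand-ins for a real session factory (e.g. an ONNX Runtime session constructor) and a GPU-memory budget.

```python
from collections import OrderedDict

class ModelCache:
    """LRU cache that evicts the least-recently-used model when full.

    `loader` is any callable that builds a model from its id; in a real
    serving stack it might create an ONNX Runtime or TensorRT session.
    """

    def __init__(self, loader, max_models=4):
        self.loader = loader
        self.max_models = max_models
        self._cache = OrderedDict()  # model_id -> loaded model, oldest first

    def get(self, model_id):
        if model_id in self._cache:
            self._cache.move_to_end(model_id)  # mark as most recently used
            return self._cache[model_id]
        model = self.loader(model_id)          # cache miss: load the model
        self._cache[model_id] = model
        if len(self._cache) > self.max_models:
            self._cache.popitem(last=False)    # evict least recently used
        return model
```

With `max_models=2`, requesting models `a`, `b`, `a`, `c` evicts `b` (the least recently used), while `a` and `c` stay resident; the same pattern caps GPU memory while keeping hot tenants' models warm.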
AI Infrastructure & Orchestration
- Design and develop a central orchestration layer to manage multi-model inference, load balancing, and intelligent routing.
- Build scalable, fault-tolerant deployments using AWS ECS/EKS, Lambda, and Terraform.
- Use Kubernetes autoscaling and GPU node optimization to minimize latency under dynamic load.
- Implement observability and monitoring (Prometheus, Grafana, CloudWatch) across the model-serving ecosystem.
DevOps, CI/CD & Automation
- Build and maintain CI/CD pipelines for model integration, updates, and deployment (GitHub Actions, CodePipeline, etc.).
- Manage Dockerized environments, version control, and GPU-enabled build pipelines.
- Ensure reproducibility and resilience through infrastructure-as-code and automated testing.
Frontend & Developer Tools
- Create React/Next.js-based dashboards for performance visualization, latency tracking, and configuration control.
- Build intuitive internal tools for model comparison, experiment management, and deployment control.
- Utilize Cursor, VS Code, and other AI-powered development tools to accelerate iteration.
Client Interaction & Solutioning
- Work closely with clients and internal stakeholders to gather functional and performance requirements.
- Translate abstract business needs into deployable AI systems with measurable KPIs.
- Prototype quickly, iterate with feedback, and deliver robust production systems.
Research & Continuous Innovation
- Stay on top of the latest AI research and model releases (OpenAI, Anthropic, Hugging Face, Meta, etc.).
- Evaluate emerging frameworks for model serving, fine-tuning, and retrieval (LangChain, LlamaIndex, GraphRAG, etc.).
- Proactively identify and implement performance or cost improvements in the model-serving stack.
- Share learnings and contribute to the internal AI knowledge base.
Ambiguous Problem Solving
- Work effectively in undefined problem spaces, identifying optimal paths forward through experimentation.
- Break down high-level goals into actionable technical strategies.
- Balance trade-offs between accuracy, latency, and cost while innovating under uncertainty.
Required Skills
- Strong proficiency in Python, TypeScript/JavaScript, Bash, and modern software development practices.
- Deep understanding of Docker, Kubernetes, Terraform, and AWS (ECS, Lambda, S3, CloudFront).
- Experience with inference optimization (ONNX, TensorRT, quantization, batching).
- Proven ability to design and scale real-time inference pipelines.
- Experience building and maintaining CI/CD pipelines and monitoring systems.
- Hands-on experience with React/Next.js or similar frameworks for dashboard/UI development.
- Strong grasp of API design, load balancing, and GPU resource management.
Nice to Have
- Experience with LangChain, LlamaIndex, GraphRAG, or vector databases (FAISS, Neo4j).
- Familiarity with speech processing models (Whisper, Silero, NeMo, etc.).
- Prior work with serverless inference or edge AI architectures.
- Knowledge of data pipelines, model versioning, and MLOps best practices.
Soft Skills
- Excellent problem-solving in ambiguous, evolving environments.
- Strong ability to research, self-learn, and prototype emerging AI technologies.
- Confident communicator who can translate technical findings to business impact.
- Ownership mindset with a collaborative, solution-oriented approach.
Skills Required
S3, API Design, Bash, CloudFront, React, TypeScript, JavaScript, Docker, Terraform, AWS ECS, Load Balancing, Kubernetes, Python