Job description

ABOUT US
While the world races to automate the future, we’re focused on preparing the generation that will lead it. At ADIIVA, we believe the most powerful time to shape a child’s mind is early on, when curiosity is boundless and values are just beginning to form. We offer parents a way to nurture critical thinking, resilience and independence from the very start. With our first office in Bangalore, we are looking to hire a Backend AI Engineer who will help us build a revolutionary product that transforms how children learn.

ABOUT THE ROLE
We’re hiring a senior backend engineer with deep experience building scalable backend systems and deploying production-grade conversational AI. You will design infrastructure that supports millions of concurrent users while powering intelligent, reliable, low-latency conversational experiences. This is a systems-heavy role with real AI ownership, offering unparalleled autonomy and the chance to co-create a category-defining AI device for children in a 0→1 environment. This is a full-time opportunity based in Bengaluru (on-site).

WHAT YOU WILL DO

Scalable Backend Architecture
- Design the system architecture, covering cloud services, databases, and DevOps workflows specifically optimized for AI agent operations and large language model integration
- Design distributed systems that handle high concurrency (100k–1M+ users)
- Build multithreaded, asynchronous, and event-driven architectures
- Develop resilient APIs with low latency and high availability
- Implement caching, rate limiting, queue systems, and load balancing

AI & ML Infrastructure
- Build scalable real-time inference pipelines
- Deploy and monitor ML models in containerized environments
- Optimize inference cost, latency, and resource usage
- Implement evaluation frameworks and automated testing for conversational quality
- Handle fallback strategies, retries, and failure modes for AI systems
- Manage GPU/accelerator workloads where required

Key Technical Skills
- Inference engines: vLLM, TGI (Text Generation Inference), NVIDIA Triton
- Orchestration: Kubernetes with GPU node pools, Ray Serve, BentoML
- Observability: Prometheus + Grafana for model metrics, latency tracking, drift detection
- Cost optimization: model quantization (AWQ, GPTQ, INT8), batching strategies, auto-scaling
- Containerization: Docker, Helm, container registries, multi-stage builds
- Cloud/accelerators: CUDA, experience with A100/H100 or cloud GPU instances (AWS, GCP, Azure)
- Reliability: circuit breakers, retry logic, shadow deployments, canary rollouts

Observability & Debugging
- Implement structured logging with traceability across distributed services
- Design dashboards for system health and conversation analytics, with alerts for fast root-cause analysis
- Monitor model drift and conversational performance metrics

WHAT WE LOOK FOR
- Bachelor’s degree in Computer Science or an equivalent discipline
- Ability to work independently in a fast-paced, ambiguous early-stage environment
- Growth mindset, strong communication skills, and a collaborative spirit
- 2+ years of backend engineering experience, with a focus on LLMs
- Strong understanding of concurrency and multithreading, distributed systems, high-throughput architectures, and real-time system design
- Hands-on experience with conversational AI frameworks and state management
- Deep understanding of transformer architectures, attention mechanisms, and generative models
- Strong Python skills with experience in PyTorch, Hugging Face, TensorFlow, and Weights & Biases
- Familiarity with LLM orchestration frameworks (LangGraph, LangChain, or Haystack)
- Experience with cloud platforms (AWS, GCP, or Azure) and their AI/ML services
- Comfort with MLOps tooling, containerization (Docker), and orchestration (Kubernetes, Airflow)
- Knowledge of RAG and vector databases (e.g., FAISS, Weaviate)
- Ability to explain complex AI concepts to non-technical stakeholders
- Passion for building safe, magical AI experiences for children and families