about
we believe the next decade of computing isn't about better agents or faster models.
it's about machines that remember you, respond naturally, and feel like they understand context across time and modality.
right now, every ai assistant forgets you exist when you close the session. every voice agent sounds scripted and breaks on interrupts. every "breakthrough" is just better prompts on the same broken and painfully complex architecture.
we're building the infrastructure layer that makes ai interactions actually feel natural - adaptive memory systems, real-time voice orchestration, self-evolving agents that learn from experience by rewriting their own code, and multimodal context that doesn't fall apart.
think : the default stack that every multimodal agent will need, but doesn't exist yet.
our team is :
- a 25-year-old ai-first builder (that's me) from IIT-M and BITS Pilani who's been building self-evolving, multimodal agents for enterprises
- Ashok, who cofounded DriveU and scaled it to ₹125 crore ARR and profitability
- Raveen, cofounder of Myntra, Baby Oye and Multiply Ventures
we also run a private community of 140+ curated ai builders / researchers from labs like anthropic, openai, cartesia, ultravox, and microsoft research, as well as veterans from aws & amd
our research thesis
we're betting on three-layer adaptive memory with reinforcement learning :
- layer 1 (working memory) : recent context, lasts minutes, conversation-specific
- layer 2 (episodic memory) : important past interactions, lasts weeks, user-specific
- layer 3 (semantic memory) : learned patterns / preferences, lasts forever, continuously refined

the rl component : the agent gets feedback signals (explicit corrections, conversation success, user retention) and uses policy gradients to learn :
- what's worth moving from working to episodic memory?
- what patterns in episodic memory should become semantic?
- when to surface which memory layer during conversations?

this isn't novel ml research - we're applying existing rl techniques (you know them) to a new problem : continuous memory improvement for conversational agents.
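to make that concrete, here's a minimal sketch of how the three layers and a policy-gradient promotion step could fit together. the class names, feature vector, and reward wiring are illustrative assumptions for this post, not our actual implementation :

```python
# toy sketch: three memory layers plus a tiny policy network that learns
# (via a REINFORCE-style policy gradient) whether a working-memory item
# should be promoted to episodic memory. everything here is illustrative.
import time
from dataclasses import dataclass, field

import torch
import torch.nn as nn


@dataclass
class MemoryItem:
    text: str
    created_at: float = field(default_factory=time.time)
    recall_count: int = 0          # how often this item was surfaced
    user_corrected: bool = False   # explicit feedback signal


class PromotionPolicy(nn.Module):
    """maps item features -> probability of promoting to the next layer."""

    def __init__(self, n_features: int = 3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(features))


def features_of(item: MemoryItem) -> torch.Tensor:
    age_minutes = (time.time() - item.created_at) / 60.0
    return torch.tensor([age_minutes, float(item.recall_count), float(item.user_corrected)])


class ThreeLayerMemory:
    def __init__(self):
        self.working: list[MemoryItem] = []    # minutes, conversation-specific
        self.episodic: list[MemoryItem] = []   # weeks, user-specific
        self.semantic: list[MemoryItem] = []   # forever, continuously refined
        self.policy = PromotionPolicy()
        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=1e-3)

    def end_of_turn(self, reward: float) -> None:
        """after each turn, sample promotion decisions and reinforce them with
        a scalar reward (e.g. conversation success, user retention)."""
        log_probs = []
        for item in list(self.working):
            p = self.policy(features_of(item))
            promote = torch.bernoulli(p)
            log_probs.append(torch.log(p if promote else 1 - p))
            if promote:
                self.working.remove(item)
                self.episodic.append(item)
        if log_probs:
            # REINFORCE: raise the log-prob of decisions that preceded good outcomes
            loss = -reward * torch.stack(log_probs).sum()
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
```

promotion from episodic to semantic would follow the same pattern with its own features and a longer reward horizon.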
we're also building real-time voice orchestration with :
- predictive vad (anticipate interrupts before they happen)
- multi-agent coordination (handle 3+ speakers / agents smoothly)
- latency-adaptive quality (degrade gracefully under network constraints)
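for flavor, here's a toy contrast between classic silence-based endpointing and a predictive approach that scores turn-yield cues before the silence even lands. the features and hand-set weights are placeholder assumptions, not a real model :

```python
# toy contrast: silence-based endpointing vs. predictive vad.
# the feature set (energy slope, pitch drop) and the logistic weights are
# placeholders; a real system would learn these from conversational data.
import math
from dataclasses import dataclass


@dataclass
class Frame:
    energy: float      # rms energy of a ~20 ms audio frame
    pitch_hz: float    # estimated f0 in hz, 0.0 when unvoiced


def silence_vad(frames: list[Frame], hangover: int = 25) -> bool:
    """classic endpointing: declare end-of-turn only after ~500 ms of near-silence."""
    tail = frames[-hangover:]
    return len(tail) == hangover and all(f.energy < 0.01 for f in tail)


def predictive_vad(frames: list[Frame], threshold: float = 0.7) -> bool:
    """predictive endpointing: score turn-yield cues (trailing energy, falling
    pitch) over a short window so the agent can prepare a response before the
    silence hangover would even begin."""
    if len(frames) < 10:
        return False
    window = frames[-10:]
    energy_slope = window[-1].energy - window[0].energy              # negative = trailing off
    pitch_drop = (window[0].pitch_hz - window[-1].pitch_hz) / 300.0  # positive = falling f0
    logit = -8.0 * energy_slope + 2.0 * pitch_drop - 1.0             # hand-set placeholder weights
    return 1.0 / (1.0 + math.exp(-logit)) > threshold
```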
what you'll actually do (with me)

- audit ultravox, sesame, elevenlabs, openai realtime api with me
- design a prototype adaptive memory system (three layers + rl)
- ship the first voice agent demo with working memory
- build multi-agent orchestration (handling 2+ speakers)
- implement predictive vad (not just silence detection)
- create test harnesses for latency, context retention, and ux flow
- package everything as a usable sdk
- write documentation that doesn't suck or confuse
- iterate based on what breaks in production
- either scale the infra or spin out a product
- mentor teams at our hackathons (we're running quarterly builder events)
- collaborate with fellow lab researchers to turn research papers, frameworks, and technical ideas into tangible demos or proof-of-concepts
- contribute to reusable internal libraries, apis, and infrastructure that power all basethesis projects
- skip off-the-shelf solutions (again) : evaluate sesame, ultravox, elevenlabs, openai realtime api, and build what they lack
- work across voice, vision, and text-based interfaces, integrating models into cohesive user experiences
- document learnings as "lab notes" or public prototypes. clarity and storytelling are part of the craft

who this is for
required qualifications
- you've shipped something that had real users (product, side project, open source tool)
- you understand system design at an architectural level, not just api integration
- you can write production-quality code fast (we're judging on speed + quality, not just one)
- you think from first principles : "why does rag fail for real-time agents?" not "everyone uses rag so we should too"
- you're comfortable navigating ambiguity early on
- you can explain complex technical decisions simply (this matters for docs and future hiring)

preferred qualifications (not required, just helpful)
- experience with websockets, webrtc, or real-time audio / video pipelines
- familiarity with reinforcement learning (grpo, ppo, dqn, policy gradients)
- you've read papers on memory systems, hci, or voice ai (or you will after reading this)
- you've built ml models beyond just fine-tuning llms
- you've debugged production systems under load and lived to tell the stories
- you've contributed to open source or have a github people actually look at

what you'll get
- you're not maintaining legacy code or optimizing conversion rates by 0.3%
- you're solving genuinely hard problems that don't have stack overflow or chatgpt answers
- space to work on cutting-edge research, with no pressure to serve legacy customers
- you define the entire memory + multimodal stack with me, from scratch
- market-rate salary and founding-engineer equity in something that will matter
- a workspace buzzing with the most hardcore builders & the best research minds in blr
- when people reference "how basethesis handles context" in 2027, that's your architecture
- direct access to ai builders / researchers at openai, deepmind, anthropic, xai, and amd
- a community of hundreds of hardcore builders and top researchers in frontier labs

the downside :
- the first couple of months, you're researching, tinkering, and coding 10-12 hours a day
- working on cutting-edge research means we rapidly learn and iterate every day
- we might (and will) fail multiple times before we finally land on mars

why you might want to do this
you're probably 2-5 years out of college. worked at a startup or built side projects with users.
you're frustrated by abstraction-layer and wrapper culture.
you want to build infrastructure that's technically hard and requires a deep understanding of the entire stack.
you're okay with :
- figuring it out together (we don't have all the answers)
- an intense first 6 months proving this matters
- rapidly learning and iterating with an obsession towards the mission

you're excited by :
- defining a new technical approach to ai memory
- building in public, open sourcing some parts
- being in the room when we decide together what to build next
- ownership of hard technical problems

if you're reading this thinking "f*** it, i want to try" - that's the sign to apply. you're probably more ready than you think. capability over credentials.
let's build