We are developing a real-time voice AI platform that combines speech recognition, LLM-driven reasoning, and natural-sounding text-to-speech capabilities. This is production-grade software designed to run continuously, scale under heavy load, and deliver human-like conversations with minimal latency.
If you are a seasoned Python engineer who thrives on building resilient systems and solving tough concurrency challenges, this is the role for you.
What You’ll Do / Essential Job Functions
- Architect and build the conversation orchestration service that streams ASR → LLM inference → TTS in real time
- Write robust, asynchronous Python code designed to handle high concurrency without deadlocks, race conditions, or memory leaks
- Design and maintain clean, well-structured APIs for future scalability and ease of debugging
- Manage interaction data using SQLAlchemy (or equivalent) with efficient schema design and safe migrations
- Implement observability: structured logging, metrics, and tracing across the system for rapid issue diagnosis
- Partner with ML and Product teams to rapidly iterate on conversation flow and user experience
- Enforce a strong testing culture: automated unit tests, end-to-end (E2E) tests, and load tests
- Build resilient systems capable of handling real-world edge cases like noisy audio, unreliable APIs, and flaky networks
- Continuously profile and optimize the system to reduce latency and response times
Requirements
What We Expect You To Know / Requirements
- Deep Python expertise: 5+ years of Python, including production systems experience, and a solid command of context managers, generators, event loops, the GIL, and asyncio
- Database fundamentals: data modeling, efficient queries, and ORM best practices
- Networking and I/O: streaming, backpressure, and resilient design for unreliable networks
- Testing discipline: you deliver production-ready, validated code
- Observability mindset: metrics, logs, and traces are integral to how you write code
- Production readiness: you’ve built and supported systems running live at scale
What You’re Like
- Curious: you don’t just fix bugs; you find their root causes
- Calm under pressure: you diagnose incidents, resolve them quickly, and prevent recurrences
- Pragmatic: you solve problems without over-engineering
- Collaborative: you write code for your teammates and your future self
- Quality-driven: you refuse to compromise on correctness and reliability
- Data-informed: you make decisions based on real latency metrics, throughput, and error rates
What To Expect
This is not a feature-factory role. You will be responsible for building a real-time system that stays online, hits latency targets, and performs reliably under pressure.
Team Culture
- Direct, collaborative, and low on politics
- High ownership: you see your work running in production, serving real users
- Obsessed with clarity, correctness, and reliability
- Fast-moving: minimal ceremony, maximum impact
- Pragmatic planning: no endless planning-poker sessions; scope, assign, design, deploy
Your Daily Work
- Ensure performance, reliability, and observability in everything you build
- Collaborate closely with ML and product teams on speech recognition, TTS voices, and LLM behavior
- Monitor, debug, and improve the system as it runs in production
Working Terms
Candidates must be flexible and able to work US hours until at least 6 PM ET, which is essential for this role, and must have their own equipment and workspace for remote work.