AI Technical Architect - Agentic & Gen AI Platforms
Experience : 7 - 10 years
Location : Gurugram
Availability : Urgent requirement; immediate joiners preferred
Overview
Own the end-to-end architecture of production-grade AI systems with a strong hands-on orientation. You’ll design secure, scalable, and cost-efficient agentic and GenAI solutions, working closely with development teams to unblock issues, optimize performance, and drive engineering best practices. The role requires translating complex requirements into observable, resilient, and well-governed platforms.
Key Responsibilities
- Architectural Design : Define target architectures for agentic systems (planning / reasoning / tool-calling), GenAI / RAG pipelines, and evaluation loops. Create detailed technical design documents with flow / UML / sequence diagrams and deployment topologies.
- Infrastructure & Cost Modeling : Estimate runtime costs and resource requirements (throughput, latency, concurrency, CPU / GPU, memory, vector index sizing, etc.) while balancing performance and cost.
- Technical Leadership : Lead deep-dive debugging and incident resolution; profile performance bottlenecks, fix defects, and elevate overall engineering standards.
- Reference Implementations : Establish reusable blueprints for agents (Semantic Kernel preferred; LangGraph, AutoGen, or CrewAI acceptable) including schema validation, memory, grounding, and multi-step planning.
- Retrieval Architecture : Design hybrid search and retrieval systems—covering ingestion, embeddings, ranking, query planning, caching, and freshness policies—with measurable evaluation metrics for recall, precision, and hallucination control.
- Production Deployment : Build and deploy on cloud (Azure preferred) using containers and IaC. Integrate identity / secrets, networking, ingress, queues, and eventing with SLIs / SLOs and error budgets.
- Observability & Monitoring : Implement distributed tracing, structured logging, and metrics; standardize dashboards, alerts, and replay capabilities.
- Evaluation Frameworks : Develop evaluation and promotion workflows, including prompt / flow testing, golden datasets, A / B experiments, regression suites, and rollout gates.
- Security & Compliance : Apply threat modeling, prompt-injection defenses, sandboxing, and data governance. Build policies for PII protection, red-teaming, and audit trails.
- Standards & Governance : Define platform standards for code quality, reusable patterns, SDKs, CI / CD templates, documentation, and architecture reviews.
- Cross-Functional Collaboration : Partner with product, data, and SRE teams for capacity planning, disaster recovery, multi-region setups, and post-incident analysis.
- Mentorship : Guide engineers through reviews, coaching, and enforcement of best practices, with a focus on reliability and maintainability.
Must-Have Qualifications
- 7–10 years in software or AI engineering, including 4+ years in GenAI applications and 2+ years architecting production agentic systems.
- Strong hands-on experience with Python 3.11+ (typing, asyncio, packaging, profiling, pytest).
- Proven experience with agent frameworks (Semantic Kernel preferred; LangGraph / AutoGen / CrewAI acceptable) and schema-based tool / function calling.
- Expertise in GenAI / RAG / hybrid retrieval architectures with vector stores (e.g., Azure AI Search, pgvector, Elasticsearch, Pinecone).
- Deep Azure cloud architecture experience (AI Search, Service Bus, Functions, App Service, Containers, VNets, Key Vault, monitoring, storage).
- Strong grasp of observability and incident response (OpenTelemetry, metrics, logs, SLOs).
- Cost and performance optimization mindset : capacity modeling, autoscaling, GPU / CPU utilization, and FinOps practices.
- Solid understanding of security and safety fundamentals : data isolation, content policy enforcement, and compliance.
- Excellent written and verbal technical communication (diagrams, ADRs, design docs, reviews).
Good-to-Have Skills
- Multi-agent design patterns, human-in-the-loop workflows, and graph-based planners.
- Advanced Azure stack (AKS, App Gateway / WAF, private endpoints).
- Evaluation depth : red teaming, guardrails, regression frameworks, and canary rollouts.
- Search / data infrastructure experience (OpenSearch, Pinecone, Redis, BigQuery, Snowflake, Delta).
- Frontend integration for agent UIs, secure API design, and authentication / authorization best practices.
- DevOps / IaC proficiency (Docker, Kubernetes, GitHub Actions, Azure DevOps, Terraform / Bicep, secrets management).