Talent.com
AI Architect

Recro • Coimbatore, IN
13 hours ago
Job description

Role Overview

As the AI Systems Architect, you’ll own the end-to-end design and delivery of production-grade agentic and Generative AI systems. This is a highly hands-on role requiring deep architectural insight, coding proficiency, and an obsession with performance, scalability, and reliability. You’ll architect secure, cost-efficient AI platforms on AWS, guide developers through complex debugging and optimization, and ensure all systems are observable, governed, and production-ready.

Key Responsibilities

  • Architect Production AI Systems: Design robust architectures for agentic systems (planning, reasoning, tool-calling), GenAI/RAG pipelines, and evaluation workflows. Create detailed design documents including flow/UML/sequence diagrams and AWS deployment topologies.
  • Optimize for Cost & Performance: Model throughput, latency, concurrency, autoscaling, CPU/GPU sizing, and vector index performance to ensure scalable, efficient deployments.
  • Lead Debugging & Stability Efforts: Conduct deep-dive debugging, fix critical defects, and resolve production incidents; pair-program with developers to improve code quality and performance.
  • Standardize Agentic Frameworks: Build reference implementations using Semantic Kernel (preferred), LangGraph, AutoGen, or CrewAI with strong schema validation, grounding, and memory management.
  • Engineer Retrieval & Search Systems: Architect hybrid retrieval solutions including ingestion, chunking, embeddings, ranking, caching, and freshness management while minimizing hallucination risk.
  • Productionize on AWS: Deploy and manage systems using Amazon EKS, Bedrock, S3, SQS/SNS, RDS, and ElastiCache. Integrate IAM/Okta and Secrets Manager for identity and secrets management, and Datadog for observability, enforcing SLIs/SLOs and error budgets.
  • Implement Observability & Monitoring: Set up distributed tracing, metrics, and logging via OpenTelemetry and Datadog. Standardize dashboards, alerts, and incident response workflows.
  • Govern Evaluation & Rollouts: Build test and evaluation frameworks (golden sets, A/B experiments, regression suites, and controlled rollouts) to ensure consistent quality across releases.
  • Embed Security & Safety: Enforce least privilege, PII protection, and policy compliance through threat modeling, sandboxed execution, and prompt-injection defense.
  • Establish Engineering Standards: Create reusable SDKs, connectors, CI/CD templates, and architecture review checklists to promote consistency across teams.
  • Cross-Functional Leadership: Collaborate with product, data, and SRE teams for capacity planning, DR strategies, and post-incident RCA reviews. Mentor engineers to strengthen design and reliability practices.

Must-Have Qualifications

  • 7–10 years in software/AI engineering, including 4+ years in GenAI application development and 2+ years architecting agentic AI systems.
  • Expert in Python 3.11+ (asyncio, typing, packaging, profiling, pytest).
  • Hands-on experience with Semantic Kernel, LangGraph, AutoGen, or CrewAI.
  • Proven delivery of GenAI/RAG systems on AWS Bedrock or an equivalent platform, backed by vector stores such as OpenSearch Serverless, Pinecone, or Redis.
  • Deep understanding of the AWS ecosystem and adjacent tooling: EKS, Bedrock, S3, SQS/SNS, RDS, ElastiCache, Secrets Manager, IAM/Okta, Kong API Gateway, Datadog.
  • Expertise in observability and incident management using OpenTelemetry and Datadog.
  • Strong focus on cost, performance, and security engineering: FinOps mindset, autoscaling, caching, and policy enforcement.
  • Exceptional communication: clear diagrams, ADRs, and peer review practices.

Nice-to-Have Skills

  • Multi-agent orchestration (task decomposition, coordinator-worker, graph-based planning).
  • Expertise with vector databases (OpenSearch, Pinecone, pgvector, Redis).
  • Experience with AI evaluation, guardrails, and rollout gating.
  • Familiarity with frontend agent interfaces, secure APIs, and AuthN/AuthZ best practices.
  • Exposure to policy-as-code, multi-tenant architectures, and feature management (Kong, LaunchDarkly, Flipt).
  • Experience with CI/CD via GitHub Actions and IaC (Terraform/AWS CloudFormation).