Overview
Design, build, and operate production-grade AI agents and tools using modern agentic frameworks in Python. You’ll implement complete agentic workflows (planning, reasoning, retrieval, tool calling, evaluation, and observability) on AWS-native infrastructure. The role demands hands-on development skills, a strong understanding of agent frameworks, and experience integrating securely with enterprise applications and data systems.
Responsibilities
- Build and operate agentic workflows in Python (3.11+) using LangGraph or LangChain (preferred). Experience with Semantic Kernel, AutoGen, or CrewAI is valuable.
- Implement tool and function calling flows with schema validation, structured arguments, memory, multi-step planning, and orchestration.
- Develop retrieval-augmented generation (RAG) and hybrid search pipelines — ingestion, chunking, embeddings (Bedrock / OpenAI), ranking, query planning, grounding, and caching.
- Deploy and operate agents on Amazon Elastic Kubernetes Service (EKS) or Amazon ECS; manage components like S3, RDS (SQL Server), ElastiCache (Redis), Amazon SQS / SNS, and OpenSearch Serverless.
- Use Amazon Bedrock for model orchestration and custom RAG services for knowledge retrieval; integrate with external systems through API Gateway and secure endpoints.
- Ensure observability and reliability: tracing, metrics, and structured logs via Datadog and OpenTelemetry; define alert thresholds and dashboards.
- Integrate with enterprise applications (e.g., SAP, Salesforce, ServiceNow) using APIs with OAuth2 / OIDC, retries, and idempotency.
- Apply security and compliance best practices — IAM policies, AWS Secrets Manager, content filters, prompt-injection defenses, and audit logging.
- Collaborate with AI architects, backend engineers, and product teams to optimize agent performance, cost, and user experience.
- Use existing CI/CD pipelines (GitHub Actions / AWS CodeBuild / CodePipeline) for build, test, and deployment; maintain test coverage with pytest and regression suites.
Must Have
- 4–6 years in AI/ML or backend engineering, including 2+ years building GenAI or agentic solutions.
- Strong command of Python 3.11+: asyncio, typing, packaging, pytest, and profiling.
- Hands-on experience with LangGraph or LangChain (preferred) for tool orchestration, reasoning, and memory.
- Experience designing RAG pipelines with vector stores like OpenSearch, pgvector, or Pinecone.
- Working knowledge of AWS components: EKS, S3, RDS, ElastiCache, SNS / SQS, API Gateway, and Secrets Manager.
- Proven ability to instrument and monitor services with Datadog, CloudWatch, and OpenTelemetry.
- Strong grounding in security, rate limiting, circuit breakers, and resilient service design.
- Clear and structured communication: design docs, flow diagrams, and code reviews.
Good to Have
- Multi-agent collaboration patterns: coordinator-worker, task decomposition, human-in-the-loop.
- Evaluation depth: regression suites, hallucination detection, grounding and recall scoring.
- Familiarity with Amazon Bedrock, LLM evaluation pipelines (Phoenix / Arize, promptfoo), or custom guardrail frameworks.
- Experience building streaming or chat-based agent UIs integrated with reasoning traces.
- Exposure to DevOps workflows: Docker, CI/CD use, environment promotion, and release versioning (setup not required).