5+ years of experience in platform engineering, with a proven track record of designing, deploying, and managing scalable and secure cloud-based infrastructures, leveraging both Azure and AWS services.
Experience with Azure services such as Azure AI services, Azure Search, Azure ML, Databricks, Azure Kubernetes Service, and AWS services like AWS SageMaker, AWS Bedrock and AWS Lambda.
Exposure to Generative AI and Agentic AI ecosystems such as Azure OpenAI, Azure AI Foundry, Azure AI Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.
Understanding of token usage, LLM prompt injection risks, jailbreak attempts, and mitigation techniques.
Strong knowledge of governance, audit, observability, and compliance in cloud-based GenAI and ML ecosystems.
Understanding of the Azure AI Evaluation SDK and AI Red Teaming prompt security scans.
Good to have: experience with code assistant tools like GitHub Copilot, Cursor, and Claude Code.
Expertise in Azure DevOps or AWS CodePipeline, including setting up and managing CI / CD pipelines.
Advanced experience with Azure Blob Storage, Cosmos DB, SQL, Key Vault, AWS S3, DynamoDB, AWS RDS, etc., and their integrations with AI services.
Advanced understanding of networking concepts, including DNS management, load balancing, VPNs, and virtual networks (VNets).
Advanced understanding of security concepts, including IAM roles, identities, Azure Policy, and AWS SCPs.
Experience with advanced authentication and authorization concepts across various cloud providers and platforms.
Must have experience with Azure Policy, AWS SCP, AWS IAM, audit logging, Azure RBAC etc.
Mastery of infrastructure-as-code tools such as Azure ARM / Bicep, Terraform, CloudFormation, or equivalent.
Proficiency in networking, DNS, load balancers, and cloud engineering services.
Knowledge of Python programming and AI / ML libraries (TensorFlow, PyTorch, scikit-learn, etc.), plus Bash and PowerShell scripting.
Experience with containerization and orchestration tools such as Docker and Kubernetes.
Good to have: knowledge of Azure Bot Framework, APIM, and Application Gateway; M365 offerings like M365 Copilot; AWS CDK and the AWS SDK for Python (Boto3).
Experience with monitoring tools like Grafana, Prometheus, Application Insights, Log Analytics Workspaces, and Azure Monitor.
Understanding of common database technologies for both OLTP and OLAP applications
Azure Services knowledge (Azure Machine Learning)
o Experience with ML tooling: knowledge of Azure Machine Learning studio, Python SDK (v2), and CLI (v2) to monitor, retrain, and redeploy models.
o Exposure to Azure Machine Learning models, as an architect or through models built with an open-source framework such as PyTorch, TensorFlow, or scikit-learn.
o Practical knowledge of how to build efficient end-to-end ML workflows
o Understanding of machine learning & deep learning concepts and algorithms, various statistical techniques, and experimentation analysis workflows
o Enable production models across the ML lifecycle
o Implement CI / CD orchestration for data science pipelines
o Understanding of production deployments and post-deployment model lifecycle management activities: drift monitoring, model retraining, and technical evaluation and business validation of models
o Work with stakeholders to assist with ML pipeline-related technical issues and support modelling infrastructure needs.
Security and Access
o Familiarity with Microsoft AD
o Understanding of the principle of least privilege and its application to projects (RBAC)
Testing
o Understand and apply unit testing in day-to-day development work
o Applied knowledge of integration testing as part of the CI / CD process (ideally on Azure DevOps)
Infrastructure as Code
o Understand the key concepts of IaC and have some practical experience applying them
o Write code in languages such as Python or TypeScript to define cloud infrastructure as code
Specific to AWS :
o AWS database services (such as RDS, DynamoDB, Redshift, Aurora)
o AWS compute services and storage (such as EC2 – including scaling, EBS, EFS)
o AWS serverless technology (including Lambda, SQS, SNS, EventBridge and Step Functions)
o AWS KMS
o AWS container services (including ECR)
o AWS CloudFormation and the CDK
Specific to Azure :
o Azure database services (including Cosmos DB, Azure SQL Serverless)
o Azure compute services (VM, VM Scale Sets)
o Azure serverless technology (e.g. Functions, Event Grid / Event Hubs, Queue Storage, Service Bus)
o Azure container services (including ACR / AKS)
o Azure Resource Manager (ARM) / Bicep
o Azure Key Vault
o Azure Machine Learning
o Azure Data Lake Storage
MLOps Engineer • Tirupati, Andhra Pradesh, India