Senior Cloud & ML Infrastructure Engineer
Location: Bengaluru, Hyderabad, Pune, Mumbai, Mohali, Panchkula, Delhi
Experience: 6 to 10+ years
Shift: Night shift, 9 PM to 6 AM
About the Role:
We're looking for a Senior Cloud & ML Infrastructure Engineer to lead the design, scaling, and optimization of cloud-native machine learning infrastructure. This role suits someone who enjoys solving complex platform engineering challenges on AWS, with a focus on model orchestration, deployment automation, and production-grade reliability. You'll architect ML systems at scale, advise teams on infrastructure best practices, and work cross-functionally to bridge the DevOps, ML, and backend teams.
Key Responsibilities:
- Architect and manage end-to-end ML infrastructure using SageMaker, AWS Step Functions, Lambda, and ECR
- Design and implement multi-region, highly available AWS solutions for real-time inference and batch processing
- Create and manage IaC blueprints for reproducible infrastructure using AWS CDK
- Establish CI/CD practices for ML model packaging, validation, and drift monitoring
- Oversee infrastructure security, including IAM policies, encryption at rest and in transit, and compliance standards
- Monitor and optimize compute and storage costs, ensuring efficient resource usage at scale
- Collaborate on data lake and analytics integration
- Serve as a technical mentor and guide AWS adoption patterns across engineering teams
Required Skills:
- 6+ years designing and deploying cloud infrastructure on AWS at scale
- Proven experience building and maintaining ML pipelines with services such as SageMaker, ECS/EKS, or custom Docker pipelines
- Strong knowledge of networking, IAM, VPCs, and security best practices in AWS
- Deep experience with automation frameworks, IaC tools, and CI/CD strategies
- Advanced scripting proficiency in Python, Go, or Bash
- Familiarity with observability stacks (CloudWatch, Prometheus, Grafana)

Nice to Have:
- Background in robotics infrastructure, including AWS IoT Core, Greengrass, or OTA deployments
- Experience designing systems for physical robot fleet telemetry, diagnostics, and control
- Familiarity with multi-stage production environments and robotic software rollout processes
- Competence in frontend hosting for dashboards or API visualization
- Experience with real-time streaming, MQTT, or edge inference workflows
- Hands-on experience with ROS 2 (Robot Operating System) or similar robotics frameworks, including launch file management, sensor data pipelines, and deployment to embedded Linux devices

(ref: hirist.tech)