Associate Architect - Platform

Quantiphi, India
30+ days ago
Job description

Role: Associate Architect - MLOps / LLMOps

Experience: 6 to 8 Years

Location: Bangalore / Mumbai (Hybrid)

Job Summary:

Join our dynamic team as a Platform Architect and leverage your expertise in production-scale platforms within the GenAI or ML domain. In this role, you'll be instrumental in designing, developing, and maintaining cutting-edge build and test environments for critical GenAI workloads running on foundational cloud infrastructure.

You'll partner with architects to design and implement highly robust and scalable systems, while also providing crucial development support to SRE / Operations teams as they tackle complex distributed systems challenges at scale. We're seeking an engineer who champions Quantiphi's dedication to Cloud-Native development, with a particular emphasis on Kubernetes.

Job Responsibilities:

As a Platform Architect, you will play a pivotal role in designing, implementing, and optimizing our cutting-edge infrastructure. Your responsibilities will include:

Designing and implementing state-of-the-art GPU compute clusters to support critical workloads.

Designing comprehensive automated testing strategies and frameworks across unit, integration, API, and end-to-end levels for critical commerce flows.

Developing robust performance testing frameworks to validate platform scalability, resilience, and identify optimization opportunities.

Planning comprehensive monitoring solutions with alerting systems to track platform health and ensure SLA compliance.

Designing specialized test frameworks for security controls and ensuring compliance validation across payment and personal data.

Architecting a scalable automation infrastructure that supports growing platform capabilities with consistent test environments.

Troubleshooting, diagnosing, and performing root cause analysis of system failures, isolating components and failure scenarios in collaboration with internal and external partners.

Optimizing cluster operations for maximum reliability, efficiency, and performance.

Job Requirements:

We are seeking a highly skilled and passionate Platform Engineer with:

6-8 years of experience developing ML infrastructure.

Over 3 years of hands-on experience building and deploying large-scale, production-ready services on Kubernetes.

A proven history of engaging with and contributing to open-source projects.

A collaborative spirit, demonstrated by prior work developing scalable software solutions for cloud services.

The ability to effectively communicate complex technical designs and quality approaches across various mediums.

A deep understanding of GPU computing and AI infrastructure.

A strong passion for solving complex technical challenges and optimizing system performance.

Working knowledge of cluster configuration management tools such as BCM or Ansible, and infrastructure-level applications including Kubernetes, Terraform, and MySQL.

In-depth understanding of container technologies such as Docker and containerd.

Proficiency in programming with Python and Bash scripting.

Ways To Stand Out From The Crowd:

Candidates who possess the following will be highly competitive:

Significant experience with sophisticated infrastructure tooling, including Kubernetes Cluster API, Terraform, Helm, and Operator Framework.

Practical, production-level experience across major cloud platforms: Azure, Google Cloud Platform (GCP), or Amazon Web Services (AWS).

Ability to adapt to new technologies and frameworks in the ML / GenAI landscape.

A strong track record of successfully refactoring and optimising software for deployment within Kubernetes environments.

Comfort discussing and working with core Kubernetes concepts like CSI, CNI, and CRI.

Comprehensive understanding of the CNCF landscape and its associated tooling.

The ability to decompose complex problems into simpler sub-problems and leverage existing solutions for efficient implementation, along with designing simple, self-sustaining systems.

Experience leveraging AI / ML to proactively detect and resolve incidents, automate alert triaging, perform log analysis, and streamline repetitive workflows.
