Talent.com
System / Solutions Architect

BayOne Solutions, Amritsar, Punjab, India
1 day ago
Job description

About the Role

We are looking for a Systems or Solutions Architect with deep expertise in networking, infrastructure-as-a-service (IaaS), and cloud-scale system design to help architect and optimize AI / ML infrastructure.

The ideal candidate combines strong fundamentals in cloud architecture (AWS or equivalent), networking, compute, and storage, with hands-on experience in Kubernetes, observability, and automation.

You’ll design scalable systems that support large AI workloads — enabling efficient training, inference, and data pipelines across distributed environments.

Key Responsibilities

Architect and scale AI / ML infrastructure across public cloud (AWS / Azure / GCP) and hybrid environments.

Design and optimize compute, storage, and network topologies for distributed training and inference clusters.

Build and manage containerized environments using Kubernetes, Docker, and Helm.

Develop automation frameworks for provisioning, scaling, and monitoring infrastructure using Python, Go, and IaC (Terraform / CloudFormation).

Partner with data science and MLOps teams to align on AI infrastructure requirements (GPU / CPU scaling, caching, throughput, latency).

Implement observability, logging, and tracing using Prometheus, Grafana, CloudWatch, or OpenTelemetry.

Drive networking automation (BGP, routing, load balancing, VPNs, service meshes) using software-defined networking (SDN) and modern APIs.

Lead performance, reliability, and cost-optimization efforts for AI training and inference pipelines.

Collaborate cross-functionally with product, platform, and operations teams to ensure secure, performant, and resilient infrastructure.

Required Qualifications

Knowledge of AI / ML infrastructure patterns, including distributed training, inference pipelines, and GPU orchestration.

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.

10+ years of experience in systems, infrastructure, or solutions architecture roles.

Deep understanding of:

Cloud architecture: AWS (preferred), Azure, or GCP

Networking: VPC, Transit Gateway, DNS, routing, peering, load balancing, VPN

Compute and storage: EC2, ECS / EKS, S3, EBS, EFS, FSx, caching systems

Core infrastructure: virtualization, containers, distributed systems, and OS-level tuning

Proficiency in Linux systems engineering and scripting with Python and Bash.

Experience with Kubernetes (EKS / GKE / AKS) for large-scale workload orchestration.

Experience with Go (Golang) for infrastructure or network automation.

Familiarity with Infrastructure-as-Code (IaC) tools like Terraform, Ansible, or CloudFormation.

Experience implementing monitoring and observability systems (Prometheus, Grafana, ELK, Datadog, CloudWatch).

Preferred Qualifications

Experience with DevOps and MLOps ecosystems (SageMaker, Kubeflow, MLflow, Airflow).

AWS or cloud certifications such as Solutions Architect Professional or Advanced Networking Specialty.

Experience in performance benchmarking, security hardening, and cost optimization for compute-intensive workloads.

Strong collaboration skills and ability to communicate complex infrastructure concepts clearly.
