System / Solutions Architect

BayOne Solutions • Ghaziabad, IN
7 hours ago
Job description

About the Role

We are looking for a Systems or Solutions Architect with deep expertise in networking, infrastructure-as-a-service (IaaS), and cloud-scale system design to help architect and optimize AI / ML infrastructure.

The ideal candidate combines strong fundamentals in cloud architecture (AWS or equivalent), networking, compute, and storage, with hands-on experience in Kubernetes, observability, and automation.

You’ll design scalable systems that support large AI workloads — enabling efficient training, inference, and data pipelines across distributed environments.

Key Responsibilities

  • Architect and scale AI / ML infrastructure across public cloud (AWS / Azure / GCP) and hybrid environments.
  • Design and optimize compute, storage, and network topologies for distributed training and inference clusters.
  • Build and manage containerized environments using Kubernetes, Docker, and Helm.
  • Develop automation frameworks for provisioning, scaling, and monitoring infrastructure using Python, Go, and IaC (Terraform / CloudFormation).
  • Partner with data science and MLOps teams to align on AI infrastructure requirements (GPU / CPU scaling, caching, throughput, latency).
  • Implement observability, logging, and tracing using Prometheus, Grafana, CloudWatch, or OpenTelemetry.
  • Drive networking automation (BGP, routing, load balancing, VPNs, service meshes) using software-defined networking (SDN) and modern APIs.
  • Lead performance, reliability, and cost-optimization efforts for AI training and inference pipelines.
  • Collaborate cross-functionally with product, platform, and operations teams to ensure secure, performant, and resilient infrastructure.

Required Qualifications

  • Knowledge of AI / ML infrastructure patterns, including distributed training, inference pipelines, and GPU orchestration.
  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
  • 10+ years of experience in systems, infrastructure, or solutions architecture roles.
  • Deep understanding of:
      • Cloud architecture: AWS (preferred), Azure, or GCP
      • Networking: VPC, Transit Gateway, DNS, routing, peering, load balancing, VPN
      • Compute and storage: EC2, ECS / EKS, S3, EBS, EFS, FSx, caching systems
      • Core infrastructure: virtualization, containers, distributed systems, and OS-level tuning
  • Proficiency in Linux systems engineering and scripting with Python and Bash.
  • Experience with Kubernetes (EKS / GKE / AKS) for large-scale workload orchestration.
  • Experience with Go (Golang) for infrastructure or network automation.
  • Familiarity with Infrastructure-as-Code (IaC) tools like Terraform, Ansible, or CloudFormation.
  • Experience implementing monitoring and observability systems (Prometheus, Grafana, ELK, Datadog, CloudWatch).

Preferred Qualifications

  • Experience with DevOps and MLOps ecosystems (SageMaker, Kubeflow, MLflow, Airflow).
  • AWS or cloud certifications such as Solutions Architect Professional or Advanced Networking Specialty.
  • Experience in performance benchmarking, security hardening, and cost optimization for compute-intensive workloads.
  • Strong collaboration skills and ability to communicate complex infrastructure concepts clearly.