Cloud Orchestration & Scheduling Architect (AI Infrastructure)

Sustainability Economics.ai • Thrissur, Kerala, IN
11 hours ago
Job description

Location : Bengaluru, Karnataka

About the Company :

Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to sustained, long-term effort to fulfil this vision through technical innovation, client services, expertise, and capability expansion.

Role Summary :

We are seeking a Cloud Orchestration & Scheduling Architect to design and implement intelligent systems that dynamically balance AI workload demand, supply, and compute capacity across cloud data centers. The role focuses on building orchestration, scheduling, and optimization frameworks for large-scale AI inference workloads, ensuring efficient use of compute, energy, and cost resources.

The ideal candidate will have strong experience in Kubernetes, distributed systems, and scheduling frameworks, with a deep understanding of how to align AI compute workloads with real-time demand and supply.

Key Responsibilities :

  • Design and implement workload orchestration and scheduling systems for AI inference pipelines across distributed data centers.
  • Develop mechanisms to match workload demand with available compute and supply, optimizing for performance, cost, and sustainability.
  • Integrate predictive demand forecasting and energy availability data into scheduling decisions.
  • Manage multi-region orchestration using Kubernetes, Ray, or similar distributed compute frameworks.
  • Build automation to dynamically scale GPU / CPU clusters based on real-time and forecasted AI inference workloads.
  • Implement cost-aware and latency-aware scheduling policies for optimal workload placement.
  • Build dashboards and observability tools to track compute utilization and cost efficiency.
  • Continuously evaluate new frameworks and optimization algorithms to enhance system scalability and resilience.

Education & Experience :

  • Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
  • 2–5 years of proven experience in cloud infrastructure, distributed systems, or large-scale DevOps environments.
  • Proven experience in AI / ML infrastructure orchestration, workflow scheduling, or HPC environments.
  • Hands-on experience managing workloads on AWS or hybrid cloud platforms.

Certifications (preferred) :

  • AWS Certified Solutions Architect – Associate
  • Certified Kubernetes Administrator (CKA)

Skills Required :

  • Strong proficiency in Kubernetes, EKS, or Ray for distributed workload orchestration.
  • Experience with workflow orchestration tools like Airflow, Argo, or Prefect.
  • Familiarity with containerization (Docker), IaC tools (Terraform / CloudFormation), and GitOps (Argo CD).
  • Understanding of scheduling algorithms, cluster autoscaling, and load balancing in large systems.
  • Experience with inference batching, request routing, and autoscaling strategies using frameworks like Ray Serve, Triton Inference Server, or KServe.
  • Experience building or tuning custom schedulers for optimization.
  • Knowledge of GPU orchestration, NVIDIA Triton, or KServe for model serving at scale.
  • Understanding of GPU utilization optimization and CUDA profiling for efficient model execution.
  • Experience in AIOps: automated monitoring, anomaly detection, root-cause analysis, and predictive operations for AI workloads.
  • Experience building demand forecasting and queue-based scheduling (Redis, Kafka, or similar) systems to balance compute load and supply.
  • Proficiency in Python, Go, or Bash scripting for automation.
  • Exposure to monitoring and observability tools (Prometheus, Grafana, CloudWatch).

What You’ll Do :

  • Design the orchestration layer for AI workloads across distributed data centers.
  • Optimize resource allocation by balancing compute demand, supply, and economics.
  • Build automation pipelines for dynamic scaling and job scheduling.
  • Monitor system performance and drive continuous improvements in utilization and efficiency.

What You Will Bring :

  • Strong systems thinking and the ability to design control mechanisms for complex, distributed workloads.
  • Passion for optimizing AI infrastructure for efficiency, reliability, and sustainability.
  • Curiosity about distributed scheduling and cloud optimization.
  • Hands-on experience with container orchestration, automation, and observability.
  • A proactive, ownership-driven mindset suited for a fast-paced, mission-driven environment.
  • Agile mindset, adaptability, and eagerness to learn emerging tools and technologies.
  • The ability to improve systems end to end.
  • Startup DNA: bias to action, comfort with ambiguity, love of fast iteration, and a flexible, growth-oriented mindset.

Why Join Us :

  • Shape a first-of-its-kind AI + clean energy platform.
  • Work with a small, mission-driven team obsessed with impact.
  • An aggressive growth path.
  • A chance to leave your mark at the intersection of AI and sustainability.