Location : Bengaluru, Karnataka
About the Company :
Sustainability Economics.Ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to fulfilling this vision over the long term through technical innovation, client services, expertise, and capability expansion.
Role Summary :
We are seeking a Cloud Orchestration & Scheduling Architect to design and implement intelligent systems that dynamically balance AI workload demand, supply and compute capacity across cloud data centers. The role focuses on building orchestration, scheduling, and optimization frameworks for large-scale AI inference workloads, ensuring efficient use of compute, energy, and cost resources.
The ideal candidate will have strong experience in Kubernetes, distributed systems, and scheduling frameworks, with a deep understanding of how to align AI compute workloads with real-time demand and supply.
Key Responsibilities :
- Design and implement workload orchestration and scheduling systems for AI inference pipelines across distributed data centers.
- Develop mechanisms to match workload demand with available compute and supply, optimizing for performance, cost, and sustainability.
- Integrate predictive demand forecasting and energy availability data into scheduling decisions.
- Manage multi-region orchestration using Kubernetes, Ray, or similar distributed compute frameworks.
- Build automation to dynamically scale GPU / CPU clusters based on real-time and forecasted AI inference workloads.
- Implement cost-aware and latency-aware scheduling policies for optimal workload placement.
- Build dashboards and observability tools to track compute utilization and cost efficiency.
- Continuously evaluate new frameworks and optimization algorithms to enhance system scalability and resilience.
Education & Experience :
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 2–5 years of proven experience in cloud infrastructure, distributed systems, or large-scale DevOps environments.
- Proven experience in AI / ML infrastructure orchestration, workflow scheduling, or HPC environments.
- Hands-on experience managing workloads on AWS or hybrid cloud platforms.
Certifications (preferred) :
- AWS Certified Solutions Architect – Associate
- Certified Kubernetes Administrator (CKA)
Skills Required :
- Strong proficiency in Kubernetes, EKS, or Ray for distributed workload orchestration.
- Experience with workflow orchestration tools such as Airflow, Argo, or Prefect.
- Familiarity with containerization (Docker), IaC tools (Terraform / CloudFormation), and GitOps (Argo CD).
- Understanding of scheduling algorithms, cluster autoscaling, and load balancing in large systems.
- Experience with inference batching, request routing, and autoscaling strategies using frameworks such as Ray Serve, Triton Inference Server, or KServe.
- Experience building or tuning custom schedulers for optimization.
- Knowledge of GPU orchestration, NVIDIA Triton, or KServe for model serving at scale.
- Understanding of GPU utilization optimization and CUDA profiling for efficient model execution.
- Experience in AIOps: automated monitoring, anomaly detection, root-cause analysis, and predictive operations for AI workloads.
- Experience building demand forecasting and queue-based scheduling systems (Redis, Kafka, or similar) to balance compute load and supply.
- Proficiency in Python, Go, or Bash scripting for automation.
- Exposure to monitoring and observability tools (Prometheus, Grafana, CloudWatch).
What You’ll Do :
- Design the orchestration layer for AI workloads across distributed data centers.
- Optimize resource allocation by balancing compute demand, supply, and economics.
- Build automation pipelines for dynamic scaling and job scheduling.
- Monitor system performance and drive continuous improvements in utilization and efficiency.
What You Will Bring :
- Strong systems thinking and the ability to design control mechanisms for complex, distributed workloads.
- Passion for optimizing AI infrastructure for efficiency, reliability, and sustainability.
- Curiosity about distributed scheduling and cloud optimization.
- Hands-on experience with container orchestration, automation, and observability.
- A proactive, ownership-driven mindset suited for a fast-paced, mission-driven environment, with the ability to improve systems end-to-end.
- Agile mindset, adaptability, and eagerness to learn emerging tools and technologies.
- Startup DNA: bias to action, comfort with ambiguity, love for fast iteration, and a flexible, growth-oriented mindset.
Why Join Us :
- Shape a first-of-its-kind AI + clean energy platform.
- Work with a small, mission-driven team obsessed with impact.
- An aggressive growth path.
- A chance to leave your mark at the intersection of AI and sustainability.