No longer accepting applications

Inference Optimization Engineer (LLM and Runtime)

Sustainability Economics.ai, Nagpur, IN

11 hours ago

Job description

Location : Bengaluru, Karnataka

About the Company :

Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to fulfilling this vision over the long term through technical innovation, client services, expertise, and capability expansion.

Role Summary :

We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends the frontier of generative AI research with robust engineering, requiring expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of current industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability :

Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
Apply and evaluate advanced model optimization techniques, such as quantization, pruning, distillation, tensor parallelism, and caching strategies, to enhance model efficiency, throughput, and inference performance.
Implement custom fine-tuning pipelines using parameter-efficient methods (such as LoRA, QLoRA, and adapters) to achieve task-specific goals while minimizing compute overhead.
Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100 / H100), TPUs, or custom accelerators.
Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements :

Education & Experience

Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).

2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills

Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.

Collaborative mindset, with the ability to work across research, engineering, and product teams.

Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.

Curiosity-driven attitude that keeps up with emerging model compression and inference technologies.

What You’ll Do

Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.

Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.

Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What you will bring

Hands-on experience building and optimizing machine learning or agentic systems at scale.

A builder’s mindset — bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.

Startup DNA: bias to action, comfort with ambiguity, love of fast iteration, and a flexible, growth-oriented mindset.

Why Join Us

Shape a first-of-its-kind AI + clean energy platform.

Work with a small, mission-driven team obsessed with impact.

An aggressive growth path.

A chance to leave your mark at the intersection of AI and sustainability.
