Principal AI Engineer

Confidential • Bengaluru / Bangalore
3 days ago
Job description

What you will do

  • Lead the design and development of scalable, efficient, and secure AI model lifecycle frameworks within Red Hat's OpenShift and RHEL AI infrastructures, ensuring models are deployed and maintained with minimal disruption and optimal performance.
  • Define and implement the strategy for optimizing AI model deployment, scaling, and integration across hybrid cloud environments (AWS, GCP, Azure), working with cross-functional teams to ensure consistent high availability and operational excellence.
  • Spearhead the creation and optimization of CI/CD pipelines and automation for AI model deployments, leveraging tools such as Git, Jenkins, and Terraform, ensuring zero disruption during updates and integration.
  • Champion the use of advanced monitoring tools (e.g., OpenLLMetry, Splunk, Catchpoint) to monitor and optimize model performance, responding to issues and leading the troubleshooting of complex problems related to AI and LLM models.
  • Lead cross-functional collaboration with Products & Global Engineering (P&GE) and IT AI Infra teams to ensure seamless integration of new models or model updates into production systems, adhering to best practices and minimizing downtime.
  • Define and oversee the structured process for handling feature requests (RFEs), prioritization, and resolution, ensuring transparency and timely delivery of updates and enhancements.
  • Lead and influence the adoption of new AI technologies, tools, and frameworks to ensure that Red Hat remains at the forefront of AI and machine learning advancements.
  • Drive performance improvements, model updates, and releases on a quarterly basis, ensuring RFEs are processed and resolved within agreed-upon timeframes and driving business adoption.
  • Oversee the fine-tuning and enhancement of large-scale models, including foundational models like Mistral and Llama, ensuring the optimal allocation of computational resources (GPU management, cost management strategies).
  • Lead a team of engineers, mentoring junior and senior talent, fostering an environment of collaboration and continuous learning, and driving the technical growth of the team.
  • Contribute to strategic discussions with leadership, influencing the direction of AI initiatives and ensuring alignment with broader business goals and technological advancements.

What you will bring

  • A bachelor's or master's degree in Computer Science, Data Science, Machine Learning, or a related technical field is required.
  • Hands-on experience and demonstrated leadership in AI engineering and MLOps will be considered in lieu of formal degree requirements.
  • 10+ years of experience in AI or MLOps, with at least 3 years in a technical leadership role managing the deployment, optimization, and lifecycle of large-scale AI models. You should have deep expertise in cloud platforms (AWS, GCP, Azure) and containerized environments (OpenShift, Kubernetes), with a proven track record in scaling and managing AI infrastructure in production.
  • Experience optimizing large-scale distributed AI systems, automating deployment pipelines using CI/CD tools like Git, Jenkins, and Terraform, and leading performance monitoring using tools such as OpenLLMetry, Splunk, or Catchpoint. You should have a strong background in GPU-based computing and resource optimization (e.g., CUDA, MIG, vLLM) and be comfortable with high-performance computing environments.
  • Your leadership skills will be key, as you will mentor and guide engineers while fostering a collaborative, high-performance culture. You should also have a demonstrated ability to drive innovation, solve complex technical challenges, and work cross-functionally with teams to deliver AI model updates that align with evolving business needs. A solid understanding of Agile development processes and excellent communication skills are essential for this role.
  • Lastly, a passion for AI, continuous learning, and staying ahead of industry trends will be vital to your success at Red Hat.

Desired skills

  • 10+ years of experience in AI, MLOps, or related fields, with a substantial portion of that time spent in technical leadership roles driving the strategic direction of AI infrastructure and model lifecycle management.
  • Extensive experience with foundational models such as Mistral, Llama, GPT, and their deployment, tuning, and scaling in production environments.
  • Proven ability to influence and drive AI and MLOps roadmaps, shaping technical strategy and execution in collaboration with senior leadership.
  • In-depth experience with performance monitoring, resource optimization, and troubleshooting of AI models in complex distributed environments.
  • Strong background in high-performance distributed systems and container orchestration, particularly in AI / ML workloads.
  • Proven experience in guiding and mentoring engineering teams to build high-performance capabilities, fostering a culture of continuous improvement and technical innovation.

As a Principal AI Engineer at Red Hat, you will have the opportunity to drive major strategic AI initiatives, influence the future of AI infrastructure, and lead a high-performing engineering team. This is a unique opportunity for a seasoned AI professional to shape the future of AI model lifecycle management at scale. If you're ready to take on a technical leadership role with a high level of responsibility and impact, we encourage you to apply.

Skills Required

    Git, GCP, Azure, AWS
