Product Manager – Enterprise AI Operations & Observability

Confidential
Hyderabad / Secunderabad, Telangana, India
9 days ago
Job description

Product Manager – Enterprise AI Operations & Observability

Department : Tech@Lilly

Location : Hyderabad, India

Position Type : Full-Time

Level : P4

Position Summary

Eli Lilly is seeking a highly accomplished and strategic technology leader to head our Enterprise AI Operations (AIOps) and Observability functions. This pivotal role is responsible for defining, implementing, and optimizing operational frameworks, platforms, and processes that ensure the reliability, performance, scalability, and security of AI / ML systems and the broader enterprise technology landscape. The ideal candidate will bring deep expertise in AI / ML lifecycle management, enterprise observability, and automation, with a proven track record of building and leading high-performing teams in complex, large-scale environments. They will also partner closely with the AI Ops Architect to co-develop and execute a cohesive strategy that delivers measurable value across the organization, ensuring alignment between architectural vision and operational excellence.

Key Responsibilities

Strategic Leadership & Governance

· Develop and execute a comprehensive strategy for enterprise AI operations and observability aligned with business and technology goals.

· Establish governance frameworks, standards, and best practices for AI / ML deployments and enterprise observability.

· Ensure compliance with regulatory, security, and operational requirements.

AIOps & MLOps Maturity

· Drive the adoption of AIOps practices for proactive issue detection, intelligent alerting, root cause analysis, and automated remediation.

· Establish and scale MLOps practices for secure, efficient, and reliable deployment, observability, and lifecycle management of AI / ML models.

Enterprise Observability

· Define and implement a robust observability strategy across infrastructure, applications, networks, security, and data systems.

· Standardize the collection, correlation, and analysis of metrics, logs, and traces across all technology layers.

· Build predictive capabilities and dashboards to anticipate failures and enable proactive interventions.

· Treat observability as a product, continuously iterating to meet evolving business needs.

Tooling, Platform Management & Automation

· Evaluate, implement, and manage advanced observability and AIOps platforms and tools.

· Optimize and scale observability of infrastructure for high availability and performance.

· Design intuitive, high-value dashboards and alerting systems that clearly visualize system health and performance.

· Champion automation using scripting, orchestration tools, and AI-driven solutions to reduce manual effort and enable self-healing systems.

· Partner with automation teams to develop and implement automation scripts and workflows.

Operational Resilience

· Ensure high availability and resilience of mission-critical systems, especially AI / ML workloads.

· Collaborate closely with the Service Management Office and production support teams to drive impactful outcomes and elevate operational success.

· Enable methods to reduce mean time to recovery (MTTR) and drive continuous operational improvements.

Performance & Reliability Optimization

· Utilize observability data to identify performance bottlenecks, capacity issues, and reliability risks.

· Work with relevant teams to implement improvements based on data-driven insights.

· Establish and execute performance strategy benchmarks utilizing baselines and KPIs.

Team Leadership & Enablement

· Build, mentor, and lead a high-performing team of engineers and specialists in AIOps.

· Provide training and documentation to operational teams on leveraging observability platforms for troubleshooting and performance tuning.

· Foster a culture of innovation, continuous learning, and operational excellence.

Cross-Functional Collaboration

· Collaborate with AI / ML engineering, data science, infrastructure, cybersecurity, and business teams to operationalize AI initiatives and ensure comprehensive observability coverage.

· Serve as a subject matter expert to understand and deliver tailored observability solutions across teams.

Budget & Vendor Management

· Manage departmental budgets and vendor relationships to deliver cost-effective, scalable solutions.

Qualifications

Required

· Bachelor's or master's degree in Computer Science, Engineering, IT, or a related field.

· 15+ years of progressive technology leadership experience, including 5–7 years in enterprise operations, SRE, or AI / ML operations.

· Deep understanding of the AI / ML lifecycle, including development, deployment, observability, and retraining.

· Proven experience with enterprise observability across hybrid environments.

· Expertise in AIOps principles and implementation.

· Proficiency with leading observability and MLOps tools and platforms.

· Strong knowledge of cloud platforms, containerization, and microservices.

· Excellent leadership, communication, and stakeholder management skills.

· Demonstrated ability to build and lead high-performing engineering teams.

· Strong analytical and data-driven decision-making skills.

Preferred

· Experience in regulated industries (e.g., healthcare, finance).

· Certifications in cloud platforms or operational frameworks (e.g., ITIL).

· Active participation in AIOps or MLOps professional communities.

Lilly is dedicated to helping individuals with disabilities actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note that this form is for individuals requesting an accommodation as part of the application process; any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

Skills Required

Microservices, containerization
