Job Title : OpenShift Virtualization Engineer / Cloud Platform Engineer (OpenShift Virtualization)
Experience : 5+ Years
Location : Chennai
Job Summary :
We are seeking a highly skilled and experienced OpenShift Virtualization Engineer to join our globally distributed, on-premises cloud platform team. In this role, you will be responsible for the design, implementation, administration, and ongoing management of our OpenShift Virtualization (KubeVirt) service. You will ensure the stability, performance, and security of our virtualized workloads, leveraging your expertise in Kubernetes, OpenShift, and virtualization technologies. This position requires a strong understanding of day-2 operations, proactive monitoring, and a commitment to operational excellence.
Key Responsibilities :
Observability, Monitoring, Logging, and Troubleshooting
Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the OpenShift Virtualization (OSV) environment, including integration with tools such as Dynatrace and Prometheus/Grafana
Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and response
Develop capabilities to flag and report abnormalities and identify "blind spots"
Perform deep-dive Root Cause Analysis (RCA), using available tooling where appropriate, to quickly identify and resolve issues across the global compute environment
Pinpoint unhealthy components anywhere in the global compute environment (the "needle in a haystack") to shorten time to resolution
Monitor VM health, resource usage, and performance metrics proactively
Monitor for unusual activity that might indicate a compromise or misconfiguration
Solution Design & Consulting
Provide technical consulting and expertise to application teams requiring OSV solutions
Design, implement, and validate custom or dedicated OSV clusters and VM solutions for critical applications with unique or complex requirements (e.g., specialized appliances)
Knowledge Management
Create, maintain, and update comprehensive internal documentation and customer-facing content on the Ford docs site to facilitate self-service and clearly articulate platform capabilities
Skills Required :
Kubernetes, OpenShift, Deployment, Services, Storage Capacity Management, Linux, Network Protocols & Standards, Python, Scripting, Dynatrace, VMware, Problem Solving, Technical Troubleshooting, Communications, Ability to communicate and work with cross-functional teams and all levels of management, Red Hat, Terraform, Tekton, Ansible, GitHub, GCP, AWS, Azure, Cloud Infrastructure, CI/CD, DevOps
Experience Required :
Required Qualifications :
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
5+ years of experience in IT infrastructure, with at least 2 years focused on Kubernetes and/or OpenShift.
Proven experience with OpenShift Virtualization (KubeVirt) in a production environment.
Strong understanding of Kubernetes concepts (Pods, Deployments, Services, Storage Classes, Operators, Custom Resources).
Experience with Linux administration and networking fundamentals.
Proficiency in scripting languages (e.g., Bash, Python) for automation.
Experience with monitoring tools (e.g., Prometheus, Grafana, Dynatrace) and logging solutions.
Solid understanding of virtualization concepts and technologies (e.g., KVM, VMware).
Excellent problem-solving skills and the ability to troubleshoot complex issues across multiple layers of the stack.
Strong communication and collaboration skills.
Experience Preferred :
Preferred Qualifications :
Red Hat Certified Specialist in OpenShift Virtualization or other relevant certifications.
Experience with Infrastructure as Code (IaC) tools like Ansible, Terraform, or OpenShift GitOps.
Familiarity with software-defined networking (SDN) and software-defined storage (SDS) solutions.
Experience with public cloud providers (AWS, Azure, GCP) and hybrid cloud architectures.
Knowledge of CI/CD pipelines and DevOps methodologies.
Additional Information :
Key Responsibilities :
Capacity Management
Conduct capacity planning and forecasting for the OpenShift Virtualization platform, including compute, memory, storage, and network resources, to ensure scalability and prevent resource exhaustion.
Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization.
Collaborate with application teams and stakeholders to understand future demand and project capacity needs.
Develop and maintain capacity models and reports to support strategic planning.
OSV Automation & Efficiency
Develop automation solutions (scripts, playbooks) for repetitive OSV tasks, including configuration changes, VM management (such as snapshot removal), auditing, remediation, and integration with ticketing systems
Leverage automation to deliver operator updates and changes efficiently at scale
Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency
Role-Based Access Control (RBAC) deployment and auditing
Namespace and Resource Quota management (CPU, Disk and Storage)