Job Description:
As an AI/ML company, we operate infrastructure that relies heavily on computing and data processing. Managing multiple AWS accounts, diverse environments, and Kubernetes deployments demands a comprehensive approach to automation that extends beyond the CI/CD pipeline. Scalability, reliability, security, and cost optimization remain continuous challenges. If the prospect of tackling these problems in a dynamic environment excites you, this position is an ideal fit.
Responsibilities:
- Lead efforts to streamline infrastructure practices, collaborating with cross-functional teams to improve system reliability, performance, and availability.
- Design, implement, and maintain scalable and reliable infrastructure solutions on cloud platforms such as AWS and GCP.
- Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation for automated provisioning and configuration.
- Manage and optimize Kubernetes clusters to support our containerized applications.
- Monitor, troubleshoot, and resolve incidents and system outages in a 24/7 production environment.
- Develop and maintain CI/CD pipelines to automate deployment, testing, and release processes.
- Champion best practices for developer productivity, code quality, and collaboration within development teams.
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field, with 4+ years of industry experience as an SRE/DevOps Engineer.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Kubernetes Certification) are a plus.
- Expertise in Kubernetes, including cluster management and orchestration.
- Expertise in cloud platforms such as AWS, GCP, or Azure.
- Experience with Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or equivalent.
- Understanding of CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI/CD).
- Proficiency in scripting and programming (e.g., Python, Shell, Go).
- Knowledge of monitoring and logging tools (e.g., Prometheus, ELK stack).
- Commitment to staying up to date with industry trends and emerging technologies to drive innovation and best practices for security, compliance, and data protection.
(ref: hirist.tech)