Description:
Position: Infrastructure Engineer
Experience: 7+ Years (Principal or Staff Level)
Job Type: Full-time
Job Summary:
We are seeking a highly experienced Infrastructure Engineer at the Principal or Staff level with 7+ years of specialized experience in cloud infrastructure, DevOps, or Site Reliability Engineering (SRE). This critical role involves designing, implementing, and maintaining scalable observability solutions and robust AWS cloud infrastructure. The engineer will be responsible for managing complex containerized environments (Kubernetes, Docker), optimizing cloud costs, reducing mean time to resolution (MTTR), and ensuring the high reliability of our critical enterprise platforms.
Key Responsibilities:
Observability and Reliability Engineering (SRE):
- Design, implement, and manage monitoring, dashboards, and alerting systems using production-grade tools such as Datadog, CloudWatch, and Sumo Logic.
- Define and continuously refine SLIs (Service Level Indicators), SLOs (Service Level Objectives), and internal SLAs (Service Level Agreements) to measure and improve service reliability.
- Troubleshoot critical production issues efficiently, reducing MTTR (Mean Time To Resolution) and leading thorough RCAs (Root Cause Analyses) that drive lasting improvements.
Cloud Infrastructure and Containerization:
- Develop robust Infrastructure-as-Code (IaC) solutions with Terraform for managing and provisioning AWS infrastructure at scale.
- Manage and operate complex containerized environments, including Kubernetes, Docker, and associated service mesh technologies (e.g., Istio, Linkerd).
- Apply deep expertise in Linux administration and core networking concepts (TCP/IP, load balancing, DNS, firewalls) to ensure optimal system performance and security.
Database Operations and Automation:
- Optimize, scale, and ensure the reliability of enterprise database services, including Postgres, AWS Aurora, Redshift, and OpenSearch instances.
- Enhance and automate deployment, testing, and release processes by improving CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI).
- Proactively identify and execute technical measures for cost optimization across all managed cloud services.
Qualifications:
Experience: Mandatory 7+ years of experience in cloud infrastructure, DevOps, or SRE, ideally functioning at a Principal or Staff Software Engineer level.
Core Technologies (Mandatory): Expert proficiency in AWS, Terraform (IaC), and Linux.
Containerization & Orchestration: Mandatory hands-on experience with Kubernetes, Docker, and associated operations.
Observability & Tools: Strong hands-on experience with production-grade observability tools (Datadog, CloudWatch, or similar).
Database & CI/CD: Proven experience managing enterprise databases and improving existing CI/CD pipelines.
Preferred Skills:
- AWS certification (e.g., DevOps Engineer Professional or Solutions Architect Professional).
- Deep technical experience with specific databases (advanced Postgres tuning, Redshift administration).
- Proficiency in a programming or scripting language (Python, Go) for developing automation and custom tooling.
- Experience implementing DevSecOps practices and security compliance tools.
(ref: hirist.tech)