What you'll be doing:
- Ensure that our applications and environments are stable, scalable, secure and performing as expected.
- Work proactively with cross-functional colleagues to understand their requirements, and contribute to and deliver suitable supporting solutions.
- Develop and introduce systems that support rapid growth, including implementing deployment policies, designing and implementing new procedures, managing configuration, and planning patches and capacity upgrades.
- Observability: ensure suitable monitoring and alerting are in place to keep engineers aware of issues.
- Establish runbooks and procedures to keep outages to a minimum. Jump in before users notice that things are off track, then automate the fix for the future.
- Automate everything so that nothing is ever done manually in production.
- Identify and mitigate reliability and security risks. Make sure we are prepared for peak times, DDoS attacks and fat fingers.
- Troubleshoot issues across the whole stack - software, applications and network.
- Manage individual project priorities, deadlines, and deliverables as part of a self-organizing team.
- Learn and unlearn every day by exchanging knowledge and new insights, conducting constructive code reviews, and participating in retrospectives.
You must have:
- 2+ years of extensive experience in Linux server administration, including patching, packaging (rpm), performance tuning, networking, user management, and security.
- 2+ years of implementing systems that are highly available, secure, scalable, and self-healing on the Azure cloud platform.
- Strong understanding of networking, especially in cloud environments, along with a good understanding of CI/CD.
- Prior experience implementing industry-standard security best practices, including those recommended by Azure.
- Proficiency with Bash and at least one high-level scripting language.
- Basic working knowledge of observability stacks such as ELK, Prometheus, Grafana, SigNoz, etc.
- Proficiency with Infrastructure as Code and infrastructure testing, preferably using Pulumi or Terraform.
- Hands-on experience building and administering VMs and containers using tools such as Docker and Kubernetes.
- Excellent communication skills, both spoken and written, with a demonstrated ability to articulate technical problems and projects to all stakeholders.

Extra credit for:
- Experience with these technologies: Pulumi with TypeScript or Golang, Node.js, Kubernetes, serverless infrastructure, and Azure cloud.
- Experience in governance processes and compliance validation, especially for financial services (e.g. ISO, SOC 2, PCI).
- Experience working in product startups.
- Experience administering and scaling PostgreSQL.

(ref: hirist.tech)