Ensure that our applications and environments are stable, scalable, secure, and performing as expected.
Work proactively with cross-functional colleagues to understand their requirements, then contribute to and deliver suitable supporting solutions.
Develop and introduce systems that support rapid growth, including implementing deployment policies, designing and implementing new procedures, managing configuration, and planning patches and capacity upgrades.
Observability: ensure that suitable monitoring and alerting are in place to keep engineers aware of issues.
Establish runbooks and procedures to keep outages to a minimum. Step in before users notice that something is wrong, then automate the fix for the future.
Automate everything so that nothing is ever done manually in production.
Identify and mitigate reliability and security risks.
Make sure we are prepared for peak times, DDoS attacks, and fat fingers.
Troubleshoot issues across the whole stack: software, application, and network.
Manage individual project priorities, deadlines, and deliverables as part of a self-organizing team.
Learn and unlearn every day by exchanging knowledge and new insights, conducting constructive code reviews, and participating in retrospectives.
Requirements:
2+ years of extensive experience in Linux server administration, including patching, packaging (RPM), performance tuning, networking, user management, and security.
2+ years of implementing systems that are highly available, secure, scalable, and self-healing on the Azure cloud platform.
Strong understanding of networking, especially in cloud environments, along with a good understanding of CI/CD.
Prior experience implementing industry-standard security best practices, including those recommended by Azure.
Proficiency with Bash and at least one high-level scripting language.
Basic working knowledge of observability stacks such as ELK, Prometheus, Grafana, SigNoz, etc.
Proficiency with Infrastructure as Code and Infrastructure Testing, preferably using Pulumi / Terraform.
Hands-on experience building and administering VMs and containers using tools such as Docker / Kubernetes.
Excellent communication skills, spoken as well as written, with a demonstrated ability to articulate technical problems and projects to all stakeholders.
Extra credit for:
Experience with these technologies: Pulumi with TypeScript or Golang, Node.js, Kubernetes, serverless infrastructure, Azure cloud.
Experience in governance processes and compliance validation, especially for financial services, such as ISO, SOC 2, PCI, etc.
Experience working in product startups.
Experience in administering and scaling PostgreSQL.