Job description
Founding Teams is an AI Incubator & Talent platform. We support the next generation of AI startup founders with the resources they need, including engineering, product, sales, marketing, and operations staff, to create and launch their products.
The ideal candidate will have a passion for next-generation AI tech startups and for working with great global startup talent.
About the Role
We are looking for an experienced Lead DevOps Engineer to own and scale our cloud infrastructure, CI / CD pipelines, and deployment strategy. You will be responsible for driving DevOps best practices, improving reliability and performance, and leading a small team of DevOps / Platform engineers. You will work closely with backend, frontend, data, and security teams to ensure fast, secure and scalable delivery of our products.
Key Responsibilities
- Lead the design, implementation, and maintenance of scalable, secure cloud infrastructure
- Own and improve CI / CD pipelines to support rapid, reliable deployments
- Define and enforce infrastructure architecture and DevOps best practices
- Manage monitoring, logging, alerting, and incident response processes
- Optimize system performance, uptime, scalability and cost efficiency
- Drive automation using Infrastructure as Code (Terraform / CloudFormation / Ansible)
- Collaborate with engineering teams to support containerisation and orchestration (Docker, Kubernetes)
- Ensure systems meet security and compliance requirements
- Mentor and manage DevOps / Platform engineers and support team growth
- Support production issues, root-cause analysis, and system improvements
- Drive the DevOps roadmap and tooling strategy
Essential Skills & Experience
- 5–8+ years in DevOps / Platform / Site Reliability Engineering
- Strong experience with AWS, Azure or GCP (certification a plus)
- Expertise in CI / CD tools (GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.)
- Strong container experience (Docker & Kubernetes)
- Advanced Infrastructure as Code experience (Terraform or CloudFormation)
- Strong Linux and networking knowledge (DNS, TCP / IP, VPN, firewalls, proxies)
- Experience with monitoring tools (Prometheus, Grafana, Datadog, ELK, New Relic)
- Solid scripting skills (Bash, Python, or Go)
- Experience scaling systems supporting thousands / millions of users
- Experience implementing security best practices (IAM, secrets management, vulnerability scanning)
- Understanding of microservices and distributed systems
Nice to Have
- Experience with serverless architecture
- Experience with multi-region and high-availability systems
- Knowledge of security / compliance frameworks (ISO, SOC2, HIPAA, etc.)
- Experience in a fast-paced startup environment
- Contributions to open-source projects