Apply the core SRE tenets of measurement (SLI / SLO / SLA), toil elimination, and reliability modeling
Enable and educate development teams on industry best-practice design patterns, ways of working, and operational knowledge to ensure platform continuity
Develop and architect solutions to infrastructure and operational aspects of new products and feature sets
Assist with go / no-go preplanning, verification / validation, and review of existing and new products / services
Proactively analyze data and test the integrity of networks / systems to ensure production applications and services are operating optimally
Work within development teams to troubleshoot and resolve business-affecting issues
Handle escalations, incident response, root cause analysis (RCA), and blameless postmortems
Participate in on-call rotation
Qualifications
At least 3 years of professional experience with cloud / web / CDN-scale infrastructure
Experience with Python and Go. C / C++ a plus
Expert knowledge of Linux systems, network programming, and protocols (TCP, UDP, DNS, TLS / SSL, HTTP)
Experience with BGP and Anycast routing is a plus
Experience with DevOps principles and concepts such as Infrastructure as Code (Ansible / SaltStack), CI / CD (GitLab, Jenkins, Git), and monitoring and visualization (Prometheus, Grafana)
Experience with big data technologies such as NoSQL / RDBMS, Redis, Elasticsearch, Kafka
Experience with containers and container management (Docker, Kubernetes)
Experience analyzing and building data telemetry, modeling, pipelines, and UI visualization
Experience in developing software for, troubleshooting, and monitoring large-scale distributed systems
Experience implementing software engineering best practices / standards and the software development life cycle
Working knowledge of, and experience with, Agile software development methodologies