Job Objective :
Ensure a world-class, uninterrupted cloud gaming experience by maintaining robust production systems and driving continuous improvement in code quality and deployment processes.
Key Responsibilities :
Ownership of Production :
- Take full responsibility for the health, performance, and reliability of production systems, proactively identifying and resolving issues to ensure optimal service delivery.
Production Code Quality :
- Uphold high standards for code quality, ensuring that all code deployed to production is stable, reliable, and thoroughly tested.
Deployments :
- Oversee and optimize deployment processes, aiming to minimize downtime and service interruptions while enabling rapid, safe releases.
What We're Looking For :
- Self-motivated professionals who can work independently and collaboratively.
- Individuals who actively participate in decision-making at various organizational levels.
- Team members who provide critical feedback and insights throughout the operational lifecycle.
- Engineers engaged across all stages of the software development lifecycle, with a strong focus on operational readiness and long-term service stability.
Your expertise and input will directly influence the direction, quality, and success of our cloud gaming platform.
Join us to help shape the future of gaming!
Skill Requirements :
Experience & Background :
- 7+ years of experience in software development and/or Linux systems administration.
- Proven track record as a Linux Production Systems Engineer, managing large-scale web services infrastructure.
- Participation in on-call rotation as required.
Technical Skills :
- Proficiency in Linux systems engineering and administration.
- Development experience in at least one of the following programming languages :
  1. Python (preferred)
  2. Bash, Go, Java, C++, or Rust
Additional Expertise (Experience in at least 3 of the following areas) :
- Distributed data storage at scale (e.g., Hadoop, Ceph)
- NoSQL databases at scale (e.g., MongoDB, Redis, Cassandra)
- Data aggregation technologies (e.g., Elasticsearch, Kafka)
- Scaling and managing traditional RDBMS with High Availability (e.g., PostgreSQL, MySQL)
- Monitoring & alerting systems and incident management tools (e.g., Prometheus, Grafana)
- Kubernetes and/or AWS deployment and management
- Software distribution at scale (package management and distribution)
- Configuration management tools (e.g., Ansible, SaltStack, Puppet, Chef)
- Software performance analysis and load testing (QA or SDET experience is a plus)
Interpersonal Skills :
- Strong interpersonal, written, and verbal communication skills
(ref : hirist.tech)