- Plan, administer, troubleshoot, and assist in the design of high availability systems and services in production and lab/test environments;
- Analyze and resolve complex problems including interoperability issues;
- Integrate applications, systems, and services with enterprise monitoring tools;
- Develop KPIs and other system metrics for capacity planning, performance management, and monitoring;
- Manage daily operations of carrier-grade systems.
Qualifications:
- Fluent in English;
- 10+ years of IT engineering experience;
- Extensive experience with planning, integrating, and managing Linux-based production platforms;
- Deep understanding of systems and IP networking for distributed platforms;
- Solid experience with AWS-based solutions and cloud architectures;
- Experience integrating and managing monitoring and analytics tools (e.g., Nagios, Splunk, Prometheus, Grafana);
- Ability to initiate and complete projects with minimal direction and guidance;
- Ability to multi-task and manage multiple projects with competing priorities;
- Self-starter with strong verbal and written communication skills;
- Resolute problem-solving and incident management skills;
- Scripting skills in Bash, Perl, Python, or a similar language;
- Experience with an automation framework (e.g., Ansible, Chef, Puppet);
- Rotational after-hours on-call support is required;
- Exposure to telecom applications or signaling networks is a big plus.
Skills Required
Scripting, Linux, Networking, Automation, AWS, Monitoring