The core requirements for the job include the following :
Database Administration (DBA) Skills :
- Relational Databases : MySQL, PostgreSQL, Oracle, MS SQL Server.
- Database Backup and Recovery : Tools and strategies for database backups and disaster recovery.
- Performance Tuning : Query optimization, indexing strategies, and database performance troubleshooting.
- Database Security : User management, roles, access control, and
as a Service Knowledge :
- Infrastructure as Code (IaC) : Terraform, CloudFormation, Kubernetes.
- Kubernetes and Containers : Good knowledge and understanding of Kubernetes and container usage.
- Observability Tools : ELK stack (Elasticsearch, Logstash, Kibana).
- Database Migration : Migrating databases across different platforms or cloud environments.
- Infrastructure Scaling : Vertical and horizontal scaling techniques in cloud environments.
SRE (Site Reliability Engineering) Principles and Knowledge :
- Strong hands-on experience in AWS and Azure cloud, and a fair understanding of Google Cloud, is required.
- Experience in handling APIs, troubleshooting API calls, and ensuring seamless integration and performance.
- Incident Management : Handling database outages, incident response, and on-call rotations.
- Monitoring and Alerting : Proactive monitoring of the application stack with tools like Prometheus, Grafana, Datadog, and CloudWatch.
- Core SRE Principles : SLA, SLI, SLO, error budgets, etc.
- Disaster Recovery Planning : Ensuring high availability (HA) and disaster recovery (DR) solutions.
- Performance Optimisation : Tracking latency, slow performance, and high utilisation issues, and recommending optimisation as required.
Scripting and Automation :
- Scripting Languages : Python, Shell scripting, Bash, PowerShell.
- Automation Tools : Ansible, Puppet, Chef.
- Infrastructure Automation : Automating database deployment, patching, and scaling.
Networking and Infrastructure :
- Networking Basics : TCP/IP, DNS, Firewall, Load Balancers.
- Database Connectivity : Connection pooling, failover strategies, and multi-region deployment.
- Storage and Disk Management : Understanding IOPS, latency, and throughput.
- Infrastructure : Familiarity with AWS services like EC2, S3, VPC, Security Groups, private and public subnets, IAM, CloudWatch, CloudTrail, etc., and Azure services like Virtual Machines, Azure Functions, Virtual Network, Resource Manager, etc.
OS Skills :
- Expertise in Linux OS (RHEL, Ubuntu, CentOS).
- Understanding of file systems (ext4, XFS, etc.), permissions, and ownerships.
- Knowledge of process monitoring, management, and troubleshooting.
- Proficiency with tools like top, htop, vmstat, iostat, sar, and dstat to monitor CPU, memory, disk I/O, and network usage.
- Ability to analyze system logs (/var/log/, journalctl, dmesg) for troubleshooting.
- Understanding of resource limits (CPU, memory, disk, network) and how they impact database performance.
(ref : hirist.tech)