Job Description:
Key Responsibilities:
1. AWS Cloud Management
- Manage, monitor, and optimize workloads using EC2, RDS, S3, VPC, CloudWatch, IAM, Lambda, and EKS.
- Implement and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK.
- Enable auto-scaling, load balancing, and fault-tolerant architectures.
- Configure and maintain AWS Systems Manager (SSM) for patch automation and fleet management.
- Work with AI / ML services (e.g., Amazon Bedrock, SageMaker, Lookout for Metrics) for predictive insights or operational intelligence.
2. Linux Server Administration
- Manage and secure Red Hat / CentOS / Ubuntu servers (installation, hardening, patching).
- Implement user management, shell scripting, crontab automation, SELinux, and auditing policies.
- Configure and troubleshoot web servers (Apache / Nginx), databases (MySQL / PostgreSQL), and application services.
- Monitor performance and automate log analytics integration with CloudWatch Logs or the ELK Stack.
3. Automation & AI Ops
- Develop scripts in Python / Bash / PowerShell to automate repetitive tasks (e.g., patching, backups, monitoring).
- Integrate AI-based alert correlation and predictive analytics using Amazon CloudWatch anomaly detection, Amazon DevOps Guru, or third-party AIOps tools.
- Automate operational workflows using AWS Lambda, Amazon EventBridge, AWS Step Functions, and Amazon SNS.
- Participate in building self-healing infrastructure through automation triggers and remediation scripts.
4. Security, Compliance & Governance
- Implement IAM least privilege, MFA, Amazon GuardDuty, AWS Config rules, and AWS Security Hub compliance checks.
- Support the security posture required for DPDP, CERT-In, ISO 27001, and the AWS Well-Architected Framework.
- Ensure patch compliance and vulnerability closure in collaboration with the Security Team.
- Participate in VAPT remediation and audit reporting.
5. Monitoring, Observability & Incident Management
- Use CloudWatch, Grafana, Prometheus, or Datadog for real-time performance insights.
- Use AI-based observability tools to reduce false positives and improve incident triage.
- Handle L2 incident escalations, perform root cause analysis (RCA), and coordinate L3-level resolution.
- Prepare health reports, SLA adherence metrics, and cost optimization dashboards.
Required Skills:
- Strong hands-on experience with AWS EC2, S3, RDS, VPC, IAM, Lambda, CloudWatch, and EKS.
- Proficiency in Linux administration (RHEL, CentOS, Ubuntu) and Bash / Python scripting.
- Working knowledge of Terraform / CloudFormation / Ansible / Jenkins.
- Familiarity with AI-powered monitoring or AIOps tools (e.g., Amazon DevOps Guru, Datadog AI, OpsRamp, Splunk AI).
- Knowledge of Docker and Kubernetes for containerized workloads.
- Understanding of networking (DNS, VPN, routing, firewalls) and security best practices.
- Excellent analytical and problem-solving skills with a proactive mindset.
Skills Required:
Linux Administration, AWS, Networking