Responsibilities
- Design, implement, and maintain AWS infrastructure (RDS, EventBridge, Lambda, FIS) following best practices.
- Develop Infrastructure-as-Code modules using Terraform or CloudFormation for reproducible environments.
- Build and integrate CI/CD pipelines (Jenkins, GitHub Actions) for automated testing, deployments, and chaos drills.
- Author automation scripts and CLI tools for snapshot validation, load-test orchestration, and smoke tests.
- Define monitoring, alerting, and incident-response workflows; own on-call rotations and runbook creation.
- Enforce compliance requirements (FedRAMP, SOC 2) in production and FedRAMP accounts.
- Collaborate cross-functionally with automation, identity, and chaos teams to unify shared utilities.
Requirements
- 5+ years in Site Reliability Engineering, DevOps, or Cloud Engineering roles.
- Expert knowledge of AWS services, especially RDS, Lambda, EventBridge, and Fault Injection Service.
- Proficient in Terraform and/or CloudFormation for IaC.
- Strong scripting skills in Python or Bash; familiarity with Node.js is a plus.
- Hands-on experience with CI/CD tools (Jenkins, GitHub Actions).
- Solid understanding of monitoring and observability (Prometheus, Grafana, CloudWatch).
- Excellent problem-solving, communication, and documentation skills.
Preferred
- Experience with chaos-engineering tools (AWS FIS, Gremlin, LitmusChaos).
- Familiarity with identity systems and FedRAMP compliance controls.
- Container orchestration experience (Kubernetes or ECS).
- Background in database operations and disaster-recovery strategies.
Education
- B.E./B.Tech or M.E./M.Tech in Computer Science, Software Engineering, or a related field.
Benefits
- Opportunity to work with a dynamic and fast-paced IT organization.
- Make a real impact on the company's success by shaping a positive and engaging work culture.
- Work with a talented and collaborative team.
- Be part of a company that is passionate about making a difference through technology.
Skills Required
Kubernetes, ECS, Prometheus, Grafana, CloudWatch, DevOps