Glowingbud is a rapidly growing eSIM services platform that simplifies connectivity with powerful APIs, robust B2B and B2C interfaces, and seamless integrations with Telna. Our platform enables global eSIM lifecycle management, user onboarding, secure payment systems, and scalable deployments. Recently acquired by Telna, we are expanding our product offerings and team to meet increasing demand and innovation goals.
Job Summary
We are seeking a highly experienced Senior DevOps Engineer with 10+ years of expertise in cloud infrastructure, automation, and system reliability. The ideal candidate will be responsible for maintaining scalable AWS-based environments, implementing robust CI/CD pipelines, optimizing system performance, and ensuring high availability of critical applications. This role requires deep expertise in Docker, Kubernetes, Infrastructure as Code (IaC), and system monitoring. The candidate will also be responsible for documenting system architecture, setting SLAs, and leading DevOps best practices across teams. If you thrive in a fast-paced, collaborative environment and are passionate about DevOps, we'd love to hear from you!
Key Responsibilities:
Infrastructure Management: Design, implement, and maintain scalable cloud infrastructure using AWS services.
System Documentation & Diagrams: Maintain up-to-date system diagrams, architecture documentation, and operational procedures.
Containerization & Orchestration: Deploy and manage containerized applications using Docker and Kubernetes.
System Maintenance & Optimization: Ensure high availability, performance tuning, and cost optimization of cloud and on-premises infrastructure.
Monitoring & Observability: Implement detailed system monitoring, logging, and alerting using tools like Datadog, Prometheus, Grafana, ELK Stack, or AWS CloudWatch.
Security & Compliance: Enforce security best practices, conduct regular audits, and ensure adherence to compliance standards.
CI/CD Pipeline Management: Build and maintain automated deployment pipelines for seamless application releases.
Incident Response & SLA Management: Define SLAs, monitor system performance, and establish an efficient incident response strategy.
Collaboration & Leadership: Work closely with development, QA, and operations teams to improve reliability, scalability, and efficiency.
Qualifications:
7+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure roles.
Expert knowledge of AWS services (EC2, ECS, S3, RDS, Lambda, VPC, ALB, Gateway, Cognito, WAF, IAM, Amplify, CloudFormation, etc.) and MongoDB Atlas.
Strong experience with Docker & Kubernetes for container orchestration and management.
Hands-on experience with infrastructure as code (IaC) tools like Terraform, CloudFormation, or Pulumi.
Expertise in system monitoring and logging tools (Prometheus, Grafana, ELK Stack, Datadog, AWS CloudWatch).
Proficiency in scripting languages (Bash, Python, or Go) for automation and infrastructure management.
Experience with CI/CD pipelines using tools such as Jenkins, AWS CodePipeline, or GitHub Actions.
Knowledge of networking, security best practices, and system performance tuning.
Experience with setting and enforcing SLAs for DevOps teams.
Strong problem-solving skills and ability to work in a fast-paced environment.
Preferred Skills:
Thorough experience with AWS infrastructure.
Knowledge of serverless architectures and event-driven computing.
Experience with configuration management tools (Ansible, Chef, Puppet).
Background in database administration (PostgreSQL, MySQL, or NoSQL databases).
Senior • Aurangabad, Maharashtra, India