About the Company
We are an AI and data consulting startup transforming how businesses leverage technology through four core service lines:
Consulting Services: AI strategy, automation, and digital transformation for enterprises.
SaaS Platform Development: Building an AI-native, user-friendly business application suite similar to Odoo and Zoho.
Data Lakehouse Solutions: Unified data pipelines for aggregation, cleaning, governance, and advanced analytics.
Government Contracting: Developing secure, compliant AI solutions for the public sector.
Our tech stack includes Python, TypeScript, React, Next.js, Go, Rust, Azure, Kubernetes, Spark, MLflow, Postgres, graph databases, and vector stores.
We're a small, fast-moving team delivering enterprise-grade solutions with startup agility.
Role Overview
We are seeking a Senior DevOps / Site Reliability Engineer to design, scale, and optimize our cloud infrastructure. You'll directly influence our system reliability, deployment velocity, and security posture.
Roles & Responsibilities
Infrastructure & Cloud Management
Design and manage Azure Kubernetes Service (AKS) clusters for production workloads.
Configure Azure networking components: VNets, Application Gateway, NSGs, and load balancers.
Build and maintain Dockerized microservices and Helm chart deployments.
Implement Infrastructure as Code (IaC) using Terraform for modular, reusable infrastructure.
CI/CD Pipeline & Deployment Automation
Build GitHub Actions workflows for automated testing, building, and deployment.
Implement blue-green and canary deployments for zero-downtime releases.
Create automated rollback mechanisms and optimize build & deployment pipelines.
Monitoring, Observability & Reliability
Implement system monitoring with Prometheus, Grafana, and ELK or Datadog.
Define alerting thresholds and dashboards for uptime and performance metrics.
Lead incident response, root cause analysis, and post-mortem documentation.
Performance, Scalability & Cost Optimization
Architect solutions for real-time WebSocket scalability across thousands of users.
Implement auto-scaling policies (Kubernetes HPA, cluster autoscaler).
Optimize infrastructure to reduce cloud costs by 20–30%.
Security & Compliance
Implement secrets management with Azure Key Vault or HashiCorp Vault.
Enforce container security, network segmentation, and access control.
Support SOC 2, HIPAA, and CMMC L2 compliance initiatives.
Collaboration & Mentorship
Work closely with the Solutions Architect and developers to ensure release reliability.
Mentor junior team members and document DevOps best practices.
Participate in on-call rotations and improve operational excellence.
Skills & Qualifications
Must-Have Skills
Azure Kubernetes Service (AKS) cluster management
Docker, Helm, Terraform (Infrastructure as Code)
CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD)
Prometheus / Grafana / ELK Stack
Azure Networking (NSG, VNet, Load Balancer, App Gateway)
Secrets Management (Azure Key Vault / HashiCorp Vault)
Knowledge of SQL and WebSocket scalability
Good-to-Have Skills
ArgoCD / Flux (GitOps)
KEDA (Event-Driven Autoscaling)
OpenTelemetry (Distributed Tracing)
Istio / Linkerd (Service Mesh)
Python or Go scripting
FinOps and Azure Cost Management
Education
UG: B.Tech / B.E. in Computer Science / Information Technology
PG: M.Tech / M.E. / any postgraduate degree (preferred)
Why Join Us
Impact: You'll have significant influence on the architecture and scalability of our products and of the solutions we deliver to a diverse set of clients.
Growth: Opportunities to lead teams as our organization expands.
Learning: Work with the latest in Azure, Kubernetes, and AI infrastructure.
Culture: A flat hierarchy and a collaborative, outcome-driven team.
Senior Engineer • Bhubaneswar, Odisha, India