Position : DevOps Manager
Location : India (Remote)
Employment Type : Full-Time
Schedule : Monday to Friday, Day Shift
Company Description
Scry AI is a research-led enterprise AI company that builds intelligent platforms to drive efficiency, insight, and compliance. Our platforms Collatio®, Auriga®, and Concentio® streamline complex workflows by automating data extraction, validation, and reconciliation, and by delivering real-time intelligence.
We are seeking a DevOps Manager to lead our infrastructure, CI / CD, and reliability practices across cloud and on-prem deployments. You will own uptime, performance, security, and cost efficiency for AI / ML workloads powering Collatio®, Auriga®, and Concentio®.
Role Overview
As DevOps Manager, you will lead a small team of DevOps / SRE engineers to design, automate, and operate secure, compliant, and highly available platforms across AWS / Azure / GCP and customer on-prem environments. You will standardize IaC, improve CI / CD velocity, build robust observability, and enable GPU-accelerated AI inference at scale for enterprise clients.
Key Responsibilities
Platform Reliability & Operations
- Own SLOs / SLIs, availability, latency, and capacity planning across services.
- Lead incident response, root-cause analysis, postmortems, and on-call processes.
- Implement backup, disaster recovery, and business continuity for multi-region and on-prem deployments.
Cloud, On-Prem & Edge Deployments
- Architect Kubernetes platforms (managed and self-hosted), including RBAC, network policies, and secrets management.
- Standardize infrastructure with Terraform, Helm, and GitOps (Argo CD) for repeatable customer deployments.
- Support Concentio® edge / IoT rollouts with secure remote updates and telemetry pipelines.
AI / ML & Data Infrastructure
- Enable GPU scheduling and drivers (CUDA, NVIDIA), inference runtimes (Triton), and model packaging.
- Build MLOps foundations (MLflow, feature stores) and artifact / version governance.
- Operate data services (Kafka, PostgreSQL, Redis, MinIO / S3, Elasticsearch / OpenSearch) for high-throughput pipelines.
CI / CD & Developer Experience
- Own CI / CD with GitHub Actions / GitLab CI / Jenkins; establish trunk-based development, automated testing, and canary / blue-green releases.
- Maintain internal developer platforms, templates, and golden paths to improve delivery speed and quality.
Security, Compliance & Observability
- Implement least-privilege access, SSO (Okta / AAD), Vault-based secrets, image scanning (Trivy), and policy as code.
- Ensure SOC 2, ISO 27001, HIPAA / GDPR alignment with audit trails and immutable logs.
- Build end-to-end observability using Prometheus, Grafana, Loki / EFK, and OpenTelemetry.
FinOps & Stakeholder Management
- Track cloud spend, rightsize resources, and negotiate quotas for GPU / compute.
- Partner with Product, Data Science, and Customer Success to plan capacity for new features and enterprise go-lives.
Required Qualifications & Skills
- Strong Kubernetes expertise (production operations, networking, security, Helm, GitOps).
- Proven IaC experience with Terraform and configuration management (Ansible).
- CI / CD at scale with GitHub Actions / GitLab CI / Jenkins; artifact registries and SBOMs.
- Observability : Prometheus, Grafana, ELK / EFK or Loki, alerting and runbooks.
- Cloud proficiency in at least one major provider (AWS / Azure / GCP) and Linux fundamentals.
- Security fundamentals : network segmentation, TLS, secrets management, container hardening.
- Experience running data / streaming systems (Kafka, Redis, PostgreSQL) in production.
- Excellent communication, incident leadership, and stakeholder management.
Nice-to-Have
- GPU orchestration, Triton Inference Server, Hugging Face model serving.
- Service mesh (Istio / Linkerd), API gateways, and zero-trust patterns.
- MLOps tooling (MLflow, Feast), Airflow, dbt.
- Compliance implementations for regulated industries (BFSI, healthcare).
- Certifications : CKA / CKAD, AWS / Azure / GCP Architect, Security+.
Our Ideal Candidate
- Drives reliability with automation, not toil.
- Balances speed and safety with measurable delivery improvements.
- Thrives in customer-facing, hybrid cloud, and on-prem environments.
- Coaches teams with clear standards, runbooks, and continuous improvement.
Tip for Candidates
If you want to build secure, high-performance platforms for real-world AI at enterprise scale, follow our page for more relevant job openings.