Job Title: DevOps Engineer
Location: Ahmedabad
Department: Engineering & Infrastructure
Reports To: CTO
________________________________________
About Omnidya Tech LLP
Omnidya is building India’s first advanced AI-powered dashcam ecosystem for fleet management, safety analytics, and smart transportation. Our platform fuses edge AI processing (ADAS, DMS, ANPR, telematics) with secure cloud connectivity (AWS IoT, S3, MQTT, and real-time streaming).
We are seeking a DevOps Engineer to scale our infrastructure, automate build and deployment pipelines, and manage GPU-based AI compute clusters both on-premises and in the cloud.
________________________________________
Role Overview
As a DevOps Engineer, you will play a crucial role in automating deployments, managing distributed edge-cloud systems, and maintaining our GPU training and inference environments. You’ll work closely with the AI, firmware, and backend teams to ensure smooth CI/CD workflows, optimal GPU utilization, and high system reliability.
________________________________________
Key Responsibilities
🧩 CI/CD & Automation
- Design, build, and maintain CI/CD pipelines using GitLab CI, Jenkins, or GitHub Actions for backend, AI, and firmware builds.
- Automate testing and deployment for Yocto-based embedded systems.
- Create Docker containers and deployment scripts for AI inference and cloud microservices.
☁️ Cloud & Infrastructure Management
- Manage and scale AWS infrastructure (IoT Core, EC2, ECR, CloudWatch, Lambda, Route 53).
- Set up and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation.
- Implement robust monitoring, alerting, and log aggregation using Prometheus, Grafana, ELK, or CloudWatch.
⚙️ GPU Rack & Compute Cluster Management
- Manage on-premises GPU servers and AI training racks (Ubuntu-based, multi-GPU systems).
- Configure, optimize, and monitor GPU utilization for PyTorch/TensorFlow workloads.
- Handle NVIDIA driver and CUDA updates, containerized training environments, and model deployment pipelines.
- Automate job scheduling for GPU workloads using Slurm, Docker Swarm, or Kubernetes.
- Monitor performance metrics (GPU load, memory, thermals, power usage) to ensure stable training and inference operations.
📡 Device Integration & Fleet Management
- Streamline OTA (over-the-air) update pipelines for connected edge devices.
- Manage provisioning, authentication, and status monitoring of thousands of IoT devices.
- Ensure reliable MQTT messaging, REST API communication, and video data sync between dashcams and the cloud.
🔒 Security & Compliance
- Implement AWS IAM policies, TLS/SSL certificates, and secure OTA mechanisms.
- Collaborate on device- and cloud-level security hardening for regulatory compliance (BIS, ICAT).
📘 Documentation & Collaboration
- Document automation flows, deployment topologies, and infrastructure standards.
- Collaborate with AI, embedded, and backend teams to align deployment processes across systems.
________________________________________
Required Skills & Experience
🎓 Experience
- 3–7 years of experience in DevOps, Cloud Infrastructure, or Site Reliability Engineering.
🛠️ Technical Skills
- Linux system administration (Ubuntu, Debian, Yocto-based builds)
- Containerization: Docker, Podman, Kubernetes (preferably K3s/MicroK8s)
- CI/CD Tools: GitLab CI, Jenkins, GitHub Actions
- Cloud Platforms: AWS (EC2, IoT Core, S3, Lambda, CloudWatch)
- IaC: Terraform, CloudFormation
- Monitoring: Prometheus, Grafana, ELK Stack
- Networking: VPN, DNS, load balancing, NAT, SSL certificates
- GPU Systems:
  - Hands-on experience with NVIDIA GPU drivers, CUDA, cuDNN, and TensorRT
  - Experience with GPU workload management, thermal/power profiling, and optimization
  - Familiarity with multi-GPU training, inference scaling, and model deployment
💡 Bonus Skills
- Experience with embedded Linux (Yocto, NXP platforms)
- Understanding of RTMP/FLV streaming pipelines or GStreamer
- Familiarity with Python microservices (FastAPI/Flask)
- Knowledge of AI/ML model lifecycle management (training → quantization → edge inference)
________________________________________
Soft Skills
- Strong analytical and problem-solving mindset.
- Excellent communication and cross-functional collaboration skills.
- Passion for automation, reliability, and scalability.
- Ability to work independently in a fast-paced startup environment.
________________________________________
What We Offer
- Competitive salary and performance-based bonuses.
- Opportunity to work on cutting-edge edge-AI and GPU infrastructure projects.
- Exposure to AWS, IoT, AI training clusters, and fleet-scale deployment systems.
- Hybrid work setup and rapid growth opportunities in a high-impact product team.