Description :
Position Overview :
We're hiring an AI Lead Engineer to architect, ship, and scale production-grade Computer Vision with a focused GenAI charter.
You'll lead 6-10 ML / CV engineers to deliver high-accuracy, low-latency video analytics (detection, tracking, segmentation, recognition, re-identification), tackling multi-camera tracking, 24/7 streaming at scale, long-tail drift, and edge optimization on GPUs / Jetson, while partnering with Product / Platform to meet FPS, latency, accuracy, and cost SLAs.
The day-to-day centers on core CV model design, training / eval, deployment, and streaming / edge performance, complemented by GenAI that amplifies the stack : integrating VLM / LLM capabilities for natural-language video search, incident summarization, operator Q&A, and copilot workflows; standing up RAG over video / sensor / metadata embeddings with robust prompts, tooling, evals, and guardrails; and driving data ops with synthetic data, auto-labeling, and active-learning triage, with privacy, safety, and cost controls.
What You'll Own :
End-to-end delivery of CV products : problem framing / data / labeling / model design /
optimization / deployment / monitoring / iteration.
Technical roadmap & architecture for video analytics pipelines (ingest / decode / infer / track / post-process / store / serve).
Team leadership : mentoring, hiring input, OKRs, code / research standards, and performance coaching.
Key Responsibilities :
Leadership & Execution :
- Translate business goals (e.g., reduce shrinkage, increase throughput, improve safety) into
measurable CV objectives and SLAs / SLOs (e.g., mAP / IDF1, per-frame latency, dropped-frame
rate, cost / stream).
- Lead design reviews; establish MLOps and coding standards; enforce experiment tracking, reproducibility, and dataset / version governance.
- Drive capacity planning, GPU / Jetson utilization, batching / windowing strategy, autoscaling, and cost [...]
Vision R&D (Detection, Tracking, Segmentation, Recognition) :
- Deliver production models for object [...] (e.g., YOLOv8 / v9, Mask R-CNN / Mask2Former, EfficientNet / ConvNeXt, [...])
- Build person / vehicle / product re-ID, face / attribute recognition (e.g., ArcFace / CosFace), OCR (e.g., PP-OCR), and keypoint / action recognition (e.g., MMPose, SlowFast / X3D).
- Tackle domain adaptation, class imbalance, and occlusions; design augmentations and semi-supervised / active learning loops to harvest hard [...]
Analytics & Edge Inference :
- Architect real-time pipelines using NVIDIA DeepStream / GStreamer / OpenCV; optimize decode (NVDEC), pre / post-processing, and trackers for 30-60 FPS at 1080p.
- Optimize with TensorRT / ONNX Runtime / Torch-TensorRT, INT8 calibration, pruning / distillation; leverage Jetson Orin / Xavier / Nano and DLA where applicable.
- Design multi-camera fusion, homography / camera calibration, and cross-camera ID consistency for retail, traffic, manufacturing, and security use cases.
- Implement privacy-by-design features (e.g., face / license-plate blur, PII [...])
AI & LLMs :
- Architect robust RAG : retrieval pipelines with Pinecone / ChromaDB / Milvus; index sharding / compaction; freshness policies; hybrid search.
- Design agents with LangChain / LangGraph; implement tool-use, safety filters, and guardrails; add evaluation loops (e.g., [...])
Serving & MLOps :
- Ship services via FastAPI / Flask; containerize with Docker; orchestrate on Kubernetes (KServe) or AWS SageMaker / Vertex AI.
- Build high-throughput inference with Triton Inference Server (dynamic batching, concurrent models, model ensembles).
- Streaming & storage : RTSP / RTMP ingest, Kafka / Kinesis, object storage + time-series DB; index / frame-level metadata for search and analytics.
- CI / CD with MLflow / DVC (artifacts, model registry), unit / integration tests, and rollout strategies (canary, [...])
Drift & Governance :
- Production monitoring with Prometheus / Grafana; per-stage latency, FPS, GPU memory / SM occupancy, dropped frames, and backpressure.
- Model observability : data / feature drift, concept drift on detections / tracks, re-ID distribution shifts, outlier / novelty detection, safety metrics.
- Human-in-the-loop review tools (CVAT / Label Studio) and auto-retraining triggers; maintain model cards, evaluation reports, versioned prompts / configs, and auditability.
- Ensure compliance and privacy / PII handling; ONVIF / edge security best [...]
[...] & People Leadership :
- Partner with Product / SRE / DevOps on roadmaps, SLAs, incident response runbooks, and cost / perf tradeoffs.
- Lead and grow a 6-10 person CV team; foster a culture of high-quality experiments, rigorous reviews, and measurable impact.
- Communicate progress / risks to executives with clear, metric-driven updates and customer-facing [...]
Qualifications :
- Candidate shall have a B.E / B.Tech / MCA degree in any discipline, preferably computer science.
- 5-8 years in ML / Computer Vision, with 2+ years leading 6-10 engineers delivering production video analytics.
- Proven track record shipping systems with business impact (accuracy, latency, cost).
- Strong in at least one per category (and comfortable across most) :
- CV Frameworks : PyTorch (preferred) or TensorFlow; OpenCV, [...]
- NVIDIA : [...] Triton Inference Server, FastAPI / Flask.
- Optimization : TensorRT, ONNX Runtime, Torch-TensorRT; quantization / pruning / distillation; INT8 calibration.
- Tracking / Re-ID / OCR : ByteTrack / OC-SORT / DeepSORT; ArcFace / CosFace; PP-OCR / Tesseract.
- Agents & Retrieval : LangChain or LangGraph; [...]
- MLOps : Docker, Kubernetes (KServe / SageMaker / Vertex AI), MLflow / DVC.
- Cloud : AWS (SageMaker, EC2 / EKS) or GCP (Vertex AI) or Azure ML (AKS).
- Programming : Python (expert); C++ or Go for perf-critical components; CUDA fundamentals a plus.
- Streaming / IO : RTSP / RTMP, Kafka / Kinesis / RabbitMQ; ONVIF familiarity.
- Strong system design (multi-stream pipelines, GPU scheduling, distributed tracking / indexing) and excellent [...]
Qualifications :
- Operating vector or time-series stores for video metadata at 10M-50M+ rows; search over tracks, IDs, and events.
- Experience with multi-camera tracking, calibration, and zone-based analytics.
- Jetson fleet management : [...]
- Ray / Ray Serve or Kubeflow; feature stores (Feast); complex event processing.
- Domain experience in one or more : retail analytics, traffic / ADAS, manufacturing QA, security / safety, sports analytics.
- Open-source contributions, patents, or publications in CV / video analytics.
- (Nice to have) Multimodal exposure (CLIP / SigLIP, SAM / Mask2Former, or VLMs for captioning / search), used sparingly to support CV workflows.
(ref : hirist.tech)