Essential Duties
Include, but are not limited to, the following:
- Own model productionization, from tracked experiments to governed releases, ensuring resilient services with clear SLOs, runbooks, and fast, safe rollbacks.
- Build automation-first delivery: reproducible builds, layered tests, and environment promotion via GitLab CI and Terraform-based IaC.
- Engineer scalable serving: batch and real-time inference on EKS/ECS/Lambda and Databricks Model Serving with probes, autoscaling, and canary/blue-green deployments.
- Instrument end-to-end observability (data, model, system); detect drift/regressions; lead incidents and post-mortems that drive durable fixes.
- Partner across teams to translate requirements into designs, ADRs, and change plans; balance security, privacy, cost, and performance tradeoffs.
- Continuously reduce toil through automation, optimize model/GPU/LLM cost, and evolve templates/playbooks for repeatable delivery.
Minimum Qualifications:
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field and 3+ years of relevant experience as outlined in the essential duties; or High School Diploma/General Education Degree and 6+ years of relevant experience as outlined in the essential duties in lieu of a Bachelor’s degree.
- 3+ years operating ML systems in production (MLOps).
- Experience with Python for ML engineering (packaging, typing, testing, performance).
- Experience developing GitLab CI for ML/GenAI (multi-stage pipelines, artifacts, evaluation/security gates) and Terraform for ML/GenAI (reusable modules, drift detection); secure packaging & containerization.
- Experience deploying and operating compute for ML (EKS/ECS/Lambda), and secure data access patterns (S3/VPC/IAM/KMS, private endpoints).
- Experience implementing MLflow tracking, model registry & governed promotion, packaging & deployment to multi-target runtimes.
- Experience operating real-time + batch/streaming inference workloads, ML observability, layered testing (unit/integration), workflow orchestration, and cost optimization.
- Experience designing and implementing IAM least-privilege, secrets/key management for CI/CD pipelines; privacy and compliance awareness.
Preferred Qualifications:
- Advanced GitLab CI (dynamic child pipelines, components, cross-project triggers, security scans, compliance gates).
- Advanced Terraform (policy-as-code, gated plan/apply, environment promotion).
- Advanced real-time serving (multi-tenant routing, dynamic model loading) and SLO-driven rollback/automation.
- Databricks governance (Unity Catalog, lineage) and feature platform approval/reuse workflows.