We are seeking a highly technical Senior Cloud FinOps Engineer who specializes in designing, developing, and deploying AI-powered agents and automation systems that proactively monitor, analyze, and optimize multi-cloud spend (AWS, Azure, GCP) in a large-scale research and academic healthcare environment.
Responsibilities:
- Research, design, and develop AI / ML-driven agents and automation workflows that continuously ingest cloud billing, usage, and tagging data from sources such as the AWS Cost Explorer API, AWS Cost and Usage Reports (CUR), Azure Cost Management + Billing, and GCP Cloud Billing exports.
- Build predictive models to forecast spend, identify upcoming opportunities for Savings Plans and reservations, and recommend optimal purchase strategies (term length, payment option, instance family / region / zone / SKU, convertible vs. standard) while accounting for performance SLAs and the workload variability typical of research computing.
- Implement real-time anomaly and spike detection with intelligent alerting (Slack, email, ServiceNow, etc.) that includes root-cause analysis and suggested corrective actions.
- Develop automated tagging governance engines that detect missing or incorrect tags, suggest or auto-apply corrections (via AWS Lambda, Azure Functions, or Azure Automation), and enforce research grant and departmental chargeback policies.
- Create “recommendation-as-code” pipelines that generate executable Infrastructure-as-Code (Terraform / CloudFormation / Bicep) or direct API calls to purchase / commit to the optimal savings instruments.
- Design and maintain a centralized FinOps AI dashboard (Power BI + custom web frontend if needed) that surfaces agent-generated insights, confidence scores, projected savings, and one-click approval workflows.
- Integrate the AI platform with existing tooling (AWS Cost Anomaly Detection, Azure Advisor, third-party FinOps platforms) and extend them where native capabilities fall short.
- Collaborate on containerized / microservice architecture (Kubernetes / EKS / AKS / GKE) for the agent platform and ensure all components meet healthcare security and compliance standards.
- Continuously measure savings attribution, model accuracy, and automation adoption; iterate on models using retraining pipelines and feedback loops.
- Document architectures, create runbooks, and mentor FinOps analysts and cloud engineers on using the new AI capabilities.
Requirements:
- Education: Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related quantitative field; an advanced degree in a healthcare or research-related discipline is a plus.
- 5+ years of hands-on cloud engineering and architecture experience with at least two major providers (AWS and Azure required; GCP a plus).
- 3+ years building production-grade data pipelines, ML models, or intelligent automation in a cloud cost-management or FinOps context.
- Proven track record of implementing Savings Plans, Reserved Instances, and committed-use discount strategies at scale (>$10M annual cloud spend preferred).
- Strong software development skills in Python (mandatory) and at least one additional language (Go, TypeScript / Node.js, Java, etc.).
- Hands-on experience with ML frameworks (scikit-learn, TensorFlow, PyTorch, XGBoost / LightGBM) and MLOps tools (MLflow, SageMaker, Azure ML, Vertex AI).
- Expertise in cloud billing APIs, Cost and Usage Reports (CUR), Cost Explorer, Azure Consumption APIs, and building enriched data lakes (S3 + Athena / Glue, Azure Data Lake + Synapse, BigQuery).
- Proficiency in Infrastructure as Code (Terraform primary; CloudFormation / Bicep acceptable) and CI/CD pipelines (GitHub Actions, GitLab CI, Azure DevOps).
- Experience with event-driven architectures (EventBridge, Azure Event Grid, Pub/Sub) and serverless compute for real-time processing.
- Solid understanding of tagging strategies, cost allocation, and showback / chargeback models in decentralized research and academic environments.
Nice to have:
- Previous work in healthcare, academic medical centers, or grant-funded research environments.
- FinOps Certified Practitioner or Platform Engineer certification.
- Contributions to open-source FinOps or cloud-cost tools (e.g., Kubecost, Cloud Custodian, Infracost, custom agents).
- Experience with generative AI / LLMs for explaining recommendations to non-technical stakeholders.
- Familiarity with Apache Airflow, dbt, Databricks, or similar tools for orchestration and transformation.
- Knowledge of HIPAA / HITECH-compliant data handling and encryption standards in analytics workloads.