Job Description :
ABOUT STATUSNEO & THE INITIATIVE
StatusNeo is building a next-generation Agentic AI Factory - a purpose-built capability to design, develop, and deploy autonomous AI agents across multiple business domains. Operating through a cross-functional squad model backed by a central platform team, we deliver intelligent automation across HR, procurement, finance, supply chain, and domain-specific operations. We are looking for high-calibre professionals who want to build systems that think, adapt, and act.
ROLE OVERVIEW :
As a Data Engineer in the Agentic AI Factory, you are the backbone that makes intelligent agents possible. You will architect and operate the data pipelines, feature stores, and integration layers that feed domain AI agents with clean, governed, and timely data - owning end-to-end pipelines from Oracle DB source systems through Azure Databricks to downstream ML and analytics consumers.
KEY RESPONSIBILITIES :
- Architect and manage the Azure Databricks lakehouse (Bronze, Silver, and Gold Delta Lake layers) with partitioning, Z-ordering, and ACID guarantees (see the Delta Lake sketch after this list)
- Build real-time and batch data pipelines using Azure Event Hubs, Apache Kafka, and Azure Data Factory (streaming sketch after this list)
- Create and maintain feature engineering pipelines feeding ML feature stores used by Data Scientists and AI agents
- Design and enforce data contracts, schema registries, and data quality frameworks using Great Expectations and Databricks DQ (quality-gate sketch after this list)
- Expose clean datasets and features as APIs built with FastAPI or plain Python, consumed by MERN-stack frontend and backend services (FastAPI sketch after this list)
- Implement data governance, lineage tracking, and access control using Azure Purview and Unity Catalog
- Collaborate with Software Engineers to integrate data services into the agent orchestration and memory layers
- Build monitoring dashboards and alerting for pipeline health using Azure Monitor and Grafana
- Define and implement cost management strategies for Azure compute and storage
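To make the lakehouse responsibility concrete, here is a minimal PySpark sketch of promoting Bronze data to a partitioned, Z-ordered Silver Delta table. It assumes a Databricks runtime where `spark` is already in scope; all paths, table names, and columns are illustrative, not taken from the role.

```python
from pyspark.sql import functions as F

# Bronze: raw ingested records, kept unmodified for replayability
bronze = spark.read.format("delta").load("/mnt/lake/bronze/raw_orders")

# Silver: deduplicated, typed, conformed records
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
)

(
    silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")   # partition on the common filter column
    .save("/mnt/lake/silver/orders")
)

# Z-order the columns most often filtered on; OPTIMIZE ... ZORDER BY
# is Databricks SQL, issued here via spark.sql
spark.sql("OPTIMIZE delta.`/mnt/lake/silver/orders` ZORDER BY (customer_id)")
```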
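For the streaming responsibility, a minimal Structured Streaming sketch that reads Azure Event Hubs through its Kafka-compatible endpoint and lands records in a Bronze Delta table. The namespace, topic, connection string, and paths are placeholders.

```python
from pyspark.sql import functions as F

# Event Hubs speaks the Kafka protocol on port 9093; auth is SASL/PLAIN
# with the connection string as the password ($ConnectionString is literal)
jaas = (
    'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
    'required username="$ConnectionString" password="<connection-string>";'
)

raw_stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("subscribe", "orders")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    .option("startingOffsets", "latest")
    .load()
)

# Land the raw payload in Bronze; checkpointing makes the stream restartable
(
    raw_stream
    .select(F.col("value").cast("string").alias("payload"),
            F.col("timestamp").alias("ingest_ts"))
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/orders_bronze")
    .outputMode("append")
    .start("/mnt/lake/bronze/orders")
)
```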
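For data contracts and quality, an illustrative quality gate written in plain PySpark; in practice Great Expectations or Databricks DQ would declare the expectations, but the failure semantics are the same: the pipeline stops when a declared contract is violated. Column names are assumptions.

```python
from pyspark.sql import DataFrame, functions as F

def check_contract(df: DataFrame) -> None:
    total = df.count()
    # Expectation 1: primary key must be non-null and unique
    nulls = df.filter(F.col("order_id").isNull()).count()
    dupes = total - df.dropDuplicates(["order_id"]).count()
    # Expectation 2: amounts must be non-negative
    bad_amounts = df.filter(F.col("amount") < 0).count()
    failures = {"null_keys": nulls, "duplicate_keys": dupes,
                "negative_amounts": bad_amounts}
    violated = {k: v for k, v in failures.items() if v > 0}
    if violated:
        # Fail loudly so orchestration can halt downstream consumers
        raise ValueError(f"Data contract violated: {violated}")

check_contract(spark.read.format("delta").load("/mnt/lake/silver/orders"))
```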
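For the API responsibility, a minimal FastAPI sketch serving a feature vector by entity id. The in-memory dict stands in for a real feature store or SQL warehouse lookup, and all names are hypothetical.

```python
from typing import Dict

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="feature-service")

# Placeholder in-memory store keyed by entity id
_FEATURES = {"cust-001": {"avg_order_value": 182.4, "orders_90d": 7.0}}

class FeatureVector(BaseModel):
    entity_id: str
    features: Dict[str, float]

@app.get("/features/{entity_id}", response_model=FeatureVector)
def get_features(entity_id: str) -> FeatureVector:
    row = _FEATURES.get(entity_id)
    if row is None:
        raise HTTPException(status_code=404, detail="unknown entity")
    return FeatureVector(entity_id=entity_id, features=row)
```

Run locally with `uvicorn <module>:app --reload`; MERN-stack services then consume the endpoint over plain HTTP.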
MUST HAVE REQUIREMENTS :
- 6-8 years of Data Engineering experience with demonstrable production pipeline ownership
- Expert-level Python and PySpark for large-scale data transformation
- Deep experience with Azure Databricks, Delta Lake, and Unity Catalog
- Hands-on Oracle DB integration experience: JDBC drivers, REST APIs, Oracle GoldenGate or equivalent CDC tooling (JDBC sketch after this list)
- Proficiency with Azure Data Factory, Azure Event Hubs, and Azure Blob / ADLS Gen2
- Strong SQL skills including complex window functions, CTEs, and performance optimisation (window-function sketch after this list)
- Experience building and publishing data APIs (FastAPI or similar) for downstream consumption
- Knowledge of data modelling patterns: medallion architecture, Kimball, Data Vault
- Infrastructure-as-Code familiarity: Terraform or Bicep for Azure resource provisioning
- Experience with CI/CD for data pipelines via Azure DevOps or GitHub
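A minimal sketch of the Oracle integration requirement: a partitioned JDBC read into Spark, assuming the Oracle JDBC driver (ojdbc) is on the cluster classpath. Host, credentials, table names, and bounds are placeholders; GoldenGate or another CDC tool would replace bulk pulls like this for incremental loads.

```python
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "<secret>")  # in practice, fetch from a secret scope
    .option("driver", "oracle.jdbc.OracleDriver")
    # Partitioned read so executors pull ranges of ORDER_ID in parallel
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "8")
    .load()
)

orders.write.format("delta").mode("overwrite").save("/mnt/lake/bronze/orders")
```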
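And an illustrative window-function query of the kind the SQL requirement implies, run through spark.sql: the latest order per customer via a CTE and ROW_NUMBER(). Table and column names are assumptions.

```python
latest = spark.sql("""
    WITH ranked AS (
        SELECT customer_id,
               order_id,
               order_ts,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY order_ts DESC
               ) AS rn
        FROM silver.orders
    )
    SELECT customer_id, order_id, order_ts
    FROM ranked
    WHERE rn = 1
""")
latest.show()
```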
GOOD TO HAVE :
- Experience with dbt for data transformation and documentation
- Exposure to vector data stores and embedding pipelines for AI applications (embedding sketch at the end of this section)
- Knowledge of stream processing with Spark Structured Streaming or Flink
- Familiarity with MERN stack to aid integration with application-layer APIs
- Azure certifications (DP-203, DP-300)
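Finally, a heavily simplified sketch of an embedding pipeline, assuming the sentence-transformers package is installed; the model name and the brute-force numpy search stand in for a managed vector store.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed model; any sentence-embedding model with the same API works
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["invoice dispute process", "supplier onboarding checklist"]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["how do I onboard a new supplier?"],
                         normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors
scores = doc_vecs @ query_vec
print(docs[int(np.argmax(scores))])
```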