TensorStax is building the next generation of autonomous agents for data engineering, backed by a $5M seed round.
The Role
As a Data Engineer, you will design, build, and optimize production-grade pipelines that our agents learn from and eventually operate. You will own the modeling layer in dbt, the orchestration layer in Airflow, and the heavy-lift workloads in Spark.
What You’ll Do
- Model complex, interdependent schemas in dbt across hundreds of tables
- Build advanced, multi-branch Airflow DAGs with sophisticated dependency and failure handling
- Author high-performance Spark jobs (PySpark or Scala) for large-scale batch and incremental workloads
- Codify lineage, testing, and metadata so agents can reason about pipeline state
- Profile and tune query performance across warehouses and lakehouse engines
- Partner with the agent research team to expose realistic failure modes, data drifts, and SLA violations for RL training
- Containerize and deploy everything on Kubernetes-backed infra
About You
- 4+ years in data engineering or analytics engineering, shipping pipelines at scale
- Deep experience with dbt, including macros, custom tests, and refactoring legacy models
- Track record building and debugging complex Airflow DAGs (Sensors, TaskGroups, SubDAG patterns)
- Spark power user capable of distributed joins, window functions, and memory tuning
- Solid Python, Git, and CI discipline
- Bonus: experience with Iceberg, Delta, or DataFusion; prior RL or agent work
Why TensorStax
- Write the pipelines our autonomous agents learn to operate
- Work in a tight, senior team that values clean code and measurable impact
- Competitive salary, meaningful equity, and hardware budget
- Remote-first with optional SF office