We’re a stealth robotics startup in Palo Alto hiring an engineer to define and ship a canonical Tactile Tensor, plus the reference SDK and conformance suite that make tactile data reproducible, interoperable, and directly usable for robotics perception and foundation-model training.
Critical requirement: deterministic, byte-stable serialization + strict versioning, plus tokenization-ready interfaces (tensors → stable token streams) for Transformer-style robotics pipelines, without heavy dependencies.
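For a sense of what "byte-stable" means in practice, here is a minimal sketch of canonical encoding for a single tactile frame; the frame layout, field names, and the to_canonical_bytes function are assumptions for illustration, not an existing API:

```python
# Hypothetical sketch: canonical, byte-stable encoding of one tactile frame.
# Field names and layout are illustrative, not part of any existing spec.
import json
import struct

import numpy as np


def to_canonical_bytes(values: np.ndarray, metadata: dict) -> bytes:
    """Encode a frame so identical inputs yield identical bytes on any platform."""
    # Force a fixed dtype and byte order so the payload is platform-independent.
    payload = np.ascontiguousarray(values, dtype="<f4").tobytes()
    # Canonical JSON header: sorted keys, no whitespace variation, explicit UTF-8.
    header = json.dumps(
        {"schema_version": "0.1.0", "shape": list(values.shape), **metadata},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    ).encode("utf-8")
    # Length-prefix both sections so decoding is unambiguous.
    return struct.pack("<I", len(header)) + header + struct.pack("<I", len(payload)) + payload
```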
What you’ll do
- Define the Tactile Tensor: units, coordinate frames, timestamps, shapes, uncertainty, required metadata, and forward/backward compatibility rules.
- Build a lightweight reference SDK (Python and/or C++) that validates, serializes/deserializes, and produces identical outputs across platforms.
- Specify training-grade data contracts: deterministic windowing/patching, normalization/quantization, and token schemas that are stable across sensors and logging setups (see the tokenization sketch after this list).
- Ship a public-facing spec + examples + CI conformance tests so external robotics labs/OEMs can implement against it with confidence.
- Architect the tensor representation to ensure physical invariances (e.g., coordinate-frame independence, scale-invariant contact patches) so that policies trained on one robot's geometry generalize to another.
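To illustrate the training-grade data contracts mentioned above, a minimal sketch of deterministic windowing plus uniform quantization into token IDs; the window length, clip range, vocabulary size, and the tokenize function are illustrative assumptions, not a proposed spec:

```python
# Hypothetical sketch: deterministic windowing + uniform quantization into token IDs.
# Window size, clip range, and vocabulary size are illustrative choices only.
import numpy as np

WINDOW = 32               # samples per token window (example value)
VOCAB_SIZE = 1024         # number of quantization bins (example value)
CLIP_RANGE = (-1.0, 1.0)  # assumed normalized sensor range


def tokenize(signal: np.ndarray) -> np.ndarray:
    """Map a 1-D normalized tactile signal to a stable stream of integer tokens."""
    # Drop the trailing partial window so identical inputs always yield identical tokens.
    n = (len(signal) // WINDOW) * WINDOW
    windows = signal[:n].reshape(-1, WINDOW)
    # Summarize each window deterministically (here: mean), then quantize to a bin index.
    means = windows.mean(axis=1)
    lo, hi = CLIP_RANGE
    scaled = (np.clip(means, lo, hi) - lo) / (hi - lo)
    return np.minimum((scaled * VOCAB_SIZE).astype(np.int64), VOCAB_SIZE - 1)
```

Fixing the window length, bin edges, and truncation rule up front is what keeps the token stream identical across sensors and logging setups.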
Requirements
- PhD in a relevant field (Robotics, Computer Science, Applied Mathematics, Electrical Engineering, or similar), or 3+ years of equivalent industry experience.
- Excellent software engineering fundamentals (API design, packaging, CI, testing, docs).
- Python and/or C++ proficiency (both ideal).
- Proven ability to design deterministic serialization and conformance tests (identical inputs → identical bytes across platforms).
- Experience with high-rate numeric data formats (Arrow/Parquet/Zarr/Protobuf/FlatBuffers or similar).
- Ability to design metadata + lineage for robotics datasets (device ID, calibration artifact ID, robot/config versions, provenance); see the metadata sketch after this list.
- Familiarity with ML data pipelines; ability to define tokenization/embedding conventions for transformer training without bundling full ML stacks.
- Experience designing data schemas that explicitly handle and flag physical sensor artifacts (saturation, dropout, thermal drift, and variable sampling rates) without crashing downstream model inference.
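As an illustration of the metadata, lineage, and artifact-flag requirements above, a small sketch of how they might look as plain schema objects; all class and field names here are hypothetical:

```python
# Hypothetical sketch: lineage metadata and per-frame artifact flags as plain schema objects.
# All class and field names are illustrative; the real spec would define the required set.
from dataclasses import dataclass, field
from enum import Flag, auto


class ArtifactFlag(Flag):
    """Per-frame sensor artifacts recorded alongside the data, never silently dropped."""
    NONE = 0
    SATURATION = auto()
    DROPOUT = auto()
    THERMAL_DRIFT = auto()
    VARIABLE_RATE = auto()


@dataclass(frozen=True)
class Lineage:
    """Provenance needed to reproduce or audit a recording."""
    device_id: str
    calibration_artifact_id: str
    robot_config_version: str
    sdk_version: str


@dataclass(frozen=True)
class FrameMeta:
    lineage: Lineage
    timestamp_ns: int
    flags: ArtifactFlag = ArtifactFlag.NONE
    notes: dict = field(default_factory=dict)
```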
Preferred
- Experience authoring standards/specs, file formats, or widely used SDKs.
- HPC/embedded/performance background; strong “minimal dependency” philosophy.
- Experience with data integrity/attestation (hashing/signing, provenance chains) for tamper-evident robotics logs; see the hash-chain sketch after this list.
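Purely for illustration, a minimal hash-chain sketch of the tamper-evident logging idea; the chaining rule and the chain_records function are assumptions, not a defined format:

```python
# Hypothetical sketch: tamper-evident hash chain over serialized log records.
# The record format and chaining rule are illustrative, not a proposed standard.
import hashlib


def chain_records(records: list[bytes]) -> list[str]:
    """Return one hex digest per record, each committing to everything before it."""
    digests = []
    prev = b"\x00" * 32  # fixed genesis value so the chain is reproducible
    for record in records:
        digest = hashlib.sha256(prev + record).digest()
        digests.append(digest.hex())
        prev = digest
    return digests


# Any modification to an earlier record changes every digest after it,
# so storing only the final digest is enough to detect tampering with the whole log.
```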
Key Deliverables
- PDF Spec: Tactile Tensor schema, metadata/lineage rules, determinism + versioning/migration, conformance criteria.
- Reference SDK: lightweight schema objects, validators, deterministic serializer/deserializer, minimal dependencies.
- Dataset Container Spec: reproducible storage + examples (streaming + offline parity; robotics-log friendly).
- ML Interfaces: modular tokenization hooks + reference tokenization recipes (windowing/patching + quantization conventions).
- CI Suite: golden files, byte-stability, backward/forward compatibility tests, reference implementations (see the golden-file sketch below).

Contract-to-hire with a clear path to full-time and founding equity for the right fit.
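To make the conformance-suite idea concrete, a minimal golden-file byte-stability check; check_byte_stability and the golden-file path are hypothetical, and encode stands in for any canonical encoder such as the serialization sketch near the top of this posting:

```python
# Hypothetical sketch: golden-file byte-stability check for a conformance suite.
# Names and paths are illustrative; `encode` is any deterministic encoder under test.
from pathlib import Path

import numpy as np


def check_byte_stability(encode, golden_path: Path) -> None:
    """A fixed input must encode to exactly the bytes recorded in the golden file."""
    frame = np.arange(12, dtype=np.float32).reshape(3, 4)  # fixed reference input
    encoded = encode(frame, {"device_id": "example-sensor-0"})
    if golden_path.exists():
        # Byte-for-byte comparison: any drift across platforms or versions fails here.
        assert encoded == golden_path.read_bytes(), "byte-level mismatch against golden file"
    else:
        # First run on the reference platform records the golden bytes.
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_bytes(encoded)
```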