About the Role
As a Quantitative Data Engineer, you will be the backbone of the data ecosystem powering our quantitative research, trading, and AI-driven strategies. You will design, build, and maintain the high-performance data infrastructure that enables low-latency, high-fidelity access to market, fundamental, and alternative data across multiple asset classes.
This role bridges quant engineering, data systems, and research enablement, ensuring that our researchers and traders have fast, reliable, and well-documented datasets for analysis and live trading. You’ll be part of a cross-functional team working at the intersection of finance, machine learning, and distributed systems.
Responsibilities
- Architect and maintain scalable ETL pipelines for ingesting and transforming terabytes of structured, semi-structured, and unstructured market and alternative data.
- Design time-series optimized data stores and streaming frameworks to support low-latency data access for both backtesting and live trading.
- Develop ingestion frameworks integrating vendor feeds (Bloomberg, Refinitiv, Polygon, Quandl, etc.), exchange data, and internal execution systems.
- Collaborate with quantitative researchers and ML teams to ensure data accuracy, feature availability, and schema evolution aligned with modeling needs.
- Implement data quality checks, validation pipelines, and version control mechanisms for all datasets.
- Monitor and optimize distributed compute environments (Spark, Flink, Ray, or Dask) for performance and cost efficiency.
- Automate workflows using orchestration tools (Airflow, Prefect, Dagster) for reliability and reproducibility (see the sketch after this list).
- Establish best practices for metadata management, lineage tracking, and documentation.
- Contribute to internal libraries and SDKs for seamless data access by trading and research applications.
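To make the orchestration and data-quality responsibilities above concrete, here is a minimal sketch of the kind of daily ingestion pipeline this role would own. It assumes a recent Airflow 2.x with the TaskFlow API; the dataset name, file paths, and validation rules are hypothetical and shown only for illustration.

```python
# Minimal sketch of a daily market-data pipeline: ingest -> validate -> publish.
# Assumes Airflow 2.x (TaskFlow API); paths, dataset name, and checks are hypothetical.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["market-data"])
def eod_prices_pipeline():
    @task
    def ingest(ds=None) -> str:
        # Land the vendor's end-of-day file for the run date in the raw zone, unchanged.
        raw_path = f"/data/raw/eod_prices/{ds}.csv"  # hypothetical location
        # ... fetch from the vendor API / SFTP drop and write to raw_path ...
        return raw_path

    @task
    def validate(raw_path: str) -> str:
        # Fail the run before bad data reaches researchers or live trading.
        df = pd.read_csv(raw_path, parse_dates=["timestamp"])
        assert not df.empty, "vendor file is empty"
        assert (df["price"] > 0).all(), "non-positive prices found"
        assert not df.duplicated(["symbol", "timestamp"]).any(), "duplicate rows"
        return raw_path

    @task
    def publish(raw_path: str) -> None:
        # Write a columnar, query-friendly copy to the curated zone for research use.
        df = pd.read_csv(raw_path, parse_dates=["timestamp"])
        df.to_parquet(raw_path.replace("/raw/", "/curated/").replace(".csv", ".parquet"))

    publish(validate(ingest()))


eod_prices_pipeline()
```

In practice the validation step would typically be backed by a framework such as Great Expectations rather than ad-hoc assertions, but the shape of the DAG stays the same.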
In Trading Firms, Data Engineers Typically:
- Build real-time data streaming systems to capture market ticks, order books, and execution signals (see the streaming sketch below).
- Manage versioned historical data lakes for backtesting and model training.
- Handle multi-venue data normalization (different exchanges and instruments).
- Integrate alternative datasets (satellite imagery, news sentiment, ESG, supply-chain data).
- Work closely with quant researchers to convert raw data into research-ready features.
- Optimize pipelines for ultra-low latency where milliseconds can impact P&L.
- Implement data observability frameworks to ensure uptime and quality.
- Collaborate with DevOps and infra engineers to scale storage, caching, and compute.
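As one illustration of the streaming and multi-venue normalization work above, the following is a minimal sketch of a consumer that merges trade feeds from two venues into a single tick schema. It uses the kafka-python client; the topic names, message layouts, and broker address are hypothetical.

```python
# Minimal sketch: consume raw trade messages from two venues and normalize them
# into one schema. Topics, field names, and broker address are hypothetical.
import json
from dataclasses import dataclass

from kafka import KafkaConsumer  # kafka-python


@dataclass
class Tick:
    venue: str
    symbol: str
    price: float
    size: float
    ts_ns: int  # event time, nanoseconds since epoch


def normalize(topic: str, payload: dict) -> Tick:
    # Each venue publishes a slightly different layout; map both to the common Tick.
    if topic == "venue_a.trades":
        return Tick("A", payload["sym"], float(payload["px"]), float(payload["qty"]), int(payload["ts"]))
    if topic == "venue_b.trades":
        return Tick("B", payload["instrument"], float(payload["price"]), float(payload["size"]), int(payload["event_time"]))
    raise ValueError(f"unknown topic: {topic}")


consumer = KafkaConsumer(
    "venue_a.trades",
    "venue_b.trades",
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_deserializer=lambda raw: json.loads(raw),
    auto_offset_reset="latest",
)

for msg in consumer:
    tick = normalize(msg.topic, msg.value)
    # Downstream: append to the tick store, update order-book state, publish features, etc.
    print(tick)
```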
Tech Stack
- Languages: Python, SQL, Scala, Go, Rust (optional for HFT pipelines)
- Data Processing: Apache Spark, Flink, Ray, Dask, Pandas, Polars
- Workflow Orchestration: Apache Airflow, Prefect, Dagster
- Databases & Storage: PostgreSQL, ClickHouse, DuckDB, Elasticsearch, Redis
- Data Lakes: Delta Lake, Iceberg, Hudi, Parquet (see the tick-data example below)
- Streaming: Kafka, Redpanda, Pulsar
- Cloud & Infra: AWS (S3, EMR, Lambda), GCP, Azure, Kubernetes
- Version Control & Lineage: DVC, MLflow, Feast, Great Expectations
- Visualization / Monitoring: Grafana, Prometheus, Superset, Datadog
- Tools for Finance: kdb+ / q (for tick data), InfluxDB, QuestDB
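To show the kind of research-ready, time-series access this stack is meant to support, here is a minimal sketch that aggregates tick-level Parquet files into one-minute bars with DuckDB. The directory layout and column names are hypothetical.

```python
# Minimal sketch: roll tick-level Parquet data up into one-minute OHLCV bars.
# Directory layout and column names (symbol, ts, price, size) are hypothetical.
import duckdb

bars = duckdb.sql(
    """
    SELECT
        symbol,
        date_trunc('minute', ts)  AS bar_ts,
        arg_min(price, ts)        AS open,   -- price at the earliest tick in the bar
        max(price)                AS high,
        min(price)                AS low,
        arg_max(price, ts)        AS close,  -- price at the latest tick in the bar
        sum(size)                 AS volume
    FROM read_parquet('/data/curated/ticks/date=2024-01-02/*.parquet')
    GROUP BY symbol, bar_ts
    ORDER BY symbol, bar_ts
    """
).df()

print(bars.head())
```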
What You Will Gain
- End-to-end ownership of core data infrastructure in a high-impact, mission-critical domain.
- Deep exposure to quantitative research workflows, market microstructure, and real-time trading systems.
- Collaboration with elite quantitative researchers, traders, and ML scientists.
- Hands-on experience with cutting-edge distributed systems and time-series data technologies.
- A culture that emphasizes technical excellence, autonomy, and experimentation.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 2+ years of experience building and maintaining production-grade data pipelines.
- Proficiency in Python, SQL, and frameworks such as Airflow, Spark, or Flink.
- Familiarity with cloud storage and compute (S3, GCS, EMR, Dataproc) and versioned data lakes (Delta, Iceberg).
- Experience with financial datasets, tick-level data, or high-frequency time series is a strong plus.
- Strong understanding of data modeling, schema design, and performance optimization.
- Excellent communication skills and the ability to support multidisciplinary teams.