About the Role
As a Quantitative Data Engineer, you will be the backbone of the data ecosystem powering our quantitative research, trading, and AI-driven strategies. You will design, build, and maintain the high-performance data infrastructure that enables low-latency, high-fidelity access to market, fundamental, and alternative data across multiple asset classes.
This role bridges quant engineering, data systems, and research enablement, ensuring that our researchers and traders have fast, reliable, and well-documented datasets for analysis and live trading. You’ll be part of a cross-functional team working at the intersection of finance, machine learning, and distributed systems.
Responsibilities
- Architect and maintain scalable ETL pipelines for ingesting and transforming terabytes of structured, semi-structured, and unstructured market and alternative data.
- Design time-series optimized data stores and streaming frameworks to support low-latency data access for both backtesting and live trading.
- Develop ingestion frameworks integrating vendor feeds (Bloomberg, Refinitiv, Polygon, Quandl, etc.), exchange data, and internal execution systems.
- Collaborate with quantitative researchers and ML teams to ensure data accuracy, feature availability, and schema evolution aligned with modeling needs.
- Implement data quality checks, validation pipelines, and version control mechanisms for all datasets.
- Monitor and optimize distributed compute environments (Spark, Flink, Ray, or Dask) for performance and cost efficiency.
- Automate workflows using orchestration tools (Airflow, Prefect, Dagster) for reliability and reproducibility (see the sketch after this list).
- Establish best practices for metadata management, lineage tracking, and documentation.
- Contribute to internal libraries and SDKs for seamless data access by trading and research applications.
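To make the orchestration and data-quality responsibilities above concrete, here is a minimal sketch of the kind of daily ingestion pipeline this role would own. It assumes a recent Airflow 2.x with the TaskFlow API; the dataset name, file paths, and validation rules are hypothetical and shown only for illustration.

```python
# Minimal sketch of a daily market-data pipeline: ingest -> validate -> publish.
# Assumes Airflow 2.x (TaskFlow API); paths, dataset name, and checks are hypothetical.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["market-data"])
def eod_prices_pipeline():
    @task
    def ingest(ds=None) -> str:
        # Land the vendor's end-of-day file for the run date in the raw zone, unchanged.
        raw_path = f"/data/raw/eod_prices/{ds}.csv"  # hypothetical location
        # ... fetch from the vendor API / SFTP drop and write to raw_path ...
        return raw_path

    @task
    def validate(raw_path: str) -> str:
        # Fail the run before bad data reaches researchers or live trading.
        df = pd.read_csv(raw_path, parse_dates=["timestamp"])
        assert not df.empty, "vendor file is empty"
        assert (df["price"] > 0).all(), "non-positive prices found"
        assert not df.duplicated(["symbol", "timestamp"]).any(), "duplicate rows"
        return raw_path

    @task
    def publish(raw_path: str) -> None:
        # Write a columnar, query-friendly copy to the curated zone for research use.
        df = pd.read_csv(raw_path, parse_dates=["timestamp"])
        df.to_parquet(raw_path.replace("/raw/", "/curated/").replace(".csv", ".parquet"))

    publish(validate(ingest()))


eod_prices_pipeline()
```

In practice the validation step would typically be backed by a framework such as Great Expectations rather than ad-hoc assertions, but the shape of the DAG stays the same.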
In Trading Firms, Data Engineers Typically:
- Build real-time data streaming systems to capture market ticks, order books, and execution signals (see the streaming sketch below).
- Manage versioned historical data lakes for backtesting and model training.
- Handle multi-venue data normalization (different exchanges and instruments).
- Integrate alternative datasets (satellite imagery, news sentiment, ESG, supply-chain data).
- Work closely with quant researchers to convert raw data into research-ready features.
- Optimize pipelines for ultra-low latency where milliseconds can impact P&L.
- Implement data observability frameworks to ensure uptime and quality.
- Collaborate with DevOps and infra engineers to scale storage, caching, and compute.
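As one illustration of the streaming and multi-venue normalization work above, the following is a minimal sketch of a consumer that merges trade feeds from two venues into a single tick schema. It uses the kafka-python client; the topic names, message layouts, and broker address are hypothetical.

```python
# Minimal sketch: consume raw trade messages from two venues and normalize them
# into one schema. Topics, field names, and broker address are hypothetical.
import json
from dataclasses import dataclass

from kafka import KafkaConsumer  # kafka-python


@dataclass
class Tick:
    venue: str
    symbol: str
    price: float
    size: float
    ts_ns: int  # event time, nanoseconds since epoch


def normalize(topic: str, payload: dict) -> Tick:
    # Each venue publishes a slightly different layout; map both to the common Tick.
    if topic == "venue_a.trades":
        return Tick("A", payload["sym"], float(payload["px"]), float(payload["qty"]), int(payload["ts"]))
    if topic == "venue_b.trades":
        return Tick("B", payload["instrument"], float(payload["price"]), float(payload["size"]), int(payload["event_time"]))
    raise ValueError(f"unknown topic: {topic}")


consumer = KafkaConsumer(
    "venue_a.trades",
    "venue_b.trades",
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_deserializer=lambda raw: json.loads(raw),
    auto_offset_reset="latest",
)

for msg in consumer:
    tick = normalize(msg.topic, msg.value)
    # Downstream: append to the tick store, update order-book state, publish features, etc.
    print(tick)
```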
Tech Stack
- Languages: Python, SQL, Scala, Go, Rust (optional for HFT pipelines)
- Data Processing: Apache Spark, Flink, Ray, Dask, Pandas, Polars
- Workflow Orchestration: Apache Airflow, Prefect, Dagster
- Databases & Storage: PostgreSQL, ClickHouse, DuckDB, Elasticsearch, Redis
- Data Lakes: Delta Lake, Iceberg, Hudi, Parquet (see the tick-data example below)
- Streaming: Kafka, Redpanda, Pulsar
- Cloud & Infra: AWS (S3, EMR, Lambda), GCP, Azure, Kubernetes
- Version Control & Lineage: DVC, MLflow, Feast, Great Expectations
- Visualization / Monitoring: Grafana, Prometheus, Superset, Datadog
- Tools for Finance: kdb+ / q (for tick data), InfluxDB, QuestDB
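To show the kind of research-ready, time-series access this stack is meant to support, here is a minimal sketch that aggregates tick-level Parquet files into one-minute bars with DuckDB. The directory layout and column names are hypothetical.

```python
# Minimal sketch: roll tick-level Parquet data up into one-minute OHLCV bars.
# Directory layout and column names (symbol, ts, price, size) are hypothetical.
import duckdb

bars = duckdb.sql(
    """
    SELECT
        symbol,
        date_trunc('minute', ts)  AS bar_ts,
        arg_min(price, ts)        AS open,   -- price at the earliest tick in the bar
        max(price)                AS high,
        min(price)                AS low,
        arg_max(price, ts)        AS close,  -- price at the latest tick in the bar
        sum(size)                 AS volume
    FROM read_parquet('/data/curated/ticks/date=2024-01-02/*.parquet')
    GROUP BY symbol, bar_ts
    ORDER BY symbol, bar_ts
    """
).df()

print(bars.head())
```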
What You Will Gain
- End-to-end ownership of core data infrastructure in a high-impact, mission-critical domain.
- Deep exposure to quantitative research workflows, market microstructure, and real-time trading systems.
- Collaboration with elite quantitative researchers, traders, and ML scientists.
- Hands-on experience with cutting-edge distributed systems and time-series data technologies.
- A culture that emphasizes technical excellence, autonomy, and experimentation.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 2+ years of experience building and maintaining production-grade data pipelines.
- Proficiency in Python, SQL, and frameworks such as Airflow, Spark, or Flink.
- Familiarity with cloud storage and compute (S3, GCS, EMR, Dataproc) and versioned data lakes (Delta, Iceberg).
- Experience with financial datasets, tick-level data, or high-frequency time series is a strong plus.
- Strong understanding of data modeling, schema design, and performance optimization.
- Excellent communication skills and the ability to support multidisciplinary teams.