We're hiring a Data Scientist / Data Engineer to help us turn raw data into reliable datasets, insights, and models that drive real decisions. This role blends strong data engineering (pipelines, quality, orchestration) with hands-on data science (analysis, experimentation, forecasting, ML when needed). You'll work closely with product and engineering teams to build data products that are accurate, scalable, and actionable.
What you'll do
Design and build end-to-end data pipelines (batch and, if applicable, streaming).
Collect, clean, transform, and model data into well-structured datasets for analytics and ML.
Develop and maintain a data warehouse / lake model (dimensional modeling, data marts, curated layers).
Implement data quality checks, observability, lineage, and monitoring.
Perform exploratory analysis and deliver insights via dashboards, notebooks, and stakeholder-ready summaries.
Build and deploy ML models when needed (forecasting, churn/segmentation, anomaly detection, recommendations).
Support experimentation and A/B testing (metric definitions, evaluation, statistical validity).
Collaborate with backend teams to define event schemas, tracking plans, and data contracts.
Optimize performance and cost across storage, compute, and queries.
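To make the "data quality checks" responsibility concrete, here is a minimal standard-library sketch of the kind of validation involved; the column names, rules, and `check_quality` function are illustrative assumptions, not part of our stack.

```python
# Hypothetical data quality check: completeness, uniqueness, and validity
# rules over rows represented as dicts. Field names are illustrative only.

def check_quality(rows, key="order_id", amount="amount"):
    """Return a list of human-readable data quality violations."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Completeness: the business key must be present.
        if row.get(key) is None:
            issues.append(f"row {i}: missing {key}")
            continue
        # Uniqueness: the business key must not repeat.
        if row[key] in seen:
            issues.append(f"row {i}: duplicate {key}={row[key]}")
        seen.add(row[key])
        # Validity: amounts must be non-negative numbers.
        value = row.get(amount)
        if not isinstance(value, (int, float)) or value < 0:
            issues.append(f"row {i}: invalid {amount}={value!r}")
    return issues

rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": -5.0},    # duplicate key and negative amount
    {"order_id": None, "amount": 3.0},  # missing key
]
print(check_quality(rows))
```

In practice checks like these would run inside the orchestrator (Airflow / Dagster / Prefect) and feed the observability and monitoring layers.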
Must-have skills
Strong SQL and solid programming skills (Python preferred).
Experience building pipelines using tools like Airflow / Dagster / Prefect (or equivalent).
Strong knowledge of data modeling (star schema, slowly changing dimensions, event modeling).
Experience with at least one of: PostgreSQL / MySQL / BigQuery / Snowflake / Redshift.
Proven ability to validate data correctness and implement data quality frameworks.
Comfortable communicating insights and technical trade-offs to non-technical stakeholders.
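For candidates less familiar with the "slowly changing dimensions" requirement, this is a minimal in-memory sketch of Type-2 SCD logic; the table shape, field names, and `scd2_apply` helper are assumptions for illustration, not a real warehouse API.

```python
from datetime import date

# Type-2 SCD sketch: when a tracked attribute changes, close out the
# current dimension row and append a new versioned row.

def scd2_apply(dim_rows, update, today=None):
    """dim_rows: dicts with customer_id, city, valid_from, valid_to, is_current."""
    today = today or date.today()
    for row in dim_rows:
        if row["customer_id"] == update["customer_id"] and row["is_current"]:
            if row["city"] == update["city"]:
                return dim_rows  # no change: nothing to version
            row["valid_to"] = today      # close out the old version
            row["is_current"] = False
            break
    dim_rows.append({
        "customer_id": update["customer_id"],
        "city": update["city"],
        "valid_from": today,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows

dim = [{"customer_id": 7, "city": "Delhi",
        "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True}]
scd2_apply(dim, {"customer_id": 7, "city": "Mumbai"}, today=date(2025, 1, 1))
```

After the update, the Delhi row is closed (`valid_to` set, `is_current` False) and a new current Mumbai row exists, preserving full history.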
Nice-to-have skills
Streaming: Kafka / Kinesis / PubSub, real-time processing (Spark Streaming / Flink).
Big data: Spark, distributed compute, partitioning strategies.
Lakehouse: Iceberg / Delta / Hudi, object storage (S3 / GCS / Azure Blob).
MLOps: MLflow, model monitoring, feature stores, deployment pipelines.
BI: Superset / Power BI / Looker / Metabase, semantic layers.
Cloud : AWS / Azure / GCP (IAM, networking basics, managed data services).
Experience with privacy / security compliance (PII handling, retention policies, access controls).
What we value
Ownership: you build reliable systems, not just one-off scripts.
Curiosity: you ask the "why" behind metrics and propose better approaches.
Practicality: you can balance speed vs. correctness and deliver iteratively.
Strong collaboration with engineers, product, and leadership.
Data Scientist • Delhi, India