About the Role
We are looking for a Data Engineer to design, build, and maintain scalable data infrastructure that powers analytics, reporting, and AI-driven initiatives. You will work with modern columnar databases, design efficient ETL / ELT pipelines, and ensure the reliability and performance of our data ecosystem.
Responsibilities
- Design, develop, and maintain data pipelines (batch and streaming) to ingest, process, and transform large-scale datasets.
- Optimize query performance and storage efficiency in columnar databases (e.g., Snowflake, BigQuery, Redshift, ClickHouse).
- Write efficient SQL for data modeling, transformation, and analytics use cases.
- Collaborate with data analysts, data scientists, and product teams to ensure data availability and reliability.
- Implement data quality checks, monitoring, and alerting across pipelines (a minimal sketch follows this list).
- Work with cloud platforms (AWS / GCP / Azure) and orchestration and transformation frameworks (Airflow, Dagster, dbt, etc.) to manage workflows.
- Contribute to data architecture design including partitioning, indexing, and schema evolution strategies.
- Ensure compliance with data governance, security, and privacy standards.
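To make the data quality responsibility concrete, here is a minimal sketch of the kind of validation gate a pipeline might run. It is illustrative only: the field names, checks, and thresholds (user_id, event_ts, a 24-hour freshness window) are hypothetical, and a production version would run as a task in the team's orchestration framework and feed its alerting stack.

```python
# Minimal sketch of a pipeline data quality gate. The field names, checks,
# and thresholds below are hypothetical examples, not a prescribed standard.
import time
from dataclasses import dataclass

@dataclass
class Check:
    name: str
    passed: bool
    detail: str

def run_checks(rows: list[dict]) -> list[Check]:
    checks: list[Check] = []
    # Completeness: every row must carry a user_id.
    missing = sum(1 for r in rows if not r.get("user_id"))
    checks.append(Check("user_id_not_null", missing == 0, f"{missing} missing"))
    # Freshness: newest event_ts (epoch seconds) within the last 24 hours.
    newest = max((r.get("event_ts", 0) for r in rows), default=0)
    fresh = time.time() - newest < 24 * 3600
    checks.append(Check("freshness_24h", fresh, f"newest={newest}"))
    return checks

if __name__ == "__main__":
    sample = [{"user_id": "u1", "event_ts": 1_700_000_000}, {"user_id": None}]
    for c in run_checks(sample):
        print(f"{'PASS' if c.passed else 'FAIL'} {c.name}: {c.detail}")
```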
Requirements
- Proven experience in data engineering (3–6 years preferred, but flexible based on role level).
- Strong knowledge of SQL and experience with columnar databases (Snowflake, BigQuery, Redshift, DuckDB, ClickHouse, etc.).
- Proficiency with ETL / ELT tools and workflow orchestration (Airflow, dbt, Luigi, Prefect, Dagster).
- Experience with Python / Scala / Java for data engineering.
- Familiarity with data lake, lakehouse, and warehouse architectures.
- Hands-on experience with cloud services (AWS S3 / Glue, GCP Dataflow / BigQuery, Azure Data Factory, etc.).
- Understanding of data modeling techniques such as star schemas and dimensional modeling (a tiny illustration follows this list).
- Strong problem-solving and optimization skills for working with large datasets.
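As a deliberately tiny illustration of the dimensional modeling expectation, here is a star schema sketched as Python dataclasses; the table grain, keys, and column names are hypothetical and exist only to show the fact-to-dimension shape.

```python
# Hypothetical star schema: one fact table joined to two dimension tables
# via surrogate keys. Names, columns, and grain are illustrative only.
from dataclasses import dataclass

@dataclass
class DimDate:
    date_key: int      # surrogate key, e.g. 20240115
    iso_date: str
    is_weekend: bool

@dataclass
class DimCustomer:
    customer_key: int  # surrogate key
    customer_id: str   # natural (business) key
    segment: str

@dataclass
class FactOrder:
    # Grain: one row per order line; measures are additive.
    date_key: int      # FK -> DimDate
    customer_key: int  # FK -> DimCustomer
    quantity: int
    revenue: float
```

Analytics queries then join the fact table to its dimensions on the surrogate keys and aggregate the additive measures.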
Nice to Have
- Knowledge of real-time streaming frameworks such as Kafka, Flink, and Spark Streaming (a minimal consumer sketch follows this list).
- Familiarity with DevOps practices (CI / CD for data pipelines, Docker, Kubernetes).
- Exposure to machine learning pipelines or MLOps practices.
- Experience with BI tools (Tableau, Power BI, Looker).
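For the streaming item above, a minimal consumer sketch: it assumes the kafka-python client, a broker on localhost:9092, and a hypothetical events topic, so treat it as a shape rather than a working deployment.

```python
# Minimal Kafka consumer sketch (assumes the kafka-python package; the
# topic name and broker address are hypothetical).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the beginning of the topic
    value_deserializer=lambda b: b.decode("utf-8"),
)

for message in consumer:
    # In a real pipeline this would validate, transform, and land the event.
    print(message.offset, message.value)
```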