Data Engineer - Multi-source ETL & GenAI Pipelines (3+ Years)
Roles and Responsibilities :
- Build and maintain scalable, fault-tolerant data pipelines to support GenAI and analytics workloads across OCR, documents, and case data.
- Manage ingestion and transformation of semi-structured legal documents (PDF, Word, Excel) into structured formats.
- Enable RAG workflows by processing data into chunked, vectorized formats with metadata (a minimal chunking-and-embedding sketch follows this list).
- Handle large-scale ingestion from multiple sources into cloud-native data lakes (S3, GCS), data warehouses (BigQuery, Snowflake), and PostgreSQL.
- Automate pipelines using orchestration tools such as Airflow or Prefect, including retry logic, alerting, and metadata tracking (see the DAG sketch after this list).
- Collaborate with ML Engineers to ensure data availability, traceability, and performance for inference and training pipelines.
- Implement data validation and testing frameworks using Great Expectations or dbt.
- Integrate OCR pipelines and post-processing outputs for embedding and document search.
- Design infrastructure for streaming vs batch data needs and optimize for cost, latency, and reliability.
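To illustrate the RAG ingestion work referenced above, the sketch below shows one way to split extracted document text into overlapping chunks, attach source metadata, and attach an embedding per chunk. The chunk size, overlap, Chunk structure, and embed() function are illustrative assumptions, not a prescribed stack; in practice embed() would call whichever embedding model or API the team uses.

```python
# Minimal sketch: chunk extracted document text, attach metadata, and
# embed each chunk so it can be loaded into a vector store.
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str            # source document identifier (metadata)
    text: str               # chunk text
    start: int              # character offset within the source document
    embedding: List[float]  # vector representation of the chunk


def embed(text: str) -> List[float]:
    # Placeholder so the sketch runs end to end; swap in a real embedding
    # model or API call here.
    return [float(ord(c)) for c in text[:8]]


def chunk_document(doc_id: str, text: str,
                   size: int = 1000, overlap: int = 200) -> List[Chunk]:
    """Split text into overlapping windows and embed each non-empty piece."""
    chunks: List[Chunk] = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(Chunk(doc_id=doc_id, text=piece,
                                start=start, embedding=embed(piece)))
    return chunks
```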
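For the orchestration bullet, a minimal Airflow DAG with retries and failure alerting might look like the following. This is a sketch assuming Airflow 2.4+; the DAG id, schedule, alert email address, and task callables are placeholders for illustration only.

```python
# Minimal sketch of an Airflow DAG with retry logic and failure alerting.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "data-engineering",
    "retries": 3,                          # automatic retries on task failure
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,              # alerting via Airflow's email backend
    "email": ["data-alerts@example.com"],  # placeholder address
}


def ingest_documents(**context):
    """Placeholder: pull raw documents from the landing bucket."""


def transform_to_parquet(**context):
    """Placeholder: normalize documents and write Parquet to the data lake."""


with DAG(
    dag_id="document_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(task_id="ingest_documents",
                            python_callable=ingest_documents)
    transform = PythonOperator(task_id="transform_to_parquet",
                               python_callable=transform_to_parquet)
    ingest >> transform  # run ingestion before transformation
```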
Qualifications :
- Bachelor's or Master's degree in Computer Science, Data Engineering, or an equivalent field.
- 3+ years of experience building distributed data pipelines and managing multi-source ingestion.
- Proficiency with Python, SQL, and data tools such as Pandas and PySpark.
- Experience with data orchestration tools (Airflow, Prefect) and file formats such as Parquet, Avro, and JSON.
- Hands-on experience with cloud storage / data warehouse systems (S3, GCS, BigQuery, Redshift).
- Understanding of GenAI and vector database ingestion pipelines is a strong plus.
- Bonus : Experience with OCR tools (Tesseract, Google Document AI), PDF parsing libraries (PyMuPDF), and API-based document processors.