Data Engineer - Python / SQL

Masin Projects Pvt. Ltd, Gurgaon
30+ days ago
Job description

Data Engineer - Multi-source ETL & GenAI Pipelines (3+ Years)

Roles and Responsibilities:

  • Build and maintain scalable, fault-tolerant data pipelines to support GenAI and analytics workloads across OCR, documents, and case data.
  • Manage ingestion and transformation of semi-structured legal documents (PDF, Word, Excel) into structured formats.
  • Enable RAG workflows by processing data into chunked, vectorized formats with metadata.
  • Handle large-scale ingestion from multiple sources into cloud-native data lakes (S3, GCS), data warehouses (BigQuery, Snowflake), and PostgreSQL.
  • Automate pipelines using orchestration tools like Airflow / Prefect, including retry logic, alerting, and metadata tracking.
  • Collaborate with ML Engineers to ensure data availability, traceability, and performance for inference and training pipelines.
  • Implement data validation and testing frameworks using Great Expectations or dbt.
  • Integrate OCR pipelines and post-processing outputs for embedding and document search.
  • Design infrastructure for both streaming and batch data needs, optimizing for cost, latency, and reliability.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or an equivalent field.
  • 3+ years of experience in building distributed data pipelines and managing multi-source ingestion.
  • Proficiency with Python, SQL, and data tools such as Pandas and PySpark.
  • Experience working with data orchestration tools (Airflow, Prefect), and file formats like Parquet, Avro, JSON.
  • Hands-on experience with cloud storage / data warehouse systems (S3, GCS, BigQuery, Redshift).
  • Understanding of GenAI and vector database ingestion pipelines is a strong plus.
  • Bonus: experience with OCR tools (Tesseract, Google Document AI), PDF parsing libraries (PyMuPDF), and API-based document processors.
  (ref: hirist.tech)
