Talent.com
AI Engineer - GPT / LangChain / RAG / Data Pipelines

Peak Trust Global Real Estate • Gurgaon, Haryana, India
4 hours ago
Job description

Location: Remote

Type: Full-time

Experience: 3+ years

Salary: up to 70K / month, based on experience

Role Summary

We are looking for a hands-on AI Data Engineer who can independently manage end-to-end data workflows, including data collection, document processing, dataset preparation, retrieval pipelines, model fine-tuning, and data visualization.

This role requires strong technical skills across Python, automation, ML tooling, and analytical reporting.

Key Responsibilities (Technical)

1. Data Acquisition & Automation

  • Build automated data collection workflows using tools such as Firecrawl, Playwright, Scrapy, or similar frameworks
  • Extract multi-format documents (PDFs, HTML, text, images)
  • Handle large-scale crawling, rate limits, error handling, and scheduling
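
One common way to handle rate limits and transient errors during crawling is to retry with exponential backoff. The sketch below is illustrative only: `retry_with_backoff`, its parameters, and the `fetch` callable are hypothetical names, not part of this posting or of any of the tools listed above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on any exception with exponentially growing delays.

    The delay before retry n (0-based) is base_delay * 2**n seconds.
    The last failure is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In a real crawler, `fetch` would wrap a single page request, and the `except` clause would usually be narrowed to the errors worth retrying (timeouts, HTTP 429, and similar).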

2. Document Processing & Transformation

  • Clean and process unstructured documents
  • Apply OCR (Tesseract, PaddleOCR) for scanned files
  • Convert and structure data using PyPDF2, pymupdf, BeautifulSoup, etc.
  • Prepare data in formats such as JSON, JSONL, or CSV

3. Dataset Preparation

  • Segment and structure text for ML training
  • Create Q&A datasets, summaries, instruction-response pairs, and labeled text
  • Build high-quality datasets compatible with fine-tuning frameworks
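
As a rough illustration of the dataset work described above, instruction-response pairs are often stored as JSONL, one JSON object per line. The sketch below assumes hypothetical field names (`instruction`, `response`) and helper names (`to_jsonl`, `from_jsonl`); the actual schema depends on the fine-tuning framework in use.

```python
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text: str) -> list[dict]:
    """Parse JSONL text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical example records, for illustration only.
pairs = [
    {"instruction": "Summarize the clause.", "response": "The tenant pays monthly."},
    {"instruction": "List the parties.", "response": "Landlord\nTenant"},
]

jsonl_text = to_jsonl(pairs)
restored = from_jsonl(jsonl_text)
```

Writing `jsonl_text` to a `.jsonl` file gives one training example per line, which can be streamed without loading the whole dataset into memory.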

4. Retrieval & Indexing Pipelines

  • Implement document chunking strategies
  • Generate embeddings and manage vector databases (Qdrant, Pinecone, Weaviate)
  • Build retrieval workflows using LangChain or LlamaIndex
  • Optimize retrieval accuracy and latency
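
To make "document chunking strategies" concrete, here is a minimal sketch of one simple strategy: fixed-size character windows with overlap. The function name `chunk_text` and its parameters are hypothetical and do not come from LangChain, LlamaIndex, or this posting; production pipelines often split on sentences or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# example_chunks == ["abcd", "cdef", "efgh", "ghij"]
```

The overlap keeps text that straddles a boundary present in two adjacent chunks, which can help retrieval at the cost of storing more embeddings.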

5. Model Training & Fine-Tuning

  • Run fine-tuning jobs using HuggingFace Transformers, LoRA / QLoRA, or similar methods
  • Monitor training performance and refine datasets
  • Package and deploy fine-tuned models

6. Data Visualization & Analytics

  • Create analytical charts, trends, and insights using Pandas, Matplotlib, Seaborn, and Plotly
  • Build simple internal dashboards or visual summaries for reports
  • Transform raw datasets into meaningful visual insights

7. Automation & Infrastructure

  • Write modular, maintainable Python scripts
  • Containerize workflows with Docker
  • Maintain version control with Git
  • Ensure reproducibility and pipeline stability

Required Technical Skills

  • Strong proficiency in Python
  • Experience with Firecrawl, Playwright, Scrapy, or similar tools
  • Strong background in document parsing, text processing, and OCR
  • Familiarity with LangChain or LlamaIndex
  • Experience with vector databases
  • Hands-on experience with HuggingFace, Transformer models, and fine-tuning
  • Ability to write clean, efficient data pipelines
  • Experience with Matplotlib, Seaborn, Plotly, or other visualization tools
  • Comfort using Docker and Git

Nice to Have

  • Experience serving models or building small APIs (FastAPI)
  • Exposure to GPU training environments
  • Background in large-scale unstructured data work
  • Ability to create lightweight dashboards (Plotly Dash, Streamlit)

Ideal Candidate

  • Comfortable owning full pipelines independently
  • Detail-oriented and analytical
  • Strong problem-solving ability
  • Can work with minimal supervision
  • Enjoys building structured systems from scratch