Talent.com
Forage AI - Data Pipeline Engineer - Python / ETL / SQL

Forage AI · Delhi, IN
30+ days ago
Job type
  • Remote
Job description

Description : Data Pipeline Engineer - Web Services, Web Crawling, ETL, NLP (spaCy / LLM), AWS. Experience Level : 5-7 years of relevant experience in data engineering.

About Forage AI :

Forage AI is a pioneering AI-powered data extraction and automation company that transforms complex, unstructured web and document data into clean, structured intelligence. Our platform combines web crawling, NLP, LLMs, and agentic AI to deliver highly accurate firmographic and enterprise insights across numerous domains. Trusted by global clients in finance, real estate, and healthcare, Forage AI enables businesses to automate workflows, reduce manual rework, and access high-quality data at scale.

About the Role :

We are seeking a Data Pipeline Engineer to develop, optimize, and maintain production-grade data pipelines focused on web data extraction and ETL workflows. This is a hands-on role requiring strong experience with Python (as the primary programming language), spaCy, LLMs, web crawling, and cloud deployment in containerized environments.

You'll have opportunities to propose, experiment with, and implement GenAI-driven approaches, innovative automations, and new strategies as part of our product and pipeline evolution. Candidates should have 5-8 years of relevant experience in data engineering, software engineering, or related fields.
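
For orientation only, here is a minimal sketch of the kind of GenAI-driven extraction step described above, assuming the OpenAI Python client (>= 1.0); the model name, prompt wording, and output fields are placeholders chosen for this example, not details from the posting.

    # Minimal LLM-assisted extraction sketch (illustrative only).
    # Assumes the `openai` client >= 1.0 and OPENAI_API_KEY set in the
    # environment; model name, prompt, and field list are placeholders.
    import json

    from openai import OpenAI

    client = OpenAI()

    def extract_firmographics(page_text: str) -> dict:
        """Ask an LLM to pull a few firmographic fields out of raw page text."""
        prompt = (
            "Extract the company name, headquarters city, and industry from the "
            "text below. Respond with a JSON object using the keys "
            '"name", "hq_city", "industry".\n\n' + page_text
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        # Production code would validate the model output before parsing.
        return json.loads(response.choices[0].message.content)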

Key Responsibilities :

  • Design, build, and manage scalable pipelines for ingesting, processing, and storing web and API data.
  • Develop robust web crawlers and scrapers in Python (Scrapy, lxml, Playwright) for structured and unstructured data (a minimal crawl-and-load sketch follows this list).
  • Create and monitor ETL workflows for data cleansing, transformation, and loading into PostgreSQL and MongoDB.
  • Apply spaCy for NLP tasks and integrate / fine-tune modern LLMs for analytics.
  • Drive GenAI-based innovation and automation in core data workflows.
  • Develop and deploy secure REST APIs and web services for data access; integrate RabbitMQ, Kafka, SQS (for distributed queueing), and Redis (for caching) into data workflows, using distributed task queue tools such as Celery and TaskIQ.
  • Containerize and deploy solutions using Docker on AWS (EC2, ECS, Lambda).
  • Collaborate with data teams, maintain pipeline documentation, and enforce data quality standards.
  • Maintain and enhance legacy in-house applications as required.
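
As referenced in the crawling item above, the following is a minimal sketch of the crawl, NLP-tag, and load pattern these responsibilities describe, assuming requests, lxml, spaCy's en_core_web_sm model, and psycopg2; the XPath selector, the crawl_results table, and the connection string are hypothetical.

    # Minimal crawl -> NLP -> load sketch (illustrative only).
    # Assumes requests, lxml, spaCy (en_core_web_sm installed) and psycopg2;
    # the XPath selector and the crawl_results table are hypothetical.
    import requests
    import spacy
    import psycopg2
    from lxml import html

    nlp = spacy.load("en_core_web_sm")

    def crawl_page(url: str) -> dict:
        """Fetch a page, pull its paragraph text, and tag organization names."""
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        tree = html.fromstring(resp.text)
        text = " ".join(t.strip() for t in tree.xpath("//p//text()") if t.strip())
        doc = nlp(text)
        orgs = sorted({ent.text for ent in doc.ents if ent.label_ == "ORG"})
        return {"url": url, "organizations": orgs}

    def load_record(record: dict, dsn: str) -> None:
        """Insert one extracted record into the hypothetical crawl_results table."""
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO crawl_results (url, organizations) VALUES (%s, %s)",
                (record["url"], record["organizations"]),
            )

    if __name__ == "__main__":
        record = crawl_page("https://example.com")
        load_record(record, dsn="dbname=crawler user=etl")  # placeholder DSN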

Technical Skills & Requirements :

  • Primary programming language is Python; must have experience writing independent Python packages.
  • Experience with multithreading and asynchronous programming in Python.
  • Advanced Python skills, including web crawling (Scrapy, lxml, Playwright) and strong SQL / data handling abilities.
  • Experience with PostgreSQL (SQL) and MongoDB (NoSQL).
  • Proficient with workflow orchestration tools such as Airflow.
  • Hands-on experience with RabbitMQ, Kafka, SQS (for queueing / distributed processing), and Redis (for caching).
  • Practical experience with spaCy for NLP and integration of at least one LLM platform (OpenAI, HuggingFace, etc.).
  • Experience with GenAI / LLMs, prompt engineering, or integrating GenAI features into data products.
  • Proficiency with Docker and AWS services (EC2, ECS, Lambda).
  • Experienced in developing secure, scalable REST APIs using FastAPI and / or Flask (a minimal FastAPI sketch follows this list).
  • Familiarity with third-party API integration, including authentication, data handling, and rate limiting.
  • Proficient in using Git for version control and collaboration.
  • Strong analytical, problem-solving, and documentation skills.
  • Bachelor's or Master's degree in Computer Science or a related field.
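
As noted in the REST API item above, this is a minimal FastAPI sketch of a service that exposes extracted records; the CompanyRecord model, route paths, and in-memory store are illustrative stand-ins for the PostgreSQL / MongoDB backends named in this posting.

    # Minimal FastAPI sketch for serving extracted records (illustrative only).
    # The in-memory dict stands in for the PostgreSQL / MongoDB stores above;
    # route paths and the CompanyRecord model are hypothetical.
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="Extraction Results API")

    class CompanyRecord(BaseModel):
        url: str
        organizations: list[str]

    RECORDS: dict[int, CompanyRecord] = {}

    @app.post("/records/{record_id}", status_code=201)
    def create_record(record_id: int, record: CompanyRecord) -> CompanyRecord:
        RECORDS[record_id] = record
        return record

    @app.get("/records/{record_id}")
    def read_record(record_id: int) -> CompanyRecord:
        if record_id not in RECORDS:
            raise HTTPException(status_code=404, detail="record not found")
        return RECORDS[record_id]

    # Run locally with: uvicorn api_sketch:app --reload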

What We Offer :

  • High ownership and autonomy in shaping technical solutions and system architecture.
  • Opportunities to learn modern technologies and propose technical initiatives including GenAI-based approaches.
  • Collaborative, supportive, and growth-oriented engineering culture.
  • Exposure to a broad set of business and technical problems.
  • Structured onboarding and domain training.

Work-from-Home Requirements :

  • Business-grade computer (modern processor such as an i7 or i9, 16 GB+ RAM) with no performance bottlenecks.
  • Reliable high-speed internet for video calls and remote work.
  • Quality headphones & camera for clear audio and video.
  • Stable power supply and backup options in case of outages.
  • (ref : hirist.tech)

    Create a job alert for this search

    Data Pipeline Engineer • Delhi, IN

    Related jobs
    • Promoted
    • New!
    Fullstack Python Engineer (Cloud and AI)

    Fullstack Python Engineer (Cloud and AI)

    Luxoft Indianarela, delhi, in
    Opening roles in Bangalore, Chennai, Gurugram, Hyderabad and Noida.Must have experience of maintaining & upgrading.NET based applications on Windows OS / Azure Cloud ,Experience with the MVC.NET, S...Show moreLast updated: 9 hours ago
    • Promoted
    • New!
    ETL & Tableau Data Engineer

    ETL & Tableau Data Engineer

    Magma ConsultancyDelhi, IN
    ETL & Tableau Data Engineer (Remote | Full-Time | Join in 2 Weeks).We are a forward-thinking technology and data consulting firm dedicated to helping organizations unlock the full potential of thei...Show moreLast updated: 15 hours ago
    • Promoted
    Senior Python Data Engineer

    Senior Python Data Engineer

    iVoyantghaziabad, uttar pradesh, in
    Join a dynamic engineering team working on a high-impact tax reporting platform for the 2025 fiscal season.The core goal is to modernize and significantly accelerate the generation of Excel-based r...Show moreLast updated: 3 days ago
    • Promoted
    • New!
    Lead Data Engineer - Python & GCP || Contract Job || 8-10 Years Experience

    Lead Data Engineer - Python & GCP || Contract Job || 8-10 Years Experience

    People Prime Worldwideghaziabad, uttar pradesh, in
    Our Client is a global IT services company headquartered in Southborough, Massachusetts, USA.Founded in 1996, with a revenue of $1. B, with 35,000+ associates worldwide, specializes in digital engin...Show moreLast updated: 9 hours ago
    • Promoted
    Platform Engineer – Python / Databricks / Notebooks / Kubernetes

    Platform Engineer – Python / Databricks / Notebooks / Kubernetes

    SyntasaDelhi, IN
    Platform Engineer – Python / Databricks / Notebooks / Kubernetes.Syntasa is seeking a high-caliber and dedicated.Syntasa Technologies India Private Limited. This position offers an exciting opportuni...Show moreLast updated: 26 days ago
    • Promoted
    • New!
    AWS Data Engineer

    AWS Data Engineer

    People Prime WorldwideDelhi, IN
    Important Note (Please Read Before Applying).You have less than 8 years or more than 10 years of total experience.You do NOT have strong Python + AWS Data Engineering experience.You are NOT hands-o...Show moreLast updated: 15 hours ago
    • Promoted
    Senior Data Engineer

    Senior Data Engineer

    IntelliasDelhi, IN
    Apache Flink / Apache Spark (Streaming).Data Engineer or similar role, with hands-on expertise in large-scale, production-grade data pipelines. Kafka + Flink / Spark Streaming).Python for data engin...Show moreLast updated: 16 days ago
    • Promoted
    • New!
    AWS data engineer

    AWS data engineer

    Tata Consultancy ServicesGhaziabad, IN
    TCS is looking for AWS data engineer.Location : Kolkata, Hyderabad, Bangalore, Chennai, Pune, Gurgaon.Strong hands-on experience with AWS Data Services : . Amazon S3, Glue, Redshift, Athena, Kinesis, E...Show moreLast updated: 15 hours ago
    • Promoted
    Data Engineer - Offshore

    Data Engineer - Offshore

    iO AssociatesDelhi, India, India
    Working Pattern : 5 Working Days per Month.We are seeking an experienced Data Engineer with strong hands-on expertise in Snowflake and Python-based ETL development. The successful candidate will buil...Show moreLast updated: 6 days ago
    • Promoted
    Data Engineer

    Data Engineer

    IntraEdgeGhaziabad, IN
    We are seeking a highly skilled Data Engineer with strong experience in Python, PySpark, Snowflake, and AWS Glue to join our growing data team. You will be responsible for building scalable and reli...Show moreLast updated: 30+ days ago
    • Promoted
    Data Engineer

    Data Engineer

    Alp Consulting Ltd.Delhi, IN
    Architect and maintain our Amazon Redshift Serverless data warehouse.Design and implement ETL pipelines from operational Redshift to staging (DSA), landing (DLA), and TDW layers.Model data using st...Show moreLast updated: 30+ days ago
    • Promoted
    Data Engineer

    Data Engineer

    MastekDelhi, IN
    Deep hands-on experience with Unity Catalog — creating and managing catalogs, schemas, and tables.Experience automating data onboarding and metadata registration via Unity Catalog APIs or Databrick...Show moreLast updated: 10 days ago
    • Promoted
    Data Engineer

    Data Engineer

    Ubique SystemsDelhi, IN
    Primary skills : Python, SQL, data lakes, azure.Pipeline Development & Automation.Design, build, and maintain CI / CD pipelines to automate deployment of DQ rules and data services across environments...Show moreLast updated: 30+ days ago
    • Promoted
    Data Engineering Role

    Data Engineering Role

    100x.incDelhi, IN
    At least 3 years of professional experience in Data Engineering.Demonstrated end-to-end ownership of ETL pipelines.Deep, hands-on experience with AWS services : EC2, Athena, Lambda, and Step Functio...Show moreLast updated: 30+ days ago
    • Promoted
    AI Data Engineer

    AI Data Engineer

    Peak Trust Global Real EstateDelhi, IN
    This role requires strong technical skills across Python, automation, ML tooling, and analytical reporting.Key Responsibilities (Technical). Build automated data collection workflows using tools suc...Show moreLast updated: 3 days ago
    • Promoted
    • New!
    Data Engineer

    Data Engineer

    NexionPro ServicesDelhi, IN
    PySpark-based data processing workflows.Collaborate with data architects, analysts, and cross-functional teams to understand business requirements and translate them into technical solutions.Optimi...Show moreLast updated: 15 hours ago
    • Promoted
    Data Engineer

    Data Engineer

    HCLTechnarela, delhi, in
    We’re Expanding Our Digital & Data Talent Pool.Are you a professional ready to take on enterprise-scale transformation journey? We are on a mass hiring drive and looking for experts who can bring i...Show moreLast updated: 30+ days ago
    • Promoted
    Data Analytics Developer

    Data Analytics Developer

    Daten Technology SolutionsNoida, Uttar Pradesh, India
    Tableau, SAP BO (or any other reporting tools).Experience of any other Google services like DataStream, Cloud composer, Dataplex. Required Skills and Qualifications.Implementing Business Intelligenc...Show moreLast updated: 14 days ago