Talent.com
Freelance Deep Web Crawler Engineer (AI-Integrated Data Pipeline)
Sixteen Alpha AI • New Delhi, Delhi, India
4 days ago
Job description

About the Project

We’re developing a next-generation intelligent web crawling system capable of exploring deep and dynamic web data sources — including sites behind authentication, infinite scrolls, and JavaScript-heavy pages.

The crawler will be integrated with an AI-driven pipeline for automated data understanding, classification, and transformation.

We’re looking for a highly experienced engineer who has previously built large-scale, distributed crawling frameworks and integrated AI or NLP / LLM-based components for contextual data extraction.

Key Responsibilities

  • Design, develop, and deploy scalable deep web crawlers capable of bypassing common anti-bot mechanisms.
  • Implement AI-integrated pipelines for data processing, entity extraction, and semantic categorization.
  • Develop dynamic scraping systems for sites that rely on JavaScript, infinite scrolling, or APIs.
  • Integrate with vector databases, LLM-based data labeling, or automated content enrichment modules.
  • Optimize crawling logic for speed, reliability, and stealth across distributed environments.
  • Collaborate on data pipeline orchestration using tools like Airflow, Prefect, or custom async architectures.

Required Expertise

  • Proven experience building deep or dark web crawlers (Playwright, Scrapy, Puppeteer, or custom async frameworks).
  • Strong understanding of browser automation, session management, and anti-detection mechanisms.
  • Experience integrating AI / ML / NLP pipelines — e.g., text classification, entity recognition, or embedding-based similarity.
  • Skilled in asynchronous Python (asyncio, aiohttp, Playwright async API).
  • Familiar with database and pipeline systems — PostgreSQL, MongoDB, Elasticsearch, or similar.
  • Ability to design robust data flows that connect crawling → AI inference → storage / visualization.
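
For candidates who want a concrete picture of the crawling → AI inference → storage flow mentioned above, the following is a minimal sketch in asynchronous Python, not a description of the project's actual architecture. Everything in it is an assumption made for illustration: the `Record` type, the `crawl`, `classify`, `store`, and `run_pipeline` functions, the keyword rule standing in for a real model, and the in-memory list standing in for a database. No network requests are made.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical unit of data passed between pipeline stages."""
    url: str
    text: str
    label: str = ""


async def crawl(urls, out_queue):
    """Stage 1: produce one Record per URL (fetching is simulated)."""
    for url in urls:
        await asyncio.sleep(0)  # stand-in for an awaited network fetch
        topic = url.split("/")[-1]
        await out_queue.put(Record(url=url, text=f"sample page about {topic}"))
    await out_queue.put(None)  # sentinel: no more records


async def classify(in_queue, out_queue):
    """Stage 2: label each record (a keyword rule stands in for a model)."""
    while True:
        record = await in_queue.get()
        if record is None:
            await out_queue.put(None)  # pass the sentinel downstream
            break
        record.label = "pricing" if "pricing" in record.text else "other"
        await out_queue.put(record)


async def store(in_queue, sink):
    """Stage 3: persist records (a list stands in for a database)."""
    while True:
        record = await in_queue.get()
        if record is None:
            break
        sink.append(record)


async def run_pipeline(urls):
    """Run the three stages concurrently, connected by bounded queues."""
    crawled = asyncio.Queue(maxsize=10)
    labeled = asyncio.Queue(maxsize=10)
    sink = []
    await asyncio.gather(
        crawl(urls, crawled),
        classify(crawled, labeled),
        store(labeled, sink),
    )
    return sink


results = asyncio.run(
    run_pipeline(["https://example.com/pricing", "https://example.com/about"])
)
for record in results:
    print(record.url, record.label)
```

The bounded queues give simple backpressure: if the inference stage falls behind, the crawl stage waits instead of buffering pages without limit. In a real system each stage would likely be replaced by a browser-automation fetcher, a model call, and a database writer respectively.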

Nice to Have

  • Knowledge of LLMs (OpenAI, Hugging Face, LangChain, or custom fine-tuned models).
  • Experience with data cleaning, deduplication, and normalization pipelines.
  • Familiarity with distributed crawling frameworks (Ray, Celery, Kafka).
  • Prior experience integrating real-time analytics dashboards or monitoring tools.

What We Offer

  • Competitive freelance pay based on expertise and delivery.
  • Flexible, async-first remote collaboration.
  • Opportunity to shape an AI-first data platform from the ground up.
  • Potential for long-term partnership if the collaboration is successful.