Talent.com
Freelance Deep Web Crawler Engineer (AI-Integrated Data Pipeline)

Sixteen Alpha AI • Madurai, Republic of India, IN
6 days ago
Job description

About the Project

We’re developing a next-generation intelligent web crawling system capable of exploring deep and dynamic web data sources, including sites behind authentication, pages with infinite scroll, and JavaScript-heavy pages.

The crawler will be integrated with an AI-driven pipeline for automated data understanding, classification, and transformation.

We’re looking for a highly experienced engineer who has previously built large-scale, distributed crawling frameworks and integrated AI, NLP, or LLM-based components for contextual data extraction.

Key Responsibilities

  • Design, develop, and deploy scalable deep web crawlers capable of bypassing common anti-bot mechanisms.
  • Implement AI-integrated pipelines for data processing, entity extraction, and semantic categorization.
  • Develop dynamic scraping systems for sites that rely on JavaScript, infinite scrolling, or APIs.
  • Integrate with vector databases, LLM-based data labeling, or automated content enrichment modules.
  • Optimize crawling logic for speed, reliability, and stealth across distributed environments.
  • Collaborate on data pipeline orchestration using tools like Airflow, Prefect, or custom async architectures.

Required Expertise

  • Proven experience building deep or dark web crawlers (Playwright, Scrapy, Puppeteer, or custom async frameworks).
  • Strong understanding of browser automation, session management, and anti-detection mechanisms.
  • Experience integrating AI / ML / NLP pipelines (e.g., text classification, entity recognition, or embedding-based similarity).
  • Skilled in asynchronous Python (asyncio, aiohttp, Playwright async API).
  • Familiarity with databases and data stores such as PostgreSQL, MongoDB, Elasticsearch, or similar.
  • Ability to design robust data flows that connect crawling → AI inference → storage / visualization.

Nice to Have

  • Knowledge of LLMs (OpenAI, Hugging Face, LangChain, or custom fine-tuned models).
  • Experience with data cleaning, deduplication, and normalization pipelines.
  • Familiarity with tools for distributing crawl workloads (Ray, Celery, Kafka).
  • Prior experience integrating real-time analytics dashboards or monitoring tools.

What We Offer

  • Competitive freelance pay based on expertise and delivery.
  • Flexible, async-first remote collaboration.
  • Opportunity to shape an AI-first data platform from the ground up.
  • Potential for long-term partnership if the collaboration is successful.
