We are seeking a Junior Web Crawling Engineer who will be responsible for building and maintaining web crawlers, extracting valuable insights from the web, and ensuring data quality. The ideal candidate will have strong Python programming skills and experience in web scraping frameworks, browser automation tools, and handling anti-scraping mechanisms.
About Forage AI :
Forage AI is a pioneering AI-powered data extraction and automation company that transforms complex, unstructured web and document data into clean, structured intelligence. Our platform combines web crawling, NLP, LLMs, and agentic AI to deliver highly accurate firmographic and enterprise insights across numerous domains. Trusted by global clients in finance, real estate, and healthcare, Forage AI enables businesses to automate workflows, reduce manual rework, and access high-quality data at scale.
Key Responsibilities :
- Maintain and enhance existing web scraping and data crawling projects.
- Develop and refine crawlers using Python-based tools and frameworks.
- Utilize browser automation tools (e.g., Playwright, Selenium) to handle dynamic content.
- Clean, validate, and integrate extracted data into downstream storage systems.
- Implement and manage solutions for anti-bot measures (CAPTCHAs, IP rotation, etc.).
- Optimize crawling efficiency and ensure compliance with web crawling best practices.
- Collaborate with cross-functional teams to improve data acquisition strategies.
Required Skills & Qualifications :
- Proficiency in Python and 2 years of work experience with web scraping frameworks (especially Scrapy).
- Strong knowledge of browser automation tools such as Playwright or Selenium.
- Solid understanding of HTML, CSS, and selector languages (XPath / CSS).
- Experience in handling anti-scraping challenges and ensuring robust data extraction.
- Familiarity with distributed scraping techniques and data pipelines.
- Ability to troubleshoot and optimize web crawlers for performance and reliability.
- Strong analytical and problem-solving skills with attention to detail.
- Excellent communication and interpersonal skills.
Other Infrastructure Requirements :
Since this is a completely work-from-home position, you will also need the following:
- High-speed internet connectivity for video calls and efficient work.
- Capable business-grade computer (e.g., modern processor, 8 GB+ of RAM, and no other obstacles to uninterrupted, efficient work).
- Headphones with clear audio quality.
- Stable power connection and backups in case of internet / power failure.
Skills Required :
XPath, Scrapy, CSS, Selenium, HTML, Python