Role Overview: You will design, build, and operate software for data collection and processing at scale. The role is hands-on, with emphasis on clean design, reliability, and performance.

About Forage AI: Forage AI builds next-generation systems for data collection and processing: large-scale web crawling, document parsing, data pipelines, and automation. We work primarily in Python, leverage cloud-native designs (mainly AWS, with exposure to GCP / Azure), and increasingly apply GenAI and AI agents across our stack. Every developer owns their module and collaborates closely with peers in a high-ownership, high-trust environment.

Location: Remote (Work from Home)

Key Responsibilities:
- Develop and maintain Python applications for crawling, parsing, enrichment, and processing of large datasets.
- Build and operate data workflows (ETL / ELT), including validation, monitoring, and error handling.
- Work with SQL and NoSQL databases (plus vector databases / data lakes) for modeling, storage, and retrieval.
- Contribute to system design using cloud-native components on AWS (e.g., S3, Lambda, ECS / EKS, SQS / SNS, RDS / DynamoDB, CloudWatch).
- Implement and consume APIs / microservices; write clear contracts and documentation.
- Write unit / integration tests, perform debugging and profiling; contribute to code reviews and maintain high code quality.
- Implement observability (logging / metrics / tracing) and basic security practices (secrets, IAM, least privilege).
- Collaborate with Dev / QA / Ops; ship incrementally using PRs and design docs.

Required Qualifications:
- 2–4 years of professional software engineering experience.
- Strong proficiency in Python; good knowledge of data structures / algorithms and software design principles.
- Hands-on with SQL and at least one NoSQL store; familiarity with vector databases is a plus.
- Experience with web scraping frameworks (e.g., Scrapy, Selenium / Playwright, BeautifulSoup) and resilient crawling patterns (respecting robots.txt, rotating proxies / user agents, retrying with backoff).
- Practical understanding of system design and distributed systems basics.
- Exposure to AWS services and cloud-native design; comfortable on Linux and with Git.

Preferred / Good to Have (Prioritized):
- GenAI & LLMs: experience with LangChain, CrewAI, LlamaIndex, prompt design, RAG patterns, and vector stores. (Candidates with this experience will be prioritized.)
- CI / CD & Containers: exposure to pipelines (GitHub Actions / Jenkins), Docker, and Kubernetes.
- Data Pipelines / Big Data: ETL / ELT, Airflow, Spark, Kafka, or similar.
- Infra as Code: Terraform / CloudFormation; basic cost and performance optimization on cloud.
- Frontend / JS: not required; basic JS or frontend skills are a nice-to-have only.
- Exposure to GCP / Azure.

How We Work:
- Ownership of modules end‑to‑end (design → build → deploy → operate).
- Clear communication, collaborative problem‑solving, and documentation.
- Pragmatic engineering: small PRs, incremental delivery, and measurable reliability.

Work-from-Home Requirements:
- High‑speed internet for calls and collaboration.
- A capable, reliable computer (modern CPU, 8GB+ RAM).
- Headphones with clear audio quality.
- Stable power and backup arrangements.

Forage AI is an equal-opportunity employer. We value curiosity, craftsmanship, and collaboration.