About the job
Designation : Data & Integration Engineer (Python / TypeScript, Azure, Integrations)
Experience : 4-8 years
Location : Cochin
Job Summary :
Build data pipelines (crawling / parsing, deduplication / delta, embeddings) and connect external systems and interfaces.
Key Responsibilities :
- Development of crawling / fetch pipelines (API-first; playwright / requests only where permitted)
- Parsing / normalization of job postings & CVs, deduplication / delta logic (seen hash, repost heuristics)
- Embeddings / similarity search (controlling Azure OpenAI, vector persistence in pgvector)
- Integrations : HR4YOU (API / webhooks / CSV import), SerpAPI, BA job board, email / SMTP
- Batch / stream processing (Azure Functions / container jobs), retry / backoff, dead-letter queues
- Telemetry for data quality (freshness, duplicate rate, coverage, cost per 1,000 items)
- Collaboration with FE for exports (CSV / Excel, presigned URLs) and admin configuration
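The deduplication / delta logic named above (seen hash, repost heuristics) can be sketched minimally in Python: normalize the fields that identify a posting, then hash them, so reposts that differ only in whitespace, casing, or punctuation collapse to one key. The choice of fields (title, company, location) is illustrative, not taken from the posting.

```python
import hashlib
import re

def seen_hash(title: str, company: str, location: str) -> str:
    """Compute a stable dedup key for a job posting.

    Normalization runs before hashing so that trivial formatting
    differences between reposts do not produce new keys.
    """
    def norm(s: str) -> str:
        # Collapse punctuation / whitespace runs, lowercase
        return re.sub(r"\W+", " ", s).strip().lower()

    key = "|".join(norm(part) for part in (title, company, location))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Two postings that differ only in formatting yield the same hash
a = seen_hash("Data & Integration Engineer", "ACME GmbH", "Cochin")
b = seen_hash("data  &  integration engineer", "Acme GmbH ", "Cochin")
```

In a pipeline, the hash would be stored per posting; an incoming item whose hash already exists is treated as a seen duplicate, and anything beyond that (repost windows, fuzzy matching) layers on top.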
Must Have Requirements :
- 4+ years of backend / data engineering experience
- Python (FastAPI, pydantic, httpx / requests, Playwright / Selenium), solid TypeScript for smaller services / SDKs
- Azure : Functions / Container Apps or AKS jobs, Storage / Blob, Key Vault, Monitor / Log Analytics
- Messaging : Service Bus / Queues, idempotence & exactly-once semantics, pragmatic approach
- Databases : PostgreSQL, pgvector, query design & performance tuning
- Clean ETL / ELT patterns, testability (pytest), observability (OpenTelemetry)
Nice-to-have :
- NLP / IE experience (spaCy / regex / rapidfuzz), document parsing (pdfminer / textract)
- Experience with license / ToS-compliant data retrieval, captcha / anti-bot strategies (legally compliant)
Working method : API-first, clean code, trunk-based development, mandatory code reviews
Tools / stack : GitHub, GitHub Actions / Azure DevOps, Docker, pnpm / Turborepo (monorepo), Jira / Linear, Notion / Confluence
On-call / support : rotating, "you build it, you run it"
(ref : hirist.tech)
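The retry / backoff pattern mentioned under the responsibilities can be sketched as a small stdlib-only helper (function and parameter names are illustrative, not from the posting): exponential backoff with jitter, and the final failure re-raised so a real pipeline could route the item to a dead-letter queue.

```python
import random
import time

def with_retry(fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Call fn(), retrying on exception with exponential backoff + jitter.

    Delay doubles each attempt (capped at max_delay) with random jitter
    added to avoid thundering-herd retries. After the last attempt the
    exception propagates, which is where dead-lettering would happen.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))

# Usage: a flaky fetch that succeeds on the third call
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient upstream error")
    return "ok"

result = with_retry(flaky_fetch, base_delay=0.01)
```

Because retries mean a call may run more than once, the consumer side still needs to be idempotent (e.g. keyed upserts), which is why the posting pairs retry / backoff with idempotence and exactly-once semantics.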