Location: India (remote; Bangalore / Karnataka area preferred)
Type: Full-time contractor or employee
Urgency: Position to be filled ASAP
About the role
You will be a core member of the team building a data platform for public administration that maps economic, advertising and real-estate actors using public and open data sources (social networks, marketplaces, registries, press).
Key responsibilities
- Design, build and maintain ingestion pipelines from APIs, web scrapers and open data sources (batch & incremental / delta loads).
- Implement data workflows using Airflow (or similar) and Python / Spark for cleaning, normalization and entity resolution.
- Model and optimize datasets in PostgreSQL / PostGIS, S3-compatible object storage (MinIO) and Elasticsearch.
- Implement provenance tracking (URL, timestamp, hash) and basic quality checks (coverage, error rate, freshness).
- Work closely with Data Analysts, BI and DevOps to ensure reliable, secure and scalable data flows.
- Document data models, schemas, and pipelines for handover to local and client teams.
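The provenance-tracking responsibility above (URL, timestamp, hash) can be sketched as a small Python helper. The record shape and field names below are illustrative assumptions, not a prescribed schema:

```python
import hashlib
from datetime import datetime, timezone

def provenance_record(url: str, payload: bytes) -> dict:
    """Build a minimal provenance record for one fetched document.

    Field names are hypothetical; adapt to the platform's actual schema.
    """
    return {
        "source_url": url,
        # UTC timestamp of when the payload was fetched.
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        # Content hash lets downstream jobs detect duplicates and silent changes.
        "content_sha256": hashlib.sha256(payload).hexdigest(),
    }
```

Such a record would typically be attached to every raw document at ingestion time, so quality checks (coverage, freshness) can be computed over the provenance metadata alone.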
Must-have skills
- 3–6 years of experience as a Data Engineer.
- Strong Python (Pandas, PySpark or Spark) and SQL.
- Hands-on experience with Airflow (or a similar orchestrator) and Kafka or other streaming / queue systems.
- Experience with PostgreSQL (indexes, partitioning, query optimization).
- Good understanding of data modelling, lineage and data quality.
- Comfortable working in Linux environments, with Docker and Git.
- Experience with distributed systems and performance tuning.
Nice-to-have
- Experience with PostGIS, GeoServer / MapLibre or geo-analytics.
- Experience with OSINT / public-data / web-scraping projects (Playwright / Selenium).
- Knowledge of Neo4j or other graph databases.
- Exposure to security and compliance (data residency, GDPR-like frameworks).