At HG Insights, we provide AI-powered revenue growth intelligence and technology intelligence data to help B2B companies refine their go-to-market (GTM) strategies by analyzing market size, identifying potential customers, and uncovering in-market purchasing signals. With the recent acquisitions of MadKudu and TrustRadius, we’ve created an agentic GTM ecosystem that eliminates manual handoffs, guesswork, and siloed signals.
We’re searching for a strong Data Engineer to take this ecosystem to the next level. You’ll play a critical role in tackling high-impact projects, including large-scale data integration, improving and enriching intent signals, and building the foundational capabilities that will power our future AI initiatives. This is an opportunity to work with complex data pipelines at scale, shape the backbone of our intelligence platform, and directly influence how thousands of enterprise customers uncover growth opportunities.
What You Will Do
- Build and maintain ETL pipelines using Apache Airflow in our Kubernetes environment, processing 10M+ daily events from MongoDB to PostgreSQL and BigQuery
- Design real-time data streaming architectures using Google Cloud Pub/Sub for visitor tracking, intent signals, and company resolution workflows
- Develop ML data infrastructure for AI-powered features including vector embeddings, semantic search, and automated content generation using OpenAI and Google Vertex AI
- Create analytics and reporting systems on BigQuery for customer intent data, lead scoring models, and business intelligence dashboards
- Implement data quality and monitoring frameworks ensuring accuracy across the entire data pipeline from source to customer delivery
- Build customer data delivery systems including Snowflake integration and API endpoints for enterprise data consumption
What You Will Be Responsible For
- Data pipeline reliability with 99.9% uptime for critical business intelligence feeding customer-facing intent data products
- Data accuracy and consistency across MongoDB → PostgreSQL → BigQuery → Snowflake transformation chains
- ML infrastructure performance supporting real-time vector search, embedding generation, and AI model inference
- Scalable data architecture handling exponential growth in visitor data and intent signals
- Customer data SLAs ensuring timely delivery of intent data to enterprise customers with contractual requirements
- Building dashboards and alerting for management of these systems
What You Will Need
- 9+ years of data engineering experience with Python, SQL, and distributed data processing frameworks
- Strong ETL/ELT experience with Apache Airflow, dbt, or similar orchestration tools
- Cloud data platform expertise with BigQuery, Cloud Storage, and data warehouse optimization
- Real-time streaming experience with Pub/Sub, Kafka, or similar event-driven systems
- Database proficiency across PostgreSQL, MongoDB, and modern data warehouse technologies
Nice to Have
- ML/AI pipeline experience with vector databases (Pinecone), embedding models, and LLM integration
- Intent data and lead scoring domain knowledge in B2B marketing technology
- Apache Beam or Dataflow for large-scale data processing
- Customer analytics and behavioral data modeling experience