Job Description:
Function: Data Science and Analysis - Data Science/Machine Learning
Skills: LLMs, NLP, Machine Learning, Generative AI, Python
As a Senior ML Engineer at Demandbase, you'll play a strategic role in building cutting-edge, production-grade machine learning systems that drive deep technographic intelligence and high-impact business decisions. The role goes beyond conventional ML development: you will architect and scale LLM-powered solutions that detect, classify, and map technographic signals from diverse sources (e.g., LinkedIn, SSL certificates, blogs, and subdomains) to Demandbase's structured product catalog. You'll combine traditional ML, deep learning, and large language models (LLMs) such as LLaMA-3, Gemma, Mistral, and GPT to transform unstructured signals into actionable intelligence, pushing the boundaries of entity resolution, product discovery, and dynamic catalog enrichment.
Responsibilities:
- Build scalable, production-ready hybrid ML + LLM systems for:
  1. Technographic signal extraction.
  2. Entity resolution from noisy data (e.g., LinkedIn job descriptions, SSL certificates).
  3. Product and category mapping via instruction-tuned LLMs and vector DBs.
- Use foundation models such as LLaMA and Gemma for deep contextual understanding, enabling semantic inference of products, categories, and subcategories.
- Leverage RAG pipelines and prompt engineering to improve product detection and catalog alignment with minimal labeled data.
- Translate complex, ambiguous business needs into ML and LLM solution frameworks.
- Lead the model lifecycle, from data ingestion and feature design through model development (ML/LLM) and fine-tuning to deployment and monitoring.
- Implement human-in-the-loop systems and catalog-aware feedback loops for continuous catalog enrichment and model refinement.
- Conduct multi-modal experiments that combine traditional ML and transformer-based LLMs in hybrid architectures.
- Optimize model performance using best-in-class techniques for:
  1. Few-shot and zero-shot learning, and approximate nearest neighbor (ANN) search.
  2. LLM-based reranking and metadata generation.
- Automate categorization of unknown or new tools (e.g., Turso, Appsmith) using LLM-based generalization and taxonomy mapping.
- Ensure robust, production-grade model deployment using tools such as:
  1. Ray, Dask, and DeepSpeed (for scalable inference).
  2. MLflow, Airflow, Feast, and Kubeflow (for lifecycle management and monitoring).
  3. Vector DBs (e.g., FAISS, Weaviate) for embedding-based matching.
- Integrate LLM-based services into high-throughput pipelines, ensuring low latency, scalability, and fault tolerance.
- Stay up to date with LLM advancements and experiment with new architectures (e.g., RAG, LLM agents, Toolformer-style models).
- Contribute to Demandbase's AI strategy by identifying opportunities where LLMs can create product and customer impact.
- Prototype and evangelize LLM-based reasoning and categorization across product teams.
- Guide and mentor junior ML engineers and data scientists on LLM-centric design patterns, fine-tuning strategies, and deployment frameworks.
- Foster collaboration with product, engineering, and analytics teams to deliver unified, data-driven solutions.
Requirements:
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- 8-12 years in data science/ML, including at least 2 years on LLM applications or GenAI projects.
- Demonstrated ability to design and productionize scalable ML systems that incorporate both deep learning and foundation models.
- LLM expertise: familiarity with LLaMA, Gemma, GPT-3.5/4, and Mistral, and with instruction tuning or prompt engineering.
- ML stack: TensorFlow, PyTorch, scikit-learn, Pandas, NumPy.
- Cloud & MLOps: AWS/GCP, Docker, Kubernetes, MLflow, Airflow, CI/CD.
- Data handling: SQL, Spark, Dask, feature stores (e.g., Feast).
- Vector matching: FAISS, Weaviate, and embedding-based retrieval methods.
- Structured thinking and strong communication skills to influence cross-functional stakeholders.
- Curiosity and initiative to push boundaries using LLMs in innovative ways.
- Experience with entity resolution, technographic detection, or catalog generation.
- Hands-on experience with retrieval-augmented generation (RAG) pipelines and vector databases.
- Background in graph-based inference or taxonomic classification using LLMs.
(ref: hirist.tech)