At RemoteStar, we're currently hiring for one of our clients, based in Spain.
About the client:
A well-funded, fast-growing deep-tech company founded in 2019. They are the biggest quantum software company in the EU and one of the 100 most promising AI companies in the world (according to CB Insights, 2023), with 150+ employees and growing, fully multicultural and international.
They provide hyper-efficient software to companies seeking to gain an edge with quantum computing and artificial intelligence. Their main products, Singularity and CompactifAI, address critical needs across various industries.
VISA WILL BE PROVIDED BY THE CLIENT
Joining and Retention bonuses will be provided.
Required Qualifications
- Master’s or Ph.D. in Computer Science, AI, Data Science, Physics, Math, or a related field, or equivalent industry experience.
- 4+ years of experience in data science, machine learning, or related roles, with demonstrated experience with NLP or LLMs.
- In-depth knowledge of large foundational model architectures (language and multimodal models) and their lifecycle: training, fine-tuning, alignment, and evaluation.
- Proficient in Python and data tooling ecosystems (Pandas, NumPy, Hugging Face Datasets & Transformers libraries).
- Hands-on experience with text data collection from diverse sources: web scraping, APIs, proprietary corpora, etc.
- Strong understanding of data quality metrics including bias detection, toxicity, and readability.
- Experience working in large, shared distributed computing environments, and familiarity with relevant tools for hardware optimization (vLLM, TensorRT, NeMo, etc.).
- Experience with version control (git), unit testing, and other fundamental aspects of software development.
- Effective communication and interpersonal abilities.
Preferred Qualifications
- Experience building or contributing to datasets used in LLM pretraining or supervised fine-tuning.
- Experience building foundational LLMs from the ground up.
- Familiarity with alignment techniques (e.g., reinforcement learning, preference modeling, reward modeling).
- Exposure to multilingual and low-resource language datasets.
- Contributions to open-source datasets, tools, or publications in dataset-centric research.
- Knowledge of ethical AI, data governance, privacy laws (e.g., GDPR), and responsible data use.
- Familiarity with the software development lifecycle and agile methodologies.
As a Senior LLM Engineer, you will
- Design and implement strategies for creating, sourcing, and augmenting datasets tailored for LLM training and fine-tuning.
- Develop scalable pipelines to collect, clean, filter, annotate, and validate large volumes of text data, ensuring quality, ethical compliance, etc.
- Collaborate with ML engineers, researchers, and software engineers to achieve ambitious goals in the preparation of LLMs and complementary work (preparing datasets, model evaluation, model serving, etc.).
- Develop and integrate new routines for modifying and enhancing LLMs and extending their functionality.
- Make effective use of distributed compute resources and clusters (GPUs), and identify opportunities for further optimization.
- Prepare compressed and specialized LLMs end to end for use in production.
- Keep up to date with research trends in LLM foundation models, dataset curation, LLM pretraining data, and benchmarking.
- Contribute to building documentation, development standards, and a healthy shared code base.
- Mentor other engineers and share knowledge of cutting-edge techniques.
We offer
- Two unique bonuses: a signing bonus at incorporation and a retention bonus at contract completion.
- Relocation package (if applicable).
- Up to 9-month contract, ending in June 2026.
- Hybrid role and flexible working hours.
- Be part of a fast-scaling Series B company at the forefront of deep tech.
- Equal pay guaranteed.
- International exposure in a multicultural, cutting-edge environment.