Senior Data Scientist, Shiprocket
Job Overview :
Shiprocket is looking for a highly skilled and experienced Senior Data Scientist to join our dynamic team. As a Senior Data Scientist, you will play a critical role in leveraging data to drive insights and solutions that enhance our logistics platform. You will be responsible for leading data-driven projects, developing predictive models, and working closely with cross-functional teams to optimize operations and improve customer experiences.
In this role, you will also build and scale large machine learning systems, work on GenAI applications including LLMs and RAG pipelines, and lead efforts in model fine-tuning (LoRA, QLoRA, PEFT), MLOps productionization, and vector database integration for real-time applications.
Responsibilities :
- Lead Data Science Projects : Oversee the end-to-end execution of data science projects, from data collection and cleaning to model development, validation, and deployment.
- Predictive Modeling : Develop and implement advanced predictive models to solve complex business problems and drive strategic decision-making.
- Data Analysis : Conduct deep-dive analyses to uncover actionable insights and trends that inform business strategies and operations.
- Collaboration : Work closely with product managers, engineers, and other stakeholders to integrate data science solutions into our products and services.
- Innovation : Stay abreast of the latest developments in data science and machine learning, and apply innovative techniques to improve our data capabilities.
- Mentorship : Mentor junior data scientists and data analysts, providing guidance and support to help them grow their skills and contribute effectively to the team.
- Optimization : Continuously monitor and optimize models and algorithms to ensure they remain effective and relevant in a changing business environment.
- ML at Scale : Design and implement large-scale distributed ML systems, including parallel training / inference pipelines across millions of users and transactions.
- LLMs & RAG Pipelines : Build and deploy Retrieval-Augmented Generation pipelines using large language models with custom embedding and retrieval strategies.
- Model Fine-Tuning : Apply techniques such as LoRA, QLoRA, and PEFT for adapting foundation models to domain-specific tasks (e.g., address parsing, fraud scoring).
- Vector Databases : Integrate and optimize vector DBs like FAISS, pgvector, or Milvus for semantic search, retrieval, and matching in LLM workflows.
- MLOps Productionization : Own end-to-end deployment, monitoring, and lifecycle management of ML models using tools like SageMaker, Docker, Airflow, MLflow, or Kubeflow.
Skills and Qualifications :
Education :
- Bachelor's, Master's, or Ph.D. in Data Science, Computer Science, Statistics, Mathematics, or a related field.
Experience :
- Minimum of 3 years of experience in data science, with a proven track record of leading successful data-driven projects (4-8 years preferred).
Technical Skills :
- Proficiency in programming languages such as Python, Shell Scripting, and SQL.
- Strong experience with machine learning frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
- Experience with big data technologies such as Spark and Hadoop is a plus.
- Experience with AWS and cloud-based ML deployment solutions (e.g., SageMaker, Batch, Lambda).
- Expertise in building and managing end-to-end ML pipelines and ETL processes.
- Experience with large language models (LLMs) and embeddings for downstream applications.
- Experience in RAG architecture : chunking, vectorization, retrieval, prompt orchestration.
- Familiarity with vector search engines like FAISS, pgvector, or Pinecone.
- Hands-on experience with fine-tuning techniques : LoRA, QLoRA, PEFT, quantization, and distillation.
- Understanding of model observability, drift detection, model versioning, and CI / CD for ML.
Adaptability :
- Ability to work in a fast-paced, dynamic environment and manage multiple projects. (ref : hirist.tech)