We are seeking a motivated and results-driven Junior Data Scientist with a passion for problem-solving, data analytics, and natural language processing (NLP). In this role, you will leverage your skills in Python, SQL, machine learning, and NLP to develop data-driven solutions that inform business decisions and insights. You will be part of a dynamic team and work closely with senior data scientists and business stakeholders to create and optimize models, perform data analysis, and automate processes.
As a Junior Data Scientist, you will have the opportunity to work on a variety of projects, ranging from predictive modelling to text analysis and cloud-based data solutions. You should have a solid foundation in logical thinking, problem-solving, and the ability to apply your knowledge to real-world challenges.
Roles & Responsibilities
Work independently to manage multiple projects at once while ensuring deadlines are met and data output is accurate and appropriate for the business.
Implement newer algorithms according to business needs.
Collaborate with cross-functional agile teams of software engineers, product managers, and others to build new product features.
Think strategically about data as a core enterprise asset and assist in all phases of the advanced analytic development process.
Key Technical Skills:
Programming:
Strong proficiency in Python, with experience in libraries like pandas, numpy, scikit-learn, networkx, and TensorFlow for data manipulation and machine learning tasks.
Proficient in building and deploying high-performance APIs using FastAPI and Python, ensuring fast response times and scalability.
Familiarity with Linux/Unix/shell environments.
Experience with Data Structures and Algorithms.
Machine Learning:
Hands-on experience with supervised learning (e.g., regression, classification), unsupervised learning (e.g., clustering, PCA), and ensemble methods (e.g., random forests, gradient boosting).
Familiarity with model evaluation metrics (e.g., accuracy, precision, recall, F1-score, ROC-AUC) and techniques for hyperparameter tuning (e.g., GridSearchCV, RandomizedSearchCV).
Natural Language Processing (NLP):
Understanding of key NLP tasks like text classification, sentiment analysis, named entity recognition (NER), and text summarization using traditional techniques (e.g., TF-IDF, bag-of-words) and pre-trained models.
Experience using LLMs such as GPT, BERT, and T5 for text generation, question answering, and document classification, with libraries like Hugging Face Transformers.
Ability to fine-tune pre-trained LLMs on domain-specific data and evaluate model performance using metrics like accuracy, F1-score, BLEU score, and perplexity, optimizing models for production use.
SQL & Database Management:
Strong SQL skills for querying relational databases (e.g., PostgreSQL, MySQL) and managing large datasets.
Experience in writing complex queries for data extraction, transformation, and aggregation.
Azure Cloud Services:
Familiarity with Azure Databricks, Azure Blob Storage, and Azure SQL Database for data processing and storage.
Basic knowledge of Azure Data Factory for building and orchestrating data pipelines.
Data Visualization:
Experience creating visualizations with matplotlib, seaborn, and plotly for data exploration and model results presentation.
Familiarity with dashboarding tools like Power BI or Tableau (optional).
Git:
Proficient in creating and managing feature branches, resolving merge conflicts, and merging code into the main branch using git merge and git rebase.
Experienced with pull requests (PRs), reviewing and commenting on code, and working within Git workflows (e.g., GitFlow or feature branching) to ensure smooth team collaboration.
Required Qualifications:
A bachelor’s or master’s degree in computer science, statistics, mathematics, or a related field.
3-5 years of professional experience as a Data Scientist, working with machine learning, data analysis, NLP, and cloud technologies.
Strong communication skills, with the ability to explain complex data science concepts to non-technical stakeholders.
Ability to work independently and as part of a team in a fast-paced environment.
Nice-to-Have Skills:
Experience with model deployment and CI/CD pipelines.
Familiarity with big data technologies such as Spark or Hadoop.
Experience with Docker for containerization and Kubernetes for orchestrating model deployments.
Data Scientist • Pushkar, Rajasthan, India