We are seeking a motivated and results-driven Junior Data Scientist with a passion for problem-solving, data analytics, and natural language processing (NLP). In this role, you will use your skills in Python, SQL, machine learning, and NLP to develop data-driven solutions that inform business decisions and insights. You will be part of a dynamic team and work closely with senior data scientists and business stakeholders to create and optimize models, perform data analysis, and automate processes.
As a Junior Data Scientist, you will have the opportunity to work on a variety of projects, ranging from predictive modelling to text analysis and cloud-based data solutions. You should have a solid foundation in logical thinking, problem-solving, and the ability to apply your knowledge to real-world challenges.
Roles & Responsibilities
- Work independently to manage multiple projects at once while ensuring deadlines are met and data output is accurate and appropriate for the business.
- Implement newer algorithms according to business needs.
- Collaborate with cross-functional agile teams of software engineers, product managers, and others to build new product features.
- Think strategically about data as a core enterprise asset and assist in all phases of the advanced analytic development process.
Key Technical Skills:
Programming:
- Strong proficiency in Python, with experience in libraries like pandas, NumPy, scikit-learn, NetworkX, and TensorFlow for data manipulation and machine learning tasks.
- Proficient in building and deploying high-performance APIs using FastAPI and Python, ensuring fast response times and scalability.
- Familiarity with Linux/Unix/shell environments.
- Experience with data structures and algorithms.
Machine Learning:
- Hands-on experience with supervised learning (e.g., regression, classification), unsupervised learning (e.g., clustering, PCA), and ensemble methods (e.g., random forests, gradient boosting).
- Familiarity with model evaluation metrics (e.g., accuracy, precision, recall, F1-score, ROC-AUC) and techniques for hyperparameter tuning (e.g., GridSearchCV, RandomizedSearchCV).
Natural Language Processing (NLP):
- Understanding of key NLP tasks like text classification, sentiment analysis, named entity recognition (NER), and text summarization using traditional techniques (e.g., TF-IDF, bag-of-words) and pre-trained models.
- Experience using LLMs such as GPT, BERT, and T5 for text generation, question answering, and document classification, using libraries like Hugging Face Transformers.
- Ability to fine-tune pre-trained LLMs on domain-specific data, evaluate model performance using metrics like accuracy, F1-score, BLEU score, and perplexity, and optimize models for production use.
SQL & Database Management:
- Strong SQL skills for querying relational databases (e.g., PostgreSQL, MySQL) and managing large datasets.
- Experience in writing complex queries for data extraction, transformation, and aggregation.
Azure Cloud Services:
- Familiarity with Azure Databricks, Azure Blob Storage, and Azure SQL Database for data processing and storage.
- Basic knowledge of Azure Data Factory for building and orchestrating data pipelines.
Data Visualization:
- Experience creating visualizations with Matplotlib, seaborn, and Plotly for data exploration and presentation of model results.
- Familiarity with dashboarding tools like Power BI or Tableau (optional).
Git:
- Proficient in creating and managing feature branches, resolving merge conflicts, and integrating code into the main branch using git merge and git rebase.
- Experienced with pull requests (PRs), reviewing and commenting on code, and working within Git workflows (e.g., GitFlow or feature branching) to ensure smooth team collaboration.
Required Qualifications:
- A bachelor’s or master’s degree in computer science, statistics, mathematics, or a related field.
- 3-5 years of professional experience as a Data Scientist, working with machine learning, data analysis, NLP, and cloud technologies.
- Strong communication skills, with the ability to explain complex data science concepts to non-technical stakeholders.
- Ability to work independently and as part of a team in a fast-paced environment.
Nice-to-Have Skills:
- Experience with model deployment and CI/CD pipelines.
- Familiarity with big data technologies such as Spark or Hadoop.
- Experience with Docker for containerization and Kubernetes for model orchestration.