Responsibilities:
- Develop and implement machine learning models and algorithms for classification, regression, clustering, recommendation, and more.
- Build and maintain data pipelines for training and inference workflows.
- Collaborate with data scientists, product managers, and software engineers to integrate AI models into production systems.
- Optimize model performance and scalability for real-time and batch processing.
- Conduct experiments, evaluate model performance, and iterate based on results.
- Stay up to date with the latest research and advancements in AI/ML and apply them to practical use cases.
- Document code, processes, and model behavior for reproducibility and compliance.
Basic Requirements:
1. Programming Languages:
- Python: core language for AI/ML development, with proficiency in libraries such as:
  - NumPy and Pandas for data manipulation
  - Matplotlib, Seaborn, and Plotly for data visualization
  - Scikit-learn for classical ML algorithms
- Familiarity with R, Java, or C++ is a plus, especially for performance-critical applications.
2. Machine Learning & Deep Learning Frameworks:
Experience building models using the following:
- TensorFlow and Keras for deep learning.
- PyTorch for research-grade and production-ready models.
- XGBoost, LightGBM, or CatBoost for gradient boosting.
- Understanding of model training, validation, hyperparameter tuning, and evaluation metrics (e.g., ROC-AUC, F1-score, precision/recall).
3. Natural Language Processing (NLP):
Familiarity with:
- Text preprocessing (tokenization, stemming, lemmatization).
- Vectorization techniques (TF-IDF, Word2Vec, GloVe).
- Transformer-based models (BERT, GPT, T5) using Hugging Face Transformers.
- Experience with text classification, named entity recognition (NER), question answering, or chatbot development.
4. Computer Vision (CV):
Experience with:
- Image classification, object detection, and segmentation.
- Libraries such as OpenCV, Pillow, and Albumentations.
- Pretrained models (e.g., ResNet, YOLO, EfficientNet) and transfer learning.
5. Data Engineering & Pipelines:
- Ability to build and manage data ingestion and preprocessing pipelines.
- Tools: Apache Airflow, Luigi, Pandas, Dask.
- Experience with structured (CSV, SQL) and unstructured (text, images, audio) data.
6. Model Deployment & MLOps:
Experience deploying models as:
- REST APIs using Flask, FastAPI, or Django.
- Batch jobs or real-time inference services.
Familiarity with:
- Docker for containerization.
- Kubernetes for orchestration.
- MLflow, Kubeflow, or SageMaker for model tracking and lifecycle management.
7. Cloud Platforms:
Hands-on experience with at least one cloud provider:
- AWS (S3, EC2, SageMaker, Lambda).
- Google Cloud (Vertex AI, BigQuery, Cloud Functions).
- Azure (Machine Learning Studio, Blob Storage).
- Understanding of cloud storage, compute services, and cost optimization.
8. Databases & Data Access:
Proficiency in:
- SQL for querying relational databases (e.g., PostgreSQL, MySQL).
- NoSQL databases (e.g., MongoDB, Cassandra).
- Big data tools such as Apache Spark, Hadoop, or Databricks are a plus.
9. Version Control & Collaboration:
- Experience with Git and platforms like GitHub, GitLab, or Bitbucket.
- Familiarity with Agile/Scrum methodologies and tools like JIRA, Trello, or Asana.
10. Testing & Debugging:
- Writing unit tests and integration tests for ML code.
- Using tools like pytest, unittest, and debuggers to ensure code reliability.
(ref: hirist.tech)