Role Overview :
We are seeking a highly skilled Software Engineer specializing in Large Language Models (LLMs) to design, develop, and deploy cutting-edge AI solutions leveraging state-of-the-art transformer architectures.
- The ideal candidate will have strong expertise in deep learning, NLP, and model optimization, combined with software engineering best practices for building scalable AI systems in production.
- You'll collaborate with data scientists, ML engineers, and product teams to build intelligent applications powered by advanced generative AI models such as GPT, LLaMA, Falcon, Mistral, Claude, or similar open-source and proprietary models.
Key Responsibilities :
- Design, train, fine-tune, and evaluate Large Language Models (LLMs) for specific use cases (e.g., summarization, code generation, chatbots, reasoning, and retrieval-augmented generation).
- Experiment with transformer-based architectures (e.g., GPT, T5, BERT, LLaMA, Mistral).
- Develop parameter-efficient fine-tuning (PEFT) strategies such as LoRA, QLoRA, adapters, or prompt-tuning.
- Create and maintain high-quality datasets for pretraining, fine-tuning, and evaluation.
- Optimize model inference using techniques like quantization, distillation, and tensor parallelism for real-time or edge deployment.
- Integrate LLMs into production environments using frameworks like Hugging Face Transformers, PyTorch Lightning, or DeepSpeed.
- Implement scalable model serving solutions using FastAPI, Ray Serve, Triton Inference Server, or similar frameworks.
- Build and maintain APIs or SDKs that expose LLM capabilities to other teams and products.
- Evaluate and experiment with open-source and proprietary foundation models.
- Keep up with the latest trends in Generative AI, NLP, and transformer models.
- Perform benchmarking, ablation studies, and A/B testing to measure performance, cost, and quality improvements.
- Collaborate with MLOps and DevOps teams to design CI/CD pipelines for model training and deployment.
- Manage and optimize GPU/TPU clusters for distributed training and inference.
- Implement robust monitoring, logging, and alerting for deployed AI systems.
- Ensure software follows clean code principles, version control, and proper documentation.
- Partner with product managers, data scientists, and UX teams to identify business problems and translate them into AI-driven solutions.
- Contribute to internal research initiatives and help shape the company's AI strategy.
- Mentor junior engineers in AI model development, coding standards, and best practices.
Required Technical Skills :
Core Expertise :
- Strong proficiency in Python and deep learning frameworks (PyTorch, TensorFlow, JAX).
- Hands-on experience with transformer architectures and LLM fine-tuning.
- Deep understanding of tokenization, attention mechanisms, embeddings, and sequence modeling.
- Experience with Hugging Face Transformers, LangChain, LlamaIndex, or the OpenAI API.
- Experience deploying models using Docker, Kubernetes, or cloud ML services (AWS SageMaker, GCP Vertex AI, Azure ML, OCI Data Science).
- Familiarity with model optimization techniques (quantization, pruning, distillation).
- Knowledge of retrieval-augmented generation (RAG) pipelines and vector databases (FAISS, Pinecone, Weaviate, Chroma).
Additional Skills (Good to Have) :
- Experience with multi-modal models (text + image, text + code).
- Familiarity with MLOps tools such as MLflow, Kubeflow, or Weights & Biases (W&B).
- Understanding of Responsible AI practices: bias mitigation, data privacy, and model explainability.
- Experience contributing to open-source AI projects.
(ref : hirist.tech)
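For candidates unfamiliar with the parameter-efficient fine-tuning (LoRA) technique mentioned above: the core idea is to freeze a large weight matrix W and train only a low-rank update, applying W + (alpha / r) * (B @ A) at inference. The following is a minimal pure-Python sketch of that arithmetic, not any particular library's API; function names such as `lora_effective_weight` are hypothetical, and real work would use PyTorch with the `peft` library.

```python
# LoRA idea: instead of updating the full d_out x d_in weight W, train two
# small matrices B (d_out x r) and A (r x d_in) with rank r << d_out, d_in,
# then use the merged weight W + (alpha / r) * (B @ A).

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def trainable_params(d_out, d_in, r):
    """Trainable parameter counts: full fine-tuning vs. LoRA."""
    return d_out * d_in, r * (d_out + d_in)

# Toy example: a 4x4 frozen identity weight with a rank-1 update.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [0.0], [0.0]]   # d_out x r
A = [[0.0, 2.0, 0.0, 0.0]]         # r x d_in
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)

# At a transformer-like layer size, LoRA trains far fewer parameters.
full, lora = trainable_params(4096, 4096, r=8)
```

The parameter-count comparison is the practical point: for a 4096x4096 projection, rank-8 LoRA trains roughly 256x fewer parameters than full fine-tuning, which is why it appears alongside QLoRA and adapters in the responsibilities above.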