We're seeking an exceptional AI Engineer with deep expertise in TensorFlow model training to design and build next-generation AI systems. This role focuses on developing sophisticated machine learning models, particularly Large Language Models and NLP solutions, while leveraging AWS cloud infrastructure for scalable deployment.
Key Responsibilities:
- Design and architect enterprise-scale AI/ML solutions with emphasis on custom model development and training
- Build, train, and optimize deep learning models using TensorFlow and TensorFlow Extended (TFX)
- Develop and fine-tune Large Language Models for domain-specific applications
- Implement advanced NLP pipelines including text classification, named entity recognition, sentiment analysis, and language generation
- Lead model training infrastructure design, including distributed training strategies and GPU optimization
- Deploy and manage ML models on AWS SageMaker and AWS Bedrock platforms
- Establish MLOps practices for model versioning, experiment tracking, and continuous training
- Optimize model architectures for performance, accuracy, and computational efficiency
- Conduct thorough model evaluation, validation, and performance benchmarking
- Collaborate with data engineering teams to build robust training data pipelines
- Mentor ML engineers and data scientists on TensorFlow best practices and model training techniques
Required Qualifications:
- 3+ years of hands-on experience in machine learning engineering and AI architecture
- Expert-level proficiency in TensorFlow 2.x for model development and training
- Deep understanding of neural network architectures (Transformers, CNNs, RNNs, attention mechanisms)
- Proven track record of training large-scale models, including LLMs
- Strong expertise in Natural Language Processing and modern NLP techniques
- Extensive experience with AWS cloud services, particularly SageMaker and Bedrock
- Solid understanding of training optimization techniques (learning rate scheduling, regularization, gradient accumulation)
- Experience with distributed training frameworks and multi-GPU/TPU training
- Strong Python programming skills and experience with NumPy, Pandas, and scikit-learn
- Knowledge of model compression techniques (quantization, pruning, distillation)

Preferred Skills:
- Experience with Hugging Face Transformers, LangChain, or similar LLM frameworks
- Familiarity with PyTorch or JAX in addition to TensorFlow
- Knowledge of reinforcement learning from human feedback (RLHF) techniques
- Experience with vector databases (Pinecone, Weaviate, ChromaDB) for RAG applications
- Understanding of prompt engineering and few-shot learning strategies
- Experience with Kubernetes and containerization (Docker) for ML workloads
- Publications or contributions to open-source ML projects

Technical Skills:
- Frameworks: TensorFlow, Keras, TensorFlow Serving, TFX
- Cloud: AWS SageMaker, AWS Bedrock, EC2, S3, Lambda
- Languages: Python, SQL
- MLOps: MLflow, Weights & Biases, Kubeflow
- Tools: Jupyter, Git, Docker, TensorBoard