Position Overview :
We are seeking a highly driven Software Development Engineer II (SDE II) with a strong foundation in machine learning systems engineering and scalable software development.
This role is ideal for someone who is deeply curious, thrives on solving complex challenges at scale, and enjoys building robust Gen AI-driven ML systems.
The selected candidate will play a key role in designing, deploying, and maintaining production-grade ML infrastructure, with a particular focus on Transformer-based large language models (LLMs) and agentic AI architectures.
This is a core engineering role that blends large-scale software design with modern machine learning deployment practices.
Key Responsibilities :
1. ML System Engineering and Development :
- Architect, design, and implement scalable and efficient machine learning systems and APIs for production use.
- Build high-performance ML services capable of handling large data volumes and concurrent user access.
- Implement monitoring and observability tools to ensure reliability and performance of deployed ML models.
2. Model Deployment and Integration :
- Deploy, scale, and maintain Transformer-based LLMs and Gen AI models in real-world environments.
- Optimize deployment workflows using containerization (Docker, Kubernetes) and continuous integration pipelines.
- Ensure end-to-end reproducibility, versioning, and lifecycle management of ML artifacts.
3. Infrastructure and Data Pipelines :
- Design and manage data ingestion and transformation pipelines for ML applications.
- Collaborate with infrastructure and DevOps teams to build fault-tolerant, distributed ML systems.
- Leverage cloud-native services (AWS / GCP / Azure) for compute, storage, and orchestration of ML workloads.
4. Collaboration and System Optimization :
- Partner with research scientists and data engineers to turn ML research into robust, production-ready engineering deliverables.
- Continuously profile, benchmark, and optimize system throughput and latency.
- Contribute to best practices in software engineering, testing, and documentation for ML infrastructure.
5. Technical Documentation and Governance :
- Maintain detailed documentation of ML services, APIs, and deployment architectures.
- Support model auditability, explainability, and compliance with data governance standards.
Required Skills and Experience :
- 3+ years of hands-on experience designing and implementing complex, multi-component software systems.
- Proficiency in Python and strong knowledge of software engineering principles (modular design, code optimization, testing).
- Solid understanding of SQL and database schema design for structured and unstructured data.
- Familiarity with ML model lifecycle management (training, evaluation, deployment, monitoring).
- Working understanding of Transformer / LLM architectures and NLP-based ML systems.
- Experience building or integrating agentic AI systems using frameworks such as LangGraph, LangChain, or similar.
- Strong communication, analytical thinking, and problem-solving abilities.
- Proven experience in end-to-end product delivery or managing technology-driven solutions.
Preferred / Bonus Qualifications :
- Experience deploying high-availability web or ML applications in production.
- Hands-on exposure to cloud infrastructure (AWS / GCP / Azure), including GPU-based compute instances.
- Proficiency in containerization and orchestration tools such as Docker, Kubernetes, and Helm.
- Familiarity with distributed systems and parallel processing frameworks.
- Knowledge of CI / CD pipelines, API gateways, and microservices-based architecture.
- Experience deploying open-source LLMs (e.g., Llama, Mistral, Falcon) in production.
- Prior experience working in a start-up or fast-paced innovation environment.
Educational Background :
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
- Additional certifications in ML Engineering, Cloud Computing, or DevOps will be advantageous.
(ref : hirist.tech)