- Optimize the design practices for building AI / ML models
- Performance test and tune the AI / ML models and any runtime components
- Implement environments in our global data centers
- Operationalize data center failover processes for inference engines
- Integrate with corporate security systems, SOC, NOC, etc. for monitoring and alerting
- Define deployment best practices and automate the promotion of code / models (DevOps processes)
- Manage, maintain, refresh and upgrade environmental components (including patching, etc.)
- Engineer and operationalize the AI / ML API environments
- Support models built in Python and R
- Define the architecture and select appropriate runtime components to host AI / ML models
- Support the Data Science team with data movement, data transformation, and analysis
- Capacity management and planning of the environment
- Build API interfaces for AI / ML models
Skills Required:
- Nice to have experience with GPU-based compute for AI / ML learning and inference engines
- Nice to have experience programming in the R language
- Experience with deployment and management of Windows and Unix VMs
- Some understanding of AI / ML approaches
- Must understand package management for various environments (Python, R, and Hadoop environments)
- Must have experience with administration of Hadoop ecosystem – specifically HDFS, Hive, and Spark
- Nice to have – Site Reliability Engineering experience (1+ years)
- Nice to have – experience deploying and managing containers
- Must have experience scripting in a Unix shell and / or PowerShell
- Must have experience programming with Python
- Nice to have experience implementing and managing Kafka
- Nice to have experience with Notebooks for data analytics