Optimize the design practices for building AI / ML models
Performance test and tune the AI / ML models and any runtime components
Implement environments in our global data centers
Operationalize data center failover processes for inference engines
Integrate with corporate security systems, SOC, NOC, etc. for monitoring and alerting
Define deployment best practices and automate the promotion of code / models (DevOps processes)
Manage, maintain, refresh and upgrade environmental components (including patching, etc.)
Engineer and operationalize the AI / ML API environments
Support models built in Python and R
Define architecture and select appropriate runtime components to host AI / ML models
Support the Data Science team with data movement, data transformation, analysis, etc.
Manage and plan capacity for the environment
Build API interfaces for AI / ML models
Skills Required:
Nice to have experience with GPU-based compute for AI / ML learning and inference engines
Nice to have experience programming in the R language
Experience with deployment and management of Windows and Unix VMs
Some understanding of AI / ML approaches
Must understand package management for various environments (Python, R, and Hadoop)
Must have experience with administration of the Hadoop ecosystem – specifically HDFS, Hive, and Spark
Nice to have – Site Reliability Engineering experience (1+ years)
Nice to have – experience deploying and managing containers
Must have experience with shell scripting and / or PowerShell
Must have experience programming with Python
Nice to have experience implementing and managing Kafka
Nice to have experience with Notebooks for data analytics