The Life Cycle of a Machine Learning (ML) Model Project with MLflow
Working in data science can be particularly challenging when it comes to moving Machine Learning (ML) code into actual production, that is, deployment. Data scientists are usually well versed in creating or choosing the best model to solve a particular ML problem.
Still, packaging, deploying, and maintaining these models alongside the production team is an entirely different ball game. The solution to this problem comes in the shape of machine learning operations (MLOps).
MLOps is an engineering practice based on DevOps principles to increase the efficiency of workflows. One may consider MLOps as an assembly line for Machine Learning (ML). Thus, MLOps creates a link between data scientists and the production team.
This article will cover the following topics: what MLOps is, its evolution, and why it is needed; the comparison between MLOps and DevOps; the ML workflow; and what MLflow is, its components, and how it helps.
What is MLOps?
MLOps, in brief, can be defined as “the ability to apply DevOps principles to Machine Learning applications,” or a set of best practices used to automate the end-to-end ML life cycle resulting in Continuous Integration and Continuous Delivery (CI/CD). It is a concept that refers to the merger of long-established DevOps methods with the growing science of machine learning.
MLOps includes model development, data management, code generation, model training and retraining, deployment, and model monitoring. By integrating DevOps’s principles into a machine learning project, MLOps offers a smarter development life cycle and the flexibility to adjust to ever-changing business needs.
Where DevOps is a proven practice on the software development side, MLOps extends the same idea: multiple business units collaborate with data science teams to ensure that everybody is on the same page regarding how production ML models are created, deployed, and maintained.
The Evolution of MLOps
The evolution of MLOps can be divided into the following phases:
Pre-Historic (Proprietary Tools)
Before the rise of open-source tools, businesses using machine learning employed proprietary tools like SAS, SPSS, and FICO to perform modeling.
Stone Age (The Rise of Open Source)
In the early 2000s, we saw the emergence of open-source data science tools such as Python with the SciPy stack, scikit-learn, TensorFlow, etc., and R with dplyr, ggplot2, etc.
Bronze Age (Containerized Deployment)
From 2015 to 2018 (the Bronze Age), open-source model deployment was containerized, making the previous approach easy to scale.
Golden Age (ML as a Platform)
Businesses have now learned that making money from novel ML training alone is hard, as thousands of providers offer the same service, and many educational institutions do such work and give the results away for free. The value instead lies in building a "platform" that:
· Dockerizes open-source ML stacks
· Deploys them on-premises or in the cloud via Kubernetes
A summary of the evolution timeline of MLOps is represented in figure 1.
Need for MLOps
A good chunk of data science teams' time is spent just on deploying ML models, whereas MLOps makes the same task much easier. MLOps generates long-term value while reducing the risk linked with ML projects.
MLOps helps organizations streamline the process. Automation with MLOps shortens the time-to-market and decreases costs. In short, we need MLOps as it provides:
a. Long-term value
b. Reduced cost
c. A streamlined process
Comparison Between MLOps and DevOps
Instilling a project with the above benefits of MLOps can demand a huge number of resources. Fortunately, tools with application programming interfaces (APIs) such as MLflow are available to help us achieve this.
What is MLflow?
MLflow is one such tool that integrates the principles of MLOps into a project with minimal changes to the existing code. MLflow's underlying philosophy is to put as few restrictions on your workflow as possible: it is designed to work with any machine learning library, determine most things about the code by convention, and require negligible changes to integrate into a current codebase.
At the same time, MLflow intends to take any codebase written in its format and make it reproducible and reusable by many data scientists.
ML Workflow and How MLflow Helps
ML projects require data scientists to experiment with a wide range of datasets and algorithms to build models that maximize a specific target metric.
Still, once scientists have built a model, they also need to deploy it in a production setting, monitor the model’s performance over a period of time, and retrain it if any degradation is observed. Thus, working with ML can be challenging for reasons including but not limited to:
Tracking: When working on a personal setup, it can become difficult to track which data and which parameters were used to produce a particular result.
Reproducibility: It can be challenging for different teams of data scientists working on a project to reproduce code that worked the first time on your laptop. The same problem may occur when running the same code at a larger scale on a different platform.
Model Packaging: Different teams have different approaches to model packaging, which may cause problems when collaborating.
Model Version Management: During the lifecycle of the ML project, multiple models are created. There is a lack of a central place to manage model versions.
The above-mentioned problems can be tackled using individual or multiple ML libraries. MLflow permits a user to train, deploy, and reuse ML models built with multiple libraries, and other scientists can later use such a model as a "black box."
The platform helps sort these issues using its components, which are discussed below.
Components of MLflow
MLflow offers four components for ML workflow management. These four components can be visualized as shown in figure 2.
MLflow Tracking is an API and user interface for logging parameters, metrics, code versions, and other aspects when working on ML projects. The tracking can be used in any environment to log results. For example, an entire team may use it to compare results from different users.
The component works around the concept of runs (executions of data science code). Each run records information such as the code version, start and end time, source, parameters, metrics, and output artifacts.
MLflow Projects are a standard format for packaging data science code so that it can be reused. A project is simply a directory of code or a Git repository.
Each project uses a descriptor file to define how to run the project (e.g., the project may contain a conda.yaml file to specify the Conda environment). A project can specify properties such as its name, entry points, and software environment.
MLflow Project works well with MLflow tracking; MLflow will automatically detect the project version and other parameters when used together. Thus, one can run existing MLflow Projects from GitHub and combine them in their workflow.
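As a sketch, an MLproject descriptor at the root of the project directory might look like this; the `train.py` script and its `alpha` parameter are hypothetical.

```yaml
# MLproject -- descriptor at the root of the project directory.
name: demo-project

conda_env: conda.yaml        # environment the project runs in

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha}"
```

With such a descriptor in place, `mlflow run .` (or `mlflow run` against a Git URL) resolves the environment and entry point automatically.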
MLflow Models are a format for packaging a trained model so that it can be deployed in a variety of environments. A model is saved as a directory containing arbitrary files plus a descriptor file that lists all the "flavors" the model can be deployed in.
Aside from the flavors field, the MLflow model YAML format may contain fields such as the creation time and the ID of the run that produced the model.
MLflow Model Registry
MLflow Model Registry presents a centralized model store, with APIs and a UI, to manage the full life cycle of an ML model. The Model Registry introduces concepts such as registered models, model versions, and stages that define and assist during the entire lifecycle of an MLflow Model.
To summarize, we have discussed the life cycle of an ML model project, the potential problems that may arise at different project stages, how the philosophy of MLOps can help us solve such problems, and how MLflow as a tool can be of use.
Top digital transformation companies often onboard MLOps experts experienced in understanding and managing the MLflow lifecycle: building and running models with ease, tracking experiments, packaging code environments, and so on.
Hassan Sherwani is the Head of Data Analytics and Data Science at Royal Cyber. He holds a PhD in IT and data analytics and has acquired a decade's worth of experience in the IT industry, startups, and academia. Hassan has also gained hands-on experience in machine (deep) learning for the energy, retail, banking, law, telecom, and automotive sectors as part of his professional development endeavors.