Chapter 4 MLflow

MLflow is an open source platform to manage the machine learning lifecycle end-to-end. This includes the experimentation phase of ML models, their development to be reproducible, their deployment, and the registration of a ML model to be served. MLflow provides four primary components to manage the ML lifecycle. They can be either used on their own or they also to work together.

  • MLflow Tracking is used to log and compare model parameters, code versions, metrics, and artifacts of an ML code. Results can be stored to local files or to remote servers, and can be compared over multiple runs. MLflow Tracking comes with an API and a web interface to easily observe all logged parameters and artifacts.
  • MLflow Models enables to manage and deploy machine learning models from multiple libraries. It allows to package your own ML model for later use in downstream tasks, e.g. real-time serving through a REST API. The package format defines a convention that saves the model in different “flavors” that can be interpreted by different downstream tools.
  • MLflow Registry provides a central model store to collaboratively manage the full lifecycle of a MLflow Model, including model versioning, stage transitions, and annotations. It comes with an API and user interface for easy use of such funtionalities and each of those aspects can be checked in MLflows’ web interface.
  • MLflow Projects packages data science code in a standard format for a reusable, and reproducible form to share your code with other data scientists or transfer it to production. A project might be a local directory or a Git repository which uses a descriptor file to specify its dependencies and entrypoints. An existing MLflow Project can be also run either locally or from a Git repository.

MLflow is library-agnostic, which means one can use it with any ML library and programming language. All functions are accessible through a REST API and CLI, but quite conveniently the project comes with a Python API, R API, and a Java API already included. It is even possible to define your own plugins. The aim is to make its use as reproducible and reusable as possible so Data Scientists require minimal changes to integrate MLflow into their existing codebase. MLflow also comes with a user web interface to conveniently view and compare models and metrics.

Web Interface of MLflow

Prerequisites

To go through this chapter it is necessary to have Python and MLflow installed. One can install MLflow locally via pip install MLflow. This tutorial is based on MLflow v2.1.1. It is also recommended to have knowledge of VirtualEnv, Conda, or Docker when working with MLflow Projects.