5 Minute Tracking Server Overview
In this guide we will walk you through how to view your MLflow experiment results with different types of tracking server configurations. At a high level, there are 3 ways to view your MLflow experiments:
- [Method 1] Start your own MLflow server.
- [Method 2] Use a free hosted tracking server - Databricks Free Trial.
- [Method 3] Use production Databricks/AzureML.
To choose among these 3 methods, here is our recommendation:
- If you have privacy concerns (data/model/tech stack), use Method 1 - start your own server.
- If you are a student or an individual researcher, or if you are developing in cloud-based notebooks (e.g., Google Colab), use Method 2 - free hosted tracking server.
- If you are an enterprise user, or if you want to serve or deploy your model for a production use case, use Method 3 - production Databricks/AzureML.
Overall, Method 2 (the free hosted tracking server) is the simplest way to get started with MLflow, but please pick the method that best suits your needs.
Method 1: Start Your Own MLflow Server
Disclaimer: This part of the guide is not suitable for running in a cloud-provided IPython environment (e.g., Colab, Databricks). Please follow the guide below on your local machine (laptop/desktop).
A hosted tracking server is the simplest way to store and view MLflow experiments, but it is not suitable for every user. For example, you may not want to expose your data and model to others in your cloud provider account. In this case, you can use a locally hosted MLflow server to store and view your experiments. To do so, there are two steps:
- Start your MLflow server.
- Connect your MLflow session to the local MLflow server by calling mlflow.set_tracking_uri().
Start a Local MLflow Server
If you don't have MLflow installed, please run the command below to install it:
$ pip install mlflow
The installation of MLflow includes the MLflow CLI tool, so you can start a local MLflow server with UI by running the command below in your terminal:
$ mlflow ui
It will generate logs with the IP address, for example:
(mlflow) [master][~/Documents/mlflow_team/mlflow]$ mlflow ui
[2023-10-25 19:39:12 -0700] [50239] [INFO] Starting gunicorn 20.1.0
[2023-10-25 19:39:12 -0700] [50239] [INFO] Listening at: http://127.0.0.1:5000 (50239)
Opening the URL of the MLflow tracking server in your browser will bring you to the MLflow UI. The image below is from the open source version of the MLflow UI, which is a bit different from the MLflow UI on Databricks Workspaces. Below is a screenshot of the landing page:
It's also possible to deploy your own MLflow server on cloud platforms, but that is outside the scope of this guide.
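Before pointing a script at the server, it can help to confirm that it is reachable. The sketch below uses only the Python standard library and assumes the server answers a GET on `/health` with HTTP 200, as recent MLflow releases do. The helper name `is_server_up` and the 2-second timeout are illustrative choices, not part of MLflow.

```python
import urllib.request


def is_server_up(base_url: str = "http://127.0.0.1:5000", timeout: float = 2.0) -> bool:
    """Return True if a GET on <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    # Bypass any system-wide HTTP proxy, since the server runs on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, and non-2xx responses.
        return False
```

Calling `is_server_up()` with no arguments checks the default address printed by `mlflow ui`; pass a different `base_url` if your server listens elsewhere.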
Connect MLflow Session to Your Server
Now that the server is spun up, let's connect our MLflow session to the local server. This is very similar to how we connect to a remote hosted tracking provider such as the Databricks platform.
mlflow.set_tracking_uri("http://localhost:5000")
Next, let's try logging some dummy metrics. We can view these test metrics in the locally hosted UI:
mlflow.set_experiment("check-localhost-connection")
with mlflow.start_run():
    mlflow.log_metric("foo", 1)
    mlflow.log_metric("bar", 2)
Putting it together, you can copy the following code into your editor and save it as log_mlflow_with_localhost.py:
import mlflow

# Point this session at the local tracking server started with `mlflow ui`.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("check-localhost-connection")

# Log two dummy metrics inside a single run.
with mlflow.start_run():
    mlflow.log_metric("foo", 1)
    mlflow.log_metric("bar", 2)
Then run it with:
$ python log_mlflow_with_localhost.py
View Experiment on Your MLflow Server
Now let's view your experiment on the local server. Open the server URL in your browser, which is http://localhost:5000 in our case. In the left sidebar of the UI, you should see the experiment named "check-localhost-connection". Clicking on this experiment name should bring you to the experiment view, similar to what is shown below.
Clicking on the run ("clumsy-steed-426" in this example; yours will be different) will bring you to the run view, similar to the one below.
Conclusion
That's all about how to start your own MLflow server and view your experiments. Please see the pros and cons of this method below:
- Pros
  - You have full control of your data and model, which is good for privacy concerns.
  - No subscription is required.
  - Unlimited quota of experiments/runs.
  - You can even customize your UI by forking the MLflow repo and modifying the UI code.
- Cons
  - Requires manual setup and maintenance.
  - Team collaboration is harder than with a hosted tracking server.
  - Not suitable for cloud-based notebooks, e.g., Google Colab.
  - Requires extra port forwarding if you deploy your server on a cloud VM.
  - No serving support.
Method 2: Use Free Hosted Tracking Server (Databricks Free Trial)
The Databricks Free Trial offers an opportunity to experience nearly the full functionality of the Databricks platform, including managed MLflow. You can use a Databricks Workspace to store and view your MLflow experiments without being charged during the free trial period. Refer to the instructions in Try Managed MLflow for how to use the Databricks Free Trial to store and view your MLflow experiments.
Conclusion
The pros and cons of this method are summarized as follows:
- Pros
  - Effortless setup.
  - Free within the trial's credit and time limits.
  - Good for collaboration, e.g., you can share your MLflow experiment with your teammates easily.
  - Compatible with development in cloud-based notebooks, e.g., Google Colab.
  - Compatible with development on a cloud VM.
- Cons
  - Has a quota limit and a time limit.
Method 3: Use Production Hosted Tracking Server
If you are an enterprise user and want to productionize your model, you can use a production platform like Databricks or Microsoft AzureML. If you use Databricks, your MLflow experiment will log your model to the Databricks MLflow server, and you can register your model and then serve it with a few clicks.
Using production Databricks works the same way as using the Databricks Free Trial; you only need to change the host to your production workspace, for example, https://dbc-1234567-123.cloud.databricks.com.
For more information about how Databricks powers your machine learning workflow, please refer to the doc here.
To use AzureML as the tracking server, please read the doc here.
Conclusion
That's all about how to use a production platform as the tracking server. Please see the pros and cons of this method below:
- Pros
  - Effortless setup.
  - Good for collaboration, e.g., you can share your MLflow experiment with your teammates easily.
  - Compatible with development in cloud-based notebooks, e.g., Google Colab.
  - Compatible with development on a cloud VM.
  - Seamless model registration/serving support.
  - Higher quota than the Databricks Free Trial (pay as you go).
- Cons
  - Not free.
  - Requires managing a billing account.