Whitepaper
LLM fine-tuning with MLflow: a practical guide
Table of contents
Introduction
Conclusion
References
Introduction
Large Language Models (LLMs) have revolutionized how we use AI by making text generation and interpretation easier than ever. They are no longer just for experts: now anyone can create AI tools for problem-solving or boosting productivity.
MLflow: Brief product overview
MLflow is a popular open-source tool for experiment tracking and simplified
model management, enabling you to streamline and enhance your
MLOps pipeline. Many ML teams around the world use MLflow in various
configurations to develop AI models and deploy them to production. Below
are three core advantages this tool offers to ML engineers and data scientists:
Today, there are many MLOps platforms and tools that offer functionality
similar to MLflow. Some are available as software-as-a-service solutions,
while others come as on-premises installations. As an open-source tool,
MLflow remains an affordable MLOps solution, allowing small and medium
ML teams to compete effectively in the global AI race. However, its
open-source distribution model does come with certain limitations, such
as a lack of stability guarantees and added complexity in installation
and maintenance.
LLM fine-tuning for GenAI applications
Let’s imagine your ML team received a task to develop a chatbot
for the upcoming GenAI feature in your SaaS product. According
to the initial specification, the foundation of this service should
be one of the existing LLMs, like Llama 3.
The diagram below shows four main stages your team will likely
go through to accomplish this goal.
You may notice that the data preparation stage is not included
in the diagram. This is a large and complex activity that we have
intentionally chosen to omit to keep the focus on the fine-tuning
part of the process.
Stage 1. Model selection
At this stage, the goal is to choose the pre-trained LLM best suited to your needs. First, define and formulate the technical requirements, considering the parameters and conditions of your business use case. Then, shortlist the most suitable models and conduct several evaluation runs to compare their quality.
Performance-consumption tradeoff
Bigger LLMs usually deliver better answer accuracy, but their inference requires more compute resources, significantly impacting the project’s unit economics.
Evaluation complexity
Selecting an LLM means weighing multiple parameters across different dimensions. Without a systematic approach, this can become confusing.
License limitations
Different models carry different license and usage restrictions. Carefully investigate the applicable scope for the selected model.
Step 1: Define model requirements
Before selecting a base model, clearly define your task and specific
requirements:
Figure 2. MLflow can visually show the difference between models in tabular format or as a diagram.
(Source: DSC 2024 Tutorial RAG Evaluation with MLflow, GitHub)
Log metrics and parameters about the model, data and resource consumption
Use MLflow Metrics Tracking functionality to track all relevant information about the model selection process. Use mlflow.autolog() to start MLflow tracking, and use manual logging functions to track custom metrics and parameters.
Evaluate the LLM with built-in and custom metrics
Use MLflow’s built-in metrics for quantitative and qualitative model measurement. Implement custom metrics using MLflow’s flexible metric logging system, with both heuristic-based metrics and LLM-as-a-Judge metrics.
Design complexity
Changes in some parameters of the design process can cause unpredictable
(and sometimes undetectable) outcomes for the final fine-tuning strategy.
Reproducibility
Even minor differences in how you run your training can lead to different
outcomes, making it difficult to ensure consistency across iterations.
Collaboration complexity
Having multiple ML engineers work on the same project can turn the fine-tuning design into a poorly organized and overly complicated process.
Scalability limitations
Practices valid for small projects may not translate effectively as the scale
of your efforts increases.
Step 4: Define project objectives
Figure 3. MLflow can automatically track dozens of LLM fine-tuning parameters and metrics.
Step 6: Define dataset strategy
• Select appropriate training data sources.
• Set up experiment tracking.
Good practices
• Test the data processing pipeline.
• Document resource consumption for each run.
Helpful MLflow practices
Centralize experiment tracking
Configure MLflow’s tracking URI to centralize experiment logging across teams (e.g., mlflow.set_tracking_uri() for remote tracking).
Ensure reproducibility
Log models with MLflow Tracking APIs to automatically infer required dependencies for the model flavor.
Stage 3. Training and evaluation
This stage involves training the model based on the designed fine-tuning
strategy. Typically, this stage includes running several training iterations
and evaluating the resulting metrics. As the most resource-intensive part
of the workflow, it requires careful planning to minimize errors and ensure
optimal results. The outcome of this stage is a fine-tuned LLM evaluated
against the initial requirements.
Hardware failures
Hardware issues, connection losses or resource limits can disrupt the training
process. Without proper checkpointing and recovery systems, hours of costly
progress can be lost.
Hyperparameter optimization
Finding the right combination of learning rate, batch size and other
parameters is particularly challenging with LLMs. Each test run is expensive
and time-consuming, making traditional optimization approaches impractical.
Step 9: Prepare data
Step 11: Evaluate the LLM
Helpful MLflow practices
Enable real-time metric logging
Use mlflow.autolog() to capture training progress metrics automatically.
Stage 4. Model management and deployment
This stage focuses on preparing the fine-tuned model for production
deployment and ensuring it can deliver business value efficiently.
Additionally, this stage includes key procedures and routines
for maintaining a streamlined and scalable delivery environment.
Step 13: Manage environment and model
Figure 6. MLflow model is a standardized format for packaging that contains all metadata about the model and dependencies.
Step 15: Control model versions
Figure 7. Model Registry helps version models and manage lifecycle with aliases.
Helpful MLflow practices
Automate model and artifact logging
Use MLflow’s automatic logging features to log models and artifacts alongside experiment runs.
Use aliases for model lifecycle management
Deploy and organize models with aliases and tags. Set up automated promotion workflows.
Conclusion
In this white paper, we explored the standard steps involved in fine-tuning
an LLM for a generative AI application. We also highlighted the value that
MLflow brings to every step of this process.
Key takeaways:
• MLflow gives you various options to organize the data about your
runs, experiments and results in the most convenient way.
• MLflow collects and stores model development
metadata in a standardized way, ensuring
the reproducibility of runs and experiments.
We hope this starter guide proves useful for ML teams and individuals customizing existing models to extract additional value. MLflow’s robust functionality is more than enough to support training and fine-tuning for ML teams of any scale.
Managed MLflow in Nebius AI Cloud
Considering how useful and convenient MLflow is for ML teams, we decided to launch it in our cloud as a fully managed solution.
From the user’s perspective, this means you don’t have to worry about
software version control, updates or server maintenance. Nebius handles
all necessary compute and supporting services to ensure MLflow runs
seamlessly and is available out of the box.
References
• MLflow documentation: Tutorials and Use Case Guides for GenAI applications in MLflow
© 2025 Nebius B.V.