
Lesson 5. Model training
Pretraining is very expensive! Please check costs carefully before starting a pretraining project.

You can get a rough estimate of your training job's cost using this calculator from Hugging Face. For
training on other infrastructure, e.g. AWS or Google Cloud, please consult those providers for up-to-date
cost estimates.
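
As a quick order-of-magnitude check before you consult a calculator, a common rule of thumb is that training a dense transformer takes roughly 6 × (number of parameters) × (number of training tokens) FLOPs. The sketch below applies that rule; the token count, sustained GPU throughput, and hourly price are illustrative assumptions you should replace with your own numbers.

# Rough back-of-envelope pretraining cost estimate (order of magnitude only).
n_params = 308e6                        # model size: ~308M parameters (TinySolar-308m)
n_tokens = 10e9                         # assumed number of training tokens
train_flops = 6 * n_params * n_tokens   # ~6*N*D rule of thumb for dense transformers

gpu_flops_sustained = 150e12            # assumed sustained bf16 throughput per GPU, in FLOP/s
gpu_hourly_price = 2.50                 # assumed price in $/GPU-hour

gpu_hours = train_flops / gpu_flops_sustained / 3600
print(f"~{gpu_hours:,.0f} GPU-hours, roughly ${gpu_hours * gpu_hourly_price:,.0f}")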

from google.colab import drive


drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

import warnings
warnings.filterwarnings('ignore')
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
device = "cuda"

1. Load the model to be trained


Load the upscaled model from the previous lesson:

import torch
from transformers import AutoModelForCausalLM

pretrained_model = AutoModelForCausalLM.from_pretrained(
    "./drive/MyDrive/TinySolar-308m-4k-init",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    use_cache=False,
)

# Ready for pretraining!


print(pretrained_model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 1024)
    (layers): ModuleList(
      (0-15): 16 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (k_proj): Linear(in_features=1024, out_features=256, bias=False)
          (v_proj): Linear(in_features=1024, out_features=256, bias=False)
          (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=1024, out_features=4096, bias=False)
          (up_proj): Linear(in_features=1024, out_features=4096, bias=False)
          (down_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((1024,), eps=1e-06)
        (post_attention_layernorm): LlamaRMSNorm((1024,), eps=1e-06)
      )
    )
    (norm): LlamaRMSNorm((1024,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1024, out_features=32000, bias=False)
)
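
As an optional sanity check (not in the video), you can count the parameters to confirm this is the roughly 308M-parameter upscaled model:

# Count parameters to confirm the upscaled model size (~308M)
n_params = sum(p.numel() for p in pretrained_model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")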

2. Load dataset


Here you'll define a custom Dataset class, implementing the __len__ and __getitem__ methods so it can
interface with the trainer. It will load the dataset you created in Lesson 3 when you specify it as the
training data in the next section.

Note that the code has additional comment strings that don't appear in the video. These are to help
you understand what each part of the code is doing.

!pip install datasets==2.16.1

Requirement already satisfied: datasets==2.16.1 in /usr/local/lib/python3.11/dist-packages

import datasets
from torch.utils.data import Dataset
import torch

class CustomDataset(Dataset):
    def __init__(self, args, split="train"):
        self.args = args
        # Load the packed pretraining data you saved in Lesson 3
        self.dataset = datasets.load_dataset(
            "parquet",
            data_files=args.dataset_name,
            split=split
        )

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Convert the token id lists to LongTensors for PyTorch
        input_ids = torch.LongTensor(self.dataset[index]["input_ids"])
        labels = torch.LongTensor(self.dataset[index]["input_ids"])
        # NOTE: labels are set equal to input_ids because the model is trained
        # on next-token prediction (the trainer shifts the labels internally)

        # Return the sample as a dictionary
        return {"input_ids": input_ids, "labels": labels}

3. Configure Training Arguments


Here you set up the training run. The training dataset you created in Lesson 3 is specified in the
Dataset configuration section.
Note: there are comment strings in the cell below that don't appear in the video. These have been
included to help you understand what each parameter does.


from dataclasses import dataclass, field

import transformers

@dataclass
class CustomArguments(transformers.TrainingArguments):
    # Dataset configuration
    dataset_name: str = field(
        default="packaged_pretrain_dataset.parquet")
    num_proc: int = field(default=1)                      # Number of subprocesses for data preprocessing
    max_seq_length: int = field(default=32)               # Maximum sequence length

    # Core training configurations
    seed: int = field(default=0)                          # Random seed for initialization, ensuring reproducibility
    optim: str = field(default="adamw_torch")             # Optimizer, here AdamW implemented in PyTorch
    max_steps: int = field(default=30)                    # Maximum number of training steps
    per_device_train_batch_size: int = field(default=2)   # Batch size per device during training

    # Other training configurations
    learning_rate: float = field(default=5e-5)            # Initial learning rate for the optimizer
    weight_decay: float = field(default=0)                # Weight decay
    warmup_steps: int = field(default=10)                 # Number of steps for the learning rate warmup
    lr_scheduler_type: str = field(default="linear")      # Type of learning rate scheduler
    gradient_checkpointing: bool = field(default=True)    # Enable gradient checkpointing to save memory
    dataloader_num_workers: int = field(default=0)        # Number of subprocesses for data loading
    bf16: bool = field(default=True)                      # Use bfloat16 precision for training
    gradient_accumulation_steps: int = field(default=1)   # Number of steps to accumulate gradients

    # Logging configuration
    logging_steps: int = field(default=3)                 # Frequency of logging training information
    report_to: str = field(default="none")                # Destination for logging (e.g., WandB, TensorBoard)

    # Saving configuration
    save_strategy: str = field(default="steps")           # Can be replaced with "epoch"
    save_steps: int = field(default=3)                    # Frequency of saving training checkpoints
    save_total_limit: int = field(default=2)              # Total number of checkpoints to keep

Parse the custom arguments and set the output directory where the model will be saved:

parser = transformers.HfArgumentParser(CustomArguments)
args, = parser.parse_args_into_dataclasses(
    args=["--output_dir", "output"]
)

Set up the training dataset:

train_dataset = CustomDataset(args=args)

Check the shape of the dataset:

print("Input shape: ", train_dataset[0]['input_ids'].shape)

Input shape: torch.Size([32])
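
As another optional sanity check (not in the video), you can decode the first sample back to text. This assumes the local copy of the Solar tokenizer that is loaded later in this notebook; adjust the path if yours differs.

from transformers import AutoTokenizer

check_tokenizer = AutoTokenizer.from_pretrained(
    "./drive/MyDrive/TinySolar-308m-4k-init"   # assumed local tokenizer path
)
print(check_tokenizer.decode(train_dataset[0]["input_ids"]))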

4. Run the trainer and monitor the loss


First, set up a callback to log the loss values during training (note this cell is not shown in the
video):

from transformers import Trainer, TrainingArguments, TrainerCallback

# Define a custom callback to log the loss values
class LossLoggingCallback(TrainerCallback):
    def __init__(self):
        self.logs = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None:
            self.logs.append(logs)

# Initialize the callback
loss_logging_callback = LossLoggingCallback()

Then, create an instance of the Hugging Face Trainer object from the transformers library. Call
the train() method of the trainer to start the training run:

trainer = Trainer(
    model=pretrained_model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=None,
    callbacks=[loss_logging_callback],
)

trainer.train()
[30/30 02:54, Epoch 0/1]

Step    Training Loss
   3    4.519700
   6    4.468100
   9    4.413700
  12    4.804900
  15    4.517900
  18    4.849500
  21    3.763700
  24    4.778400
  27    4.008900
  30    4.191900

TrainOutput(global_step=30, training_loss=4.431664880116781, metrics={'train_runtime': 174.5637,
'train_samples_per_second': 0.344, 'train_steps_per_second': 0.172, 'total_flos': 3180342804480.0,
'train_loss': 4.431664880116781, 'epoch': ...})
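
If you'd like to visualize the losses collected by the callback, a minimal plotting sketch is shown below. It assumes matplotlib is available (as it is by default in Colab) and that one loss entry is logged every logging_steps steps.

import matplotlib.pyplot as plt

# Keep only entries that contain a training loss (the final entry holds summary metrics)
losses = [log["loss"] for log in loss_logging_callback.logs if "loss" in log]
steps = [args.logging_steps * (i + 1) for i in range(len(losses))]

plt.plot(steps, losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Training loss")
plt.show()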

You can use the settings below (part of the CustomArguments class above) to save intermediate model checkpoints in your own training run:

# Saving configuration
# save_strategy: str = field(default="steps")    # Can be replaced with "epoch"
# save_steps: int = field(default=3)             # Frequency of saving training checkpoints
# save_total_limit: int = field(default=2)       # Total number of checkpoints to keep
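
If a training run is interrupted, the Hugging Face Trainer can resume from the most recent checkpoint saved in the output directory. A brief optional example:

# Resume from the latest checkpoint found in args.output_dir
trainer.train(resume_from_checkpoint=True)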

Checking the performance of an intermediate checkpoint


Below, you can try generating text using an intermediate checkpoint of the model, in this case the
checkpoint saved at step 30 of the training run above. As you did in previous lessons, you'll use the
Solar tokenizer and then set up a TextStreamer object to display the text as it is generated:

from transformers import AutoTokenizer, TextStreamer


model_name_or_path = "./drive/MyDrive/TinySolar-308m-4k-init"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM


import torch

model_name_or_path = "./output/checkpoint-30"
model2 = AutoModelForCausalLM.from_pretrained(
model_name_or_path,
device_map="auto",
torch_dtype=torch.bfloat16,
)

prompt = "I am an engineer. I love"

inputs = tokenizer(prompt, return_tensors="pt").to(model2.device)

streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True
)

outputs = model2.generate(
    **inputs,
    streamer=streamer,
    use_cache=True,
    max_new_tokens=64,
    do_sample=True,
    temperature=1.0,
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


the people who go there, so it is because for me, my passion to become the next leader o
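
Because do_sample=True with temperature=1.0 produces different text on every run, you could optionally switch to greedy decoding for repeatable output when comparing checkpoints (a variant not shown in the lesson):

# Greedy decoding: deterministic output, useful for comparing checkpoints
outputs = model2.generate(
    **inputs,
    streamer=streamer,
    use_cache=True,
    max_new_tokens=64,
    do_sample=False,
)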
