GRU (Gated Recurrent Unit)
| Symbol | Meaning | Shape |
|---|---|---|
| $x_t$ | Input at the current time step | $m \times 1$ |
| $h_{t-1}$ | Hidden state from the previous time step | $n \times 1$ |
| $h_t$ | Hidden state passed to the next time step | $n \times 1$ |
| $\tilde{h}_t$ | Candidate hidden state | $n \times 1$ |
| $z_t$ | Update gate output | $n \times 1$ |
| $r_t$ | Reset gate output | $n \times 1$ |
| $\sigma$ | Sigmoid function, with range $[0,1]$ | |
| $\tanh$ | Tanh function, with range $[-1,1]$ | |
| $W_r$ | Reset gate weight matrix | $n \times (m+n)$ |
| $W_z$ | Update gate weight matrix | $n \times (m+n)$ |
| $W$ | Candidate hidden state weight matrix | $n \times (m+n)$ |
Structure
Reset gate
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]^T\right)$$
The reset gate uses the sigmoid activation to map its output into $[0,1]$. The resulting $r_t$ is then combined with $h_{t-1}$ through a Hadamard (element-wise) product, which decides how much historical information to keep. "Reset" thus means using the remembered information together with the current input to decide how much of the history to retain.
Update gate
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]^T\right)$$
The update gate decides how much historical information and how much current information are used to update the current hidden state.
Candidate hidden state
$$\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]^T\right)$$
The $r_t$ computed by the reset gate is used to forget part of the previous state $h_{t-1}$; the current input $x_t$ is then concatenated in, the result is mapped through $W$, and the tanh activation is applied.
Updating the hidden state
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
The $z_t$ computed by the update gate controls both terms: $(1 - z_t)$ is multiplied element-wise with the historical information $h_{t-1}$, deciding how much history to keep, while $z_t$ is multiplied element-wise with the candidate hidden state $\tilde{h}_t$, deciding how much current information to merge in.
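The four equations above can be traced end to end with a minimal sketch. The sizes $m$ and $n$ and the random weight matrices below are purely illustrative, and bias terms are omitted to match the formulas:

```python
import torch

m, n = 3, 4                       # illustrative input / hidden sizes
x_t = torch.randn(m, 1)           # current input, m x 1
h_prev = torch.randn(n, 1)        # previous hidden state h_{t-1}, n x 1

# Concatenated weight matrices, each n x (m + n), as in the table above
W_r = torch.randn(n, m + n)
W_z = torch.randn(n, m + n)
W = torch.randn(n, m + n)

concat = torch.cat([h_prev, x_t], dim=0)                          # [h_{t-1}, x_t], (m+n) x 1
r_t = torch.sigmoid(W_r @ concat)                                 # reset gate
z_t = torch.sigmoid(W_z @ concat)                                 # update gate
h_tilde = torch.tanh(W @ torch.cat([r_t * h_prev, x_t], dim=0))   # candidate hidden state
h_t = (1 - z_t) * h_prev + z_t * h_tilde                          # updated hidden state
print(h_t.shape)                                                  # torch.Size([4, 1])
```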
Input to the next layer
The $h_t$ produced at each time step of one layer is used as the input $x_t$ of the next layer.
Initialization of $h_0$
The $h_0$ of every layer must be initialized; if no initial state is passed, PyTorch defaults it to all zeros.
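A quick way to confirm the default (the sizes here are arbitrary): feeding the same input with `h_0` omitted and with an explicit all-zero `h_0` produces identical outputs.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=3, hidden_size=4, num_layers=2)
x = torch.randn(5, 1, 3)                     # (time_step, batch, input_size)

out_default, _ = gru(x)                      # h_0 omitted -> all zeros
out_zeros, _ = gru(x, torch.zeros(2, 1, 4))  # explicit all-zero h_0
print(torch.equal(out_default, out_zeros))   # True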
Parameters
Parameter sharing
A GRU uses the same network parameters at every time step, but each layer has its own parameters. This is what allows the network to process sequences of arbitrary length.
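This is visible in the parameter names PyTorch exposes: each layer has one set of weight matrices that is reused at every time step, and each additional layer gets its own set. A small sketch with arbitrary sizes:

```python
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=64, num_layers=2)
for name, p in gru.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (192, 10)  -- layer 0 weights, shared across all time steps
# weight_hh_l0 (192, 64)
# bias_ih_l0   (192,)
# bias_hh_l0   (192,)
# weight_ih_l1 (192, 64)  -- layer 1 has its own, separate weights
# weight_hh_l1 (192, 64)
# bias_ih_l1   (192,)
# bias_hh_l1   (192,)
```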
Number of parameters
Let the number of layers be `num_layers`. Ignoring bias terms and using the concatenated-weight notation above, the first layer has $3 \times n \times (m+n)$ parameters; every later layer takes the previous layer's hidden state (of size $n$) as its input, so each of those layers has $3 \times n \times (n+n)$ parameters. The total is therefore
$$3 \times n \times (m+n) + (\text{num\_layers} - 1) \times 3 \times n \times (n+n)$$
which reduces to $\text{num\_layers} \times 3 \times n \times (m+n)$ only when $m = n$.
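This count can be checked against PyTorch directly. The sketch below uses `bias=False` so that the bias terms (which the formula above ignores) do not appear; the sizes are arbitrary:

```python
import torch.nn as nn

m, n, num_layers = 10, 64, 2
gru = nn.GRU(input_size=m, hidden_size=n, num_layers=num_layers, bias=False)

total = sum(p.numel() for p in gru.parameters())
expected = 3 * n * (m + n) + (num_layers - 1) * 3 * n * (n + n)
print(total, expected)  # 38784 38784
```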
Mitigating the vanishing gradient problem

- The reset gate helps capture short-term dependencies in the sequence.
- The update gate helps capture long-term dependencies in the sequence.
Code
nn.GRU overview
```python
import torch.nn as nn

rnn = nn.GRU(input_size, hidden_size, num_layers, bias,
             batch_first, dropout, bidirectional)
```
- `input_size`: feature dimension $m$
- `hidden_size`: hidden-state dimension $n$
- `num_layers`: number of stacked GRU layers
- `bias`: whether the linear maps use a bias term; defaults to `True`
- `batch_first`: if `False`, the input shape must be `(time_step, batch, input_size)`; if `True`, it must be `(batch, time_step, input_size)`
- `dropout`: dropout rate applied between GRU layers; defaults to 0
- `bidirectional`: whether to use a bidirectional GRU; if `True`, the sequence is automatically fed in once in forward order and once in reverse order
```python
output, h_n = gru(input, h_0)
```
- `h_0` is the initial hidden state for the first time step; its shape must be `(num_layers * num_directions, batch, hidden_size)`.
- `output` has shape `(time_step, batch, num_directions * hidden_size)` and contains the output of every time step. Use `output.view(seq_len, batch, num_directions, hidden_size)` to split the fused dimension.
- `h_n`, the final hidden state, has shape `(num_layers * num_directions, batch, hidden_size)`. Use `h_n.view(num_layers, num_directions, batch, hidden_size)` to split the fused dimension.
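A minimal shape check for the call above, with arbitrary sizes and `batch_first` left at its default `False` (so the time dimension comes first):

```python
import torch
import torch.nn as nn

time_step, batch, input_size, hidden_size, num_layers = 5, 8, 10, 64, 2
num_directions = 2  # because bidirectional=True below

gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(time_step, batch, input_size)
h_0 = torch.zeros(num_layers * num_directions, batch, hidden_size)

output, h_n = gru(x, h_0)
print(output.shape)  # torch.Size([5, 8, 128]) = (time_step, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([4, 8, 64])  = (num_layers * num_directions, batch, hidden_size)

# Split the fused dimensions as described above
print(output.view(time_step, batch, num_directions, hidden_size).shape)
print(h_n.view(num_layers, num_directions, batch, hidden_size).shape)
```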
Complete code for numeric prediction
MSE is used as the loss, the time-series data is split into sliding windows that are fed in one step at a time, and the model is trained with the Adam optimizer.
```python
import time
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter
class GRUNet(nn.Module):
"""
A GRU model for sequence processing.
Parameters
----------
input_size : int
The number of input features (dimension of the input data).
hidden_size : int
The number of features in the hidden state of the GRU.
num_layers : int
The number of stacked GRU layers.
output_size : int
The number of output features (dimension of the final output).
bias : bool, optional, default=True
Whether to use bias in the GRU layers.
output_type : {'last', 'mean'}, optional, default='last'
Determines how to process GRU outputs:
- 'last' uses the output from the last time step.
- 'mean' uses the average of all time steps.
dropout : float, optional, default=0.2
Dropout rate applied to the GRU layers to prevent overfitting.
bidirectional : bool, optional, default=False
Whether to use a bidirectional GRU.
"""
def __init__(self, input_size, hidden_size, num_layers, output_size, bias=True, output_type='last', dropout=0.2, bidirectional=False):
super(GRUNet, self).__init__()
# Initialize the GRU layer
self.gru = nn.GRU(
input_size=input_size,
hidden_size=hidden_size,
num_layers=num_layers,
bias=bias,
batch_first=True,
dropout=dropout,
bidirectional=bidirectional
)
# Fully connected layer (adjusted for bidirectional GRU)
self.fc = nn.Linear(hidden_size * (2 if bidirectional else 1), output_size)
# Output type: 'last' for last timestep or 'mean' for the mean of all timesteps
self.output_type = output_type
def forward(self, x, h_0=None):
"""
Perform a forward pass through the GRU model.
Parameters
----------
x : torch.Tensor
            Input tensor of shape (batch_size, seq_len, input_size), since the GRU is built with batch_first=True.
h_0 : torch.Tensor, optional
            Initial hidden state of shape (num_layers * num_directions, batch_size, hidden_size).
If None, it will be initialized to zeros.
Returns
-------
torch.Tensor
The output tensor after passing through the GRU and the fully connected layer.
The shape of the output is (batch_size, output_size).
"""
# Pass input through the GRU layer
out, _ = self.gru(x, h_0)
# Process the output based on the specified output type
if self.output_type == 'last':
            out = out[:, -1, :]  # Use the output from the last time step (batch_first=True)
elif self.output_type == 'mean':
            out = out.mean(dim=1)  # Compute the mean over all time steps (batch_first=True)
# Pass the processed output through the fully connected layer
out = self.fc(out)
return out
def initialize_model(config, device):
"""
Initialize the model, loss function, and optimizer.
Parameters
----------
config : dict
Hyperparameter dictionary containing settings such as input_size, hidden_size, etc.
device : torch.device
Device to run the model on ('cuda' or 'cpu').
Returns
-------
model : GRUNet
The initialized GRU model.
criterion : nn.Module
The loss function (Mean Squared Error in this case).
optimizer : torch.optim.Optimizer
The optimizer (Adam in this case).
"""
# Initialize the GRU model
model = GRUNet(
input_size=config['input_size'],
hidden_size=config['hidden_size'],
num_layers=config['num_layers'],
output_size=config['output_size'],
bias=config["bias"],
output_type=config['output_type'],
dropout=config['dropout'],
bidirectional=config['bidirectional']
).to(device)
# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'])
return model, criterion, optimizer
def train(model, train_loader, criterion, optimizer, num_epochs, device, writer):
"""
Train the GRU model.
Parameters
----------
model : nn.Module
The GRU model to train.
train_loader : torch.utils.data.DataLoader
DataLoader for training data.
criterion : nn.Module
The loss function.
optimizer : torch.optim.Optimizer
The optimizer used for training.
num_epochs : int
Number of epochs to train the model.
device : torch.device
Device to run the model on ('cuda' or 'cpu').
writer : SummaryWriter
TensorBoard writer to log the training process.
Notes
-----
This function logs the following to TensorBoard:
- Loss at each training step.
- Average loss per epoch.
- Gradients of all model parameters.
"""
global_step = 0
for epoch in range(num_epochs):
model.train()
epoch_loss = 0
start_time = time.time()
for _, (inputs, targets) in enumerate(train_loader):
inputs, targets = inputs.to(device), targets.to(device)
# Forward pass
outputs = model(inputs)
loss = criterion(outputs, targets)
# Backward pass
            optimizer.zero_grad()
loss.backward()
optimizer.step()
# Accumulate loss for the epoch
epoch_loss += loss.item()
# Log the loss for the current step
global_step += 1
writer.add_scalar(f'Loss/train', loss.item(), global_step)
# Log gradients for all parameters every step
for name, param in model.named_parameters():
if param.grad is not None:
writer.add_histogram(f'Gradients/{name}', param.grad, global_step)
# Log the average loss for the epoch
avg_epoch_loss = epoch_loss / len(train_loader)
writer.add_scalar(f'Loss/epoch', avg_epoch_loss, epoch)
# Print results for the epoch
print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {avg_epoch_loss:.4f}, Time: {time.time() - start_time:.2f}s")
# Log the gradients of parameters after the epoch
for name, param in model.named_parameters():
if param.grad is not None:
writer.add_histogram(f'Gradients/{name}_epoch', param.grad, epoch)
def load_data(file_name, seq_len, step_size):
"""
Load and process data for time series prediction using a sliding window.
Parameters:
-----------
file_name (str): Path to CSV file.
seq_len (int): Length of the sliding window.
step_size (int): Step size for sliding window (how much to move per step).
Returns:
--------
tuple: (X, y) where:
- X is the input data
- y is the target data
"""
data = pd.read_csv(file_name)
# Assume the last column is the target
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
X_windowed, y_windowed = [], []
# Create windows for training
for i in range(0, len(X) - seq_len, step_size):
X_windowed.append(X[i:i + seq_len])
y_windowed.append(y[i + seq_len]) # Predict the next time step
    # If the loop above did not produce a window ending at the final target, add one last full window
    if len(X) > seq_len and (len(X) - seq_len - 1) % step_size != 0:
        X_windowed.append(X[-seq_len - 1:-1])  # Full-length window ending just before the final target
        y_windowed.append(y[-1])  # Use the last available target
X = torch.tensor(X_windowed, dtype=torch.float32)
    y = torch.tensor(y_windowed, dtype=torch.float32).unsqueeze(-1)  # Shape (N, 1) to match the model output (batch, output_size)
return X, y
def save_model(model, model_filename):
"""
Save the trained model to a specified file.
Parameters:
-----------
model (nn.Module): The trained PyTorch model.
model_filename (str): The file path where the model will be saved.
Returns:
--------
None
"""
torch.save(model.state_dict(), model_filename)
print(f"Model saved to {model_filename}")
def main():
"""
Main function to train and evaluate the GRU model.
"""
# Set device (GPU or CPU)
device = torch.device(config['device'] if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Load data
file_name = config['data_file']
seq_len = config['time_step']
step_size = config['step_size']
X, y = load_data(file_name, seq_len, step_size)
train_dataset = TensorDataset(X, y)
train_loader = DataLoader(train_dataset, batch_size=config['batch_size'], shuffle=True)
# Initialize model, criterion, and optimizer
model, criterion, optimizer = initialize_model(config, device)
# TensorBoard writer
writer = SummaryWriter(config['log_dir'])
# Train the model
train(model, train_loader, criterion, optimizer, config['num_epochs'], device, writer)
# Save the trained model with a defined filename
model_filename = config['model_filename']
save_model(model, model_filename)
# Close the TensorBoard writer
writer.close()
if __name__ == "__main__":
# Configuration file for training the GRU model
config = {
# Data parameters
'data_file': 'data.csv', # Path to the CSV file containing input features and targets
'time_step': 5, # Length of the sliding window (sequence length). Defines the number of time steps to consider for each input sequence.
'step_size': 1, # Step size for sliding window. Defines how much to move the window for each new sequence.
# Model parameters
'input_size': 10, # Number of input features (dimension of the input data)
'hidden_size': 64, # Number of features in the hidden state of the GRU
'num_layers': 2, # Number of stacked GRU layers
'output_size': 1, # Number of output features (dimension of the final output)
'bias': True, # Whether to use bias in GRU layers
'output_type': 'last', # 'last' or 'mean', how to process GRU outputs
'dropout': 0.2, # Dropout rate applied to GRU layers
'bidirectional': False, # Whether to use bidirectional GRU
# Training parameters
'learning_rate': 0.001, # Learning rate for the optimizer
'batch_size': 64, # Batch size for training
'num_epochs': 20, # Number of epochs to train the model
# TensorBoard parameters
'log_dir': 'runs/gru_experiment', # Directory to save TensorBoard logs
# Model saving parameters
'model_filename': 'gru_model.pth', # Path to save the trained model
# Device configuration (GPU or CPU)
'device': 'cuda', # Device to run the model on ('cuda' or 'cpu')
}
    main()
```
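After training, the losses and gradient histograms logged above can be viewed with TensorBoard, e.g. `tensorboard --logdir runs/gru_experiment` (matching the `log_dir` set in `config`).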