Mini-batches in PyTorch Geometric

This post shows how to load the ENZYMES dataset with PyTorch Geometric and walks through the attributes of the resulting Batch object, in particular what ptr and batch do and how they are implemented.


Official documentation: link

Loading the ENZYMES dataset

from torch_geometric.datasets import TUDataset
# In PyG >= 2.0 DataLoader lives in torch_geometric.loader
# (older releases exposed it as torch_geometric.data.DataLoader)
from torch_geometric.loader import DataLoader


# use_node_attr=True keeps the continuous node attributes
# in addition to the one-hot node labels
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)

loader = DataLoader(dataset, batch_size=4, shuffle=True)

The ENZYMES dataset contains 600 small graphs (protein tertiary structures) split across 6 classes; with use_node_attr=True every node carries 21 features.
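A quick way to confirm those numbers (a minimal sanity-check sketch, reusing the dataset object created above):

print(len(dataset))               # 600 graphs
print(dataset.num_classes)        # 6
print(dataset.num_node_features)  # 21
print(dataset[0])                 # a single Data(edge_index=[2, ...], x=[..., 21], y=[1])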

The Batch object

Grab one batch:

batch = next(iter(loader))  # Python 3 idiom; loader.__iter__().next() is Python 2
print(batch)
# Batch(batch=[169], edge_index=[2, 556], ptr=[5], x=[169, 21], y=[4])

Since batch_size=4, the batch contains 4 graphs. In the printout above, batch=[169] means the 4 graphs have 169 nodes in total, edge_index=[2, 556] means 556 edges in total, ptr=[5] has one more entry than the number of graphs, and y=[4] holds one label per graph. The Batch object exposes the following keys:

batch.keys
# ['x', 'edge_index', 'y', 'batch', 'ptr']

batch[0].keys
# ['x', 'edge_index', 'y']

Extracting the individual graphs:
for i in range(batch.num_graphs):
    print(batch[i])
"""
Data(edge_index=[2, 178], x=[50, 21], y=[1])
Data(edge_index=[2, 114], x=[30, 21], y=[1])
Data(edge_index=[2, 160], x=[60, 21], y=[1])
Data(edge_index=[2, 104], x=[29, 21], y=[1])
"""
The ptr attribute

Note the ptr attribute: it is what you need in order to pull the 4 individual graphs back out of the batch. Printing it shows the cumulative node counts:

batch.ptr
# tensor([  0,  50,  80, 140, 169])

  • batch[0] is rows [0:50] of x (50 - 0 = 50 nodes)
  • batch[1] is rows [50:80] (80 - 50 = 30 nodes)
  • batch[2] is rows [80:140] (140 - 80 = 60 nodes)
  • batch[3] is rows [140:169] (169 - 140 = 29 nodes)
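
Slicing by hand with ptr looks like this (a minimal sketch; i = 0 picks the first graph):

i = 0
start, end = batch.ptr[i].item(), batch.ptr[i + 1].item()
x_i = batch.x[start:end]  # node features of graph i, e.g. [50, 21] for i = 0
assert x_i.size(0) == batch[i].num_nodes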
The batch attribute

Print the batch vector to have a look:

batch.batch
# tensor([0, 0, ..., 1, 1, ..., 2, 2, ..., 3, 3])

It contains 50 consecutive 0s, then 30 1s, 60 2s, and 29 3s: entry i tells you which graph node i belongs to.
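This node-to-graph assignment vector is exactly what PyG's pooling operators consume. A minimal graph-classification sketch (the model architecture and hyperparameters here are illustrative assumptions, not from the original post):

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class GNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv = GCNConv(in_dim, hidden_dim)
        self.lin = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, batch_vec):
        x = self.conv(x, edge_index).relu()
        x = global_mean_pool(x, batch_vec)  # [num_graphs, hidden_dim]
        return self.lin(x)

model = GNN(dataset.num_node_features, 64, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for batch in loader:
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index, batch.batch)  # [4, 6] per batch
    loss = F.cross_entropy(out, batch.y)
    loss.backward()
    optimizer.step()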

How does Batch know which rows belong to which graph? The answer is the internal __slices__ dictionary (stored as _slice_dict in newer PyG releases):

batch.__slices__
"""
{'y': [0, 1, 2, 3, 4],
 'x': [0, 50, 80, 140, 169],
 'edge_index': [0, 178, 292, 452, 556]}
"""

When you ask for batch[0], the lookup goes through batch.__slices__:

  • batch[0]['y'] = batch['y'][ batch.__slices__['y'][0]:batch.__slices__['y'][0+1] ]
  • batch[0]['x'] = batch['x'][ batch.__slices__['x'][0]:batch.__slices__['x'][0+1] ]
  • batch[0]['edge_index'] = batch['edge_index'][ batch.__slices__['edge_index'][0]:batch.__slices__['edge_index'][0+1] ]

For batch[1], batch[2], …, batch[n], just replace the index 0 with the corresponding graph index; a combined sketch follows below.
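Putting it together, here is a simplified sketch of what the per-graph lookup does (a hypothetical helper written against the old __slices__ attribute shown above; edge_index is concatenated along dimension 1 and its node indices are shifted by the node offset, so both have to be undone):

def get_graph(batch, i):
    """Recover graph i's tensors from a Batch (illustrative only)."""
    out = {}
    for key in ['x', 'edge_index', 'y']:
        s = batch.__slices__[key]
        dim = 1 if key == 'edge_index' else 0  # edge_index stacks along dim 1
        out[key] = batch[key].narrow(dim, s[i], s[i + 1] - s[i])
    # node indices in edge_index were offset by the cumulative node count
    out['edge_index'] = out['edge_index'] - batch.ptr[i]
    return out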

Further reading: PyTorch Geometric (PyG): the meaning and role of __inc__ and __cat_dim__ in the Data class for mini-batching

https://blog.csdn.net/qq_41795143/article/details/114281387

### Mini-GCN: Implementation and Applications in Graph Neural Networks

Mini-GCN is an approach that addresses the limitations of traditional Graph Convolutional Networks (GCNs) by employing mini-batch training. Traditional GCNs hold the entire graph adjacency matrix and node features in memory during training, leading to high computational and memory complexity, which makes it challenging to scale GCNs to larger graphs.

#### Principle of Mini-GCN

The principle behind Mini-GCN lies in processing smaller subsets of the graph data at a time, reducing both memory usage and computational overhead. By using stochastic gradient descent with mini-batches, Mini-GCN can train on large-scale graphs without storing the entire graph structure in memory. The key idea is to sample subgraphs or nodes from the original graph for each batch update, allowing the model to generalize well while keeping resource consumption manageable.

#### Memory Complexity Reduction

For an L-layer GCN model, the time complexity is 𝒪(Lnd²) and the memory complexity is 𝒪(Lnd + Ld²). Mini-GCN reduces these significantly by limiting the number of nodes processed simultaneously: instead of all n nodes at once, only a subset of size s ≪ n is used per iteration, lowering the effective complexities to approximately 𝒪(Lsd²) for time and 𝒪(Lsd + Ld²) for memory.

#### Implementation Details

Below is a simplified implementation of Mini-GCN using PyTorch Geometric:

```python
import torch
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

# Load dataset
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

# Define Mini-GCN model
class MiniGCN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.conv1 = GCNConv(input_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, output_dim)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x

# Initialize model, loss function, and optimizer
model = MiniGCN(dataset.num_features, 16, dataset.num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Create neighbor loader for mini-batch training
loader = NeighborLoader(data, num_neighbors=[25, 10], batch_size=128, shuffle=True)

# Training loop
model.train()
for epoch in range(200):
    total_loss = 0
    for batch in loader:
        optimizer.zero_grad()
        out = model(batch.x, batch.edge_index)
        # only the first batch_size rows are the seed nodes of this mini-batch
        loss = criterion(out[:batch.batch_size], batch.y[:batch.batch_size])
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")
```

#### Applications of Mini-GCN

Mini-GCN is useful wherever large-scale graph data exists, such as social network analysis, recommendation systems, and bioinformatics. In social networks, it can predict user interactions or classify community structures more efficiently than full-batch methods; in recommendation systems, it helps infer missing links between users and items based on their interaction patterns represented as graphs.