15. 15
Example of a cast-eligibility graph
[Figure: the example graph, a small training graph whose forward ops include Placeholder, VariableV2, Conv2d, Relu, MatMul, Mul, Add, Reciprocal, and Loss, and whose backward ops include LossGrad, ReluGrad, GradInput, and GradFilter]
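To make the steps that follow concrete, here is one way the graph in the figure could be written down in Python. The node names and the edge wiring are assumptions read off the op names; the slide's actual topology lives in the figure.

    # Hypothetical encoding of the example graph: each node gets a name and
    # an op type, and directed edges connect producers to consumers. The
    # wiring below is a guess from the op names, not taken from the slide.
    op_types = {
        "x": "Placeholder", "w1": "VariableV2", "conv": "Conv2d",
        "relu": "Relu", "w2": "VariableV2", "matmul": "MatMul",
        "add": "Add", "loss": "Loss",
    }
    edges = [
        ("x", "conv"), ("w1", "conv"), ("conv", "relu"),
        ("relu", "matmul"), ("w2", "matmul"), ("matmul", "add"),
        ("add", "loss"),
    ]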
16. 16
Example of a cast-eligibility graph
Step 1: initialize each op's color
[Figure: the same graph with every op given its initial color: "Always", "Never", or "Maybe"]
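A minimal sketch of step 1 in Python. The two op lists below are assumptions standing in for the allow/deny lists the real TensorFlow pass ships with: "Always" ops run in FP16, "Never" ops stay in FP32, and everything else starts out as "Maybe".

    ALWAYS = {"Conv2d", "MatMul"}               # tensor-core friendly, run in FP16
    NEVER = {"Loss", "LossGrad", "Reciprocal"}  # numerically sensitive, keep FP32

    def init_colors(op_types):
        """Step 1: give every node an initial color based on its op type."""
        return {
            node: "Always" if t in ALWAYS else "Never" if t in NEVER else "Maybe"
            for node, t in op_types.items()
        }

    print(init_colors({"conv": "Conv2d", "relu": "Relu", "loss": "Loss"}))
    # -> {'conv': 'Always', 'relu': 'Maybe', 'loss': 'Never'}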
17. 17
Example of a cast-eligibility graph
Step 2: propagate "Never"
[Figure: the same graph after "Never" has spread from the numerically sensitive ops to their "Maybe" neighbors]
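A sketch of the step 2 idea: "Never" spreads into adjacent "Maybe" nodes, so ops wired directly to FP32-only ops are not converted either. The real pass has a more careful propagation rule; this only illustrates the fixed-point iteration.

    def propagate_never(colors, edges):
        """Step 2: repeatedly spread "Never" to neighboring "Maybe" nodes."""
        changed = True
        while changed:
            changed = False
            for src, dst in edges:
                for a, b in ((src, dst), (dst, src)):
                    if colors[a] == "Never" and colors[b] == "Maybe":
                        colors[b] = "Never"
                        changed = True
        return colors

    colors = {"mul": "Maybe", "recip": "Never", "loss": "Never"}
    print(propagate_never(colors, [("mul", "recip"), ("recip", "loss")]))
    # -> {'mul': 'Never', 'recip': 'Never', 'loss': 'Never'}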
18. 18
Example of a cast-eligibility graph
Step 3: "Maybe" ops sandwiched between "Always" ops
[Figure: the same graph with "Maybe" ops that sit between "Always" ops recolored "Always"]
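A sketch of step 3: a "Maybe" op sandwiched between "Always" ops is pulled into the FP16 region, so no cast pair is wasted between two FP16 ops. The one-hop neighbor test below is a simplification of the slide's "sandwiched" condition.

    def absorb_maybe(colors, edges):
        """Step 3: recolor "Maybe" nodes whose neighbors are all "Always"."""
        neighbors = {}
        for a, b in edges:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
        for node, color in list(colors.items()):
            if color == "Maybe":
                near = {colors[n] for n in neighbors.get(node, ())}
                if "Always" in near and "Never" not in near:
                    colors[node] = "Always"
        return colors

    colors = {"conv": "Always", "relu": "Maybe", "matmul": "Always"}
    print(absorb_maybe(colors, [("conv", "relu"), ("relu", "matmul")]))
    # -> {'conv': 'Always', 'relu': 'Always', 'matmul': 'Always'}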
19. 19
Example of a cast-eligibility graph
Step 4: detect the boundaries of the "Always" regions
[Figure: the same graph with the boundary edges of the "Always" regions marked]
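Step 4 then reduces to an edge test: any edge whose endpoints landed in different precision regions is a boundary where a cast will be needed. A sketch:

    def find_boundaries(colors, edges):
        """Step 4: edges that cross into or out of the FP16 region."""
        fp16 = {n for n, c in colors.items() if c == "Always"}
        return [(a, b) for a, b in edges if (a in fp16) != (b in fp16)]

    colors = {"x": "Never", "conv": "Always", "relu": "Always", "loss": "Never"}
    print(find_boundaries(colors, [("x", "conv"), ("conv", "relu"), ("relu", "loss")]))
    # -> [('x', 'conv'), ('relu', 'loss')]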
20. 20
Example of a cast-eligibility graph
Step 5: insert the casts
[Figure: the final graph with FP16 Cast nodes inserted on edges entering the FP16 region and FP32 Cast nodes on edges leaving it]
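Finally, a sketch of step 5: each boundary edge is rewritten to pass through a cast node, FP16 Cast on the way into the FP16 region and FP32 Cast on the way out, mirroring the cast nodes visible in the figure.

    def insert_casts(colors, edges):
        """Step 5: splice a cast node into every boundary edge."""
        fp16 = {n for n, c in colors.items() if c == "Always"}
        rewritten = []
        for a, b in edges:
            if (a in fp16) == (b in fp16):
                rewritten.append((a, b))  # same precision region, no cast needed
            else:
                kind = "FP16 Cast" if b in fp16 else "FP32 Cast"
                cast = "%s (%s->%s)" % (kind, a, b)
                rewritten += [(a, cast), (cast, b)]
        return rewritten

    colors = {"x": "Never", "conv": "Always", "loss": "Never"}
    print(insert_casts(colors, [("x", "conv"), ("conv", "loss")]))
    # -> x feeds an FP16 Cast into conv, and conv feeds an FP32 Cast into loss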
21. 21
AMP
GENERAL PURPOSE
22. 22
Mixed precision arithmetic is general purpose
23. 23
Speedups from mixed precision arithmetic
Effective on a wide range of tasks, not just image classification
24. 24
AMP
SCHEDULE & USAGE
25. 25
Enabling automatic mixed precision
Up to a 3x speedup from adding just a few lines
More details: https://developer.nvidia.com/automatic-mixed-precision
TensorFlow (GA at GTC 19):
    os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
  or, equivalently, from the shell:
    export TF_ENABLE_AUTO_MIXED_PRECISION=1

PyTorch (available since Q2 2018):
    model, optimizer = amp.initialize(model, optimizer)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

MXNet (GA coming soon):
    amp.init()
    amp.init_trainer(trainer)
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
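For context, a minimal sketch of where the PyTorch lines from the slide sit in a training loop (the `amp` module here is NVIDIA Apex, which the "More details" page documents). The model, data, and opt_level are placeholder assumptions, not from the slide; running it needs a CUDA GPU with apex installed.

    import torch
    from apex import amp

    model = torch.nn.Linear(128, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One-time setup line from the slide ("O1" = standard mixed precision):
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    for step in range(100):
        x = torch.randn(32, 128, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        # Loss scaling from the slide: keeps small FP16 gradients from underflowing.
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()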
26. 26
Mixed precision support in Chainer
https://github.com/chainer/chainer/pull/6337
27. 27
https://twitter.com/melleo1978/status/1110203991764262913
28. 28
NGC TensorFlow images
Automatic Mixed Precision is supported from release 19.03 onward
https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow
29. 29
ngc.nvidia.com