26. 26
If Nsight Systems is not installed in your development environment
Setting Up and Using Nsight Systems Inside Containers
NSIGHT SYSTEMS

CUDA 11.4: install
$ apt-get update -y
$ apt-get install -y cuda-nsight-systems-11-4 nsight-systems-2021.2.4

CUDA 11.3: install
$ apt-get update -y
$ apt-get install -y cuda-nsight-systems-11-3 nsight-systems-2021.1.3

CUDA 11.2: install
$ apt-get update -y
$ apt-get install -y cuda-nsight-systems-11-2 nsight-systems-2020.4.3

Mapping an Nsight Systems Host Installation into a Container
(mounts the host's /opt/nvidia/nsight-systems/2021.1.3 at the same path inside the container)
$ docker run --rm -it --network=host --gpus=all \
    -v /opt/nvidia/nsight-systems/2021.1.3:/opt/nvidia/nsight-systems/2021.1.3 \
    nvcr.io/nvidia/pytorch:21.08-py3 bash
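The package names above follow one pattern per CUDA version. As a minimal sketch (the helper names `NSYS_PACKAGES`, `find_nsys` and `install_commands` are hypothetical, and only the three CUDA versions listed on this slide are covered), a setup script could check whether `nsys` is already on `PATH` and, if not, print the matching `apt-get` commands:

```python
import shutil

# Package pairs copied from the slide: CUDA version -> (CUDA package, Nsight Systems package).
# Only the versions shown on the slide are included; anything else is rejected.
NSYS_PACKAGES = {
    "11.4": ("cuda-nsight-systems-11-4", "nsight-systems-2021.2.4"),
    "11.3": ("cuda-nsight-systems-11-3", "nsight-systems-2021.1.3"),
    "11.2": ("cuda-nsight-systems-11-2", "nsight-systems-2020.4.3"),
}


def find_nsys():
    """Return the path of the nsys executable on PATH, or None if it is missing."""
    return shutil.which("nsys")


def install_commands(cuda_version):
    """Return the apt-get commands from the slide for the given CUDA version."""
    if cuda_version not in NSYS_PACKAGES:
        raise ValueError(f"no package mapping for CUDA {cuda_version}")
    cuda_pkg, nsys_pkg = NSYS_PACKAGES[cuda_version]
    return [
        "apt-get update -y",
        f"apt-get install -y {cuda_pkg} {nsys_pkg}",
    ]


if __name__ == "__main__":
    path = find_nsys()
    if path:
        print(f"nsys found: {path}")
    else:
        # Assumption: the container uses CUDA 11.4; change this to match your image.
        for cmd in install_commands("11.4"):
            print("$ " + cmd)
```

Printing the commands rather than running them keeps the sketch side-effect free; the host-mount approach above avoids installing anything inside the container at all.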
27. 27
How to Use Nsight Systems
Example
$ nsys profile -t nvtx,cuda,osrt,cublas \
    --stats=true \
    -f true \
    -o pusch_result \
    python main.py
-t nvtx,cuda,osrt,cublas: APIs to be traced
  cuda: GPU kernels
  osrt: OS runtime
  nvtx: NVIDIA Tools Extension
  cublas: CUDA BLAS library
--stats=true: outputs profiling statistics similar to nvprof
-f true: overwrites an existing output file
-o pusch_result: output filename
https://docs.nvidia.com/nsight-systems/2020.3/profiling/index.html#cli-options
28. 28
Other Useful Options
• --delay (-y): collection start delay, in seconds
• --duration (-d): collection duration, in seconds
• --capture-range (-c): none / cudaProfilerApi / nvtx (limits collection to a range marked with the CUDA profiler API or an NVTX range)
etc.
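When the same profiling run is repeated with different settings, it can help to build the `nsys profile` command programmatically. A minimal sketch, assuming a hypothetical helper `build_nsys_cmd` that uses only the options shown on these slides, and placing every nsys option before the application command:

```python
import shlex

# Trace targets used on the slide.
DEFAULT_TRACE = ("nvtx", "cuda", "osrt", "cublas")
# Capture-range values listed on the slide.
CAPTURE_RANGES = ("none", "cudaProfilerApi", "nvtx")


def build_nsys_cmd(app, output, trace=DEFAULT_TRACE, stats=True,
                   overwrite=True, delay=None, duration=None,
                   capture_range=None):
    """Assemble an `nsys profile` argument list from the options on the slides.

    All nsys options come first; the application command is appended last.
    """
    cmd = ["nsys", "profile", "-t", ",".join(trace)]
    cmd.append("--stats=true" if stats else "--stats=false")
    if overwrite:
        cmd += ["-f", "true"]                   # overwrite an existing report
    if delay is not None:
        cmd.append(f"--delay={delay}")          # seconds to wait before collecting
    if duration is not None:
        cmd.append(f"--duration={duration}")    # seconds of collection
    if capture_range is not None:
        if capture_range not in CAPTURE_RANGES:
            raise ValueError(f"unsupported capture range: {capture_range!r}")
        cmd.append(f"--capture-range={capture_range}")
    cmd += ["-o", output]
    cmd += list(app)
    return cmd


if __name__ == "__main__":
    cmd = build_nsys_cmd(["python", "main.py"], "pusch_result")
    # The list can also be passed to subprocess.run(cmd, check=True)
    # on a machine where nsys is installed.
    print(shlex.join(cmd))
```

With the defaults this prints `nsys profile -t nvtx,cuda,osrt,cublas --stats=true -f true -o pusch_result python main.py`, the command shown above. The `nvtx` capture mode is normally paired with an option naming the NVTX range that starts collection; check the CLI options page for what each mode expects.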
29. 29
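Example: Nsight Systems + NVTX
Nsight Systems profiling result (with NVTX)
[Figure: timeline with NVTX ranges for preprocessing (11.07 sec) and inference over 10 iterations (28.924 sec), with a single iteration marked]
Annotating the code with NVTX makes it much easier
to see what is being processed on the timeline!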
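In PyTorch, NVTX ranges can be opened and closed with `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop`. The sketch below wraps them in a context manager and annotates a preprocessing stage, an inference stage and each iteration, mirroring the timeline above. `preprocess` and `infer` are hypothetical stand-ins for the real workload, and the wrapper quietly becomes a no-op when PyTorch or a CUDA build is unavailable. It also records wall-clock time per range so the numbers can be compared with the timeline:

```python
import contextlib
import time

try:
    import torch.cuda.nvtx as _nvtx
except ImportError:  # PyTorch not installed: annotations become no-ops
    _nvtx = None

timings = {}  # range name -> list of elapsed wall-clock seconds


@contextlib.contextmanager
def nvtx_range(name):
    """Mark a named range on the Nsight Systems timeline and time it."""
    pushed = False
    if _nvtx is not None:
        try:
            _nvtx.range_push(name)
            pushed = True
        except RuntimeError:  # CPU-only PyTorch builds ship without NVTX
            pass
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)
        if pushed:
            _nvtx.range_pop()


def preprocess(n):
    """Hypothetical stand-in for data preparation."""
    return [float(i) for i in range(n)]


def infer(data):
    """Hypothetical stand-in for one forward pass."""
    return sum(x * x for x in data)


def run(iterations=10, n=1000):
    with nvtx_range("preprocess"):
        data = preprocess(n)
    results = []
    with nvtx_range(f"inference ({iterations} iterations)"):
        for _ in range(iterations):
            with nvtx_range("iteration"):  # nested inside the inference range
                results.append(infer(data))
    return results


if __name__ == "__main__":
    run()
    for name, secs in timings.items():
        print(f"{name}: {len(secs)} call(s), {sum(secs):.6f} s")
```

Launching this script through `nsys profile -t nvtx,...` should place the `preprocess`, `inference (...)` and `iteration` ranges on the timeline. Because CUDA kernels launch asynchronously, the wall-clock numbers only reflect GPU time if a stage synchronizes (for example with `torch.cuda.synchronize()`) before its range closes.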
30. 30
Appendix. Technical Blogs and Related Sessions
Deep Learning Examples
• https://github.com/NVIDIA/DeepLearningExamples/
How to Run NGC Deep Learning Containers with Singularity
• https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/
Profiling and Optimizing Deep Neural Networks with DLProf and PyProf (TensorFlow)
• https://developer.nvidia.com/blog/profiling-and-optimizing-deep-neural-networks-with-dlprof-and-pyprof/
Deep Learning Performance Optimization with Profiling Tools
• https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31228/
Profiling and Optimizing Deep Neural Networks with DLProf and PyProf
• https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31341/
PyTorch Performance Tuning Guide
• https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31831/
Introducing PyTorch Training Optimization Techniques Using NVIDIA Profilers