What is Tensor Comprehensions?
2018/03/11
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare:
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
Source code analysis craftsman
Announcing Tensor Comprehensions
February 14, 2018
https://research.fb.com/announcing-tensor-comprehensions/
Facebook AI Research
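A tool for writing custom layers.
Because only a limited number of people can write custom layers,
this tool lets ordinary users write them and still get reasonable performance.
It is therefore supposed to be usable not only with PyTorch and Caffe2
but with other ML frameworks as well.
At present, the only target is CUDA.
The current version is v0.1.1.
What is Tensor Comprehensions?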
https://research.fb.com/announcing-tensor-comprehensions/
import tensor_comprehensions as tc
import torch
lang = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)
Let's get started!
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html
def matmul(float(M,N) A, float(N,K) B) -> (output) {
  output(i, j) +=! A(i, kk) * B(kk, j)
}
=>
for(int i = 0; i < M; i++) {
  for(int j = 0; j < K; j++) {
    output(i,j) = 0.0f;
    for(int kk = 0; kk < N; kk++) {
      output(i,j) += A(i,kk) * B(kk,j);
    }
  }
}
Example notation and its equivalent code
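The loop nest above can also be written as a minimal plain-Python sketch. This is not TC itself: the helper name `matmul_reference` and the use of nested lists in place of tensors are assumptions made only for illustration. It shows the two things the notation leaves implicit: the index ranges come from the tensor shapes, and the `!` in `+=!` means the output is set to zero before the reduction over `kk`.

```python
def matmul_reference(A, B):
    """Plain-Python model of: output(i, j) +=! A(i, kk) * B(kk, j)."""
    M, N = len(A), len(A[0])      # A is float(M,N)
    K = len(B[0])                 # B is float(N,K)
    assert len(B) == N, "inner dimensions must agree"
    output = [[None] * K for _ in range(M)]
    for i in range(M):            # i indexes the rows of A
        for j in range(K):        # j indexes the columns of B
            output[i][j] = 0.0    # the "!" in "+=!": initialize before reducing
            for kk in range(N):   # kk appears only on the right-hand side: reduction index
                output[i][j] += A[i][kk] * B[kk][j]
    return output


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]     # 2 x 3
result = matmul_reference(A, B)
print(result)
```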
def conv(float(B,IP,H,W) input, float(OP,IP,KH,KW) weight)
-> (output) {
output(b, op, h, w) += input(b, ip, h + kh, w + kw)
* weight(op, ip, kh, kw)
}
Simple 2-D convolution (no stride, no padding)
https://facebookresearch.github.io/TensorComprehensions/introduction.html
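As a rough reference for what this convolution computes, here is a plain-Python sketch. It is not TC code and rests on two assumptions: that the output ranges for `h` and `w` are the largest ones keeping `h + kh` and `w + kw` in bounds (so the output is `H-KH+1` by `W-KW+1`), and that the output starts from zero, which is what `+=!` would guarantee. The name `conv_reference` is hypothetical.

```python
def conv_reference(inp, weight):
    """Plain-Python model of the TC `conv` definition (no stride, no padding)."""
    B, IP = len(inp), len(inp[0])
    H, W = len(inp[0][0]), len(inp[0][0][0])
    OP = len(weight)
    KH, KW = len(weight[0][0]), len(weight[0][0][0])
    # Assumed output extents: keep h + kh < H and w + kw < W.
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[[0.0] * OW for _ in range(OH)] for _ in range(OP)] for _ in range(B)]
    for b in range(B):
        for op in range(OP):
            for h in range(OH):
                for w in range(OW):
                    # ip, kh, kw appear only on the right-hand side: reduction indices
                    for ip in range(IP):
                        for kh in range(KH):
                            for kw in range(KW):
                                out[b][op][h][w] += (
                                    inp[b][ip][h + kh][w + kw] * weight[op][ip][kh][kw]
                                )
    return out


# One image, one input channel, 3x3 input; one output channel, 2x2 kernel of ones.
inp = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]]
weight = [[[[1.0, 1.0], [1.0, 1.0]]]]
out = conv_reference(inp, weight)
print(out)
```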
def maxpool2x2(float(B,C,H,W) input)
-> (output) {
output(b,c,i,j) max= input(b,c,2*i + kw, 2*j + kh)
where kw in 0:2, kh in 0:2
}
Simple 2D max pooling
https://facebookresearch.github.io/TensorComprehensions/introduction.html
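The pooling definition can be modelled the same way. The sketch below is plain Python, not TC; it assumes that `0:2` is a half-open range (values 0 and 1), that the output is `H//2` by `W//2`, and that the reduction starts from negative infinity. The name `maxpool2x2_reference` is hypothetical.

```python
def maxpool2x2_reference(inp):
    """Plain-Python model of: output(b,c,i,j) max= input(b,c,2*i + kw, 2*j + kh)."""
    B, C = len(inp), len(inp[0])
    H, W = len(inp[0][0]), len(inp[0][0][0])
    OH, OW = H // 2, W // 2                  # assumed output extents
    out = [[[[float("-inf")] * OW for _ in range(OH)] for _ in range(C)] for _ in range(B)]
    for b in range(B):
        for c in range(C):
            for i in range(OH):
                for j in range(OW):
                    for kw in range(2):      # where kw in 0:2
                        for kh in range(2):  # where kh in 0:2
                            value = inp[b][c][2 * i + kw][2 * j + kh]
                            out[b][c][i][j] = max(out[b][c][i][j], value)
    return out


# One image, one channel, 4x4 input holding 1..16 row by row.
inp = [[[[float(4 * r + col + 1) for col in range(4)] for r in range(4)]]]
pooled = maxpool2x2_reference(inp)
print(pooled)
```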
Pooling Layers (Average pooling / Max pooling)
Convolution layers (Simple Convolution / Strided Convolution / Strided Convolution Gradient / Simple Group Convolution / Group Convolution Strided)
Linear layers (Fully Connected layer)
Non-Linear layers (ReLU / Sigmoid / Softmax / Tanh / Cosine)
Math Operations (TensorDot / Matmul / Matmul Gradient / Batch Matmul / Absolute / Add / Indexing / Lookup Table / Transpose / Concat / Cast / Copy / Scale)
Fused layers (FCRelu / Small MobileNet)
Normalization layers (Batch Normalization / Layer Normalization)
Distance Functions (Cosine Similarity)
Layer database
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/layers_database.html
Automatic optimization with a genetic algorithm
Parameters:
 Number of generations: The number of tuning generations to run.
 Population size: The number of candidates in each generation.
 Number of elites: The number of best candidates that are preserved intact between generations (without any mutations).
 Crossover rate: The rate at which new candidates are bred instead of just surviving across generations.
 Mutation rate: The rate at which candidate options are randomly changed (mutated).
 Number of threads: The number of threads that are used to compile different candidates in parallel.
 GPUs: A comma-separated list of GPUs (ids) to use for evaluating candidates (e.g., “0,1,2,3”).
 RNG state: The state used to seed the tuner’s RNG.
 Proto: A protobuf filename to (re)store compilation results and profiling information of the candidate solutions.
 min_launch_total_threads: Prune out kernels mapped to fewer than this many threads and block. Set this to 1 to avoid pruning.
Autotuner
https://facebookresearch.github.io/TensorComprehensions/autotuner.html
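To show how the generation, population, elite, crossover and mutation parameters interact, here is a toy genetic search in plain Python. It is a sketch of the general technique only, not the TC autotuner: the search space, the `toy_cost` function (a stand-in for a measured kernel runtime) and all names are made up for illustration.

```python
import random

# Hypothetical search space: each candidate picks one value per option.
SEARCH_SPACE = {
    "tile": [1, 2, 4, 8, 16, 32],
    "unroll": [1, 2, 4, 8],
    "threads": [32, 64, 128, 256],
}


def toy_cost(candidate):
    """Stand-in for a measured runtime; lower is better."""
    return (abs(candidate["tile"] - 8)
            + abs(candidate["unroll"] - 4)
            + abs(candidate["threads"] - 128) / 32)


def random_candidate(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def tune(generations=10, population_size=20, number_elites=2,
         crossover_rate=0.8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)                      # plays the role of "RNG state"
    population = [random_candidate(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=toy_cost)
        best_per_generation.append(toy_cost(population[0]))
        # Elites are copied into the next generation without mutation.
        next_population = [dict(c) for c in population[:number_elites]]
        while len(next_population) < population_size:
            if rng.random() < crossover_rate:
                # Breed a child: each option comes from one of two good parents.
                p1, p2 = rng.sample(population[:population_size // 2], 2)
                child = {name: rng.choice([p1[name], p2[name]]) for name in SEARCH_SPACE}
            else:
                child = dict(rng.choice(population))   # survives as a copy
            for name, values in SEARCH_SPACE.items():
                if rng.random() < mutation_rate:
                    child[name] = rng.choice(values)   # random mutation of one option
            next_population.append(child)
        population = next_population
    population.sort(key=toy_cost)
    best_per_generation.append(toy_cost(population[0]))
    return population[0], best_per_generation


best, history = tune()
print(best, history)
```

Because the elites are carried over intact, the best cost in `history` can never get worse from one generation to the next; the crossover and mutation rates only control how aggressively the rest of the population explores.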
Tensor Comprehensions in PyTorch
Mar 5, 2018
http://pytorch.org/2018/03/05/tensor-comprehensions.html
PYTORCH
 1). Define your TC language and pass it to tc.define
 2). Create input torch tensors
 3). Run the layer and get output
import tensor_comprehensions as tc
import torch
# 1) the TC definition, kept in a Python string
MATMUL_LANG = """
def matmul(float(M,N) A, float(N,K) B) -> (output) {
  output(i, j) +=! A(i, kk) * B(kk, j)
}
"""
# the `name` should match the definition name in the `lang`
matmul = tc.define(MATMUL_LANG, name="matmul")  # <= 1)
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()  # <= 2)
out = matmul(mat1, mat2)  # <= 3)
How to write a PyTorch layer with TC
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
tensor_comprehensions.define(lang, **kwargs_define)
Parameters:
lang (string, required)
name (string, required)
training (bool)
backward (string, optional)
constants (dict, optional)
inject_kernel (string, optional)
cuda_code (string, optional)
Returns:
  TC layer that you can run by passing the tensors.
Defining a layer
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
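The `constants` parameter is not illustrated on the slide. The sketch below assumes, based on my reading of the TC documentation, that the dict is substituted into `{name}` placeholders in the TC string in the same way as Python's `str.format`, with literal braces doubled. The template, the constant names (`sH`, `sW`, `kH`, `kW`) and the `avgpool` definition are illustrative, and only the string expansion is executed here; the `tc.define` call is left as a comment because it needs TC and a GPU.

```python
# Hypothetical template: doubled braces survive str.format as literal braces.
AVGPOOL_TEMPLATE = """
def avgpool(float(B, C, H, W) input) -> (output) {{
  output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw)
    where kh in 0:{kH}, kw in 0:{kW}
}}
"""

constants = {"sH": 1, "sW": 1, "kH": 2, "kW": 2}

# What the definition would look like after substitution (assumed behaviour).
expanded = AVGPOOL_TEMPLATE.format(**constants)
print(expanded)

# With TC installed, the template and the dict would be passed together, e.g.:
# avgpool = tc.define(AVGPOOL_TEMPLATE, name="avgpool", constants=constants)
```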
class tensor_comprehensions.TcUnit(lang, **kwargs_define)
__call__(*inputs, **kwargs)
Parameters:
*inputs (required)
options (optional)
outputs (optional)
cache (string, optional)
grid (int, 3D list)
block (int, 3D list)
reorder_function (optional)
Returns:
List of PyTorch tensors/Variables which is the output of running TC layer.
Running a layer
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
import tensor_comprehensions as tc
import torch
lang = """
def add(float(N) A, float(N) B) -> (output) {
output(i) = A(i) + B(i) + 1
}
"""
add = tc.define(lang, name="add")
a, b = torch.randn(100).cuda(), torch.randn(100).cuda()
out = add(a, b, grid=[1, 1, 1], block=[100, 1, 1])
Normally, you specify TC code
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
cuda_code = """
extern "C" {
__global__ void my_add(float* __restrict__ output,
                       const float* __restrict__ A,
                       const float* __restrict__ B)
{
  int t = threadIdx.x;
  output[t] = A[t] + B[t];
}
}
"""
add = tc.define(lang, name="add",
                inject_kernel="my_add", cuda_code=cuda_code)
Optionally, you can supply your own CUDA code
https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/writing_layers.html
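To see what the injected kernel does under a launch with `grid=[1, 1, 1]` and `block=[100, 1, 1]`, here is a small plain-Python emulation. It is only a sketch: the `launch` helper and the sequential loop over thread indices are assumptions standing in for the CUDA runtime, and `my_add_kernel` mirrors the kernel body by hand. Note that the kernel computes `A + B`, whereas the TC definition of `add` computes `A + B + 1`, so the output shows which of the two actually ran.

```python
def my_add_kernel(thread_idx_x, output, A, B):
    """Body of `my_add` for one thread; thread_idx_x plays the role of threadIdx.x."""
    t = thread_idx_x
    output[t] = A[t] + B[t]      # no "+ 1", unlike the TC definition of `add`


def launch(kernel, grid, block, *args):
    """Hypothetical sequential stand-in for a CUDA launch with 3D grid and block."""
    for _block_id in range(grid[0] * grid[1] * grid[2]):
        for _z in range(block[2]):
            for _y in range(block[1]):
                for x in range(block[0]):
                    kernel(x, *args)


N = 100
A = [float(i) for i in range(N)]
B = [2.0 * i for i in range(N)]
output = [0.0] * N

# One block of N threads: thread t handles element t. The kernel has no bounds
# check, so block[0] must not exceed the tensor length.
launch(my_add_kernel, [1, 1, 1], [N, 1, 1], output, A, B)
print(output[:5])
```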
Thank you
