IntelMKL.dnn_dnn example resources - CSDN Download

455 files in total

cpp: 163 files

hpp: 135 files

h: 41 files

intel

MKL library

5 stars · Better than 95% of resources · Points required: 50 · 104 views · Uploaded 2018-08-02 10:28:17 · 1.57MB ZIP


Intel MKL.dnn (455 sub-files)

_clang-format 3KB

prepare_mkl.bat 2KB

bnorm_3d 85B

bnorm_googlenet_v2 4KB

bnorm_googlenet_v3 6KB

bnorm_lgmb 16B

bnorm_resnet_50 3KB

bnorm_topo 78B

simple_training_net.c 39KB

simple_net.c 18KB

api.c 13KB

simple_convolution.c 9KB

only_convolution.c 6KB

only_pooling.c 5KB

gtest.cc 191KB

gtest-death-test.cc 50KB

gtest-port.cc 42KB

gtest-filepath.cc 14KB

gtest-printers.cc 12KB

gtest-test-part.cc 4KB

gtest-typed-test.cc 4KB

gtest-all.cc 2KB

gtest_main.cc 2KB

MKL.cmake 7KB

platform.cmake 5KB

SDL.cmake 3KB

OpenMP.cmake 3KB

Doxygen.cmake 2KB

profiling.cmake 1KB

conv_3d_unet 579B

conv_a3c 144B

conv_alexnet 269B

conv_all 163B

conv_all_topo 438B

conv_densnet 3KB

conv_dilated 728B

conv_dilated_rfcn 82B

conv_fastrcnn 89B

conv_fastrcnn_p1 4KB

conv_fastrcnn_p2 2KB

conv_fastrcnn_p3 7KB

conv_googlenet_v1 4KB

conv_googlenet_v2 6KB

conv_googlenet_v3 9KB

conv_maskrcnn 64B

conv_maskrcnn_p1 10KB

conv_maskrcnn_p2 669B

conv_mobilenet 5KB

conv_mobilenet_dw 684B

conv_regression_padding 470B

conv_regression_small_spatial 1KB

conv_resnet_50 4KB

conv_resnet_50_sparse 5KB

conv_segnet 348B

conv_ssd_300_voc0712 2KB

conv_tails 901B

conv_unet 986B

conv_vgg_11 460B

conv_vgg_19 860B

conv_xception 2KB

conv_yolov2 16KB

jit_avx512_common_conv_kernel.cpp 156KB

jit_avx2_gemm_f32.cpp 96KB

jit_avx512_core_conv_winograd_kernel_f32.cpp 93KB

jit_avx512_common_convolution_winograd.cpp 91KB

test_pooling_forward.cpp 82KB

jit_avx512_common_gemm_f32.cpp 76KB

jit_avx512_common_convolution.cpp 62KB

jit_avx512_common_conv_winograd_kernel_f32.cpp 59KB

ref_rnn.cpp 49KB

jit_avx512_common_1x1_conv_kernel.cpp 48KB

jit_uni_batch_normalization.cpp 46KB

jit_uni_lrn_kernel_f32.cpp 43KB

test_pooling_backward.cpp 42KB

ref_rnn.cpp 40KB

jit_avx512_core_convolution_winograd.cpp 40KB

jit_avx512_core_u8s8s32x_wino_convolution.cpp 39KB

test_lrn_backward.cpp 37KB

jit_avx2_conv_kernel_f32.cpp 36KB

jit_uni_eltwise.cpp 35KB

jit_transpose_src_utils.cpp 34KB

simple_net.cpp 33KB

jit_avx512_common_1x1_convolution.cpp 31KB

jit_avx512_common_lrn.cpp 29KB

jit_avx512_core_u8s8s32x_1x1_conv_kernel.cpp 26KB

test_batch_normalization.cpp 25KB

ref_wino.cpp 25KB

simple_rnn.cpp 24KB

jit_sse42_1x1_conv_kernel_f32.cpp 24KB

bnorm.cpp 24KB

jit_uni_reorder.cpp 23KB

jit_avx2_1x1_conv_kernel_f32.cpp 23KB

jit_uni_dw_conv_kernel_f32.cpp 23KB

test_lrn_forward.cpp 22KB

conv.cpp 22KB

rnn.cpp 22KB

jit_uni_pool_kernel_f32.cpp 21KB

memory_desc_wrapper.cpp 21KB

cpu_reorder.cpp 21KB


# benchdnn

**benchdnn** is a standalone correctness and performance benchmark for the [Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)](/intel/mkl-dnn) library. The purpose of the benchmark is extended and robust correctness verification of the primitives provided by MKL-DNN. So far **benchdnn** supports convolutions and inner products of different data types. It also implicitly tests reorders.

## License

**benchdnn** is licensed under [Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

## Usage (main driver)

**benchdnn** itself is a driver for different implementation-specific harnesses. So far it has harnesses for Intel MKL-DNN convolution, inner product, reorder, and batch normalization, and a harness for testing itself. The usage:

```
$ ./benchdnn [--HARNESS] [--mode=MODE] [-vN|--verbose=N] HARNESS-OPTS
```

where:

- `HARNESS` is either `conv` [default], `ip`, `reorder`, `bnorm`, `rnn` or `self`
- `MODE` -- string that contains flags for benchmark mode. Use `C` or `c` for correctness (used by default), and `P` or `p` for performance
- `N` -- verbose level (integer from 0 [default] to ...)
- `HARNESS-OPTS` are passed to the chosen harness

Returns `0` on success (all tests passed), and non-zero if any error occurred.

## Usage (convolution harness)

The usage:

```
[harness-knobs] [conv-desc] ...
```

where *harness-knobs* are:

- `--cfg={f32, u8s8u8s32, ...}` configuration (see below), default `f32`
- `--dir={FWD_D (forward data), FWD_B (forward data + bias), BWD_D (backward data), BWD_W (backward weights), BWD_WB (backward weights + bias)}` direction, default `FWD_B`
- `--alg={DIRECT, WINO}` convolution algorithm, default DIRECT
- `--merge={NONE, RELU}` merged primitive, default NONE (nothing merged)
- `--attr="attr_str"` convolution attributes (see the section below), default `""` (no attributes set)
- `--mb=N` override the minibatch that is specified in the convolution description, default `0` (use the mb specified in conv desc)
- `--match=regex` check only convolutions that match the regex, default is `".*"`. Notice: Windows may only interpret string arguments surrounded by double quotation marks.
- `--skip-impl="str1[:str2]..."` skip implementation (see mkldnn_query_impl_info_str), default `""`
- `--allow-unimpl=true|false` do not treat an unimplemented configuration as an error, default `false`
- `--perf-template=template-str` set the template for the performance report (see section *Performance measurements*)
- `--reset` reset all the parameters set before to their defaults
- `-vN|--verbose=N` verbose level, default `0`
- `--batch=file` use options from the given file (see in subdirectory)

and *conv-desc* is the convolution description. The canonical form is:

```
gXmbXicXihXiwXocXohXowXkhXkwXshXswXphXpwXdhXdwXnS
```

Here X is a number and S is a string (n stands for name). Some of the parameters may be omitted if there is a default (e.g. if g is not specified, **benchdnn** uses 1) or if they can be computed automatically (e.g. the output shape can be derived from the input shape and the kernel). Also, if either width or height is not specified, then it is assumed that height == width. The special symbol `_` is ignored, hence it may be used as a delimiter.
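The canonical form above can be illustrated with a small parser. This is only a sketch in Python, not the logic of `str2desc()`: the function name `parse_conv_desc` and the sample descriptor are made up for illustration, and only the rules stated in the text are applied (`_` is ignored, `g` defaults to 1, and a missing height or width is copied from its counterpart). Output shapes and any other implicit defaults are deliberately left unresolved.

```python
import re

# One "key + number" token of the canonical form. None of the keys contains
# the letter "n", so the first "n" in a descriptor marks the start of the name.
_TOKEN = re.compile(r"(mb|ic|ih|iw|oc|oh|ow|kh|kw|sh|sw|ph|pw|dh|dw|g)(\d+)")

# Height/width pairs: if only one side is given, the other is assumed equal.
_PAIRS = (("ih", "iw"), ("oh", "ow"), ("kh", "kw"),
          ("sh", "sw"), ("ph", "pw"), ("dh", "dw"))


def parse_conv_desc(desc):
    """Parse a conv-desc string into a dict (hypothetical helper)."""
    head, _, name = desc.partition("n")
    head = head.replace("_", "")  # "_" is only a delimiter
    params = {}
    pos = 0
    while pos < len(head):
        match = _TOKEN.match(head, pos)
        if match is None:
            raise ValueError("cannot parse %r at offset %d" % (head, pos))
        params[match.group(1)] = int(match.group(2))
        pos = match.end()
    params.setdefault("g", 1)  # documented default
    for h, w in _PAIRS:
        if h in params and w not in params:
            params[w] = params[h]
        elif w in params and h not in params:
            params[h] = params[w]
    params["name"] = name.strip('"') or None
    return params


if __name__ == "__main__":
    # Made-up descriptor, not taken from the inputs/ directory.
    print(parse_conv_desc('mb4_ic16ih8_oc32_kh3_ph1n"demo:conv1"'))
```

Keeping the parser table-driven (one regular expression plus a list of height/width pairs) makes it easy to compare against the key order in the canonical form.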
See `str2desc()` in conv/conv_aux.cpp for more details and implicit rules :^)

The attribute string *attr_str* is defined as (new lines for readability):

```
[irmode={nearest,down};]
[oscale={none,common,per_oc}[:scale];]
[post_ops='[{relu,sum[:sum_scale]};]...';]
```

Here `irmode` defines the rounding mode for integer output (default is nearest).

Next, `oscale` stands for output_scales. The first parameter is the policy, defined below. The second, optional parameter is a scale that specifies either the one common output scale (for the `none` and `common` policies) or a starting point for the `per_oc` policy, which uses many scales. The default scale is 1.0. Known policies are:

- `none` (default) means no output scales set (i.e. scale = 1.)
- `common` corresponds to `mask=0` with a common scale factor
- `per_oc` corresponds to `mask=1<<1` (i.e. output channels) with different scale factors

Next, `post_ops` stands for the post operation sequence. Currently supported post ops are:

- `relu` with no parameters (i.e. corresponding scale is 1., alg = eltwise_relu, alpha = beta = 0.)
- `sum` with optional parameter scale (default 1.)

### convolution configurations (aka precision specification)

The `--cfg` option specifies which convolution is used in terms of data type. It also defines all the magic with data filling inside. For integer types, saturation is implied. Finally, the configuration defines the threshold for computation errors (ideally we want to keep it at 0, and that seems to work for now).
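Judging from the configuration names listed in this section's table, every name other than `f32` reads as the source, weights, destination and accumulator data types concatenated in that order. That is an observation from the names, not a rule stated by **benchdnn**; the sketch below assumes it, and `split_cfg` is a hypothetical helper.

```python
import re

# Data type names that occur in the configurations listed in this section.
_DTYPE = re.compile(r"f32|s32|s16|s8|u8")


def split_cfg(cfg):
    """Split a cfg name into (src, wei, dst, acc) data types (assumed order)."""
    if cfg == "f32":
        # The plain f32 configuration uses f32 everywhere.
        return ("f32", "f32", "f32", "f32")
    types = _DTYPE.findall(cfg)
    if len(types) != 4 or "".join(types) != cfg:
        raise ValueError("unrecognized configuration: %r" % cfg)
    return tuple(types)


if __name__ == "__main__":
    for name in ("f32", "s16s16s32s32", "u8s8f32s32"):
        print(name, split_cfg(name))
```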
The table below shows the cases supported by Intel MKL-DNN and the corresponding configurations for **benchdnn**:

|src type | wei type | dst type | acc type | cfg | notes
|:--- |:--- |:--- |:--- |:--- |:---
| f32 | f32 | f32 | f32 | f32 | inference optimized for sse4.2+, training avx2+
| s16 | s16 | s32 | s32 | s16s16s32s32 | optimized for processors with support of 4vnni, forward pass only (aka FWD_D, FWD_B)
| s32 | s16 | s16 | s32 | s32s16s16s32 | optimized for processors with support of 4vnni, backward wrt data only (aka BWD_D)
| s16 | s32 | s16 | s32 | s16s32s16s32 | optimized for processors with support of 4vnni, backward wrt weights (aka BWD_W, BWD_WB)
| u8 | s8 | f32 | s32 | u8s8f32s32 | optimized for processors with support of avx512vl, forward pass only (aka FWD_D, FWD_B)
| u8 | s8 | s32 | s32 | u8s8s32s32 | same notes as for u8s8f32s32
| u8 | s8 | s8 | s32 | u8s8s8s32 | same notes as for u8s8f32s32
| u8 | s8 | u8 | s32 | u8s8u8s32 | same notes as for u8s8f32s32

## Performance measurements

**benchdnn** supports a custom performance report. The template is passed via the command line and consists of terminal and nonterminal symbols. Nonterminal symbols are printed as is. A description of the terminal symbols is given below. There is also a notion of modifiers (marked as @) that change the meaning of terminal symbols, e.g. the sign '-' means the minimum (in terms of time). See the table of modifiers below.
> **caution:** threads have to be pinned in order to get consistent frequency

| abbreviation | description
|:------------ |:-----------
| %d | problem descriptor
| %D | expanded problem descriptor (conv parameters in csv format)
| %n | problem name
| %z | direction
| %@F | effective cpu frequency computed as clocks[@] / time[@]
| %O | number of ops required (padding is not taken into account)
| %@t | time in ms
| %@c | time in clocks
| %@p | ops per second

| modifier | description
|:-------- |:-----------
| | default
| - | min (time) -- default
| 0 | avg (time)
| + | max (time)
| |
| K | Kilo (1e3)
| M | Mega (1e6)
| G | Giga (1e9)

The definition of the expanded problem descriptor is: `g,mb,ic,ih,iw,oc,oh,ow,kh,kw,sh,sw,ph,pw`.

The default template can be found in conv/bench_conv.cpp and is defined as `perf,%n,%d,%GO,%GF,%-t,%-Gp,%0t,%0Gp`. That will produce the following output in CSV format:

```
string: perf
convolution name
full conv-desc
number of giga ops calculated
effective cpu frequency in GHz (amb clocks[min] / time[min])
minimum time spent in ms
best gigaops (since it corresponds to minimum time)
average time spent in ms
average gigaops (since it corresponds to average time)
```

## Examples

Run the set of f32 forward convolutions from the inputs/conv_all file w/ bias and default minibatch:

```
$ ./benchdnn --conv \
    --cfg=f32 --dir=FWD_B --batch=inputs/conv_all
```

Run t
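As a rough illustration of the template grammar from the *Performance measurements* section, the Python sketch below expands a template against one set of measurements. It is not the code in conv/bench_conv.cpp: the function `render`, the layout of the `prob` dictionary and all sample numbers are invented, and the sketch assumes that a time modifier (`-`, `0`, `+`) comes before a unit modifier (`K`, `M`, `G`), as in the default template.

```python
import re

_UNITS = {"K": 1e3, "M": 1e6, "G": 1e9}
_MODES = {"-": "min", "0": "avg", "+": "max"}
# "%", optional time modifier, optional unit modifier, then the symbol itself.
_FIELD = re.compile(r"%([-0+]?)([KMG]?)([dDnzFOtcp])")


def render(template, prob):
    """Expand a performance template for one problem (hypothetical helper)."""
    def expand(match):
        mode = _MODES.get(match.group(1), "min")  # min is the default
        unit = _UNITS.get(match.group(2), 1.0)
        key = match.group(3)
        text = {"d": "desc", "D": "desc_csv", "n": "name", "z": "dir"}
        if key in text:
            return prob[text[key]]
        seconds = prob["time_ms"][mode] * 1e-3
        clocks = prob["clocks"][mode]
        if key == "O":
            value = prob["ops"]
        elif key == "t":
            value = prob["time_ms"][mode]
        elif key == "c":
            value = clocks
        elif key == "F":
            value = clocks / seconds        # effective frequency in Hz
        else:                               # key == "p"
            value = prob["ops"] / seconds   # ops per second
        return "%g" % (value / unit)
    return _FIELD.sub(expand, template)


if __name__ == "__main__":
    # Invented measurements, for illustration only.
    sample = {
        "name": "demo", "desc": "ic16oc32", "desc_csv": "", "dir": "FWD_B",
        "ops": 2e9,
        "time_ms": {"min": 1.0, "avg": 2.0, "max": 4.0},
        "clocks": {"min": 2e6, "avg": 4e6, "max": 8e6},
    }
    print(render("perf,%n,%d,%GO,%GF,%-t,%-Gp,%0t,%0Gp", sample))
```

Text outside a `%` field is copied through unchanged, which is why the leading `perf` and the commas survive in the rendered line.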
