The document presents an approach for accelerating convolutional neural networks (CNNs) using a coarse-grained reconfigurable array (CGRA) called EMAX. EMAX's processing elements each have local memory, which improves data locality and makes better use of the available memory bandwidth. CNN computations such as convolutions are mapped onto EMAX by placing the weight matrices in constant registers and performing many small matrix multiplications in parallel. The evaluation reports that EMAX achieves higher performance per unit of memory bandwidth and per unit of area than GPUs on CNN workloads, an advantage attributed to its optimization for small matrix operations.
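The idea of holding weights fixed while running many small matrix multiplications can be illustrated by lowering a convolution into tiles of an im2col-style product. The sketch below is a minimal pure-Python illustration of that general decomposition, not EMAX's actual mapping: the function names, the `tile` parameter, and the treatment of `w_mat` as the analogue of the constant registers are all assumptions made for illustration. It assumes stride 1 and no padding.

```python
from typing import List

def small_matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Plain (m x k) @ (k x n) product; each call is one 'small' multiplication."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)] for i in range(m)]

def conv2d_direct(x, w):
    """Reference convolution. x: [C][H][W], w: [OC][C][KH][KW]; stride 1, no padding."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OC, KH, KW = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    return [[[sum(w[o][c][i][j] * x[c][oy + i][ox + j]
                  for c in range(C) for i in range(KH) for j in range(KW))
              for ox in range(OW)] for oy in range(OH)] for o in range(OC)]

def conv2d_as_small_matmuls(x, w, tile: int = 4):
    """Same convolution, computed as many small matmuls that share one weight matrix."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OC, KH, KW = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    # Flatten the weights once into an OC x (C*KH*KW) matrix. In this hypothetical
    # mapping it stays resident for every tile (the "constant register" analogue).
    w_mat = [[w[o][c][i][j] for c in range(C) for i in range(KH) for j in range(KW)]
             for o in range(OC)]
    positions = [(oy, ox) for oy in range(OH) for ox in range(OW)]
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(OC)]
    # Each tile of output positions is an independent small matmul; on parallel
    # hardware these could run concurrently, here they simply run in sequence.
    for start in range(0, len(positions), tile):
        block = positions[start:start + tile]
        # im2col for this tile: K x len(block), rows ordered (c, i, j) like w_mat.
        patches = [[x[c][oy + i][ox + j] for (oy, ox) in block]
                   for c in range(C) for i in range(KH) for j in range(KW)]
        res = small_matmul(w_mat, patches)  # OC x len(block)
        for col, (oy, ox) in enumerate(block):
            for o in range(OC):
                out[o][oy][ox] = res[o][col]
    return out

if __name__ == "__main__":
    x = [[[r * 4 + c for c in range(4)] for r in range(4)]]  # 1 channel, 4x4
    w = [[[[1, 1], [1, 1]]]]                                # 1 output channel, 2x2 box filter
    print(conv2d_as_small_matmuls(x, w))
    print(conv2d_direct(x, w))
```

The only data that changes between multiplications is the patch matrix; the weight matrix is built once and reused, which is the property the summary relies on when it describes weights held in constant registers. Tile size here is arbitrary and only affects how the work is partitioned, not the result.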