This document presents a method for high-throughput convolutional neural network (CNN) inference on an FPGA using customized JPEG compression. The method decomposes spatial convolutions into channel shifts followed by pointwise (1x1) convolutions, applies binary weight quantization, and maps the network onto a fully pipelined FPGA architecture. Experimental results show that the customized JPEG compression achieves an 82x speedup with only a 0.3% drop in accuracy. Implemented on an FPGA, the CNN sustains 3,321 frames per second at 75 watts, a more than 100x speedup over a CPU baseline and more than 10x over a GPU.
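As a rough illustration of the shift-plus-pointwise decomposition and binary weight quantization described above, the following is a minimal PyTorch sketch. The class name `ShiftPointwiseBlock`, the per-channel shift assignment, and the straight-through estimator are illustrative assumptions, not the paper's implementation; the actual design is a fully pipelined FPGA datapath, not PyTorch code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftPointwiseBlock(nn.Module):
    """Sketch: replace a KxK convolution with a zero-multiply channel
    shift followed by a 1x1 (pointwise) convolution whose weights are
    binarized to {-1, +1}. Names and shift assignment are illustrative."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Assign each channel one spatial offset from a 3x3 neighborhood
        # (round-robin assignment is an assumption for this sketch).
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        self.shifts = [offsets[c % 9] for c in range(in_ch)]
        # Latent real-valued weights; binarized in the forward pass.
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 1, 1) * 0.1)

    def forward(self, x):
        # Channel shift: move each channel by its assigned (dy, dx).
        # On hardware this is pure data routing, with no arithmetic.
        shifted = torch.empty_like(x)
        for c, (dy, dx) in enumerate(self.shifts):
            shifted[:, c] = torch.roll(x[:, c], shifts=(dy, dx), dims=(-2, -1))
        # Binary weight quantization: map weights to {-1, +1}.
        w_bin = torch.where(self.weight >= 0,
                            torch.ones_like(self.weight),
                            -torch.ones_like(self.weight))
        # Straight-through estimator so gradients reach the latent
        # real-valued weights during training (a common convention).
        w = (w_bin - self.weight).detach() + self.weight
        # Pointwise convolution mixes information across channels.
        return F.conv2d(shifted, w)

# Usage: a 32-channel feature map passed through the block.
x = torch.randn(1, 32, 56, 56)
block = ShiftPointwiseBlock(32, 64)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

Because the shift step requires no multiplications and the pointwise weights are single-bit, the per-pixel work reduces to additions and subtractions, which is one reason this decomposition suits a fully pipelined FPGA implementation.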