The document discusses using temporal shift modules (TSM) for efficient video recognition, where TSM enables temporal modeling in 2D CNNs with no additional computation cost; TSM models achieve better performance than 3D CNNs and previous methods while using less computation, and can be used for applications like online video understanding, low-latency deployment on edge devices, and large-scale distributed training on supercomputers.