R4 - Cascade Residual Learning
Short Summary
The authors of this paper propose a novel two-stage cascading CNN architecture to address the
challenge of generating high-quality disparities for ill-posed regions in a stereo matching task. The
network is referred to as Cascade Residual Learning (CRL).
The first stage, called “DispFulNet”, is essentially an up-convolution module that produces fine-grained
disparities. It builds upon previous work, DispNetC, a CNN with an hour-glass structure, skip
connections, and a correlation layer. The authors modify this network by adding extra deconvolution
layers that magnify the disparity, yielding estimates at the same size as the input image. The input to
this stage is a stereo pair of (left and right) images; the output is an initial disparity together with a
version of the left image synthesized using that disparity. The error between the original left image and
the synthesized version is also fed into the second stage.
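The synthesis and error computation described above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: it assumes the synthesized left image is obtained by sampling the right image at x - d(x, y) along each row with linear interpolation, and that the error is a per-pixel absolute difference. The function names `warp_right_to_left` and `reconstruction_error` are hypothetical.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Synthesize a left view by sampling the right image at x - d(x, y).

    right:     (H, W) grayscale right image
    disparity: (H, W) left-view disparity in pixels
    Linear interpolation is used along each row; source coordinates that
    fall outside the image are clamped to the border (an assumed choice).
    """
    h, w = right.shape
    # Source x-coordinate in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def reconstruction_error(left, right, disparity):
    """Per-pixel absolute error between the left image and its synthesis."""
    return np.abs(left - warp_right_to_left(right, disparity))
```

With a correct disparity the error is close to zero wherever the scene is visible in both views, so large values point the second stage at regions where the initial estimate is likely wrong.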
In the second stage, called “DispResNet”, the disparity is corrected using residual signals at multiple
scales. The output of this stage is a residual signal r_2, which is used to generate the refined disparity
by taking the sum of the initial disparity and the residual. This allows the network to focus on learning
the residual instead of trying to learn the disparity directly (experimentally shown to improve
performance). This stage also has an hour-glass structure and produces residuals across multiple scales.
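The residual update can be sketched as follows. This is a minimal illustration under stated assumptions, not the network itself: the residuals are taken as given arrays rather than CNN outputs, scale s is assumed to mean downsampling by a factor of 2**s, the initial disparity is brought to each scale by average pooling, and disparity values are left in full-resolution pixel units. The names `downsample` and `refine` are hypothetical.

```python
import numpy as np

def downsample(disparity, factor):
    """Average-pool a (H, W) disparity map by an integer factor."""
    h, w = disparity.shape
    h2, w2 = h // factor, w // factor
    cropped = disparity[:h2 * factor, :w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def refine(initial_disparity, residuals):
    """Add each per-scale residual to the matching downsampled initial disparity.

    residuals: dict mapping scale s to a residual map whose shape matches
               the initial disparity downsampled by 2**s.
    Returns a dict mapping each scale to its refined disparity.
    """
    refined = {}
    for s, residual in residuals.items():
        d1_s = downsample(initial_disparity, 2 ** s)
        refined[s] = d1_s + residual  # refined = initial + residual
    return refined
```

If the initial disparity is already accurate, the second stage only needs to output residuals near zero, which is the intuition behind learning the residual rather than the disparity itself.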
The method ultimately achieves state-of-the-art performance on the KITTI 2015 stereo dataset (as of
August 2017) and takes 0.47 seconds on an Nvidia GTX 1080 to produce a disparity image.
Main Contributions
• Proposed Cascade Residual Learning (a two-stage approach) for estimating disparity
  o Proposed the first stage, DispFulNet, which improves upon DispNetC
  o Proposed the second stage, DispResNet, which learns residuals, and experimentally showed that
    this boosts performance
• Achieved state-of-the-art results on the KITTI 2015 stereo dataset at the time of publishing (August 2017)