DehazeNet: An End-to-End System for Single Image Haze Removal
Abstract—Single image haze removal is a challenging ill-posed problem. Existing methods use various constraints/priors to get plausible dehazing solutions. The key to achieving haze removal is to estimate a medium transmission map for an input hazy image. In this paper, we propose a trainable end-to-end system called DehazeNet for medium transmission estimation. DehazeNet takes a hazy image as input, and outputs its medium transmission map that is subsequently used to recover a haze-free image via the atmospheric scattering model. DehazeNet adopts a Convolutional Neural Network (CNN) based deep architecture, whose layers are specially designed to embody the established assumptions/priors in image dehazing. Specifically, layers of Maxout units are used for feature extraction, which can generate almost all haze-relevant features. We also propose a novel nonlinear activation function in DehazeNet, called the Bilateral Rectified Linear Unit (BReLU), which is able to improve the quality of the recovered haze-free image. We establish connections between components of the proposed DehazeNet and those used in existing methods. Experiments on benchmark images show that DehazeNet achieves superior performance over existing methods, yet remains efficient and easy to use.

Keywords—Dehaze, image restoration, deep CNN, BReLU.

I. INTRODUCTION

Early methods remove haze with traditional image processing techniques, e.g., histogram-based [1], contrast-based [2], and saturation-based [3]. In addition, methods using multiple images or depth information have also been proposed. For example, polarization-based methods [4] remove the haze effect through multiple images taken with different degrees of polarization. In [5], multi-constraint based methods are applied to multiple images capturing the same scene under different weather conditions. Depth-based methods [6] require some depth information from user inputs or known 3D models. In practice, depth information or multiple hazy images are not always available.

Single image haze removal has made significant progress recently, due to the use of better assumptions and priors. Specifically, under the assumption that the local contrast of the haze-free image is much higher than that of the hazy image, a local contrast maximizing method [7] based on Markov Random Fields (MRF) is proposed for haze removal. Although the contrast maximizing approach is able to achieve impressive results, it tends to produce over-saturated images. In [8], Independent Component Analysis (ICA) based on minimal input is proposed to remove the haze from color images, but the approach is time-consuming and cannot be used to deal with dense-haze images. Inspired by the dark-object subtraction technique, the Dark Channel Prior (DCP) [9] is discovered based on empirical statistics; however, such hand-crafted priors are often less effective for some images.

Haze removal from a single image is a difficult vision task. In contrast, the human brain can quickly identify the hazy area from the natural scenery without any additional information. One might be tempted to propose biologically inspired models for image dehazing, by following the success of bio-inspired CNNs for high-level vision tasks such as image classification [19], face recognition [20] and object detection [21]. In fact, there have been a few (convolutional) neural network based deep learning methods that are recently proposed for low-level vision tasks of image restoration/reconstruction [22], [23], [24]. However, these methods cannot be directly applied to single image haze removal.

Note that apart from estimation of a global atmospheric light magnitude, the key to achieving haze removal is to recover an accurate medium transmission map. To this end, we propose DehazeNet, a trainable CNN based end-to-end system for medium transmission estimation. DehazeNet takes a hazy image as input, and outputs its medium transmission map that is subsequently used to recover the haze-free image by a simple pixel-wise operation. The design of DehazeNet borrows ideas from established assumptions/principles in image dehazing, while the parameters of all its layers can be automatically learned from training hazy images. Experiments on benchmark images show that DehazeNet gives superior performance over existing methods, yet remains efficient and easy to use. Our main contributions are summarized as follows.

1) DehazeNet is an end-to-end system. It directly learns and estimates the mapping relations between hazy image patches and their medium transmissions. This is achieved by the special design of its deep architecture to embody established image dehazing principles.
2) We propose a novel nonlinear activation function in DehazeNet, called the Bilateral Rectified Linear Unit¹ (BReLU). BReLU extends the Rectified Linear Unit (ReLU) and demonstrates its significance in obtaining accurate image restoration. Technically, BReLU uses a bilateral restraint to reduce the search space and improve convergence.
3) We establish connections between components of DehazeNet and the assumptions/priors used in existing dehazing methods, and explain that DehazeNet improves over these methods by automatically learning all these components from end to end.

The remainder of this paper is organized as follows. In Section II, we review the atmospheric scattering model and haze-relevant features, which provides background knowledge to understand the design of DehazeNet. In Section III, we present details of the proposed DehazeNet, and discuss how it relates to existing methods. Experiments are presented in Section IV, before the conclusion is drawn in Section V.

¹ During the preparation of this manuscript (in December 2015), we found that a nonlinear activation function called the adjustable bounded rectifier was proposed in [25] (arXived in November 2015), which is almost identical to BReLU. The adjustable bounded rectifier is motivated by the objective of image recognition. In contrast, BReLU is proposed here to improve image restoration accuracy. It is interesting that we came to the same activation function from completely different initial objectives. This may also suggest the general usefulness of the proposed BReLU.

II. RELATED WORKS

Many image dehazing methods have been proposed in the literature. In this section, we briefly review some important ones, paying attention to those proposing the atmospheric scattering model, which is the basic underlying model of image dehazing, and those proposing useful assumptions for computing haze-relevant features.

A. Atmospheric Scattering Model

To describe the formation of a hazy image, the atmospheric scattering model was first proposed by McCartney [26], and was further developed by Narasimhan and Nayar [27], [28]. The atmospheric scattering model can be formally written as

    I(x) = J(x) t(x) + α (1 − t(x)),    (1)

where I(x) is the observed hazy image, J(x) is the real scene to be recovered, t(x) is the medium transmission, α is the global atmospheric light, and x indexes pixels in the observed hazy image I. Fig. 1 gives an illustration. There are three unknowns in equation (1), and the real scene J(x) can be recovered after α and t(x) are estimated.

The medium transmission map t(x) describes the portion of the light that is not scattered and reaches the camera. t(x) is defined as

    t(x) = e^{−β d(x)},    (2)

where d(x) is the distance from the scene point to the camera, and β is the scattering coefficient of the atmosphere. Equation (2) suggests that when d(x) goes to infinity, t(x) approaches zero. Together with equation (1) we have

    α = I(x),  d(x) → ∞.    (3)

In practical imaging of a distant view, d(x) cannot be infinity, but rather a long distance that gives a very low transmission t_0. Instead of relying on equation (3) to get the global atmospheric light α, it is more stably estimated based on the following rule:

    α = max_{y ∈ {x | t(x) ≤ t_0}} I(y).    (4)

The discussion above suggests that to recover a clean scene (i.e., to achieve haze removal), the key is to estimate an accurate medium transmission map.

B. Haze-relevant Features

Image dehazing is an inherently ill-posed problem. Based on empirical observations, existing methods propose various assumptions or prior knowledge that are utilized to compute intermediate haze-relevant features. Final haze removal can be achieved based on these haze-relevant features.
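Before turning to the individual haze-relevant features, the recovery step implied by equations (1)–(4) can be made concrete. The following NumPy sketch is illustrative only: the threshold t0 = 0.1, the lower clamp t_min, and the toy scene values are assumptions for demonstration, and the true transmission map is given here rather than estimated — estimating it is exactly the task DehazeNet addresses.

```python
import numpy as np

def estimate_airlight(I, t, t0=0.1):
    """Rule (4): take the brightest pixels among those whose
    transmission is at most t0 (i.e., the most distant pixels)."""
    mask = t <= t0
    return I[mask].reshape(-1, I.shape[-1]).max(axis=0)

def recover_scene(I, t, alpha, t_min=0.1):
    """Invert the scattering model (1): J = (I - alpha) / t + alpha.
    Clamping t from below avoids amplifying noise where t(x) ~ 0."""
    t = np.clip(t, t_min, 1.0)[..., None]   # broadcast over color channels
    return np.clip((I - alpha) / t + alpha, 0.0, 1.0)

# Toy scene: a gray image with depths increasing across pixels, beta = 1.
J_true = np.full((4, 4, 3), 0.3)
d = np.linspace(0.1, 5.0, 16).reshape(4, 4)
t = np.exp(-1.0 * d)                        # equation (2): t(x) = exp(-beta d(x))
alpha_true = np.array([1.0, 1.0, 1.0])
I = J_true * t[..., None] + alpha_true * (1.0 - t[..., None])   # equation (1)

alpha_est = estimate_airlight(I, t)         # close to alpha_true at far pixels
J_rec = recover_scene(I, t, alpha_true)     # accurate wherever t(x) >= t_min
```

Note that the recovery is only reliable where the transmission stays above the clamp; at the farthest pixels the clamp deliberately trades accuracy for noise robustness, which mirrors the role of t_0 in rule (4).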
1) Dark Channel: The dark channel prior is based on a wide observation of outdoor haze-free images: in most haze-free patches, at least one color channel has some pixels whose intensity values are very low and even close to zero. The dark channel [9] is defined as the minimum of all pixel colors in a local patch:

    D(x) = min_{y ∈ Ω_r(x)} ( min_{c ∈ {r,g,b}} I^c(y) ),    (5)

where I^c is an RGB color channel of I and Ω_r(x) is a local patch centered at x with the size of r × r. The dark channel feature has a high correlation with the amount of haze in the image, and is used to estimate the medium transmission as t(x) ∝ 1 − D(x) directly.

2) Maximum Contrast: According to the atmospheric scattering, the contrast of the image is reduced by the haze transmission, as Σ_x ‖∇I(x)‖ = t Σ_x ‖∇J(x)‖ ≤ Σ_x ‖∇J(x)‖. Based on this observation, the local contrast [7] is defined as the variance of pixel intensities in an s × s local patch Ω_s with respect to the center pixel, and the local maximum of the local contrast values is taken in an r × r region Ω_r:

    C(x) = max_{y ∈ Ω_r(x)} sqrt( (1 / |Ω_s(y)|) Σ_{z ∈ Ω_s(y)} ‖I(z) − I(y)‖² ),    (6)

where |Ω_s(y)| is the cardinality of the local neighborhood. The correlation between the contrast feature and the medium transmission t is visually obvious, so the visibility of the image is enhanced by maximizing the local contrast shown in (6).

3) Color Attenuation: The saturation I^s(x) of a patch decreases sharply while the color of the scene fades under the influence of the haze, and the brightness value I^v(x) increases at the same time, producing a high value for the difference. According to the above color attenuation prior [18], the difference between the brightness and the saturation is utilized to estimate the concentration of the haze:

    A(x) = I^v(x) − I^s(x),    (7)

where I^v(x) and I^s(x) can be expressed in the HSV color space as I^v(x) = max_{c ∈ {r,g,b}} I^c(x) and I^s(x) = (max_{c ∈ {r,g,b}} I^c(x) − min_{c ∈ {r,g,b}} I^c(x)) / max_{c ∈ {r,g,b}} I^c(x). The color attenuation feature is proportional to the scene depth, d(x) ∝ A(x), and is readily used for transmission estimation.

4) Hue Disparity: The hue disparity between the original image I(x) and its semi-inverse image, I_si(x) = max[I^c(x), 1 − I^c(x)] with c ∈ {r, g, b}, has been used to detect the haze. For haze-free images, pixel values in the three channels of their semi-inverse images will not all flip, resulting in large hue changes between I_si(x) and I(x). In [29], the hue disparity feature is defined as

    H(x) = I_si^h(x) − I^h(x),    (8)

where the superscript "h" denotes the hue channel of the image in HSV color space. According to (8), the medium transmission t(x) is inversely proportional to H(x).

III. THE PROPOSED DEHAZENET

The atmospheric scattering model in Section II-A suggests that estimation of the medium transmission map is the most important step in recovering a haze-free image. To this end, we propose DehazeNet, a trainable end-to-end system that explicitly learns the mapping relations between raw hazy images and their associated medium transmission maps. In this section, we present the layer designs of DehazeNet, and discuss how these designs are related to ideas in existing image dehazing methods. The final pixel-wise operation to get a recovered haze-free image from the estimated medium transmission map will be presented in Section IV.

A. Layer Designs of DehazeNet

Fig. 2. The architecture of DehazeNet. DehazeNet conceptually consists of four sequential operations (feature extraction, multi-scale mapping, local extremum and non-linear regression), which are constructed from 3 convolution layers, a max-pooling, a Maxout unit and a BReLU activation function.

The proposed DehazeNet consists of cascaded convolutional and pooling layers, with appropriate nonlinear activation functions employed after some of these layers. Fig. 2 shows the architecture of DehazeNet. Layers and nonlinear activations
of DehazeNet are designed to implement four sequential operations for medium transmission estimation, namely, feature extraction, multi-scale mapping, local extremum, and nonlinear regression. We detail these designs as follows.

1) Feature Extraction: To address the ill-posed nature of the image dehazing problem, existing methods propose various assumptions, and based on these assumptions they are able to extract haze-relevant features (e.g., dark channel, hue disparity, and color attenuation) densely over the image domain. Note that densely extracting these haze-relevant features is equivalent to convolving an input hazy image with appropriate filters, followed by nonlinear mappings. Inspired by the extremum processing in the color channels of those haze-relevant features, an unusual activation function called the Maxout unit [30] is selected as the non-linear mapping for dimension reduction. The Maxout unit is a simple feed-forward nonlinear activation function used in multi-layer perceptrons or CNNs. When used in CNNs, it generates a new feature map by taking a pixel-wise maximization operation over k affine feature maps. Based on the Maxout unit, we design the first layer of DehazeNet as follows:

    F_1^i(x) = max_{j ∈ [1,k]} f_1^{i,j}(x),   f_1^{i,j} = W_1^{i,j} ∗ I + B_1^{i,j},    (9)

where W_1 = {W_1^{i,j}}_{(i,j)=(1,1)}^{(n_1,k)} and B_1 = {B_1^{i,j}}_{(i,j)=(1,1)}^{(n_1,k)} represent the filters and biases respectively, and ∗ denotes the convolution operation. Here, there are n_1 output feature maps in the first layer. W_1^{i,j} ∈ R^{3×f_1×f_1} is one of the total k × n_1 convolution filters, where 3 is the number of channels in the input image I(x), and f_1 is the spatial size of a filter (detailed in Table I). The Maxout unit maps each kn_1-dimensional vector into an n_1-dimensional one, and extracts the haze-relevant features by automatic learning rather than the heuristic ways of existing methods.

2) Multi-scale Mapping: In [17], multi-scale features have been proven effective for haze removal; they densely compute features of an input image at multiple spatial scales. Multi-scale feature extraction is also effective for achieving scale invariance. For example, the inception architecture in GoogLeNet [31] uses parallel convolutions with varying filter sizes, and better addresses the issue of aligning objects in input images, resulting in state-of-the-art performance in ILSVRC14 [32]. Motivated by these successes of multi-scale feature extraction, we choose to use parallel convolutional operations in the second layer of DehazeNet, where the size of any convolution filter is among 3 × 3, 5 × 5 and 7 × 7, and we use the same number of filters for these three scales. Formally, the output of the second layer is written as

    F_2^i = W_2^{⌈i/3⌉,(i\3)} ∗ F_1 + B_2^{⌈i/3⌉,(i\3)},    (10)

where W_2 = {W_2^{p,q}}_{(p,q)=(1,1)}^{(3,n_2/3)} and B_2 = {B_2^{p,q}}_{(p,q)=(1,1)}^{(3,n_2/3)} contain n_2 pairs of parameters that are broken up into 3 groups. n_2 is the output dimension of the second layer, and i ∈ [1, n_2] indexes the output feature maps. ⌈·⌉ denotes rounding up to the nearest integer, and \ denotes the remainder operation.

3) Local Extremum: To achieve spatial invariance, the cortical complex cells in the visual cortex receive responses from the simple cells for linear feature integration. Lampl et al. [33] proposed that the spatial integration properties of complex cells can be described by a series of pooling operations. According to the classical architecture of CNNs [34], the neighborhood maximum is considered under each pixel to overcome local sensitivity. In addition, the local extremum is in accordance with the assumption that the medium transmission is locally constant, and it is commonly used to overcome the noise of transmission estimation. Therefore, we use a local extremum operation in the third layer of DehazeNet:

    F_3^i(x) = max_{y ∈ Ω(x)} F_2^i(y),    (11)

where Ω(x) is an f_3 × f_3 neighborhood centered at x, and the output dimension of the third layer is n_3 = n_2. In contrast to max-pooling in CNNs, which usually reduces the resolution of feature maps, the local extremum operation here is densely applied to every feature map pixel, and is able to preserve resolution for use in image restoration.

Fig. 3. Rectified Linear Unit (ReLU) and Bilateral Rectified Linear Unit (BReLU).

Fig. 4. Filter weights and the Maxout unit in the first layer operation F_1: (a) opposite filter, (b) all-pass filter, (c) round filter, (d) Maxout, (e) the actual kernels learned from DehazeNet.
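To make the first three operations above concrete, the following NumPy sketch implements a Maxout convolution layer in the spirit of equation (9) and the dense local extremum of equation (11). The filter counts, sizes, and random weights are illustrative assumptions, not the trained configuration of Table I; the multi-scale layer of equation (10) would simply run three such convolutions with 3 × 3, 5 × 5 and 7 × 7 filters in parallel and concatenate their outputs.

```python
import numpy as np

def conv2d_same(img, w):
    """Naive 'same' 2-D correlation of a single-channel image with
    filter w (what CNN convolution layers actually compute)."""
    f = w.shape[0]
    p = f // 2
    padded = np.pad(img, p, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(f):
        for dx in range(f):
            out += w[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def maxout_layer(I, W, B):
    """Equation (9): F1_i(x) = max_j (W1_ij * I + B1_ij), with W of shape
    (n1, k, C, f, f). Returns n1 feature maps of the input resolution."""
    n1, k, C, f, _ = W.shape
    H, Wd = I.shape[:2]
    F1 = np.empty((n1, H, Wd))
    for i in range(n1):
        responses = np.empty((k, H, Wd))
        for j in range(k):
            responses[j] = sum(conv2d_same(I[..., c], W[i, j, c])
                               for c in range(C)) + B[i, j]
        F1[i] = responses.max(axis=0)   # pixel-wise Maxout over k affine maps
    return F1

def local_extremum(F, f3=3):
    """Equation (11): dense neighborhood maximum that keeps the spatial
    resolution, unlike stride-based max-pooling."""
    p = f3 // 2
    out = np.empty_like(F)
    for i, fm in enumerate(F):
        padded = np.pad(fm, p, mode="edge")
        windows = [padded[dy:dy + fm.shape[0], dx:dx + fm.shape[1]]
                   for dy in range(f3) for dx in range(f3)]
        out[i] = np.max(windows, axis=0)
    return out

rng = np.random.default_rng(0)
I = rng.random((8, 8, 3))                     # toy hazy RGB patch
W = rng.standard_normal((2, 4, 3, 5, 5)) * 0.1  # n1=2 maps, k=4 affine maps each
B = np.zeros((2, 4))
F1 = maxout_layer(I, W, B)
F3 = local_extremum(F1, f3=3)
```

Because the extremum window contains the center pixel, F3 dominates F1 pixel-wise while keeping its resolution, which is the property the text contrasts with ordinary max-pooling.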
4) Non-linear Regression: Standard choices of nonlinear activation functions in deep networks include the Sigmoid [35] and the Rectified Linear Unit (ReLU). The former is prone to the vanishing gradient problem, which may lead to slow convergence or poor local optima during network training. To overcome the vanishing gradient problem, ReLU was proposed [36], which offers sparse representations. However, ReLU is designed for classification problems and is not perfectly suitable for regression problems such as image restoration. In particular, ReLU inhibits values only when they are less than zero. This might lead to response overflow, especially in the last layer, because for image restoration the output values of the last layer are supposed to be both lower and upper bounded in a small range. To this end, we propose a Bilateral Rectified Linear Unit (BReLU) activation function, shown in Fig. 3, to overcome this limitation. Inspired by Sigmoid and ReLU, BReLU as a novel linear unit keeps bilateral restraint and local linearity. Based on the proposed BReLU, the feature map of the fourth layer is defined as

    F_4 = min(t_max, max(t_min, W_4 ∗ F_3 + B_4)).    (12)

Here W_4 = {W_4} contains a filter with the size of n_3 × f_4 × f_4, B_4 = {B_4} contains a bias, and t_min and t_max are the marginal values of BReLU (t_min = 0 and t_max = 1 in this paper). According to (12), the gradient of this activation function can be shown as

    ∂F_4(x)/∂F_3 = { ∂F_4(x)/∂F_3,   t_min ≤ F_4(x) < t_max
                   { 0,              otherwise                  (13)

The above four layers are cascaded together to form a CNN based trainable end-to-end system, where the filters and biases associated with the convolutional layers are the network parameters to be learned. We note that the designs of these layers can be connected with expertise in existing image dehazing methods, which we specify in the subsequent section.

B. Connections with Traditional Dehazing Methods

Take the learned filters of the first layer as an example. If the weight W_1 is an opposite filter (a sparse matrix with the value of −1 at the center of one channel, as in Fig. 4(a)) and B_1 is a unit bias, then the maximum output of the feature map is equivalent to the minimum of the color channels, which is similar to the dark channel [9] (see Equation (5)). In the same way, when the weight is a round filter as in Fig. 4(c), F_1 is similar to the maximum contrast [7] (see Equation (6)); when W_1 includes all-pass filters and opposite filters, F_1 is similar to the maximum and minimum feature maps, which are atomic operations of the color space transformation from RGB to HSV, so the color attenuation [18] (see Equation (7)) and hue disparity [29] (see Equation (8)) features are extracted. In conclusion, upon the success of the filter learning shown in Fig. 4(e), almost all haze-relevant features can potentially be extracted from the first layer of DehazeNet. On the other hand, Maxout activation functions can be considered as piece-wise linear approximations to arbitrary convex functions. In this paper, we choose the maximum across four feature maps (k = 4) to approximate an arbitrary convex function, as shown in Fig. 4(d).

White-colored objects in an image are similar to heavy haze scenes, which usually have high values of brightness and low values of saturation. Therefore, almost all haze estimation models tend to consider white-colored scene objects as being distant, resulting in inaccurate estimation of the medium transmission. Based on the assumption that the scene depth is locally constant, a local extremum filter is commonly used to overcome this problem [9], [18], [7]. In DehazeNet, the local maximum filters of the third layer operation remove this local estimation error. Furthermore, the direct attenuation term J(x)t(x) can be very close to zero when the transmission t(x) is close to zero, so the directly recovered scene radiance J(x) is prone to noise. In DehazeNet, we propose BReLU to restrict the values of the transmission between t_min and t_max, thus alleviating the noise problem. Note that BReLU is equivalent to the boundary constraints used in traditional methods [9], [18].
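A minimal sketch of BReLU and its gradient follows. For clarity it is written on the pre-activation response W_4 ∗ F_3 + B_4, whereas equation (13) states the gradient in terms of F_4(x); the in-band/out-of-band behavior is the same.

```python
import numpy as np

def brelu(z, t_min=0.0, t_max=1.0):
    """Equation (12): bilateral rectified linear unit,
    min(t_max, max(t_min, z)). Output is bounded in [t_min, t_max]."""
    return np.minimum(t_max, np.maximum(t_min, z))

def brelu_grad(z, t_min=0.0, t_max=1.0):
    """Equation (13): the gradient is 1 inside the linear band and 0
    outside, so back-propagation only updates in-band responses."""
    return ((z >= t_min) & (z < t_max)).astype(float)

z = np.array([-0.5, 0.0, 0.4, 0.99, 1.2])
out = brelu(z)        # clipped into [0, 1]: -0.5 -> 0, 1.2 -> 1
g = brelu_grad(z)     # 1 inside the band, 0 outside
```

Compared with ReLU, which only clips from below, the extra upper bound is what keeps the regressed transmission inside its physically valid range.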
Fig. 6. The training process with different low-dimensional mappings in F_1.
Fig. 7. The training process with different activation functions in F_4.
Fig. 8. Plots of predicted versus ground-truth transmission for different activation functions in the non-linear regression F_4.
TABLE IV. T HE AVERAGE RESULTS OF MSE, SSIM, PSNR AND WSNR ON THE SYNTHETIC IMAGES (β = 1 AND α = 1)
Metric ATM [39] BCCR [11] FVR [38] DCP [9] CAP2 [18] RF [17] DehazeNet
MSE 0.0689 0.0243 0.0155 0.0172 0.0075 (0.0068) 0.0070 0.0062
SSIM 0.9890 0.9963 0.9973 0.9981 0.9991 (0.9990) 0.9989 0.9993
PSNR 60.8612 65.2794 66.5450 66.7392 70.0029 (70.6581) 70.0099 70.9767
WSNR 7.8492 12.6230 13.7236 13.8508 16.9873 (17.7839) 17.1180 18.0996
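For reference, the two simplest metrics reported in Table IV can be computed as below. This is a generic sketch assuming signals normalized to a given peak value; SSIM and WSNR involve perceptual weighting and are omitted here.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals with maximum `peak`."""
    m = mse(x, y)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
err = mse(a, b)     # 0.01
db = psnr(a, b)     # 20 dB for peak 1.0
```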
TABLE V. THE MSE ON THE SYNTHETIC IMAGES BY DIFFERENT SCATTERING COEFFICIENT, IMAGE SCALE AND ATMOSPHERIC AIRLIGHT

Evaluation                   ATM [39]  BCCR [11]  FVR [38]  DCP [9]  CAP [18]         RF [17]  DehazeNet
CRE (β = 0.75)               0.0581    0.0269     0.0122    0.0199   0.0043 (0.0042)  0.0046   0.0063
CRE (β = 1.00)               0.0689    0.0243     0.0155    0.0172   0.0077 (0.0068)  0.0070   0.0062
CRE (β = 1.25)               0.0703    0.0230     0.0219    0.0147   0.0141 (0.0121)  0.0109   0.0084
CRE (β = 1.50)               0.0683    0.0219     0.0305    0.0134   0.0231 (0.0201)  0.0152   0.0127
CRE Average                  0.0653    0.0254     0.0187    0.0177   0.0105 (0.0095)  0.0094   0.0084
ARE (α = [1.0, 1.0, 1.0])    0.0689    0.0243     0.0155    0.0172   0.0075 (0.0068)  0.0070   0.0062
ARE (α = [0.9, 1.0, 1.0])    0.0660    0.0266     0.0170    0.0210   0.0073 (0.0069)  0.0071   0.0072
ARE (α = [1.0, 0.9, 1.0])    0.0870    0.0270     0.0159    0.0200   0.0070 (0.0067)  0.0073   0.0074
ARE (α = [1.0, 1.0, 0.9])    0.0689    0.0239     0.0152    0.0186   0.0081 (0.0069)  0.0083   0.0062
ARE Average                  0.0727    0.0255     0.0159    0.0192   0.0075 (0.0068)  0.0074   0.0067
SRE (s = 0.40)               0.0450    0.0238     0.0155    0.0102   0.0137 (0.0084)  0.0089   0.0066
SRE (s = 0.60)               0.0564    0.0223     0.0154    0.0137   0.0092 (0.0071)  0.0076   0.0060
SRE (s = 0.80)               0.0619    0.0236     0.0155    0.0166   0.0086 (0.0066)  0.0074   0.0062
SRE (s = 1.00)               0.0689    0.0243     0.0155    0.0172   0.0077 (0.0068)  0.0070   0.0062
SRE Average                  0.0581    0.0235     0.0155    0.0144   0.0098 (0.0072)  0.0077   0.0062
NRE (σ = 10)                 0.0541    0.0138     0.0150    0.0133   0.0065 (0.0070)  0.0086   0.0059
NRE (σ = 15)                 0.0439    0.0144     0.0148    0.0104   0.0072 (0.0074)  0.0112   0.0061
NRE (σ = 20)                 –         0.0181     0.0151    0.0093   0.0083 (0.0085)  0.0143   0.0058
NRE (σ = 25)                 –         0.0224     0.0150    0.0082   0.0100 (0.0092)  0.0155   0.0051
NRE (σ = 30)                 –         0.0192     0.0151    0.0085   0.0119 (0.0112)  0.0191   0.0049
NRE Average                  –         0.0255     0.0150    0.0100   0.0088 (0.0087)  0.0137   0.0055
to break the correlation between the medium transmission and the image content, which will also magnify the effect of outliers.

E. Qualitative results on real-world images

Fig. 11 shows the dehazing results and depth maps restored by DehazeNet, and more results and comparisons can be found at http://caibolun.github.io/DehazeNet/. Because all of the dehazing algorithms can obtain truly good results on general outdoor images, it is difficult to rank them visually. To compare them, this paper focuses on 5 challenging images identified in related studies [9], [17], [18]. These images have large white or gray regions that are hard to handle, because most existing dehazing algorithms are sensitive to the white color. Fig. 12 shows a qualitative comparison with six state-of-the-art dehazing algorithms on the challenging images. Fig. 12 (a) depicts the hazy images to be dehazed, and Fig. 12 (b-g) shows the results of ATM [39], BCCR [11], FVR [38], DCP [9], CAP [18] and RF [17], respectively. The results of DehazeNet are given in Fig. 12 (h).

The sky region in a hazy image is a challenge for dehazing, because clouds and haze are similar natural phenomena described by the same atmospheric scattering model. As shown in the first three figures, most of the haze is removed in the (b-d) results, and the details of the scenes and objects are well restored. However, those results significantly suffer from over-enhancement in the sky region. Overall, the sky region of these images is much darker than it should be, or is oversaturated and distorted. Haze generally exists only in the atmospheric surface layer, and thus the sky region almost does not require handling. Based on the learning framework, CAP and RF avoid color distortion in the sky, but non-sky regions are enhanced poorly because of the non-content regression model (for example, the rock-soil of the first image and the green flatlands in the third image). DehazeNet appears to be capable of finding the sky region to keep its color, and assures a good dehazing effect in the other regions. The reason is that the patch attribute can be learned in the hidden layers of DehazeNet, and it contributes to the dehazing effect in the sky.

Transmission estimation based on priors is a kind of statistics, which might not work for certain images. The fourth and fifth figures are determined to be failure cases in [9]. When the scene objects are inherently similar to the atmospheric light (such as the fair-skinned complexion in the fourth figure and the white marble in the fifth figure), the estimated transmission based on priors (DCP, BCCR, FVR) is not reliable, because the dark channel has bright values near such objects, and FVR and BCCR are based on DCP, which has an inherent problem of overestimating the transmission. CAP and RF, learned from a regression model, are free from oversaturation, but underestimate the haze degree in the distance (see the brown hair in the fourth image and the red pillar in the fifth image). Compared with the six algorithms, our results avoid image oversaturation and retain the dehazing validity due to the non-linear regression of DehazeNet.

V. CONCLUSION

In this paper, we have presented a novel deep learning approach for single image dehazing. Inspired by traditional haze-relevant features and dehazing methods, we show that medium transmission estimation can be reformulated into a trainable end-to-end system with a special design, where the feature extraction layer and the non-linear regression layer are distinguished from classical CNNs. In the first layer F_1, the Maxout unit is shown to be similar to the prior-based methods, and it is more effective in learning haze-relevant features. In the last layer F_4, a novel activation function called BReLU is used instead of ReLU or Sigmoid to keep the bilateral restraint and local linearity needed for image restoration. With this lightweight architecture, DehazeNet achieves dramatically higher efficiency and better dehazing effects than the state-of-the-art methods.

Although we have successfully applied a CNN to haze removal, there is still extended research to be carried out. That is, the atmospheric light α cannot always be regarded as a global constant, and it could be learned together with the medium transmission in a unified network. Moreover, we think
the atmospheric scattering model can also be learned in a deeper neural network, in which an end-to-end mapping between hazy and haze-free images can be optimized directly without the medium transmission estimation. We leave this problem for future research.

Fig. 11. The haze-free images and depth maps restored by DehazeNet.

Fig. 12. Qualitative comparison on the challenging images: (a) hazy image, (b) ATM [39], (c) BCCR [11], (d) FVR [38], (e) DCP [9], (f) CAP [18], (g) RF [17], (h) DehazeNet.

REFERENCES

[1] T. K. Kim, J. K. Paik, and B. S. Kang, "Contrast enhancement system using spatially adaptive histogram equalization with temporal filtering," IEEE Transactions on Consumer Electronics, vol. 44, no. 1, pp. 82–87, 1998.
[2] J. A. Stark, "Adaptive image contrast enhancement using generalizations of histogram equalization," IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 889–896, 2000.
[3] R. Eschbach and B. W. Kolpatzik, "Image-dependent color saturation correction in a natural scene pictorial image," Sep. 12, 1995, US Patent 5,450,217.
[4] Y. Y. Schechner, S. G. Narasimhan, and S. K. Nayar, "Instant dehazing of images using polarization," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2001, pp. I–325.
[5] S. G. Narasimhan and S. K. Nayar, "Contrast restoration of weather degraded images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 6, pp. 713–724, 2003.
[6] J. Kopf, B. Neubert, B. Chen, M. Cohen, D. Cohen-Or, O. Deussen, M. Uyttendaele, and D. Lischinski, "Deep photo: Model-based photograph enhancement and viewing," in ACM Transactions on Graphics (TOG), vol. 27, no. 5, 2008, p. 116.
[7] R. T. Tan, "Visibility in bad weather from a single image," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[8] R. Fattal, "Single image dehazing," in ACM Transactions on Graphics (TOG), vol. 27, no. 3, 2008, p. 72.
[9] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2341–2353, 2011.
[10] K. Nishino, L. Kratz, and S. Lombardi, "Bayesian defogging," International Journal of Computer Vision, vol. 98, no. 3, pp. 263–278, 2012.
[11] G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, "Efficient image dehazing with boundary constraint and contextual regularization," in IEEE International Conference on Computer Vision (ICCV), 2013, pp. 617–624.
[12] K. B. Gibson, D. T. Vo, and T. Q. Nguyen, "An investigation of dehazing effects on image and video coding," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 662–673, 2012.
[13] J.-P. Tarel and N. Hautiere, "Fast visibility restoration from a single color or gray level image," in IEEE International Conference on Computer Vision, 2009, pp. 2201–2208.
[14] J. Yu, C. Xiao, and D. Li, "Physics-based fast single image fog removal," in IEEE International Conference on Signal Processing (ICSP), 2010, pp. 1048–1052.
[15] K. He, J. Sun, and X. Tang, "Guided image filtering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397–1409, 2013.
[16] A. Levin, D. Lischinski, and Y. Weiss, "A closed-form solution to natural image matting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 228–242, 2008.
[17] K. Tang, J. Yang, and J. Wang, "Investigating haze-relevant features in a learning framework for image dehazing," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2995–3002.
[18] Q. Zhu, J. Mai, and L. Shao, "A fast single image haze removal algorithm using color attenuation prior," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3522–3533, 2015.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[20] C. Ding and D. Tao, "Robust face recognition via multimodal deep face representation," IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2049–2058, 2015.
[21] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.
[27] S. K. Nayar and S. G. Narasimhan, "Vision in bad weather," in IEEE International Conference on Computer Vision, vol. 2, 1999, pp. 820–827.
[28] S. G. Narasimhan and S. K. Nayar, "Contrast restoration of weather degraded images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 6, pp. 713–724, 2003.
[29] C. O. Ancuti, C. Ancuti, C. Hermans, and P. Bekaert, "A fast semi-inverse approach to detect and remove the haze from a single image," in Computer Vision–ACCV 2010, 2011, pp. 501–514.
[30] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 1319–1327.
[31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
[32] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, pp. 1–42, 2014.
[33] I. Lampl, D. Ferster, T. Poggio, and M. Riesenhuber, "Intracellular measurements of spatial integration and the max operation in complex cells of the cat primary visual cortex," Journal of Neurophysiology, vol. 92, no. 5, pp. 2704–2713, 2004.
[34] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[35] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[36] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[37] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, 2014, pp. 675–678.
[38] J.-P. Tarel and N. Hautiere, "Fast visibility restoration from a single color or gray level image," in IEEE International Conference on Computer Vision, 2009, pp. 2201–2208.
[39] M. Sulami, I. Glatzer, R. Fattal, and M. Werman, "Automatic recovery of the atmospheric light in hazy images," in IEEE International Conference on Computational Photography (ICCP), 2014, pp. 1–11.
[40] J. Mai, Q. Zhu, D. Wu, Y. Xie, and L. Wang, "Back propagation neural network dehazing," in IEEE International Conference on Robotics and Biomimetics (ROBIO), 2014, pp. 1433–1438.
[41] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in Computer Vision–ECCV 2014, 2014, pp. 184–199.
hierarchies for accurate object detection and semantic segmentation,” in network for image super-resolution,” in Computer Vision–ECCV 2014.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Springer, 2014, pp. 184–199.
2014, pp. 580–587. [42] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “Pcanet: A simple
[22] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Scholkopf, “A deep learning baseline for image classification?” IEEE Transactions on
machine learning approach for non-blind image deconvolution,” in IEEE Image Processing, vol. 24, no. 12, pp. 5017–5032, 2015.
Conference on Computer Vision and Pattern Recognition (CVPR), 2013, [43] B. Scholkopft and K.-R. Mullert, “Fisher discriminant analysis with
pp. 1067–1074. kernels,” Neural networks for signal processing IX, vol. 1, p. 1, 1999.
[23] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using [44] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense
deep convolutional networks,” IEEE Transactions on Pattern Analysis two-frame stereo correspondence algorithms,” International journal of
and Machine Intelligence, no. 99, pp. 1–1, 2015. computer vision, vol. 47, no. 1-3, pp. 7–42, 2002.
[24] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken [45] ——, “High-accuracy stereo depth maps using structured light,” in
through a window covered with dirt or rain,” in IEEE International IEEE Computer Society Conference on Computer Vision and Pattern
Conference on Computer Vision (ICCV), 2013, pp. 633–640. Recognition, vol. 1. IEEE, 2003, pp. I–195.
[25] Z. Wu, D. Lin, and X. Tang, “Adjustable bounded rectifiers: Towards [46] D. Scharstein and C. Pal, “Learning conditional random fields for
deep binary representations,” arXiv preprint arXiv:1511.06201, 2015. stereo,” in IEEE Conference on Computer Vision and Pattern Recogni-
[26] E. J. McCartney, “Optics of the atmosphere: scattering by molecules tion. IEEE, 2007, pp. 1–8.
and particles,” New York, John Wiley and Sons, Inc., vol. 1, 1976. [47] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image