
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 23, NO. 6, DECEMBER 2015

Fuzzy Restricted Boltzmann Machine for the Enhancement of Deep Learning

C. L. Philip Chen, Fellow, IEEE, Chun-Yang Zhang, Long Chen, Member, IEEE, and Min Gan

Abstract—In recent years, deep learning has set off a research wave in machine learning. With its outstanding performance, more and more applications of deep learning in pattern recognition, image recognition, speech recognition, and video processing have been developed. The restricted Boltzmann machine (RBM) plays an important role in current deep learning techniques, as most existing deep networks are based on or related to it. For a regular RBM, the relationships between visible units and hidden units are restricted to be constants, and this restriction downgrades the representation capability of the RBM. To avoid this flaw and enhance deep learning capability, the fuzzy restricted Boltzmann machine (FRBM) and its learning algorithm are proposed in this paper, in which the parameters governing the model are replaced by fuzzy numbers. In this way, the original RBM becomes a special case of the FRBM, namely the case in which there is no fuzziness in the FRBM model. In the process of learning the FRBM, the fuzzy free energy function is defuzzified before the probability is defined. The experimental results based on bar-and-stripe benchmark inpainting and MNIST handwritten digit classification show that the representation capability of the FRBM model is significantly better than that of the traditional RBM. Additionally, the FRBM also shows better robustness than the RBM when the training data are contaminated by noise.

Index Terms—Deep learning, fuzzy deep networks, fuzzy restricted Boltzmann machine, image classification, image inpainting, restricted Boltzmann machine (RBM).

Manuscript received August 26, 2014; revised December 14, 2014; accepted January 12, 2015. Date of publication February 24, 2015; date of current version November 25, 2015. This work was supported by the Macau Science and Technology Development Fund under Grant 008/2010/A1, UM Multiyear Research Grants, and the National Natural Science Foundation of China under Grant 61203106. (Corresponding author: Chun-Yang Zhang.) C. L. P. Chen and L. Chen are with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau 999078, China (e-mail: [email protected]; [email protected]). C.-Y. Zhang was with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau 999078, China. He is now with the College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China (e-mail: [email protected]). M. Gan is with the Department of Computer and Information Science, Hefei University of Technology, Hefei 230009, China (e-mail: [email protected]). Digital Object Identifier 10.1109/TFUZZ.2015.2406889

I. INTRODUCTION

[Fig. 1. Restricted Boltzmann machine (RBM).]

The restricted Boltzmann machine (RBM), as illustrated in Fig. 1, is a stochastic graph model that can learn a joint probability distribution over its n visible units x = [x_1, ..., x_n] and m hidden feature units h = [h_1, ..., h_m]. The model is governed by parameters θ denoting the connection weights and biases between cross-layer units. The RBM [1] was invented in 1986; however, it received a new birth when G. Hinton and his collaborators recently proposed several deep networks and corresponding fast learning algorithms, including the deep autoencoder [2], deep belief networks [3], and the deep Boltzmann machine [4]. The RBM and its deep architectures have a large number of applications [5], such as dimensionality reduction [6], classification [7], collaborative filtering [8], feature learning [9], and topic modeling [10]. A detailed treatment of the RBM and its deep architectures can be found in [11] and [12].

Most researchers in the deep learning field focus on deep network design and corresponding fast learning algorithms. Some research works try to improve deep learning techniques on the side of model representation. For example, Gaussian restricted Boltzmann machines (GRBMs) with Gaussian linear units have been proposed to learn representations from real-valued data [13]; the GRBM improves the RBM by replacing binary-valued visible units with Gaussian ones. Deep networks based on the GRBM have also been developed in recent years, such as the Gaussian–Bernoulli deep Boltzmann machine [14], [15]. Conditional versions of RBMs have been developed for collaborative filtering [16], and temporal RBMs and recurrent RBMs have been proposed to model high-dimensional sequence data, such as motion capture data [17] and video sequences [18].

For regular RBMs and their existing variants, the parameters that represent the relationships between units in the visible and hidden layers are restricted to be constants. This structural design leads to several problems. First, it constrains the representation capability, since the variables often interact in uncertain ways. Second, RBMs are not very robust when the training data samples are corrupted by noise. Third, the parameter learning process of RBMs is confined to a relatively small space, which is contrary to the merits of deep learning. All of these constraints are reflected in the fit of the target joint probability distribution. To overcome these disadvantages and reduce the inaccuracy and distortion induced by the linearization of the relationship between cross-layer units, this paper proposes the fuzzy restricted Boltzmann machine (FRBM) and a corresponding learning algorithm, in which the parameters governing the model are all fuzzy numbers. After the vagueness in the relationship between the visible units
and their high-level features is introduced into the model, the fuzzy RBM demonstrates competitive performance in both data representation capability and robustness in coping with noise. Similar merits can also be found in fuzzy regression [19] and the fuzzy support vector machine (SVM) [20]. The fuzzy RBM, which is likewise designed to boost the development of deep learning from the building component (the RBM) of deep networks, has never been introduced before. The stochastic gradient descent method integrated with the Markov chain Monte Carlo (MCMC) approach is employed to train the proposed fuzzy RBM. This kind of learning method is commonly used in training RBMs and has proved to be very efficient [12], [21], [22]. Other learning approaches, such as Bayesian estimation methods, have also been developed to learn RBMs in [23] and [24]. Gaussian RBMs, conditional RBMs, temporal RBMs, and recurrent RBMs were developed by modifying the structure of RBMs. Alternatively, the fuzzy RBM is proposed from the different perspective of extending the relationships between visible and hidden units. Therefore, fuzzy RBMs can also be further developed by taking other variants of RBMs into consideration.

The rest of the paper is organized as follows. In Section II, the preliminaries about fuzzy sets, fuzzy functions, and their notations are presented. The proposed FRBM and its learning algorithm are introduced in Section III. After that, the performance of the FRBM model is verified by conducting experiments on bar-and-stripe (BAS) benchmark inpainting and MNIST handwritten digit classification in Section IV. Finally, conclusions and remarks are drawn in Section V.

II. PRELIMINARIES

A. Fuzzy Number

Restricting the discussion to symmetric triangular fuzzy numbers [25], $\bar{W} = [\bar{W}_1, \ldots, \bar{W}_n]^T$ is regarded as a vector of them. The membership function of the jth fuzzy number $\bar{W}_j$ can be defined as

$$\bar{W}_j(w) = \max\left(1 - \frac{|w - w_j|}{\tilde{w}_j},\; 0\right) \qquad (1)$$

where $w_j$ is the center of the fuzzy number and $\tilde{w}_j$ is its width, as illustrated in Fig. 2. A fuzzy set is specified by placing a bar over a capital letter, such as $\bar{W}_j$; $\bar{W}_j(w)$ represents the membership value at w.

[Fig. 2. Symmetric triangular fuzzy number.]

B. Alpha-Cuts

If $\bar{A}$ is a fuzzy set, the α-cut of $\bar{A}$, denoted by $\bar{A}[\alpha]$, is defined as

$$\bar{A}[\alpha] = \{x \in \Omega \mid \bar{A}(x) \geq \alpha\} \qquad (2)$$

where 0 < α ≤ 1.

C. Fuzzy Function

The fuzzy function $\bar{f}$, which is extended from the real-valued function f : Y = f(x, W), is defined by

$$\bar{Y} = \bar{f}(x, \bar{W}) \qquad (3)$$

where $\bar{Y}$ is the dependent fuzzy output set, and W and $\bar{W}$ are the parameters of the two functions [26].

1) Extension Principle: The membership function deduced from the extension principle can be expressed as

$$\bar{Y}(y) = \sup_{W}\{\min(\bar{W}_1(W_1), \ldots, \bar{W}_n(W_n)) \mid f(x, W) = y\} \qquad (4)$$

where $W = (W_1, \ldots, W_n)^T$ and $\bar{W} = (\bar{W}_1, \ldots, \bar{W}_n)^T$.

2) Alpha-Cuts of $\bar{Y}$: If f is continuous, the α-cut of $\bar{Y}$, i.e., $\bar{Y}[\alpha] = [Y_1(\alpha), Y_2(\alpha)]$, has the following expression:

$$Y_1(\alpha) = \min\{Y(W, x) \mid W \in \bar{W}[\alpha]\}, \qquad Y_2(\alpha) = \max\{Y(W, x) \mid W \in \bar{W}[\alpha]\}. \qquad (5)$$

D. Interval Arithmetic

For two intervals [a, b] and [c, d], which are subsets of the real domain, the fundamental operations of interval arithmetic [27] are defined as follows:

$$[a, b] + [c, d] = [a + c,\; b + d]$$
$$[a, b] - [c, d] = [a - d,\; b - c]$$
$$[a, b] \times [c, d] = [\min(ac, ad, bc, bd),\; \max(ac, ad, bc, bd)]$$
$$[a, b] \div [c, d] = [\min(a/c, a/d, b/c, b/d),\; \max(a/c, a/d, b/c, b/d)], \quad 0 \notin [c, d].$$

Calculating the membership function by using the extension principle (3) and (4) is infeasible, since it involves the maximization and minimization of the original function. There is another, efficient way to extend a function to the corresponding fuzzy one, namely to use α-cuts and interval arithmetic:

$$\bar{Y}[\alpha] = f(x, \bar{W}[\alpha]) \qquad (6)$$

where $\bar{W}[\alpha]$ are intervals that are easy to calculate. Interval arithmetic can then be employed to finish the computation of the membership function of the fuzzy function. However, interval arithmetic may become NP-hard when f is very complex [28].
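To make these preliminaries concrete, the following is a minimal Python sketch (ours, not from the paper) of the symmetric triangular membership function (1), its α-cut, the interval operations of Section II-D, and the α-cut extension (6) for a simple monotone function; all function names are our own.

```python
def tri_membership(w, center, width):
    """Membership value of a symmetric triangular fuzzy number, eq. (1)."""
    return max(1.0 - abs(w - center) / width, 0.0)

def tri_alpha_cut(center, width, alpha):
    """Alpha-cut, eq. (2): all w with membership >= alpha, i.e. the interval
    [center - (1 - alpha) * width, center + (1 - alpha) * width]."""
    half = (1.0 - alpha) * width
    return (center - half, center + half)

# Interval arithmetic of Section II-D on pairs (lo, hi).
def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def idiv(a, b):
    assert not (b[0] <= 0.0 <= b[1]), "divisor interval must not contain 0"
    q = (a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1])
    return (min(q), max(q))

# Eq. (6): extend f(x, W) = x * W by pushing the alpha-cut of W through f.
W_cut = tri_alpha_cut(center=0.5, width=0.2, alpha=0.5)   # (0.4, 0.6)
x = 2.0
Y_cut = imul((x, x), W_cut)                               # alpha-cut of the fuzzy output
print(Y_cut)                                              # (0.8, 1.2)
```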
[Fig. 3. Fuzzy restricted Boltzmann machine (FRBM).]

III. FUZZY RESTRICTED BOLTZMANN MACHINE AND ITS LEARNING ALGORITHM

A. Fuzzy Restricted Boltzmann Machine

The proposed FRBM is illustrated in Fig. 3, in which the connection weights and biases are fuzzy parameters denoted by $\bar{\theta}$. The FRBM model has several merits. The first is that the FRBM has a much better representation capability than the regular RBM in modeling probabilities over visible and hidden units; specifically, the RBM is only a special case of the FRBM, namely the case in which no fuzziness exists in the FRBM model. The second is that the robustness of the FRBM surpasses that of the RBM: the FRBM is more robust when the model is fitted to noisy data. All these advantages spring from the fuzzy extension of the relationships between cross-layer variables, and they inherit the characteristics of fuzzy models.

Since the FRBM is an extension of the RBM model, the discussion starts with a brief introduction to the RBM. An RBM is an energy-based probabilistic model, in which the probability distribution is defined through an energy function. Its probability is defined as

$$P(x, h, \theta) = \frac{e^{-E(x, h, \theta)}}{Z} \qquad (7)$$

$$Z = \sum_{\tilde{x}, \tilde{h}} e^{-E(\tilde{x}, \tilde{h}, \theta)} \qquad (8)$$

where E(x, h, θ) is the energy function, θ are the parameters governing the model, Z is the normalizing factor called the partition function, and x̃ and h̃ are two vector variables representing visible and hidden units that are used to traverse all the configurations of units on the graph. The energy function for the RBM is defined by

$$E(x, h, \theta) = -b^T x - c^T h - h^T W x \qquad (9)$$

where $b_j$ and $c_i$ are the offsets, $W_{ij}$ is the connection weight between the jth visible unit and the ith hidden unit, and θ = {b, c, W}.

To establish the FRBM, it is necessary to first define the fuzzy energy function of the model. The fuzzy energy function can be extended from (9) in accordance with the extension principle as follows:

$$\bar{E}(x, h, \bar{\theta}) = -\bar{b}^T x - \bar{c}^T h - h^T \bar{W} x \qquad (10)$$

where $\bar{E}(x, h, \bar{\theta})$ is the fuzzified energy function and $\bar{\theta} = \{\bar{b}, \bar{c}, \bar{W}\}$ are fuzzy parameters. Correspondingly, the fuzzy free energy $\bar{F}$, which marginalizes the hidden units and maps (7) into a simpler form, is deduced as

$$\bar{F}(x, \bar{\theta}) = -\log \sum_{\tilde{h}} e^{-\bar{E}(x, \tilde{h}, \bar{\theta})} \qquad (11)$$

where $\bar{F}$ is extended from the crisp free energy function F:

$$F(x, \theta) = -\log \sum_{\tilde{h}} e^{-E(x, \tilde{h}, \theta)}. \qquad (12)$$

If the fuzzy free energy function is directly employed to define the probability, it leads to a fuzzy probability [27], and the optimization in the learning process turns into a fuzzy maximum likelihood problem. However, this kind of problem is quite intractable, because the fuzzy objective function is nonlinear and its membership function is difficult to compute: the computation of its α-cuts becomes an NP-hard problem [29]. Therefore, it is necessary to transform the problem into a regular maximum likelihood problem by defuzzifying the fuzzy free energy function (11). The center-of-area (centroid) method [30] is employed to defuzzify the fuzzy free energy function $\bar{F}(x)$. Then, the likelihood function can be defined by the defuzzified fuzzy free energy function. Consequently, the fuzzy optimization problem becomes a real-valued problem, and conventional optimization approaches can be directly applied to find the optimal solutions. The centroid of the fuzzy number $\bar{F}(x)$ is denoted by $F_c(x)$ and has the following form:

$$F_c(x, \bar{\theta}) = \frac{\int \theta\, \bar{F}(x, \theta)\, d\theta}{\int \bar{F}(x, \theta)\, d\theta}, \qquad \theta \in \bar{\theta}. \qquad (13)$$

Naturally, after the fuzzy free energy is defuzzified, the probability can be defined as

$$P_c(x, \bar{\theta}) = \frac{e^{-F_c(x, \bar{\theta})}}{Z}, \qquad Z = \sum_{\tilde{x}} e^{-F_c(\tilde{x}, \bar{\theta})}. \qquad (14)$$

In the fuzzy RBM model, the objective function is the negative log-likelihood, which is given by

$$L(\bar{\theta}, D) = -\sum_{x \in D} \log P_c(x, \bar{\theta}) \qquad (15)$$

where D is the training dataset. The learning problem is to find optimal solutions for the parameters $\bar{\theta}$ that minimize the objective function $L(\bar{\theta}, D)$, i.e.,

$$\min_{\bar{\theta}} L(\bar{\theta}, D). \qquad (16)$$

In the following subsection, the detailed procedure for addressing this problem, the dual of maximum likelihood, by means of the stochastic gradient descent method is investigated.

B. Fuzzy Restricted Boltzmann Machine Learning Algorithm

In order to solve the optimization problem (16), it is required to first carry out the defuzzification of the fuzzy free energy function in some viable way. However, it is infeasible to defuzzify the fuzzy free energy function by using (13), which involves integrals. Alternatively, the centroid is calculated in a discrete form associated with a number of α-cuts of the fuzzy function. Therefore, the α-cuts of the fuzzy free energy function and interval arithmetic are first investigated to obtain an approximation of the centroid.

1) Alpha-Cuts of the Fuzzy Free Energy Function: As supposed, $\bar{\theta}$ is a vector of symmetric triangular fuzzy numbers, and its α-cut is $\bar{\theta}[\alpha] = [\theta_L, \theta_R]$, where $\theta_L$ and $\theta_R$ are the lower and upper bounds of the interval with respect to α, respectively. $\bar{F}(x, \bar{\theta})$ is often a triangular-shaped fuzzy number for nonlinear functions [27]. Moreover, the fuzzy free energy is a monotonically decreasing function with respect to the parameters θ when x and h are nonnegative. Therefore, according to interval arithmetic, the α-cut of $\bar{F}(x, \bar{\theta})$ can be given by

$$\bar{F}(x, \bar{\theta})[\alpha] = F(x, \bar{\theta}[\alpha]) = [F(x, \theta_R),\; F(x, \theta_L)]. \qquad (17)$$

2) Approximation of the Centroid: An approximation of the centroid of the fuzzy free energy function is obtained by discretizing the fuzzy output $\bar{F}(x, \bar{\theta})$ and calculating M of its α-cuts. Combining these α-cuts, the approximate centroid is given by

$$F_c(x, \bar{\theta}) \approx \frac{\sum_{i=1}^{M} \alpha_i\,[F(x, \theta_i^L) + F(x, \theta_i^R)]}{2 \sum_{i=1}^{M} \alpha_i} \qquad (18)$$

where $\alpha = (\alpha_1, \ldots, \alpha_M)$, $\alpha_i \in [0, 1]$, and $\bar{\theta}[\alpha_i] = [\theta_i^L, \theta_i^R]$.

As all the α-cuts are bounded intervals [31], for convenience, we only consider the special case in which the fuzzy numbers degrade into intervals (α = 1). Let $\bar{\theta} = [\theta_L, \theta_R]$. According to (18), the defuzzified free energy function can be written as

$$F_c(x, \bar{\theta}) \approx \frac{1}{2}\,[F(x, \theta_L) + F(x, \theta_R)]. \qquad (19)$$

After defuzzifying the fuzzy free energy, the probability defined on it returns to (14). Then, the problem is transformed into a regular optimization problem, which can be solved by the gradient-descent-based stochastic maximum likelihood method.

3) Gradient-Related Optimization: The gradients of the negative log-probability with respect to $\theta_L$ then have a particular form (see Appendix A):

$$-\frac{\partial \log P_c(x, \bar{\theta})}{\partial \theta_L} = \frac{\partial F_c(x, \theta_L)}{\partial \theta_L} - E_P\!\left[\frac{\partial F_c(x, \theta_L)}{\partial \theta_L}\right] \qquad (20)$$

where $E_P(\cdot)$ denotes the expectation over the target probability distribution P. Similarly, the gradient of (16) with respect to $\theta_R$ is given by

$$-\frac{\partial \log P_c(x, \bar{\theta})}{\partial \theta_R} = \frac{\partial F_c(x, \theta_R)}{\partial \theta_R} - E_P\!\left[\frac{\partial F_c(x, \theta_R)}{\partial \theta_R}\right]. \qquad (21)$$

It is usually difficult to compute these gradients analytically, as this involves the computation of $E_P[\partial F(x, \theta)/\partial \theta]$ (θ = θ_L or θ_R), which is nothing less than an expectation over all possible configurations of the input x. An estimate of the expectation can be obtained by using a fixed number of model samples. Assume that N is a set of samples drawn from an approximation of the model distribution; then we have

$$-\frac{\partial \log P_c(x, \theta)}{\partial \theta} \approx \frac{\partial F_c(x, \theta)}{\partial \theta} - \frac{1}{|N|} \sum_{\tilde{x} \in N} \frac{\partial F_c(\tilde{x}, \theta)}{\partial \theta}. \qquad (22)$$

4) Conditional Probability: For the RBM, the conditional energy-based probabilities [11] are defined as

$$P(h \mid x) = \frac{e^{-E(x, h)}}{\sum_{\tilde{h}} e^{-E(x, \tilde{h})}} \qquad (23)$$

$$P(x \mid h) = \frac{e^{-E(x, h)}}{\sum_{\tilde{x}} e^{-E(\tilde{x}, h)}}. \qquad (24)$$

In the commonly studied case of the RBM with binary units, where $x_j, h_i \in \{0, 1\}$, a probabilistic version of the usual neuron activation function can be derived from the conditional probabilities. They have the following affine forms:

$$P(h_i = 1 \mid x) = \frac{e^{c_i + W_i x}}{1 + e^{c_i + W_i x}} = \sigma(c_i + W_i x) \qquad (25)$$

$$P(x_j = 1 \mid h) = \frac{e^{b_j + W_{\cdot j}^T h}}{1 + e^{b_j + W_{\cdot j}^T h}} = \sigma(b_j + W_{\cdot j}^T h) \qquad (26)$$

where $W_i$ and $W_{\cdot j}$ denote the ith row and jth column of W, respectively, and σ is the logistic sigmoid function

$$\sigma(x) = \frac{e^x}{e^x + 1} = \frac{1}{1 + e^{-x}}.$$

For the fuzzy RBM, the conditional probabilities also become fuzzy and can be extended from (25) and (26) as

$$\bar{P}(h_i = 1 \mid x) = \sigma(\bar{c}_i + \bar{W}_i x), \qquad \bar{P}(x_j = 1 \mid h) = \sigma(\bar{b}_j + \bar{W}_{\cdot j}^T h).$$

After the objective function is defuzzified, the MCMC method can be employed to sample from these conditional distributions. This process is crucial for approximating the objective function, owing to the difficulty of calculating the expectations. For a predefined α, the α-cuts of the fuzzy conditional probabilities are consistent with the target probability distribution. They are given as follows:

$$\bar{P}(h_i = 1 \mid x)[\alpha] = [P_L(h_i = 1 \mid x),\; P_R(h_i = 1 \mid x)]$$
$$\bar{P}(x_j = 1 \mid h)[\alpha] = [P_L(x_j = 1 \mid h),\; P_R(x_j = 1 \mid h)]$$

where $P_L(h_i \mid x)$, $P_R(h_i \mid x)$, $P_L(x_j \mid h)$, and $P_R(x_j \mid h)$ are the conditional probabilities with respect to the lower and upper bounds of the parameters governing the model. They have the following forms:

$$P_L(h_i \mid x) = P(h_i \mid x; \theta_L) = \sigma(c_i^L + W_i^L x)$$
$$P_R(h_i \mid x) = P(h_i \mid x; \theta_R) = \sigma(c_i^R + W_i^R x) \qquad (27)$$

$$P_L(x_j \mid h) = P(x_j \mid h; \theta_L) = \sigma(b_j^L + W_{\cdot j}^L h)$$
$$P_R(x_j \mid h) = P(x_j \mid h; \theta_R) = \sigma(b_j^R + W_{\cdot j}^R h). \qquad (28)$$

There are six kinds of parameters for visible unit j and hidden unit i in the FRBM model, i.e., the lower bounds of the connection weight $W_{ij}^L$, visible bias $b_j^L$, and hidden bias $c_i^L$, and their upper bounds $W_{ij}^R$, $b_j^R$, and $c_i^R$.
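For the interval case α = 1 used throughout the experiments, a compact Python sketch (ours, not the authors' code) of the defuzzified free energy (19), using the closed-form binary free energy derived as (31) just below, and of the bound-wise conditional probabilities (27) and (28) might look as follows; the (b, c, W) parameter packing and the function names are our own conventions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_energy_binary(x, theta):
    """Closed-form free energy of a binary RBM, cf. eq. (31) below."""
    b, c, W = theta
    return -(b @ x) - np.sum(np.log1p(np.exp(c + W @ x)))

def centroid_free_energy(x, theta_L, theta_R):
    """Defuzzified free energy for interval parameters (alpha = 1), eq. (19)."""
    return 0.5 * (free_energy_binary(x, theta_L) + free_energy_binary(x, theta_R))

def p_hidden(x, theta):
    """Bound-wise conditional P(h_i = 1 | x), eqs. (27); pass theta_L or theta_R."""
    b, c, W = theta
    return sigmoid(c + W @ x)

def p_visible(h, theta):
    """Bound-wise conditional P(x_j = 1 | h), eqs. (28)."""
    b, c, W = theta
    return sigmoid(b + W.T @ h)
```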
For simplicity, the energy function is denoted by a sum of terms, each associated with only one hidden unit:

$$E(x, h) = -\mu(x) + \sum_{i=1}^{m} \phi_i(x, h_i) \qquad (29)$$

where

$$\mu(x) = b^T x, \qquad \phi_i(x, h_i) = -h_i(c_i + W_i x). \qquad (30)$$

Then, the free energy of an RBM with binary units can be simplified explicitly to (see Appendix B)

$$F(x) = -b^T x - \sum_{i=1}^{m} \log(1 + e^{c_i + W_i x}). \qquad (31)$$

The gradients of the free energy can be calculated explicitly when the RBM has binary units and the energy function has the form (29).

No matter whether the fuzzy parameters in the fuzzy RBM are symmetric or not, their α-cuts are always intervals with lower and upper bounds that need to be learned in the training phase. According to (19)–(21), all the gradients can be obtained (see the details in Appendix C). Combining (20) and (31), it is easy to get the following negative log-likelihood gradients for the fuzzy RBM with binary units:

$$-\frac{\partial \log P_c(x)}{\partial W_{ij}^L} = E_P[P_L(h_i \mid x)\, x_j^L] - P_L(h_i \mid x)\, x_j^L$$
$$-\frac{\partial \log P_c(x)}{\partial c_i^L} = E_P[P_L(h_i \mid x)] - P_L(h_i \mid x)$$
$$-\frac{\partial \log P_c(x)}{\partial b_j^L} = E_P[P_L(x_j \mid h)] - x_j^L$$
$$-\frac{\partial \log P_c(x)}{\partial W_{ij}^R} = E_P[P_R(h_i \mid x)\, x_j^R] - P_R(h_i \mid x)\, x_j^R$$
$$-\frac{\partial \log P_c(x)}{\partial c_i^R} = E_P[P_R(h_i \mid x)] - P_R(h_i \mid x)$$
$$-\frac{\partial \log P_c(x)}{\partial b_j^R} = E_P[P_R(x_j \mid h)] - x_j^R$$

where $P_c(x)$ is the centroid probability defined through (14) combined with (18) and (19).

5) Contrastive Divergence: To approximate the expectations, samples of $P_L(x)$ and $P_R(x)$ can be obtained by running two Markov chains to convergence, using Gibbs sampling as the transition operator: one chain for the lower bounds and the other for the upper bounds. Gibbs sampling of the joint distribution over N random variables $S = (S_1, \ldots, S_N)$ is done through a sequence of N sampling substeps of the form $S_i \sim P(S_i \mid S_{-i})$, where $S_{-i}$ contains the N − 1 other random variables in S, excluding $S_i$.

For both the RBM and the fuzzy RBM, S consists of the set of visible and hidden units. However, since the units in one layer are conditionally independent given the other layer, one can perform block Gibbs sampling: in one step of the Markov chain, visible units are sampled given hidden units, and hidden units are sampled given visible units, as illustrated in Fig. 4:

$$h^{(k+1)} \sim P(h^{(k+1)} \mid x^{(k)}) \qquad (32)$$
$$x^{(k+1)} \sim P(x^{(k+1)} \mid h^{(k+1)}). \qquad (33)$$

[Fig. 4. k-step Gibbs sampling.]

As k → ∞, the samples $(x^{(k)}, h^{(k)})$ are guaranteed to be accurate samples of P(x, h). However, Gibbs sampling is very time-consuming, as k needs to be large enough. An efficient learning approach, called contrastive divergence (CD) learning [32], was proposed in 2002. It shows that the learning process still performs very well even though only a small number of steps of the Markov chain are run [33]. CD learning uses two tricks to speed up the sampling process: the first is to initialize the Markov chain with a training example, and the second is to obtain samples after only k steps of Gibbs sampling. This is known as the CD-k learning algorithm. Many experiments show that the performance of the approximation is still very good when k = 1.

CD is a different objective from the Kullback–Leibler divergence for measuring the difference between the approximated distribution and the true distribution. Why is CD learning efficient? Because CD learning provides an approximation of the log-likelihood gradient that has been found to be a successful update rule for training probabilistic models. A variational justification provides a theoretical proof of the convergence of the learning process [11], [34]. Conducting CD-1 learning by using (25) and (26), namely $x = x^{(0)} \to h^{(0)} \to x^{(1)} \to h^{(1)}$, it is easy to obtain the updating rules for all the parameters ($\theta_L$ and $\theta_R$) in the FRBM model. The pseudocode is demonstrated in Algorithm 1.
Algorithm 1: Updating Rules for the FRBM

Input: x^(0), a training sample from the training distribution;
  ε, the learning rate for updating the parameters;
  W_L and W_R, the visible–hidden connection weight matrices;
  b_L and b_R, the bias vectors for the input units;
  c_L and c_R, the bias vectors for the hidden units.
Output: the updated parameters of the FRBM: W_L, b_L, c_L, W_R, b_R, c_R.

for all hidden units i do
  compute P_L(h_i^{L(0)} = 1 | x^(0)) and P_R(h_i^{R(0)} = 1 | x^(0)) by using (27);
  sample h_i^{L(0)} ∈ {0, 1} from P_L(h_i^{L(0)} | x^(0));
  sample h_i^{R(0)} ∈ {0, 1} from P_R(h_i^{R(0)} | x^(0));
end for
for all visible units j do
  compute P_L(x_j^{L(1)} = 1 | h^{L(0)}) and P_R(x_j^{R(1)} = 1 | h^{R(0)}) by using (28);
  sample x_j^{L(1)} ∈ {0, 1} from P_L(x_j^{L(1)} | h^{L(0)});
  sample x_j^{R(1)} ∈ {0, 1} from P_R(x_j^{R(1)} | h^{R(0)});
end for
for all hidden units i do
  compute P_L(h_i^{L(1)} = 1 | x^{L(1)}) and P_R(h_i^{R(1)} = 1 | x^{R(1)}) by using (27);
  sample h_i^{L(1)} ∈ {0, 1} from P_L(h_i^{L(1)} | x^{L(1)});
  sample h_i^{R(1)} ∈ {0, 1} from P_R(h_i^{R(1)} | x^{R(1)});
end for
W_L ← W_L + ε (x^(0) · P_L(h^{L(0)} = 1 | x^(0)) − x^{L(1)} · P_L(h^{L(1)} = 1 | x^{L(1)}));
b_L ← b_L + ε (x^(0) − x^{L(1)});
c_L ← c_L + ε (P_L(h^{L(0)} = 1 | x^(0)) − P_L(h^{L(1)} = 1 | x^{L(1)}));
W_R ← W_R + ε (x^(0) · P_R(h^{R(0)} = 1 | x^(0)) − x^{R(1)} · P_R(h^{R(1)} = 1 | x^{R(1)}));
b_R ← b_R + ε (x^(0) − x^{R(1)});
c_R ← c_R + ε (P_R(h^{R(0)} = 1 | x^(0)) − P_R(h^{R(1)} = 1 | x^{R(1)}));
return W_L, b_L, c_L, W_R, b_R, c_R;
IV. EXPERIMENTAL RESULTS

In this section, the representation capabilities of the RBM and FRBM are examined on two datasets: the BAS benchmark dataset and the MNIST handwritten digits dataset. The RBM and the proposed FRBM are trained in an unsupervised way on the BAS dataset to recover incomplete images; training on a noisy BAS dataset is also considered in order to compare the robustness of the two models. To compare the classification performance of the two models in the experiments based on the MNIST handwritten digits dataset, both the RBM and the FRBM are trained in a supervised manner.

On account of the partition function Z in (14), it is very tricky to track the RBM and FRBM training process in unsupervised learning. However, one can observe the learning process by inspecting the negative samples on the BAS benchmark dataset. For the MNIST handwritten digit recognition problem, the classification error rate on the testing dataset is a further significant criterion besides the observation of the negative samples. All the experiments in this paper are run in MATLAB (R2009a) on 32-bit Windows 7; the computational platform has two 3.40-GHz CPUs and 4.00 GB of memory. The experimental results are demonstrated in the following.

A. Bar-and-Stripe Benchmark Inpainting

In this part, the practical aspects of training the RBM and FRBM on the BAS benchmark dataset [35] are discussed. There are 60 000 BAS benchmarks created randomly, and each of them has 5 × 5 units. Every benchmark is generated by first choosing a direction (rows or columns) with equal probability and then setting the states of all units of each row or column uniformly at random. Therefore, the dataset consists of 64 different BAS patterns. One example of a BAS benchmark is illustrated in the right part of Fig. 5.

[Fig. 5. BAS benchmark.]

The RBM and FRBM models are used to learn the joint probability over the BAS benchmarks. As generative models, the fuzzy RBM and the RBM learn the probability distribution over visible variables and hidden features. Once they are well trained, they can generate samples from the probability distribution P(x). For the BAS benchmark inpainting problem, it is supposed that there are l known visible units $x_1, \ldots, x_l$, while $x_{l+1}, \ldots, x_n$ are unknown observations. Then, one can evaluate the unknown observations from the conditional probability $P(x_{l+1}, \ldots, x_n \mid x_1, \ldots, x_l)$. In this process, the inferences can be easily implemented, as in image classification and inpainting.

For the image inpainting problem, only a part of the image is known, and the task is to infer the unknown part. In Fig. 5, the left benchmark can be regarded as an incomplete image in which the pixels of the first row are clamped; the right benchmark is the required target image. For a clear illustration, gray and black squares are used to denote the binary pixel values. The objective is to recover the benchmark by using the FRBM and RBM models. First, 60 000 BAS benchmarks are generated to train the two models. Then, one can recover the benchmark by conducting Gibbs sampling over $P(x_{l+1}, \ldots, x_n \mid x_1, \ldots, x_l)$. The unknown observations are initially assigned the value 0.5.

In the following experiments, conducted to evaluate the FRBM with different energy functions and different types of fuzzy numbers for the RBM's parameters, the energy function defined in (10) is regarded as the type-1 energy function (EF-1), and the energy of a joint configuration ignoring the bias terms is regarded as the type-2 energy function (EF-2). The fuzzy RBM with symmetric triangular fuzzy numbers is denoted FRBM-STFN; accordingly, the fuzzy RBM with a Gaussian membership function is denoted FRBM-GMF. In the experiments, the FRBM and RBM models have 25 visible units and 50 hidden units. For the two models, a minibatch learning scheme (100 samples per batch) and the CD-1 learning approach are employed to speed up the training process. The biases b and c are initialized with zero values, and the connection weights are generated randomly from a standard Gaussian distribution in the initial stage. The learning rates for updating W, b, and c are set to 0.05.
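As a concrete reading of the BAS generation procedure described above, and of the noisy variant used in Section IV-B below, here is a short Python sketch; it is our illustration under the stated description, not the authors' code, and all names are hypothetical.

```python
import numpy as np

def bas_sample(size=5, rng=None):
    """One 5 x 5 bar-and-stripe pattern: each of the `size` lines gets a
    uniform binary state; rows or columns are chosen with equal probability."""
    rng = rng or np.random.default_rng()
    states = rng.integers(0, 2, size)
    pattern = np.repeat(states[:, None], size, axis=1)   # constant rows
    if rng.random() < 0.5:
        pattern = pattern.T                              # constant columns instead
    return pattern

def corrupt(pattern, rng=None):
    """Noisy sample of Section IV-B: reverse all pixel values in one
    randomly chosen row or column."""
    rng = rng or np.random.default_rng()
    noisy = pattern.copy()
    k = rng.integers(0, pattern.shape[0])
    if rng.random() < 0.5:
        noisy[k, :] = 1 - noisy[k, :]
    else:
        noisy[:, k] = 1 - noisy[:, k]
    return noisy

rng = np.random.default_rng(0)
data = np.array([bas_sample(rng=rng).ravel() for _ in range(60000)])
```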
[Fig. 6. Learning processes of the RBM and FRBM based on the BAS dataset.]
[Fig. 7. BAS benchmark inpainting based on the RBM.]
[Fig. 8. BAS benchmark inpainting based on the FRBM.]

The MSEs (the summation of the reconstruction errors over all 60 000 training samples) produced in the two learning phases (50 hidden units) are shown in Fig. 6. It is easy to see that the FRBM model generates smaller reconstruction errors than the RBM model, which means the FRBM can learn the probability distribution more accurately than the traditional RBM model.

The comparative recovery results for the RBM and FRBM models are demonstrated in Figs. 7 and 8, respectively, where k denotes the Gibbs step in sampling. As the probabilistic models involve randomness and uncertainty, two stable results are collected and shown in Figs. 7 and 8. These two figures show that the FRBM model needs fewer Gibbs steps to recover the image than the RBM model, since the distribution learned by the FRBM is much closer to the real distribution. Moreover, the FRBM also produces fewer wrong pixel predictions than the RBM in each Gibbs step. Therefore, one can conclude that the FRBM has a stronger representation capability than the RBM. This advantage springs from the fuzzy relationship between the visible and hidden units. Intuitively, the parameter search space of the RBM can be regarded as a subspace of the FRBM's search space when the relationships are extended.

B. Noisy Bar-and-Stripe Benchmark Inpainting

To verify the robustness of the FRBM model, 10% and 20% of the training samples are replaced with noisy training samples. Each noisy sample is generated by reversing all the pixel values in a row or column. Then, the FRBM and RBM are trained with these noisy samples to investigate the robustness of the two models. The inpainting results are shown in Figs. 9 and 10. The conclusion is the same: the FRBM model learns a more accurate distribution than the regular RBM model.

[Fig. 9. BAS benchmark inpainting with 20% noise based on the RBM.]
[Fig. 10. BAS benchmark inpainting with 20% noise based on the FRBM.]
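The recovery procedure used in both the clean and the noisy inpainting experiments (clamping the known pixels and Gibbs-sampling the rest, starting from 0.5) can be sketched as follows; this is our illustration under the same assumed (W, b, c) conventions as before, not the authors' code.

```python
import numpy as np

def inpaint(x_init, known, theta, k, rng):
    """Recover unknown pixels with k steps of block Gibbs sampling, clamping
    the known pixels after every step. x_init is the flattened image with
    unknown entries set to 0.5; known is a boolean mask; theta = (W, b, c)."""
    W, b, c = theta
    x = x_init.copy()
    for _ in range(k):
        ph = 1.0 / (1.0 + np.exp(-(c + W @ x)))     # P(h = 1 | x), eq. (25)
        h = (rng.random(ph.shape) < ph).astype(float)
        px = 1.0 / (1.0 + np.exp(-(b + W.T @ h)))   # P(x = 1 | h), eq. (26)
        x = (rng.random(px.shape) < px).astype(float)
        x[known] = x_init[known]                    # clamp the known pixels
    return x
```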
TABLE I
BAS BENCHMARK INPAINTING IN THE TESTING PROCESS: MEAN SQUARE ERROR (MSE) AND RECOVERY ERROR RATE (RER)

EF Type   Item               Noise Level   RBM      FRBM-STFN   FRBM-GMF
EF-1      MSE                0%            1271     256         248
                             10%           6060     1944        1859
                             20%           8777     4381        4197
          RER                0%            3.18%    0.64%       0.62%
                             10%           15.15%   4.86%       4.65%
                             20%           21.94%   10.95%      10.49%
          Learning time (s)  –             54       91          117
EF-2      MSE                0%            5921     3451        3293
                             10%           8921     5022        4741
                             20%           11093    8134        7584
          RER                0%            14.80%   8.63%       8.24%
                             10%           22.30%   12.56%      11.85%
                             20%           27.73%   20.34%      18.96%
          Learning time (s)  –             45       72          91

As randomness and uncertainty exist in the probabilistic models, it is necessary to analyze the statistical properties of the MSE and the recovery accuracy (error rate). The results are demonstrated in Table I. The MSEs are the error summation over 16 generated patterns (each with 25 pixel values) from 100 runs of Gibbs sampling. The recovery error rate is the percentage of wrongly recovered pixel values. The results show that the proposed FRBM makes a significant improvement over the regular RBM, also in the noisy case. Additionally, the results show that the energy function plays an important role in model establishment and should be defined according to the specific graph model: the performance of both the RBM and FRBM models is much better when the bias terms are taken into account in the energy function for this case. The energy functions for the current variants of the RBM are designed by considering the structures of those variants. Furthermore, the membership function of the fuzzy parameters also affects the representation capability of the FRBM model: from the results in Table I, the FRBM model with the Gaussian membership function represents the BAS dataset better than the one with the symmetric triangular membership function. The learning time for the FRBM model rises accordingly, as the number of parameters increases when fuzziness is introduced into the model.

C. Handwritten Digits Recognition

In this experiment, supervised learning is carried out for the FRBMs and RBMs on the MNIST handwritten digits dataset, which contains 60 000 training and 10 000 test handwritten digit images [36]. Each image, representing a digit from 0 to 9, consists of 28 × 28 pixels whose intensities range from 0 to 255. This dataset is useful for examining a large number of learning techniques and pattern recognition methods on real-world data. One fact should be clarified: it is not expected that the performance of the FRBM and RBM in this experiment can outperform other sophisticated machine learning algorithms, such as the SVM, because they are not deep networks. (The DBM, DBN, and deep autoencoder have already surpassed the SVM approach in this experiment.) Herein, the objective is to evaluate the representation capability of the FRBM model. Its deep architectures and the corresponding pretraining and fine-tuning algorithms will be discussed in the future. It is to be expected that the fuzzy deep autoencoder, fuzzy deep belief networks, and the fuzzy deep Boltzmann machine, which all consist of fuzzy RBMs, will improve significantly on their original versions.

[Fig. 11. Learning processes of the RBM and FRBM based on the MNIST dataset.]
[Fig. 12. MNIST samples.]

The learning processes of the RBM and FRBM for the MNIST dataset are shown in Fig. 11. There are 60 000 samples in the training dataset and 100 hidden units for both the RBM and the FRBM. One epoch means one learning pass over all training samples, which are divided into 600 minibatches; all the parameters are updated once after each minibatch is finished. The MSEs in Fig. 11 are the total squared reconstruction errors over all the training samples. It is clear that the FRBM produces a smaller MSE than the RBM model, which means the FRBM can learn the representation of the MNIST dataset better than the original RBM model.

In Figs. 12 and 13, 100 handwritten digits and their corresponding reconstructions are demonstrated. The reconstructed samples are generated after 20 epochs of learning on the training dataset, where CD-1 learning was employed on the network with 100 hidden units. These two figures show that the two-layer FRBM can learn the probability over all the pixels very well; in other words, each reconstructed image approximates the original image closely.
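The training protocol just described (20 epochs, 600 minibatches of 100 samples, one parameter update per minibatch, and the total squared reconstruction error tracked per epoch) corresponds roughly to the following loop. This is our sketch for one bound of the parameters only, with assumed initialization conventions and a stand-in data array; it is not the authors' MATLAB code.

```python
import numpy as np

def cd1_batch(batch, W, b, c, eps):
    """One CD-1 parameter update from a minibatch, vectorized over the batch;
    a sketch for one bound (lower or upper) of the FRBM parameters."""
    rng = np.random.default_rng()
    ph0 = 1.0 / (1.0 + np.exp(-(c + batch @ W.T)))     # P(h=1|x0), shape (B, m)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    px1 = 1.0 / (1.0 + np.exp(-(b + h0 @ W)))          # P(x=1|h0), shape (B, n)
    x1 = (rng.random(px1.shape) < px1).astype(float)
    ph1 = 1.0 / (1.0 + np.exp(-(c + x1 @ W.T)))        # P(h=1|x1)
    B = len(batch)
    W += eps * (ph0.T @ batch - ph1.T @ x1) / B        # averaged over the batch
    b += eps * (batch - x1).mean(axis=0)
    c += eps * (ph0 - ph1).mean(axis=0)
    return W, b, c

n, m = 784, 100                                        # visible, hidden units
rng = np.random.default_rng(0)
W, b, c = rng.standard_normal((m, n)), np.zeros(n), np.zeros(m)
data = np.zeros((60000, n))                            # stand-in for binarized MNIST

for epoch in range(20):
    for start in range(0, len(data), 100):             # 600 minibatches per epoch
        W, b, c = cd1_batch(data[start:start + 100], W, b, c, eps=0.05)
    ph = 1.0 / (1.0 + np.exp(-(c + data @ W.T)))       # one reconstruction pass
    px = 1.0 / (1.0 + np.exp(-(b + ph @ W)))
    print(epoch + 1, np.sum((data - px) ** 2))         # total MSE, as in Fig. 11
```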
[Fig. 13. Reconstructed MNIST samples after 20 epochs with 100 hidden units.]

TABLE II
MNIST CLASSIFICATION: THE ERROR RATE IS THE FRACTION OF WRONG RECOGNITIONS OVER THE 10 000 TESTING SAMPLES; m IS THE NUMBER OF HIDDEN UNITS

m         Criterion          RBM          FRBM-STFN    FRBM-GMF
m = 100   Error rate         667/10 000   523/10 000   486/10 000
          Learning time (s)  705          1138         1365
m = 400   Error rate         412/10 000   281/10 000   254/10 000
          Learning time (s)  1655         2896         3470
m = 800   Error rate         367/10 000   249/10 000   236/10 000
          Learning time (s)  2857         5199         6236
m = 1000  Error rate         361/10 000   245/10 000   230/10 000
          Learning time (s)  3635         6725         7734

The classification error rates for the two models are shown in Table II. From the results, it is reasonable to conclude that the FRBM outperforms the regular RBM model, as its capability to model the probability is increased by relaxing the restrictions on the relationships between cross-layer units. When the number of hidden units is 100, the fuzzy RBM improves the recognition accuracy by 1.44%. This is a significant advancement in MNIST digit recognition, as state-of-the-art machine learning recognition algorithms typically produce only about 0.5% improvement over previous approaches. As the number of hidden units increases, the fuzzy RBMs still produce a lower error rate than the RBM, but the improvement of the fuzzy RBM decreases to 1.06%. This is because both the RBMs and the fuzzy RBMs approach their capacity to model the data. As every image in the MNIST dataset is different and of low resolution (noise already exists; see Fig. 12), there is no need to artificially add noise to the images.

V. CONCLUSION

In order to improve the representation capability and robustness of the RBM model, a novel FRBM model is proposed, in which the connection weights and biases between visible and hidden units are fuzzy numbers. This means that the relationship between the input and its high-level features (the hidden units in the graph) is extended. After relaxing the relationship between the random variables by introducing fuzzy parameters, the RBM is only a special case of the FRBM, namely when no fuzziness exists in the fuzzy model. In other words, the parameter search space of the RBM is only a subspace of the FRBM's search space. Therefore, the FRBM has a more powerful representation capability than the regular RBM. On the other hand, the robustness of the FRBM also exceeds that of the RBM model. These merits are attributable to the fuzziness of the FRBM model.

Both the RBM and the FRBM can be trained in supervised and unsupervised ways. The performance of the FRBM is verified in experiments conducted on the BAS benchmark inpainting problem and the MNIST handwritten digit classification problem; the strong robustness of the FRBM model is also examined on BAS benchmarks with different levels of noise. From those experiments, it is reasonable to conclude that the proposed FRBM has more capability to model the probability distribution than the RBM. In the meantime, the FRBM also shows the more powerful robustness that is necessary to address noisy training datasets and varied real-valued applications.

There still exist many open problems for the RBM and FRBM, such as model selection and the development of deep networks based on them. For the model selection problem, it is very hard to determine how many units a hidden layer should contain and how many hidden layers a deep architecture should have. Other settings, such as the Gibbs step, the learning rate, and the batch scheme, also impact the performance of the models. On the other hand, the computational cost of training their deep architectures is so high that it needs to be reduced by more efficient optimization approaches [37]. These are common challenges for training deep networks. However, having confirmed the improvement brought by the FRBM, one can be confident in developing its deep architectures and corresponding learning algorithms in the coming future, such as the fuzzy deep autoencoder, fuzzy deep belief networks, and the fuzzy deep Boltzmann machine.

APPENDIX A

In the following, the detailed process for calculating the gradients of the negative log-likelihood function is presented:

$$-\frac{\partial \log P(x, \theta)}{\partial \theta} = \frac{\partial F_c(x, \theta)}{\partial \theta} - \frac{1}{Z} \sum_{\tilde{x}} e^{-F_c(\tilde{x}, \theta)}\, \frac{\partial F_c(\tilde{x}, \theta)}{\partial \theta}$$
$$= \frac{\partial F_c(x, \theta)}{\partial \theta} - \sum_{\tilde{x}} P(\tilde{x})\, \frac{\partial F_c(\tilde{x}, \theta)}{\partial \theta}$$
$$= \frac{\partial F_c(x, \theta)}{\partial \theta} - E_P\!\left[\frac{\partial F_c(x, \theta)}{\partial \theta}\right]. \qquad (34)$$

APPENDIX B

As the energy function is denoted by

$$E(x, h) = -\mu(x) + \sum_{i} \phi_i(x, h_i) \qquad (35)$$

where

$$\mu(x) = b^T x, \qquad \phi_i(x, h_i) = -h_i(c_i + W_i x) \qquad (36)$$
then the likelihood function can be given by

$$P(x) = \frac{1}{Z}\, e^{-F(x)} = \frac{1}{Z} \sum_{\tilde{h}} e^{-E(x, \tilde{h})}$$
$$= \frac{1}{Z} \sum_{h_1} \sum_{h_2} \cdots \sum_{h_m} e^{\mu(x) - \sum_{i=1}^{m} \phi_i(x, h_i)}$$
$$= \frac{e^{\mu(x)}}{Z} \sum_{h_1} e^{-\phi_1(x, h_1)} \sum_{h_2} e^{-\phi_2(x, h_2)} \cdots \sum_{h_m} e^{-\phi_m(x, h_m)}$$
$$= \frac{e^{\mu(x)}}{Z} \prod_{i=1}^{m} \sum_{h_i} e^{-\phi_i(x, h_i)}. \qquad (37)$$

For the RBM model with binary units, its free energy can then be written as

$$F(x) = -\log P(x) - \log Z = -\mu(x) - \sum_{i=1}^{m} \log \sum_{h_i} e^{-\phi_i(x, h_i)} = -b^T x - \sum_{i=1}^{m} \log(1 + e^{c_i + W_i x}). \qquad (38)$$

APPENDIX C

As the free energy of the RBM model is explicitly deduced as

$$F(x) = -b^T x - \sum_{i=1}^{m} \log(1 + e^{c_i + W_i x})$$

combining it with (19) and (30), it is easy to get

$$F_c(x) = -\frac{1}{2}\,\mu(x; \theta_L) - \frac{1}{2}\,\mu(x; \theta_R) - \frac{1}{2} \sum_{i=1}^{m} \log(1 + e^{c_i^L + W_i^L x}) - \frac{1}{2} \sum_{i=1}^{m} \log(1 + e^{c_i^R + W_i^R x}). \qquad (39)$$

In terms of the following notations:

$$P_L(h_i \mid x) = \sigma(c_i^L + W_i^L x)$$
$$P_R(h_i \mid x) = \sigma(c_i^R + W_i^R x) \qquad (40)$$

all the gradients can be expressed as

$$\frac{\partial F_c(x)}{\partial W_{ij}^L} = -P_L(h_i \mid x)\, x_j^L, \qquad \frac{\partial F_c(x)}{\partial c_i^L} = -P_L(h_i \mid x), \qquad \frac{\partial F_c(x)}{\partial b_j^L} = -x_j^L$$
$$\frac{\partial F_c(x)}{\partial W_{ij}^R} = -P_R(h_i \mid x)\, x_j^R, \qquad \frac{\partial F_c(x)}{\partial c_i^R} = -P_R(h_i \mid x), \qquad \frac{\partial F_c(x)}{\partial b_j^R} = -x_j^R. \qquad (41)$$

REFERENCES

[1] P. Smolensky, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA, USA: MIT Press, 1986, pp. 194–281.
[2] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," Adv. Neural Inform. Process. Syst., vol. 19, pp. 153–160, 2007.
[3] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
[4] R. Salakhutdinov and G. Hinton, "An efficient learning procedure for deep Boltzmann machines," Neural Comput., vol. 24, no. 8, pp. 1967–2006, 2012.
[5] R. Salakhutdinov, J. Tenenbaum, and A. Torralba, "Learning with hierarchical-deep models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1958–1971, Aug. 2013.
[6] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[7] J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2012, pp. 3642–3649.
[8] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learning Res., vol. 11, pp. 3371–3408, 2010.
[9] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[10] R. Salakhutdinov and G. Hinton, "Replicated softmax: An undirected topic model," Adv. Neural Inform. Process. Syst., vol. 22, pp. 1607–1614, 2010.
[11] Y. Bengio, "Learning deep architectures for AI," Foundations Trends Mach. Learning, vol. 2, no. 1, pp. 1–127, 2009.
[12] G. Hinton, "A practical guide to training restricted Boltzmann machines," Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep. UTML TR 2010-003, 2010.
[13] N. Wang, J. Melchior, and L. Wiskott, "Gaussian-binary restricted Boltzmann machines on modeling natural image statistics," arXiv preprint arXiv:1401.5900, 2014.
[14] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Two distributed-state models for generating high-dimensional time series," J. Mach. Learning Res., vol. 12, pp. 1025–1068, 2011.
[15] K. H. Cho, T. Raiko, and A. Ilin, "Gaussian–Bernoulli deep Boltzmann machine," in Proc. Int. Conf. Neural Netw., Aug. 2013, pp. 1–7.
[16] R. Salakhutdinov, A. Mnih, and G. Hinton, "Restricted Boltzmann machines for collaborative filtering," in Proc. 24th Int. Conf. Mach. Learning, 2007, pp. 791–798.
[17] G. W. Taylor, G. E. Hinton, and S. Roweis, "Modeling human motion using binary latent variables," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2007, pp. 1345–1352.
[18] I. Sutskever and G. E. Hinton, "Learning multilevel distributed representations for high-dimensional sequences," in Proc. Int. Conf. Artificial Intell. Statist., 2007, pp. 548–555.
[19] H.-F. Wang and R.-C. Tsaur, "Insight of a fuzzy regression model," Fuzzy Sets Syst., vol. 112, no. 3, pp. 355–369, 2000.
[20] P.-Y. Hao and J.-H. Chiang, "Fuzzy regression analysis by support vector learning approach," IEEE Trans. Fuzzy Syst., vol. 16, no. 2, pp. 428–441, Apr. 2008.
[21] A. Fischer and C. Igel, "Training restricted Boltzmann machines: An introduction," Pattern Recog., vol. 47, no. 1, pp. 25–39, 2014.
[22] C.-Y. Zhang and C. Chen, "An automatic setting for training restricted Boltzmann machine," in Proc. IEEE Int. Conf. Syst., Man Cybern., 2014, pp. 4037–4041.
[23] M. Aoyagi, "Learning coefficient in Bayesian estimation of restricted Boltzmann machine," J. Algebraic Statist., vol. 4, no. 1, pp. 31–58, 2013.
[24] M. Aoyagi and K. Nagata, "Learning coefficient of generalization error in Bayesian estimation and Vandermonde matrix-type singularity," Neural Comput., vol. 24, no. 6, pp. 1569–1610, 2012.
[25] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Englewood Cliffs, NJ, USA: Prentice-Hall, 1995.
[26] S. Dutta and M. Chakraborty, "Fuzzy relation and fuzzy function over fuzzy sets: A retrospective," Soft Comput., vol. 19, no. 1, pp. 99–112, 2014.
[27] J. J. Buckley, Fuzzy Probabilities: New Approach and Applications. New York, NY, USA: Springer, 2009.
[28] W. Pedrycz, A. Skowron, and V. Kreinovich, Handbook of Granular Computing. New York, NY, USA: Wiley, 2008.
[29] W. A. Lodwick and J. Kacprzyk, Fuzzy Optimization: Recent Advances and Applications. New York, NY, USA: Springer, 2010.
[30] N. N. Karnik and J. M. Mendel, "Centroid of a type-2 fuzzy set," Inform. Sci., vol. 132, no. 1–4, pp. 195–220, 2001.
[31] O. Castillo and P. Melin, Type-2 Fuzzy Logic: Theory and Applications. New York, NY, USA: Springer, 2008.
[32] G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
[33] M. A. Carreira-Perpinan and G. E. Hinton, "On contrastive divergence learning," in Proc. 10th Int. Workshop Artif. Intell. Statist., 2005, pp. 33–40.
[34] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Mach. Learning, vol. 37, no. 2, pp. 183–233, 1999.
[35] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006.
[36] G. E. Hinton, "Learning multiple layers of representation," Trends Cognitive Sci., vol. 11, no. 10, pp. 428–434, 2007.
[37] C. P. Chen and C.-Y. Zhang, "Data-intensive applications, challenges, techniques and technologies: A survey on big data," Inform. Sci., vol. 275, pp. 314–347, 2014.

C. L. Philip Chen (S'88–M'88–SM'94–F'07) received the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 1985, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1988.
After having worked in the U.S. for 23 years as a tenured Professor, as a Department Head, and as an Associate Dean at two different universities, he is currently the Dean of the Faculty of Science and Technology, University of Macau, Macau, China, and the Chair Professor of the Department of Computer and Information Science. From 2012 to 2013, he was the IEEE SMC Society President; currently, he is the Editor-in-Chief of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS and an Associate Editor of several IEEE Transactions. He is also the Chair of TC 9.1 Economic and Business Systems of IFAC. His research areas are systems, cybernetics, and computational intelligence. In addition, he is a Program Evaluator of the Accreditation Board for Engineering and Technology Education in computer engineering, electrical engineering, and software engineering programs.
Dr. Chen is a Fellow of the AAAS.

Chun-Yang Zhang received the B.S. degree in mathematics from Beijing Normal University, Zhuhai, China, in 2010, and the M.S. and Ph.D. degrees from the Faculty of Science and Technology, University of Macau, Macau, China, in 2012 and 2015, respectively.
He is currently with the College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China. His research interests include computational intelligence, machine learning, and Big Data analysis.

Long Chen (M'11) received the B.S. degree in information sciences from Peking University, Beijing, China, in 2000, the M.S.E. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2003, the M.S. degree in computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2005, and the Ph.D. degree in electrical engineering from the University of Texas at San Antonio, San Antonio, TX, USA, in 2010.
From 2010 to 2011, he was a Postdoctoral Fellow with the University of Texas at San Antonio. He is currently an Assistant Professor with the Department of Computer and Information Science, University of Macau, Macau, China. His current research interests include computational intelligence, Bayesian methods, and other machine learning techniques and their applications. He has been working on publication matters for many IEEE conferences and was the Publications Cochair of the IEEE International Conference on Systems, Man, and Cybernetics in 2009, 2012, and 2014.

Min Gan received the B.S. degree in computer science and engineering from the Hubei University of Technology, Wuhan, China, in 2004, and the Ph.D. degree in control science and engineering from Central South University, Changsha, China, in 2010.
He is currently an Associate Professor with the School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China. His current research interests include neural networks, system identification, and nonlinear time series analysis.
