Big Data Deep Learning: Challenges and Perspectives
ABSTRACT Deep learning is currently an extremely active research area in the machine learning and pattern recognition community. It has gained huge success in a broad range of applications such as speech recognition, computer vision, and natural language processing. With the sheer size of data available today, big data brings big opportunities and transformative potential for various sectors; on the other hand, it also presents unprecedented challenges to harnessing data and information. As the data keeps getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions. In this paper, we provide a brief overview of deep learning, and highlight current research efforts and the challenges posed by big data, as well as future trends.
INDEX TERMS Classifier design and evaluation, feature representation, machine learning, neural nets
models, parallel processing.
Android's voice recognition, Google's street view, and image search engine [18]. Other industry giants are not far behind either. For example, Microsoft's real-time language translation in Bing voice search [19] and IBM's brain-like computer [18], [20] use techniques like deep learning to leverage Big Data for competitive advantage.

As the data keeps getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions, particularly with the increased processing power and the advances in graphics processors. In this paper, our goal is not to present a comprehensive survey of all the related work in deep learning, but mainly to discuss the most important issues related to learning from massive amounts of data, and to highlight current research efforts and the challenges posed by big data, as well as future trends. The rest of the paper is organized as follows. Section 2 presents a brief review of two commonly used deep learning architectures. Section 3 discusses strategies for deep learning from massive amounts of data. Finally, we discuss the challenges and perspectives of deep learning for Big Data in Section 4.
II. OVERVIEW OF DEEP LEARNING
Deep learning refers to a set of machine learning techniques that learn multiple levels of representations in deep architectures. In this section, we will present a brief overview of two well-established deep architectures: deep belief networks (DBNs) [21]-[23] and convolutional neural networks (CNNs) [24]-[26].

A. DEEP BELIEF NETWORKS
Conventional neural networks are prone to get trapped in local optima of a non-convex objective function, which often leads to poor performance [27]. Furthermore, they cannot take advantage of unlabeled data, which are often abundant and cheap to collect in Big Data. To alleviate these problems, a deep belief network (DBN) uses a deep architecture that is capable of learning feature representations from both the labeled and unlabeled data presented to it [21]. It incorporates both unsupervised pre-training and supervised fine-tuning strategies to construct the models: unsupervised stages intend to learn data distributions without using label information, and supervised stages perform a local search for fine-tuning.
Fig. 1 shows a typical DBN architecture, which is composed of a stack of Restricted Boltzmann Machines (RBMs) and/or one or more additional layers for discrimination tasks. RBMs are probabilistic generative models that learn a joint probability distribution of the observed (training) data without using data labels [28]. They can effectively utilize large amounts of unlabeled data for exploiting complex data structures. Once the structure of a DBN is determined, the goal for training is to learn the weights (and biases) between layers. This is conducted first by an unsupervised learning of RBMs. A typical RBM consists of two layers: nodes in one layer are fully connected to nodes in the other layer and there is no connection between nodes within the same layer (in Fig. 1, for example, the input layer and the first hidden layer H1 form an RBM) [28]. Consequently, each node is independent of the other nodes in the same layer given all nodes in the other layer. This characteristic allows us to train the generative weights W of each RBM using Gibbs sampling [29], [30].

FIGURE 1. Illustration of a deep belief network architecture. This particular DBN consists of three hidden layers, each with three neurons; one input layer with five neurons and one output layer also with five neurons. Any two adjacent layers can form an RBM trained with unlabeled data. The outputs of the current RBM (e.g., h_i^(1) in the first RBM, marked in red) are the inputs of the next RBM (e.g., h_i^(2) in the second RBM, marked in green). The weights W can then be fine-tuned with labeled data after pre-training.

Before fine-tuning, a layer-by-layer pre-training of the RBMs is performed: the outputs of one RBM are fed as inputs to the next RBM, and the process repeats until all the RBMs are pre-trained. This layer-by-layer unsupervised learning is critical in DBN training as, practically, it helps avoid local optima and alleviates the over-fitting problem observed when millions of parameters are used. Furthermore, the algorithm is very efficient in terms of its time complexity, which is linear in the number and size of the RBMs [21]. Features at different layers contain different information about data structures, with higher-level features constructed from lower-level features. Note that the number of stacked RBMs is a parameter predetermined by users and that pre-training requires only unlabeled data (for good generalization).

For a simple RBM with a Bernoulli distribution for both the visible and hidden layers, the sampling probabilities are as follows [21]:

p(h_j = 1 | v; W) = σ( Σ_{i=1}^{I} w_{ij} v_i + a_j )    (1)

and

p(v_i = 1 | h; W) = σ( Σ_{j=1}^{J} w_{ij} h_j + b_i )    (2)

where v and h represent an I × 1 visible unit vector and a J × 1 hidden unit vector, respectively; W is the matrix of weights (w_{ij}) connecting the visible and hidden layers; a_j and b_i are bias terms; and σ(·) is the sigmoid function. For the case of real-valued visible units, the conditional probability distributions are slightly different: typically, a Gaussian-Bernoulli distribution is assumed and p(v_i | h; W) is Gaussian [30].

The weights w_{ij} are updated with an approximate method called contrastive divergence (CD) [31]. For example, the (t + 1)-th update for w_{ij} can be written as follows:

Δw_{ij}(t + 1) = c Δw_{ij}(t) + η ( ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model )    (3)

where η is the learning rate and c is the momentum factor; ⟨·⟩_data and ⟨·⟩_model are the expectations under the distributions defined by the data and the model, respectively. While the expectations could be calculated by running Gibbs sampling infinitely many times, in practice one-step CD is often used because it performs well [31]. Other model parameters (e.g., the biases) can be updated similarly.
As a generative model, RBM training includes a Gibbs sampler that samples the hidden units based on the visible units and vice versa (Eqs. (1) and (2)). The weights between these two layers are then updated using the CD rule (Eq. (3)). This process repeats until convergence. An RBM models the data distribution using hidden units without employing label information. This is a very useful feature in Big Data analysis, as a DBN can potentially leverage much more data (without knowing their labels) for improved performance.
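To make the training loop concrete, the following is a minimal NumPy sketch of Eqs. (1)-(3) with one-step contrastive divergence for a Bernoulli-Bernoulli RBM. It is an illustration rather than the authors' implementation: the class and variable names are ours, the bias updates are omitted (they follow the same pattern), and the learning-rate and momentum constants are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliRBM:
    """Minimal RBM with I visible and J hidden Bernoulli units (Eqs. (1)-(3))."""
    def __init__(self, I, J, eta=0.1, c=0.5):
        self.W = 0.01 * rng.standard_normal((I, J))   # weights w_ij
        self.a = np.zeros(J)                          # hidden biases a_j
        self.b = np.zeros(I)                          # visible biases b_i
        self.eta, self.c = eta, c                     # learning rate, momentum factor
        self.dW = np.zeros_like(self.W)               # previous update, for the momentum term

    def sample_h(self, v):                            # Eq. (1)
        p = sigmoid(v @ self.W + self.a)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):                            # Eq. (2)
        p = sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_update(self, v0):
        """One-step contrastive divergence (Eq. (3)) on a mini-batch v0."""
        ph0, h0 = self.sample_h(v0)                   # positive phase: <v_i h_j>_data
        pv1, _ = self.sample_v(h0)                    # one Gibbs step back to the visible layer
        ph1, _ = self.sample_h(pv1)                   # negative phase: <v_i h_j>_model (approximated)
        grad = (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.dW = self.c * self.dW + self.eta * grad  # momentum plus learning-rate update
        self.W += self.dW

# toy usage: 100 binary samples, 20 visible and 10 hidden units
rbm = BernoulliRBM(I=20, J=10)
data = (rng.random((100, 20)) < 0.5).astype(float)
for epoch in range(10):
    rbm.cd1_update(data)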
After pre-training, information about the input data is stored in the weights between every pair of adjacent layers. The DBN then adds a final layer representing the desired outputs, and the overall network is fine-tuned using labeled data and back-propagation strategies for better discrimination (in some implementations, on top of the stacked RBMs there is another layer, called the associative memory, determined by supervised learning methods).

There are other variations for pre-training: instead of RBMs, for example, stacked denoising auto-encoders [32], [33] and stacked predictive sparse coding [34] have also been proposed for unsupervised feature learning. Furthermore, recent results show that when a large amount of training data is available, fully supervised training using random initial weights instead of the pre-trained weights (i.e., without using RBMs or auto-encoders) works well in practice [13], [35]. For example, a discriminative model starts with a network with a single hidden layer (i.e., a shallow neural network), which is trained by the back-propagation method. Upon convergence, a new hidden layer is inserted into this shallow network (between the first hidden layer and the desired output layer) and the full network is discriminatively trained again. This process continues until a predetermined criterion is met (e.g., the number of hidden neurons).

In summary, DBNs use a greedy and efficient layer-by-layer approach to learn the latent variables (weights) in each hidden layer and a back-propagation method for fine-tuning. This hybrid training strategy thus improves both the generative performance and the discriminative power of the network.
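Building on the RBM sketch above, the greedy layer-by-layer pre-training just described can be outlined as follows. This is again a simplified illustration: the supervised fine-tuning stage, in which an output layer is added and the whole network is trained by back-propagation on labeled data, is not shown.

def pretrain_dbn(data, layer_sizes, epochs=10):
    """Greedy layer-by-layer pre-training: each RBM is trained on the (mean)
    hidden activations of the RBM below it, using unlabeled data only."""
    rbms, x = [], data
    for I, J in zip(layer_sizes[:-1], layer_sizes[1:]):
        rbm = BernoulliRBM(I, J)
        for _ in range(epochs):
            rbm.cd1_update(x)
        x = rbm.sample_h(x)[0]        # propagate mean activations to the next level
        rbms.append(rbm)
    return rbms                       # these weights are then fine-tuned with labeled data

stack = pretrain_dbn(data, layer_sizes=[20, 10, 10, 10])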
B. CONVOLUTIONAL NEURAL NETWORKS
A typical CNN is composed of many layers of hierarchy, with some layers for feature representations (or feature maps) and others acting as a conventional neural network for classification [24]. It often starts with two alternating types of layers, called convolutional and subsampling layers: convolutional layers perform convolution operations with several filter maps of equal size, while subsampling layers reduce the sizes of the preceding layers by averaging pixels within a small neighborhood (or by max-pooling [36], [37]).

Fig. 2 shows a typical CNN architecture. The input is first convolved with a set of filters (the C layers in Fig. 2). These 2D filtered data are called feature maps. After a nonlinear transformation, subsampling is further performed to reduce the dimensionality (the S layers in Fig. 2). The sequence of convolution/subsampling can be repeated many times (predetermined by users).

FIGURE 2. Illustration of a typical convolutional neural network architecture. The input is a 2D image, which is convolved with four different filters (i.e., h_i^(1), i = 1 to 4), followed by a nonlinear activation, to form the four feature maps in the second layer (C1). These feature maps are down-sampled by a factor of 2 to create the feature maps in layer S1. The sequence of convolution/nonlinear activation/subsampling can be repeated many times. In this example, to form the feature maps in layer C2, we use eight different filters (i.e., h_i^(2), i = 1 to 8): the first, third, fourth, and sixth feature maps in layer C2 are defined by one corresponding feature map in layer S1, each convolved with a different filter, while the second and fifth maps in layer C2 are formed by two maps in S1 convolved with two different filters. The last layer is an output layer forming a fully connected 1D neural network, i.e., the 2D outputs from the last subsampling layer (S2) are concatenated into one long input vector, with each neuron fully connected to all the neurons in the next layer (a hidden layer in this figure).

As illustrated in Fig. 2, the lowest level of this architecture is the input layer with 2D N × N images as our inputs. With local receptive fields, upper-layer neurons extract both elementary and complex visual features. Each convolutional layer (labeled Cx in Fig. 2) is composed of multiple feature maps, which are constructed by convolving inputs with different filters (weight vectors). In other words, the value of each unit in a feature map depends on a local receptive field in the previous layer and the filter. This is followed by a nonlinear activation:

y_j^(l) = f( Σ_i K_{ij} ⊗ x_i^(l−1) + b_j )    (4)

where y_j^(l) is the j-th output of the l-th convolution layer C_l; f(·) is a nonlinear function (most recent implementations use a scaled hyperbolic tangent as the nonlinear activation function [38]: f(x) = 1.7159 tanh(2x/3)); and K_{ij} is a trainable filter (or kernel) in the filter bank that convolves with the feature map x_i^(l−1) from the previous layer to produce a new feature map in the current layer. The symbol ⊗ represents the discrete convolution operator and b_j is a bias. Note that each filter K_{ij} can connect to all or to a portion of the feature maps in the previous layer (in Fig. 2, we show partially connected feature maps between S1 and C2). The subsampling layer (labeled Sx in Fig. 2) reduces the spatial resolution of the feature map (thus providing some level of distortion invariance). In general, each unit in the subsampling layer is constructed by averaging a 2 × 2 area in the feature map or by max-pooling over a small region.
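As a concrete illustration of Eq. (4) and the subsampling step, the sketch below applies a small filter bank with the scaled hyperbolic tangent of [38] and then 2 × 2 average pooling. It relies on SciPy's convolve2d for the discrete convolution; the filter sizes, map counts, and random values are arbitrary choices for the example, not settings taken from the cited papers.

import numpy as np
from scipy.signal import convolve2d

def conv_layer(feature_maps, kernels, biases):
    """Eq. (4): y_j = f( sum_i K_ij (*) x_i + b_j ), with the scaled tanh
    activation f(x) = 1.7159 tanh(2x/3) suggested in [38]."""
    outputs = []
    for j, b in enumerate(biases):
        s = sum(convolve2d(x, kernels[i][j], mode="valid")
                for i, x in enumerate(feature_maps))
        outputs.append(1.7159 * np.tanh(2.0 * (s + b) / 3.0))
    return outputs

def subsample(feature_map, k=2):
    """S layer: average over non-overlapping k x k neighborhoods."""
    h, w = (feature_map.shape[0] // k) * k, (feature_map.shape[1] // k) * k
    fm = feature_map[:h, :w]
    return fm.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

# toy usage: one 28 x 28 input map convolved with two 5 x 5 filters
x = [np.random.rand(28, 28)]
kernels = [[np.random.randn(5, 5) for _ in range(2)]]   # kernels[i][j]
c1 = conv_layer(x, kernels, biases=np.zeros(2))         # two 24 x 24 feature maps
s1 = [subsample(m) for m in c1]                         # two 12 x 12 feature maps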
The key parameters to be decided are the weights between layers, which are normally trained by standard back-propagation procedures and a gradient descent algorithm with mean squared error as the loss function. Alternatively, training deep CNN architectures can be unsupervised. Herein we review a particular method for unsupervised training of CNNs: predictive sparse decomposition (PSD) [39]. The idea is to approximate the inputs X with a linear combination of basic and sparse functions:

Z* = arg min_Z ‖X − WZ‖_2^2 + λ|Z|_1 + ‖Z − D tanh(KX)‖_2^2    (5)

where W is a matrix with a linear basis set, Z is a sparse coefficient matrix, D is a diagonal gain matrix, and K is the filter bank with predictor parameters. The goal is to find the optimal basis function set W and the filter bank K that simultaneously minimize the reconstruction error (the first term in Eq. (5)), subject to a sparse representation (the second term), and the code prediction error (the third term in Eq. (5), which measures the difference between the predicted code and the actual code and preserves invariance for certain distortions). PSD can be trained with a feed-forward encoder to learn the filter bank and also the pooling together [39].
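For reference, the objective in Eq. (5) can be evaluated directly as the sum of its three terms. The sketch below only computes the loss under an assumed column-wise layout (X is n × m, W is n × k, Z is k × m, K is k × n, and D is a k × k diagonal gain matrix); the alternating optimization over Z, W, D, and K described in [39] is not shown.

import numpy as np

def psd_loss(X, W, Z, D, K, lam=1.0):
    """The three terms of Eq. (5): reconstruction error, L1 sparsity,
    and the prediction error of the feed-forward encoder D tanh(K X)."""
    reconstruction = np.sum((X - W @ Z) ** 2)            # ||X - W Z||_2^2
    sparsity = lam * np.sum(np.abs(Z))                   # lambda |Z|_1
    prediction = np.sum((Z - D @ np.tanh(K @ X)) ** 2)   # ||Z - D tanh(K X)||_2^2
    return reconstruction + sparsity + prediction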
In summary, inspired by biological processes [40], CNN algorithms learn a hierarchical feature representation by utilizing strategies like local receptive fields (the size of each filter is normally small), shared weights (using the same weights to construct all the feature maps at the same level significantly reduces the number of parameters), and subsampling (to further reduce the dimensionality). Each filter bank can be trained with either supervised or unsupervised methods. A CNN is capable of learning good feature hierarchies automatically and providing some degree of translational and distortional invariance.

III. DEEP LEARNING FOR MASSIVE AMOUNTS OF DATA
While deep learning has shown impressive results in many applications, its training is not a trivial task for Big Data learning, because the iterative computations inherent in most deep learning algorithms are often extremely difficult to parallelize. Thus, with the unprecedented growth of commercial and academic data sets in recent years, there has been a surge of interest in effective and scalable parallel algorithms for training deep models [12], [13], [15], [41]-[44].

In contrast to shallow architectures, where few parameters are preferable to avoid overfitting, deep learning algorithms enjoy their success with a large number of hidden neurons, often resulting in millions of free parameters. Thus, large-scale deep learning often involves both large volumes of data and large models. Some algorithmic approaches have been explored for large-scale learning: for example, locally connected networks [24], [39], improved optimizers [42], and new structures that can be implemented in parallel [44]. Recently, Deng et al. [44] proposed a modified deep architecture called the Deep Stacking Network (DSN), which can be effectively parallelized. A DSN consists of several specialized neural networks (called modules) with a single hidden layer; stacked modules, whose inputs are composed of the raw data vector and the outputs of the previous module, form a DSN. Most recently, a new deep architecture called the Tensor Deep Stacking Network (T-DSN), which is based on the DSN, has been implemented using CPU clusters for scalable parallel computing [45].

The use of great computing power to speed up the training process has shown significant potential in Big Data deep learning. For example, one way to scale up DBNs is to use multiple CPU cores, with each core dealing with a subset of the training data (data-parallel schemes). Vanhoucke et al. [46] discussed several technical details, including carefully designing the data layout, batching the computation, using SSE2 instructions, and leveraging SSE3 and SSE4 instructions for a fixed-point implementation. These implementations can enhance the performance of modern CPUs for deep learning.

Another recent work aims to parallelize Gibbs sampling of hidden and visible units by splitting the hidden units and visible units across n machines, each responsible for 1/n of the units [47]. In order to make it work, data transfer between machines is required (i.e., when sampling the hidden units, each machine must have the data for all the visible units, and vice versa). This method is efficient if both the hidden and visible units are binary and the sample size is modest. The communication cost, however, can rise quickly if large-scale data sets are used. Other methods for large-scale deep learning also explore FPGA-based implementations [48] with a custom architecture: a control unit implemented in a CPU, a grid of multiple full-custom processing tiles, and a fast memory.

In this survey, we will focus on some recently developed deep learning frameworks that take advantage of the great computing power available today. Take Graphics Processor Units (GPUs) as an example.
in a map. Consequently, the computation of each neuron, which includes the convolution of shared weights (kernels) with neurons from the previous layers, activation, and summation, is performed in an SP. The outputs are then stored in the global memory.

Weights are updated by back-propagation of the errors δ_k. The error signal δ_k^(l−1) of a neuron k in the previous layer (l − 1) depends on the error signals δ_j^(l) of some neurons in a local field of the current layer l. Parallelizing backward propagation can be implemented either by pulling or by pushing [36]. Pulling error signals refers to the process of computing the delta signals for each neuron in the previous layer by pulling the error signals from the current layer. This is not straightforward because of the subsampling and convolution operations: for example, the neurons in the previous layer may connect to different numbers of neurons in the current layer due to border effects [54]. For illustration, we plot a one-dimensional convolution and subsampling in Fig. 4. As can be seen, the first six units have different numbers of connections, so we would first need to identify the list of neurons in the current layer that contribute to the error signal of each neuron in the previous layer. On the contrary, all the units in the current layer have exactly the same number of incoming connections. Consequently, pushing the error signals from the current layer to the previous layer is more efficient, i.e., for each unit in the current layer, we update the related units in the previous layer.

FIGURE 4. An illustration of the operations involved in 1D convolution and subsampling. The convolution filter's size is six; consequently, each unit in the convolution layer is defined by six input units. Subsampling involves averaging two adjacent units in the convolution layer.
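The contrast between pulling and pushing can be made concrete with a one-dimensional toy example. In the sketch below (our illustration, not the implementation of [36] or [37]), every unit in the convolution layer pushes its error back to the fixed-size set of inputs that defined it, so no per-unit connection lists are needed; the subsampling layer simply distributes each error equally over the units it averaged.

import numpy as np

def push_deltas_conv1d(delta_conv, kernel, n_inputs):
    """Backpropagate through a 1D 'valid' convolution by pushing: unit j of the
    convolution layer was computed from inputs j .. j+k-1, so it sends its
    weighted error to exactly those k units (cf. Fig. 4)."""
    k = len(kernel)
    delta_prev = np.zeros(n_inputs)
    for j, d in enumerate(delta_conv):
        delta_prev[j:j + k] += d * kernel
    return delta_prev

def push_deltas_subsample(delta_sub, pool=2):
    """Average-pooling backprop: each pooled unit pushes its error equally
    to the units it averaged."""
    return np.repeat(delta_sub / pool, pool)

# toy usage matching Fig. 4: filter size 6, subsampling by a factor of 2
delta_conv = push_deltas_subsample(np.random.randn(5))                 # 10 conv-layer deltas
delta_input = push_deltas_conv1d(delta_conv, np.random.randn(6), 15)   # 15 input-layer deltas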
For implementing data parallelism, one needs to consider the size of the global memory and the feature map size. Typically, at any given stage, only a limited number of training examples can be processed in parallel. Furthermore, within each block where the convolution operation is performed, only a portion of a feature map can be maintained at any given time due to the extremely limited amount of shared memory. For convolution operations, Scherer et al. suggested using the limited shared memory as a circular buffer [37], which only holds a small portion of each feature map loaded from global memory at a time. The convolution is performed by threads in parallel and the results are written back to global memory. To further overcome the GPU memory limitation, the authors implemented a modified architecture with the convolution and subsampling operations combined into one step [37]. This modification allows for storing both the activities and the error values with reduced memory usage while running backpropagation.

To further speed up training, Krizhevsky et al. proposed the use of two GPUs for training CNNs with five convolutional layers and three fully connected classification layers. The CNN uses Rectified Linear Units (ReLUs) as the nonlinear function (f(x) = max(0, x)), which has been shown to run several times faster than other commonly used functions [55]. For some layers, about half of the network is computed on one GPU and the other portion is calculated on the other GPU, and the two GPUs communicate only at certain layers. This architecture takes full advantage of cross-GPU parallelization, which allows two GPUs to communicate and transfer data without using host memory.

C. COMBINATION OF DATA- AND MODEL-PARALLEL SCHEMES
DistBelief is a software framework recently designed for distributed training and learning in deep networks with very large models (e.g., a few billion parameters) and large-scale data sets. It leverages large-scale clusters of machines to manage both data and model parallelism via multithreading, message passing, and synchronization, as well as communication between machines [56].

For large-scale data with high dimensionality, deep learning often involves many densely connected layers with a large number of free parameters (i.e., large models). To deal with large-model learning, DistBelief first implements model parallelism by allowing users to partition large network architectures into several smaller structures (called blocks), whose nodes are assigned to and calculated on several machines (collectively, we call this a partitioned model). Each block is assigned to one machine (see Fig. 5). Boundary nodes (nodes whose edges belong to more than one partition) require data transfer between machines. Apparently, fully connected networks have more boundary nodes and often demand higher communication costs than locally connected structures, and thus yield fewer performance benefits. Nevertheless, as many as 144 partitions have been reported for large models in DistBelief [56], which leads to a significant improvement in training speed.

DistBelief also implements data parallelism and employs two separate distributed optimization procedures: Downpour stochastic gradient descent (SGD) and Sandblaster [56], which perform online and batch optimization, respectively. Herein we discuss Downpour in detail; more information about Sandblaster can be found in the reference [56]. First, multiple replicas of the partitioned model are created for training and inference. Like the deep learning models, the large data sets are partitioned into many subsets. DistBelief then runs multiple replicas of the partitioned model to compute gradient descent via Downpour SGD on different subsets of the training data. Specifically, DistBelief employs a centralized parameter server storing and applying updates for
significant computational resources are needed to achieve the goal. Consequently, major research efforts are directed toward experiments with GPUs.

TABLE 1. Summary of recent research progress in large-scale deep learning.

and novel algorithms to address many technical challenges. For example, most traditional machine learning algorithms were designed for data that could be completely loaded into memory. With the arrival of the Big Data age, however, this assumption no longer holds. Therefore, algorithms that can learn from massive amounts of data are needed.

In spite of all the recent achievements in large-scale deep learning discussed in Section 3, this field is still in its infancy. Much more needs to be done to address the many significant challenges posed by Big Data, often characterized by the three Vs model: volume, variety, and velocity [63], which refer to the large scale of data, the different types of data, and the speed of streaming data, respectively.
image database as an example, which has 80 million low-resolution color images over 79,000 search terms [64]. This image database was created by searching the Web with every non-abstract English noun in WordNet. Several search engines, such as Google and Flickr, were used to collect the data over a span of six months, and some manual curation was conducted to remove duplicates and low-quality images. Still, the image labels are extremely unreliable because of the limitations of search technologies.

One of the unique characteristics deep learning algorithms possess is their ability to utilize unlabeled data during training: learning the data distribution without using label information. Thus, the availability of large unlabeled data sets presents ample opportunities for deep learning methods. While data incompleteness and noisy labels are part of the Big Data package, we believe that using vastly more data is preferable to using a smaller amount of exact, clean, and carefully curated data. Advanced deep learning methods are required to deal with noisy data and to tolerate some messiness. For example, a more efficient cost function and novel training strategies may be needed to alleviate the effect of noisy labels. Strategies used in semi-supervised learning [65]-[68] may also help alleviate problems related to noisy labels.
B. DEEP LEARNING FOR HIGH VARIETY OF DATA
The second dimension of Big Data is its variety, i.e., data today come in all types of formats, from a variety of sources, and probably with different distributions. For example, the rapidly growing multimedia data coming from the Web and mobile devices include a huge collection of still images, video and audio streams, graphics and animations, and unstructured text, each with different characteristics. A key to dealing with high variety is data integration. Clearly, one unique advantage of deep learning is its ability for representation learning: with either supervised or unsupervised methods, or a combination of both, deep learning can be used to learn good feature representations for classification. It is able to discover intermediate or abstract representations, which is carried out using unsupervised learning in a hierarchical fashion: one level at a time, with higher-level features defined by lower-level features. Thus, a natural solution to the data integration problem is to learn data representations from each individual data source using deep learning methods, and then to integrate the learned features at different levels.
Deep learning has been shown to be very effective in integrating data from different sources. For example, Ngiam et al. [69] developed a novel application of deep learning algorithms to learn representations by integrating audio and video data. They demonstrated that deep learning is generally effective in (1) learning single-modality representations through multiple modalities with unlabeled data and (2) learning shared representations capable of capturing correlations across multiple modalities. Most recently, Srivastava and Salakhutdinov [70] developed a multimodal Deep Boltzmann Machine (DBM) that fuses two very different data modalities, real-valued dense image data and text data with sparse word frequencies, to learn a unified representation. The DBM is a generative model without fine-tuning: it first builds multiple stacked RBMs for each modality; to form a multimodal DBM, an additional layer of binary hidden units is added on top of these RBMs for the joint representation. It learns a joint distribution in the multimodal input space, which allows for learning even with missing modalities.

While current experiments have demonstrated that deep learning is able to utilize heterogeneous sources for significant gains in system performance, numerous questions remain open. For example, given that different sources may offer conflicting information, how can we resolve the conflicts and fuse the data from different sources effectively and efficiently? While current deep learning methods are mainly tested on bi-modalities (i.e., data from two sources), will the system performance benefit from significantly enlarged modalities? Furthermore, what levels in deep learning architectures are appropriate for feature fusion with heterogeneous data? Deep learning seems well suited to the integration of heterogeneous data with multiple modalities due to its capability of learning abstract representations and the underlying factors of data variation.

C. DEEP LEARNING FOR HIGH VELOCITY OF DATA
Emerging challenges for Big Data learning also arise from its high velocity: data are generated at extremely high speed and need to be processed in a timely manner. One solution for learning from such high-velocity data is online learning. Online learning learns one instance at a time, and the true label of each instance will soon be available, which can be used for refining the model [71]-[76]. This sequential learning strategy particularly suits Big Data, as current machines cannot hold the entire data set in memory. While conventional neural networks have been explored for online learning [77]-[87], only limited progress on online deep learning has been made in recent years. Interestingly, deep learning is often trained with a stochastic gradient descent approach [88], [89], where one training example with its known label is used at a time to update the model parameters. This strategy may be adapted for online learning as well. To speed up learning, instead of proceeding sequentially one example at a time, the updates can be performed on a mini-batch basis [37]. Practically, the examples in each mini-batch should be as independent as possible. Mini-batches provide a good balance between computer memory and running time.
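A minimal sketch of such mini-batch updates on a data stream is given below, with a simple least-squares model standing in for a deep network; the stream generator, batch size, and learning rate are arbitrary choices made for illustration only.

import numpy as np

def online_minibatch_sgd(stream, w, eta=0.01, batch_size=32):
    """Update the model on small mini-batches as examples arrive, rather than
    one example at a time or over a full data set that may not fit in memory."""
    batch = []
    for x, y in stream:                               # stream yields (features, label) pairs
        batch.append((x, y))
        if len(batch) == batch_size:
            X = np.array([b[0] for b in batch])
            t = np.array([b[1] for b in batch])
            grad = X.T @ (X @ w - t) / batch_size     # squared-error gradient on the mini-batch
            w -= eta * grad
            batch.clear()
    return w

# toy usage: a synthetic stream of 10,000 five-dimensional examples
stream = ((np.random.randn(5), np.random.randn()) for _ in range(10_000))
w = online_minibatch_sgd(stream, w=np.zeros(5))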
Another challenging problem associated with high velocity is that the data are often non-stationary, i.e., the data distribution changes over time. Practically, non-stationary data are normally separated into chunks containing data from a small time interval. The assumption is that data close in time are piece-wise stationary and may be characterized by a significant degree of correlation and, therefore, follow the same distribution [90]-[97]. Thus, an important feature of a deep learning algorithm for Big Data is the ability to learn the data as a stream. One area that needs to be explored is deep online learning; online learning often scales naturally and
is memory bounded, readily parallelizable, and theoretically guaranteed [98]. Algorithms capable of learning from non-i.i.d. data are crucial for Big Data learning.

Deep learning can also leverage both the high variety and the high velocity of Big Data through transfer learning or domain adaptation, where the training and test data may be sampled from different distributions [99]-[107]. Recently, Glorot et al. implemented a stacked denoising auto-encoder based deep architecture for domain adaptation, in which one trains an unsupervised representation on a large amount of unlabeled data from a set of domains and then uses it to train a classifier with few labeled examples from only one domain [100]. Their empirical results demonstrated that deep learning is able to extract a meaningful and high-level representation that is shared across different domains. The intermediate high-level abstraction is general enough to uncover the underlying factors of domain variation, which are transferable across domains. Most recently, Bengio also applied deep learning of multiple levels of representation to transfer learning, where the training examples may not well represent the test data [99]. That work showed that the more abstract features discovered by deep learning approaches are most likely generic between training and test data. Thus, deep learning is a top candidate for transfer learning because of its ability to identify shared factors present in the input.
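The flavor of this approach can be illustrated with a single denoising auto-encoder layer trained on unlabeled data pooled from all domains, whose hidden representation is then used as the feature space for a classifier trained on the one labeled domain. This is only a schematic, single-layer version of the stacked architecture in [100]; the corruption level, layer sizes, untied weights, and plain squared-error training are our simplifications.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_denoising_autoencoder(X, n_hidden, noise=0.3, eta=0.1, epochs=50):
    """Learn a shared representation from unlabeled, multi-domain data:
    corrupt the input, encode, decode, and reduce the reconstruction error."""
    n = X.shape[1]
    We = 0.01 * np.random.randn(n, n_hidden)          # encoder weights
    Wd = 0.01 * np.random.randn(n_hidden, n)          # decoder weights
    for _ in range(epochs):
        Xn = X * (np.random.rand(*X.shape) > noise)   # mask-out corruption of the input
        H = sigmoid(Xn @ We)                          # hidden representation
        R = sigmoid(H @ Wd)                           # reconstruction of the clean input
        G = (R - X) * R * (1 - R)                     # output error term
        Wd -= eta * (H.T @ G) / len(X)
        We -= eta * (Xn.T @ ((G @ Wd.T) * H * (1 - H))) / len(X)
    return We

# unlabeled data pooled from several domains; labels exist for only one domain
X_all = np.random.rand(1000, 50)
We = train_denoising_autoencoder(X_all, n_hidden=25)
source_features = sigmoid(X_all[:200] @ We)   # features for the labeled source domain
# ...train any off-the-shelf classifier on (source_features, source_labels)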
Although preliminary experiments have shown much of the potential of deep learning for transfer learning, applying deep learning to this field is relatively new and much more needs to be done to improve performance. Of course, the big question is whether we can benefit from Big Data with deep architectures for transfer learning.

In conclusion, Big Data presents significant challenges to deep learning, including large scale, heterogeneity, noisy labels, and non-stationary distributions, among many others. In order to realize the full potential of Big Data, we need to address these technical challenges with new ways of thinking and transformative solutions. We believe that these research challenges posed by Big Data are not only timely, but will also bring ample opportunities for deep learning. Together, they will provide major advances in science, medicine, and business.

REFERENCES
[1] National Security Agency. The National Security Agency: Missions, Authorities, Oversight and Partnerships [Online]. Available: http://www.nsa.gov/public_info/_files/speeches_testimonies/2013_08_09_the_nsa_story.pdf
[2] J. Gantz and D. Reinsel, Extracting Value from Chaos. Hopkinton, MA, USA: EMC, Jun. 2011.
[3] J. Gantz and D. Reinsel, The Digital Universe Decade—Are You Ready. Hopkinton, MA, USA: EMC, May 2010.
[4] (2011, May). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute [Online]. Available: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
[5] J. Lin and A. Kolcz, Large-scale machine learning at Twitter, in Proc. ACM SIGMOD, Scottsdale, AZ, USA, 2012, pp. 793–804.
[6] A. Smola and S. Narayanamurthy, An architecture for parallel topic models, Proc. VLDB Endowment, vol. 3, no. 1, pp. 703–710, 2010.
[7] A. Ng et al., Map-reduce for machine learning on multicore, in Proc. Adv. Neural Inf. Process. Syst., vol. 19, 2006, pp. 281–288.
[8] B. Panda, J. Herbach, S. Basu, and R. Bayardo, MapReduce and its application to massively parallel learning of decision tree ensembles, in Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge, U.K.: Cambridge Univ. Press, 2012.
[9] E. Crego, G. Munoz, and F. Islam. (2013, Dec. 8). Big data and deep learning: Big deals or big delusions? Business [Online]. Available: http://www.huffingtonpost.com/george-munoz-frank-islam-and-ed-crego/big-data-and-deep-learnin_b_3325352.html
[10] Y. Bengio and S. Bengio, Modeling high-dimensional discrete data with multi-layer neural networks, in Proc. Adv. Neural Inf. Process. Syst., vol. 12, 2000, pp. 400–406.
[11] M. Ranzato, Y.-L. Boureau, and Y. LeCun, Sparse feature learning for deep belief networks, in Proc. Adv. Neural Inf. Process. Syst., vol. 20, 2007, pp. 1185–1192.
[12] G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30–41, Jan. 2012.
[13] G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[14] R. Salakhutdinov, A. Mnih, and G. Hinton, Restricted Boltzmann machines for collaborative filtering, in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 791–798.
[15] D. Cireşan, U. Meier, L. Gambardella, and J. Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition, Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[16] M. Zeiler, G. Taylor, and R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 2018–2025.
[17] A. Efrati. (2013, Dec. 11). How deep learning works at Apple, beyond. The Information [Online]. Available: https://www.theinformation.com/How-Deep-Learning-Works-at-Apple-Beyond
[18] N. Jones, Computer science: The learning machines, Nature, vol. 505, no. 7482, pp. 146–148, 2014.
[19] Y. Wang, D. Yu, Y. Ju, and A. Acero, Voice search, in Language Understanding: Systems for Extracting Semantic Information From Speech, G. Tur and R. De Mori, Eds. New York, NY, USA: Wiley, 2011, ch. 5.
[20] J. Kirk. (2013, Oct. 1). Universities, IBM join forces to build a brain-like computer. PCWorld [Online]. Available: http://www.pcworld.com/article/2051501/universities-join-ibm-in-cognitive-computing-research-project.html
[21] G. Hinton and R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol. 313, no. 5786, pp. 504–507, 2006.
[22] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
[23] V. Nair and G. Hinton, 3D object recognition with deep belief nets, in Proc. Adv. NIPS, vol. 22, 2009, pp. 1339–1347.
[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[25] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, Natural language processing almost from scratch, J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Nov. 2011.
[26] P. Le Callet, C. Viard-Gaudin, and D. Barba, A convolutional neural network approach for objective video quality assessment, IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1316–1327, Sep. 2006.
[27] D. Rumelhart, G. Hinton, and R. Williams, Learning representations by back-propagating errors, Nature, vol. 323, pp. 533–536, Oct. 1986.
[28] G. Hinton, A practical guide to training restricted Boltzmann machines, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep. UTML TR 2010-003, 2010.
[29] G. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
[30] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, in Proc. Neural Inf. Process. Syst., 2006, pp. 153–160.
[31] G. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
[32] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 1096–1103.
[33] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, Exploring strategies for training deep neural networks, J. Mach. Learn. Res., vol. 10, pp. 1–40, Jan. 2009.
[34] H. Lee, A. Battle, R. Raina, and A. Ng, Efficient sparse coding algorithms, in Proc. Neural Inf. Process. Syst., 2006, pp. 801–808.
[35] F. Seide, G. Li, and D. Yu, Conversational speech transcription using context-dependent deep neural networks, in Proc. Interspeech, 2011, pp. 437–440.
[36] D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, Flexible, high performance convolutional neural networks for image classification, in Proc. 22nd Int. Conf. Artif. Intell., 2011, pp. 1237–1242.
[37] D. Scherer, A. Müller, and S. Behnke, Evaluation of pooling operations in convolutional architectures for object recognition, in Proc. Int. Conf. Artif. Neural Netw., 2010, pp. 92–101.
[38] Y. LeCun, L. Bottou, G. Orr, and K. Muller, Efficient backprop, in Neural Networks: Tricks of the Trade, G. Orr and K. Muller, Eds. New York, NY, USA: Springer-Verlag, 1998.
[39] K. Kavukcuoglu, M. A. Ranzato, R. Fergus, and Y. LeCun, Learning invariant features through topographic filter maps, in Proc. Int. Conf. CVPR, 2009, pp. 1605–1612.
[40] D. Hubel and T. Wiesel, Receptive fields and functional architecture of monkey striate cortex, J. Physiol., vol. 195, pp. 215–243, Mar. 1968.
[41] R. Raina, A. Madhavan, and A. Ng, Large-scale deep unsupervised learning using graphics processors, in Proc. 26th Int. Conf. Mach. Learn., Montreal, QC, Canada, 2009, pp. 873–880.
[42] J. Martens, Deep learning via Hessian-free optimization, in Proc. 27th Int. Conf. Mach. Learn., 2010.
[43] K. Zhang and X. Chen, Large-scale deep belief nets with MapReduce, IEEE Access, vol. 2, pp. 395–403, Apr. 2014.
[44] L. Deng, D. Yu, and J. Platt, Scalable stacking and learning for building deep architectures, in Proc. IEEE ICASSP, Mar. 2012, pp. 2133–2136.
[45] B. Hutchinson, L. Deng, and D. Yu, Tensor deep stacking networks, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1944–1957, Aug. 2013.
[46] V. Vanhoucke, A. Senior, and M. Mao, Improving the speed of neural networks on CPUs, in Proc. Deep Learn. Unsupervised Feature Learn. Workshop, 2011.
[47] A. Krizhevsky, Learning multiple layers of features from tiny images, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
[48] C. Farabet et al., Large-scale FPGA-based convolutional networks, in Machine Learning on Very Large Data Sets, R. Bekkerman, M. Bilenko, and J. Langford, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2011.
[49] CUDA C Programming Guide, PG-02829-001_v5.5, NVIDIA Corporation, Santa Clara, CA, USA, Jul. 2013.
[50] Q. Le et al., Building high-level features using large scale unsupervised learning, in Proc. Int. Conf. Mach. Learn., 2012.
[51] M. Ranzato and M. Szummer, Semi-supervised learning of compact document representations with deep networks, in Proc. Int. Conf. Mach. Learn., 2008, pp. 792–799.
[52] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 6, pp. 721–741, Nov. 1984.
[53] G. Casella and E. George, Explaining the Gibbs sampler, Amer. Statist., vol. 46, no. 3, pp. 167–174, 1992.
[54] P. Simard, D. Steinkraus, and J. Platt, Best practices for convolutional neural networks applied to visual document analysis, in Proc. 7th ICDAR, 2003, pp. 958–963.
[55] A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. Adv. NIPS, 2012, pp. 1106–1114.
[56] J. Dean et al., Large scale distributed deep networks, in Proc. Adv. NIPS, 2012, pp. 1232–1240.
[57] J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011.
[58] A. Coates, B. Huval, T. Wang, D. Wu, and A. Wu, Deep learning with COTS HPC systems, J. Mach. Learn. Res., vol. 28, no. 3, pp. 1337–1345, 2013.
[59] S. Tomov, R. Nath, P. Du, and J. Dongarra. (2011). MAGMA users guide. ICL, Univ. Tennessee, Knoxville, TN, USA [Online]. Available: http://icl.cs.utk.edu/magma
[60] (2012). Obama Administration Unveils Big Data Initiative: Announces $200 Million in New R&D Investments. Office of Science and Technology Policy, Executive Office of the President, Washington, DC, USA [Online]. Available: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
[61] K. Haberlin, B. McGilpin, and C. Ouellette. Governor Patrick Announces New Initiative to Strengthen Massachusetts' Position as a World Leader in Big Data. Commonwealth of Massachusetts [Online]. Available: http://www.mass.gov/governor/pressoffice/pressreleases/2012/2012530-governor-announces-big-data-initiative.html
[62] Fact Sheet: Brain Initiative, Office of the Press Secretary, The White House, Washington, DC, USA, 2013.
[63] D. Laney, The Importance of Big Data: A Definition. Stamford, CT, USA: Gartner, 2012.
[64] A. Torralba, R. Fergus, and W. Freeman, 80 million tiny images: A large data set for nonparametric object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1958–1970, Nov. 2008.
[65] J. Wang and X. Shen, Large margin semi-supervised learning, J. Mach. Learn. Res., vol. 8, no. 8, pp. 1867–1891, 2007.
[66] J. Weston, F. Ratle, and R. Collobert, Deep learning via semi-supervised embedding, in Proc. 25th Int. Conf. Mach. Learn., Helsinki, Finland, 2008.
[67] K. Sinha and M. Belkin, Semi-supervised learning using sparse eigenfunction bases, in Proc. Adv. NIPS, 2009, pp. 1687–1695.
[68] R. Fergus, Y. Weiss, and A. Torralba, Semi-supervised learning in gigantic image collections, in Proc. Adv. NIPS, 2009, pp. 522–530.
[69] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Ng, Multimodal deep learning, in Proc. 28th Int. Conf. Mach. Learn., Bellevue, WA, USA, 2011.
[70] N. Srivastava and R. Salakhutdinov, Multimodal learning with deep Boltzmann machines, in Proc. Adv. NIPS, 2012.
[71] L. Bottou, Online algorithms and stochastic approximations, in On-Line Learning in Neural Networks, D. Saad, Ed. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[72] A. Blum and C. Burch, On-line learning and the metrical task system problem, in Proc. 10th Annu. Conf. Comput. Learn. Theory, 1997, pp. 45–53.
[73] N. Cesa-Bianchi, Y. Freund, D. Helmbold, and M. Warmuth, On-line prediction and conversion strategies, in Proc. Conf. Comput. Learn. Theory (EuroCOLT), vol. 53, Oxford, U.K., 1994, pp. 205–216.
[74] Y. Freund and R. Schapire, Game theory, on-line prediction and boosting, in Proc. 9th Annu. Conf. Comput. Learn. Theory, 1996, pp. 325–332.
[75] N. Littlestone, P. M. Long, and M. K. Warmuth, On-line learning of linear functions, in Proc. 23rd Symp. Theory Comput., 1991, pp. 465–475.
[76] S. Shalev-Shwartz, Online learning and online convex optimization, Found. Trends Mach. Learn., vol. 4, no. 2, pp. 107–194, 2012.
[77] T. M. Heskes and B. Kappen, On-line learning processes in artificial neural networks, North-Holland Math. Library, vol. 51, pp. 199–233, 1993.
[78] R. Marti and A. El-Fallahi, Multilayer neural networks: An experimental evaluation of on-line training methods, Comput. Oper. Res., vol. 31, no. 9, pp. 1491–1513, 2004.
[79] C. P. Lim and R. F. Harrison, Online pattern classification with multiple neural network systems: An experimental study, IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 33, no. 2, pp. 235–247, May 2003.
[80] M. Rattray and D. Saad, Globally optimal on-line learning rules for multi-layer neural networks, J. Phys. A, Math. Gen., vol. 30, no. 22, pp. L771–L776, 1997.
[81] P. Riegler and M. Biehl, On-line backpropagation in two-layered neural networks, J. Phys. A, vol. 28, no. 20, pp. L507–L513, 1995.
[82] D. Saad and S. Solla, Exact solution for on-line learning in multilayer neural networks, Phys. Rev. Lett., vol. 74, no. 21, pp. 4337–4340, 1995.
[83] A. West and D. Saad, On-line learning with adaptive back-propagation in two-layer networks, Phys. Rev. E, vol. 56, no. 3, pp. 3426–3445, 1997.
[84] P. Campolucci, A. Uncini, F. Piazza, and B. Rao, On-line learning algorithms for locally recurrent neural networks, IEEE Trans. Neural Netw., vol. 10, no. 2, pp. 253–271, Mar. 1999.
[85] N. Liang, G. Huang, P. Saratchandran, and N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1411–1423, Nov. 2006.
[86] V. Ruiz de Angulo and C. Torras, On-line learning with minimal degradation in feedforward networks, IEEE Trans. Neural Netw., vol. 6, no. 3, pp. 657–668, May 1995.
[87] M. Choy, D. Srinivasan, and R. Cheu, Neural networks for continuous online learning and control, IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1511–1531, Nov. 2006.
[88] L. Bottou and O. Bousquet, Stochastic gradient learning in neural networks, in Proc. Neuro-Nîmes, 1991.
[89] S. Shalev-Shwartz, Y. Singer, and N. Srebro, Pegasos: Primal estimated sub-gradient solver for SVM, in Proc. Int. Conf. Mach. Learn., 2007.
[90] J. Chien and H. Hsieh, Nonstationary source separation using sequential and variational Bayesian learning, IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 5, pp. 681–694, May 2013.
[91] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation. Cambridge, MA, USA: MIT Press, Mar. 2012.
[92] R. Elwell and R. Polikar, Incremental learning in nonstationary environments with controlled forgetting, in Proc. Int. Joint Conf. Neural Netw., 2009, pp. 771–778.
[93] R. Elwell and R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Trans. Neural Netw., vol. 22, no. 10, pp. 1517–1531, Oct. 2011.
[94] C. Alippi and M. Roveri, Just-in-time adaptive classifiers—Part I: Detecting nonstationary changes, IEEE Trans. Neural Netw., vol. 19, no. 7, pp. 1145–1153, Jul. 2008.
[95] C. Alippi and M. Roveri, Just-in-time adaptive classifiers—Part II: Designing the classifier, IEEE Trans. Neural Netw., vol. 19, no. 12, pp. 2053–2064, Dec. 2008.
[96] L. Rutkowski, Adaptive probabilistic neural networks for pattern classification in time-varying environment, IEEE Trans. Neural Netw., vol. 15, no. 4, pp. 811–827, Jul. 2004.
[97] W. de Oliveira, The Rosenblatt Bayesian algorithm learning in a nonstationary environment, IEEE Trans. Neural Netw., vol. 18, no. 2, pp. 584–588, Mar. 2007.
[98] P. Bartlett, Optimal online prediction in adversarial environments, in Proc. 13th Int. Conf. DS, 2010, p. 371.
[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, J. Mach. Learn. Res., vol. 27, pp. 17–37, 2012.
[100] X. Glorot, A. Bordes, and Y. Bengio, Domain adaptation for large-scale sentiment classification: A deep learning approach, in Proc. 28th Int. Conf. Mach. Learn., Bellevue, WA, USA, 2011.
[101] G. Mesnil et al., Unsupervised and transfer learning challenge: A deep learning approach, J. Mach. Learn. Res., vol. 7, pp. 1–15, 2011.
[102] S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[103] S. Gutstein, O. Fuentes, and E. Freudenthal, Knowledge transfer in deep convolutional neural nets, Int. J. Artif. Intell. Tools, vol. 17, no. 3, pp. 555–567, 2008.
[104] A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, in Proc. 11th Annu. Conf. Comput. Learn. Theory, 1998, pp. 92–100.
[105] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, Self-taught learning: Transfer learning from unlabeled data, in Proc. 24th ICML, 2007.
[106] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[107] G. Mesnil, S. Rifai, A. Bordes, X. Glorot, Y. Bengio, and P. Vincent, Unsupervised and transfer learning under uncertainty: From object detections to scene categorization, in Proc. ICPRAM, 2013, pp. 345–354.

XUE-WEN CHEN (M'00-SM'03) is currently a Professor and the Chair with the Department of Computer Science, Wayne State University, Detroit, MI, USA. He received the Ph.D. degree from Carnegie Mellon University, Pittsburgh, PA, USA, in 2001. He is currently serving as an Associate Editor or an Editorial Board Member for several international journals, including IEEE ACCESS, BMC Systems Biology, and the IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE. He served as a Conference Chair or Program Chair for a number of conferences, such as the 21st ACM Conference on Information and Knowledge Management in 2012 and the 10th IEEE International Conference on Machine Learning and Applications in 2011. He is a Senior Member of the IEEE Computer Society.

XIAOTONG LIN is currently a Visiting Assistant Professor with the Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA. She received the Ph.D. degree from the University of Kansas, Lawrence, KS, USA, in 2012, and the M.Sc. degree from the University of Pittsburgh, Pittsburgh, PA, USA, in 1999. Her research interests include large-scale machine learning, data mining, high-performance computing, and bioinformatics.