


Unsupervised Learning using Back-Propagation in Neural Networks

Manish Bhatt, Member, IEEE
(Manish Bhatt is an undergraduate student at the University of New Orleans with a double major in Electrical Engineering and Computer Science; e-mail: [email protected].)

Abstract—Data analysis is a primal skill of human beings. Cluster analysis, the primitive exploration of data based on little or no prior knowledge of its underlying structure, is a body of research developed across various disciplines. Artificial Neural Networks, on the other hand, are abstractions of learning systems based on the human brain. The brain learns from experience, from associations and disassociations, and from error correction, and it clusters a new object by comparing its features with features of objects it already knows. In this paper, we propose a modified back-propagation algorithm for neural networks that performs unsupervised clustering on a set of unlabeled feature vectors.

Index Terms—neural networks, unsupervised learning, back propagation, machine learning.

I. INTRODUCTION

Data mining and data analysis are fundamental aspects of many engineering applications. People today encounter large amounts of information, which they store and analyze as data, and one fundamental way of analyzing data is to classify or cluster it into groups based on common features. Human beings are themselves highly evolved pattern recognition systems [1]; when they discover something new, they seek features they have already found in familiar things to relate it to. Classification systems are either supervised or unsupervised, depending on whether they assign new inputs to one of a finite number of discrete supervised classes or to unsupervised categories, respectively [2], [3], [4]. In supervised classification, a set of input vectors x ∈ R^d is mapped onto a set of labels y ∈ R^e, where d is the input dimension and e is the output dimension, by learning a mapping function of the form y = f(x; w), where w is the set of weight vectors, or adjustable parameters. The adjustment, or optimization, of these parameters is performed by some version of a learning rule, also termed an inducer, whose sole purpose is to minimize a risk function over a finite set of input-output examples. When an inducer converges, an induced classifier is created [5].
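In standard notation (a common formulation; the paper's own inline symbols were not preserved in this copy), the inducer minimizes an empirical risk of the form

\hat{R}(w) = \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i; w)\big), \qquad f : \mathbb{R}^d \to \mathbb{R}^e,

where L is a per-example loss such as the squared error L(y, \hat{y}) = \lVert y - \hat{y} \rVert^2 and N is the number of input-output examples.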
In unsupervised classification, also called exploratory data analysis, little or no underlying pattern in the data is known, and no output labels are available. The goal of exploratory analysis is to group a set of unlabeled input vectors into hidden "natural" groups based on the underlying data structure. It can differ from unsupervised predictive algorithms such as vector quantization and probability density function approximation in that it need not accurately characterize unobserved samples in terms of a learned or known probability distribution. In practice, however, predictive vector quantizers are also used for non-predictive clustering analysis [8], [9], [10].

It is important to note that clustering analysis is a subjective process in nature [14], which means that there can be no absolute judgement of how well a clustering has been performed. The notion of difference between clusters is likewise vague, which is why experts define clustering as the grouping of instances such that the instances in one cluster are more similar to one another than to instances belonging to different clusters; the definitions of similarity and dissimilarity must themselves be made precise. Clustering algorithms partition data into a certain number of clusters based on a set of feature vectors. The fact that there is no universally agreed-upon definition is part of what makes clustering so interesting. Most researchers define clusters through internal similarity and external dissimilarity: instances in the same cluster must be similar to one another, whereas instances in different clusters should not be. Given a set X = {x_1, ..., x_n} of n instances, where x_i ∈ R^d, a hard partitioning algorithm tries to find K partitions C_1, ..., C_K of X such that no cluster is empty, no instance belongs to two or more clusters at the same time, and the union of all the clusters gives back the set X. In some cases an instance is allowed to belong to two or more clusters, with a degree of membership u_ij of instance i in cluster j, as introduced in fuzzy set theory; these conditions can be stated compactly, as shown below.
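In symbols (a standard statement of the hard and fuzzy partitioning conditions the text describes, not notation recovered from the original):

\bigcup_{j=1}^{K} C_j = X, \qquad C_j \neq \emptyset \quad \forall j, \qquad C_j \cap C_k = \emptyset \quad (j \neq k),

and, in the fuzzy relaxation, u_{ij} \in [0, 1] with \sum_{j=1}^{K} u_{ij} = 1 for every instance x_i.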
Artificial Neural Networks (ANNs) are abstractions of the interconnection of synapses in the human brain. These systems approximate functions that map a set of inputs to a set of outputs. In other words, an ANN is a black box that learns how inputs and outputs are related through some learning rule. As mentioned above, these systems learn the mapping function y = f(x; w), where x is an input instance and w is the set of adjustable parameters, optimized according to some learning rule. For unsupervised clustering applications, various competition-based learning approaches can be used, such as the Self-Organizing Map or the Boltzmann learning rule, whereas for supervised learning the most popular algorithm, Backpropagation of errors, is used. Introductions to these algorithms can be found in [11], [12], [13]. In this paper, we present an alternative algorithm for clustering that uses neural networks and a tweaked version of the Backpropagation algorithm from traditional supervised learning. Some researchers have previously attempted related approaches with good results but, as far as our knowledge goes, this paper is the first attempt at approaching the problem in this way.

II. ALGORITHM DESCRIPTION

Let x = (x_1, ..., x_n) be an instance of the input feature vectors available to us for clustering, and let X = {x^(1), ..., x^(d)} be the d input instances available to us, where each x^(i) ∈ R^n. Suppose we know that these data instances have to be divided into p clusters. Here is what needs to be done during the learning phase (a minimal sketch of this loop, under stated assumptions, follows the list):
● Divide each instance x of dimension n into two different vectors x_a and x_b of dimensions a and b such that a + b = n.
● Initialize p different neural networks with all-zero initial weights. All networks need to start from the same state, and in our case that state was chosen to be all-zero weights.
● Initialize additional variables to save the current state of these networks.
● Present either of x_a and x_b as the input vector and the other as the label to all of the initialized networks for learning.
● Pick the network with the lowest mean square error, and restore all other networks to their previous state. If the end of the epochs has not been reached, go to step 3.

Fig. 1: The system architecture for unsupervised learning. If the data has to be clustered into p groups, then p networks have to be initialized. Each input vector is presented to each of the networks, and the cluster_name parameter of the network with the lowest error becomes the cluster identifier of the input feature vector.

Now, after the networks have been trained, they are presented with randomly selected samples of the input vectors and the labels. Here is what needs to be done during the testing phase (a sketch follows the list):

● Initialize additional networks to store the state of the networks to be used in clustering.
● Present either of x_a and x_b, selected randomly, as the input vector and the other as the label to all of the said networks.
● Propagate the error backwards, and record the mean square error of every network.
● Pick the cluster_name attribute of the network that returns the smallest error as the cluster of the said sample instance.
● Restore the networks to their initial state, and go back to the second step.

Fig. 2: The architecture of a single neural network. The input layer consists of a input neurons, each hidden layer has a + b = n hidden neurons, and b is the number of output neurons. The non-linear transformation function used is tanh(x).
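A matching sketch of the testing phase, under the same assumptions as above (for brevity it always presents x_a as the input and x_b as the label, rather than choosing randomly):

import copy
import numpy as np

def assign_clusters(nets, X, a):
    """Testing phase: returns the cluster_name of the lowest-error network for each instance."""
    trained = [copy.deepcopy(net) for net in nets]       # store the trained state of every network
    labels = []
    for x in X:
        xa, xb = x[:a], x[a:]
        errors = []
        for net in nets:
            net.train_step(xa, xb)                       # propagate the error backwards
            errors.append(net.mse(xa, xb))               # record the mean square error
        labels.append(nets[int(np.argmin(errors))].cluster_name)
        nets = [copy.deepcopy(net) for net in trained]   # restore the networks to their trained state
    return labels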
It can be seen that the technique has certain similarities with the traditional k-means algorithm, but an important distinction is that k-means looks for an exemplar point, whereas our technique looks for exemplar hypersurfaces. It should also be noted that the iterative process does not guarantee convergence to a global minimum; stochastic procedures such as simulated annealing and genetic algorithms can find the global minimum, at considerable computational expense.

To empirically validate our algorithm, we performed two experiments with a 60-dimensional dataset of feature vectors derived from SONAR signals [15]: the first using a multilayer perceptron (MLP) with our algorithm, and the second using Weka version 3.7 [16] for K-Means and SOM. The experiments in this paper were performed on a quad-core i7 Dell Precision laptop with 32 GB of RAM. A Multilayer Perceptron class with backpropagation and momentum was designed and implemented in Python and used to conduct the experiments. The results are presented in the next section.
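The paper does not include this implementation; the following is our minimal sketch of such an MLP class (NumPy, tanh units, momentum), kept to one hidden layer for brevity where the experiments used two. The constructor arguments, learning-rate and momentum values, and method names match the sketches above and are our assumptions.

import numpy as np

class MLP:
    """Minimal multilayer perceptron with tanh units and momentum (a sketch,
    not the authors' implementation; the experiments used two hidden layers)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.01, momentum=0.9, cluster_name=None):
        self.lr, self.momentum = lr, momentum
        self.cluster_name = cluster_name
        # all-zero initial weights, as the algorithm prescribes
        self.W1, self.b1 = np.zeros((n_hidden, n_in)), np.zeros(n_hidden)
        self.W2, self.b2 = np.zeros((n_out, n_hidden)), np.zeros(n_out)
        # momentum buffers
        self.vW1, self.vb1 = np.zeros_like(self.W1), np.zeros_like(self.b1)
        self.vW2, self.vb2 = np.zeros_like(self.W2), np.zeros_like(self.b2)

    def forward(self, x):
        self.h = np.tanh(self.W1 @ x + self.b1)        # hidden activations
        self.y = np.tanh(self.W2 @ self.h + self.b2)   # output activations
        return self.y

    def mse(self, x, t):
        return float(np.mean((t - self.forward(x)) ** 2))

    def train_step(self, x, t):
        y = self.forward(x)
        d2 = (y - t) * (1.0 - y ** 2)                  # output delta: MSE loss through tanh
        d1 = (self.W2.T @ d2) * (1.0 - self.h ** 2)    # hidden delta, backpropagated
        self.vW2 = self.momentum * self.vW2 - self.lr * np.outer(d2, self.h)
        self.vb2 = self.momentum * self.vb2 - self.lr * d2
        self.vW1 = self.momentum * self.vW1 - self.lr * np.outer(d1, x)
        self.vb1 = self.momentum * self.vb1 - self.lr * d1
        self.W2 += self.vW2; self.b2 += self.vb2
        self.W1 += self.vW1; self.b1 += self.vb1

One caveat of the all-zero initialization the paper prescribes: in a sketch like this, zero hidden activations keep most gradients at zero, so in practice a tiny perturbation per network may be needed to break the symmetry, a detail the paper does not specify.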

III. RESULTS

The dataset consisted of 207 instances of 60-dimensional feature vectors, to be classified into two classes, "R" and "M". As per the algorithm, two networks were initialized with 35-dimensional input vectors and 25-dimensional label vectors and two hidden layers with 60 neurons each. Two hidden layers were chosen over one because of the overall decrease in error values in comparison. Pandas was used to import the CSV file of feature vectors and to divide it into features and labels (a sketch follows). The results are tabulated in Table I.
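A sketch of this preparation step; the file name and column layout below are assumptions based on the UCI distribution of the dataset (60 numeric columns followed by the class label):

import pandas as pd

# the UCI sonar file has no header row: 60 numeric columns, then the class ("R"/"M")
df = pd.read_csv("sonar.all-data", header=None)    # file name is an assumption
X = df.iloc[:, :60].to_numpy()                     # 60-dimensional feature vectors
true_labels = df.iloc[:, 60].to_numpy()            # held out, used only to score the clusters

a, b = 35, 25                                      # split sizes used in Table I
X_a, X_b = X[:, :a], X[:, a:]                      # "input" part and "label" part of each instance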
TABLE I
COMPARISON BETWEEN ALGORITHMS

METHOD        VALUES OF a AND b    CORRECTLY CLUSTERED NUMBERS
BP-TWEAKED    35 and 25            R = 104, M = 105
BP-TWEAKED    25 and 35            R = 104, M = 105
K-MEANS       60                   R = 106, M = 101
X-MEANS       60                   R = 105, M = 102

The comparison of various clustering algorithms for the dataset used. EM stands for Expectation Maximization.

In future work we hope to further polish this algorithm, because we believe that the backpropagation algorithm has significant potential for unsupervised learning.
IV. CONCLUSION

In this paper, we successfully implemented a tweaked backpropagation learning algorithm to perform unsupervised learning. Each ANN was associated with a single cluster, and based on the error values, the input patterns were clustered into two groups. It was also shown that the said method is comparable to other available methods.

Several points in the algorithm may be improved in future work. Currently, the exemplar curves are limited in the sense that each has to be represented by a single neural network. This could be addressed by associating multiple networks with a single cluster, i.e. by boosting, and by investigating the merging of exemplar curves to produce loop-like and other arbitrarily shaped exemplars with no limitation on the relation between the coordinates of a data point. Another issue to be addressed in future work is that the distances from the clusters, as calculated in this work, are not necessarily distances from the manifold; they simply represent regression error vectors of lower dimensionality than the feature space. More geometrically meaningful distances of points to the manifolds will be investigated in the future. Additionally, the inclusion of a regularizing term in the objective function might help in gradually reducing the complexity of the NNs during training: in the current implementation, NNs that claim only a few patterns may employ a large number of unnecessary weights and thus be prone to over-fitting.

REFERENCES

[1] M. Anderberg, Cluster Analysis for Applications. New York: Academic, 1973.
[2] C. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press, 1995.
[3] V. Cherkassky and F. Mulier, Learning From Data: Concepts, Theory, and Methods. New York: Wiley, 1998.
[4] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[5] J. Kleinberg, "An impossibility theorem for clustering," in Proc. 2002 Conf. Advances in Neural Information Processing Systems, vol. 15, 2002, pp. 463–470.
[6] W.-K. Chen, Linear Networks and Systems. Belmont, CA: Wadsworth, 1993, pp. 123–135.
[7] ---, Classification, 2nd ed. London, U.K.: Chapman & Hall, 1999.
[8] P. Hansen and B. Jaumard, "Cluster analysis and mathematical programming," Math. Program., vol. 79, pp. 191–215, 1997.
[9] A. Jain and R. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[10] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.
[11] S. Haykin, Neural Networks and Learning Machines. New York: Prentice Hall/Pearson, 2009.
[12] A. Nigrin, Neural Networks for Pattern Recognition. Cambridge, MA: MIT Press, 1993.
[13] C. Butler and M. Caudill, Naturally Intelligent Systems. Cambridge, MA: MIT Press, 1992.
[14] A. Baraldi and E. Alpaydin, "Constructive feedforward ART clustering networks—Part I and II," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 645–677, May 2002.
[15] "Connectionist Bench (Sonar, Mines vs. Rocks) Data Set," UCI Machine Learning Repository, accessed 07 Nov. 2015.
[16] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," SIGKDD Explorations, vol. 11, no. 1, 2009.