
International Journal of Trend in Research and Development, Volume 6(5), ISSN: 2394-9333

www.ijtrd.com

Review of Some Machine Learning Techniques for Big Data, Their Uses, Strengths and Weaknesses

1Asogwa D.C., 2Anigbogu S.O., 3Onyesolu M.O. and 4Chukwuneke C.I.,
1,2,3,4Department of Computer Science, Faculty of Physical Sciences, Nnamdi Azikiwe University, Awka, Nigeria

Abstract: Machine learning is a field of computer science which gives computers the ability to learn without being explicitly programmed. Machine learning is used in a variety of computational tasks where designing and programming explicit algorithms with good performance is not easy. Big data are now rapidly expanding in all science and engineering domains. The potentials of these increased volumes of data are obviously very significant to every aspect of our lives, but using them to aid decision making and future predictions requires new ways of thinking and new learning techniques. Traditional analytical approaches are insufficient to analyze big data, because big data are large in scale, unstructured, and captured in real time. Machine learning (ML) addresses this challenge by enabling a system to automatically learn patterns from data that can be leveraged in future predictions. This paper reviews some machine learning techniques for big data, highlighting their uses/applications, strengths and weaknesses in learning data patterns. The techniques reviewed include Bayesian networks, association rules, naïve Bayes, decision trees, nearest neighbor and support vector machines (SVM).

Keywords: Machine Learning, Supervised Learning, Unsupervised Learning, Classification, Big Data

I. INTRODUCTION

Machine learning is a branch of artificial intelligence that allows computer systems to learn directly from examples, data, and experience. By enabling computers to perform specific tasks intelligently, machine learning systems can carry out complex processes by learning from data, rather than following pre-programmed rules. Big Data generally refers to data that exceeds the typical storage, processing, and computing capacity of conventional databases and data analysis techniques. As a resource, Big Data requires tools and methods that can be applied to analyze and extract patterns from large-scale data. The rise of Big Data has been caused by increased data storage capabilities, increased computational processing power, and the availability of increased volumes of data, which give organizations more data than they have computing resources and technologies to process. In addition to the obvious great volumes of data, Big Data is also associated with other specific complexities, often referred to as the four Vs: Volume, Variety, Velocity, and Veracity (Grolinger et al, 2014). The unmanageably large Volume of data poses an immediate challenge to conventional computing environments and requires scalable storage and a distributed strategy for data querying and analysis. However, this large Volume of data is also a major positive feature of Big Data. Many companies, such as Facebook, Yahoo and Google, already have large amounts of data and have recently begun tapping into its benefits (Almeida & Bernardino, 2015). A general theme in Big Data systems is that the raw data is increasingly diverse and complex, consisting of largely uncategorized/unsupervised data along with perhaps a small quantity of categorized/supervised data. Working with the Variety among different data representations in a given repository poses unique challenges with Big Data, which requires Big Data preprocessing of unstructured data in order to extract structured/ordered representations of the data for human and/or downstream consumption. In today's data-intensive technology era, data Velocity – the increasing rate at which data is collected and obtained – is just as important as the Volume and Variety characteristics of Big Data. While the possibility of data loss exists with streaming data if it is not immediately processed and analyzed, there is the option to save fast-moving data into bulk storage for batch processing at a later time. The practical importance of dealing with Velocity, however, is the quickness of the feedback loop, that is, the process of translating data input into useable information. This is especially important in the case of time-sensitive information processing. Some companies such as Twitter, Yahoo, and IBM have developed algorithms that address the analysis of streaming data (Wu et al, 2014). Veracity in Big Data deals with the trustworthiness or usefulness of results obtained from data analysis, and brings to light the old adage "Garbage-In-Garbage-Out" for decision making based on Big Data Analytics. As the number of data sources and types increases, sustaining trust in Big Data Analytics presents a practical challenge.

Algorithm models for dealing with big data take different shapes, depending on their purpose. Using different algorithms to provide comparisons can offer some surprising results about the data being used. They can come as a collection of scenarios, an advanced mathematical analysis, or even a decision tree. Some models function best only for certain data and analyses. For example, classification algorithms with decision rules can be used to screen out problems, such as a loan applicant with a high probability of defaulting.

Unsupervised clustering algorithms can be used to find relationships within an organization's dataset. These algorithms can be used to find different kinds of groupings within a customer base, or to decide what customers and services can be grouped together. An unsupervised clustering approach can offer some distinct advantages over supervised learning approaches. One example is the way novel applications can be discovered by studying how the connections are grouped when a new cluster is formed.

There are different existing models for machine learning on big data, and they include: the Decision Tree based model, the linear regression based model, Neural Networks, Bayesian Networks, Nearest Neighbor and many others.

Brief review of some machine learning techniques

Generally, the field of machine learning is divided into three subdomains: supervised learning, unsupervised learning, and reinforcement learning (Adams et al, 2008). Briefly, supervised learning requires training with labeled data which has inputs and desired outputs. In contrast with supervised learning, unsupervised learning does not require labeled training data and the environment only provides inputs without desired targets. Reinforcement learning enables learning from feedback received through interactions with an external environment. Based on these three essential learning paradigms, a lot of theory mechanisms and application services have been proposed for dealing with data tasks.
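To make the distinction between the first two paradigms concrete, the short sketch below (a toy illustration, assuming scikit-learn is available; the data points are invented) trains a supervised classifier on labeled examples and then lets an unsupervised clustering algorithm discover groupings in the same inputs without any labels.

```python
# Minimal sketch contrasting supervised and unsupervised learning
# (assumes scikit-learn is installed; the toy data is invented).
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans

# Supervised: inputs paired with desired outputs (labels).
X = [[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]]
y = [0, 0, 1, 1]  # labels supplied with the training data
clf = GaussianNB().fit(X, y)
print(clf.predict([[1.05, 1.0]]))  # -> [0]

# Unsupervised: inputs only; structure must be discovered.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignments, no labels given
```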

Fig 1: Machine learning algorithms (Diksha Sharma & Neeraj Kumar, 2017)


Bayesian Network

A Bayesian network is a graphical model that consists of two parts, (G, P), where G is a directed acyclic graph (DAG) whose nodes represent random variables and whose arcs represent conditional dependency of the random variables, and P is a set of conditional probability distributions, one for each node conditional on its parents. The conditional probability distributions for each node may be prior or learned. When building Bayesian networks from prior knowledge alone, the probabilities will be prior (Bayesian). When learning the networks from data, the probabilities will be posterior (learned).

Bayesian networks do not necessarily imply reliance on Bayesian statistics. Rather, they are so called because they use Bayes' rule for probabilistic inference. It is possible to use Bayesian statistics to learn a Bayesian network, but there are also many other techniques that are more closely related to traditional statistical methods. For example, it is common to use frequentist methods to estimate the parameters of the conditional probability distributions. Based on the topology of the structure, there are different types of networks.

Bayesian networks can be used for both supervised learning and unsupervised learning. In unsupervised learning, there is no target variable; therefore, the only network structure is the directed acyclic graph (DAG). In supervised learning, we need only the variables that are around the target, that is, the parents, the children, and the other parents of the children (spouses). Besides the naive Bayes, there are other types of network structures:

i. Tree augmented naive Bayes (TAN)
ii. Bayesian network augmented naive Bayes (BAN)
iii. Parent child Bayesian network (PC) and
iv. Markov blanket Bayesian network (MB).

These network structures differ in which links are allowed between nodes. They are classified based on the three types of links (from target to input, from input to target, and between input nodes) and whether to allow spouses of the target.

Strengths of Bayesian network:

It is straightforward to create a network: one creates the nodes, connects them, and then assigns probabilities and conditional probabilities. When existing observations are applied, the overall results change due to the nature of the graph.

Weaknesses:

Coding can be very difficult and requires some proper planning on paper. To make the final prediction output more accurate, a domain expert is needed to help with the initial values of the probabilities.
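As a minimal illustration of the inference such a network supports, the sketch below hand-codes a hypothetical two-node network A → B (the structure and probability values are invented for illustration) and applies Bayes' rule to compute the posterior of A after observing B.

```python
# Minimal sketch of inference in a two-node Bayesian network A -> B,
# using Bayes' rule directly (the network and its numbers are hypothetical).

p_a = {True: 0.2, False: 0.8}          # prior P(A)
p_b_given_a = {True: 0.9, False: 0.1}  # conditional table P(B=True | A)

# Joint P(A, B=True) for each value of A, then normalize:
# P(A | B=True) = P(B=True | A) P(A) / sum_a P(B=True | a) P(a)
joint = {a: p_b_given_a[a] * p_a[a] for a in (True, False)}
evidence = sum(joint.values())         # P(B=True) = 0.26
posterior = {a: joint[a] / evidence for a in joint}

print(posterior)  # P(A=True | B=True) = 0.18 / 0.26 ≈ 0.692
```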

Naïve Bayes

Naïve Bayes gives a simple approach, with clear semantics, to representing, using, and learning probabilistic knowledge. Impressive results can be achieved using it. It has often been shown that naïve Bayes rivals, and indeed outperforms, more sophisticated classifiers on many datasets. The moral is: always try the simple things first. Repeatedly in machine learning, people have eventually, after an extended struggle, obtained good results using sophisticated learning methods, only to discover years later that simple methods such as 1R and naïve Bayes do just as well, or even better.

There are many datasets for which naïve Bayes does not do so well, however, and it is easy to see why. Because attributes are treated as though they were completely independent, the addition of redundant ones skews the learning process. Dependencies between attributes inevitably reduce the power of naïve Bayes to discern what is going on. They can, however, be ameliorated by using a subset of the attributes in the decision procedure, making a careful selection of which ones to use.

Strengths:

Naïve Bayes gives a simple approach, with clear semantics, to representing, using and learning probabilistic knowledge. It has often been shown to outperform more sophisticated classifiers on many datasets, and impressive results can be achieved using it.

Weaknesses:

The addition of redundant attributes skews the learning process, because attributes are treated as though they were completely independent.
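As a quick sketch of the "try the simple things first" advice, the following trains a Gaussian naïve Bayes classifier with scikit-learn (assuming scikit-learn is installed; the bundled iris dataset stands in for a real problem).

```python
# Minimal sketch of a naive Bayes classifier with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# One Gaussian per attribute per class; attributes assumed independent.
model = GaussianNB().fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```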
Support Vector Machines (SVM)

SVM offers a principled approach to machine learning problems because of its mathematical foundation in statistical learning theory. SVM constructs its solution in terms of a subset of the training input. SVM has been extensively used for classification, regression, novelty detection tasks, and feature reduction.

In classification tasks, a discriminant machine learning technique aims at finding, based on an independent and identically distributed (iid) training dataset, a discriminant function that can correctly predict labels for newly acquired instances. Unlike generative machine learning approaches, which require computations of conditional probability distributions, a discriminant classification function takes a data point x and assigns it to one of the different classes that are a part of the classification task. Less powerful than generative approaches, which are mostly used when prediction involves outlier detection, discriminant approaches require fewer computational resources and less training data, especially for a multidimensional feature space and when only posterior probabilities are needed. From a geometric perspective, learning a classifier is equivalent to finding the equation for a multidimensional surface that best separates the different classes in the feature space.

SVM is a discriminant technique, and, because it solves the convex optimization problem analytically, it always returns the same optimal hyperplane parameters, in contrast to genetic algorithms (GAs) and the perceptron, both of which are widely used for classification in machine learning. For the perceptron, solutions are highly dependent on the initialization and termination criteria. For a specific kernel that transforms the data from the input space to the feature space, training returns uniquely defined SVM model parameters for a given training set, whereas the perceptron and GA classifier models are different each time training is initialized. The aim of GAs and the perceptron is only to minimize error during training, which translates into several hyperplanes meeting this requirement. If many hyperplanes can be learned during the training phase, only the optimal one is retained, because training is practically performed on samples of the population even though the test data may not exhibit the same distribution as the training set. When trained with data that are not representative of the overall data population, hyperplanes are prone to poor generalization.

Strengths:
• SVMs are very good when one has no prior knowledge of the data.
• They work well even with unstructured and semi-structured data such as text, images and trees.
• The kernel trick is the real strength of SVM. With an appropriate kernel function, one can solve very complex problems.
• Unlike neural networks, SVM does not get trapped in local optima.
• It scales relatively well to high dimensional data.
• SVM models generalize well in practice; the risk of over-fitting is lower in SVM.

Weaknesses:
• Choosing a "good" kernel function is not easy.
• Training time is long for large datasets.
• It is difficult to understand and interpret the final model, variable weights and individual impact.
• Since the final model is not easy to see, one cannot make small calibrations to the model, hence it is tough to incorporate business logic.
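A minimal SVM sketch follows, assuming scikit-learn; the RBF kernel and the C value are illustrative choices, and standardizing the features is a common practical step rather than a requirement of the theory.

```python
# Minimal sketch of an SVM classifier with an RBF kernel in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The kernel maps inputs to a feature space; C trades margin width
# against training error. Retraining on the same data yields the same
# hyperplane, unlike a perceptron or a GA-based classifier.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("accuracy:", svm.score(X_test, y_test))
```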
Association rules:

Association rules are often sought for very large datasets, and efficient algorithms are highly valued. In practice, the amount of computation needed to generate association rules depends critically on the minimum coverage specified. The accuracy has less influence because it does not affect the number of passes that must be made through the dataset. In many situations one will want to obtain a certain number of rules—say 50—with the greatest possible coverage at a prespecified minimum accuracy level. One way to do this is to begin by specifying the coverage to be rather high and then successively reduce it, re-executing the entire rule-finding algorithm for each coverage value and repeating until the desired number of rules has been generated.

Association rules are often used when attributes are binary—either present or absent—and most of the attribute values associated with a given instance are absent.

Uses/applications:
• Market Basket Analysis: given a database of customer transactions, where each transaction is a set of items, the goal is to find groups of items which are frequently purchased together.
• Telecommunication (each customer is a transaction containing the set of phone calls)
• Credit Cards/Banking Services (each card/account is a transaction containing the set of the customer's payments)
• Medical Treatments (each patient is represented as a transaction containing the ordered set of diseases)
• Basketball-Game Analysis (each game is represented as a transaction containing the ordered set of ball passes)

Strengths:
• Uses the large itemset property.
• Easily parallelized.
• Easy to implement.

Weaknesses:
• Assumes the transaction database is memory resident.
• Requires many database scans.
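The coverage/accuracy terminology above can be made concrete with a small sketch (pure Python; the transactions and thresholds are hypothetical) that scores one-item-to-one-item rules by coverage (support) and accuracy (confidence).

```python
# Minimal sketch of mining one-to-one association rules by coverage
# (support) and accuracy (confidence); the transactions are hypothetical.
from itertools import permutations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

min_coverage, min_accuracy = 0.5, 0.6
items = set().union(*transactions)
for a, b in permutations(items, 2):
    cov = support({a, b})              # coverage of the rule a -> b
    if cov >= min_coverage:
        acc = cov / support({a})       # accuracy of the rule a -> b
        if acc >= min_accuracy:
            print(f"{a} -> {b}  coverage={cov:.2f} accuracy={acc:.2f}")
```

Lowering min_coverage and re-running, as the text describes, regenerates the rule set until the desired number of rules appears.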
Nearest neighbor:

Nearest-neighbor instance-based learning is simple and often works very well. A k-nearest-neighbor strategy is adopted, where some fixed, small number k of nearest neighbors—say five—are located and used together to determine the class of the test instance through a simple majority vote. Another way of proofing the database against noise is to choose the exemplars that are added to it selectively and judiciously.

The nearest-neighbor method originated many decades ago, and statisticians analyzed k-nearest-neighbor schemes in the early 1950s. If the number of training instances is large, it makes intuitive sense to use more than one nearest neighbor, but clearly this is dangerous if there are few instances. It can be shown that when k and the number n of instances both become infinite in such a way that k/n → 0, the probability of error approaches the theoretical minimum for the dataset. The nearest-neighbor method was adopted as a classification method in the early 1960s and has been widely used in the field of pattern recognition for more than three decades.

Nearest-neighbor classification was notoriously slow until kD-trees began to be applied in the early 1990s, although the data structure itself was developed much earlier. In practice, these trees become inefficient when the dimension of the space increases and are only worthwhile when the number of attributes is small—up to 10. Ball trees were developed much more recently and are an instance of a more general structure sometimes called a metric tree. Sophisticated algorithms can create metric trees that deal successfully with thousands of dimensions. Instead of storing all training instances, one can compress them into regions. Numeric attributes can be discretized into intervals, and "intervals" consisting of a single point can be used for nominal ones. Then, given a test instance, one can determine which intervals it resides in and classify it by voting, a method called voting feature intervals. These methods are very approximate, but very fast, and can be useful for initial analysis of large datasets.

Strengths:
• Simple technique that is easily implemented
• Building the model is cheap
• Extremely flexible classification scheme
• Well suited for multi-modal classes and records with multiple class labels
• Error rate is at most twice that of the Bayes error rate (Kuramochi & Karypis, 2005)

Weaknesses:
• Classifying unknown records is relatively expensive
• Requires distance computation of the k nearest neighbors
• Computationally intensive, especially when the size of the training set grows
• Accuracy can be severely degraded by the presence of noisy or irrelevant features
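A minimal sketch of the majority-vote scheme described above (pure Python; the labeled points and the choice k = 3 are invented for illustration):

```python
# Minimal sketch of k-nearest-neighbor classification by majority vote.
from collections import Counter
import math

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.1), "B"), ((3.9, 4.3), "B"), ((4.2, 3.8), "B")]

def knn_predict(x, k=3):
    # Sort training points by Euclidean distance to x; keep the k closest.
    neighbors = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    # Simple majority vote among the k neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((3.5, 3.5)))  # -> "B"
```

Note the weaknesses listed above are visible even here: every prediction scans and sorts the whole training set, which is exactly what kD-trees and ball trees were introduced to avoid.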
Decision trees

Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree. Learned trees can also be re-represented as sets of if-then rules to improve human readability. These learning methods are among the most popular inductive inference algorithms and have been successfully applied to a broad range of tasks. In machine learning, decision trees partition the data set on appropriate values until a tree structure has emerged. This process is called recursive partitioning (Strobl, 2009).

A decision tree algorithm tries to find the best way to partition the data so that the parts are as homogeneous as possible. If a fully homogeneous part is impossible, the more common value is chosen.

The aim with any decision tree is to create a workable model that will predict the value of a target variable based on the set of input variables.

Uses for Decision Trees

These include how one selects different options within an automated telephone call. The options are essentially decisions that are being made for one to get to the desired department. Such decision trees are used effectively in many industry areas.

Financial institutions use decision trees. One of the fundamental use cases is in option pricing, where a binary-like decision tree is used to predict the price of an option in either a bull or bear market. Marketers use decision trees to segment customers by type and predict whether a customer will buy a specific type of product. In the medical field, decision tree models have been designed to diagnose blood infections or even predict heart attack outcomes in chest pain patients. Variables in the decision tree include diagnosis, treatment, and patient data. The gaming industry now uses multiple decision trees in movement recognition and facial recognition. The Microsoft Kinect platform uses this method to track body movement. The Kinect team used one million images and trained three trees. Within one day, and using a 1,000-core cluster, the decision trees were classifying specific body parts across the screen.

Strengths of Decision Trees

They are easy to read. After a model is generated, it is easy to report back to others regarding how the tree works. Also, with decision trees one can handle numerical or categorized information.

In terms of data preparation, there is not much to do. As long as the data is formalized in something like comma-separated variables, one can create a working model. This also makes it easy to validate the model using various tests.

With decision trees one uses white-box testing—meaning the internal workings can be observed but not changed; one can view the steps that are being used when the tree is being modeled. Decision trees perform well with reasonable amounts of computing power, and decision tree learning can handle large sets of data well.

Limitations of Decision Trees

One of the main issues of decision trees is that they can create overly complex models, depending on the data presented in the training set.

To avoid the machine learning algorithm over-fitting the data, it is sometimes worth reviewing the training data and pruning the values into categories, which will produce a more refined and better-tuned model. Some decision tree concepts can be hard to learn because the model cannot express them easily. This shortcoming sometimes results in a larger-than-normal model, and one might be required to change the model or look at different methods of machine learning.
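As a small sketch of both the readability and the pruning advice above (assuming scikit-learn; the depth limit is an illustrative guard against overly complex trees), a tree can be trained and then re-expressed as if-then rules:

```python
# Minimal sketch: train a decision tree and print it as readable rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
# max_depth caps model complexity, a simple guard against over-fitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# The learned tree re-expressed as nested if-then rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```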
CONCLUSION

Machine learning algorithms are widely used in a variety of applications, such as digital image processing (image recognition) (Kumar & Gupta, 2016), big data analysis (Sharma et al, 2017), speech recognition, medical diagnosis, statistical arbitrage, learning associations, classification, prediction, etc.

Each technique has different application areas and is useful in different domains based on its advantages. Thus, keeping in mind the limitations of each of the techniques, and with the prime focus being the improvement of performance and efficiency, one should use the technique which best suits a particular application. The article illustrates the concept of machine learning with its uses/applications, strengths and weaknesses. It also highlights the various types of learning, such as supervised learning, unsupervised learning and reinforcement learning.

References

[1] B. Adam, I.F.C. Smith & F. Asce, 2008, "Reinforcement learning for structural control", J Comput Civil Eng, 22(2), 133-139.
[2] Carolin Strobl, James Malley and Gerhard Tutz, 2009, "An Introduction to Recursive Partitioning", Psychol Methods, 14(4), 323-348. doi: 10.1037/a0016973
[3] Diksha Sharma & Neeraj Kumar, 2017, "A Review on Machine Learning Algorithms, Tasks and Applications", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 6(10), ISSN: 2278-1323.
[4] Grolinger, M. Hayes, W. Higashino, A. L'Heureux & D. S. Allison, 2014, "Challenges for MapReduce in Big Data", Proc. of the IEEE 10th 2014 World Congress on Services. https://www.researchgate.net/.../263694670_Challenges_for_MapReduce_in_Big_Data
[5] Kumar, N. and Gupta, S., 2016, "Offline Handwritten Gurmukhi Character Recognition: A Review", International Journal of Software Engineering and Its Applications, 10(5), pp. 77-86.
[6] Michihiro Kuramochi and George Karypis, 2005, "Gene Classification using Expression Profiles: A Feasibility Study", International Journal on Artificial Intelligence Tools, 14(4), pp. 641-660.
[7] Muhammad, I. and Yan, Z., 2015, "Supervised Machine Learning Approaches: A Survey", ICTACT Journal on Soft Computing, 5(3).
[8] Pedro Daniel Coimbra de Almeida & Jorge Bernardino, 2015, "Big data open source platforms", Big Data (BigData Congress), 2015 IEEE International Congress. https://ieeexplore.ieee.org/document/7207229/
[9] Sharma, D., Pabby, G. and Kumar, N., 2017, "Challenges Involved in Big Data Processing & Methods to Solve Big Data Processing Problems", IJRASET, 5(8), pp. 841-844.
[10] Singh, S., Kumar, N. and Kaur, N., 2014, "Design and Development of RFID Based Intelligent Security System", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 3.
[11] Talwar, A. and Kumar, Y., 2013, "Machine Learning: An Artificial Intelligence Methodology", International Journal of Engineering and Computer Science, 2, pp. 3400-3404.
[12] Wu, X., Zhu, X., Wu, G. & Ding, W., 2014, "Data Mining with Big Data", TKDE, 26(1), 97-107.
