
ARTIFICIAL INTELLIGENCE

MACHINE LEARNING I

INDEX

1. INTRODUCTION
2. MACHINE LEARNING BASICS
   2.1 Machine Learning Process
   2.2 Basic Concepts
   2.3 Learning Model
   2.4 Machine Learning Techniques
3. SUPERVISED LEARNING
   3.1 Decision Tree
   3.2 Random Forest
   3.3 Linear Regression
   3.4 Artificial Neural Network
   3.5 K-Nearest Neighbour
4. CONCLUSIONS
5. REFERENCES


1. INTRODUCTION
The ability to learn is one of the most important mental functions of living beings, as it allows them to adapt to the demands of the environment and thereby increase their chances of survival. Through the learning process, living beings can modify, adapt and/or acquire abilities, skills, knowledge and/or behaviours by interacting with the environment around them. The process is even more complex from a human perspective, where learning is not limited to the environment and our interactions with it: social factors related to values and moral principles also significantly influence the skills, knowledge and behaviours we are able to learn. These differences (biases) become apparent when we compare two individuals with the same cognitive abilities who have grown up in geographic areas of the planet with significantly different ways of life. The two individuals will probably respond differently to similar stimuli, because they have built different behavioural patterns from the information they have received.

This powerful ability to construct behavioural models by extracting and manipulating information from the environment led researchers to imitate the human learning process in order to increase the capabilities of Artificial Intelligence algorithms, which until then were limited by the expert knowledge introduced into them and by the behaviours (actions, operations, etc.) defined by their designers. In other words, AI algorithms could not adapt to situations that had not been foreseen in the models used to produce the reasoning process. This new area of Artificial Intelligence came to be called Machine Learning (ML), because its objective is to build patterns of behaviour through learning based on the information available from the environment. Depending on how the learning models are built, we can distinguish between several learning modes:

▪ Inductive learning: This learning mode consists of building models through a generalization process over simple examples; that is, it identifies a set of common patterns that define certain characteristics of the examples used in the learning process. It is based on inductive reasoning[11], by which humans draw general conclusions from an analysis of the available information in order to solve a given problem. The knowledge used is always considered new and can modify or invalidate the previously learned model.


▪ Analytical or deductive learning: This learning mode consists of building models through a deductive process, identifying a general description from a set of examples that are explained in a specific and complete way. It is based on deductive reasoning[12], the way humans commonly learn at school, where a series of rules or laws are presented and then demonstrated through examples that show how they work. New knowledge does not invalidate prior knowledge; it only completes or reinforces it.
▪ Analogue learning: This learning mode consists of building models that generate solutions to new problems by searching for similarities with previously solved problems, so that the solution of a similar problem is adapted to the particularities of the new one[13]. This concept is applied in the design of the case-based reasoning systems described in the previous topic. Some researchers do not consider it a complete learning process, as it depends on an adaptation phase that can produce many variations in the final solution.
▪ Related learning: This learning mode consists of building models based on the connections between simple entities, in order to learn certain concepts from how these connections are configured. Such learning processes have been applied to the construction of neural networks in order to create dynamic systems that adapt their connections.
▪ Behavioural learning: This learning mode, based on behavioural psychology, consists of building models from the observation and analysis of the reactions (behaviours) applied in the environment as a response to a certain stimulus, which can be a certain reward.

The emergence of Machine Learning[10] marked a radical change in the way systems based on Artificial Intelligence were built: from that moment on, the aim was no longer to build algorithms that solve problems automatically, but algorithms that learn models from the information available to them. This gave rise to different families of methods addressing different facets of the learning process, from the early versions of the perceptron[2][3] up to the complex Deep Learning models built from millions of data points and structured as neural networks[4]. In recent years, the emergence of frameworks[14][15] that facilitate the processing and generation of learning models


has meant that companies have focused their efforts on using such techniques to offer new
services and functionalities in a wide range of areas such as:

▪ Mobility: Identification of more efficient road routes that minimize the number of traffic jams in the main arteries of a city, reducing time, consumption and/or pollution.
▪ Production systems: Prediction of operating errors in supervised or semi-supervised production systems.
▪ Finance: Prevention of bank fraud and money laundering, and customization of product offers based on user and market data in order to know which products are best suited to a certain user.
▪ Agriculture: Identification of the best growing areas for each type of product, identification of harvesting dates at the optimum time of ripening, as well as identification of possible problems in the growth process and of diseases (computer vision).
▪ Energy: Identification of changes in the cost of energy in order to recommend to customers when to increase or decrease their consumption so as to minimize costs.
▪ Health: Diagnosis and detection of diseases through the analysis of user information, as well as the use of computer vision systems for the detection of tumours.
▪ Pharmaceutical: Optimization of clinical studies by automatically selecting patients based on their characteristics.
▪ Mass media: Personalization of advertising and recommendations through the use of multimodal user data.
▪ Logistics: Real-time optimization of product prices, opening times and stock levels based on customers' previous behaviour in similar situations or on similar dates.
▪ Public and social sector: Optimization of the allocation of resources for urban development in order to improve quality of life, minimize crime, and adapt cleaning services to the number of people in an area; for example, increasing the number of cleaners on certain days of the year due to events or a greater influx of tourists.


▪ Travel and hotel management: Identification of more efficient routes in order to optimize flight paths, the routing of vehicles transporting perishable products, or deliveries by local road transport.


2. MACHINE LEARNING BASICS


This section presents the basics of Machine Learning. To this end, it first gives a theoretical description of the concept of Machine Learning based on the concept of human learning, describing the basic elements needed to understand how the process of generating a learning model works. It then describes the basic aspects of the models that can be learned and, finally, the different techniques grouped by type or family.

2.1 Machine Learning Process

From a machine perspective, machine learning can be defined as the process of acquiring knowledge automatically from training examples that define the characteristics of the concept to be learned. This process of knowledge acquisition (learning) can be seen as a process of generating changes in the learning system (student) driven by the information obtained from the environment (input examples). This information can be provided by another system that teaches (teacher) and identifies the information (labelled data), or it can be extracted as raw, unidentified information (unlabelled data). This implies that learning systems must be able to work with a very wide range of input information, which may be incomplete, uncertain, noisy or inconsistent, and therefore construct imperfect models that must be analyzed in order to assess their accuracy and suitability to the problem. Depending on how input and output information is used during the learning process, a distinction can be made between four large groups or families of techniques: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Each of these families will be described in detail throughout this and the next topic.

2.2 Basic Concepts

In order to describe correctly how the different Machine Learning techniques work, it is necessary to introduce some basic concepts. Figure 1 presents the learning process of any supervised or semi-supervised algorithm. Each of the elements in the figure is described below in order to detail the basic concepts used in the learning process.


Figure 1. Learning process of a supervised or semi-supervised algorithm: a training set (instances described by attributes), together with additional parameters, feeds the algorithm, which produces a model whose precision is evaluated on a test set.

▪ Attribute: An attribute is a basic feature that attempts to describe some property of the data (colour, size, distance, etc.). Attributes are usually classified into two categories based on the values they can take:
o Continuous: Attributes that take a value within a bounded but non-finite interval, where given two observable values there is always a third intermediate value that the attribute could take. For example, the temperature of a given room measured in degrees Celsius can take real values in a range between -50º and 50º.
o Discrete: Attributes that take their value from the elements of a finite set. For example, the possible paint colours used to paint a vehicle.


▪ Instance: An instance is the basic information structure used to represent each of the examples that form the data sets used in the learning process. Each instance, in turn, is composed of a finite set of attributes that describe it. The instances of the same set have to be formed by the same type of attributes. Table 1 shows a set of instances describing the weather, used to decide whether or not a game of tennis can be played. Each instance is composed of one continuous attribute (Humidity) and several discrete attributes (Sky, Temperature and Wind).

ID | Sky    | Temperature | Humidity | Wind    | Play (Tag)
1  | Sunny  | High        | 65.28    | Mild    | Yes
2  | Cloudy | High        | 60.45    | Mild    | Yes
3  | Rainy  | Normal      | 68.12    | Intense | No

Table 1 - Example of a set of instances with different types of attributes.

▪ Target (class): A special attribute that is the objective of the prediction. For example, the probability that a patient has a certain disease, or the price at which a house will be sold.
▪ Data set: The set of instances used for the learning process. How this information is used depends on the type of technique being applied: the information may have been previously collected and stored in files, or in another storage format, and then used to build models, or it may be collected in real time during the training process.
▪ Model: The result of the learning process. Formally, it can be defined as the set of rules or patterns inferred from the training set used to build it.
▪ Algorithm: The mechanism by which the learning process occurs. An algorithm works differently depending on the type of output and input used to build the model. The supervised, unsupervised, and semi-supervised learning algorithms can generally be grouped into three types of techniques: classification, regression, and clustering (Figure 2).


Figure 2. Example of the spatial distribution of a set of instances under three types of algorithms: classification (left), regression (centre) and clustering (right).

▪ Classification: Classification algorithms try to predict a discrete output consisting of a discrete attribute (the class) from a set of data that may or may not be labelled (left image, Figure 2). The input instances (training and testing) are sorted into categories or classes, where each class corresponds to a tag assigned to the data. Two types of algorithms are usually distinguished: binary classification, where only two target classes must be predicted (Yes or No), and multiclass classification, where more than two target classes must be predicted.
▪ Regression: Regression or estimation algorithms try to predict a continuous output consisting of a numeric value (a real number) from a labelled data set (central image, Figure 2). Each input instance (training and testing) has a continuous "output" attribute.
▪ Clustering: Clustering algorithms try to predict a discrete output from a set of data that may or may not be labelled (right image, Figure 2). In other words, these algorithms group the elements into a given number of groups based on the similarities between their attributes; the number of groups is usually supplied as an additional input to the algorithm. A minimal sketch of these three types of tasks is shown below.
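The following sketch illustrates the three task types using scikit-learn (assumed to be installed); the tiny in-line data sets and parameter choices are purely illustrative.

```python
# A minimal sketch of the three task types using scikit-learn (assumed to be
# installed); the tiny in-line data sets and parameters are illustrative only.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Classification: discrete target (play tennis: 1 = yes, 0 = no).
X_cls = [[30, 65], [25, 80], [18, 90], [28, 60]]   # [temperature, humidity]
y_cls = [1, 1, 0, 1]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[27, 70]]))                     # predicted class

# Regression: continuous target (e.g. a price).
X_reg = [[50], [80], [120], [200]]                 # surface in square metres
y_reg = [100.0, 150.0, 210.0, 330.0]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[100]]))                        # predicted numeric value

# Clustering: no labels; the number of groups is an input to the algorithm.
X_clu = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]]
km = KMeans(n_clusters=2, n_init=10).fit(X_clu)
print(km.labels_)                                  # group assigned to each instance
```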


2.3 Learning Model

The process described in the previous section shows the basic components needed to generate a learning model, but things are not quite that simple. First, the process differs between the families of methods: unsupervised learning algorithms often do not include a test phase, the information used for the learning process is not labelled, and reinforcement learning algorithms model information using states, actions and rewards, which, although superficially similar, completely change the learning process. State-action pairs can be seen as instances and the reinforcement as something similar to a class. All this will be described in detail in the next topic of this course.

With regard to the learning process in general, it is necessary to introduce some basic concepts concerning the model and the treatment of the information used to construct it (a small preprocessing sketch follows the list).

▪ Discretization: The process by which a continuous variable is transformed into a discrete variable. It is usually done using the values available in the training set: based on those values, the set is divided into ranges, each of which corresponds to a discrete value of the new variable. This division is commonly done manually, which introduces biases into the learning process, although there are techniques that divide the set automatically into ranges of similar size.
▪ Standardization: The process of adjusting the different values of an attribute in order to reduce large differences between the values, or to represent them on another scale that allows their effect on the rest of the attributes, or their meaning, to be analysed more effectively. The most common choice is to normalize the values to the range between 0 and 1.
▪ Outlier value: Outliers are attribute values that are very distant from the values taken by the other instances of the sets (training and testing) used in the learning process. Such large deviations in the values of an attribute can mean that a measurement error has occurred, or that an unusual case has been detected that is unlikely to reappear. Normally these values are removed from the training and/or test sets because they can significantly affect the model generation process, although removal can be quite complicated if the sets are very large, so some techniques include processes for detecting outliers in order to eliminate them and/or treat them differently during the learning process. Some of the most widely used methods for detecting these values are: (1) monovariate methods, which search for outliers individually in each attribute in order to remove them; (2) multivariate methods, which search for outlier instances by combining multiple attributes; and (3) the Minkowski error method, which, unlike the other two, does not try to eliminate outliers but to minimize their impact on the model, so that the training set is not manipulated as in the other methods. An example of outliers is presented in Figure 3, where the values of the outlier instances (red) deviate markedly from the sigmoidal path followed by the rest of the data.

Figure 3. Example of a data set with outliers

▪ Overfitting: Over-tuning, over-learning or overfitting is the effect of over-training a model with a supervised learning algorithm. The learning process must generate a model that can predict the result for other instances by generalizing from what has been learned through the training instances. However, when the training set is made up of instances that do not represent the problem as a whole, or consists of very few or rare cases, an over-training effect may occur. This effect means that the model generated by the algorithm is not able to generalize and is adjusted to very specific characteristics of the training data that have no causal relationship with the objective function.


▪ Noise: Noise is the set of attributes, or of values of a specific attribute, that contribute nothing to the learning process and that may have been introduced by an incorrect measurement process, a data insertion error, or simply outliers that affect the learning process due to their rarity. That is, it is impossible to use this set of values and/or attributes to generate a model that generalizes from the instances used to construct it.
Noise usually brings many disadvantages when building a model, since it tends to increase the complexity of the data, complicating the execution of the algorithms, which end up generating models that do not generalize correctly. Before building a model, it is very important to analyse the training instances in order to eliminate any noise. This process can be very complicated depending on the size of the training and test sets, so there are different applications that try to eliminate noise automatically, although this type of software does not usually guarantee the total elimination of noise and can even erroneously remove information that is relevant to the learning process.
▪ Bias: Bias is probably one of the most important concepts in Machine Learning, because its emergence can render our models useless. A typical example of bias appeared in many of the early machine vision systems used in autonomous vehicles, where the training process had only been performed on medium-sized adults. This resulted in the model not being able to identify children as humans. A more detailed description of the concept of bias and its influence on the different techniques of Artificial Intelligence will be given in unit 9.
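The following minimal sketch, using only NumPy, illustrates some of the treatment steps above; the data, the number of bins and the z-score threshold are illustrative assumptions.

```python
# A minimal preprocessing sketch using only NumPy; the data, the number of
# bins and the z-score threshold are illustrative assumptions.
import numpy as np

values = np.array([12.0, 15.5, 14.2, 13.8, 95.0, 14.9])   # 95.0 is an outlier

# Standardization (here: min-max normalization to the range [0, 1]).
normalized = (values - values.min()) / (values.max() - values.min())

# Discretization: split the observed range into 3 equal-width bins,
# an automatic alternative to the manual division described above.
edges = np.linspace(values.min(), values.max(), 4)
discrete = np.digitize(values, edges[1:-1])        # bin index per value

# Monovariate outlier detection: drop values more than 2 standard
# deviations away from the attribute mean.
z = (values - values.mean()) / values.std()
clean = values[np.abs(z) < 2.0]

print(normalized, discrete, clean, sep="\n")
```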

2.4 Machine Learning Techniques

Currently, Machine Learning comprises a wide range of techniques, which can be classified into four families based on how the information used to define inputs and outputs is modelled.

▪ Supervised: These techniques are referred to as learning with a teacher, because the information they use during the learning process is complete. That is, the training and test instances are formed by attributes that define the inputs and the expected output for each instance (labelled data). Their operation consists in identifying a function (model) capable of mapping between known inputs and outputs. The techniques of this group are based on classification and regression.

▪ Unsupervised: These techniques are more complex than the previous ones because the information they use during the learning process is incomplete. That is, the training and test instances are formed only by input attributes; the expected output for the instances (unlabelled data) is not known. Their operation consists in identifying a function (model) that can group the different instances based on certain common characteristics. The techniques of this group are based on clustering.

▪ Semi-supervised: These techniques are sometimes included in one of the previous groups, since most algorithms of this type either apply some transformation to the input data so that one of the algorithms of the previous groups can be applied, or combine algorithms from both groups. In my opinion, they should be classified in an independent group because they use partially incomplete information. That is, the training or test instances do not all have the same structure, because some instances are labelled and others are not.


Figure 4. Machine Learning techniques grouped by family and technique:
▪ Supervised learning — Classification: support vector machines, KNN, neural networks, Naïve Bayes, decision trees, random forest; Regression: linear regression, SVR, GPR, logistic regression.
▪ Unsupervised learning — Clustering: K-means, hidden Markov models, self-organizing maps, Gaussian mixture models, principal component analysis.
▪ Reinforcement learning: value iteration, policy iteration, Q-learning, Monte Carlo methods.

▪ Reinforcement: This is a special type of unsupervised machine learning technique, because it uses a different way of representing training examples, based on states and actions, where the application of actions produces feedback from the environment in the form of rewards. In reinforcement learning the goal is to build a policy to solve a given problem; this policy is usually a sequence of actions that solves the problem optimally.

Figure 4 presents different types of Machine Learning algorithms grouped by family (supervised, unsupervised and reinforcement) and by technique (regression, classification and clustering). Only some of the most used techniques are listed; there are many more, and even different versions of each of them. The techniques described in detail in this topic are decision trees, random forest, linear regression, neural networks and KNN.


3. SUPERVISED LEARNING
Supervised learning is one of the families of learning methods based on inductive learning[4], because the learning process is driven by labelled instances (examples). This type of learning consists in defining the information of the environment as pairs of the form (input, output), so that for new inputs whose output is unknown the system is able to predict the expected output based on what it has learned during the training process. The process of identifying or defining this function is divided into two phases:

▪ Training phase: It is the main phase of the learning process and consists of selecting the most relevant characteristics of each input example by comparing it with previously analysed examples through a matching process (pattern identification), so that when significant differences are detected the model being built is adapted. This process can be seen as a pattern identification system that generalizes the similar characteristics of the different examples in order to build a model able to identify the different groups of examples with similar characteristics. That is, the result of the training process is a system formed by a set of patterns identified from the characteristics of the examples used in the training process.
▪ Test phase: This is the secondary, optional phase of the learning process, which allows us to validate the outcome of the training phase by using examples that were not used during training. It is usually considered optional because it cannot be applied to all types of Machine Learning algorithms.

3.1 Decision Tree

A decision tree (for classification or regression) is a data structure that represents information in the form of a tree. This data structure is formed by a single node, called the root, which is connected to a series of successor nodes called intermediate nodes. Intermediate nodes have a finite set of successors directly connected to them by an arc or branch. Nodes that have no successor are called terminal nodes or leaves. The way this data structure branches out spatially resembles a tree.


Building learning models with decision trees is one of the most widely used inductive machine learning techniques, mainly because they can build a predictor in the form of an n-ary tree that represents and categorizes a series of conditions describing the different instances used to build it. In addition, the resulting tree can be represented visually, and the prediction for any new instance is obtained simply by following a path from the root to one of the leaf nodes.

This makes it easy to check the quality of the model, its homogeneity, etc. Decision and regression trees are made up of three elements:

▪ The intermediate nodes, including the root, each of which corresponds to a certain attribute of the instance structure.
▪ The leaf nodes, which correspond to the target classes and are attributes of discrete or continuous type depending on whether the tree is a classification or a regression tree, respectively.
▪ The arcs (decision alternatives) that connect the intermediate nodes with their successors (intermediate nodes and leaves) and represent the possible values of the attributes selected to produce the model.

Figure 5. Example of a decision tree for deciding whether to play tennis: the root node tests Sky, the intermediate nodes test Humidity and Wind, and the leaves give the Yes/No decision.


Figure 5 represents an example of a classification tree that allows us to decide whether or not to play a tennis game. It is a binary classification tree because all leaf nodes (classes) are of discrete type and the target attribute can take only two values (YES or NO). The instances used to build it are formed by three attributes (sky, humidity and wind) whose values are discrete.

3.1.1. ID3 Algorithm

The ID3 (Iterative Dichotomiser 3)[5] algorithm is a learning method considered the precursor of decision tree building algorithms, since many algorithms of this type are based on it. It creates statistical classifiers by constructing decision trees from instances with discrete attributes, using the concept of entropy from mathematical information theory[6], which measures the uncertainty of a set of information. That is, it indicates the degree of disorder of the data in a set, with a value that varies between 0 and 1. The entropy used to build a binary classification tree for a set of instances $S$ can be defined as:

$$H(S) = -p_{(1)} \log_2 p_{(1)} - p_{(2)} \log_2 p_{(2)}$$

where $p_{(1)}$ is the fraction of class 1 examples in $S$ and $p_{(2)}$ is the fraction of class 2 examples in $S$. If all the examples of the set $S$ belong to only one of the classes, the entropy is 0. On the other hand, if the two classes are perfectly balanced, that is, each fraction has a value of 0.5, the entropy is 1, which means that the set $S$ contains the same number of examples of each class. This concept of entropy can be generalized to multiclass classification trees by the following formula, with $c$ being the number of classes:

$$H(S) = \sum_{i=1}^{c} -p_{(i)} \log_2 p_{(i)}$$

The value of the entropy is used to calculate the gain or effectiveness of an attribute to divide the
set of instances into n subsets (one for each possible value of the selected attribute).

19 © Structuralia
Artificial Intelligence – Machine Learning I

The gain is the expected reduction in entropy after partitioning the instance set, and it is calculated with the following formula:

$$G(S, A) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} H(S_v)$$

where $A$ is the selected attribute and $S_v$ is the subset of instances where attribute $A$ takes value $v$. The ID3 algorithm is a recursive divide-and-conquer process that attempts to minimize the value of entropy in order to obtain smaller trees. It performs a greedy search over different subsets of the set $S$, which is formed by all the instances of the problem, using the gain as its (heuristic) estimation measure. The algorithm begins by calculating the entropy of each of the attributes not yet selected over the elements of the set $S$ (all of them in the initial step), selecting the attribute that has the lowest entropy value (the highest information gain). A tree node is created for the selected attribute and the set $S$ is then divided or partitioned into $n$ subsets based on the possible attribute values. The same process is applied to each of the generated subsets, calculating the gain for all attributes and selecting the attribute that minimizes entropy, until:

▪ All elements in the subset belong to the same class, in which case a leaf node is created and labelled with that class.
▪ There is no other attribute to select, in which case a leaf node is created and labelled with the majority class of the elements in the subset.

The process continues until all elements of the set $S$ have been assigned to a leaf node. The algorithm was improved through multiple versions (ID4[7] and ID5r[4]) in order to improve its functioning and performance.
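As a minimal sketch of the ID3 selection criterion, the following Python code computes the entropy of a label set and the information gain of an attribute; the toy instances, loosely based on Table 1, are purely illustrative.

```python
# A minimal sketch of the ID3 selection criterion; the toy instances,
# loosely based on Table 1, are purely illustrative.
import math
from collections import Counter

def entropy(labels):
    """H(S) = sum over classes i of -p(i) * log2 p(i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, attribute):
    """G(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[attribute] for row in instances):
        subset = [lab for row, lab in zip(instances, labels)
                  if row[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

instances = [{"Sky": "Sunny", "Wind": "Mild"},
             {"Sky": "Cloudy", "Wind": "Mild"},
             {"Sky": "Rainy", "Wind": "Intense"}]
labels = ["Yes", "Yes", "No"]

# ID3 creates the next tree node from the attribute with the highest gain.
best = max(["Sky", "Wind"], key=lambda a: information_gain(instances, labels, a))
print(best, information_gain(instances, labels, best))
```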


3.1.2. CART Algorithm

The CART (Classification and Regression Trees) algorithm[8] is a learning method for creating statistical classifiers and regressors by building decision trees (classification and regression) from instances with discrete and continuous attributes, using the concept of impurity, which defines the homogeneity of each node. That is, impurity defines how similar the elements of a set are according to a certain criterion, which in this case is one of the possible classes. The impurity value is calculated differently depending on the type of tree being created:

▪ Classification: The impurity for the generation of a classification tree can be calculated using the entropy or the Gini diversity index, which for binary classification is given by the following formula:

$$i(S) = p_{(1)} \, p_{(2)}$$

where $p_{(1)}$ is the fraction of class 1 examples in $S$ and $p_{(2)}$ is the fraction of class 2 examples in $S$. The calculation of the impurity can be extended to multiclass classification using the following formula, with $c$ being the number of classes:

$$i(S) = \sum_{i \neq j} p_{(i)} \, p_{(j)}$$

▪ Regression: The impurity for the generation of a regression tree is calculated as the aggregation of the variances of all terminal nodes.

The CART algorithm is a two-phase process that first generates a tree, called saturated, which is subsequently optimized by applying different pruning techniques:

▪ Generation of a saturated tree by means of a recursive divide-and-conquer algorithm that tries to minimize the impurity of the nodes of the tree in order to obtain greater homogeneity in the examples belonging to each node. This process tends to generate very deep trees, because it tries to create nodes that represent instances with the greatest possible homogeneity.


▪ Optimization of the tree through a pruning process. Pruning consists of finding the subtree of the saturated tree with the best quality, based on the prediction ratio of the results and a lower vulnerability to noise in the input (training) instances. For this, a quality analysis of the terminal nodes of the possible subtrees of the saturated tree is performed. For a tree $T$ the quality is defined by the following formula:

$$c(T) = \sum_{t \in \tilde{T}} p(t) \, r(t)$$

where $\tilde{T}$ is the set of terminal nodes of tree $T$ and $r(t)$ is a measure of the quality of node $t$, similar to the sum of the squares of the residuals in linear regression. It is also possible to use the impurity of node $t$ as the quality measure.

This tree generation process can be used for the generation of both regression and classification
trees by varying the way impurity and quality of terminal nodes are calculated for the pruning
process.
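As a minimal sketch of the impurity criterion, the following Python function computes the multiclass Gini sum given above over the class labels of a node; note that for two classes this sum over ordered pairs equals $2\,p_{(1)}p_{(2)}$. The label sets are illustrative.

```python
# A minimal sketch of the Gini impurity of a node, using the multiclass sum
# over pairs of distinct classes given above; for two classes this ordered
# sum equals 2 * p(1) * p(2). The label sets are illustrative.
from collections import Counter

def gini(labels):
    total = len(labels)
    fractions = [n / total for n in Counter(labels).values()]
    # i(S) = sum over ordered pairs i != j of p(i) * p(j)
    return sum(p_i * p_j
               for a, p_i in enumerate(fractions)
               for b, p_j in enumerate(fractions) if a != b)

print(gini(["Yes", "Yes", "No"]))   # mixed node -> impurity > 0
print(gini(["A", "A", "A", "A"]))   # pure node  -> impurity 0.0
```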

3.1.3. C4.5 Algorithm

The C4.5[9] algorithm, called J48 in the open-source version used in many learning frameworks, is a learning method that improves on the different versions of the ID3 algorithm, allowing statistical classifiers to be created by constructing decision trees from instances with both discrete and continuous attributes. This method includes important improvements that solve some of the problems of its predecessors, the most important of them being the possibility of using continuous attributes. To do this, the algorithm analyses the attributes of the input instances in order to discretize them, separating them into ranges based on the values of the input instances. It also includes two important improvements to optimize the generation process and avoid over-learning: (1) it allows pruning processes (tree size simplification) both during generation (the subdivision process is stopped based on some criteria) and at the end (a number of modifications are made to the tree so that certain branches are merged or transformed into leaf nodes); and (2) it allows missing attribute values to be ignored during the generation process, that is, instances that have no value for the attribute being analysed are not taken into account.


Like its predecessor, the C4.5 algorithm is a recursive divide-and-conquer process that attempts to minimize the value of entropy, but it introduces a new measure of gain, called the gain ratio, which allows the gain of choosing an attribute to be calculated more reliably:

$$ratio(S, A) = \frac{G(S, A)}{H_{partial}(S, A)}$$

$$H_{partial}(S, A) = -\sum_{v \in values(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$$

The functioning of the algorithm is similar to that of ID3, except in the way the gain of each attribute is calculated, where the two formulas presented above are used, and in the generation phase of the different subsets, where C4.5 introduces new base cases when generating intermediate or leaf nodes in order to reduce the size of the tree, for example in situations where no information gain is obtained for any of the subsets generated from the values of the attribute. In these cases, an intermediate node is created higher up the tree using the expected value of the class. The C4.5 algorithm has been improved in later versions (C5.0 and See5) in order to improve its performance (speed and memory usage) and to add functionality to build smaller trees or remove attributes that are not used in the classification process.
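The following minimal sketch computes the gain ratio of an attribute exactly as defined above, dividing the information gain by the split (partial) entropy; the toy instances are illustrative.

```python
# A minimal sketch of the C4.5 gain ratio: the information gain of an
# attribute divided by its split (partial) entropy, as defined above.
# The toy instances are illustrative.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(instances, labels, attribute):
    total = len(labels)
    gain = entropy(labels)
    split_entropy = 0.0                        # H_partial(S, A)
    for value in set(row[attribute] for row in instances):
        subset = [lab for row, lab in zip(instances, labels)
                  if row[attribute] == value]
        weight = len(subset) / total           # |S_v| / |S|
        gain -= weight * entropy(subset)
        split_entropy -= weight * math.log2(weight)
    return gain / split_entropy if split_entropy else 0.0

instances = [{"Wind": "Mild"}, {"Wind": "Mild"}, {"Wind": "Intense"}]
print(gain_ratio(instances, ["Yes", "Yes", "No"], "Wind"))   # -> 1.0
```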

3.2 Random Forest

Random decision forests[17] are a learning method for generating classifiers or regressors based on decision trees, defined to avoid the overfitting of the models produced by the different decision tree generation techniques. The method works by generating a random population of decision trees that jointly decide the value of the output variable. The decision process depends on the type of trees in the population: for a population of classification trees, a voting process is carried out, with the majority class being the method's prediction, while for a population of regression trees, the average of the predictions of the individual trees is calculated. Figure 6 shows an example of the operation of a classifier or regressor built as a random tree forest.


Figure 6. Example of the operation of a random decision forest: an instance is passed to every tree, the individual predictions are combined by vote, and the majority gives the final prediction.

The process of generating a random decision forest consists in manipulating the input (training) set in order to generate a random input set used to train each of the trees in the forest. For the generation of the training sets, a bootstrap aggregating algorithm, also called bagging, is used. Given a training set E with n elements, the bagging algorithm generates m training sets of size n by the uniform selection of instances of the set E. Because this generation system samples with replacement, a part of the instances that make up each training set will be repeated. Each of these sets is used to build a decision tree, which can be generated by different techniques. Once defined, all the trees (models) are combined in order to generate a common prediction, calculated by averaging the outputs (for regression) or voting (for classification), as in the sketch below.
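The following minimal hand-rolled sketch, assuming NumPy and scikit-learn are available, implements the bagging-plus-voting scheme just described with a small population of decision trees; the data and the forest size are illustrative, and it is a stand-in for a real random forest, which also samples random attribute subsets at each split.

```python
# A minimal hand-rolled sketch of bagging plus majority voting, assuming
# NumPy and scikit-learn are available; the data and the forest size are
# illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                  # illustrative training data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

trees = []
for _ in range(25):                            # m = 25 trees in the forest
    # Bootstrap: draw n instances uniformly *with replacement*, so some
    # instances are repeated and others are left out of each set.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def predict(x):
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return max(set(votes), key=votes.count)    # majority vote (classification)

print(predict(np.array([0.5, 0.5, 0.0, 0.0])))
```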


3.3 Linear Regression

Linear regression is a mathematical method used to study the relationship between a set of independent variables (predictor variables) and a dependent variable (criterion or predicted variable)[18]. Depending on the number of independent variables used in the prediction process, we can differentiate between simple regression, when we use only one independent variable, and multiple regression, when we use more than one. This technique fits a linear equation that can be used to construct a prediction model from instances with continuous input and output attributes.

3.3.1. Simple Linear Regression

Simple linear regression consists in calculating a linear equation that relates an independent variable $x$ and a dependent variable $y$. The linear equation to be calculated has the following structure:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $y$ is the dependent variable that we want to predict, $\beta_0$ and $\beta_1$ are unknown parameters that must be calculated, $x$ is the independent variable, and $\varepsilon$ is the error made in the prediction. The learning process consists of using input (training) instances, each composed of two values, one for the independent variable $x$ and one for the dependent variable $y$, to calculate the values of $\beta_0$ and $\beta_1$, which give rise to the equation of a line where $\beta_0$ is the intercept, that is, the height at which the line cuts the Y axis, and $\beta_1$ is the slope of the line, the increase or decrease produced in the variable $y$ when the variable $x$ increases or decreases by one unit. Figure 7 presents the result of a simple regression process, which consists of fitting a regression line that best adjusts to the cloud of points produced by the values used to calculate it.


Figure 7: Example of the line resulting from a simple linear regression process

One of the big problems of this technique is that, given a set of input (training) instances, there are infinitely many straight lines that can pass through the cloud of points, so it is necessary to choose the line that passes as close as possible to most of the points. The learning goal, then, is to find the line that best represents the set of points. There are different mathematical techniques for fitting such a simple function, the most widely used being least squares, which identifies the line that minimizes the sum of the squares of the vertical distances between each point and the line, as in the sketch below.
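The following minimal sketch, using only NumPy and toy data, computes the least-squares estimates of the slope and intercept in closed form.

```python
# A minimal sketch of simple linear regression by least squares, using the
# closed-form estimates of the slope and intercept (NumPy only; toy data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# beta_1 = cov(x, y) / var(x); beta_0 = mean(y) - beta_1 * mean(x)
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

print(beta_0, beta_1)            # intercept and slope of the fitted line
print(beta_0 + beta_1 * 6.0)     # prediction for a new value x = 6
```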

3.3.2. Multiple Linear Regression

Simple linear regression is a particular case of multiple linear regression, which consists in calculating an equation that relates a set of independent variables $x_1, x_2, \ldots, x_n$ and a dependent variable $y$. The equation for this regression case has the following structure:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$$


The functioning of multiple linear regression is analogous to that of simple linear regression, except that we have multiple independent variables and the equation no longer represents a line: it represents a plane when we have two independent variables and a hyperplane when we have three or more.
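As a minimal sketch under the same assumptions (NumPy, toy data), the coefficients $\beta_0, \ldots, \beta_n$ can be estimated by least squares on a design matrix whose leading column of ones carries the intercept.

```python
# A minimal sketch of multiple linear regression: the coefficients
# beta_0 ... beta_n are estimated by least squares on a design matrix
# whose first column of ones carries the intercept (NumPy only; toy data).
import numpy as np

X = np.array([[50.0, 2.0], [80.0, 3.0], [120.0, 4.0], [200.0, 5.0]])
y = np.array([100.0, 150.0, 210.0, 330.0])

design = np.column_stack([np.ones(len(X)), X])      # [1, x_1, x_2] per row
beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # [beta_0, beta_1, beta_2]

print(beta)                                         # estimated coefficients
print(design @ beta)                                # fitted values
```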

3.4 Artificial Neural Network

An Artificial Neural Network (ANN) is a network represented spatially by a directed graph where each node represents a neuron and the arcs represent the synapses or connections between neurons[19]. This structure attempts to mimic the distribution of neurons in the human brain in order to construct a type of algorithm capable of simulating the processing capacity of our brains. To understand the similarities, it is necessary to make a brief introduction to the structure of a human neuron. Human neurons are made up of three elements: (1) the soma or cell body, the main part of the neuron (10 to 80 microns in length); (2) the multiple prolongations that emerge from different parts of the soma, called dendrites, whose function is to receive impulses from other neurons and send them to the soma; and (3) the axon, an extension of the soma that extends in the opposite direction to the dendrites and has the function of conducting a nerve impulse from the soma to another neuron, muscle or gland of the human body. Figure 8 presents the detailed structure of a human neuron and its different parts.


Figure 8. Biological structure of a neuron: body or soma, nucleus, Nissl substance, dendrites, axon hillock, axon, myelin sheath, nodes of Ranvier, and axon terminals.

Based on this biological structure, the researchers McCulloch and Pitts defined the first neural network model in 1943[20]. This model consisted of a network with two layers of neurons connected to each other, where the first layer was formed by a set of nodes or formal neurons (presynaptic neurons) representing the network input, and the second layer was formed by a single node or formal neuron (postsynaptic neuron) representing the network output. A formal neuron was a logical gate with two possible internal states (on or off) represented by a variable. The network functioned as a discriminator: the neurons of the first layer received the inputs, which were sent to the second layer, where a mathematical operation was applied; an activation function based on a threshold was then applied to the resulting value, defining the state of the output. That is, if the value obtained from the operation on the inputs was greater than the threshold, the output was 1; otherwise the output was 0. From this and other models developed in the following years, Rumelhart and McClelland introduced in 1986 what is now known as the standard model of the artificial neuron, represented in Figure 9, where the components of an artificial neuron are presented together with the correspondence between the elements of a biological neuron and an artificial neuron. As can be seen in the figure, the artificial neuron tries to imitate the structure of a biological neuron.


Figure 9: Basic elements of an artificial neuron and their biological counterparts: the dendrites correspond to the inputs, the synapses to the weights, the body (sum and threshold) to the propagation function, and the axon to the activation-function output.

The standard artificial neuron is formed by the following elements:

▪ A set of inputs $x_j$ and a set of synaptic weights $w_{ij}$, where $j$ takes values between 1 and $n$.
▪ A propagation function $h_i$ defined from the set of inputs and synaptic weights, that is:

$$h_i(x_1, \ldots, x_n, w_{i1}, \ldots, w_{in})$$

The propagation function defines the mathematical operation by which the different inputs received by the neuron are combined. The most commonly used propagation function linearly combines the inputs and the synaptic weights:

$$h_i(x_1, \ldots, x_n, w_{i1}, \ldots, w_{in}) = \sum_{j=1}^{n} w_{ij} \, x_j$$

29 © Structuralia
Artificial Intelligence – Machine Learning I

In addition, it is very common to add to the propagation function an additional parameter $\theta_i$, commonly referred to as the threshold, which is subtracted from the result of the operation:

$$h_i(x_1, \ldots, x_n, w_{i1}, \ldots, w_{in}) = \sum_{j=1}^{n} w_{ij} \, x_j - \theta_i$$

▪ An activation function, which simultaneously represents the output of the neuron and its activation state. Mathematically, the activation function can be defined as:

$$y_i = f_i(h_i) = f_i\left(\sum_{j=1}^{n} w_{ij} \, x_j\right)$$
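Putting these elements together, the following minimal NumPy sketch evaluates one artificial neuron with a linear propagation function, a threshold, and a step activation; the inputs, weights and threshold values are illustrative.

```python
# A minimal NumPy sketch of one artificial neuron: a linear propagation
# function with a threshold, followed by a step activation. The inputs,
# weights and threshold are illustrative.
import numpy as np

def neuron(x, w, theta):
    h = np.dot(w, x) - theta           # h_i = sum_j w_ij * x_j - theta_i
    return 1.0 if h > 0 else 0.0       # step activation: does the neuron fire?

x = np.array([1.0, 0.0, 1.0])          # inputs x_1 .. x_3
w = np.array([0.5, 0.2, 0.4])          # synaptic weights w_i1 .. w_i3
print(neuron(x, w, theta=0.6))         # 0.9 - 0.6 > 0  ->  output 1.0
```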

Table 2 presents some of the most common activation functions used in neural networks. Different types of activation functions are normally used for each layer of the network.

Name                 | Function                                                  | Range
Identity             | y = x                                                     | [-∞, +∞]
Step                 | y = sign(x)                                               | {-1, +1}
Step (Heaviside)     | y = H(x)                                                  | {0, +1}
Linear (piecewise)   | y = -1 if x < -l; y = x if -l ≤ x ≤ +l; y = +1 if x > +l  | [-1, +1]
Sigmoidal (logistic) | y = 1 / (1 + e^(-x))                                      | [0, +1]
Sigmoidal (tanh)     | y = tanh(x)                                               | [-1, +1]
Gaussian             | y = A e^(-B x²)                                           | [0, +1]
Sinusoidal           | y = A sin(ωx + φ)                                         | [-1, +1]
Softmax              | y = e^(x_i) / Σ_j e^(x_j)                                 | [0, +1]

Table 2 - Most common activation functions
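A few of the functions in Table 2, written as minimal NumPy one-liners (illustrative; the softmax subtracts the maximum for numerical stability):

```python
# Minimal NumPy versions of a few of the activation functions in Table 2.
import numpy as np

identity = lambda x: x
step     = lambda x: np.where(x >= 0, 1.0, 0.0)   # Heaviside H(x)
sigmoid  = lambda x: 1.0 / (1.0 + np.exp(-x))     # logistic, range (0, 1)
tanh     = np.tanh                                # range (-1, 1)

def softmax(x):
    e = np.exp(x - np.max(x))     # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), softmax(x), sep="\n")
```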


3.4.1. Neural Network Architecture

Once the basic components, that is, the neurons, have been defined, it is necessary to define how they are distributed and connected in the network. The distribution of the different network connections largely determines the functioning of the neural network. The different nodes (neurons) that make up the network are connected to each other through synapses. These connections are unidirectional, that is, information only propagates in a single direction, from the presynaptic neuron to the postsynaptic neuron. In general, neurons are grouped, based on the propagation function they use, into larger entities called layers, and the sequential combination of different layers constitutes an artificial neural network. In any artificial neural network three types of layers can be distinguished:
▪ Input or sensory layer: It is the initial layer of the neural network and is composed of neurons that receive information (data or signals) from the environment.
▪ Output layer: It is the final layer of the neural network and generates the output or response of the network.
▪ Hidden layer: It is the intermediate layer or layers of the neural network, where the mathematical operations that allow the network to process and manipulate information in order to generate its output are performed. In the simplest case there is only one hidden layer, but as many layers as needed can be included. The inclusion of multiple hidden layers has given rise to the deep neural networks that have become popular in recent years through Deep Learning. Although this type of architecture is very efficient for certain processes, it greatly increases the complexity of the neural network.

Figure 10 presents an example of a unidirectional multilayer network composed of four layers: an input layer with three neurons, two hidden layers with four fully connected neurons each, and an output layer with two neurons.


Figure 10. Example of a multilayer network consisting of an input layer, two hidden layers, and an output layer.

Depending on the number of hidden layers included in an artificial neural network, we speak of single-layer networks, composed of a single hidden layer, or multilayer networks, with multiple hidden layers. In addition, depending on the connections between the different layers, we can distinguish between three types of artificial neural networks:
▪ One-way networks (feedforward): Feedforward or one-way networks (so called because neurons obtain information only from the previous layer or from the environment) are the first and simplest artificial neural network structure. In this structure, information moves in a single direction: from the input layer, through the different hidden layers (if any), to the output layer. There are no loops or backward feedback in these networks. Despite their simplicity, they offer very good results in processes that do not require the use of information from future events within the network.
▪ Elman networks (simple feedback): Elman neural networks are called simple recurrent networks because they include feedback between contiguous layers. That is, these networks possess a memory of the immediately preceding events, and this information is used when updating the weights during the learning process.


▪ Recurrent networks (full feedback): Fully connected recurrent neural networks, unlike
Elman networks, have feedback connections to all the elements that form the network.
Each neuron in the network is connected to all the elements that surround it from a spatial
point of view. That is, a neuron is connected to the neurons of the preceding and following
layers and to itself.
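To illustrate the simple-feedback scheme described above, the following minimal Python/NumPy sketch (an illustration added here, not taken from the original text) shows a single Elman-style state update, in which the hidden state of the previous step acts as the memory of the event immediately before; the dimensions and weights are hypothetical choices.

import numpy as np

rng = np.random.default_rng(1)
W_x = rng.normal(size=(3, 4))  # input -> hidden weights (3 inputs, 4 hidden units)
W_h = rng.normal(size=(4, 4))  # hidden -> hidden feedback weights
b = np.zeros(4)

def elman_step(x_t, h_prev):
    # The previous hidden state h_prev is fed back into the layer,
    # giving the network a memory of the immediately preceding event.
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

h = np.zeros(4)                       # initial state: no memory yet
for x_t in np.array([[0.1, 0.2, 0.3],
                     [0.0, -0.5, 1.0]]):
    h = elman_step(x_t, h)            # the state carries over between steps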

3.4.2. Training Process

Once the structure of the neural network has been defined, it is necessary to compute the values
of the weights of each layer of the network. For this purpose, a training process composed of two
phases is used:
▪ Training phase: In this phase, the weights (parameters) that define the model of the
neural network are calculated using a set of data or patterns (training examples). The
different weights are calculated iteratively, according to the values of the training
examples, in order to minimize the error between the output obtained by the neural
network and the desired output.
▪ Test phase: In this phase, the quality of the training process is analysed; that is, it is
analysed whether the model built in the previous phase is able to generalize to new
cases or has adjusted too much to the particularities present in the training examples
(overfitting). To avoid the problem of overfitting, it is advisable to use a second set of
data, different from the training set, the validation set, to control the weight calculation
process.
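The following minimal Python/NumPy sketch (an illustration, not the document's own procedure) shows the two phases in miniature for a single sigmoid neuron trained by gradient descent; the synthetic data, learning rate, and number of epochs are all hypothetical choices made for the example.

import numpy as np

rng = np.random.default_rng(2)
# Hypothetical toy data: 100 training and 30 validation examples,
# each with 3 attributes and a binary desired output.
X_tr, y_tr = rng.normal(size=(100, 3)), rng.integers(0, 2, 100)
X_va, y_va = rng.normal(size=(30, 3)), rng.integers(0, 2, 30)

w, b = np.zeros(3), 0.0

def predict(X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output

def error(X, y):
    return np.mean((predict(X) - y) ** 2)      # mean squared error

for epoch in range(200):
    # Training phase: iteratively adjust the weights to reduce the
    # error between the obtained output and the desired output.
    p = predict(X_tr)
    grad = (p - y_tr) * p * (1 - p)
    w -= 0.1 * X_tr.T @ grad / len(y_tr)
    b -= 0.1 * grad.mean()
    # Validation: if the training error keeps falling while the
    # validation error rises, the model is overfitting.
    if epoch % 50 == 0:
        print(epoch, error(X_tr, y_tr), error(X_va, y_va))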

3.4.3. Types of neural networks

Neural networks can be built in different ways depending on the number of layers, the type of
neurons used, and the way the different layers are connected.
However, there are a number of basic configurations that identify certain types of neural
networks, some of which are described below:

▪ Simple perceptron: The perceptron (P) is the simplest neural network, formed by only
two layers (input and output), which allows a simple discriminator to be constructed (a
minimal sketch is given after this list).


▪ Multi-layer perceptron: The multilayer perceptron (MLP) is a neural network based on
the simple perceptron that introduces the concept of the hidden layer. The basic structure
of this network is made up of at least three layers (input, hidden and output) whose
interaction makes it possible to construct a universal approximator. That is, it allows any
continuous function defined in a space $\mathbb{R}^n$ to be approximated.

▪ Recurrent networks: Recurrent Neural Networks (RNN) are an extension of the multilayer
perceptron in which connections occur between neurons in both directions (forward and
backward). There are two types of recurrent networks: simple ones, which only have
connections with the neurons of contiguous layers; and complete ones, which can be
connected with all the neurons of the different layers, including themselves.

▪ Long Short-Term Memory: Long Short-Term Memory (LSTM) networks are a type of
neural network that includes memory blocks capable of recalling certain types of
information.

▪ Convolutional networks: Convolutional Neural Networks (CNN) are a type of multilayer
perceptron-like neural network in which the neurons of some layers are convolutional,
subsampling or simple. Convolutional neurons try to simulate the functioning of neurons
in the human visual cortex by applying filters (matrix operations) that allow features to be
extracted from input data such as the pixels of an image.

▪ Generative Adversarial Network: A Generative Adversarial Network (GAN) is a
combination of two neural networks that compete with each other in a zero-sum game,
that is, a situation in which the gain or loss of one participant is exactly balanced by the
loss or gain of the other participant.
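As promised above, here is a minimal Python/NumPy sketch of the simple perceptron, the first configuration in this list: two layers (input and output) with a threshold output, trained with the classic perceptron rule. The toy problem (logical AND) is a hypothetical, linearly separable example chosen for illustration.

import numpy as np

def perceptron_train(X, y, epochs=20, lr=0.1):
    # Classic perceptron learning rule: weights are adjusted only
    # when an example is misclassified. X holds the input-layer
    # values and y the desired outputs in {0, 1}.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if x_i @ w + b > 0 else 0  # output-layer threshold
            w += lr * (y_i - pred) * x_i
            b += lr * (y_i - pred)
    return w, b

# A linearly separable toy problem: the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if x @ w + b > 0 else 0 for x in X])  # expected: [0, 0, 0, 1]

Being a two-layer discriminator, this model can only separate classes with a single hyperplane, which is precisely the limitation that the hidden layers of the multilayer perceptron remove.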

3.5 K-Nearest Neighbour

The K-Nearest Neighbour (KNN) algorithm is a non-parametric learning method for the
generation of classifiers or regressors based on the concept of proximity [16]. This algorithm
does not produce any model after the learning phase; instead, it stores all the instances used in
the learning process in a data structure that is consulted in order to predict the output value
(class or numeric value) of new instances.


Figure 11. Example of how the KNN algorithm's classification process works using different values of K.

The operation of the algorithm is very simple once the data structure that stores the training
instances has been generated.
Given a new instance that must be classified, the distance to all the instances stored in the data
structure is calculated, and the instances are ordered from the lowest to the highest distance
value. The k nearest instances are then selected and, if a classification process is being carried
out, the new instance is assigned the most frequent class among the instances selected as
closest; in the case of a regression process, the output value is weighted based on the values
of the instances selected as closest. Figure 11 presents a visual example of the classification
process for a new instance using the KNN algorithm, in which the distance to the rest of the
instances is calculated. In this case, it can be seen how the class assigned to the new instance
varies depending on the value of k we select:

▪ For k = 1, the new instance (black) would be classified as blue, since the closest instance
is blue.
▪ For k = 2, the new instance (black) could be classified as red or blue, since the two closest
instances belong to two different classes. It would be necessary to define some kind of
technique to break the tie.
▪ For k = 3, the new instance (black) would be classified as red, since of the three closest
instances two are red and one is blue.


Taking into account the behaviour observed in the example described in Figure 11, it is
necessary to configure the KNN algorithm based on two elements:

▪ The value of k, which is usually selected manually after the learning process, although in
some cases it is possible to calculate it with some type of algorithm based on the
distribution of the instances of the training set.
▪ The distance function, which defines the similarity between two instances. The most
common function is the Euclidean distance, presented in the following equation:

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (x_{ri} - x_{rj})^2}$$

where $n$ is the number of attributes that describe each instance.
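The following minimal Python sketch (an illustration, not part of the original text) ties the two configurable elements together: it computes the Euclidean distance from a new instance to every stored instance, orders them from lowest to highest, and assigns the most frequent class among the k nearest. The toy data loosely echoes the red/blue classes of Figure 11 and is entirely hypothetical.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    # The "learning" phase only stored the instances; X_train plays
    # the role of that data structure.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # k nearest instances
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                      # most frequent class

# Hypothetical toy data echoing Figure 11: two classes, red and blue.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 3.0], [3.2, 2.8]])
y_train = np.array(["blue", "blue", "red", "red"])
print(knn_classify(X_train, y_train, np.array([2.8, 2.9]), k=3))  # "red"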

KNN is one of the most widely used classification algorithms, due mainly to its simplicity.
However, the time needed to classify new instances can be very high if the set of training
instances is very large, because the distance to all the elements stored in the data structure
during the learning process must be calculated.
There are different versions of the KNN algorithm that try to exploit the two configurable
elements; some of these versions are: (1) KNN with rejection; (2) KNN with mean distance; and
(3) KNN with minimum distance.


4. CONCLUSIONS
Machine Learning is one of the most promising techniques of artificial intelligence, as it allows
us to construct reasoning models based on the information available from the environment, in a
way similar to how we believe human learning processes work. Like many other AI techniques,
however, it is extremely sensitive to, and dependent on, the way the information from the
environment is modeled. In the case of machine learning, the definition of the data, both at the
level of structure and in terms of a sufficient number of representative examples of the concept
to be learned, is very important: a poor definition of the attributes of the information used during
the learning process can lead to erroneous conclusions, or produce models that do not
generalize properly due to a lack of information or the use of biased information.

Some examples of such problems can be seen in the very definition of the different Supervised
Learning methods described in this topic. These techniques need precise information (labels)
about the different concepts represented by the information provided to them in order to build a
model. That is, all the examples used in the learning process must be labeled, so that their
characteristics can be identified and related to the labels, making it possible to predict the label
of future unlabeled examples. This process has two important problems:

▪ The process of labeling real-world information precisely enough to build very specific
models is very complicated and involves a great deal of effort in terms of physical
resources. For example, if we wanted to create a classifier for all the types of trees that
exist on planet Earth, we would have to collect specific information on all the possible
attributes of each tree, label it, and build a training set representative enough to create a
model that allows us to classify any type of tree; we would also have to create a set of
test data in order to analyse the effectiveness of our model.

▪ The definition of the attributes describing the instances: how many attributes, and which
ones, are necessary to describe a concept correctly? Is it even possible to identify all the
information needed to correctly identify each of the different types of trees that exist on
planet Earth? This is another major problem of supervised learning, which still depends
too much on the way humans describe information.


After seeing how supervised learning works, we must ask ourselves whether it is possible to
accurately label any concept, and whether we are able to manually define all the attributes
needed to describe these concepts. In the next topic we will describe how the different methods
of Unsupervised Learning work, which are able to construct models by analysing the similarities
between the attributes of the sets of instances without needing to know the label of each of these
instances.
We will also describe what Deep Learning is and how it is able to extract information and create
new features automatically in order to improve the learning process. To finish, we will learn about
Reinforcement Learning, which tries to simulate the way in which living beings learn, using
rewards that make it possible to learn how to interact physically with the environment.


5. REFERENCES
[1] Minsky, M. (1967). Computation: Finite and Infinite Machines. Prentice Hall.

[2] Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry.
MIT Press, Cambridge, Massachusetts.

[3] Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks,
Volume 61, pp 85-117.

[4] Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, New York.

[5] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, Volume 1(1), pp 81-106.

[6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical
Journal, Volume 27, pp 379-423 and 623-656.

[7] Utgoff, P. E. (1989). Incremental Induction of Decision Trees. Machine Learning, Volume 4,
pp 161-186.

[8] Izenman, A. J. (2008). Modern Multivariate Statistical Techniques. Springer.

[9] Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification (2nd edition). John
Wiley.

[10] Samuel, A. L. (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM
Journal of Research and Development, Volume 3(3), pp 210-229.

[11] Copi, I. M.; Cohen, C.; Flage, D. E. (2006). Essentials of Logic (Second edition). Upper
Saddle River, NJ: Pearson Education. ISBN 978-0-13-238034-8.

[12] Sternberg, R. J. (2009). Cognitive Psychology. Belmont, CA: Wadsworth. ISBN 978-0-495-
50629-4.

[13] Watson, I. and Marir, F. (1994). Case-based reasoning: A review. The Knowledge
Engineering Review, Volume 9(4), pp 327-354.

[14] Dean, J. and Ghemawat, S. (2004). MapReduce: Simplified data processing on large
clusters. Proceedings of the 6th Symposium on Operating Systems Design and Implementation.

[15] Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S. and Stoica, I. (2010). Spark: Cluster
Computing with Working Sets. HotCloud 2010.


[16] Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric
regression. The American Statistician, Volume 46(3).

[17] Ho, T. K. (1995). Random Decision Forests. Proceedings of the 3rd International Conference
on Document Analysis and Recognition, Montreal, QC, pp 278-282.

[18] Freedman D. A. (2009). Statistical Models: Theory and Practice. Cambridge University
Press.

[19] Haykin S. (1999). Neural Networks: A Comprehensive Foundation. Prentice Hall.

[20] McCulloch, W. and Pitts, W. (1943). A Logical Calculus of the Ideas Immanent in Nervous
Activity. Bulletin of Mathematical Biophysics, Volume 5(4), pp 115-133.
