Chapter 1 Introduction To Neural Network and Machine Learning

This document provides an introduction to neural networks and machine learning. It defines an artificial neural network as a system that mimics the human brain to recognize patterns in data through a learning process. The basic unit of a neural network is the neuron or node, which receives weighted inputs and applies an activation function to produce an output. Neural networks have input, hidden, and output layers connected by weights. Machine learning allows computers to learn from data without being explicitly programmed, improving performance on tasks through experience. The key components of learning are data storage, abstraction of knowledge from stored data through training models, and generalization to new examples.

Uploaded by

samisey316
Copyright
© © All Rights Reserved

Chapter 1.
Introduction to Neural Network and Machine Learning

1.1. Neural Network
An artificial neural network (ANN) is a series of algorithms that aim at recognizing
underlying relationships in a set of data through a process that mimics the way the human brain
operates. Such a system “learns” to perform tasks by analyzing examples, generally without
being programmed with task-specific rules.
A typical cortical neuron:
 Gross physical structure:
 There is one axon that branches
 There is a dendritic tree that collects input from other neurons.
 Axons typically contact dendritic trees at synapses
 A spike of activity in the axon causes charge to be injected into the post-synaptic neuron.
 Spike generation:
 There is an axon hillock that generates outgoing spikes whenever enough charge has
flowed in at synapses to depolarize the cell membrane.

• The basic unit of computation in a neural network is the neuron, often called a node or unit.
It receives input from some other nodes, or from an external source and computes an
output.
• Each input has an associated weight (w), which is assigned on the basis of its
relative importance to other inputs. The node applies a function to the weighted
sum of its inputs.
• The idea is that the synaptic strengths (the weights w) are learnable and control the
strength and direction of the influence of one neuron on another: excitatory (positive
weight) or inhibitory (negative weight).
• In the basic model, the dendrites carry the signal to the cell body where they all get
summed.
Fundamental to Machine Learning
• If the final sum is above a certain threshold, the neuron can fire, sending
a spike along its axon.
• In the computational model, we assume that the precise timings of the spikes do not
matter, and that only the frequency of the firing communicates information. We
model the firing rate of the neuron with an activation function (e.g., the sigmoid
function), which represents the frequency of the spikes along the axon.

Neural networks are organized in various layers:


 Input layer: the input layer neurons receive the information supposed to explain the
problem to be analyzed;

 Hidden layer: the hidden layer is an intermediate layer allowing neural networks to model
nonlinear phenomena. It is said to be “hidden” because there is no direct contact with the
outside world. The outputs of each hidden layer are the inputs of the units of the following
layer;
 Output layer: the output layer is the last layer of the network; it produces the result, the
prediction.
 Connections and weights: The network consists of connections, each connection
transferring the output of a neuron i to the input of a neuron j. In this sense i is the
predecessor of j and j is the successor of i. Each connection is assigned a weight w_ij.
 Activation function: In artificial neural networks this function is also called the transfer
function, e.g., Sigmoid, Tanh, ReLU, Leaky ReLU.
 Learning rule: The learning rule is a rule or an algorithm which modifies the parameters
of the neural network, in order for a given input to the network to produce a favored output.
This learning process typically amounts to modifying the weights and thresholds.
 Bias: An additional parameter used along with the sum of the product of weights and inputs
to produce an output.

Activation Function:
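
The four activation functions named above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the slope alpha = 0.01 used for Leaky ReLU are assumptions chosen for the example, not values given in these notes.

```python
import math

def sigmoid(z):
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squash any real input into the open interval (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Pass positive inputs through unchanged, clip negatives to 0."""
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope.

    alpha = 0.01 is an assumed, commonly used default.
    """
    return z if z > 0 else alpha * z

# Evaluate each function on a negative, zero, and positive input.
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

Sigmoid and tanh saturate for large positive or negative inputs, while ReLU and Leaky ReLU grow linearly for positive inputs.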

In neural networks, the perceptron is the first and simplest neural network model, a supervised
learning algorithm invented in 1957 by Frank Rosenblatt, a notable psychologist in the field of
artificial intelligence. This network is said to be simple because it has only two layers: an input
layer and an output layer. This structure involves only one matrix of weights, and all the units of
the input layer are connected to the units of the output layer.
A perceptron takes:

 n input values (x1, x2, …, xn);
 n synaptic coefficients (or weights: w1, w2, …, wn);
 A bias (B): a neuron whose activation is always equal to 1. Like the other neurons,
the bias is connected through a weight of its own, usually called the
threshold.
 Each input value is then multiplied by its respective weight (wi·xi), and these
products are added to obtain a weighted sum. The neuron then
generates one of two possible values, depending on whether the sum is
lower or higher than the threshold θ.

Once the weighted sum is obtained, it is necessary to apply an activation function, which is used to map
inputs to outputs. The activation function, also known as the transfer function, is an essential component
of the neural network. In addition to introducing non-linearity into the network, it converts the signal
entering a unit (neuron) into an output signal (response).
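
The perceptron computation described above (weighted sum, bias, then a two-valued activation) can be sketched as follows. The weights and bias are hand-picked for illustration so that the unit behaves like a logical AND gate; they are assumptions, not values from these notes. The bias b plays the role of the negative threshold (b = -θ), so comparing the weighted sum with θ is the same as comparing the sum plus b with 0.

```python
def step(z):
    """Two-valued activation: 1 if the input is non-negative, else 0."""
    return 1 if z >= 0 else 0

def perceptron_output(x, w, b):
    """Weighted sum of the inputs plus bias, passed through the step function."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return step(z)

# Hand-picked (hypothetical) parameters that realize a logical AND:
# the unit fires only when x1 + x2 exceeds the threshold 1.5.
w = [1.0, 1.0]
b = -1.5

outputs = [perceptron_output([x1, x2], w, b) for x1 in (0, 1) for x2 in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

In practice the weights and bias are not chosen by hand but adjusted by a learning rule from labeled examples.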

1.2. Machine Learning


i. What is Machine Learning
One of the main motivations for developing (computer) programs is to automate various kinds of
processes. Originally developed as a subfield of Artificial Intelligence (AI), one of the goals behind
machine learning was to replace the need for developing computer programs “manually.” Considering
that programs are being developed to automate processes, we can think of machine learning as the
process of “automating automation.” In other words, machine learning lets computers “create”
programs (often with the intent of making predictions) themselves: machine learning is the process
of turning data into programs.

It is said that the term machine learning was first coined by Arthur Lee Samuel, a pioneer in the AI
field, in 1959. Some definitions:

 Machine learning is the field of study that gives computers the ability to learn without being
explicitly programmed. Machine learning is programming computers to optimize a
performance criterion using example data or past experience. We have a model defined up to
some parameters, and learning is the execution of a computer program to optimize the
parameters of the model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain knowledge from data, or
both.
 A computer program which learns from experience is called a machine learning program or
simply a learning program. Such a program is sometimes also referred to as a learner.

 ML is an AI technology that appears to emulate human performance, typically by
learning and coming to its own conclusions.

 The goal of machine learning is "to devise learning algorithms that do the learning
automatically without human intervention or assistance."

Figure: Machine learning vs. “classic” programming.

 A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E. — Tom Mitchell, Machine Learning Professor at Carnegie Mellon University.
 Examples:-
o Handwriting recognition learning problem
 Task T: Recognizing and classifying handwritten words within images
 Performance measure P: Percent of words correctly classified

 Training experience E: A dataset of handwritten words with given
classifications
o A robot driving learning problem
 Task T: Driving on highways using vision sensors
 Performance measure P: Average distance traveled before an error
 Training experience E: A sequence of images and steering commands recorded
while observing a human driver
o A chess learning problem
 Task T: Playing chess
 Performance measure P: Percent of games won against opponents
 Training experience E: Playing practice games against itself.

ii. Components of Learning


 Data storage: - Facilities for storing and retrieving huge amounts of data are an important
component of the learning process. Humans and computers alike utilize data storage as a
foundation for advanced reasoning.
o In a human being, the data is stored in the brain and data is retrieved using
electrochemical signals.
o Computers use hard disk drives, flash memory, random access memory and similar
devices to store data and use cables and other technology to retrieve data.
 Abstraction: it is the second component of the learning process, which is the process of
extracting knowledge about stored data. This involves creating general concepts about the
data as a whole. The creation of knowledge involves application of known models and creation
of new models. The process of fitting a model to a dataset is known as training. When the
model has been trained, the data is transformed into an abstract form that summarizes the
original information.
 Generalization: The third component of the learning process is known as generalization. The
term generalization describes the process of turning the knowledge about stored data into a
form that can be utilized for future action. These actions are to be carried out on tasks that are
similar, but not identical, to those that have been seen before. In generalization, the goal is to
discover those properties of the data that will be most relevant to future tasks.
 Evaluation: Evaluation is the last component of the learning process. It is the process of giving
feedback to the user to measure the utility of the learned knowledge. This feedback is then
utilized to effect improvements in the whole learning process.

Figure- Components of Learning

 Machine Learning is used when:

 Human expertise does not exist (E.g. navigating on Mars),


 Humans are unable to explain their expertise (speech recognition)
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user biometrics)
 Models must be customized (personalized medicine)
 Models are based on huge amounts of data (genomics)

iii. Learning Models


Machine learning is concerned with using the right features to build the right models that achieve the
right tasks. Learning models can be divided into three categories:

1. Logical models
2. Geometric models
3. Probabilistic models

Logical models use a logical expression to divide the instance space into segments and hence
construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a
True or False outcome. Once the data is grouped using a logical expression, the data is divided into
homogeneous groupings for the problem we are trying to solve. For example, for a classification
problem, all the instances in the group belong to one class. There are mainly two kinds of logical
models: Tree models and Rule models. Rule models consist of a collection of implications or IF-
THEN rules. For tree-based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the
behavior of the model for this segment.

Geometric models: In this section, we consider models that define similarity
by considering the geometry of the instance space. In geometric models,
features could be described as points in two dimensions (x- and y-axis) or a
three-dimensional space (x, y, and z). We could use geometric concepts like:

 Lines or planes to segment (classify) the instance space. These are called linear models. In
the simplest case, we have an equation of the form f(x) = mx + c, where c represents the
intercept and m represents the slope.
 Alternatively, we can use the geometric notion of distance to represent similarity. In this case,
if two points are close together, they have similar values for features and thus can be classed as
similar. We call such models distance-based models. As the name implies, distance-based
models work on the concept of distance. In the context of Machine learning, the concept of
distance is not based on merely the physical distance between two points. Instead, we could
think of the distance between two points considering the mode of transport between two points.
The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and
Mahalanobis.

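o Euclidean Distance: represents the shortest (straight-line) distance between two points:

d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}

n = number of dimensions
p_i, q_i = i-th coordinates of the data points p and q

o Manhattan Distance: Manhattan Distance is the sum of absolute differences
between points across all the dimensions:

d(p, q) = \sum_{i=1}^{n} |p_i - q_i|

o Minkowski Distance: the generalized form of Euclidean and Manhattan
Distance:

d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}

Here, x_i and y_i are the i-th coordinates of the two points and p represents the order of the
norm; p = 1 gives the Manhattan Distance and p = 2 gives the Euclidean Distance.

o Hamming Distance: Hamming Distance measures how different two strings of the same
length are. The Hamming Distance between two strings of the same
length is the number of positions at which the corresponding characters are different.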
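
The four distance measures described above can be sketched in plain Python. This is an illustrative sketch; the function names and the sample points are assumptions made for the example. The argument `order` corresponds to the order of the norm in the Minkowski distance.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, order):
    """Generalized distance: order=1 is Manhattan, order=2 is Euclidean."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))

# Hypothetical sample points and strings.
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))                 # 5.0
print(manhattan(p, q))                 # 7.0
print(minkowski(p, q, 1))              # 7.0
print(hamming("karolin", "kathrin"))   # 3
```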

Probabilistic models. Probabilistic models treat the features X and the target variable Y as random
variables. The process of modelling represents and manipulates the level of uncertainty
with respect to these variables. There are two types of probabilistic models: Predictive and Generative.
Predictive probability models use the idea of a conditional probability distribution P(Y|X) from which
Y can be predicted from X. Generative models estimate the joint distribution P(Y, X), from which the
conditional distribution P(Y|X) can be obtained using the Bayes rule, defined as

P(Y|X) = \frac{P(X|Y) \, P(Y)}{P(X)}

• P(Y|X) = Posterior probability (probability that the hypothesis is true given the evidence)
• P(X|Y) = Likelihood (probability of seeing the evidence if the hypothesis is true)
• P(Y) = Class prior probability (probability that the hypothesis is true, before any evidence is
present)
• P(X) = Predictor prior probability (probability of observing the evidence)
The Naïve Bayes algorithm is based on the idea of Conditional Probability. Conditional probability is
based on finding the probability that something will happen, given that something else has already
happened. The task of the algorithm then is to look at the evidence and to determine the likelihood of
a specific class and assign a label accordingly to each entity.
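
As a worked illustration of the Bayes rule, P(Y|X) = P(X|Y) P(Y) / P(X), the sketch below computes the posterior probability that an email is spam given that it contains a particular word. All probabilities are made-up numbers for the example, and the function name is an assumption.

```python
def posterior(prior_y, likelihood_x_given_y, likelihood_x_given_not_y):
    """Return P(Y|X) for a binary hypothesis Y using the Bayes rule.

    The evidence P(X) is expanded with the law of total probability:
    P(X) = P(X|Y) P(Y) + P(X|not Y) P(not Y).
    """
    evidence = (likelihood_x_given_y * prior_y
                + likelihood_x_given_not_y * (1.0 - prior_y))
    return likelihood_x_given_y * prior_y / evidence

# Hypothetical numbers: 20% of all emails are spam; the word appears
# in 60% of spam emails and in 5% of non-spam emails.
p_spam_given_word = posterior(prior_y=0.2,
                              likelihood_x_given_y=0.6,
                              likelihood_x_given_not_y=0.05)
print(p_spam_given_word)  # approximately 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of about 0.75, so a classifier of this kind would label the email as spam.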

Geometric models: e.g. K-nearest neighbors, linear regression, support vector machine, logistic regression, …
Logical models: e.g. decision tree, random forest, …
Probabilistic models: e.g. Naïve Bayes, Gaussian process regression, conditional random field, …

Figure - Designing a Learning System

For any learning system, we must know the three elements: T (Task), P (Performance
Measure), and E (Training Experience).

iv. Types of Learning

In general, machine-learning algorithms can be classified into three types.
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

Supervised learning
A training set of examples with the correct responses (targets) is provided and, based on this training
set, the algorithm generalizes to respond correctly to all possible inputs. This is also called learning
from exemplars. Supervised learning is the machine learning task of learning a function that maps an
input to an output based on example input-output pairs.
In supervised learning, each example in the training set is a pair consisting of an input object (typically
a vector) and an output value. A supervised learning algorithm analyzes the training data and produces
a function, which can be used for mapping new examples. In the optimal case, the function will correctly
determine the class labels for unseen instances. Both classification and regression problems are
supervised learning problems. A wide range of supervised learning algorithms are available, each with
its strengths and weaknesses. There is no single learning algorithm that works best on all supervised
learning problems.

Figure: Supervised learning Structure

Example Consider the following data regarding patients entering a clinic. The data consists of the
gender and age of the patients and each patient is labeled as “healthy” or “sick”.
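
A minimal sketch of supervised learning on data of this kind is given below. The patient records are made up for illustration (the original table is not reproduced here), and the learner is a deliberately simple one that is assumed for the example: it ignores gender and searches for the age threshold that misclassifies the fewest training examples.

```python
# Hypothetical labeled training set: (gender, age, label).
train = [
    ("M", 21, "healthy"), ("F", 34, "healthy"), ("M", 45, "healthy"),
    ("F", 52, "sick"), ("M", 61, "sick"), ("F", 70, "sick"),
]

def predict(threshold, age):
    """Learned function: patients older than the threshold are labeled sick."""
    return "sick" if age > threshold else "healthy"

def fit_age_threshold(examples):
    """Pick the midpoint between consecutive ages with the fewest training errors."""
    ages = sorted(age for _, age, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(ages, ages[1:])]

    def errors(t):
        return sum(predict(t, age) != label for _, age, label in examples)

    return min(candidates, key=errors)

threshold = fit_age_threshold(train)
print(threshold)                                        # 48.5
print(predict(threshold, 30), predict(threshold, 65))   # healthy sick
```

The labels drive the choice of threshold, which is what makes this supervised: the learned function is then applied to ages that did not appear in the training set.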

Unsupervised learning
Correct responses are not provided, but instead the algorithm tries to identify similarities between the
inputs so that inputs that have something in common are categorized together. The statistical approach
to unsupervised learning is known as density estimation. Unsupervised learning is a type of machine
learning algorithm used to draw inferences from datasets consisting of input data without labeled
responses. In unsupervised learning algorithms, a classification or categorization is not included in the
observations. There are no output values and so there is no estimation of functions. Since the examples
given to the learner are unlabeled, the accuracy of the structure that is output by the algorithm cannot
be evaluated. The most common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.
Example Consider the following data regarding patients entering a clinic. The data consists of the
gender and age of the patients.
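
Cluster analysis on unlabeled data of this kind can be sketched with a one-dimensional k-means procedure. The ages and the starting centroids below are made up for illustration, and k-means is only one possible clustering method, chosen here as an assumption; no labels are used at any point.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid
    and moving each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled patient ages and assumed starting centroids.
ages = [21, 23, 25, 61, 64, 67]
centroids, clusters = kmeans_1d(ages, centroids=[20.0, 40.0])
print(centroids)  # [23.0, 64.0]
print(clusters)   # [[21, 23, 25], [61, 64, 67]]
```

The algorithm finds a younger and an older group without ever being told what the groups mean; interpreting them is left to the analyst.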

Figure- Unsupervised Learning Structure

Reinforcement learning:
This is somewhere between supervised and unsupervised learning. The algorithm gets told when the
answer is wrong, but does not get told how to correct it. It has to explore and try out different
possibilities until it works out how to get the answer right. Reinforcement learning is sometimes called
learning with a critic because of this monitor that scores the answer, but does not suggest improvements.
Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its
rewards. A learner (the program) is not told what actions to take as in most forms of machine learning,
but instead must discover which actions yield the most reward by trying them. In the most interesting
and challenging cases, actions may affect not only the immediate reward but also the next situations
and, through that, all subsequent rewards.
Example Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it
if it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment.
We can use a similar method to train computers to do many tasks, such as playing backgammon or
chess, scheduling jobs, and controlling robot limbs. Reinforcement learning is different from supervised
learning. Supervised learning is learning from examples provided by a knowledgeable expert.
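
The trial-and-error idea can be sketched with a very small reinforcement learning problem, a multi-armed bandit, in which the learner repeatedly chooses one of several actions and only observes the reward of the action it chose. The epsilon-greedy strategy, the payoff values, and all names below are assumptions made for this illustration; the notes do not prescribe a particular algorithm.

```python
import random

def run_bandit(payoffs, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: mostly exploit the best-looking action,
    but explore a random action with probability epsilon."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payoffs)   # estimated reward of each action
    counts = [0] * len(payoffs)        # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(payoffs))                  # explore
        else:
            action = max(range(len(payoffs)),
                         key=lambda a: estimates[a])              # exploit
        reward = payoffs[action]       # the only feedback the learner gets
        counts[action] += 1
        # Incremental update of the running average reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, total_reward

# Hypothetical fixed payoffs; the learner is never told which is best.
estimates, total_reward = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(3), key=lambda a: estimates[a])
print(best_action, estimates, total_reward)
```

The learner is never shown the correct action. It discovers that the third action pays best only because occasional exploration lets it try that action and observe the reward.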

Figure – Reinforcement Learning


Example

Figure – Summary of Categories of Machine Learning

v. Data Representation and Mathematical Notation


We denote the i-th pair in a labeled training set D as ⟨x^[i], y^[i]⟩.

 x: A scalar denoting a single training example with 1 feature (e.g., the height of a person).
 x (vector): A training example with m features (e.g., with m = 3 we could represent the height,
weight, and age of a person), represented as a column vector (i.e., a matrix with 1 column,
x ∈ R^m):

x = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix}^T

X: Design matrix, X ∈ R^{n×m}, which stores n training examples, where m is the number of features.

Note that in order to distinguish the feature index from the training example index, we will use a
square-bracket superscript notation to refer to the i-th training example and a regular subscript
notation to refer to the j-th feature:

X = \begin{bmatrix}
x_1^{[1]} & x_2^{[1]} & \cdots & x_m^{[1]} \\
x_1^{[2]} & x_2^{[2]} & \cdots & x_m^{[2]} \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{[n]} & x_2^{[n]} & \cdots & x_m^{[n]}
\end{bmatrix}

The corresponding targets are represented in a column vector y, y ∈ R^n:

y = \begin{bmatrix} y^{[1]} & y^{[2]} & \cdots & y^{[n]} \end{bmatrix}^T
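
The notation can be made concrete with a small sketch in which the design matrix is a list of rows (one row per training example, one column per feature). The numbers, the meaning of the targets, and the helper name are made up for illustration.

```python
# Hypothetical design matrix with n = 3 training examples and m = 3 features
# (height in cm, weight in kg, age in years).
X = [
    [170.0, 65.0, 30.0],   # x^[1]
    [182.0, 80.0, 45.0],   # x^[2]
    [158.0, 52.0, 23.0],   # x^[3]
]
# Hypothetical targets y^[1], ..., y^[n], one per training example.
y = [0, 1, 0]

n, m = len(X), len(X[0])

def feature(X, i, j):
    """Return x_j^[i]: feature j of training example i (both 1-based,
    as in the mathematical notation; Python lists are 0-based)."""
    return X[i - 1][j - 1]

print(n, m)              # 3 3
print(feature(X, 2, 1))  # 182.0
print(X[0], y[0])        # first training pair
```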

Definition: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for
each example (x, c(x)) in D.

Note the difference between the definitions of consistent and satisfies:

 An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive
or negative example of the target concept.
 An example x is said to be consistent with hypothesis h iff h(x) = c(x).
Definition: version space. The version space, denoted VS_{H,D}, with respect to hypothesis space H and
training examples D, is the subset of hypotheses from H consistent with the training examples in D.
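
The definitions of consistency and version space can be illustrated with a tiny hypothesis space. In the sketch below, H is assumed to contain six threshold rules h_t(x) = 1 if x ≥ t (for t = 1, …, 6), and D is a made-up training set of pairs (x, c(x)); both are assumptions for the example.

```python
def make_threshold_hypothesis(t):
    """Return the hypothesis h_t with h_t(x) = 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

# Hypothetical hypothesis space H, indexed by the threshold t.
H = {t: make_threshold_hypothesis(t) for t in range(1, 7)}

# Hypothetical training examples (x, c(x)).
D = [(2, 0), (3, 0), (5, 1), (6, 1)]

def is_consistent(h, examples):
    """h is consistent with D iff h(x) = c(x) for every example in D."""
    return all(h(x) == c for x, c in examples)

# The version space: all hypotheses in H consistent with D.
version_space = [t for t, h in H.items() if is_consistent(h, D)]
print(version_space)  # [4, 5]

# "Satisfies" only asks whether h(x) = 1, regardless of the label c(x):
print(H[1](2) == 1)   # True: x = 2 satisfies h_1 although c(2) = 0
```

Thresholds 4 and 5 both separate the negative examples (2, 3) from the positive ones (5, 6), so the training data alone cannot decide between them.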

vi. Applications of Machine Learning
Application of machine learning methods to large databases is called data mining. In data mining, a
large volume of data is processed to construct a simple model with valuable use, for example, having
high predictive accuracy.

The following is a list of some of the typical applications of machine learning.

1. In retail business, machine learning is used to study consumer behavior.


2. In finance, banks analyze their past data to build models to use in credit applications, fraud
detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing
the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast
enough by computers. The World Wide Web is huge; it is constantly growing and searching
for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the
system designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer
correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programs for playing games such as
chess, backgammon and Go.

Examples of machine learning applications :

• Email spam detection

• Face detection and matching (e.g., iPhone X)

• Web search (e.g., DuckDuckGo, Bing, Google)

• Sports predictions

• Post office (e.g., sorting letters by zip codes)

• ATMs (e.g., reading checks)

• Credit card fraud

• Stock predictions

• Smart assistants (Apple Siri, Amazon Alexa, . . . )

• Product recommendations (e.g., Netflix, Amazon)

• Self-driving cars (e.g., Uber, Tesla)

• Language translation (Google translate)

• Sentiment analysis

vii. Components of Machine Learning Algorithms


 Representation. The first component is the “representation,” i.e., which hypotheses we can
represent given a certain algorithm class.
 Optimization. The second component is the optimization metric that we use to fit the model.
 Evaluation. The evaluation component is the step where we evaluate the performance of the model after
model fitting.

To extend this list slightly, these are the following 5 steps that we want to think about when
approaching a machine learning application:

1. Define the problem to be solved.


2. Collect (labeled) data.
3. Choose an algorithm class.
4. Choose an optimization metric for learning the model.
5. Choose a metric for evaluating the model.
viii. Glossary of ML
Machine learning borrows concepts from many other fields and redefines what has been
known in other fields under different names. Below is a small glossary of machine learning-specific
terms along with some key concepts to help navigate the machine learning literature.

• Training example: A row in the table representing the dataset. Synonymous to an observation,
training record, training instance, training sample (in some contexts, sample refers to a collection of
training examples).

• Training: model fitting, for parametric models similar to parameter estimation.

• Feature, x: a column in the table representing the dataset. Synonymous to predictor, variable, input,
attribute.

• Target, y: Synonymous to outcome, output, response variable, dependent variable, (class) label,
ground truth.

• Predicted output, ŷ: the output of the model; the hat distinguishes it from the target y.

• Loss function: Often used synonymously with cost function; sometimes also called error function.
In some contexts, the loss refers to the error on a single data point, whereas the cost function refers
to the overall (average or summed) loss over the entire dataset.

• Hypothesis: A hypothesis is a certain function that we believe (or hope) is similar to the true
function, the target function that we want to model. In the context of spam classification, it would be a
classification rule we came up with that allows us to separate spam from non-spam emails.

• Model: In the machine learning field, the terms hypothesis and model are often used
interchangeably. In other sciences, they can have different meanings: A hypothesis could be the
“educated guess” by the scientist, and the model would be the manifestation of this guess to test this
hypothesis.

• Learning algorithm: Again, our goal is to find or approximate the target function, and the learning
algorithm is a set of instructions that tries to model the target function using our training dataset. A
learning algorithm comes with a hypothesis space, the set of possible hypotheses it explores to model
the unknown target function by formulating the final hypothesis.

• Classifier: A classifier is a special case of a hypothesis (nowadays, often learned by a machine
learning algorithm). A classifier is a hypothesis or discrete-valued function that is used to assign
(categorical) class labels to particular data points. In an email classification example, this classifier
could be a hypothesis for labeling emails as spam or non-spam. Yet, a hypothesis is not necessarily
a classifier. In a different application, our hypothesis could be a function for
mapping study time and educational backgrounds of students to their future, continuous-valued, SAT
scores: a continuous target variable, suited for regression analysis.

• Hyperparameters: Hyperparameters are the tuning parameters of a machine learning algorithm,
for example, the regularization strength of an L2 penalty in the mean squared error cost function of
linear regression, or a value for setting the maximum depth of a decision tree. In contrast, model
parameters are the parameters that a learning algorithm fits to the training data, i.e., the parameters
of the model itself. For example, the weight coefficients (or slope) of a linear regression line and its
bias (or y-axis intercept) term are model parameters.
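
The distinction between hyperparameters and model parameters, and between loss and cost, can be sketched with one-dimensional linear regression with an L2 penalty. The sketch assumes the objective Σ (y − (w·x + b))² + λ·w², with the bias left unpenalized, which has the closed-form solution used below. The data, the λ values, and the function names are made up for illustration.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y = w*x + b by minimizing squared error plus lam * w**2.

    lam is a hyperparameter (chosen by us); w and b are model
    parameters (fitted to the training data).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b

def squared_loss(y, y_hat):
    """Loss: the error on a single data point."""
    return (y - y_hat) ** 2

def mse_cost(xs, ys, w, b):
    """Cost: the average loss over the entire dataset."""
    return sum(squared_loss(y, w * x + b) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

for lam in (0.0, 5.0):
    w, b = fit_ridge_1d(xs, ys, lam)
    print(lam, w, b, mse_cost(xs, ys, w, b))
```

With λ = 0 the fit recovers w = 2 and b = 1 with zero cost; with λ = 5 the slope shrinks to w = 1 (and b = 3.5) and the training cost rises to 1.25. Changing the hyperparameter changes which model parameters the learning algorithm ends up with.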

Review Questions
1. Install the required software, select one algorithm from each type of machine
learning, and demonstrate it on the given data (from the lab assistant).
2. How can you minimize the errors of an ML model?
3. What is the main difference between supervised and unsupervised learning?
4. Machine learning may not be applicable in every problem domain. Why?
5. Different learning models have different learning strategies. Describe which model
helps to address robot-related problems.
6. Why are Python and Anaconda commonly used to develop machine learning models?
7. Explain the differences among AI, ML and DL.
