Chapter 1 Introduction to Neural Networks and Machine Learning
• The basic unit of computation in a neural network is the neuron, often called a node or unit.
It receives input from other nodes, or from an external source, and computes an
output.
• Each input has an associated weight (w), which is assigned on the basis of its
relative importance to other inputs. The node applies a function to the weighted
sum of its inputs.
• The idea is that the synaptic strengths (the weights w) are learnable and control the
strength and direction of one neuron's influence on another: excitatory (positive weight)
or inhibitory (negative weight).
• In the basic model, the dendrites carry the signal to the cell body where they all get
summed.
• If the final sum is above a certain threshold, the neuron can fire, sending
a spike along its axon.
• In the computational model, we assume that the precise timings of the spikes do not
matter, and that only the frequency of firing communicates information. We
model the firing rate of the neuron with an activation function (e.g., the sigmoid
function), which represents the frequency of the spikes along the axon.
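The weighted-sum-then-activation computation described above can be sketched in a few lines. This is a minimal illustration, not a full implementation; the input values, weights, and bias below are made up:

```python
import math

def neuron_output(inputs, weights, bias):
    """A single artificial neuron: sigmoid of the weighted input sum plus bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid "firing rate"

# Two inputs: one excitatory (positive) and one inhibitory (negative) weight
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.1))  # ≈ 0.525
```

The weighted sum here is 0.5·1.0 − 0.25·2.0 + 0.1 = 0.1, and the sigmoid maps it into the (0, 1) range.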
Hidden layer: the hidden layer is an intermediate layer that allows neural networks to model
nonlinear phenomena. It is said to be “hidden” because it has no direct contact with the
outside world. The outputs of each hidden layer are the inputs of the units of the following
layer.
Output layer: the output layer is the last layer of the network; it produces the result, the
prediction.
Connections and weights: The network consists of connections, each connection
transferring the output of a neuron i to the input of a neuron j. In this sense i is the
predecessor of j and j is the successor of i. Each connection is assigned a weight Wij.
Activation function: In artificial neural networks this function is also called the transfer
function. Examples: Sigmoid, Tanh, ReLU, Leaky ReLU.
Learning rule: The learning rule is a rule or an algorithm which modifies the parameters
of the neural network, in order for a given input to the network to produce a favored output.
This learning process typically amounts to modifying the weights and thresholds.
Bias: An additional parameter added to the weighted sum of the inputs before the activation
function produces an output.
Activation Function:
Once the weighted sum is obtained, an activation function is applied to map inputs to outputs.
The activation function, also known as the transfer function, is an essential component
of the neural network. In addition to introducing non-linearity into the network, it
converts the signal entering a unit (neuron) into an output signal (response).
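The four activation functions named above can be written directly from their standard definitions; a quick sketch:

```python
import math

def sigmoid(z):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes any real input into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Passes positive inputs through; zeroes out negative inputs."""
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keeps a small slope alpha for negative inputs."""
    return z if z > 0 else alpha * z

for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(-2.0), f(2.0))
```

Note how sigmoid and tanh saturate for large |z|, while ReLU and Leaky ReLU grow linearly for positive z; this difference matters for gradient flow during training.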
It is said that the term machine learning was first coined by Arthur Lee Samuel, a pioneer in the AI
field, in 1959. Some definitions:
Machine learning is the field of study that gives computers the ability to learn without being
explicitly programmed. Machine learning is programming computers to optimize a
performance criterion using example data or past experience. We have a model defined up to
some parameters, and learning is the execution of a computer program to optimize the
parameters of the model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain knowledge from data, or
both.
A computer program which learns from experience is called a machine learning program or
simply a learning program. Such a program is sometimes also referred to as a learner.
The goal of machine learning is "to devise learning algorithms that do the learning
automatically without human intervention or assistance."
A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E. — Tom Mitchell, Machine Learning Professor at Carnegie Mellon University.
Examples:
o Handwriting recognition learning problem
Task T: Recognizing and classifying handwritten words within images
Performance measure P: Percentage of words correctly classified
Training experience E: A dataset of handwritten words with given classifications
Machine learning models can be broadly grouped into three families:
1. Logical models
2. Geometric models
3. Probabilistic models
Logical models use a logical expression to divide the instance space into segments and hence
construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a
True or False outcome. Once the data is grouped using a logical expression, the data is divided into
homogeneous groupings for the problem we are trying to solve. For example, for a classification
problem, all the instances in the group belong to one class. There are mainly two kinds of logical
models: Tree models and Rule models. Rule models consist of a collection of implications or IF-
THEN rules. For tree-based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the
behavior of the model for this segment.
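A rule model of the kind described above can be sketched as a handful of IF-THEN rules. The rules, thresholds, and feature names below are invented for illustration, not taken from any real spam filter:

```python
def classify_email(subject, num_links):
    """A toy rule model: IF-THEN rules divide the instance space into segments."""
    # Segment 1: suspicious subject combined with several links
    if "free" in subject.lower() and num_links > 3:
        return "spam"
    # Segment 2: excessive number of links regardless of subject
    if num_links > 10:
        return "spam"
    # Default segment: everything else
    return "not spam"

print(classify_email("FREE prize inside", 5))   # spam
print(classify_email("Meeting agenda", 1))      # not spam
```

Each rule's if-part defines a segment of the instance space, and its then-part fixes the model's output for that segment, so every instance inside a segment receives the same class label.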
o Manhattan Distance: Manhattan Distance is the sum of absolute differences
between points across all the dimensions:
d(p, q) = |p1 − q1| + |p2 − q2| + … + |pn − qn|
where n is the number of dimensions and pi, qi are the coordinates of the data points p and q.
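The Manhattan distance, together with the familiar Euclidean distance for comparison, can be computed directly from the definition:

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences across all dimensions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

print(manhattan([1, 2], [4, 6]))  # |1-4| + |2-6| = 7
print(euclidean([1, 2], [4, 6]))  # sqrt(9 + 16) = 5.0
```

For the same pair of points the Manhattan distance is always at least as large as the Euclidean distance, since it measures travel along axis-aligned grid lines rather than along the straight line.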
For any learning system, we must know the three elements — T (Task), P (Performance
Measure), and E (Training Experience).
Supervised learning
A training set of examples with the correct responses (targets) is provided and, based on this training
set, the algorithm generalizes to respond correctly to all possible inputs. This is also called learning
from exemplars. Supervised learning is the machine learning task of learning a function that maps an
input to an output based on example input-output pairs.
In supervised learning, each example in the training set is a pair consisting of an input object (typically
a vector) and an output value. A supervised learning algorithm analyzes the training data and produces
a function, which can be used for mapping new examples. In the optimal case, the function will correctly
determine the class labels for unseen instances. Both classification and regression problems are
supervised learning problems. A wide range of supervised learning algorithms are available, each with
its strengths and weaknesses. There is no single learning algorithm that works best on all supervised
learning problems.
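A minimal concrete instance of supervised learning is a 1-nearest-neighbor classifier: it memorizes the labeled training pairs and maps a new input to the label of the closest training example. The heights and class labels below are made up for illustration:

```python
def nearest_neighbor_predict(train, x_new):
    """Predict a label for x_new from labeled (input, target) pairs (1-NN)."""
    closest = min(train, key=lambda pair: abs(pair[0] - x_new))
    return closest[1]

# Hypothetical training set of (height_cm, class) example input-output pairs
train = [(150, "short"), (160, "short"), (175, "tall"), (185, "tall")]

# The learned function generalizes to an unseen input
print(nearest_neighbor_predict(train, 178))  # tall
```

This illustrates the general pattern: the algorithm analyzes labeled training data and produces a function that can assign labels to new, unseen instances.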
Unsupervised learning
Correct responses are not provided, but instead the algorithm tries to identify similarities between the
inputs so that inputs that have something in common are categorized together. The statistical approach
to unsupervised learning is known as density estimation. Unsupervised learning is a type of machine
learning algorithm used to draw inferences from datasets consisting of input data without labeled
responses. In unsupervised learning algorithms, a classification or categorization is not included in the
observations. There are no output values and so there is no estimation of functions. Since the examples
given to the learner are unlabeled, the accuracy of the structure that is output by the algorithm cannot
be evaluated. The most common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.
Example: Consider the following data regarding patients entering a clinic. The data consists of the
gender and age of the patients.
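Since no labels are given, cluster analysis is the natural tool for data like this. The sketch below uses invented patient ages (the actual table is not reproduced here) and a minimal 1-D k-means loop to find groupings without any labeled responses:

```python
def kmeans_1d(points, k, iters=20):
    """Minimal 1-D k-means: group unlabeled points into k clusters by similarity."""
    s = sorted(points)
    # Spread the initial centers across the sorted data
    centers = [s[i * len(s) // k] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Hypothetical patient ages; two age groups emerge without any labels
ages = [22, 25, 27, 61, 64, 66]
print(kmeans_1d(ages, k=2))  # two cluster centers, one per age group
```

The algorithm never sees a "young"/"old" label; the grouping is inferred purely from similarity between the inputs, which is exactly the unsupervised setting described above.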
x: A scalar denoting a single training example with 1 feature (e.g., the height of a person)
X: Design matrix, X ∈ ℝ^(n×m), which stores n training examples, where m is the number of features.
Note that, to distinguish the feature index from the training example index, we use a
square-bracket superscript to refer to the i-th training example, x^[i], and a regular subscript
to refer to the j-th feature, x_j:
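In code, the design matrix is simply an n×m array whose rows are training examples and whose columns are features. The feature values below are made up; the indexing mirrors the notation just described:

```python
# Hypothetical dataset: n = 3 training examples, m = 2 features (height, weight)
X = [
    [170.0, 65.0],  # training example 1
    [160.0, 55.0],  # training example 2
    [180.0, 80.0],  # training example 3
]

i, j = 2, 1  # 1-based indices, matching the mathematical notation
x_i = X[i - 1]           # the i-th training example x^[i] (a row of X)
x_i_j = X[i - 1][j - 1]  # its j-th feature x^[i]_j (one entry of that row)

print(x_i)    # [160.0, 55.0]
print(x_i_j)  # 160.0
```

The off-by-one shift is only because Python lists are 0-indexed while the mathematical convention here starts at 1.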
Definition: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for
each example (x, c(x)) in D.
• Sports predictions
• Stock predictions
• Sentiment analysis
To extend this list slightly, the following five steps are worth thinking about when
approaching a machine learning application:
• Training example: A row in the table representing the dataset. Synonymous to an observation,
training record, training instance, training sample (in some contexts, sample refers to a collection of
training examples).
• Feature, x: a column in the table representing the dataset. Synonymous to predictor, variable, input,
attribute.
• Target, y: Synonymous to outcome, output, response variable, dependent variable, (class) label,
ground truth.
• Predicted output, ŷ: used to distinguish the model's output from the targets y.
• Loss function: Often used synonymously with cost function; sometimes also called error function.
In some contexts, the loss function refers to the loss for a single data point, whereas the cost
function refers to the overall (average or summed) loss over the entire dataset.
• Hypothesis: A hypothesis is a certain function that we believe (or hope) is similar to the true
function, the target function that we want to model. In context of spam classification, it would be a
classification rule we came up with that allows us to separate spam from non-spam emails.
• Model: In the machine learning field, the terms hypothesis and model are often used
interchangeably. In other sciences, they can have different meanings: A hypothesis could be the
“educated guess” by the scientist, and the model would be the manifestation of this guess to test this
hypothesis.
• Hyperparameters: Hyperparameters are the tuning parameters of a machine learning algorithm –
for example, the regularization strength of an L2 penalty in the mean squared error cost function of
linear regression, or a value for setting the maximum depth of a decision tree. In contrast, model
parameters are the parameters that a learning algorithm fits to the training data – the parameters of the
model itself. For example, the weight coefficients (or slope) of a linear regression line and its bias (or
y-axis intercept) term are model parameters.
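The hyperparameter/model-parameter split can be made concrete with a one-dimensional ridge regression. This is a deliberately simplified sketch (no intercept term, invented data): the weight w is a model parameter fitted from the data, while the L2 strength lam is a hyperparameter chosen before training.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y ≈ w*x with an L2 penalty of strength lam (closed-form solution).

    w is a model parameter (learned from the training data);
    lam is a hyperparameter (set by the practitioner, not learned).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(fit_ridge_1d(xs, ys, lam=0.0))  # 2.0: no penalty recovers the exact slope
print(fit_ridge_1d(xs, ys, lam=1.0))  # smaller slope: the penalty shrinks w
```

Changing lam changes which w the algorithm learns, which is why hyperparameters are typically tuned on held-out validation data rather than fitted on the training set.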
Review Questions
1. Install required software types, select one algorithm type from each machine learning
types, and demonstrate it based on the given data (from the lab assistant).
2. How can you minimize the errors of an ML model?
3. What is the main difference between supervised and unsupervised learning?
4. Machine learning may not be applicable in every problem domain. Why?
5. Different learning models have different learning strategies. Describe which model
helps address robot-related problems.
6. Why are Python and Anaconda commonly used to develop machine learning models?
7. Explain the differences among AI, ML and DL.