
GOMBE STATE UNIVERSITY


DEPARTMENT OF COMPUTER SCIENCE
COSC 210 (2 CU) Introduction to Machine Learning 2023/2024 Session
Module I
INTRODUCTION TO MACHINE LEARNING
In this chapter, we consider different definitions of the term “Machine Learning” and explain what
is meant by “Learning” in the context of Machine Learning. We also discuss the various
components of the Machine Learning process. There are also brief discussions of different
types of learning, such as supervised learning, unsupervised learning and reinforcement learning.
1.1 Introduction
1.1.1 Definition of Machine Learning
Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence, coined the term “Machine Learning” in 1959 while at IBM. He defined Machine
Learning as “the field of study that gives computers the ability to learn without being explicitly
programmed.” However, there is no universally accepted definition for Machine Learning.
Different authors define the term differently. We give below three more definitions.
1. Machine Learning is programming computers to optimize a performance criterion using
example data or past experience. We have a model defined up to some parameters, and
learning is the execution of a computer program to optimize the parameters of the model
using the training data or past experience. The model may be predictive to make
predictions in the future, or descriptive to gain knowledge from data, or both.
2. The field of study known as Machine Learning is concerned with the question of how to
construct computer programs that automatically improve with experience.
3. Machine Learning can be broadly defined as computational methods using experience to
improve performance or to make accurate predictions. Here, experience refers to the past
information available to the learner, which typically takes the form of electronic data
collected and made available for analysis. This data could be in the form of digitized
human-labeled training sets, or other types of information obtained via interaction with
the environment. In all cases, its quality and size are crucial to the success of the
predictions made by the learner.
Remarks
In the above definitions we have used the term “model” and we will be using this term in several
contexts later. It appears that there is no universally accepted one-sentence definition of this
term. Loosely, it may be understood as some mathematical expression or equation, or some
mathematical structures such as graphs and trees, or a division of sets into disjoint subsets, or a
set of logical “if . . . then . . . else . . .” rules, or some such thing. It may be noted that this is not
an exhaustive list.
1.1.2 Definition of Learning
Definition
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience
E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given
classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience: A sequence of images and steering commands recorded
while observing a human driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
Machine Learning program
A computer program which learns from experience is called a Machine Learning program or
simply a learning program. Such a program is sometimes also referred to as a learner.
1.2 Machine Learning Process
1.2.1 Basic components of learning process
The learning process, whether by a human or a machine, can be divided into four components,
namely, data storage, abstraction, generalization and evaluation. Figure 1.1 illustrates the various
components and the steps involved in the learning process.

Figure 1.1: Components of Learning Process


1. Data Storage
Facilities for storing and retrieving huge amounts of data are an important
component of the learning process. Humans and computers alike utilize data storage as a
foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved using
electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar
devices to store data and use cables and other technology to retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves
creating general concepts about the data as a whole. The creation of knowledge involves
application of known models and creation of new models. The process of fitting a model
to a dataset is known as training. When the model has been trained, the data is
transformed into an abstract form that summarizes the original information.
3. Generalization
The third component of the learning process is known as generalisation. The term
generalization describes the process of turning the knowledge about stored data into a
form that can be utilized for future action. These actions are to be carried out on tasks
that are similar, but not identical, to those that have been seen before. In generalization,
the goal is to discover those properties of the data that will be most relevant to future
tasks.
4. Evaluation
Evaluation is the last component of the learning process. It is the process of giving
feedback to the user to measure the utility of the learned knowledge. This feedback is
then utilised to effect improvements in the whole learning process.
1.3 Applications of Machine Learning
Application of Machine Learning methods to large databases is called data mining. In data mining,
a large volume of data is processed to construct a simple model with valuable use, for example,
having high predictive accuracy. The following is a list of some of the typical applications of
Machine Learning.
1. In retail business, Machine Learning is used to study consumer behaviour.
2. In finance, banks analyze their past data to build models to use in credit applications, fraud
detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing the
quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast
enough by computers. The World Wide Web is huge; it is constantly growing, and searching for
relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the
system designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
9. Machine Learning methods are applied in the design of computer-controlled vehicles to steer
correctly when driving on a variety of roads.

10. Machine Learning methods have been used to develop programmes for playing games such
as chess, backgammon and Go.

11. Machine Learning is used in text or document classification, e.g., spam detection

12. It is also used in Natural Language Processing, e.g., morphological analysis, part-of-speech
tagging, statistical parsing, named-entity recognition

The list goes on, but here are more applications of Machine Learning: speech recognition, speech
synthesis and speaker verification; optical character recognition (OCR); computational biology
applications, e.g., protein function or structured prediction; computer vision tasks, e.g., image
recognition and face detection; fraud detection (credit card, telephone) and network intrusion
detection; unassisted vehicle control (robots, navigation); recommendation systems, search engines,
information extraction systems, etc.
1.4 Understanding Data
Since an important component of the Machine Learning process is data storage, we briefly
consider in this section the different types and forms of data that are encountered in the Machine
Learning process.
1.4.1 Unit of observation
Unit of observation is the smallest entity with measured properties of interest for a study.

Examples
• A person, an object or a thing
• A time point
• A geographic region
• A measurement
Sometimes, units of observation are combined to form units such as person-years.
1.4.2 Examples, Features and Labels
Datasets that store the units of observation and their properties can be imagined as collections
of data consisting of Examples and Features.
Examples
An “example” is an instance of the unit of observation for which properties have been recorded.
An “example” is also referred to as an “instance”, or “case” or “record.” (It may be noted that the
word “example” has been used here in a technical sense.) It typically represents a single
observation or unit of data used for training or testing a model.
Features
A “feature” is the set of attributes, often represented as a vector, associated with an example. It is
a recorded property or a characteristic of examples. It is also referred to as an “attribute” or
“variable”.
Label: Values or categories assigned to examples. In classification problems, examples are
assigned specific categories, for instance, the spam and non-spam categories in a binary
classification problem. In regression, items are assigned real-valued labels. Label is the output or
target variable that the model is trying to predict or classify. Labels are used in supervised
learning.
Examples for “examples”, “features” and “Label”
Case1: Cancer detection
Consider the problem of developing a model for detecting cancer. In this study we note the
following.
(a) The units of observation are the patients.
(b) The examples are members of a sample of cancer patients.
(c) The features can be: Gender, Age, Blood pressure, the findings of the pathology report after a
biopsy, etc.
(d) Label: Cancer status.


Case 2. House prices prediction
(a) The units of observation are the houses.
(b) The examples are a sample of houses within a particular region.
(c) The features might include: square footage, number of bedrooms, location, building age, etc.
(d) Label: prices of the houses.
Case 3. Pet selection: Suppose we want to predict the type of pet a person will choose.
(a) The units are the persons.
(b) The examples are members of a sample of persons who own pets.
(c) The features might include: age, home region, family income, etc. of the persons who own pets.
(d) Label: Type of pet.
Case 4. Class of Degree Prediction: Suppose we want to predict the class of degree a student is
likely to graduate with in Gombe State University.
(a) The units of observations are students.
(b) The examples are a sample of students who have graduated.
(c) The features might include: age, family background, type of sponsorship, O’level grades, UTME
Score, etc.
(d) Label: Class of degree.
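
To make these notions concrete, the short Python sketch below shows one simple way of holding examples, features and labels in a program. The feature names and values are illustrative assumptions loosely based on Case 4 above, not an actual data set.

# A minimal sketch, using hypothetical student records inspired by Case 4.
# Each example (row) is a list of feature values; labels hold the target class.

feature_names = ["age", "olevel_grade_points", "utme_score"]   # assumed features

examples = [            # one row per example (student)
    [18, 24, 260],
    [22, 18, 210],
    [20, 30, 290],
]

labels = ["Second Class Upper", "Third Class", "First Class"]   # assumed class labels

# A supervised learner would be trained on (examples, labels) pairs.
for x, y in zip(examples, labels):
    print(x, "->", y)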
1.4.3 Dataset
A data set is a collection of related information or records. The information may be on some entity
or some subject area. For example (Fig. 1.2), we may have a data set on students in which each
record consists of information about a specific student. Similarly, we can have a data set on student
performance in which each record provides the marks obtained in the individual subjects.

Figure 1.2: Examples of Data set


1.4.4. Training set: In Machine Learning, data is split into training data and test data. A training set
consists of examples used to train a learning algorithm. In our spam problem, the training sample consists
of a set of email examples along with their associated labels. The training sample varies for
different learning scenarios.
1.4.5 Validation sample: Examples used to tune the parameters of a learning algorithm when
working with labeled data. Learning algorithms typically have one or more free parameters, and
the validation sample is used to select appropriate values for these model parameters.
1.4.6 Test sample: Examples used to evaluate the performance of a learning algorithm. The test
sample is separate from the training and validation data and is not made available in the learning
stage. In the spam problem, the test sample consists of a collection of email examples for which
the learning algorithm must predict labels based on features. These predictions are then
compared with the labels of the test sample to measure the performance of the algorithm.
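
As a hedged illustration of how the training, validation and test samples are typically created in practice, the sketch below uses the train_test_split function from scikit-learn (assuming the library is installed); the feature matrix X and labels y are randomly generated purely for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

# Assumed toy data: 100 examples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# First hold out 20% as the test sample, then carve a validation sample
# out of the remaining data for tuning the free parameters of the algorithm.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20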

1.5 Different forms of data


In the realm of Machine Learning, understanding data types, also known as measurement scales,
is crucial for effective data analysis. This understanding guides the selection of the appropriate
visualization and Machine Learning methods.

Data can broadly be divided into the following two types:
1. Qualitative data
2. Quantitative data
1.5.1 Qualitative Data
Qualitative data, also called categorical data, provides information about the quality of an object
or information which cannot be measured. For example, if we consider the quality of
performance of students in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the category of
qualitative data. Also, the name or roll number of students is information that cannot be measured
using a scale of measurement, so it would also fall under qualitative data.
Qualitative data can be further subdivided into two types as follows:
1. Nominal data
2. Ordinal data
1.5.1.1 Nominal Data
Nominal data is one which has no numeric value, but a named value. It is used for assigning
named values to attributes. Nominal values cannot be quantified. Examples of nominal data are:
1. Blood group: A, B, O, AB, etc.
2. Nationality: Indian, American, British, etc.
3. Gender: Male, Female.
4. Colour: Red, Green, Blue, etc.
1.5.1.2 Ordinal Data
Ordinal data, in addition to possessing the properties of nominal data, can also be naturally
ordered. This means ordinal data also assigns named values to attributes but unlike nominal data,
they can be arranged in a sequence of increasing or decreasing value so that we can say whether
a value is better than or greater than another value. Examples of ordinal data are:
1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
2. Grades: A, B, C, etc.
3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.
1.5.2 Quantitative Data
Quantitative data, also referred to as numeric data, relates to information about the quantity of an
object – hence it can be measured. For example, if we consider the attribute ‘score’, it can be
measured using a scale of measurement. There
are two types of quantitative data:
1. Interval data
2. Ratio data
1.5.2.1 Interval Data
Interval data is numeric data for which not only the order is known, but the exact difference
between values is also known. An ideal example of interval data is Celsius temperature, where the
difference between successive values remains the same. For example, the difference between 12°C
and 18°C is measurable and is 6°C, just as is the difference between 15.5°C and 21.5°C. Other
examples include date, time, etc.
Interval data do not have a ‘true zero’ value. For example, there is nothing called
‘0 temperature’ or ‘no temperature’. Hence, only addition and subtraction apply to interval
data; ratios cannot be taken. This means we can say a temperature of 40°C is equal to a
temperature of 20°C plus a temperature of 20°C, but we cannot say that a temperature of 40°C
is twice as hot as a temperature of 20°C.
1.5.2.2 Ratio Data
Ratio data represents numeric data for which exact values can be measured and for which an
absolute zero exists. These variables can be added, subtracted, multiplied, or divided. Central
tendency can be measured by mean, median, or mode, and dispersion by measures such as
standard deviation. Examples of ratio data include height, weight, age, salary, etc.
Figure 1.3 gives a summarized view of different types of data that we may find in a typical Machine
Learning problem.

Figure 1.3 Type of Data
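
As a small illustration of how these measurement scales appear when preparing data in Python, the sketch below uses the pandas library (assumed to be installed); the column names and values are invented purely for illustration.

import pandas as pd

# Assumed toy records mixing the four scales discussed above.
df = pd.DataFrame({
    "blood_group": ["A", "O", "AB"],                       # nominal
    "satisfaction": ["Unhappy", "Happy", "Very Happy"],    # ordinal
    "temperature_c": [12.0, 18.0, 21.5],                   # interval
    "salary": [45000, 80000, 60000],                       # ratio
})

# Nominal: unordered categories; ordinal: ordered categories.
df["blood_group"] = df["blood_group"].astype("category")
df["satisfaction"] = pd.Categorical(
    df["satisfaction"],
    categories=["Unhappy", "Happy", "Very Happy"],
    ordered=True,
)

print(df.dtypes)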

2.1 Types of Machine Learning


Machine Learning incorporates several hundred statistical-based algorithms and choosing the
right algorithm or combination of algorithms for the job is a constant challenge for anyone
working in this field. But before we examine specific algorithms, it is important to understand the
three overarching categories of Machine Learning. These three categories are:
1. Supervised learning – Also called predictive learning. A machine predicts the class of unknown
objects based on prior class-related information of similar objects.
2. Unsupervised learning – Also called descriptive learning. A machine finds patterns in unknown
objects by grouping similar objects together.
3. Reinforcement learning – A machine learns to act on its own to achieve the given goals.
These categories differ in the types of training data available to the learner, the order and method
by which training data is received and the test data used to evaluate the learning algorithm. Figure
2.1 shows the different categories of Machine Learning.

Figure 2.1: Types of Machine Learning



2.1.1. Supervised Machine Learning


Supervised Machine Learning involves the objective of understanding the mapping function
'f' that relates the input variable (X) to the output variable (Y), as represented in the equation
(2.1).
Y = f (X ) (2.1)
Where
Y : the output variable (Target)
X : the input variable (Set of features)
Supervised learning concentrates on learning patterns by connecting the relationship between
variables and known outcomes while working with labeled datasets. Supervised learning
works by feeding the machine sample data with various features (represented as “X”) and the
correct output value of the data (represented as “y”). The fact that the output and feature values
are known qualifies the dataset as “labeled.” The algorithm then deciphers patterns that exist in
the data and creates a model that can reproduce the same underlying rules with new data. The
algorithm learns from a training set and ceases learning once a satisfactory level of performance
is achieved.
Supervised Machine Learning can be categorized into:
i. Classification (where the output variable requires categorization)

ii. Regression (where the output is a real value).

Examples of Supervised Machine Learning algorithms include linear regression, random forest,
and Support Vector Machine (SVM).
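
The following is a minimal sketch of this fit-then-predict workflow using scikit-learn, a library that implements the algorithms named above; the labelled data here is randomly generated for illustration and is not drawn from this module.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed labeled training data: X holds feature vectors, y holds class labels.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Learn the mapping Y = f(X) from the labeled examples.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Predict labels for new, unseen examples (test data).
X_test = rng.normal(size=(5, 3))
print(model.predict(X_test))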

Figure 2.2 is a simple depiction of the supervised learning process. Labelled training data
containing past information comes as an input. Based on the training data, the machine builds a
predictive model that can be used on test data to assign a label for each example in the test data.

Figure 2.2: Supervised Learning



2.1.2 Unsupervised Machine Learning


Unsupervised Machine Learning revolves around the objective of understanding and exploring
unlabelled input data (X) without the aid of historical data. The objective is to take a dataset as
input and try to find natural groupings or patterns within the data elements or examples.
Therefore, unsupervised learning is often termed a descriptive model, and the process of
unsupervised learning is referred to as pattern discovery or knowledge discovery.

Unsupervised Machine Learning is categorized into:


i. Association (aimed at discovering rules to describe the data)

ii. Clustering (focused on identifying inherent groups within the data).

Examples of Unsupervised Machine Learning methods include Apriori (Association) and k-means
(clustering).
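
Below is a hedged sketch of the clustering branch using k-means in scikit-learn; the two-dimensional points are fabricated purely for illustration.

import numpy as np
from sklearn.cluster import KMeans

# Assumed unlabelled data: two loose blobs of 2-D points.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(4, 4), scale=0.5, size=(50, 2)),
])

# Ask k-means to discover k = 2 natural groups in the data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1)
cluster_ids = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)   # approximate group centres
print(cluster_ids[:10])          # cluster assignment of the first 10 points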
Figure 2.3 depicts the unsupervised learning process.

Figure 2.3: Unsupervised Learning



2.1.3 Reinforcement Learning


Reinforcement learning is the third and most advanced algorithm category in Machine Learning.
It is a Machine Learning process that continuously improves its model by leveraging feedback
from previous iterations. Reinforcement Learning is centered around the goal of mapping actions
to situations in a way that maximizes the obtained rewards. This mapping process involves
considering not only the immediate rewards but also the rewards in subsequent steps.

Reinforcement learning can be complicated and is probably best explained through an analogy to
a video game. As a player progresses through the virtual space of a game, they learn the value of
various actions under different conditions and become more familiar with the field of play. Those
learned values then inform and influence a player’s subsequent behavior and their performance
immediately improves based on their learning and past experience. Reinforcement learning is
very similar, where algorithms are set to train the model through continuous learning. A standard
reinforcement learning model has measurable performance criteria where outputs are not
tagged—instead, they are graded. In the case of self-driving vehicles, avoiding a crash will allocate
a positive score and in the case of chess, avoiding defeat will likewise receive a positive score.

Examples of Reinforcement Learning methods include Monte Carlo methods, Markov decision
processes, Q-learning and Temporal Difference methods. The Reinforcement Learning process is
shown in Figure 2.4.

Figure 2.4: Reinforcement Learning
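
To make the reward-driven learning idea concrete, the sketch below implements the tabular Q-learning update mentioned above on a tiny made-up environment; the states, actions and rewards are assumptions for illustration only.

import numpy as np

# Assumed toy problem: 3 states in a row, move left/right, reward 1 for reaching state 2.
n_states, n_actions = 3, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # action-value table
alpha, gamma, epsilon = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(200):
    state = 0
    while state != n_states - 1:
        # epsilon-greedy action selection
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Q-learning update: blend immediate reward with estimated future reward.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)   # higher values for the "right" action in each state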



Differences Between Supervised, Unsupervised and Reinforcement Machine Learning

The differences between the three categories of Machine Learning are shown in Table 2.1.

Table 2.1: Differences between Supervised, Unsupervised and Reinforcement Learning



2.2 Probability and Statistics Review


In this section we will discuss the tools, equations, and models of probability that are useful in the
Machine Learning domain.
2.2.1 Importance of Statistical Tools in Machine Learning
In Machine Learning, we train the system by using a limited data set called ‘training data’ and
based on the confidence level of the training data we expect the Machine Learning algorithm to
depict the behaviour of the larger set of actual data. If we have observations on only a subset of events,
called a ‘sample’, then there will be some uncertainty in attributing the sample results to the whole
set or population. So, the question is how limited knowledge of a sample set can be used to
predict the behaviour of the real set with some confidence. It was realized by mathematicians that
even if some knowledge is based on a sample, if we know the amount of uncertainty related to
it, then it can be used in an optimal way without loss of knowledge. Probability theory
provides a mathematical foundation for quantifying this uncertainty of the knowledge. This is
depicted in figure 2.5.

Figure 2.5 Knowledge and Uncertainty


2.2.2 Probability Theory Review
The basic concept in Machine Learning is that we have a limited set of ‘training’ data
that we use as a representative of a large set of actual data, and through probability distributions
we try to find out how an event which matches the training data can represent the
outcome with some confidence.
Foundation rules
p(A) denotes the probability that the event A is true.
0 ≤ p(A) ≤ 1 denotes that the probability of this event happening lies between 0 and 1, where
p(A) = 0 means the event will definitely not happen, and
p(A) = 1 means the event will definitely happen.
p(A̅ ) denotes the probability of the event not A,
defined as p(A̅ ) = 1 − p(A).

It is also common practice to write A = 1 to mean the event A is true, and A = 0 to mean the event
A is false. So, this is a binary event where the event is either true or false but can’t be something
indefinite. The probability of selecting an event A from a sample of size X is defined as

p(A) = n / X

where n is the number of times the instance of event A is present in the sample of size X.
2.2.2.1 Probability of a Union of two Events
Two events A and B are called mutually exclusive if they can’t happen together. For any two
events, A and B, the probability of A or B is defined as:

p(A ∪ B) = p(A) + p(B) − p(A ∩ B)

which reduces to p(A ∪ B) = p(A) + p(B) if A and B are mutually exclusive.


2.2.2.2 Joint Probabilities (Product rule)
The probability of the joint event A and B is defined by the product rule:

p(A, B) = p(A|B) p(B)

where p(A|B) is defined as the conditional probability of event A happening given that event B happens.
The quantity p(A, B) is called the joint distribution of the two events.
2.2.2.3 Conditional Probability
We define the conditional probability of event A, given that event B is true, as follows:

p(A|B) = p(A, B) / p(B)

where p(A, B) is the joint probability of A and B and can also be denoted as p(A ∩ B).
Similarly,

p(B|A) = p(A, B) / p(A)

Example
In a toy-making shop, the automated machine produces a few defective pieces. It is observed that
in a lot of 1,000 toy parts, 25 are defective. If two random samples are selected for testing without
replacement (meaning that the first sample is not put back to the lot and thus the second sample
is selected from the lot size of 999) from the lot, calculate the probability that both the samples
are defective.
Solution:
Let A denote the event that the first part is defective and B the event that the second part is
defective. Here, we have to employ the conditional probability of the second part being found
defective when the first part is already found defective. By the product rule of probability,

p(A, B) = p(A) p(B|A)

As we are selecting the second sample without replacing the first sample into the lot and the first
one is already found defective, there are now 24 defective pieces out of the 999 pieces left in the lot.
Thus,

p(A, B) = (25/1000) × (24/999) ≈ 0.0006

which is the probability of both the parts being found defective.
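
As a quick sanity check of this result, the short Python sketch below recomputes the same product of probabilities; it simply mirrors the arithmetic above.

# Probability that both sampled parts are defective, sampling without replacement.
p_first_defective = 25 / 1000          # p(A)
p_second_given_first = 24 / 999        # p(B | A)

p_both = p_first_defective * p_second_given_first
print(round(p_both, 4))                # 0.0006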
3.0 CATEGORIES OF SUPERVISED MACHINE LEARNING.
As discussed earlier, Supervised Machine Learning is categorized into Classification and
Regression. We will now discuss these two categories.
3.1 Classification
Classification is a type of supervised learning where a target feature, which is of categorical type,
is predicted for test data on the basis of the information imparted by the training data. The
responsibility of the classification model is to assign class label to the target feature based on the
value of the predictor features.

A classification problem is one where the output variable is a category such as ‘red’ or ‘blue’ or
‘malignant tumour’ or ‘benign tumour’, etc. The target categorical feature is known as class.

A critical classification problem in the context of the banking domain is identifying potentially
fraudulent transactions. Because there are millions of transactions which have to be scrutinized
to identify whether a particular transaction might be fraudulent, it is not possible for any
human being to carry out this task. Machine Learning is leveraged efficiently to do this task, and
this is a classic case of classification. On the basis of the past transaction data, especially the ones
labelled as fraudulent, all new incoming transactions are marked or labelled as usual or
suspicious. The suspicious transactions are subsequently segregated for a closer review.

Figure 3.1 Classification Model


Some typical classification problems include the following:
• Image classification
• Disease prediction
• Win–loss prediction of games


• Prediction of natural calamity such as earthquake, flood, etc.
• Handwriting recognition
3.2 Classification Learning Steps
The classification learning steps are represented in Figure 3.2.

Figure 3.2 Classification Model Step


Problem Identification:
Identifying the problem is the first step in the supervised learning model. The problem needs to
be a well-formed problem, i.e. a problem with well-defined goals and benefits, which has a long-
term impact.
Identification of Required Data:
On the basis of the problem identified above, the required data set that precisely represents the
identified problem needs to be identified/evaluated. For example: If the problem is to predict
whether a tumour is malignant or benign, then the corresponding patient data sets related to
malignant tumour and benign tumours are to be identified.
Data Pre-processing:
This step relates to cleaning and transforming the data set. It ensures that all the
unnecessary/irrelevant data elements are removed. Data pre-processing refers to the
transformations applied to the identified data before feeding the same into the algorithm.
Because the data is gathered from different sources, it is usually collected in a raw format and is
not ready for immediate analysis. This step ensures that the data is ready to be fed into the
Machine Learning algorithm.
Definition of Training Data Set:
Before starting the analysis, the user should decide what kind of data set is to be used as a training
set. In the case of signature analysis, for example, the training data set might be a single
handwritten character, an entire handwritten word (i.e. a group of characters) or an entire line
of handwriting (i.e. sentences or a group of words). Thus, a set of ‘input meta-objects’ and
corresponding ‘output meta-objects’ are also gathered. The training set needs to be actively
representative of the real-world use of the given scenario. Thus, a set of data input (X) and
corresponding outputs (Y) is gathered either from human experts or experiments.
Algorithm Selection:
This involves determining the structure of the learning function and the corresponding learning
algorithm. This is the most critical step of the supervised learning model. On the basis of various
parameters, the best algorithm for a given problem is chosen.
Training:
The learning algorithm identified in the previous step is run on the gathered training set for
further fine tuning. Some supervised learning algorithms require the user to determine specific
control parameters (which are given as inputs to the algorithm). These parameters (inputs given
to algorithm) may also be adjusted by optimizing performance on a subset (called a validation
set) of the training set.
Evaluation with the Test Data Set:
The test data set is run on the trained model, and its performance is measured here. If a suitable result is
not obtained, further training or parameter tuning may be required.

3.3 Common Classification Algorithms


The following are some common classification algorithms; we will discuss a few of them in
detail.
1. k-Nearest Neighbour (kNN)
2. Logistic Regression
3. Decision tree
4. Random forest
5. Support Vector Machine (SVM)
6. Naïve Bayes classifier
A Sigmoid Function
A sigmoid function produces an S-shaped curve that maps any real number to a value between 0
and 1, but it does so without ever reaching those exact limits.

Figure 3.3: A Sigmoid Function used to Classify Data Points


3.3.1 Logistic Regression
Logistic regression adopts the sigmoid function to analyze data and predict discrete classes that exist
in a dataset. Although logistic regression shares a visual resemblance to linear regression, it is
technically a classification technique. Whereas linear regression addresses numerical equations
and forms numerical predictions to discern relationships between variables, logistic regression
predicts discrete classes. Logistic regression is typically used for binary classification to predict
two discrete classes, e.g. has cancer or not. To do this, the sigmoid function (shown as follows) is
added to compute the result and convert numerical results into an expression of probability
between 0 and 1.
f(x) = 1 / (1 + e^(−x))

where:
x = the numerical value you wish to transform
e = Euler's constant, approximately 2.718
In a binary case, a value of 0 represents no chance of occurring, and 1 represents a certain chance
of occurring. The degree of probability for values located between 0 and 1 can be calculated
according to how close they rest to 0 (impossible) or 1 (certain possibility) on the scatterplot.
Figure 3.4 shows an example of logistic regression.

Figure 3.4 An Example of Logistic Regression


Logistic regression with more than two outcome values is known as multinomial logistic
regression, which can be seen in Figure 3.5.

Figure 3.5 An example of Multinomial Logistic Regression
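
As a hedged sketch of binary classification with logistic regression, the code below uses scikit-learn on fabricated one-feature data; predict_proba applies the sigmoid to a linear score, giving probabilities between 0 and 1.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed data: one feature (e.g. a measurement), binary label 0/1.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)

# Probabilities between 0 and 1, then hard 0/1 class predictions.
new_points = np.array([[-2.0], [0.0], [2.0]])
print(clf.predict_proba(new_points)[:, 1])   # probability of class 1
print(clf.predict(new_points))               # predicted classes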


3.3.2 k -Nearest Neighbour (kNN)
The kNN algorithm is a simple but extremely powerful classification algorithm. The name of the
algorithm originates from the underlying philosophy of kNN – i.e. people having similar
background or mindset tend to stay close to each other. In other words, neighbours in a locality
have a similar background. In the same way, as a part of the kNN algorithm, the unknown and
unlabelled data which comes for a prediction problem is judged on the basis of the training data
set elements which are similar to the unknown element. So, the class label of the unknown
element is assigned on the basis of the class labels of the similar training data set elements
(metaphorically can be considered as neighbours of the unknown element).

Working of kNN
Let us try to understand the algorithm with a simple data set. Consider a very simple Student data
set as depicted in Figure 3.6. It consists of 15 students studying in a class. Each of the students
has been assigned a score on a scale of 10 on two performance parameters – ‘Aptitude’ and
‘Communication’. Also, a class value is assigned to each student based on the following criteria:
1. Students having good communication skills as well as a good level of aptitude have been
classified as ‘Leader’.
2. Students having good communication skills but not so good level of aptitude have been
classified as ‘Speaker’ .
3. Students having not so good communication skill but a good level of aptitude have been
classified as ‘Intel’.

Figure 3.6: Student Data set


While building a classification model, a part of the labelled input data is retained as test data. The
remaining portion of the input data is used to train the model – hence known as training data.
The motivation to retain a part of the data as test data is to evaluate the performance of the
model.
In the context of the Student data set, to keep things simple, we assume one data element of the
input data set as the test data. As depicted in Figure 3.7, the record of the student named Josh is
assumed to be the test data. Now that we have the training data and test data identified, we can
start with the modelling.

Figure 3.7 : Segregated Student Data set


So, as depicted in Figure 3.8, the training data points of the Student data set considering only the
features ‘Aptitude’ and ‘Communication’ can be represented as dots in a two-dimensional feature
space.

Figure 3.8: 2-D Representation of Student Data Set


As shown in the figure, training data points having the same class value lie close to
each other. The reason for considering a two-dimensional data space is that we are considering just
the two features of the Student data set, i.e. ‘Aptitude’ and ‘Communication’, for doing the
classification. The feature ‘Name’ is ignored because, as we can understand, it has no role to play
in deciding the class value. The test data point for student Josh is represented as an asterisk in
the same space. To find the closest or nearest neighbours of the test data point, the Euclidean
distance of each dot from the asterisk needs to be calculated. Then, the class values of the
closest neighbours help in assigning the class value of the test data element.
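
For reference, the Euclidean distance between two points (x1, y1) and (x2, y2) in the Aptitude–Communication plane is sqrt((x1 − x2)² + (y1 − y2)²). The short sketch below computes it in Python; the example scores are hypothetical and are not the actual values from the Student data set.

import math

def euclidean_distance(p, q):
    # Straight-line distance between two feature vectors of equal length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical (Aptitude, Communication) scores, not taken from Figure 3.6.
test_point = (6.0, 5.0)
training_point = (7.0, 4.5)
print(euclidean_distance(test_point, training_point))   # 1.118...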
Value of k (Number of Neighbours)
Now, let us try to answer the question of how many similar elements should
be considered. The answer lies in the value of ‘k’ which is a user-defined parameter given as an
input to the algorithm. In the kNN algorithm, the value of ‘k’ indicates the number of neighbours
that need to be considered. For example, if the value of k is 3, only three nearest neighbours or
three training data elements closest to the test data element are considered. Out of the three
data elements, the class which is predominant is considered as the class label to be assigned to
the test data. In case the value of k is 1, only the closest training data element is considered. The
class label of that data element is directly assigned to the test data element. This is depicted in
Figure 3.9.

Figure 3.9: Distance calculation between Test and Training Points


Let us now try to find out the outcome of the algorithm for the Student data set we have. In other
words, we want to see what class value kNN will assign for the test data for student Josh. Again,
let us refer back to Figure 3.9. As is evident, when the value of k is taken as 1, only one training
data point needs to be considered. The training record for student Gouri comes as the closest one
to test record of Josh, with a distance value of 1.118. Gouri has class value ‘Intel’. So, the test data
point is also assigned a class label value ‘Intel’. When the value of k is assumed as 3, the closest
neighbours of Josh in the training data set are Gouri, Susant, and Bobby with distances being
1.118, 1.414, and 1.5, respectively. Gouri and Bobby have class value ‘Intel’, while Susant has class
value ‘Leader’. In this case, the class value of Josh is decided by majority voting. Because the class
value of ‘Intel’ is formed by the majority of the neighbours, the class value of Josh is assigned as
‘Intel’. This same process can be extended for any value of k.
Choosing the Value of K
It is often tricky to decide the value of k. The reasons are as follows:
• If the value of k is very large (in the extreme case equal to the total number of records in
the training data), the class label of the majority class of the training data set will be
assigned to the test data regardless of the class labels of the neighbours nearest to the
test data.
• If the value of k is very small (in the extreme case equal to 1), the class value of a noisy
data or outlier in the training data set which is the nearest neighbour to the test data will
be assigned to the test data.
The best k value is somewhere between these two extremes.
A few strategies, highlighted below, are adopted by Machine Learning practitioners to arrive at a
value for k.
• One common practice is to set k equal to the square root of the number of training
records.
• An alternative approach is to test several k values on a variety of test data sets and choose
the one that delivers the best performance.
• Another interesting approach is to choose a larger value of k, but apply a weighted voting
process in which the vote of close neighbours is considered more influential than the vote
of distant neighbours.
kNN Algorithm
Input: Training data set, test data set (or data points), value of ‘k’ (i.e. number of nearest
neighbours to be considered)
Steps:
Do for all test data points
    Calculate the distance (usually Euclidean distance) of the test data point from the different
    training data points.
    Find the closest ‘k’ training data points, i.e. the training data points whose distances from the
    test data point are the least.
    If k = 1
        Then assign the class label of the closest training data point to the test data point
    Else
        Assign the class label that is predominant among the ‘k’ closest training data points to the
        test data point
End do
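
The following is a minimal Python sketch of the algorithm above; the tiny training set of (Aptitude, Communication) scores and class labels is invented for illustration and is not the Student data set of Figure 3.6.

import math
from collections import Counter

def knn_predict(training_points, training_labels, test_point, k):
    # Distance of the test point from every training point (Euclidean distance).
    distances = [
        (math.dist(test_point, p), label)
        for p, label in zip(training_points, training_labels)
    ]
    # Keep the k closest training points and take a majority vote on their labels.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (Aptitude, Communication) -> class.
points = [(8, 8), (7, 9), (3, 8), (2, 9), (8, 3), (9, 2)]
labels = ["Leader", "Leader", "Speaker", "Speaker", "Intel", "Intel"]

print(knn_predict(points, labels, test_point=(7, 3), k=3))   # likely 'Intel'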
4.0 REGRESSION
In machine learning, a regression problem is the problem of predicting the value of a numeric
variable based on observed values of the variable. The value of the output variable may be a
number, such as an integer or a floating point value. These are often quantities, such as amounts
and sizes. The input variables may be discrete or real-valued. Regression analysis is used to
determine the strength of a relationship between variables. Regression is essentially finding a
relationship (or) association between the dependent variable (Y) and the independent variable(s)
(X), i.e. to find the function ‘f ’ for the association Y = f (X).
Regression is used for the development of models which are used for prediction of the numerical
value of the target feature of a data instance.
Consider the data on car prices given in Table 4.1.
Table 4.1: Example of Data for Regression

Suppose we are required to estimate the price of a car aged 25 years with distance 53240 KM and
weight 1200 pounds. This is an example of a regression problem because we have to predict the
value of the numeric variable “Price”.
The most common regression algorithms are:
• Simple linear regression
• Multiple linear regression
• Polynomial regression
• Kernel ridge regression (KRR)
• Support vector regression (SVR)
• Lasso
• Maximum likelihood estimation (least squares), etc.
4.1 Linear Regression.
Linear regression comprises a straight line that splits the data points on a scatterplot. The goal of
linear regression is to split the data in a way that minimizes the distance between the regression
line and all data points on the scatterplot. This means that if you were to draw a vertical line from
the regression line to each data point on the graph, the aggregate distance of each point would
equate to the smallest possible distance to the regression line.

Figure 4.1 Linear Regression Line


The regression line is plotted on the scatterplot in Figure 4.1. The technical term for the regression
line is the hyperplane, and you will see this term used throughout your study of Machine
Learning. A hyperplane is practically a trendline. Another important feature of regression is slope,
which can be conveniently calculated by referencing the hyperplane. As one variable increases,
the other variable will increase at the average value denoted by the hyperplane. The slope is
therefore very useful in formulating predictions. For example, if you wish to estimate the value of
Bitcoin at 800 days, you can enter 800 as your x coordinate and reference the slope by finding the
corresponding y value represented on the hyperplane. In this case, the y value is USD $1,850.

Figure 4.2: The Value of Bitcoin at day 800


As shown in Figure 4.2, the hyperplane reveals that you actually stand to lose money on your
investment at day 800 (after buying on day 736)! Based on the slope of the hyperplane, Bitcoin is
expected to depreciate in value between day 736 and day 800—despite no precedent in your
dataset for Bitcoin ever dropping in value. While it’s needless to say that linear regression isn’t a
fail-proof method for picking investment trends, the trendline does offer a basic reference point
to predict the future. If we were to use the trendline as a reference point earlier in time, say at
day 240, then the prediction posted would have been more accurate. At day 240 there is a low
degree of deviation from the hyperplane, while at day 736 there is a high degree of deviation.
Deviation refers to the distance between the hyperplane and the data point.

Figure 4.3: The Distance of the Data Points to the Hyperplane


In general, the closer the data points are to the regression line, the more accurate the final
prediction. If there is a high degree of deviation between the data points and the regression line,
the slope will provide less accurate predictions. Basing your predictions on the data point at day
736, where there is high deviation, results in poor accuracy. In fact, the data point at day 736
constitutes an outlier because it does not follow the same general trend as the previous four data
points. What’s more, as an outlier it exaggerates the trajectory of the hyperplane based on its
high y-axis value. Unless future data points scale in proportion to the y-axis values of the outlier
data point, the model’s predictive accuracy will suffer.
Calculation Example
Although your programming language will take care of this automatically, it’s useful to understand
how linear regression is actually calculated. We will use the following dataset (table 4.2) and
formula to perform linear regression.

Table 4.2:

x    y
1    3
2    4
1    2
4    7
3    5

Linear Regression Formula

a = (Σy · Σx² − Σx · Σxy) / (n · Σx² − (Σx)²)

b = (n · Σxy − Σx · Σy) / (n · Σx² − (Σx)²)


Reminder: In supervised learning, the output is obtained by evaluating a mapping function given
by
Y = f(x)
For linear regression, the mapping function is given by
Y = a + bx
Where:
Y = Dependent variable
X = Independent variable
a = intercept and
b = slope of the straight line, as shown in Figure 4.4.

Figure 4.4 Simple Linear Regression

Where:
Σ = Total sum
Σx = Total sum of all x values
Σy = Total sum of all y values
Σxy = Total sum of x*y for each row


Σx² = Total sum of x*x for each row
n = Total number of rows
Using our example dataset, we expand our table as follows,
Table 4.3:

x    y    xy    x²
1    3    3     1
2    4    8     4
1    2    2     1
4    7    28    16
3    5    15    9

Σx = 1 + 2 + 1 + 4 + 3 = 11
Σy = 3 + 4 + 2 + 7 + 5 = 21
Σxy = 3 + 8 + 2 + 28 + 15 = 56
Σx² = 1 + 4 + 1 + 16 + 9 = 31
n = 5.

a = ((21 x 31) – (11 x 56)) / (5(31) – (11)²)


= (651 – 616) / (155 – 121)
= 35 / 34
= 1.029
b = (5(56) – (11 x 21)) / (5(31) – (11)²)
= (280 – 231) / (155 – 121)
= 49 / 34
= 1.441
Insert the “a” and “b” values into a linear equation.
y = a + bx
y = 1.029 + 1.441x
The linear equation y = 1.029 + 1.441x dictates how to draw the hyperplane.

Let’s now test the regression line by looking up the coordinates for x = 2.
y = 1.029 + 1.441(x)
y = 1.029 + 1.441(2)
y = 3.911
In this case, the prediction is very close to the actual result of 4.0.
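
As a small sketch, the Python snippet below reproduces this least-squares calculation directly from the column sums of Table 4.3; it should print values close to a ≈ 1.029 and b ≈ 1.441.

# Simple linear regression y = a + b*x computed from the column sums.
xs = [1, 2, 1, 4, 3]
ys = [3, 4, 2, 7, 5]
n = len(xs)

sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

print(round(a, 3), round(b, 3))      # intercept and slope
print(round(a + b * 2, 3))           # prediction at x = 2, close to 4.0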

Figure 4.5: The Linear Regression Hyperplane Plotted on the Scatterplot


5.0 MODEL REPRESENTATION AND INTERPRETABILITY
We have already seen that the goal of supervised Machine Learning is to learn or derive a target
function which can best determine the target variable from the set of input variables. A key
consideration in learning the target function from the training data is the extent of generalization.
This is because the input data is just a limited, specific view and the new, unknown data in the
test data set may differ quite a bit from the training data.
Fitness of a target function approximated by a learning algorithm determines how correctly it is
able to classify a set of data it has never seen.
5.1 Underfitting
If the target function is kept too simple, it may not be able to capture the essential nuances and
represent the underlying data well. A typical case of underfitting may occur when trying to
represent non-linear data with a linear model.
Many times underfitting happens due to unavailability of sufficient training data. Underfitting
results in both poor performance with training data as well as poor generalization to test data.
Underfitting can be avoided by:
1. using more training data
2. reducing features by effective feature selection
5.2 Overfitting
Overfitting refers to a situation where the model has been designed in such a way that it emulates
the training data too closely. In such a case, any specific deviation in the training data, like noise
or outliers, gets embedded in the model. It adversely impacts the performance of the model on
the test data. Overfitting, in many cases, occurs as a result of trying to fit an excessively complex
model to closely match the training data. The target function, in these cases, tries to make sure all
training data points are correctly
partitioned by the decision boundary. However, more often than not, this exact nature is not
replicated in the unknown test data set. Hence, the target function results in wrong classification
in the test data set. Overfitting results in good performance with training data set, but poor
generalization and hence poor performance with the test data set. Overfitting can be avoided by:
1. using re-sampling techniques like k-fold cross validation (a sketch of this is given after the list)
2. holding back a validation data set
3. removing the nodes which have little or no predictive power for the given Machine Learning
problem.
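
Below is a hedged sketch of k-fold cross validation with scikit-learn, using a decision tree as a model that can easily overfit; the synthetic data and the choice of 5 folds are assumptions made purely for illustration.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic data set.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# A deep, unconstrained tree can overfit; cross validation exposes this by
# repeatedly training on k-1 folds and evaluating on the held-out fold.
model = DecisionTreeClassifier(random_state=3)
scores = cross_val_score(model, X, y, cv=5)

print(scores)            # accuracy on each of the 5 held-out folds
print(scores.mean())     # average estimate of generalization performance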
5.3 Bias – variance trade-off
In supervised learning, the class value assigned by the learning model built based on the training
data may differ from the actual class value. This error in learning can be of two types – errors due
to ‘bias’ and error due to ‘variance’.
Let’s try to understand each of them in detail.
5.3.1 Errors due to Bias
Errors due to bias arise from simplifying assumptions made by the model to make the target
function less complex or easier to learn. In short, it is due to underfitting of the model. Parametric
models generally have high bias making them easier to understand/interpret and faster to learn.
These algorithms have a poor performance on data sets, which are complex in nature and do not
align with the simplifying assumptions made by the algorithm. Underfitting results in high bias.
5.3.2 Errors due to Variance
Errors due to variance occur from difference in training data sets used to train the model. Different
training data sets (randomly sampled from the input data set) are used to train the model. Ideally
the difference in the data sets should not be significant and the model trained using different
training data sets should not be too different. However, in case of overfitting, since the model
closely matches the training data, even a small difference in training data gets magnified in the
model.

So, the problems in training a model can happen because either
(a) the model is too simple and hence interprets the data too grossly, or
(b) the model is extremely complex and magnifies even small differences in the training data.
As is quite understandable:

• Increasing the bias will decrease the variance, and


• Increasing the variance will decrease the bias
On one hand, parametric algorithms are generally seen to demonstrate high bias but low
variance. On the other hand, non-parametric algorithms demonstrate low bias and high variance.
Ideally, the best solution is to have a model with low bias as well as low
variance. However, that may not be possible in reality. Hence, the goal of supervised Machine
Learning is to achieve a balance between bias and variance. The learning algorithm chosen and
the user parameters which can be configured help in striking a trade-off between bias and
variance. For example, in a popular supervised algorithm k-Nearest Neighbors or kNN, the user
configurable parameter ‘k’ can be used to trade off bias and variance. On one hand,
when the value of ‘k’ is increased, the model becomes simpler and bias increases. On the
other hand, when the value of ‘k’ is decreased, the model follows the training data more closely and
variance increases.
6.1 Model Evaluation
To evaluate the performance of the model, the number of correct classifications or predictions
made by the model has to be recorded. A classification is said to be correct if, for example in a
match win/loss prediction problem, the model has predicted that the team will win and it has actually
won.
Based on the number of correct and incorrect classifications or predictions made by a model, the
accuracy of the model is calculated. If 99 out of 100 times the model has classified correctly, e.g.
if in 99 out of 100 games what the model has predicted is same as what the outcome has been,
then the model accuracy is said to be 99%. However, it is quite relative to say whether a model
has performed well just by looking at the accuracy value. For example, 99% accuracy in case of a
sports win predictor model may be reasonably good but the same number may not be acceptable
as a good threshold when the learning problem deals with predicting a critical illness. In this case,
even the 1% incorrect prediction may lead to loss of many lives. So the model performance needs
to be evaluated in light of the learning problem in question. Also, in certain cases, erring on the
side of caution may be preferred at the cost of overall accuracy.
There are four possibilities with regard to a cricket match win/loss prediction:
1. The model predicted win and the team won – True Positive (TP)
2. The model predicted win and the team lost – False Positive (FP)
3. The model predicted loss and the team won – False Negative (FN)
4. The model predicted loss and the team lost – True Negative (TN)

6.2 Confusion Matrix


A matrix containing correct and incorrect predictions in the form of TPs, FPs, FNs and TNs is known
as a confusion matrix. In the problem of predicting whether a patient with a tumor has cancer or
not, the confusion matrix for that problem can be represented as shown in the table below:

                                 Actual Outcome
Predicted Outcome                Positive (Cancer)      Negative (No Cancer)
Positive (Cancer)                TP                     FP
Negative (No Cancer)             FN                     TN

For any classification model, performance of the model can be evaluated using the confusion
matrix. Some of the performance metrics that can be evaluated are as follows:
Model Accuracy: Model accuracy is given by the total number of correct classifications divided by the
total number of classifications done:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Error Rate: The percentage of misclassifications is indicated using the error rate, which is measured
as:

Error Rate = (FP + FN) / (TP + TN + FP + FN) = 1 − Accuracy

Precision: Precision gives the proportion of positive predictions which are truly positive, and
indicates the reliability of a model in predicting a class of interest. It is given by:

Precision = TP / (TP + FP)

Recall: Recall indicates the proportion of correct predictions of positives to the total number of
actual positives. Recall is given by:

Recall = TP / (TP + FN)

Sensitivity: The sensitivity of a model measures the proportion of TP examples or positive cases
which were correctly classified. It is measured as:

Sensitivity = TP / (TP + FN)

Specificity: Specificity is another good measure to indicate whether a model strikes a good balance
between being excessively conservative and excessively aggressive. The specificity of a model measures
the proportion of negative examples which have been correctly classified:

Specificity = TN / (TN + FP)

A higher value of specificity indicates better model performance.

Example: Given the following confusion matrix, calculate the following performance measures:
i. Accuracy
ii. Precision
iii. Recall
iv. Sensitivity
v. Specificity
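
Since the confusion matrix for this exercise is not reproduced here, the sketch below works through the same measures in Python with assumed counts (TP = 40, FP = 10, FN = 5, TN = 45); the actual values from the exercise's matrix should be substituted in their place.

# Assumed confusion-matrix counts, for illustration only.
TP, FP, FN, TN = 40, 10, 5, 45
total = TP + FP + FN + TN

accuracy = (TP + TN) / total
error_rate = (FP + FN) / total
precision = TP / (TP + FP)
recall = TP / (TP + FN)           # also the sensitivity
specificity = TN / (TN + FP)

print("Accuracy:   ", round(accuracy, 3))     # 0.85
print("Error rate: ", round(error_rate, 3))   # 0.15
print("Precision:  ", round(precision, 3))    # 0.8
print("Recall:     ", round(recall, 3))       # 0.889
print("Specificity:", round(specificity, 3))  # 0.818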

References:
Mohri, M., Rostamizadeh, A. and Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press, Cambridge, Massachusetts.
Dutt, S., Chandramouli, S. and Das, A. K. Machine Learning. Pearson.
Theobald, O. (2017). Machine Learning for Absolute Beginners.
