[email protected]
11/11/21
¡ An algorithm is a procedure or set of steps or rules to
accomplish a task. Algorithms are one of the fundamental
concepts in, or building blocks of, computer science.
¡ Some of the basic types of tasks that algorithms can solve are
§ sorting, searching, and graph-based computational problems
¡ In data science, there are at least three classes of algorithms
one should be aware of:
§ Data munging, preparation, and processing algorithms, such as
sorting, MapReduce, or Pregel.
§ Optimization algorithms for parameter estimation, including
Stochastic Gradient Descent, Newton’s Method, and Least Squares.
§ Machine learning algorithms.
¡ Machine learning algorithms are largely used to
predict, classify, or cluster.
¡ Machine learning algorithms are the basis of artificial
intelligence (AI) such as image recognition, speech
recognition, recommendation systems, ranking and
personalization of content.
¡ Machine learning algorithms are described as
learning a target function (f) that best maps input
variables (X) to an output variable (Y): Y = f(X)
¡ Linear Regression
¡ Logistic Regression
¡ Linear Discriminant Analysis
¡ K-Means
¡ Classification and Regression Trees
¡ Naive Bayes
¡ K-Nearest Neighbors
¡ Learning Vector Quantization
¡ Support Vector Machines
¡ Bagging and Random Forest
¡ Boosting and AdaBoost
¡ PCA
¡ Linear regression is one of the fundamental supervised machine-learning
algorithms, owing to its relative simplicity and well-known properties.
¡ It is one of the most well-known and well-understood algorithms in
statistics and ML.
¡ It expresses the mathematical relationship between two variables or
attributes.
¡ Predictive modeling is primarily concerned with minimizing the
error of a model, i.e., making the most accurate predictions, at the
expense of explainability.
¡ Linear regression may be simple or multiple (multivariate).
§ The case of one explanatory variable is called simple linear
regression.
§ With more than one explanatory variable, the process is
called multiple linear regression.
¡ Assumption
§ There is a linear relationship between an outcome variable
(dependent variable) and a predictor (independent variable
or feature).
¡ Linear regression models the relationship between the independent and
dependent variables by fitting a best-fit line: the coefficients a and
b are derived from the given input by minimizing the sum of squared
distances between the data points and the regression line.
¡ The slope b and intercept a of the fitted line $y = a + bx$ are

$$b = r\,\frac{s_y}{s_x} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} \qquad a = \bar{y} - b\,\bar{x}$$

$$s_x = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} \qquad s_y = \sqrt{\frac{\sum (y-\bar{y})^2}{n-1}} \qquad r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}}$$

Where: r – Pearson correlation coefficient, $s_x$, $s_y$ – standard deviations,
$\bar{x}$ – x mean and $\bar{y}$ – y mean
No   Age (X)   Glucose level (Y)   XY      X²      Y²
1    43        99                  4257    1849    9801
2    21        65                  1365    441     4225
3    25        79                  1975    625     6241
4    42        75                  3150    1764    5625
5    57        87                  4959    3249    7569
6    59        81                  4779    3481    6561
Σ    247       486                 20485   11409   40022

$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - (\sum X)^2} = \frac{486 \cdot 11409 - 247 \cdot 20485}{6 \cdot 11409 - (247)^2} = 65.1416$$

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{6 \cdot 20485 - 247 \cdot 486}{6 \cdot 11409 - (247)^2} = 0.38522$$

y = 65.1416 + 0.38522·x
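The arithmetic above can be checked with a few lines of Python; the data and the closed-form formulas are taken directly from the table and equations on this slide.

```python
# Simple linear regression coefficients from the summary-statistics formulas:
#   b = (n*ΣXY - ΣX*ΣY) / (n*ΣX² - (ΣX)²)
#   a = (ΣY*ΣX² - ΣX*ΣXY) / (n*ΣX² - (ΣX)²)
ages = [43, 21, 25, 42, 57, 59]       # X: age
glucose = [99, 65, 79, 75, 87, 81]    # Y: glucose level

n = len(ages)
sx, sy = sum(ages), sum(glucose)
sxy = sum(x * y for x, y in zip(ages, glucose))
sxx = sum(x * x for x in ages)

b = (n * sxy - sx * sy) / (n * sxx - sx * sx)    # slope
a = (sy * sxx - sx * sxy) / (n * sxx - sx * sx)  # intercept
print(round(a, 4), round(b, 5))  # → 65.1416 0.38522
```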
§ Exercise: Fit a linear regression line to the following data.
The data is from 51 different states of the USA. The variables are
y = year 2002 birth rate per 1000 females 15 to 17 years old and
x = poverty rate, which is the percent of the state's population
living in households with incomes below the federally defined
poverty level. (Data source: Mind On Statistics, 3rd edition,
Utts and Heckard).
¡ Clustering is the process of partitioning a group of data points
into a small number of clusters.
¡ K-means clustering is a type of unsupervised learning, which
is used when you have unlabeled data.
§ The goal of this algorithm is to find groups in the data, with the number
of groups represented by the variable K.
¡ The algorithm works iteratively to assign each data point to
one of K groups based on the features provided.
¡ Data points are clustered based on feature similarity. The
results of the K-means clustering algorithm are:
§ The centroids of the K clusters, which can be used to label
new data
§ An assignment of each data point to a single cluster.
¡ Behavioral segmentation
§ Segment by purchase history, activities on application, website,
or platform
§ Define personas based on interests
¡ Inventory categorization
§ Group inventory by sales activity and manufacturing metrics
¡ Sorting sensor measurements
§ Detect activity types in motion sensors
§ Group images, Separate audio and Identify groups in health
monitoring
¡ Detecting bots or anomalies
§ Separate valid activity groups from bots
§ Group valid activity to clean up outlier detection
¡ The K-means clustering algorithm uses iterative refinement to produce the
final clusters. The algorithm's inputs are the number of clusters K and the data set.
§ The data set is a collection of features for each data point. The algorithm starts
with initial estimates of the K centroids, which can either be randomly generated
or randomly selected from the data set.
¡ The algorithm then iterates between the following steps:
§ Initially, randomly pick K centroids (points that will be the centers of your
clusters) in d-dimensional space. Try to make them near the data but different
from one another.
§ Assign each data point to the closest centroid.
§ Move each centroid to the average location of the data points assigned to it.
§ Repeat the preceding two steps until the assignments don't change, or change
very little (i.e., no data points change clusters, the sum of the distances is
minimized, or some maximum number of iterations is reached).
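The steps above can be sketched as a minimal K-means implementation. This is an illustrative sketch, not a production implementation: the initial centroids are fixed rather than random so the run is reproducible, and the eight 2-D sample points are made up for illustration.

```python
# Minimal K-means: alternate between an assignment step and an update step
# until the centroids stop moving.
def kmeans(points, centroids, max_iter=100):
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: (p[0] - centroids[j][0]) ** 2
                                  + (p[1] - centroids[j][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:  # keep an empty cluster's centroid where it is
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return labels, centroids

points = [(1, 1), (1, 0), (0, 2), (2, 4), (3, 4), (1, 2), (2, 3), (1, 0)]
labels, centroids = kmeans(points, centroids=[(1, 1), (2, 4)])
print(labels)  # → [0, 0, 0, 1, 1, 0, 1, 0]
```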
¡ The algorithm finds the clusters and data-set labels for a
particular pre-chosen K. To find the best number of clusters, the
user needs to run the K-means clustering algorithm for a range
of K values and compare the results.
§ There is no method for determining the exact value of K, but an accurate
estimate can be obtained using the following techniques.
Given n data points $x_i$, i = 1…n, to be partitioned into k clusters,
K-means seeks the assignment that minimizes the within-cluster sum of
squared distances to the cluster centroids.
¡ Given the following two tables, cluster the data using the
k-means algorithm

No   x     y    Cluster
1    185   72
2    170   56
3    169   60
4    179   68
5    182   72
6    188   77

No   x   y   Cluster
A    1   1
B    1   0
C    0   2
D    2   4
E    3   4
F    1   2
G    2   3
H    1   0
¡ KNN can be used for both classification and regression
predictive problems.
¡ K-nearest neighbors is a simple algorithm that stores all
available cases and classifies new cases by a majority
(similarity) vote of their nearest neighbors.
¡ KNN has been used in statistical estimation and pattern
recognition. Three important aspects of KNN:
§ Ease of interpreting the output
§ Calculation time, and
§ Predictive power
¡ Distance measures are only valid for continuous
variables.
¡ Given the following table, classify the data using the k-NN algorithm

No   Durability   Strength   Class    Distance
A    7            7          Weak
B    7            4          Weak
C    3            4          Strong
D    3            4          Strong
E    1            3          Strong
F    5            5          ?????
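A minimal sketch of the majority-vote classification for the unlabeled sample F = (5, 5), assuming Euclidean distance and k = 3 (neither is fixed by the slide):

```python
# k-NN classification: rank the labeled points by distance to the query,
# then take a majority vote among the k nearest.
from collections import Counter

train = [((7, 7), "Weak"), ((7, 4), "Weak"), ((3, 4), "Strong"),
         ((3, 4), "Strong"), ((1, 3), "Strong")]

def knn_classify(query, train, k=3):
    # Squared Euclidean distance preserves the ranking, so sqrt is unnecessary.
    ranked = sorted(train, key=lambda t: (t[0][0] - query[0]) ** 2
                                       + (t[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((5, 5), train))  # → Strong  (neighbors B, C, D: Weak, Strong, Strong)
```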
¡ Given the following, predict the class for
§ Weight 60 with Height 180

Weight   Height   Class         Distance
51       167      Underweight
62       182      Normal
69       176      Normal
64       160      Overweight
65       172      Normal
56       174      Underweight
68       158      Overweight
57       173      Normal
58       169      Normal
68       158      Overweight
55       170      Normal
58       184      Underweight
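The prediction can be checked with the same majority-vote rule, assuming Euclidean distance and k = 3 (the slide does not fix k; a different k can change the answer):

```python
# k-NN prediction for (weight=60, height=180) over the table above, k = 3.
from collections import Counter

rows = [((51, 167), "Underweight"), ((62, 182), "Normal"),
        ((69, 176), "Normal"),      ((64, 160), "Overweight"),
        ((65, 172), "Normal"),      ((56, 174), "Underweight"),
        ((68, 158), "Overweight"),  ((57, 173), "Normal"),
        ((58, 169), "Normal"),      ((68, 158), "Overweight"),
        ((55, 170), "Normal"),      ((58, 184), "Underweight")]

query = (60, 180)
# Rank rows by squared Euclidean distance to the query.
ranked = sorted(rows, key=lambda r: (r[0][0] - query[0]) ** 2
                                  + (r[0][1] - query[1]) ** 2)
votes = Counter(label for _, label in ranked[:3])
print(votes.most_common(1)[0][0])  # → Underweight
```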
¡ SVM is a supervised ML algorithm that can be used for both
classification and regression, but is mostly used in classification
problems.
¡ The algorithm plots each data item as a point in n-
dimensional space, with the value of each feature being
the value of a particular coordinate.
§ Classification is done by finding the hyper-plane that best differentiates
the two classes.
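Training an SVM is beyond a short sketch, but the decision rule of an already-trained linear SVM is just the sign of w·x + b, i.e., which side of the separating hyperplane a point lies on. The weights and bias below are made-up values for illustration, not a fitted model:

```python
# Linear SVM decision rule: sign(w·x + b) gives the predicted class.
# w and b here are illustrative placeholders, not learned parameters.
def svm_predict(x, w=(1.0, -1.0), b=0.0):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score >= 0 else -1

print(svm_predict((3, 1)))  # → 1   (on the positive side of x1 - x2 = 0)
print(svm_predict((1, 3)))  # → -1  (on the negative side)
```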
¡ Naive Bayes is a classification technique based on Bayes' theorem with an
assumption of independence among predictors.
¡ The Naive Bayes model is easy to build and particularly useful for very
large data sets. Along with simplicity, Naive Bayes is known
to outperform even highly sophisticated classification methods.

$$P(c|x) = \frac{P(x|c) \cdot P(c)}{P(x)}$$

¡ Given
§ P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
§ P(c) is the prior probability of the class.
§ P(x|c) is the likelihood, which is the probability of the predictor given the class.
§ P(x) is the prior probability of the predictor.
¡ Given the following training data

Weather    Play
Sunny      No
Overcast   Yes
Rainy      Yes
Sunny      Yes
Sunny      Yes
Overcast   Yes
Rainy      No
Rainy      No
Sunny      Yes
Rainy      Yes
Sunny      No
Overcast   Yes
Overcast   Yes
Rainy      No

Frequency table:

Weather       No           Yes          Probability
Sunny         2            3            =5/14 (0.36)
Overcast      0            4            =4/14 (0.29)
Rainy         3            2            =5/14 (0.36)
Total         5            9
Probability   =5/14 (0.36) =9/14 (0.64)

§ What is the probability that the players will play if the weather is sunny?

$$P(Yes|Sunny) = \frac{P(Sunny|Yes) \cdot P(Yes)}{P(Sunny)} = \frac{\frac{3}{9} \cdot \frac{9}{14}}{\frac{5}{14}} = \frac{3/14}{5/14} = 0.6$$

§ What is the probability that the players will play if the weather is rainy?
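The computation can be reproduced from the raw counts; the 14 weather/play rows are taken from the training table above.

```python
# Bayes' theorem on the weather/play data:
#   P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
data = [("Sunny", "No"),  ("Overcast", "Yes"), ("Rainy", "Yes"),
        ("Sunny", "Yes"), ("Sunny", "Yes"),    ("Overcast", "Yes"),
        ("Rainy", "No"),  ("Rainy", "No"),     ("Sunny", "Yes"),
        ("Rainy", "Yes"), ("Sunny", "No"),     ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rainy", "No")]

n = len(data)
p_yes = sum(1 for _, play in data if play == "Yes") / n    # 9/14
p_sunny = sum(1 for w, _ in data if w == "Sunny") / n      # 5/14
p_sunny_given_yes = (sum(1 for w, play in data if w == "Sunny" and play == "Yes")
                     / sum(1 for _, play in data if play == "Yes"))  # 3/9

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # → 0.6
```

The same counting answers the rainy-weather question on this slide.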