Machine Learning Bloque 4


04 UNSUPERVISED LEARNING

4.0 COMPUTATIONAL COMPLEXITY


We don't have infinite resources to solve an ML problem, nor even infinite time to wait. That means that having the best algorithm is not useful if it will waste all of our resources or take years to execute. Computational complexity is the part of computer science that analyses the resources needed to solve a problem. We need to consider two types of resources: time (CPU cycles of our computer) and space (memory used in solving the problem).

Ex1: sum of natural numbers.

Let’s analyse the resources in terms of Limite:

− We need 3 variables (X, i, Limite)


− We need to perform 1 + Limite·2 + 1 operations, a number that grows with Limite

So, if we name the growing variable "n", our algorithm needs:

− 3+0·n → space in memory


− 2+2·n → CPU operations (time)

Our algorithm is therefore of order O(n) in time and needs O(1) space.
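A minimal Python sketch of this counting exercise (the variable names X, i and Limite follow the notes; the exact operation count is illustrative):

```python
def sum_naturals(Limite):
    """Sum the natural numbers 1..Limite, counting elementary operations."""
    X = 0                        # 1 operation: initialisation
    ops = 1
    for i in range(1, Limite + 1):
        X += i                   # each iteration: 1 addition + 1 loop update
        ops += 2
    ops += 1                     # final step (returning the result)
    return X, ops

total, ops = sum_naturals(1000)
print(total, ops)   # 500500, about 2 + 2*Limite operations -> O(n) time, O(1) space
```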

So, if we know the complexity of an algorithm and the space it occupies in memory, we can decide how to improve our computer. If an algorithm is O(n) and we double the CPU speed, we solve the problem in half the time.

Ex 2 – Finding integer roots of the equation 3·x + 7·y − 2·z = 134 by exhaustive search.

We need 1 + n·n·n·(3 + 3 + 1 + 1) = 1 + 8n³ operations, i.e. O(n³).

If in our problem n is 10,000 ➔ about 10¹² operations. If each operation lasts 1 μs and we double the CPU speed, instead of 277 hours we will need 139. Is that acceptable?
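A possible brute-force sketch matching this estimate, assuming we search integer solutions with three nested loops over a range of n candidate values per variable (the range and the call below are illustrative):

```python
def find_integer_roots(n=50):
    """Exhaustively search integer solutions of 3x + 7y - 2z = 134
    in the range [-n, n] for each variable: three nested loops -> O(n^3)."""
    solutions = []
    for x in range(-n, n + 1):
        for y in range(-n, n + 1):
            for z in range(-n, n + 1):
                # roughly 8 elementary operations per iteration (multiplications,
                # additions/subtractions and the comparison), hence ~8·n^3 in total
                if 3 * x + 7 * y - 2 * z == 134:
                    solutions.append((x, y, z))
    return solutions

print(len(find_integer_roots(50)))   # many solutions exist even in this small range
```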


4.1 K-MEANS CLUSTERING


CLUSTERING: We have seen in the introduction and in the examples of data pre-processing that normally the data are defined by a series of characteristics plus a value (the label), which is the result that we want the learning algorithm to compute. In the example of the UCI wine database, we saw that there were three types of wine (1, 2, 3). However, this is not always the case. Many times we have the data without any label that tells us what each observation corresponds to. Normally this happens in the early stages of a data analysis or data mining project, when we are not yet able to identify tags that group the data. For example, if we have a bunch of photographs from the Internet without any classification criteria, we can use unsupervised learning to try to group them in a coherent way.

• Clustering tries to look for homogeneous groupings in observations.


• One of the typical applications of clustering is customer segmentation in marketing. Almost
  all companies store data on the behaviour of their customers (location, use of products,
  spending, incidents, queries, payments, etc.), but they do not know very well what to do
  with them or whether lessons can be extracted from all that data. One of the first things we
  can do is clustering, to classify the data so that we can continue the analysis.
• There are many techniques in clustering, the two best known being K-means and
hierarchical clustering.

K-MEANS CLUSTERING: In K-means clustering we seek to partition the observations into a pre-specified number of clusters.

The operation of this technique is as follows: we create a number K of non-overlapping clusters and assign each observation to a single cluster. In case the result is not adequate, we can reassign observations to other clusters, until we get a classification that meets our criteria.

Let $C_j$ be the set of observations found in cluster $j$. Obviously, the set of clusters satisfies $C_1 \cup C_2 \cup \dots \cup C_K = \{1, 2, \dots, n\}$ and $C_i \cap C_j = \emptyset$ for $i \neq j$. That is, every observation is in some cluster and no observation belongs to two different clusters. How can we know if one clustering is better or worse than another? Which is the best k for k-Means?
• By business needs – we need to classify our customers into k clusters for some purpose.
• By the number of observations – we could estimate k as √(n/2), for example.
• Using the within-cluster variation (WCV) – we define a distance between observations and assign each one to the nearest cluster.

Within-cluster variation W(Ck): intuitively, if that measure is small, the observations in the cluster will be very "close" and our clustering will be good. Therefore, let's minimize the sum of W(Ck) over all k, that is:

$$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} W(C_k)$$

Let's have a look at k (the number of clusters): when we increase k we also reduce the WCV, but an arbitrarily large k makes no sense for our goal (getting the data classified into useful groups). How, then, must we define k using the WCV?


There are many options for defining W. We are going to use the (squared) Euclidean distance, so that our clustering problem is as follows:

$$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{i,i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2$$

For K clusters and n observations there are $K^n$ possible assignments: for 150 observations and 3 clusters, that is about $10^{71}$ different possible clusterings. We cannot address this exhaustively, even for such a simple case. To fix it, let's use the following algorithm:

1. We choose a value for K (the number of clusters).
2. We assign each observation to a cluster (a value from 1 to K).
3. In each of the K clusters we calculate the centroid (the vector of means of the p characteristics).
4. We re-assign each observation to the cluster whose centroid is nearest, according to the Euclidean distance.
5. We repeat steps 3 and 4 until no more observations change cluster, or until we get an acceptable result according to some other criterion.

It can happen, as with any other minimization problem, that the algorithm finds a local minimum. The solution is to run the algorithm several times with different initial data assignments and keep the one that obtains the lowest value of ΣW(Ck).
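A minimal NumPy sketch of Lloyd's iteration with several random restarts (the data layout, the empty-cluster handling and the parameter values are illustrative assumptions):

```python
import numpy as np

def kmeans_lloyd(X, K, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update.
    Run n_init times from random assignments and keep the lowest total WCV."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.inf)
    for _ in range(n_init):
        labels = rng.integers(0, K, size=len(X))             # step 2: random assignment
        for _ in range(max_iter):
            centroids = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k)
                else X[rng.integers(len(X))]                  # re-seed an empty cluster
                for k in range(K)
            ])                                                # step 3: centroids
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)                 # step 4: nearest centroid
            if np.array_equal(new_labels, labels):            # step 5: no more changes
                break
            labels = new_labels
        wcv = sum(((X[labels == k] - centroids[k]) ** 2).sum() for k in range(K))
        if wcv < best[2]:                                     # keep the lowest ΣW
            best = (labels, centroids, wcv)
    return best
```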

In the description of the previous algorithm, we recalculated the centroids only once all the data had been assigned. This is known as Lloyd's algorithm (Lloyd's k-means). However, we would be more precise if, before assigning the next point, we recalculated the centroids (MacQueen's algorithm).

If we have a lot of data, continuously calculating the distances from the data to the centroids can be exhausting for the CPU. Elkan's algorithm makes use of the triangle inequality between a point and the different centroids to avoid performing some of those calculations. We will use KMeans from scikit-learn: after initializing KMeans, we invoke fit_predict to get the results.
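A minimal usage sketch with scikit-learn's KMeans (the synthetic data and parameter values are illustrative; n_init and algorithm="elkan" correspond to the points above):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative synthetic data with 3 natural groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3,        # K, chosen in advance
            n_init=10,           # several random initializations to avoid local minima
            algorithm="elkan",   # triangle-inequality speed-up
            random_state=0)
labels = km.fit_predict(X)       # cluster index assigned to each observation

print(km.cluster_centers_)       # final centroids
print(km.inertia_)               # total within-cluster sum of squares
```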

THE ELBOW METHOD


One of the drawbacks of the k-Means method is that we have to set k without knowing what the most appropriate value is. To help us at this point we can use the elbow method, which consists of running the algorithm for different values of k, calculating the internal distortion, defined as the sum of the squares of the distances from the data to their nearest centroid (the inertia), and representing it graphically. As we can see in the figure, k = 3 is a good value.
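A possible sketch of the elbow method (the data and the range of k are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(ks, inertias, "o-")
plt.xlabel("k (number of clusters)")
plt.ylabel("inertia (sum of squared distances to nearest centroid)")
plt.show()   # look for the "elbow" where the curve stops dropping sharply
```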

SILHOUETTE METHOD
Suppose we have clustered a dataset with k-Means. We define the average distance, or dissimilarity, of a point i belonging to the cluster $C_K$ as the average distance between that point and the rest of the points of its own cluster:

$$a(i) = \frac{1}{|C_K| - 1} \sum_{j \in C_K,\; j \neq i} d(i, j)$$


Below we calculate the average distance, or dissimilarity, between that point and each of the other clusters:

$$d(i, C_L) = \frac{1}{|C_L|} \sum_{j \in C_L} d(i, j), \qquad L \neq K$$

And now we take the smallest of those numbers for each i:

$$b(i) = \min_{L \neq K} d(i, C_L)$$

That is, b(i) is the minimum average distance from i to a cluster other than its own. The cluster L for which b(i) is minimal would be the second-best classification option for i.

Finally, we define the silhouette of a point i as:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

and s(i) = 0 if $C_K$ has only one element. From this formula we can see that $-1 \le s(i) \le 1$.

Let us now look at the extreme cases. If a(i) ≪ b(i), then s(i) is close to 1 and element i is well classified in its own cluster. If a(i) ≫ b(i), the opposite happens: s(i) is close to −1 and element i would be much better classified in cluster L. Finally, if both distances are equal, s(i) ≈ 0 and the point could belong to either cluster.
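A possible sketch using scikit-learn's silhouette utilities (data and range of k illustrative); the mean of s(i) over all points summarizes how well separated the clustering is:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # mean s(i): values near 1 -> well separated; near 0 -> overlapping clusters
    print(k, round(silhouette_score(X, labels), 3))
```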

4.2 HIERARCHICAL CLUSTERING


One of the disadvantages of k-Means clustering is that we have to define in advance the number
of clusters we want and perform various tests to see which is the number that best suits our
model.

Hierarchical clustering is another clustering method, in which the information is represented in a tree called a dendrogram. Let's first analyze the dendrogram generation algorithm and then its interpretation.

DENDROGRAM: A dendrogram is a representation of the observations (horizontal axis) and their groupings (vertical axis). The observations are represented in green, as if they were the endings of the filaments of the dendrites of a neuron, while the groupings of observations are represented on the vertical axis as the tree of dendrites, until they reach the nucleus of the neuron.


There are many ways to define a distance between observations. Perhaps the simplest and most used is the Euclidean distance, but it is not the only one; see the list of dissimilarities in the next subsection.

In the figure we see three observations measured on 20 variables. Observations 1 and 3 have similar values, and therefore a lower Euclidean distance between them than with observation 2; however, they are poorly correlated, so a correlation-based dissimilarity would group them differently.

CLUSTER LINKAGE: Suppose we have N observations. The first step is to define some kind of distance between observations (a dissimilarity), which is usually the Euclidean distance, but we can also use the squared Euclidean distance, the Manhattan distance, the Chebyshev distance, the Mahalanobis distance, the Hamming distance, or the Levenshtein distance.

At the beginning of the algorithm, each of the n observations is its own cluster. And this is our initial dendrogram.

We pick the first cluster (point A) and calculate its distance to all the other clusters to find which one is the nearest. Once found (suppose it is C_B), we merge both clusters into a single cluster C_AB.

We have just created a cluster, and our initial dendrogram will have the following shape:

We continue with the next cluster (point C, for instance). But now we need to compute its distance to the other one-point clusters and to one two-point cluster (AB). How should this be done?

To obtain the dissimilarity between clusters we cannot use the Euclidean distance directly. Instead, we define a linkage.


Once the type of inter-cluster distance (linkage) to be used has been decided, we continue with the rest of the one-point clusters and repeat until we get only one cluster.

In the following image we can see different types of linkage. Complete linkage, which measures the maximum distance between clusters, is the most balanced.

We repeat the algorithm until there is only one cluster left and represent the result.

As we can see, we have a representation of the observations (the final axons, in green) and of various unions by proximity between the different clusters (the dendrites).

The way to generate a final clustering from the dendrogram is to decide the height at which to cut it. For example, if we cut at y = 9 we obtain two clusters.

If we cut at y = 5 we are left with 3 clusters. The term hierarchical refers to the fact that we can establish a hierarchy (an order) in the data. If our data refer to objects for which a measure of distance is unclear, hierarchical clustering is not the best option.
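A possible sketch with SciPy's hierarchical clustering tools (the synthetic data, the linkage type and the cut heights are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in (0, 4, 8)])

Z = linkage(X, method="complete", metric="euclidean")   # complete linkage

dendrogram(Z)      # observations on the horizontal axis, merge height on the vertical
plt.show()

labels_high = fcluster(Z, t=9, criterion="distance")    # cut high -> fewer clusters
labels_low = fcluster(Z, t=5, criterion="distance")     # cut lower -> more clusters
print(len(set(labels_high)), len(set(labels_low)))
```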

Decisions when using k-Means clustering


There are many decisions we have to make when applying k-Means (and clustering in general) to a set of observations. To illustrate this, we are going to assume that we are analyzing the shopping carts of some supermarkets and that we have the data on what each customer has bought. That is:


1. Standardization – When the data take very different ranges of values we may be interested in standardizing them. For example, if we are going to consider the expense per customer in our analysis, the fact that a customer buys a toothbrush can change everything.
2. Dissimilarity – Which metric do we have to use? If we use the number of items as an indicator, the distances between products will be very different from those obtained if we use total expenditure (e.g. Coca-Cola).
3. Linkage – To obtain the right linkage, we will have to perform many tests, or use test sets that allow us to determine the most appropriate method for the problem we want to solve.
4. Number of clusters – We have seen this before, both in k-Means and in the cut of the dendrogram.

One of the problems of clustering is that we necessarily assign every observation to a cluster, which is not always appropriate. If, for example, we have a lot of noise, outliers, or measurements for which it is not yet convenient to define a cluster, we will end up with an unrealistic classification. There are mixed (soft) classification methods, such as soft K-means, that can help with this problem; see the book by Hastie et al.

Another problem is stability in the face of changes in the observations: if we remove or add some observations, the clusters can undergo major alterations.

4.3 PCA ANALYSIS (images)


Principal component analysis (PCA) refers to the process by which principal components are computed, and to the subsequent use of these components in understanding the data. PCA is an unsupervised approach, since it involves only a set of features X1, X2, ..., Xp, and no associated response Y. Apart from producing derived variables for use in supervised learning problems, PCA also serves as a tool for data visualization (visualization of the observations or of the variables). It can also be used as a tool for data imputation, that is, for filling in missing values in a data matrix.

Suppose we want to visualize n observations with measurements on a set of p characteristics (X1, X2, ..., Xp) as part of an exploratory data analysis. We could do this by examining two-dimensional scatter plots of the data, each of which contains the measurements of the n observations on two of the features.

However, there are $\binom{p}{2} = p(p-1)/2$ possible charts; for example, with p = 10 there are 45 possibilities! If p is large, it will not be possible to look at them all; in addition, most likely none of them is informative on its own, since each contains only a small fraction of the total information present in the data set. Clearly we need a better method to visualize the n observations when p is large. In particular, we would like to find a low-dimensional representation of the data that captures as much information as possible.

PCA provides a tool to do just this: find a low-dimensional representation of a dataset that contains as much of the variability as possible. The underlying idea is that not all the dimensions we observe are equally interesting. PCA looks for a small number of dimensions that are as representative as possible, where each of the dimensions found is a linear combination of the p characteristics.

The first principal component of a set of features X1, X2, ..., Xp with zero mean is the normalized linear combination of the features (normalized means that the norm of the coefficient vector is 1) that has the largest variance:

$$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \dots + \phi_{p1} X_p$$


So, our objective is to find the linear combination of features that accounts for the largest variance. That is, find the coefficients that solve:

$$\max_{\phi_{11},\dots,\phi_{p1}} \; \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=1}^{p} \phi_{j1} x_{ij} \right)^2$$

subject to the normalization constraint:

$$\sum_{j=1}^{p} \phi_{j1}^2 = 1$$

Since X has zero mean, Z1 also has zero mean, and the problem reduces to maximizing the sample variance of Z1.
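A minimal sketch of the first principal components with scikit-learn (the data are illustrative; PCA centers the columns internally):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # illustrative correlated data

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                       # scores Z1, Z2 for each observation

print(pca.components_[0])                      # loadings phi_11 ... phi_p1
print(np.linalg.norm(pca.components_[0]))      # = 1: the normalization constraint
print(pca.explained_variance_ratio_)           # fraction of variance captured
```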

PCA can also be used to compress images:

1) We split the image into channels, fit a PCA with 20 components, and apply the transformation to each channel.
2) Then we stack the transformed channel data back together to form the image and save it.
3) The figure shows the original image and the reconstructions for PCA = 20, 50, 100 and 200 components.
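A possible per-channel compression sketch (the file name, the use of Pillow and the component count are illustrative assumptions):

```python
import numpy as np
from PIL import Image
from sklearn.decomposition import PCA

def compress_channel(channel, n_components=20):
    """Fit PCA on the rows of one colour channel and reconstruct an approximation."""
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(channel)       # compressed representation
    return pca.inverse_transform(reduced)      # approximate reconstruction

img = np.asarray(Image.open("photo.jpg"), dtype=float)   # hypothetical input file
channels = [compress_channel(img[:, :, c], 20) for c in range(img.shape[2])]
reconstructed = np.clip(np.stack(channels, axis=2), 0, 255).astype(np.uint8)
Image.fromarray(reconstructed).save("photo_pca20.jpg")
```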
4.4 AFFINITY PROPAGATION
This method has the advantage that no a priori decision on the number of clusters (k) is needed. It is based on the (negative) Euclidean distance among the different observations.

We then construct three matrices: Similarity (S), Responsibility (R) and Availability (A).

Similarity Matrix: it is constructed using the negative distance between two observations (k, l), s(k, l) = −d(k, l), where d is the Euclidean distance.

For the diagonal (the preference of each point to be an exemplar) we select the lowest value of the whole S matrix to obtain fewer clusters, or the maximum value for the opposite effect. An intermediate value can also be chosen; it is also common to use the median of the column values, the mean, etc.

We fill the diagonal with the lowest value of the whole S matrix.

▪ Availability Matrix: we begin with all the elements of the matrix equal to zero, a(a, b) = 0.

▪ Responsibility Matrix: then we evaluate the responsibilities,

$$r(a, b) = s(a, b) - \max_{b' \neq b} \{ a(a, b') + s(a, b') \}$$

Since a = 0 in the first iteration, r(a, b) is simply s(a, b) minus the maximum remaining value of its row.

Availability Matrix – We compute this matrix with the following formulas:

$$a(a, b) = \min\Big(0,\; r(b, b) + \sum_{a' \notin \{a, b\}} \max(0, r(a', b))\Big) \quad \text{for } a \neq b$$

$$a(b, b) = \sum_{a' \neq b} \max(0, r(a', b))$$


Now, we construct the Criterion Matrix, C = R + A. For each observation (row), the column with the largest criterion value identifies its exemplar, and observations that share the same exemplar form a cluster.
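A minimal usage sketch with scikit-learn's AffinityPropagation (the data are illustrative; the preference parameter plays the role of the diagonal of S):

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

ap = AffinityPropagation(affinity="euclidean",  # negative squared Euclidean similarities
                         preference=None,       # None -> median similarity on the diagonal
                         random_state=0)
labels = ap.fit_predict(X)

print(len(ap.cluster_centers_indices_))   # number of clusters found, not fixed beforehand
```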

4.5 BIRCH ANALYSIS


Balanced Iterative Reducing and Clustering using
Hierarchies (BIRCH) is a clustering algorithm
(unsupervised) very suitable for large amounts of
data.

The main difference with other algorithms is that it doesn't need to load all the data into memory. It is also used in data mining combined with other clustering methods.

It uses a tree-like structure called a Clustering Feature Tree (CF Tree), where each node of the tree contains several Clustering Features (CF), which are either leaves or inner nodes.

Each CF has the form (N, LS, SS):

▪ N – the number of data points (samples) summarized in the CF.
▪ LS – a vector with the sum of the features of the members of the cluster.
▪ SS – a vector with the sum of the squares of the features of the members of the cluster.

So, a CF is a multi-vector (N, LS, SS) and can be added to another CF: (N1 + N2, LS1 + LS2, SS1 + SS2).

A CF Tree has a root node comprised of several CFs, each of them pointing to an inner CF node, which is also built of several CFs, each pointing in turn to another CF node, until we reach the leaves (no more CF nodes). The data are summarized in the leaf nodes; inner and root nodes hold only aggregates. The vector of any CF node equals the sum of the vectors of its child nodes, down to the leaves:

CF1 = CF7 + ··· + CF12

CF7 = CF90 + ··· + CF94

Three important parameters:

• B = branching factor – the maximum number of CFs in each internal node.
• L = the maximum number of CFs in each leaf node.
• T = the maximum sample diameter (threshold) of each CF in a node.


Also, we can define some statistics for every cluster (CF): the centroid $X_0 = LS/N$, the radius $R = \sqrt{SS/N - (LS/N)^2}$ and the diameter $D = \sqrt{\dfrac{2N \cdot SS - 2\,LS^2}{N(N-1)}}$, i.e. the average pairwise distance inside the cluster.

HOW TO CREATE A TREE?

1. Define the initial parameters: B, L and T.
2. Populate the first CF (leaf node) with the first sample, and update all its statistics (N, LS, SS, X0, R, D).
3. Try to add the next sample:
4. Look at the root-level nodes and select the node whose centroid is closest to the sample value.
5. Descend through the tree and select the appropriate leaf (the one whose centroid is closest to the sample).
6. Evaluate the diameter of the resulting CF.
7. If it is less than T, add the sample to that CF and update the statistics (N, LS, SS, X0) of the whole branch.
8. If it is greater than T, create a new CF at the same level and update.
9. If there are already B CFs and it is not possible to add a new one, split the node and update the statistics.
10. To avoid the tree becoming unbalanced, redistribute the nodes.
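A minimal sketch of the CF bookkeeping used in the worked example below (scalar samples as in the notes; the tree structure and threshold logic are omitted):

```python
import math

def cf_add(cf, x):
    """Add a scalar sample x to a clustering feature (N, LS, SS)."""
    N, LS, SS = cf
    return (N + 1, LS + x, SS + x * x)

def cf_merge(cf1, cf2):
    """CFs are additive: (N1+N2, LS1+LS2, SS1+SS2)."""
    return tuple(a + b for a, b in zip(cf1, cf2))

def centroid(cf):
    N, LS, _ = cf
    return LS / N

def diameter(cf):
    """Average pairwise distance inside the cluster."""
    N, LS, SS = cf
    if N < 2:
        return 0.0
    return math.sqrt((2 * N * SS - 2 * LS * LS) / (N * (N - 1)))

cf = cf_add((1, 22, 484), 9)    # try to merge samples 22 and 9 (step 2 of the example)
print(cf, diameter(cf))         # (2, 31, 565), D = 13.0 > T = 5 -> a new leaf is needed
```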

EXAMPLE: Consider the following set: {22, 9, 12, 15, 18, 27, 11, 36, 10, 3, 14, 32}, with B = 2, L = 5, T = 5.

Step 1. 22

Step 2. 9
We try to insert it into the first node and evaluate the statistics:
CF = (N, LS, SS) = (2, 22 + 9, 22² + 9²) = (2, 31, 565), which gives D = 13.

Since 13 > T, we cannot insert 9 into the first node, so we create a new same-level node (leaf node).

Step 3. 12
We look at the centroids. 12 is closest to the second node. We try to insert it there and evaluate the diameter to compare with T: CF = (N, LS, SS) = (2, 9 + 12, 9² + 12²) = (2, 21, 225) and D = 3. Since D < T, we insert it and update the statistics.

Step 4. 15


We look at the centroids. 15 is closest to 10.5, the second node. We try to insert it there and evaluate the diameter to compare with T: CF = (N, LS, SS) = (3, 21 + 15, 225 + 15²) = (3, 36, 450) and D = 4.24. Since D < T, we insert it and update the statistics.

Step 5. 18
We look at the centroids. 18 is closest to 22, the first node. We try to insert it there and evaluate the diameter to compare with T: CF = (N, LS, SS) = (2, 22 + 18, 22² + 18²) = (2, 40, 808) and D = 4. Since D < T, we insert it and update the statistics.

Step 6. 27
We look at the centroids. 27 is closest to 20, the first node. We try to insert it there and evaluate the diameter to compare with T: CF = (N, LS, SS) = (3, 40 + 27, 808 + 27²) = (3, 67, 1537) and D = 6.37.
Since D > T we need to create a new node, but B = 2 and we cannot add another CF at the root level ➔ we split the tree.

Step 7. 11
We look at the centroids of the root level. 11 is closest to 12, the second node. We descend and select its only leaf, (3, 36, 450). We try to insert 11 there and evaluate the diameter to compare with T: CF = (N, LS, SS) = (4, 47, 571) and D = 3.53. Since D < T, we insert it and update the statistics.

Step 8. 36

We look at the centroids of the root level. 36 is closest to 22.33, the first root node. We descend and look at the centroids again: 36 is closest to 27.
We try to insert it there and evaluate the diameter to compare with T: CF = (N, LS, SS) = (2, 63, 2025) and D = 9.
Since D > T we need a new leaf node, but there is no space to create one in this branch (B = 2), so we would have to split the node and create a new branch. However, there is space in the other same-level node, so we redistribute the nodes instead:


Now, we look again at the centroid of the moved leaf cluster: CF = (N, LS, SS) = (2, 63, 2022) and D = 8.6.
We add it to the free leaf and update the statistics of that leaf node and of all the upper nodes.

We can continue with this procedure until we obtain the complete tree:

- The clusters are the leaf nodes of the tree.
- After creating the tree, we already have all the clusters.
- To add new data, we don't need to look at the whole tree.
- Nor do we need to re-evaluate the parameters of the whole tree, only those of the affected nodes.
- The complexity of this method is O(n).
- With BIRCH we can categorize data streams in a continuous (online) way.

And what if we have multi-featured data? Instead of a single triple {N, LS, SS} we will have a multi-vector: {{N1, LS1, SS1}, {N2, LS2, SS2}, … {Nn, LSn, SSn}}.
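A minimal usage sketch with scikit-learn's Birch (the data and parameter values are illustrative; threshold and branching_factor correspond to T and B above):

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, random_state=42)

birch = Birch(threshold=0.5,        # T: maximum diameter of a subcluster
              branching_factor=50,  # B: maximum number of CF subclusters per node
              n_clusters=4)         # optional global clustering of the CF leaves
labels = birch.fit_predict(X)

# New data can be added incrementally, without rebuilding the whole tree
X_new, _ = make_blobs(n_samples=100, centers=4, random_state=1)
birch.partial_fit(X_new)
```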

