Unsupervised learning clustering

Nov 16, 2017

This document discusses unsupervised learning and clustering. It defines unsupervised learning as modeling the underlying structure or distribution of input data without corresponding output variables. Clustering is described as organizing unlabeled data into groups of similar items called clusters. The document focuses on k-means clustering, describing it as a method that partitions data into k clusters by minimizing distances between points and cluster centers. It provides details on the k-means algorithm and gives examples of its steps. Strengths and weaknesses of k-means clustering are also summarized.

CH -10
Unsupervised Learning and Clustering
By:
Arshad Farhad
20177716

Contents
• Supervised vs Unsupervised learning
• Introduction to clustering
• K-means Clustering
• Hierarchical clustering
• Conclusion

Supervised vs Unsupervised Learning
• Supervised learning is where you have input variables (X) and an output variable (Y),
and you use an algorithm to learn the mapping function from the input to the
output:
Y = f(X)
• The goal is to approximate the mapping function so well that, given new
input data (X), you can predict the output variable (Y) for that data.
• Unsupervised learning is where you have only input data (X) and no corresponding
output variables.
• The goal of unsupervised learning is to model the underlying structure or
distribution of the data in order to learn more about it.
• Unsupervised learning problems can be further grouped into clustering and
association problems:
– Clustering
– Association

What is clustering?
• The organization of unlabeled data into similarity
groups called clusters.
• A cluster is a collection of data items that are “similar”
to one another and “dissimilar” to data items in other
clusters.

Distance (dissimilarity) measures
• The Euclidean distance between points i and j is the length of the line segment
connecting them.
• In Cartesian coordinates, if i = (i1, i2, …, in) and j = (j1, j2, …, jn), then the
distance (d) from i to j, or from j to i, is given by:
$$ d(i, j) = \sqrt{(i_1 - j_1)^2 + (i_2 - j_2)^2 + \dots + (i_n - j_n)^2} $$
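The Euclidean distance described above can be sketched in a few lines of Python. This is only an illustration: the function name `euclidean_distance` and the sample points are assumptions, not part of the original slides.

```python
import math


def euclidean_distance(i, j):
    """Return the Euclidean distance between two points of equal dimension."""
    if len(i) != len(j):
        raise ValueError("points must have the same number of coordinates")
    # Square the coordinate-wise differences, sum them, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))


# The distance is symmetric: d(i, j) == d(j, i).
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(euclidean_distance((3, 4), (0, 0)))  # 5.0
```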

Cluster Evaluation
• Intra-cluster cohesion (compactness):
– Cohesion measures how near the data points in a cluster
are to the cluster centroid.
– Sum of squared error (SSE) is a commonly used
measure.
• Inter-cluster separation (isolation):
– Separation means that different cluster centroids should
be far away from one another.
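As a rough sketch of the two evaluation ideas above, the following Python computes SSE (cohesion) and the smallest distance between cluster centroids (one possible way to quantify separation). The helper names `centroid`, `sse`, and `min_centroid_separation`, and the sample clusters, are hypothetical; the slides do not prescribe a specific separation measure.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def sse(clusters):
    """Sum of squared distances from every point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(math.dist(p, c) ** 2 for p in points)
    return total


def min_centroid_separation(clusters):
    """Smallest pairwise distance between centroids (needs at least 2 clusters)."""
    centers = [centroid(points) for points in clusters]
    return min(
        math.dist(a, b)
        for idx, a in enumerate(centers)
        for b in centers[idx + 1:]
    )


clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(sse(clusters))                      # 4.0
print(min_centroid_separation(clusters))  # 10.0
```

A lower SSE indicates more compact clusters, and a larger centroid separation indicates better isolated ones.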

K-Means clustering
• K-means (MacQueen, 1967) is a partitional clustering
algorithm
• The k-means algorithm partitions the given data into
k clusters:
– Each cluster has a cluster center, called centroid.
– k is specified by the user

K-means algorithm
• Given k, the k-means algorithm works as follows:
1. Choose k (random) data points (seeds) to be the initial
centroids (cluster centers)
2. Assign each data point to the closest centroid
3. Re-compute the centroids using the current cluster
memberships
4. If a convergence criterion is not met, repeat steps 2 and 3
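The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice of "cluster memberships stop changing" as the convergence criterion, and the rule of keeping the old centroid when a cluster becomes empty are all assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    # Step 1: choose k random data points (seeds) as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each data point to the closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop when the memberships no longer change (assumed criterion).
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: re-compute each centroid as the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

Which cluster receives label 0 or 1 depends on the random seeds, so only the grouping of points is meaningful, not the label values.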

K-means clustering example: step 1
Choose k (random) data points (seeds) to be the initial centroids

K-means clustering example: step 2
Assign each data point to the closest centroid

Why use K-means?
• Strengths:
– Simple: easy to understand and to implement
– Efficient: time complexity is O(tkn), where
– n is the number of data points,
– k is the number of clusters, and
– t is the number of iterations.
– Since both k and t are usually small compared with n, k-means is considered a linear
algorithm.
• K-means is the most popular clustering algorithm.
• Note that it terminates at a local optimum if SSE is used.
Finding the global optimum is computationally hard (the problem is NP-hard in general).
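The local-optimum behaviour can be illustrated with a small hand-made example. In the sketch below, the same four points are clustered twice with k = 2, starting from different seed points; both runs converge, but to different SSE values. The function name `run_kmeans`, the explicit `seeds` argument, and the data set are assumptions chosen for illustration.

```python
import math


def run_kmeans(points, seeds, max_iter=100):
    """Run k-means from the given initial centroids; return (centroids, SSE)."""
    centroids = list(seeds)
    k = len(centroids)
    assignments = None
    for _ in range(max_iter):
        new = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new == assignments:  # memberships unchanged: converged
            break
        assignments = new
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))
    return centroids, sse


# Corners of a 4 x 1 rectangle: the natural clusters are the left and right pairs.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, sse_good = run_kmeans(points, seeds=[(0, 0), (4, 0)])  # one seed per side
_, sse_bad = run_kmeans(points, seeds=[(0, 0), (0, 1)])   # both seeds on the left

print(sse_good)  # 1.0
print(sse_bad)   # 16.0
```

A common remedy, not covered in the slides, is to run the algorithm from several random initializations and keep the result with the lowest SSE.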

Weaknesses of K-means
• The algorithm is only applicable if the mean is
defined.
– For categorical data, the k-modes variant can be used: the centroid is
represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
– Outliers are data points that are very far away
from other data points.
– Outliers could be errors in the data recording or
some special data points with very different values.
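Two of the weaknesses above can be made concrete with a short sketch: how a single outlier drags a mean-based centroid, and how a mode-based centroid can stand in for the mean on categorical data. The helper names `mean_centroid` and `mode_centroid` and all sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def mean_centroid(points):
    """Coordinate-wise mean, as used by k-means."""
    return tuple(mean(col) for col in zip(*points))


def mode_centroid(records):
    """Most frequent value per attribute, as used for categorical data."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(mean_centroid(cluster))                   # (1.5, 1.5)
# One far-away point pulls the centroid well outside the original group.
print(mean_centroid(cluster + [(50.0, 50.0)]))  # (11.2, 11.2)

# Categorical records have no mean, but each attribute has a most frequent value.
records = [("red", "small"), ("red", "large"), ("blue", "small")]
print(mode_centroid(records))                   # ('red', 'small')
```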

K-means summary
• Despite weaknesses, k-means is still the most
popular algorithm due to its simplicity and
efficiency
• No clear evidence that any other clustering
algorithm performs better in general
• Comparing different clustering algorithms is a
difficult task. No one knows the correct
clusters!

This document discusses various unsupervised machine learning clustering algorithms. It begins with an introduction to unsupervised learning and clustering. It then explains k-means clustering, hierarchical clustering, and DBSCAN clustering. For k-means and hierarchical clustering, it covers how they work, their advantages and disadvantages, and compares the two. For DBSCAN, it defines what it is, how it identifies core points, border points, and outliers to form clusters based on density.

Decision tree and random forestLippo Group Digital

The document discusses decision trees and random forest algorithms. It begins with an outline and defines the problem as determining target attribute values for new examples given a training data set. It then explains key requirements like discrete classes and sufficient data. The document goes on to describe the principles of decision trees, including entropy and information gain as criteria for splitting nodes. Random forests are introduced as consisting of multiple decision trees to help reduce variance. The summary concludes by noting out-of-bag error rate can estimate classification error as trees are added.

Communication network .pptNargis Ehsan

This document discusses different types of communication networks within an organization: 1. Internal communication occurs within the organization, such as between a teacher and student. 2. External communication links the organization to outside entities, like between heads of two organizations. 3. Vertical communication flows up and down the hierarchy, including downward communication from managers to subordinates and upward communication with feedback from subordinates to managers. 4. Horizontal communication occurs between peers of equal rank in an organization. 5. Diagonal communication combines vertical and horizontal flows across different levels and departments in an organization.

Naive BayesCloudxLab

- Naive Bayes is a classification technique based on Bayes' theorem that uses "naive" independence assumptions. It is easy to build and can perform well even with large datasets. - It works by calculating the posterior probability for each class given predictor values using the Bayes theorem and independence assumptions between predictors. The class with the highest posterior probability is predicted. - It is commonly used for text classification, spam filtering, and sentiment analysis due to its fast performance and high success rates compared to other algorithms.

Nursing technology informatics presentationLeeann Sills

Leeann Sills presents on the topic of nursing informatics. Nursing informatics involves using computers and information technology to support nursing practice, education, research, and administration. It aims to help manage and process nursing data and information to support decision making. Some key benefits include improved access to patient records, decreased data entry redundancy, and increased time for patient care through automation. Challenges include upfront costs, need for training, and ensuring privacy and security of patient information. Nursing informatics is still emerging but will continue growing in importance as health care delivery increasingly relies on technology.

Presentation on RoboticsMuhammad Awais

Unit 3 dsa LINKED LISTPUNE VIDYARTHI GRIHA'S COLLEGE OF ENGINEERING, NASHIK

The document discusses linked lists, which are a linear data structure consisting of nodes connected to each other via pointers. Each node contains data and a pointer to the next node. There are several types of linked lists including singly linked lists where each node has a next pointer, doubly linked lists where each node has next and previous pointers, and circular linked lists where the last node points to the first node. The document covers terminology, advantages and disadvantages, operations, and implementations of different types of linked lists such as dynamic vs static memory allocation and uses in applications.

Terminology of treeRacksaviR

Unsupervised learningamalalhait

This document discusses unsupervised learning approaches including clustering, blind signal separation, and self-organizing maps (SOM). Clustering groups unlabeled data points together based on similarities. Blind signal separation separates mixed signals into their underlying source signals without information about the mixing process. SOM is an algorithm that maps higher-dimensional data onto lower-dimensional displays to visualize relationships in the data.

supervised learningAmar Tripathi

This document provides an overview of machine learning concepts including supervised learning, unsupervised learning, and reinforcement learning. It explains that supervised learning involves learning from labeled examples, unsupervised learning involves categorizing without labels, and reinforcement learning involves learning behaviors to achieve goals through interaction. The document also discusses regression vs classification problems, the learning and testing process, and examples of machine learning applications like customer profiling, face recognition, and handwritten character recognition.

Machine learning clusteringCosmoAIMS Bassett

This document discusses machine learning concepts including supervised vs. unsupervised learning, clustering algorithms, and specific clustering methods like k-means and k-nearest neighbors. It provides examples of how clustering can be used for applications such as market segmentation and astronomical data analysis. Key clustering algorithms covered are hierarchy methods, partitioning methods, k-means which groups data by assigning objects to the closest cluster center, and k-nearest neighbors which classifies new data based on its closest training examples.

K Nearest NeighborsTilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL

The document discusses the K-nearest neighbors (KNN) algorithm, a simple machine learning algorithm used for classification problems. KNN works by finding the K training examples that are closest in distance to a new data point, and assigning the most common class among those K examples as the prediction for the new data point. The document covers how KNN calculates distances between data points, how to choose the K value, techniques for handling different data types, and the strengths and weaknesses of the KNN algorithm.

Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn

This document discusses support vector machines (SVM) and provides an example of using SVM for classification. It begins with common applications of SVM like face detection and image classification. It then provides an overview of SVM, explaining how it finds the optimal separating hyperplane between two classes by maximizing the margin between them. An example demonstrates SVM by classifying people as male or female based on height and weight data. It also discusses how kernels can be used to handle non-linearly separable data. The document concludes by showing an implementation of SVM on a zoos dataset to classify animals as crocodiles or alligators.

K means clusteringkeshav goyal

K-means clustering is an algorithm that groups data points into k clusters based on their attributes and distances from initial cluster center points. It works by first randomly selecting k data points as initial centroids, then assigning all other points to the closest centroid and recalculating the centroids. This process repeats until the centroids are stable or a maximum number of iterations is reached. K-means clustering is widely used for machine learning applications like image segmentation and speech recognition due to its efficiency, but it is sensitive to initialization and assumes spherical clusters of similar size and density.

Machine Learning-Linear regressionkishanthkumaar

Linear regression is a supervised machine learning technique used to model the relationship between a continuous dependent variable and one or more independent variables. It is commonly used for prediction and forecasting. The regression line represents the best fit line for the data using the least squares method to minimize the distance between the observed data points and the regression line. R-squared measures how well the regression line represents the data, on a scale of 0-100%. Linear regression performs well when data is linearly separable but has limitations such as assuming linear relationships and being sensitive to outliers and multicollinearity.

Ensemble methods in machine learningSANTHOSH RAJA M G

Ensemble methods combine multiple machine learning models to obtain better predictive performance than from any individual model. There are two main types of ensemble methods: sequential (e.g AdaBoost) where models are generated one after the other, and parallel (e.g Random Forest) where models are generated independently. Popular ensemble methods include bagging, boosting, and stacking. Bagging averages predictions from models trained on random samples of the data, while boosting focuses on correcting previous models' errors. Stacking trains a meta-model on predictions from other models to produce a final prediction.

Hierarchical ClusteringCarlos Castillo (ChaTo)

Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Simplilearn

This Naive Bayes Classifier tutorial presentation will introduce you to the basic concepts of Naive Bayes classifier, what is Naive Bayes and Bayes theorem, conditional probability concepts used in Bayes theorem, where is Naive Bayes classifier used, how Naive Bayes algorithm works with solved examples, advantages of Naive Bayes. By the end of this presentation, you will also implement Naive Bayes algorithm for text classification in Python. The topics covered in this Naive Bayes presentation are as follows: 1. What is Naive Bayes? 2. Naive Bayes and Machine Learning 3. Why do we need Naive Bayes? 4. Understanding Naive Bayes Classifier 5. Advantages of Naive Bayes Classifier 6. Demo - Text Classification using Naive Bayes - - - - - - - - Simplilearn’s Machine Learning course will make you an expert in Machine Learning, a form of Artificial Intelligence that automates data analysis to enable computers to learn and adapt through experience to do specific tasks without explicit programming. You will master Machine Learning concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, hands-on modeling to develop algorithms and prepare you for the role of Machine Learning Engineer Why learn Machine Learning? Machine Learning is rapidly being deployed in all kinds of industries, creating a huge demand for skilled professionals. The Machine Learning market size is expected to grow from USD 1.03 billion in 2016 to USD 8.81 billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period. You can gain in-depth knowledge of Machine Learning by taking our Machine Learning certification training course. With Simplilearn’s Machine Learning course, you will prepare for a career as a Machine Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. 
Those who complete the course will be able to: 1. Master the concepts of supervised, unsupervised and reinforcement learning concepts and modeling. 2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project. 3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning. 4. Understand the concepts and operation of support vector machines, kernel SVM, Naive Bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more. - - - - - - - -

KNNBhuvneshYadav13

The document discusses the K-nearest neighbors (KNN) algorithm, a supervised machine learning classification method. KNN classifies new data based on the labels of the k nearest training samples in feature space. It can be used for both classification and regression problems, though it is mainly used for classification. The algorithm works by finding the k closest samples in the training data to the new sample and predicting the label based on a majority vote of the k neighbors' labels.

K - Nearest neighbor ( KNN )Mohammad Junaid Khan

K means Clustering AlgorithmKasun Ranga Wijeweera

K-means clustering is an algorithm that groups data points into k number of clusters based on their similarity. It works by randomly selecting k data points as initial cluster centroids and then assigning each remaining point to the closest centroid. It then recalculates the centroids and reassigns points in an iterative process until centroids stabilize. While efficient, k-means clustering has weaknesses in that it requires specifying k, can get stuck in local optima, and is not suitable for non-convex shaped clusters or noisy data.

Machine Learning ClusteringRupak Roy

Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony

K mean-clusteringAfzaal Subhani

This document provides an introduction to k-means clustering, including: 1. K-means clustering aims to partition n observations into k clusters by minimizing the within-cluster sum of squares, where each observation belongs to the cluster with the nearest mean. 2. The k-means algorithm initializes cluster centroids and assigns observations to the nearest centroid, recomputing centroids until convergence. 3. K-means clustering is commonly used for applications like machine learning, data mining, and image segmentation due to its efficiency, though it is sensitive to initialization and assumes spherical clusters.

K Nearest Neighbor AlgorithmTharuka Vishwajith Sarathchandra

Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn

This document provides an overview of machine learning, including: - Machine learning allows computers to learn from data without being explicitly programmed, through processes like analyzing data, training models on past data, and making predictions. - The main types of machine learning are supervised learning, which uses labeled training data to predict outputs, and unsupervised learning, which finds patterns in unlabeled data. - Common supervised learning tasks include classification (like spam filtering) and regression (like weather prediction). Unsupervised learning includes clustering, like customer segmentation, and association, like market basket analysis. - Supervised and unsupervised learning are used in many areas like risk assessment, image classification, fraud detection, customer analytics, and more

Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn

This presentation on Machine Learning will help you understand what is clustering, K-Means clustering, flowchart to understand K-Means clustering along with demo showing clustering of cars into brands, what is logistic regression, logistic regression curve, sigmoid function and a demo on how to classify a tumor as malignant or benign based on its features. Machine Learning algorithms can help computers play chess, perform surgeries, and get smarter and more personal. K-Means & logistic regression are two widely used Machine learning algorithms which we are going to discuss in this video. Logistic Regression is used to estimate discrete values (usually binary values like 0/1) from a set of independent variables. It helps to predict the probability of an event by fitting data to a logit function. It is also called logit regression. K-means clustering is an unsupervised learning algorithm. In this case, you don't have labeled data unlike in supervised learning. You have a set of data that you want to group into and you want to put them into clusters, which means objects that are similar in nature and similar in characteristics need to be put together. This is what k-means clustering is all about. Now, let us get started and understand K-Means clustering & logistic regression in detail. Below topics are explained in this Machine Learning tutorial part -2 : 1. Clustering - What is clustering? - K-Means clustering - Flowchart to understand K-Means clustering - Demo - Clustering of cars based on brands 2. Logistic regression - What is logistic regression? - Logistic regression curve & Sigmoid function - Demo - Classify a tumor as malignant or benign based on features About Simplilearn Machine Learning course: A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. 
Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars.This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning. We recommend this Machine Learning training course for the following professionals in particular: 1. Developers aspiring to be a data scientist or Machine Learning engineer 2. Information architects who want to gain expertise in Machine Learning algorithms 3. Analytics professionals who want to work in Machine Learning or artificial intelligence 4. Graduates looking to build a career in data science and Machine Learning Learn more at: https://www.simplilearn.com/

Machine Learning (Classification Models)Makerere Unversity School of Public Health, Victoria University

MLT Unit4.pdfgmgkgmflbmrfmbrfmbfrmbofl;mb;lf1052LaxmanrajS

MLT Unit4.pdffdhngnrfgrgrfflmbpmpphfhbomf1052LaxmanrajS

More Related Content

What's hot (20)

Unsupervised learningamalalhait

supervised learningAmar Tripathi

Machine learning clusteringCosmoAIMS Bassett

K Nearest NeighborsTilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL

Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn

K means clusteringkeshav goyal

Machine Learning-Linear regressionkishanthkumaar

Ensemble methods in machine learningSANTHOSH RAJA M G

Hierarchical ClusteringCarlos Castillo (ChaTo)

Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Simplilearn

KNNBhuvneshYadav13

K - Nearest neighbor ( KNN )Mohammad Junaid Khan

K means Clustering AlgorithmKasun Ranga Wijeweera

Machine Learning ClusteringRupak Roy

Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony

K mean-clusteringAfzaal Subhani

K Nearest Neighbor AlgorithmTharuka Vishwajith Sarathchandra

Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn

Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn

Machine Learning (Classification Models)Makerere Unversity School of Public Health, Victoria University

Unsupervised learningamalalhait

supervised learningAmar Tripathi

Machine learning clusteringCosmoAIMS Bassett

K Nearest NeighborsTilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL

Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn

K means clusteringkeshav goyal

Machine Learning-Linear regressionkishanthkumaar

Ensemble methods in machine learningSANTHOSH RAJA M G

Hierarchical ClusteringCarlos Castillo (ChaTo)

Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Simplilearn

KNNBhuvneshYadav13

K - Nearest neighbor ( KNN )Mohammad Junaid Khan

K means Clustering AlgorithmKasun Ranga Wijeweera

Machine Learning ClusteringRupak Roy

Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony

K mean-clusteringAfzaal Subhani

K Nearest Neighbor AlgorithmTharuka Vishwajith Sarathchandra

Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn

Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn

Machine Learning (Classification Models)Makerere Unversity School of Public Health, Victoria University

Similar to Unsupervised learning clustering (20)

MLT Unit4.pdfgmgkgmflbmrfmbrfmbfrmbofl;mb;lf1052LaxmanrajS

MLT Unit4.pdffdhngnrfgrgrfflmbpmpphfhbomf1052LaxmanrajS

Ensemble_instance_unsupersied_learning 01_02_2024.pptxvigneshmatta2004

K-means ClusteringJidhu Mohan M

K_means ppt in machine learning conceptsUdayNani14

Unsupervised learning and clustering.pdfofficialnovice7

15857 cse422 unsupervised-learningAnil Yadav

This document provides an overview of supervised and unsupervised learning, with a focus on clustering as an unsupervised learning technique. It describes the basic concepts of clustering, including how clustering groups similar data points together without labeled categories. It then covers two main clustering algorithms - k-means, a partitional clustering method, and hierarchical clustering. It discusses aspects like cluster representation, distance functions, strengths and weaknesses of different approaches. The document aims to introduce clustering and compare it with supervised learning.

Unsupervised Learning in Machine LearningPyingkodi Maran

The document discusses various unsupervised learning techniques including clustering algorithms like k-means, k-medoids, hierarchical clustering and density-based clustering. It explains how k-means clustering works by selecting initial random centroids and iteratively reassigning data points to the closest centroid. The elbow method is described as a way to determine the optimal number of clusters k. The document also discusses how k-medoids clustering is more robust to outliers than k-means because it uses actual data points as cluster representatives rather than centroids.

Neural nw k meansEng. Dr. Dennis N. Mwighusa

k-Means is a rather simple but well known algorithms for grouping objects, clustering. Again all objects need to be represented as a set of numerical features. In addition the user has to specify the number of groups (referred to as k) he wishes to identify. Each object can be thought of as being represented by some feature vector in an n dimensional space, n being the number of all features used to describe the objects to cluster. The algorithm then randomly chooses k points in that vector space, these point serve as the initial centers of the clusters. Afterwards all objects are each assigned to center they are closest to. Usually the distance measure is chosen by the user and determined by the learning task. After that, for each cluster a new center is computed by averaging the feature vectors of all objects assigned to it. The process of assigning objects and recomputing centers is repeated until the process converges. The algorithm can be proven to converge after a finite number of iterations. Several tweaks concerning distance measure, initial center choice and computation of new average centers have been explored, as well as the estimation of the number of clusters k. Yet the main principle always remains the same. In this project we will discuss about K-means clustering algorithm, implementation and its application to the problem of unsupervised learning

PPT s10-machine vision-s2Binus Online Learning

This document discusses unsupervised machine learning techniques for clustering unlabeled data. It covers k-means clustering, which partitions data into k groups based on minimizing distance between points and cluster centroids. It also discusses agglomerative hierarchical clustering, which successively merges clusters based on their distance. As an example, it shows hierarchical clustering of texture images from five classes to group similar textures.

CSA 3702 machine learning module 3Nandhini S

This document provides an overview of clustering and k-means clustering algorithms. It begins by defining clustering as the process of grouping similar objects together and dissimilar objects separately. K-means clustering is introduced as an algorithm that partitions data points into k clusters by minimizing total intra-cluster variance, iteratively updating cluster means. The k-means algorithm and an example are described in detail. Weaknesses and applications are discussed. Finally, vector quantization and principal component analysis are briefly introduced.

clustering using different methods in .pdfofficialnovice7

Unsupervised Learning.pptxGandhiMathy6

machine learning - Clustering in RSudhakar Chavan

The document provides an overview of clustering methods and algorithms. It defines clustering as the process of grouping objects that are similar to each other and dissimilar to objects in other groups. It discusses existing clustering methods like K-means, hierarchical clustering, and density-based clustering. For each method, it outlines the basic steps and provides an example application of K-means clustering to demonstrate how the algorithm works. The document also discusses evaluating clustering results and different measures used to assess cluster validity.

Types of clustering and different types of clustering algorithmsPrashanth Guntal



Unsupervised learning clustering

1. CH -10 Unsupervised Learning and Clustering By: Arshad Farhad 20177716

2. Contents
• Supervised vs Unsupervised learning
• Introduction to clustering
• K-means Clustering
• Hierarchical clustering
• Conclusion

3. Supervised Vs Unsupervised Learning
• Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output: Y = f(X)
• The goal is to approximate the mapping function so well that, given new input data (X), you can predict the output variable (Y) for that data.
• Unsupervised learning is where you have only input data (X) and no corresponding output variables.
• The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about the data.
• Unsupervised learning problems can be further grouped into clustering and association problems:
– Clustering
– Association

4. What is clustering?
• The organization of unlabeled data into similarity groups called clusters.
• A cluster is a collection of data items that are “similar” to one another and “dissimilar” to data items in other clusters.

5. What do we need for clustering?

6. Distance (dissimilarity) measures
• The Euclidean distance between points i and j is the length of the line segment connecting them.
• In Cartesian coordinates, if i = (i1, i2, …, in) and j = (j1, j2, …, jn), then the distance d from i to j, or from j to i, is given by:
$$d(i, j) = \sqrt{\sum_{k=1}^{n} (i_k - j_k)^2}$$
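The distance definition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck; the function name `euclidean_distance` is a hypothetical choice, and points are assumed to be equal-length sequences of numbers.

```python
import math


def euclidean_distance(i, j):
    """Return the Euclidean distance between two points of equal dimension.

    The name and signature are illustrative assumptions, not from the slides.
    """
    if len(i) != len(j):
        raise ValueError("points must have the same number of coordinates")
    # Square the coordinate-wise differences, sum them, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))


print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

The measure is symmetric, so swapping the two arguments gives the same value.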

7. Cluster Evaluation
• Intra-cluster cohesion (compactness):
– Cohesion measures how near the data points in a cluster are to the cluster centroid.
– Sum of squared error (SSE) is a commonly used measure.
• Inter-cluster separation (isolation):
– Separation means that different cluster centroids should be far away from one another.
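Both evaluation criteria can be computed directly once the clusters are known. The sketch below is an illustration under stated assumptions: clusters are given as lists of coordinate tuples, separation is summarized as the smallest distance between any two centroids (one of several possible choices), and the helper names `centroid`, `sse`, and `min_centroid_separation` are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(clusters):
    """Cohesion: sum of squared distances from each point to its own centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total


def min_centroid_separation(clusters):
    """Separation: smallest distance between two centroids (needs >= 2 clusters)."""
    cents = [centroid(points) for points in clusters]
    return min(
        math.dist(a, b)
        for idx, a in enumerate(cents)
        for b in cents[idx + 1:]
    )


clusters = [
    [(0.0, 0.0), (0.0, 2.0)],    # centroid (0, 1)
    [(10.0, 0.0), (10.0, 2.0)],  # centroid (10, 1)
]
print(sse(clusters))                      # 4.0
print(min_centroid_separation(clusters))  # 10.0
```

A lower SSE indicates more compact clusters, and a larger centroid distance indicates better isolated ones.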

8. How many clusters?

9. Clustering Techniques

10. Clustering Techniques

11. Clustering Techniques Divisive K-means

12. K-Means clustering
• K-means (MacQueen, 1967) is a partitional clustering algorithm.
• The k-means algorithm partitions the given data into k clusters:
– Each cluster has a cluster center, called the centroid.
– k is specified by the user.

13. K-means algorithm
• Given k, the k-means algorithm works as follows:
1. Choose k (random) data points (seeds) to be the initial centroids (cluster centers).
2. Assign each data point to the closest centroid.
3. Re-compute the centroids using the current cluster memberships.
4. If a convergence criterion is not met, repeat steps 2 and 3.
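The four steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation. Several details are assumptions not taken from the slides: the function name `kmeans`, the `max_iters` and `seed` parameters, using "no assignment changed" as the convergence criterion, and keeping the previous centroid when a cluster becomes empty.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of coordinate tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),  # one tight group
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),  # another tight group
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this small data set the first three points end up sharing one label and the last three the other. Which group receives index 0 depends on the random seeds.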

14. K-means clustering example: step 1 Choose k (random)

15. K-means clustering example – step 2 Assign each data point to the closest centroid

16. K-means clustering example – step 3

17. K-means clustering example

18. K-means clustering example

19. K-means clustering example

20. Why use K-means?
• Strengths:
– Simple: easy to understand and to implement.
– Efficient: time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
– Since both k and t are small, k-means is considered a linear algorithm.
• K-means is the most popular clustering algorithm.
• Note that it terminates at a local optimum if SSE is used. The global optimum is hard to find due to complexity.
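The note about local optima can be made concrete with a small experiment. The sketch below is an illustration under assumptions: the helper `lloyd` (a hypothetical name) runs the assign/recompute loop from explicitly supplied initial centroids instead of random seeds, and the four-point data set is invented for the demonstration. Two different starting choices converge to two different SSE values.

```python
import math


def lloyd(points, centroids, max_iters=100):
    """Run k-means from the given initial centroids; return (labels, sse)."""
    centroids = list(centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assign each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, labels))
    return labels, sse


# Corners of a 4 x 1 rectangle: the natural split is left pair vs right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Both seeds on the left edge: converges to a bottom/top split.
_, sse_bad = lloyd(points, [(0.0, 0.0), (0.0, 1.0)])
# One seed on each side: converges to the left/right split.
_, sse_good = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])

print(sse_bad, sse_good)  # 16.0 1.0
```

Both runs satisfy the convergence criterion, yet only the second reaches the lower SSE. A common mitigation is to run the algorithm from several initializations and keep the result with the smallest SSE.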

21. Weaknesses of K-means
• The algorithm is only applicable if the mean is defined.
– For categorical data, k-modes can be used instead: the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
– Outliers are data points that are very far away from other data points.
– Outliers could be errors in the data recording or some special data points with very different values.
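Two of these weaknesses are easy to see numerically. The sketch below is an illustration with invented data. The first part shows how a single outlier drags a mean-based centroid while the median barely moves. The second part shows a mode-based "centroid" for categorical records, in the spirit of k-modes. The helper name `mode_centroid` is a hypothetical choice.

```python
import statistics
from collections import Counter

# Part 1: effect of one outlier on the mean versus the median.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = values + [100.0]

mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)
print(mean_shift)    # the mean moves by more than 16
print(median_shift)  # 0.5


# Part 2: a centroid for categorical data, built from most frequent values.
def mode_centroid(records):
    """Return the most frequent value of each attribute across the records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


records = [("red", "small"), ("red", "large"), ("blue", "small")]
print(mode_centroid(records))  # ('red', 'small')
```

Because the centroid in k-means is a mean, one extreme value can pull it far from the bulk of its cluster, which is why outlier removal or a more robust center is often considered.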

22. K-means summary
• Despite its weaknesses, k-means is still the most popular algorithm due to its simplicity and efficiency.
• There is no clear evidence that any other clustering algorithm performs better in general.
• Comparing different clustering algorithms is a difficult task. No one knows the correct clusters!

23. Thank You!
