Machine Learning with Clustering: A Visual Guide for Beginners with Examples in Python
By Artem Kovera
About this ebook
Machine Learning Made Easy to Understand with Clustering Algorithms.
Clustering algorithms are commonly used in a variety of applications. There are four major tasks for clustering:
- Simplifying data for further processing. In this case, the data is split into groups that are then processed individually. In business, for instance, we can use cluster analysis to find groups of customers who share similar features, and then develop a separate marketing strategy for each group. Or we can cluster a marketplace in a specific niche to find which kinds of products sell better than others, and use that to decide what to produce. Clustering is usually one of the first techniques used to explore a new dataset and get a sense of its structure.
- Compressing the data. We can run cluster analysis on a very large dataset and then pick just a few items from each cluster. We usually lose much less information this way than when we pick data points without clustering first. Clustering algorithms are used to compress not only large datasets but also relatively small objects such as images.
- Picking out unusual data points from the dataset. This is done, for example, to detect fraudulent credit card transactions. In medicine, similar procedures can be used to identify new forms of illnesses.
- Building a hierarchy of objects. This is used, for instance, to classify biological organisms. Search engines also apply it to group the text documents in their datasets.
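As a rough illustration of the compression task above, here is a minimal sketch in Python 3 that uses only the standard library. It clusters a toy 2-D dataset with a small hand-written k-means and keeps a few points per cluster. The choice of k-means, the toy data, the helper names (`sqdist`, `nearest`, `kmeans`, `compress`), and the rule of keeping the points nearest each centroid are all assumptions made for this sketch, not the book's own code.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    # Final assignment so the labels match the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def compress(points, k, per_cluster=3, seed=0):
    """Keep at most `per_cluster` representative points from each cluster."""
    centroids, labels = kmeans(points, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Assumption of this sketch: the representatives are the
        # members closest to the cluster centroid.
        members.sort(key=lambda p: sqdist(p, centroids[c]))
        kept.extend(members[:per_cluster])
    return kept


# Toy dataset (hypothetical): three noisy blobs of 50 points each.
rng = random.Random(0)
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    for _ in range(50)
]

kept = compress(points, k=3, per_cluster=3)
print(len(points), "points reduced to", len(kept))
```

The same pattern works with any clustering method: cluster first, then sample within each cluster, so that every group in the data stays represented in the reduced set.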
In the introductory chapter, you will find:
- Different types of machine learning;
- Features in datasets;
- Dimensionality of datasets;
- The ‘curse’ of dimensionality;
- Dealing with underfitting and overfitting.
In the following chapters, we will put these concepts into practice by working with clustering algorithms.
This e-book provides detailed explanations, with visual representations, of several widely used clustering approaches:
- Hierarchical agglomerative clustering;
- K-means;
- DBSCAN;
- Neural network-based clustering.
You will learn the strengths and weaknesses of these algorithms, as well as practical strategies for overcoming the weaknesses. In addition, we will briefly touch upon some other clustering methods.
This book focuses mostly on how the algorithms work behind the scenes, but it also includes some code. The examples are written in Python 3.
Introduction to machine learning and clustering
Hierarchical clustering
1. The main idea and advantages/disadvantages of the algorithm
2. Different metrics for computing the distance between clusters
3. Hierarchical agglomerative clustering using the SciPy library
K-means algorithm
1. The major principles of the algorithm
2. Implementing k-means using the Scikit-learn library
3. The disadvantages of k-means and methods to overcome them
4. Introduction to the expectation–maximization (EM) algorithm
DBSCAN
1. The major principles of the algorithm
2. Implementing DBSCAN using the Scikit-learn library
3. Advantages and disadvantages of DBSCAN
Neural network-based clustering
1. General idea of clustering using artificial neural networks
2. Constructing a simple neural net for clustering using NumPy arrays
3. Introduction to self-organizing maps
Introduction to machine learning and clustering
The amount of data in digital format has been growing exponentially over the last few decades, and this trend is very likely to continue. Data is now a very valuable resource. Most companies collect data extensively, and some even sell it to other companies. In the near future, the success of a business will probably be determined largely by how efficiently it works with large amounts of data. But the data deluge is not limited to business; it affects many other areas too, such as science, education, medicine, and government.
The discipline specifically designed to work with all sorts of data has been known for a long time. It is called statistics. However, traditional statistical approaches cannot be applied successfully to the large amounts of data we encounter today without the help of computers. This is where our hero, machine learning, comes to the rescue. Machine learning can be viewed as a form of applied statistics that solves various optimization problems using computer algorithms. An algorithm is just a set of instructions.
Even more importantly, machine learning gives computer programs another remarkable ability: to adapt to changes in an environment and learn from experience. This is the most important feature of machine learning.
In traditional programming, we explicitly give a computer a set of instructions to execute. In machine learning, we also give the computer instructions (a machine learning algorithm), but the algorithm builds a model from the data it is given, and this model can then make predictions on new data. That is how learning from previous experience comes into play. As the algorithm gets more data,