This document summarizes key points from a textbook on machine learning with Python. It introduces machine learning and supervised learning algorithms. Machine learning involves extracting knowledge from data and can be used for applications like recommendations, predictions, science, and more. Supervised learning algorithms learn from input-output pairs provided by a teacher to make predictions on new examples. Collecting good training data varies by task and may involve labor, expertise, or simply waiting for real world data.
1 Introduction
Introduction
APT 3025: APPLIED MACHINE LEARNING
Notes taken from Introduction to Machine Learning with Python by Andreas C. Müller & Sarah Guido, O'Reilly Media, 2016

Course Overview
• This course introduces machine learning from a practical, hands-on perspective.
• We will use Python and the scikit-learn library for our practical exercises.
• The course text is Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido.

What is Machine Learning?
• Machine learning is about extracting knowledge from data.
• It is a field at the intersection of statistics, artificial intelligence, and computer science.
• It is also known as predictive analytics or statistical learning.
• The application of machine learning methods has in recent years become ubiquitous in everyday life.

Commercial Applications
• Automatic recommendations of which videos to watch on YouTube
• What food to order through Jumia Food, Glovo or Uber Eats
• Which books to buy on Amazon
• What to pay for your Uber ride (dynamic pricing)
• Recognizing people in photographs posted on Facebook
• What stories and ads appear in your feed on social media

Applications in Science
• Outside of commercial applications, machine learning has had a tremendous influence on the way data-driven research is done today.
• The tools introduced in this course have been applied to diverse scientific problems such as:
  • understanding stars
  • finding distant planets
  • discovering new particles
  • analyzing DNA sequences
  • providing personalized medical treatments

Applications in the Humanities
• Natural language processing (computational linguistics)
• Analyzing ancient texts to predict authorship
• Reconstructing ancient ruins in digital form
• Repairing damaged ancient writings or photographs
• Animating ancient portraits

Why Machine Learning?
• In the early days of "intelligent" applications, many systems used handcoded if-else rules to process data or adjust to user input.
• Think how a spam filter could be written using this technique.
• You could make up a blacklist of words that would result in an email being marked as spam.
• This would be an example of using expert knowledge to design a rule-based "intelligent" application.

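The blacklist idea can be sketched in a few lines of Python. This is only a minimal illustration: the blacklisted words and the function name `is_spam` are hypothetical and do not come from the course text.

```python
# Hypothetical blacklist; the words are invented for illustration.
BLACKLIST = {"lottery", "winner", "viagra"}


def is_spam(email_text):
    """Mark an email as spam if it contains any blacklisted word."""
    words = email_text.lower().split()
    # Strip common punctuation so that "winner!" still matches "winner".
    return any(word.strip(".,!?:;") in BLACKLIST for word in words)


print(is_spam("You are a WINNER! Claim your lottery prize."))  # True
print(is_spam("Meeting moved to 3pm tomorrow."))               # False
```

Every rule here is fixed by hand, so the filter only knows about the words its author thought of.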
Why Machine Learning?
• Using handcoded rules to make decisions has two major disadvantages:
  • The logic required to make a decision is specific to a single domain and task. Changing the task even slightly might require a rewrite of the whole system.
  • Designing rules requires a deep understanding of how a decision should be made by a human expert, so the people crafting the rules must understand the problem well.

Why Machine Learning?
• One example where this handcoded approach will fail is in detecting faces in images.
• Today, every smartphone can detect a face in an image. However, face detection was an unsolved problem until as recently as 2001.
• The main problem is that the way in which pixels (which make up an image in a computer) are "perceived" by the computer is very different from how humans perceive a face.

Why Machine Learning?
• This difference in representation makes it basically impossible for a human to come up with a good set of rules to describe what constitutes a face in a digital image.
• Using machine learning, however, simply presenting a program with a large collection of images of faces is enough for an algorithm to determine what characteristics are needed to identify a face.

Problems Machine Learning Can Solve
• The most successful kinds of machine learning algorithms are those that automate decision-making processes by generalizing from known examples.
• In this setting, known as supervised learning, the user provides the algorithm with pairs of inputs and desired outputs.
• The algorithm finds a way to produce the desired output given a new, unseen input.

Spam Classification
• Going back to our example of spam classification, using machine learning, the user provides the algorithm with a large number of emails (the input), together with information about whether any of these emails are spam (the desired output).
• Given a new email, the algorithm will then produce a prediction as to whether the new email is spam.

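As a rough sketch of this workflow, the following uses scikit-learn to learn from a handful of labeled emails and then predict a label for a new one. The emails and labels are invented, and the choice of a bag-of-words representation with a naive Bayes classifier is an assumption made for illustration; the slide does not name a particular algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy data: the inputs (emails) and the desired outputs
# (1 = spam, 0 = not spam).
emails = [
    "win a free prize now",
    "claim your free lottery money",
    "meeting agenda for monday",
    "lunch with the project team tomorrow",
]
labels = [1, 1, 0, 0]

# Turn each email into a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(emails)

# Learn from the input/output pairs.
model = MultinomialNB()
model.fit(X_train, labels)

# Produce a prediction for a new, unseen email.
new_email = ["claim your free prize now"]
prediction = model.predict(vectorizer.transform(new_email))
print(prediction)
```

A realistic spam filter would need far more than four training emails; the point is only the shape of the process: inputs plus desired outputs go in, and a model that can label new inputs comes out.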
Supervised Learning Algorithms
• Machine learning algorithms that learn from input/output pairs are called supervised learning algorithms because a "teacher" provides supervision to the algorithms in the form of the desired outputs for each example that they learn from.
• While creating a dataset of inputs and outputs is often a laborious manual process, supervised learning algorithms are well understood and their performance is easy to measure.

Examples of Supervised Learning Tasks
• Identifying the postal code from handwritten digits on an envelope
• Here the input is a scan of the handwriting, and the desired output is the actual digits in the postal code.
• To create a dataset for building a machine learning model, you need to collect many envelopes. Then you can read the postal codes yourself and store the digits as your desired outcomes.

Examples of Supervised Learning Tasks
• Determining whether a tumor is benign based on a medical image
• Here the input is the image, and the output is whether the tumor is benign.
• To create a dataset for building a model, you need a database of medical images.
• You also need an expert opinion, so a doctor needs to look at all of those images and decide which tumors are benign and which are not.

Examples of Supervised Learning Tasks
• Detecting fraudulent activity in credit card transactions
• Here the input is a record of a credit card transaction, and the output is whether it is likely to be fraudulent or not.
• Collecting a dataset means storing all transactions and recording if a user reports any transaction as fraudulent.

Data Collection Methods Vary
• While reading envelopes is laborious, it is easy and cheap.
• Obtaining medical imaging and diagnoses, on the other hand, requires not only expensive machinery but also rare and expensive expert knowledge, not to mention the ethical concerns and privacy issues.
• In the example of detecting credit card fraud, data collection is much simpler. Your customers will provide you with the desired output, as they will report fraud. All you have to do to obtain the input/output pairs of fraudulent and non-fraudulent activity is wait.

Unsupervised Algorithms
• Unsupervised algorithms are the other type of algorithm that we will cover in this course.
• In unsupervised learning, only the input data is known, and no known output data is given to the algorithm.
• While there are many successful applications of these methods, they are usually harder to understand and evaluate.

Examples of Unsupervised Learning Tasks
• Identifying topics in a set of blog posts
• If you have a large collection of text data, you might want to summarize it and find prevalent themes in it. You might not know beforehand what these topics are, or how many topics there might be. Therefore, there are no known outputs.

Examples of Unsupervised Learning Tasks
• Segmenting customers into groups with similar preferences
• Given a set of customer records, you might want to identify which customers are similar, and whether there are groups of customers with similar preferences.
• For a shopping site, there might be "parents", "bookworms", or "gamers".
• Because you don't know in advance what these groups might be, or even how many there are, you have no known outputs.

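To make the customer segmentation example concrete, here is a minimal sketch that groups customers without any known outputs. The customer records are invented, and k-means clustering with two clusters is an assumption chosen for illustration; the slide does not prescribe an algorithm, and in practice the number of groups is not known in advance.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented customer records: [books bought per year, games bought per year].
customers = np.array([
    [20, 0], [18, 1], [22, 2],   # customers who mostly buy books
    [1, 15], [0, 18], [2, 20],   # customers who mostly buy games
])

# n_clusters=2 is an assumption made for this toy data set.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
group_ids = kmeans.fit_predict(customers)

# One group id per customer; the ids themselves carry no meaning,
# only which customers share an id.
print(group_ids)
```

The algorithm only reports which customers belong together. Naming a group "bookworms" or "gamers" is left to whoever inspects the result.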
Examples of Unsupervised Learning Tasks
• Detecting abnormal access patterns to a website
• To identify abuse or bugs, it is often helpful to find access patterns that are different from the norm.
• Each abnormal pattern might be very different.
• Because you only observe traffic, and you don't know what constitutes normal and abnormal behavior, this is an unsupervised problem.

Data Representation
• For both supervised and unsupervised learning tasks, it is important to have a representation of your input data that a computer can understand.
• Often it is helpful to think of your data as a table. Each point that you want to reason about (each email, each customer, each transaction) is a row, and each property that describes that data point (say, the age of a customer or the amount or location of a transaction) is a column.

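The table view maps directly onto a two-dimensional array. The following sketch uses invented customer data, with one row per customer and one column per property.

```python
import numpy as np

# Invented data: each row is one customer, each column one property.
# Columns: age, number of purchases.
X = np.array([
    [25, 3],
    [41, 12],
    [33, 7],
])

n_rows, n_columns = X.shape
print(n_rows, n_columns)  # 3 2
```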
Data Representation
• You might describe users by their age, their gender, when they created an account, and how often they have bought from your online shop.
• You might describe the image of a tumor by the grayscale values of each pixel, or maybe by using the size, shape, and color of the tumor.
• Each entity or row here is known as a sample (or data point) in machine learning, while the columns (the properties that describe these entities) are called features.
• Building a good representation of your data, which is called feature extraction or feature engineering, is critical to the success of a machine learning system.

Why Python?
• Python has become the language of choice for many data science applications.
• It combines the power of general-purpose programming languages with the ease of use of domain-specific scripting languages like MATLAB or R.
• Python has libraries for data loading, visualization, statistics, natural language processing, image processing, and more.
• This vast toolbox provides data scientists with a large array of general- and special-purpose functionality.

Why Python?
• One of the main advantages of using Python is the ability to interact directly with the code, using a terminal or other tools like the Jupyter Notebook, which we’ll look at shortly.
• Machine learning and data analysis are fundamentally iterative processes.
• It is essential for these processes to have tools that allow quick iteration and easy interaction.

What is scikit-learn?
• scikit-learn is an open source project, meaning that it is free to use and distribute.
• The scikit-learn project is constantly being developed and improved, and it has a very active user community. It contains a number of state-of-the-art machine learning algorithms, as well as comprehensive documentation about each algorithm.
• scikit-learn is a very popular tool, and the most prominent Python library for machine learning. It is widely used in industry and academia, and a wealth of tutorials and code snippets are available online.

Other Libraries
• In addition to scikit-learn, we will need the following libraries:
  • numpy
  • scipy
  • matplotlib
  • jupyter
  • pandas
  • pillow
• The easiest way to get all these libraries (and many more) is to install Anaconda (www.anaconda.com), a fully featured scientific computing platform.

Jupyter Notebook
• The Jupyter Notebook is an interactive environment for running code in the browser.
• It is a great tool for exploratory data analysis and is widely used by data scientists.
• While the Jupyter Notebook supports many programming languages, we only need the Python support.
• The Jupyter Notebook makes it easy to incorporate code, text, and images.

JupyterLab
• JupyterLab is the next-generation Jupyter Notebook platform. It adds many improvements and new features, including:
  • The ability to generate a table of contents for ease of navigating the notebook
  • A visual debugger
• To launch JupyterLab, type the following at the command prompt:
  jupyter lab

NumPy
• NumPy is one of the fundamental packages for scientific computing in Python.
• It contains functionality for multidimensional arrays, high-level mathematical functions such as linear algebra operations and the Fourier transform, and pseudorandom number generators.

NumPy and scikit-learn
• In scikit-learn, the NumPy array is the fundamental data structure.
• scikit-learn takes in data in the form of NumPy arrays.
• Any data you’re using will have to be converted to a NumPy array.
• The core functionality of NumPy is the ndarray class, a multidimensional (n-dimensional) array.
• All elements of the array must be of the same type.

Creating a NumPy Array

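The code shown on this slide is not part of the text, so the following is only a minimal sketch of how a two-dimensional NumPy array can be created from nested Python lists. The values are arbitrary.

```python
import numpy as np

# Build a 2D array (2 rows, 3 columns) from a nested list.
x = np.array([[1, 2, 3], [4, 5, 6]])

print("x:\n{}".format(x))
print("shape:", x.shape)   # (2, 3)
print("dtype:", x.dtype)   # a single integer type shared by all elements
```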
SciPy
• SciPy is a collection of functions for scientific computing in Python. It provides, among other functionality, advanced linear algebra routines, mathematical function optimization, signal processing, special mathematical functions, and statistical distributions.
• scikit-learn draws from SciPy’s collection of functions for implementing its algorithms.

Creating Sparse Matrices Using SciPy
• The most important part of SciPy for us is scipy.sparse: this provides sparse matrices, which are another representation that is used for data in scikit-learn.
• Sparse matrices are used whenever we want to store a 2D array that contains mostly zeros.

The Compressed Sparse Row (CSR) Format

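The code for this slide is not included in the text. As a minimal sketch, and assuming a 4×4 identity matrix as the example data, a dense NumPy array can be converted to CSR format with `scipy.sparse.csr_matrix`, which stores only the nonzero entries.

```python
import numpy as np
from scipy import sparse

# A 2D array with ones on the diagonal and zeros everywhere else.
eye = np.eye(4)

# Convert the dense array to a sparse matrix in CSR format.
sparse_matrix = sparse.csr_matrix(eye)

print("Stored nonzero entries:", sparse_matrix.nnz)  # 4
print(sparse_matrix)
```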
The Coordinate (COO) Format
• Usually it is not possible to create dense representations of sparse data (as they would not fit into memory), so we need to create sparse representations directly.
• The same sparse matrix as before can also be created directly in the COO format.

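A sparse matrix can be built directly from its nonzero values and their coordinates, without ever creating the dense array. The sketch below assumes, for illustration, that the matrix in question is a 4×4 identity matrix; the variable names are arbitrary.

```python
import numpy as np
from scipy import sparse

# The nonzero values and the (row, column) position of each one.
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)

# Build the matrix in COO format from (values, (rows, columns)).
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))

print(eye_coo.toarray())
```

Calling `toarray()` is only reasonable here because the matrix is tiny; for genuinely large sparse data the dense form is exactly what you are trying to avoid.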
matplotlib
• matplotlib is the primary scientific plotting library in Python.
• It provides functions for making publication-quality visualizations such as line charts, histograms, scatter plots, and so on.
• Visualizing your data and different aspects of your analysis can give you important insights and provide guidance on needed adjustments.

Using matplotlib

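The slide's code is not preserved in the text. As a minimal sketch of a line chart, the following plots a sine curve; the choice of function, range, and marker style is an assumption made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced points between -10 and 10.
x = np.linspace(-10, 10, 100)
y = np.sin(x)

# Draw y against x as a line chart, marking each point with an "x".
fig, ax = plt.subplots()
ax.plot(x, y, marker="x")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
```

In a Jupyter notebook the figure is typically displayed when the cell finishes. In a plain Python script, call `plt.show()` afterwards to open the plot window.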
Output
[Figure: the plot produced by the matplotlib example]

pandas
• pandas is a Python library for data wrangling and analysis.
• It is built around a data structure called the DataFrame, which is a table, similar to an Excel spreadsheet.
• pandas provides methods to modify and operate on this table; in particular, it allows SQL-like queries and joins of tables.

pandas
• In contrast to NumPy, pandas allows each column to have a separate type (for example, integers, dates, floating-point numbers, and strings).
• pandas can ingest from a great variety of file formats and databases, like SQL, Excel files, and comma-separated values (CSV) files.

Using pandas

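The slide's code is not included in the text. A minimal sketch of creating a DataFrame from a dictionary might look like this; the names, locations, and ages are invented sample data.

```python
import pandas as pd

# Invented sample data: one dictionary key per column.
data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Location": ["New York", "Paris", "Berlin", "London"],
    "Age": [24, 13, 53, 33],
}

data_pandas = pd.DataFrame(data)

print(data_pandas)
# Each column has its own type: text for Name and Location, integers for Age.
print(data_pandas.dtypes)
```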
Querying the Table
• There are several possible ways to query the table.

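Two common ways to select rows are boolean indexing and the `DataFrame.query` method. The sketch below uses an invented table of people and selects everyone older than 30; the data and variable names are illustrative only.

```python
import pandas as pd

# Invented sample table.
people = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [24, 13, 53, 33],
})

# Boolean indexing: keep the rows where the condition is True.
older_than_30 = people[people["Age"] > 30]

# The same selection written as a query string.
same_result = people.query("Age > 30")

print(older_than_30)
```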
mglearn
• mglearn is a library provided by the authors of the course text along with the example code that accompanies the book.
• The code is available for download at https://github.com/amueller/introduction_to_ml_with_python.
• Once you download it, you can run the examples, and mglearn is already in the right place.
• If you want to use mglearn for your own exercises, you will need to install it, e.g. by typing at the command prompt:
  pip install mglearn

Imports
• Throughout the course we will make use of NumPy, matplotlib and pandas. All the code will assume the following imports:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mglearn
from IPython.display import display