3.8 Supervised Learning With Python A
Supervised Learning with Python

Engr. Elisa G. Eleazar


School of Chemical, Biological, and Materials Engineering and Sciences

DS100: APPLIED DATA SCIENCE 1


Outline
SUPERVISED LEARNING IN PYTHON
• Classification
• Regression

Module 3.8: Learning Outcomes
1. Define Machine Learning and differentiate the types
2. Differentiate Classification from Regression
3. Write Python code for Classification and Regression problems



Supervised Learning
MACHINE LEARNING
• the science and art of giving computers the ability to learn to make decisions from data without being explicitly programmed
• Supervised Learning: uses labeled data (ex: learning to predict whether an email is spam or not)
• Unsupervised Learning: uses unlabeled data (ex: clustering Wikipedia entries into categories)
• Reinforcement Learning: machines or software agents interact with an environment

SUPERVISED LEARNING
• the aim is to build a model that is able to predict the target variable given the predictor variables
• Independent variable: also called features or predictor variables
• Dependent variable: also called target or response variable
• Classification: the target variable consists of categories
• Regression: the target is a continuous variable



Supervised Learning
Predictor variables: sepal length, sepal width, petal length, petal width
Target variable: species

     sepal length (cm)   sepal width (cm)   petal length (cm)   petal width (cm)   species
0    5.1                 3.5                1.4                 0.2                setosa
1    4.9                 3.0                1.4                 0.2                setosa
2    4.7                 3.2                1.3                 0.2                setosa
3    4.6                 3.1                1.5                 0.2                setosa
4    5.0                 3.6                1.4                 0.2                setosa
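
The rows shown match the first rows of the Iris dataset that ships with scikit-learn. As a minimal sketch, assuming that bundled copy is the data used here, the table can be rebuilt as a pandas DataFrame with the predictors and the target kept separate (the names `X`, `y`, and `df` are illustrative):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset bundled with scikit-learn (assumed to be the data shown).
iris = load_iris()

# Predictor variables (features): one column per measurement.
X = pd.DataFrame(iris.data, columns=iris.feature_names)

# Target variable: map the integer class codes back to species names.
y = pd.Series(iris.target_names[iris.target], name="species")

# Combine features and target into one table for inspection.
df = pd.concat([X, y], axis=1)
print(df.head())
```

Keeping `X` and `y` as separate objects mirrors the predictor/target split that scikit-learn estimators expect.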



Supervised Learning
Python Packages for Machine Learning



Classification
DATA PRE-PROCESSING



Classification
EXPLORATORY DATA ANALYSIS



Classification
VISUAL EXPLORATORY DATA ANALYSIS



Classification
CLASSIFICATION
• process of building a model that is able to predict the categorical target variable given the predictor variables
• Training (labeled) data → Label

K-NEAREST NEIGHBOR
• algorithm that predicts the label of a data point by taking the majority vote of the 'k' closest labeled data points

Example: the new point is classified red if k=3; green if k=5
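
The majority vote can be sketched from scratch in a few lines. This is an illustrative implementation, not scikit-learn's; the function name `knn_predict` and the five sample points are made up so that the prediction flips from red to green between k=3 and k=5, as in the example above.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k):
    """Predict the label of x_new by majority vote of its k nearest neighbors."""
    # Euclidean distance from x_new to every labeled point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest labeled points.
    nearest = np.argsort(distances)[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points at distances 1.0, 1.1, 1.2, 1.5, 1.6 from the origin.
X_train = np.array([[1.0, 0.0], [0.0, 1.1], [-1.2, 0.0], [0.0, -1.5], [1.6, 0.0]])
y_train = ["red", "green", "red", "green", "green"]
x_new = np.array([0.0, 0.0])

print(knn_predict(X_train, y_train, x_new, k=3))  # red   (2 red vs 1 green)
print(knn_predict(X_train, y_train, x_new, k=5))  # green (3 green vs 2 red)
```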


Classification
MODEL BUILDING

• .fit() trains (fits) the model on the labeled data
• .predict() predicts the label of an unlabeled data point
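
A minimal sketch of the two calls with scikit-learn's `KNeighborsClassifier`, assuming the bundled Iris data; `n_neighbors=6` and the new measurement `X_new` are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target  # features and integer-coded labels

# Create the classifier; the number of neighbors is an illustrative choice.
knn = KNeighborsClassifier(n_neighbors=6)

# .fit() trains the model on the labeled data.
knn.fit(X, y)

# .predict() expects a 2-D array: one row per unlabeled data point.
X_new = [[5.1, 3.5, 1.4, 0.2]]
pred = knn.predict(X_new)

# Map the predicted class code back to a species name.
print(iris.target_names[pred])
```

Note that `.predict()` returns one label per row, so even a single point is passed as a list of rows.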



Classification
MODEL BUILDING

Requirements for the use of scikit-learn


• data must be a NumPy array or a pandas DataFrame
• features must be numeric (continuous) values; categorical features need to be encoded as numbers first
• there must be no missing values
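
These requirements can be checked with pandas before fitting a model. The small DataFrame below is made up for illustration, and dropping incomplete rows is only one possible way of handling missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with one missing value.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, np.nan],
    "petal width (cm)": [0.2, 0.2, 0.2],
})

# Check that every column holds numeric values.
is_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)

# Count missing values across the whole table.
n_missing = int(df.isna().sum().sum())

# One simple remedy (an assumption, not the only option): drop incomplete rows.
clean = df.dropna()

print(is_numeric, n_missing, clean.shape)
```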



Classification
MEASURING MODEL PERFORMANCE

ACCURACY
• commonly-used metric in measuring model performance in classification problems
• number of correct predictions divided by the total number of data points
• normally done by splitting data into training set and test set
  • fit/train the classifier on the training set
  • make predictions on the test set
  • compare predictions with the known labels

train_test_split() randomly splits the data

Arguments:
• feature data
• targets/labels
• test size

Results (4 arrays):
• training data
• test data
• training labels
• test labels
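
The split, fit, predict, and compare steps can be sketched as follows, assuming the bundled Iris data. The values `test_size=0.3`, `random_state=42`, `stratify=y`, and `n_neighbors=6` are illustrative assumptions, not values taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Randomly split into the four arrays: training data, test data,
# training labels, test labels. stratify=y keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the classifier on the training set only.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)

# Predict on the test set and compare with the known labels.
y_pred = knn.predict(X_test)
accuracy = (y_pred == y_test).mean()  # correct predictions / total points

# .score() computes the same accuracy in one call.
print(accuracy, knn.score(X_test, y_test))
```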

Classification
MODEL COMPLEXITY

Model Complexity Curve

• Smaller k → more complex model → can lead to overfitting
• Larger k → smoother decision boundary → less complex model
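
A model complexity curve can be sketched by recording training and test accuracy over a range of k values. The range 1 to 15 and the split settings below are assumptions made for illustration; plotting the two lists against `ks` gives the curve.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

ks = list(range(1, 16))  # illustrative range of neighbor counts
train_acc, test_acc = [], []

for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Accuracy on the data the model has seen versus data it has not seen.
    train_acc.append(knn.score(X_train, y_train))
    test_acc.append(knn.score(X_test, y_test))

for k, tr, te in zip(ks, train_acc, test_acc):
    print(f"k={k:2d}  train={tr:.3f}  test={te:.3f}")
```

A large gap between training and test accuracy at small k is the usual sign of overfitting.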

Regression
DATA PRE-PROCESSING

CRIM: per capita crime rate
NOX: nitric oxide concentration
RM: average number of rooms per dwelling
MEDV: median value of owner-occupied homes in thousands of dollars (target variable)



Regression
VISUAL EXPLORATORY DATA ANALYSIS



Regression
MODEL BUILDING: LINEAR REGRESSION AND VALIDATION: R^2
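
A minimal sketch of fitting a linear regression and validating it with R^2. Because the housing data file is not included here, the example uses synthetic data: a made-up "rooms" feature and a made-up "price" target with a known linear relationship plus noise. All numbers are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the real housing data).
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=200).reshape(-1, 1)  # single feature, 2-D
price = 9.0 * rooms[:, 0] - 34.0 + rng.normal(0, 3.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    rooms, price, test_size=0.3, random_state=42
)

# Fit the regressor on the training set.
reg = LinearRegression()
reg.fit(X_train, y_train)

# For regressors, .score() returns R^2 on the given data.
r2 = reg.score(X_test, y_test)

# The same quantity computed by hand: 1 - SS_res / SS_tot.
y_pred = reg.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(reg.coef_[0], reg.intercept_, r2, r2_manual)
```

With real data, `rooms` and `price` would be replaced by the chosen feature column(s) and the MEDV target.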
