Creating Your First
Predictive Model In Python
Coming November 2015!
Information Everywhere
Makes Panda sad and confused
Each New Thing You Learn
Leads to another new thing to learn, and another

So Many Things
1. Which predictive modeling technique to use
2. How to get the data into a format for modeling
3. How to ensure the “right” data is being used
4. How to feed the data into the model
5. How to validate the model results
6. How to save the model to use in production
7. How to implement the model in production and apply it to new observations
8. How to save the new predictions
9. How to ensure, over time, that the model is correctly predicting outcomes
10. How to later update the model with new training data
Choose Your Model
http://scikit-learn.org/stable/tutorial/machine_learning_map/
Format The Data
• Pandas FTW!
• Use the map() function to convert any text to a number
• Fill in any missing values
• Split the data into features (the data) and targets (the outcome to predict) using .values on the DataFrame
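The steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical DataFrame with a text column `sex`, a numeric column `age` that has a gap, and a target column `survived`; none of these names come from the slides, and the median fill is just one simple choice.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, None, 35.0, 41.0],
    "survived": [0, 1, 1, 0],
})

# Convert text to numbers with Series.map() and an explicit lookup table.
df["sex"] = df["sex"].map({"male": 0, "female": 1})

# Fill in missing values; the column median is one simple option.
df["age"] = df["age"].fillna(df["age"].median())

# Split into features and targets as NumPy arrays via .values.
feature_columns = ["sex", "age"]
the_data = df[feature_columns].values
the_targets = df["survived"].values
```

Keeping the lookup table and the list of feature columns in named variables makes it easier to reuse exactly the same cleaning later, when the model is applied to new observations.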
Get The Right Data
• This is called “feature selection”
• Univariate feature selection
• SelectKBest removes all but the k highest-scoring features
• SelectPercentile removes all but a user-specified highest-scoring percentage of features
• SelectFpr (false positive rate), SelectFdr (false discovery rate), and SelectFwe (family-wise error) apply common univariate statistical tests to each feature
• GenericUnivariateSelect performs univariate feature selection with a configurable strategy
http://scikit-learn.org/stable/modules/feature_selection.html
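As a rough illustration of univariate feature selection, the sketch below runs SelectKBest on synthetic data. The data set, the choice of `f_classif` as the scoring function, and `k=3` are assumptions made for the example, not values from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (an assumption): 10 features, only 3 of them informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=111)

# Keep the 3 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

# Column indices of the features that were kept.
kept_columns = selector.get_support(indices=True)
print(X_selected.shape, kept_columns)
```

The fitted selector should be reused on new data with `selector.transform(...)` so that production inputs keep the same columns as the training data.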
Data => Model
1. Build the model
http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn import linear_model
logClassifier = linear_model.LogisticRegression(C=1, random_state=111)
2. Train the model
# train_test_split lives in sklearn.model_selection
# (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(the_data,
                                                    the_targets,
                                                    test_size=0.20,
                                                    random_state=111)
logClassifier.fit(X_train, y_train)
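If k-fold cross-validation is wanted in addition to a single hold-out split, `cross_val_score` is one way to get it. The sketch below assumes synthetic data standing in for `the_data` and `the_targets`, and 12 folds as an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the_data / the_targets (an assumption).
the_data, the_targets = make_classification(n_samples=240, n_features=5,
                                            random_state=111)

logClassifier = LogisticRegression(C=1, random_state=111)

# 12-fold cross-validation: the model is fit 12 times, each time scored
# on the fold that was left out. The result is one accuracy per fold.
scores = cross_val_score(logClassifier, the_data, the_targets, cv=12)
print(scores.mean())
```

The spread of the per-fold scores gives a sense of how much the accuracy estimate depends on which rows happen to land in the test set.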
Validation!
1. Accuracy Score
http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn import metrics
# Predict on the held-out rows first, then score against the true labels
predicted = logClassifier.predict(X_test)
metrics.accuracy_score(y_test, predicted)
2. Confusion Matrix
metrics.confusion_matrix(y_test, predicted)
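To show how the two metrics relate, here is a self-contained sketch on synthetic data (an assumption; the slides do not name a data set). For a confusion matrix, rows are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=111)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=111)

clf = LogisticRegression(C=1, random_state=111).fit(X_train, y_train)
predicted = clf.predict(X_test)

accuracy = metrics.accuracy_score(y_test, predicted)
# labels=[0, 1] fixes the row/column order of the matrix.
matrix = metrics.confusion_matrix(y_test, predicted, labels=[0, 1])

# Accuracy equals the diagonal (correct predictions) over the total.
print(accuracy, matrix.trace() / matrix.sum())
print(matrix)
```

Accuracy alone can hide a model that does well on one class and poorly on the other; the off-diagonal cells show which kind of mistake is being made.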
Save the Model
Pickle it!
https://docs.python.org/3/library/pickle.html
import pickle
model_file = "/lr_classifier_09.29.15.dat"
with open(model_file, "wb") as f:
    pickle.dump(logClassifier, f)
Did it work?
# Load from the same path that was written above
with open(model_file, "rb") as f:
    logClassifier2 = pickle.load(f)
print(logClassifier2)
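A stronger check than printing the loaded object is to confirm that it predicts the same values as the original. The sketch below assumes synthetic training data and a temporary directory for the file; both are illustrative choices.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and a fitted model to serialize (assumptions).
X, y = make_classification(n_samples=100, n_features=5, random_state=111)
logClassifier = LogisticRegression(C=1, random_state=111).fit(X, y)

# Hypothetical location: a fresh temporary directory.
model_file = os.path.join(tempfile.mkdtemp(), "lr_classifier.dat")

# Serialize the fitted model; "with" closes the file even on error.
with open(model_file, "wb") as f:
    pickle.dump(logClassifier, f)

# Deserialize it again from the same path.
with open(model_file, "rb") as f:
    logClassifier2 = pickle.load(f)

# The round trip worked if both objects predict identically.
same_predictions = (logClassifier.predict(X) == logClassifier2.predict(X)).all()
print(same_predictions)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that saved it where possible.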
Implement in Production
• Clean the data the same way you did for the model
  • Feature mappings
  • Column re-ordering
• Create a function that returns the prediction
  • Deserialize the model from the file you created
  • Feed the model the data in the same order
  • Call .predict() and get your answer
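The checklist above might look like the sketch below. The column names, the `SEX_MAP` lookup, the `AGE_FILL` value, and the helper names `clean` and `predict` are all hypothetical; the point is that one cleaning function serves both training and production, so mappings and column order cannot drift apart.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training-time settings, shared by training and production.
FEATURE_COLUMNS = ["sex", "age"]
SEX_MAP = {"male": 0, "female": 1}
AGE_FILL = 30.0


def clean(raw_df):
    """Apply the training-time mappings and return columns in training order."""
    df = raw_df.copy()
    df["sex"] = df["sex"].map(SEX_MAP)
    df["age"] = df["age"].fillna(AGE_FILL)
    return df[FEATURE_COLUMNS].values


def predict(model_file, raw_df):
    """Deserialize the model, clean the new rows, and return predictions."""
    with open(model_file, "rb") as f:
        model = pickle.load(f)
    return model.predict(clean(raw_df))


# Stand-in for the earlier training and pickling steps (toy data).
train_df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 40.0],
})
train_targets = [0, 1, 1, 0, 1, 0]
model = LogisticRegression(C=1, random_state=111).fit(clean(train_df), train_targets)
model_file = os.path.join(tempfile.mkdtemp(), "lr_classifier.dat")
with open(model_file, "wb") as f:
    pickle.dump(model, f)

# New observations arrive with columns in a different order and a missing age.
new_df = pd.DataFrame({"age": [29.0, None], "sex": ["female", "male"]})
predictions = predict(model_file, new_df)
print(predictions)
```

In a long-running service the model would normally be loaded once at startup rather than on every call; it is loaded inside `predict` here only to keep the sketch short.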
Save Your Predictions
As you would any other piece of data
Ensure Accuracy Over Time
Employ your minion army, or get more creative
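One possible way to track this, once people have checked some predictions against real outcomes, is to log both values and compute accuracy per period. The log layout, the weekly grouping, and the 0.75 alert threshold below are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical log of stored predictions and later-validated outcomes.
log = pd.DataFrame({
    "week":      [1, 1, 1, 1, 2, 2, 2, 2],
    "predicted": [1, 0, 1, 0, 1, 1, 0, 0],
    "actual":    [1, 0, 1, 0, 0, 1, 1, 0],
})
ALERT_THRESHOLD = 0.75  # assumed minimum acceptable accuracy

# Mark each prediction as correct or not, then average per week.
log["correct"] = log["predicted"] == log["actual"]
weekly_accuracy = log.groupby("week")["correct"].mean()

# Weeks whose accuracy fell below the threshold need a closer look.
weeks_to_review = weekly_accuracy[weekly_accuracy < ALERT_THRESHOLD].index.tolist()
print(weekly_accuracy)
print(weeks_to_review)  # [2]
```

Here week 1 is fully correct and week 2 is only half correct, so week 2 is flagged for review.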
Update the Model
Train it again, but with validated predictions
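A minimal sketch of retraining, assuming synthetic data in which the first 200 rows play the role of the original training set and the last 100 rows play the role of production observations whose outcomes have since been validated. LogisticRegression has no incremental `partial_fit`, so the model is refit on the combined data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption), split into "original" and "new" parts.
X, y = make_classification(n_samples=300, n_features=5, random_state=111)
X_old, y_old = X[:200], y[:200]
X_new, y_new = X[200:], y[200:]  # stand-in for validated production rows

# Combine the original training data with the validated observations.
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])

# Refit from scratch on everything, using the same hyperparameters.
updated_classifier = LogisticRegression(C=1, random_state=111).fit(X_all, y_all)
print(updated_classifier.coef_.shape)
```

Using only outcomes that were independently validated matters here: retraining on the model's own unverified predictions would just teach it to repeat its mistakes.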
Robert Dempsey
robertwdempsey
rdempsey
rdempsey
robertwdempsey.com
