SlideShare a Scribd company logo
Building Data Products with Python
District Data Labs
Links to various resources
Introduction to Python
http://bit.ly/1gJ73Tt
Github Repository
http://bit.ly/1eLBzki
About the Instructor
Benjamin Bengfort
Data Science:
● MS Computer Science from North Dakota State
● PhD Candidate in CS at the University of Maryland
● Data Scientist at Cobrain Company in Bethesda, MD
● Board member of Data Community DC
● Lecturer at Georgetown University
Python Programmer:
● Python developer for 7 years
● Open source contributor
● My work on Github: https://github.com/bbengfort
About the Instructor
Benjamin Bengfort
I am available to collaborate and
answer questions for all of my
students.
Twitter: twitter.com/bbengfort
LinkedIn: linkedin.com/in/bbengfort
Github: github.com/bbengfort
Email: benjamin@bengfort.com
About the Teaching Assistant
Keshav Magge
● MS Computer Science from University of Houston
● Lead Data/Software Engineer at Cobrain Company in
Bethesda, MD
Python Programmer:
● Python developer for 7 years
● Plone/Zope for 2 years, Django for 5 years
● My work on Github: https://github.com/keshavmagge
About the Teaching Assistant
Keshav Magge
Reach out to me to talk about all
things python/data or just about life
Twitter: twitter.com/keshavmagge
LinkedIn: linkedin.com/pub/keshav-magge/12/a2a/324/
Github: github.com/keshavmagge
Email: keshav@keshavmagge.com
Building Data Products
Hilary Mason
A data product is a product that is
based on the combination of data
and algorithms.
”
“
Building Data Apps with Python
Mike Loukides
A data application acquires its value from the
data itself, and creates more data as a result.
It’s not just an application with data; it’s a
data product. Data science enables the
creation of data products.
”
“
Building Data Apps with Python
The Data Science Pipeline
Data Ingestion
Data Munging
and Wrangling
Computation and
Analyses
Modeling and
Application
Reporting and
Visualization
Data Ingestion
● There is a world of data out
there- how to get it? Web
crawlers, APIs, Sensors? Python
and other web scripting
languages are custom made for
this task.
● The real question is how can we
deal with such a giant volume
and velocity of data?
● Big Data and Data Science often
require ingestion specialists!
● Warehousing the data means
storing the data in as raw a form
as possible.
● Extract, transform, and load
operations move data to
operational storage locations.
● Filtering, aggregation,
normalization and
denormalization all ensure data is
in a form it can be computed on.
● Annotated training sets must be
created for ML tasks.
Data Wrangling
● Hypothesis driven computation
includes design and development
of predictive models.
● Many models have to be trained
or constrained into a
computational form like a Graph
database, and this is time
consuming.
● Other data products like indices,
relations, classifications, and
clusters may be computed.
Computation and Analyses
Modeling and Application
This is the part we’re most familiar with.
Supervised classification, Unsupervised
clustering - Bayes, Logistic Regression,
Decision Trees, and other models.
This is also where the money is.
● Often overlooked, this part is
crucial, even if we have data
products.
● Humans recognize patterns
better than machines. Human
feedback is crucial in Active
Learning and remodeling (error
detection).
● Mashups and collaborations
generate more data- and
therefore more value!
Reporting and Visualization
Don’t forget feedback!
(Active Learning for Data
Products)
What we’re going to build today
SCIENCE BOOKCLUB!!
● A book club that chooses what to
read via a recommender system.
● Uses GoodReads data to ingest
and return feedback on books.
● Statistical model is a non-
negative matrix factorization
● Reporting using Jinja (almost a
web app)
Workflow
1. Setting up a Python skeleton
2. Creating and Running Tests
3. Wading in with a configuration
4. Ingestion with urllib and requests
5. Creating a command line admin with argparse
6. Wrangling with BeautifulSoup and SQLAlchemy
7. Modeling with numpy
8. Reporting with Jinja2
Octavo Architecture (really clear DSP)
requests.py
Ingestion
Module
Raw Data
Storage Computational
Data Storage
Wrangling
Module
BeautifulSou
p
SQLAlchemy
Recommender
Module
Numpy
Reporting
Module
Jinja2Matplotlib
requests.py
Octavo Architecture (really clear DSP)
requests.py
Ingestion
Module
Raw Data
Storage
Computational
Data Storage
Wrangling
Module
BeautifulSoup
SQLAlchemy
Recommender
Module
Numpy
Reporting
Module
Jinja2
Matplotlib
How to tackle this course ...
How to tackle this course ...
Lean into it- absorb as much as
possible, don’t worry about falling
behind - it will be in your head!
Then afterwards - lets all digest it
together (keep in touch)
Ad

More Related Content

What's hot (20)

Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
Mostafa
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data Analysis
Manuel Martín
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
Knoldus Inc.
 
Ferruzza g automl deck
Ferruzza g   automl deckFerruzza g   automl deck
Ferruzza g automl deck
Eric Dill
 
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
Athens Big Data
 
Introduction to Azure Machine Learning
Introduction to Azure Machine LearningIntroduction to Azure Machine Learning
Introduction to Azure Machine Learning
Paul Prae
 
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIMEUnified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Databricks
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
Ning Jiang
 
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Sri Ambati
 
Data science
Data scienceData science
Data science
Purna Chander
 
Machine learning life cycle
Machine learning life cycleMachine learning life cycle
Machine learning life cycle
Ramjee Ganti
 
Quick presentation for the OpenML workshop in Eindhoven 2014
Quick presentation for the OpenML workshop in Eindhoven 2014Quick presentation for the OpenML workshop in Eindhoven 2014
Quick presentation for the OpenML workshop in Eindhoven 2014
Manuel Martín
 
Knowledge Discovery
Knowledge DiscoveryKnowledge Discovery
Knowledge Discovery
André Karpištšenko
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
Eng Teong Cheah
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
Spark Summit
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
QuantUniversity
 
Meetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo casesMeetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo cases
Zenodia Charpy
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
Charles Vestur
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
Mostafa
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data Analysis
Manuel Martín
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
Knoldus Inc.
 
Ferruzza g automl deck
Ferruzza g   automl deckFerruzza g   automl deck
Ferruzza g automl deck
Eric Dill
 
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
Athens Big Data
 
Introduction to Azure Machine Learning
Introduction to Azure Machine LearningIntroduction to Azure Machine Learning
Introduction to Azure Machine Learning
Paul Prae
 
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIMEUnified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Databricks
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
Ning Jiang
 
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Sri Ambati
 
Machine learning life cycle
Machine learning life cycleMachine learning life cycle
Machine learning life cycle
Ramjee Ganti
 
Quick presentation for the OpenML workshop in Eindhoven 2014
Quick presentation for the OpenML workshop in Eindhoven 2014Quick presentation for the OpenML workshop in Eindhoven 2014
Quick presentation for the OpenML workshop in Eindhoven 2014
Manuel Martín
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
Eng Teong Cheah
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
Spark Summit
 
Meetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo casesMeetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo cases
Zenodia Charpy
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
Charles Vestur
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 

Similar to Building Data Apps with Python (20)

Machine Learning
Machine LearningMachine Learning
Machine Learning
Ramiro Aduviri Velasco
 
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina StoyData Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
LazarinaStoyanova
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
zekeLabs Technologies
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
Trieu Nguyen
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data Analysis
Swiss Big Data User Group
 
Evaluation of big data analysis
Evaluation of big data analysisEvaluation of big data analysis
Evaluation of big data analysis
Καρολίνα Κάτι
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
zekeLabs Technologies
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
Awantik Das
 
How to be data savvy manager
How to be data savvy managerHow to be data savvy manager
How to be data savvy manager
TOSHI STATS Co.,Ltd.
 
DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
Matthew Reynolds
 
Transition to a modern data platform
Transition to a modern data platform Transition to a modern data platform
Transition to a modern data platform
Michael Ghen
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analytics
Rob Winters
 
Building Data Scientists
Building Data ScientistsBuilding Data Scientists
Building Data Scientists
Mitch Sanders
 
How to become a data scientist
How to become a data scientist How to become a data scientist
How to become a data scientist
Manjunath Sindagi
 
Real world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.comReal world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.com
Mathieu Dumoulin
 
Resume
ResumeResume
Resume
Dheeraj Mummareddy
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Bhakthi Liyanage
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
How to build data accessibility for everyone
How to build data accessibility for everyoneHow to build data accessibility for everyone
How to build data accessibility for everyone
Karen Hsieh
 
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina StoyData Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
LazarinaStoyanova
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
zekeLabs Technologies
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
Trieu Nguyen
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data Analysis
Swiss Big Data User Group
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
zekeLabs Technologies
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
Awantik Das
 
Transition to a modern data platform
Transition to a modern data platform Transition to a modern data platform
Transition to a modern data platform
Michael Ghen
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analytics
Rob Winters
 
Building Data Scientists
Building Data ScientistsBuilding Data Scientists
Building Data Scientists
Mitch Sanders
 
How to become a data scientist
How to become a data scientist How to become a data scientist
How to become a data scientist
Manjunath Sindagi
 
Real world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.comReal world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.com
Mathieu Dumoulin
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Bhakthi Liyanage
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
How to build data accessibility for everyone
How to build data accessibility for everyoneHow to build data accessibility for everyone
How to build data accessibility for everyone
Karen Hsieh
 
Ad

More from Benjamin Bengfort (20)

Privacy and Security in the Age of Generative AI - C4AI.pdf
Privacy and Security in the Age of Generative AI - C4AI.pdfPrivacy and Security in the Age of Generative AI - C4AI.pdf
Privacy and Security in the Age of Generative AI - C4AI.pdf
Benjamin Bengfort
 
Implementing Function Calling LLMs without Fear.pdf
Implementing Function Calling LLMs without Fear.pdfImplementing Function Calling LLMs without Fear.pdf
Implementing Function Calling LLMs without Fear.pdf
Benjamin Bengfort
 
Privacy and Security in the Age of Generative AI
Privacy and Security in the Age of Generative AIPrivacy and Security in the Age of Generative AI
Privacy and Security in the Age of Generative AI
Benjamin Bengfort
 
Digitocracy without Borders: the unifying and destabilizing effects of softwa...
Digitocracy without Borders: the unifying and destabilizing effects of softwa...Digitocracy without Borders: the unifying and destabilizing effects of softwa...
Digitocracy without Borders: the unifying and destabilizing effects of softwa...
Benjamin Bengfort
 
Getting Started with TRISA
Getting Started with TRISAGetting Started with TRISA
Getting Started with TRISA
Benjamin Bengfort
 
Visual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learningVisual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learning
Benjamin Bengfort
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Benjamin Bengfort
 
Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)
Benjamin Bengfort
 
Visualizing the Model Selection Process
Visualizing the Model Selection ProcessVisualizing the Model Selection Process
Visualizing the Model Selection Process
Benjamin Bengfort
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
Benjamin Bengfort
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
Benjamin Bengfort
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)
Benjamin Bengfort
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
Benjamin Bengfort
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
Benjamin Bengfort
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
Benjamin Bengfort
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
Benjamin Bengfort
 
Annotation with Redfox
Annotation with RedfoxAnnotation with Redfox
Annotation with Redfox
Benjamin Bengfort
 
Rasta processing of speech
Rasta processing of speechRasta processing of speech
Rasta processing of speech
Benjamin Bengfort
 
Privacy and Security in the Age of Generative AI - C4AI.pdf
Privacy and Security in the Age of Generative AI - C4AI.pdfPrivacy and Security in the Age of Generative AI - C4AI.pdf
Privacy and Security in the Age of Generative AI - C4AI.pdf
Benjamin Bengfort
 
Implementing Function Calling LLMs without Fear.pdf
Implementing Function Calling LLMs without Fear.pdfImplementing Function Calling LLMs without Fear.pdf
Implementing Function Calling LLMs without Fear.pdf
Benjamin Bengfort
 
Privacy and Security in the Age of Generative AI
Privacy and Security in the Age of Generative AIPrivacy and Security in the Age of Generative AI
Privacy and Security in the Age of Generative AI
Benjamin Bengfort
 
Digitocracy without Borders: the unifying and destabilizing effects of softwa...
Digitocracy without Borders: the unifying and destabilizing effects of softwa...Digitocracy without Borders: the unifying and destabilizing effects of softwa...
Digitocracy without Borders: the unifying and destabilizing effects of softwa...
Benjamin Bengfort
 
Visual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learningVisual diagnostics for more effective machine learning
Visual diagnostics for more effective machine learning
Benjamin Bengfort
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Benjamin Bengfort
 
Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)
Benjamin Bengfort
 
Visualizing the Model Selection Process
Visualizing the Model Selection ProcessVisualizing the Model Selection Process
Visualizing the Model Selection Process
Benjamin Bengfort
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
Benjamin Bengfort
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)
Benjamin Bengfort
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
Benjamin Bengfort
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
Benjamin Bengfort
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
Benjamin Bengfort
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
Benjamin Bengfort
 
Ad

Recently uploaded (20)

Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 

Building Data Apps with Python

  • 1. Building Data Products with Python District Data Labs
  • 2. Links to various resources Introduction to Python http://bit.ly/1gJ73Tt Github Repository http://bit.ly/1eLBzki
  • 3. About the Instructor Benjamin Bengfort Data Science: ● MS Computer Science from North Dakota State ● PhD Candidate in CS at the University of Maryland ● Data Scientist at Cobrain Company in Bethesda, MD ● Board member of Data Community DC ● Lecturer at Georgetown University Python Programmer: ● Python developer for 7 years ● Open source contributor ● My work on Github: https://github.com/bbengfort
  • 4. About the Instructor Benjamin Bengfort I am available to collaborate and answer questions for all of my students. Twitter: twitter.com/bbengfort LinkedIn: linkedin.com/in/bbengfort Github: github.com/bbengfort Email: [email protected]
  • 5. About the Teaching Assistant Keshav Magge ● MS Computer Science from University of Houston ● Lead Data/Software Engineer at Cobrain Company in Bethesda, MD Python Programmer: ● Python developer for 7 years ● Plone/Zope for 2 years, Django for 5 years ● My work on Github: https://github.com/keshavmagge
  • 6. About the Teaching Assistant Keshav Magge Reach out to me to talk about all things python/data or just about life Twitter: twitter.com/keshavmagge LinkedIn: linkedin.com/pub/keshav-magge/12/a2a/324/ Github: github.com/keshavmagge Email: [email protected]
  • 8. Hilary Mason A data product is a product that is based on the combination of data and algorithms. ” “
  • 10. Mike Loukides A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products. ” “
  • 12. The Data Science Pipeline
  • 13. Data Ingestion Data Munging and Wrangling Computation and Analyses Modeling and Application Reporting and Visualization
  • 14. Data Ingestion ● There is a world of data out there- how to get it? Web crawlers, APIs, Sensors? Python and other web scripting languages are custom made for this task. ● The real question is how can we deal with such a giant volume and velocity of data? ● Big Data and Data Science often require ingestion specialists!
  • 15. ● Warehousing the data means storing the data in as raw a form as possible. ● Extract, transform, and load operations move data to operational storage locations. ● Filtering, aggregation, normalization and denormalization all ensure data is in a form it can be computed on. ● Annotated training sets must be created for ML tasks. Data Wrangling
  • 16. ● Hypothesis driven computation includes design and development of predictive models. ● Many models have to be trained or constrained into a computational form like a Graph database, and this is time consuming. ● Other data products like indices, relations, classifications, and clusters may be computed. Computation and Analyses
  • 17. Modeling and Application This is the part we’re most familiar with. Supervised classification, Unsupervised clustering - Bayes, Logistic Regression, Decision Trees, and other models. This is also where the money is.
  • 18. ● Often overlooked, this part is crucial, even if we have data products. ● Humans recognize patterns better than machines. Human feedback is crucial in Active Learning and remodeling (error detection). ● Mashups and collaborations generate more data- and therefore more value! Reporting and Visualization
  • 19. Don’t forget feedback! (Active Learning for Data Products)
  • 20. What we’re going to build today SCIENCE BOOKCLUB!! ● A book club that chooses what to read via a recommender system. ● Uses GoodReads data to ingest and return feedback on books. ● Statistical model is a non- negative matrix factorization ● Reporting using Jinja (almost a web app)
  • 21. Workflow 1. Setting up a Python skeleton 2. Creating and Running Tests 3. Wading in with a configuration 4. Ingestion with urllib and requests 5. Creating a command line admin with argparse 6. Wrangling with BeautifulSoup and SQLAlchemy 7. Modeling with numpy 8. Reporting with Jinja2
  • 22. Octavo Architecture (really clear DSP) requests.py Ingestion Module Raw Data Storage Computational Data Storage Wrangling Module BeautifulSou p SQLAlchemy Recommender Module Numpy Reporting Module Jinja2Matplotlib
  • 23. requests.py Octavo Architecture (really clear DSP) requests.py Ingestion Module Raw Data Storage Computational Data Storage Wrangling Module BeautifulSoup SQLAlchemy Recommender Module Numpy Reporting Module Jinja2 Matplotlib
  • 24. How to tackle this course ...
  • 25. How to tackle this course ... Lean into it- absorb as much as possible, don’t worry about falling behind - it will be in your head! Then afterwards - lets all digest it together (keep in touch)