Data Science meets Software Engineering
Vienna Semantic Web Meetup
2016-03-01
Bernhard Haslhofer
Who am I?
• Scientist at AIT’s Digital Insight Lab
• Specialization
• Network Analytics
• Machine Learning
• Text Mining
• PhD in Computer Science
Mind the Gap - Data Science Meets Software Engineering
Plan for tonight
• Build an example service
• Approach problem from
• software engineering perspective
• data science perspective
• Look at gap & propose solution
Example Service
• Text Classification API
• Categories: sports, politics, art, business
Approaching the Problem
Software Engineering
Steps
• Identify use cases / features
• Choose framework
• Implement functionality
• Ensure quality: test functionality, scalability, etc.
• Deploy service
Ensure quality
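// Service method under test; the body is elided on the slide.
// The String return type is an assumption; the slide omits it.
public String classify(Document document) {
    …
}

// JUnit 4 style: the test fails if it runs longer than 100 ms.
@Test(timeout = 100)
public void test_classify(…) {
    Document d = new Document(…);
    String c = classifier.classify(d);
    assertNotNull(c);
    // The result must be one of the known categories.
    assertTrue(List.of("sports", "politics", …).contains(c));
}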
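The test above can be sketched in plain Java without a test framework. The KeywordClassifier class, its keyword table, its default category, and the checkClassify helper are hypothetical and not from the talk; they only stand in for a real classifier so that the three checks (non-null result, known category, time budget) can run.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the classifier behind the API (not from the talk).
final class KeywordClassifier {
    static final Set<String> CATEGORIES = Set.of("sports", "politics", "art", "business");

    // Assumed toy rule: one trigger word per category.
    private final Map<String, String> keywords = new LinkedHashMap<>();

    KeywordClassifier() {
        keywords.put("match", "sports");
        keywords.put("election", "politics");
        keywords.put("gallery", "art");
        keywords.put("market", "business");
    }

    // Returns the category of the first keyword found, or "business" as an arbitrary default.
    String classify(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : keywords.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "business";
    }
}

final class ClassifyCheck {
    // Mirrors the slide's test: non-null result, known category, within the time budget.
    static boolean checkClassify(KeywordClassifier classifier, String text, long budgetMillis) {
        long start = System.nanoTime();
        String c = classifier.classify(text);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return c != null
                && KeywordClassifier.CATEGORIES.contains(c)
                && elapsedMillis <= budgetMillis;
    }
}
```

Note that the check says nothing about whether the returned category is the right one for the document; it only verifies that the service behaves as specified.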
Result / Quality Expectation
• A service
• implementing defined use case(s)
• passing all tests (unit, integration, functional)
• fulfilling scalability needs
Approaching the Problem
Data Science
Steps
• Define problem / hypothesis
• Collect data
• Design approach / model
• Ensure quality: evaluate model, compare
• Prototype algorithm (in R, Matlab, Octave, etc.)
Ensure quality
• Split dataset into training / test / cross-validation datasets
• Train model using training dataset
• Evaluate using test (and cross-validation) dataset
• Report and investigate metrics
• precision, recall, F1, …
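The evaluation steps above can be sketched as follows. This is a minimal illustration, not the talk's tooling: the Evaluation class and its method names are assumptions, the split is a simple two-way shuffle (no separate cross-validation set), and model training is left out. Precision, recall, and F1 are computed for one category from parallel lists of expected and predicted labels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class (names are assumptions, not from the talk).
final class Evaluation {
    // Shuffles a copy of the data with a fixed seed and returns [training, test].
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Returns {precision, recall, F1} for one category; undefined ratios are reported as 0.
    static double[] precisionRecallF1(List<String> expected, List<String> predicted, String category) {
        if (expected.size() != predicted.size()) {
            throw new IllegalArgumentException("expected and predicted must have the same length");
        }
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (int i = 0; i < expected.size(); i++) {
            boolean isActual = category.equals(expected.get(i));
            boolean isPredicted = category.equals(predicted.get(i));
            if (isActual && isPredicted) {
                tp++;
            } else if (isPredicted) {
                fp++;
            } else if (isActual) {
                fn++;
            }
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

The fixed seed keeps the split reproducible, so that metric changes between runs come from the model and not from a different test set.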
What ???
                  | Software Engineering                                | Data Science
Overall Goal      | Build the service                                   | Build the service
Technical Goal    | Implement software features, deploy working service | Find the right model features, get the model right
Quality assurance | Unit, functional, integration tests                 | Evaluate model, report metrics, redesign model
What ???
• The overall (business) goal can be the same
• Different technical approach
• language issues (what is a “feature” !?)
• lack of understanding of each other's differences and necessities
• Different quality assurance
• notion of “testing” is different
• different “success factors” (passing test vs. metrics)
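The two notions of "success" can be contrasted with a deliberately degenerate example. The GapDemo class below is hypothetical: its classifier ignores the input and always answers "sports". It satisfies the unit-test style check (non-null, known category) for every document, yet its precision on labelled data is poor.

```java
import java.util.List;
import java.util.Set;

// Hypothetical demo class (not from the talk).
final class GapDemo {
    static final Set<String> CATEGORIES = Set.of("sports", "politics", "art", "business");

    // Degenerate classifier: ignores its input.
    static String classify(String text) {
        return "sports";
    }

    // Software engineering view: the result is non-null and a known category.
    static boolean passesUnitTest(String text) {
        String c = classify(text);
        return c != null && CATEGORIES.contains(c);
    }

    // Data science view: precision for one category over {text, expectedCategory} pairs.
    static double precision(List<String[]> labelled, String category) {
        int tp = 0;
        int fp = 0;
        for (String[] example : labelled) {
            if (!category.equals(classify(example[0]))) {
                continue; // not predicted as this category
            }
            if (category.equals(example[1])) {
                tp++;
            } else {
                fp++;
            }
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }
}
```

With four labelled examples, one per category, passesUnitTest returns true for every input while the precision for "sports" is 0.25.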
Possible solution
• Define Goal
• Collect Ground Truth
• Implement Model and Functions
• Test & Evaluate
• Analyze Errors
• Deploy Service
Metrics Driven Software Engineering
Tool support
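// Proposed tool support: a precision threshold next to the usual timeout.
// precision is not an attribute of JUnit's @Test; it is the proposal of this talk.
@Test(precision >= 0.8)
@Test(timeout = 100)
public void test_classify(…) {
    Document d = new Document(…);
    String c = classifier.classify(d);
    assertNotNull(c);
    // The result must be one of the known categories.
    assertTrue(List.of("sports", "politics", …).contains(c));
}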
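One way such tool support could look with today's Java is a custom annotation plus a small reflective runner. Everything in this sketch is an assumption for illustration: the MinPrecision annotation, the MetricsRunner class, and the tiny label arrays are hypothetical, and none of it is part of JUnit or of the talk. Each annotated method returns the precision it measured, and the runner lists the methods that fall below their threshold.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation (not part of JUnit): minimum precision the method must report.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MinPrecision {
    double value();
}

// Hypothetical metric tests over illustrative labels (not real results).
final class ClassifierMetricsTest {
    static final String[] EXPECTED =
            {"sports", "sports", "sports", "sports", "politics", "politics", "art"};
    static final String[] PREDICTED =
            {"sports", "sports", "sports", "sports", "sports", "politics", "politics"};

    static double precision(String category) {
        int tp = 0;
        int fp = 0;
        for (int i = 0; i < EXPECTED.length; i++) {
            if (!category.equals(PREDICTED[i])) {
                continue;
            }
            if (category.equals(EXPECTED[i])) {
                tp++;
            } else {
                fp++;
            }
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    @MinPrecision(0.8)
    public static double sportsPrecision() {
        return precision("sports");
    }

    @MinPrecision(0.8)
    public static double politicsPrecision() {
        return precision("politics");
    }
}

final class MetricsRunner {
    // Invokes every static @MinPrecision method and returns the names of those below threshold.
    static List<String> failures(Class<?> testClass) throws ReflectiveOperationException {
        List<String> failed = new ArrayList<>();
        for (Method method : testClass.getDeclaredMethods()) {
            MinPrecision min = method.getAnnotation(MinPrecision.class);
            if (min == null) {
                continue;
            }
            method.setAccessible(true);
            double measured = (Double) method.invoke(null);
            if (measured < min.value()) {
                failed.add(method.getName());
            }
        }
        return failed;
    }
}
```

In this illustrative data, "sports" is predicted five times and is correct four times, so it meets the 0.8 threshold, while "politics" is correct in one of two predictions and is reported as a failure. A build could then be failed whenever the returned list is non-empty, which puts metrics on the same footing as ordinary tests.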
Thank You!
bernhard.haslhofer@ait.ac.at
