NONDETERMINISTIC SOFTWARE
FOR THE REST OF US
An exercise in frustration by
Tomer Gabel @ GeeCON 2018, Krakow
Case Study #1
• Delver, circa 2007
• We built a search engine
• What’s expected?
– Performant (<1 sec)
– Reliable
– Useful
Let me take you back…
• We applied good old-fashioned engineering
• It was kind of great!
– Reliability
– Fast iteration
– Built-in regression suite
Diagram: Spec → Tests → Code → Deployment
Let me take you back…
• So yeah, we coded it
• And it worked… sort of
– It was highly available
– It responded within SLA
– … but with crap results
• Green tests aren’t everything!
Furthermore
• Not all software can be acceptance-tested
– Qualitative/subjective (e.g. search, social feed)
– Huge input space (e.g. machine vision)
– Resource-constrained (e.g. Lyft or Uber)
Image: Cristian David
Image: rideshareapps.com
“CORRECT” AND “GOOD”
ARE SEPARATE DIMENSIONS
Takeaway #1
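To make the distinction concrete, here is a minimal sketch contrasting the two dimensions: an acceptance-style check for “correct” (responds within the SLA, returns results) and precision@k against human relevance judgments for “good”. The `search` stub, the document IDs, and the judgments are all hypothetical, and precision@k is just one common relevance metric, not necessarily what Delver used.

```python
import time


def search(query: str) -> list[str]:
    # Hypothetical search engine stub: fast and never empty,
    # but it ranks irrelevant documents first.
    return ["doc-9", "doc-7", "doc-1", "doc-3", "doc-5"]


def passes_acceptance_test(query: str, max_seconds: float = 1.0) -> bool:
    """'Correct': responds within the SLA and returns at least one result."""
    start = time.perf_counter()
    results = search(query)
    elapsed = time.perf_counter() - start
    return elapsed < max_seconds and len(results) > 0


def precision_at_k(results: list[str], relevant: set[str], k: int) -> float:
    """'Good': fraction of the top-k results that a human judged relevant."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


# Hypothetical relevance judgments collected from human raters.
judgments = {"restaurants in krakow": {"doc-1", "doc-3"}}

query = "restaurants in krakow"
print(passes_acceptance_test(query))                          # True
print(precision_at_k(search(query), judgments[query], k=3))   # only 1 of the top 3 is relevant
```

The test stays green while users get poor results; only the second number exposes that.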
Getting Started
• For any product of any scale, always ask:
– What does success look like?
– How can I measure success?
• You’re an engineer!
– Intuition can’t replace data
– QA can’t save your butt
Image: Hole in the Wall, FremantleMedia North America
What should you measure?
• (Un)fortunately, you have customers
• Analyze their behavior
– What do they want?
– What influences your quality of service?
• For a search engine…
Diagram: Query → Skim → Decide → Follow, plus Refinement and Paging
USERS ARE PART OF YOUR SYSTEM
Takeaway #2
What should you measure?
• Paging
– “Not relevant enough”
• Refinement
– “Not what I meant”
• Clickthrough
– “Bingo!”
• Bonus: Abandonment
– “You suck”
Diagram: Query → Skim → Decide → Follow, plus Refinement and Paging
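One way to turn these behaviors into numbers is to classify each search session by the signals it exhibits and aggregate the rates. A minimal sketch, assuming a hypothetical event log where each session is a list of actions named `"query"`, `"page"`, `"refine"` and `"click"` (not Delver's actual schema). Treating “no click” as abandonment is a simplification: some no-click sessions are satisfied by the result snippets alone.

```python
from collections import Counter

# Hypothetical log format: an ordered list of user actions per session.
Session = list[str]
SIGNALS = ("paging", "refinement", "clickthrough", "abandonment")


def session_signals(session: Session) -> set[str]:
    """Map one session to the quality signals it exhibits."""
    signals = set()
    if "page" in session:
        signals.add("paging")        # "Not relevant enough"
    if "refine" in session:
        signals.add("refinement")    # "Not what I meant"
    if "click" in session:
        signals.add("clickthrough")  # "Bingo!"
    else:
        signals.add("abandonment")   # no result was followed
    return signals


def signal_rates(sessions: list[Session]) -> dict[str, float]:
    """Fraction of sessions exhibiting each signal."""
    if not sessions:
        return {name: 0.0 for name in SIGNALS}
    counts = Counter()
    for session in sessions:
        counts.update(session_signals(session))
    return {name: counts[name] / len(sessions) for name in SIGNALS}


sessions = [
    ["query", "click"],
    ["query", "page", "click"],
    ["query", "refine", "refine"],
    ["query"],
]
print(signal_rates(sessions))
# {'paging': 0.25, 'refinement': 0.25, 'clickthrough': 0.5, 'abandonment': 0.5}
```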
Is this starting to look familiar?
It should.
Well now!
• We’ve been having this conversation for years
• Mostly with…
– Product managers
– Business analysts
– Data engineers
• Guess what?
Diagram: Product changes → R&D → Deployment → Measurement → Analysis, informed by BI
What can we learn from BI?
• Analysis
– Be mindful of your users
– Talk to your analysts!
• Experimentation
– Invest in A/B tests
– Prove your improvements!
• Iteration
– Establish your baseline
– Invest in metric collection and dashboards
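“Prove your improvements” typically means an A/B test plus a significance check. A minimal sketch, assuming clickthrough rate as the success metric and a two-sided two-proportion z-test using only the standard library; the counts are made up, and a real experiment also needs sample-size planning before it starts.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: rate_a == rate_b."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both arms behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("both arms are all-success or all-failure; test undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Control: 1,200 clicks in 10,000 sessions (12%); variant: 1,300 in 10,000 (13%).
z, p = two_proportion_z_test(1200, 10_000, 1300, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 2.14, p = 0.03
```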
SYSTEMS ARE NOT SNAPSHOTS.
MEASURE CONTINUOUSLY
Takeaway #3
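Measuring continuously means comparing each new data point against an established baseline rather than a one-off snapshot. A minimal sketch, assuming a daily metric such as clickthrough rate and a simple rule: flag any day that deviates from the mean of the preceding `window` days by more than `k` standard deviations. The 7-day window and k = 3 are arbitrary choices; real metrics often have weekly seasonality that a better baseline would model.

```python
import statistics


def flag_anomalies(daily_values: list[float], window: int = 7,
                   k: float = 3.0) -> list[int]:
    """Return indices of days that deviate from their trailing baseline.

    The baseline for day i is the mean and sample stdev of the `window`
    days before it (window must be at least 2 for stdev to be defined).
    A perfectly flat baseline (stdev 0) flags any change at all.
    """
    flagged = []
    for i in range(window, len(daily_values)):
        baseline = daily_values[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.stdev(baseline)
        if abs(daily_values[i] - mean) > k * stdev:
            flagged.append(i)
    return flagged


# Hypothetical daily clickthrough rates; the last day drops sharply.
ctr = [0.31, 0.30, 0.32, 0.31, 0.30, 0.31, 0.32, 0.31, 0.24]
print(flag_anomalies(ctr))  # [8]
```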
Hold on to your hats
… this isn’t about search engines
Case Study #2
• newBrandAnalytics, circa 2011
• A social listening platform
– Finds user-generated content (e.g. reviews)
– Provides operational analytics
Social Listening Platform
• A three-stage pipeline
• My team focused on data acquisition
• Let’s discuss web scraping
– Structured data extraction
– At scale
– Reliability is paramount
Diagram: Acquisition → Analysis → Analytics
Acquisition: 3rd party ingestion, BizDev, web scraping
Analysis: manual tagging/training, NLP/ML models
Analytics: dashboards, ad-hoc query/drilldown, reporting
Large-Scale Scraping
• A two-pronged problem
• Target sites…
– Can change at the drop of a hat
– Actively resist scraping!
• Both are external constraints
• Neither can be unit-tested
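Since neither constraint can be unit-tested, one option is to validate the scraper's output in production instead. A minimal sketch, assuming a hypothetical review schema (`id`, `author`, `rating`, `text`) and an arbitrary 5% threshold: a layout change at the target site usually shows up as a sudden jump in records with missing fields, or as an empty batch.

```python
# Hypothetical schema for a scraped review; adjust to the real extractor output.
REQUIRED_FIELDS = ("id", "author", "rating", "text")


def missing_field_rate(records: list[dict]) -> float:
    """Fraction of records lacking at least one required field."""
    if not records:
        return 1.0  # an empty batch is itself a strong sign that extraction broke
    bad = sum(
        1 for r in records
        if any(r.get(field) in (None, "") for field in REQUIRED_FIELDS)
    )
    return bad / len(records)


def extraction_looks_broken(records: list[dict], max_missing: float = 0.05) -> bool:
    """Alert when the share of incomplete records exceeds the threshold."""
    return missing_field_rate(records) > max_missing


batch = [
    {"id": "r1", "author": "ana", "rating": 5, "text": "Great pierogi"},
    {"id": "r2", "author": "bo", "rating": 2, "text": "Slow service"},
    {"id": "r3", "author": "cy", "rating": 4, "text": "Nice view"},
    {"id": "r4", "author": "di", "rating": 3, "text": ""},  # e.g. a selector stopped matching
]
print(missing_field_rate(batch))       # 0.25
print(extraction_looks_broken(batch))  # True
```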
Optimizing for User Happiness
• Users consume reviews
• What do they want?
– Completeness (no missed reviews)
– Correctness (no duplicates/garbage)
– Timeliness (near real-time)
Diagram: TripAdvisor, Twitter, Yelp, … → Data Acquisition → Data Lake → Reports, Notifications
Putting It Together
• How do we measure completeness?
• Manually
– Costly, time-consuming
– Sampled (by definition)
• Automatically
– Re-scrape a known subset
– Produce similarity score
• Same with correctness
Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)
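A minimal sketch of the automatic approach, assuming every review has a stable ID: re-scrape a known subset of pages, then compare the result against what the pipeline already stored for the same pages. Completeness is the share of re-scraped reviews we already had; a Jaccard index serves as a symmetric similarity score; correctness is the share of stored records that are neither duplicates nor garbage. The function names, the blank-text definition of garbage, and the data are illustrative assumptions.

```python
def completeness(stored_ids: set[str], rescraped_ids: set[str]) -> float:
    """Share of reviews found by a fresh re-scrape that the pipeline already has."""
    if not rescraped_ids:
        return 1.0
    return len(stored_ids & rescraped_ids) / len(rescraped_ids)


def jaccard(a: set[str], b: set[str]) -> float:
    """Symmetric similarity score between two sets of review IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def correctness(stored: list[dict]) -> float:
    """Share of stored records that are neither duplicates nor garbage.

    'Garbage' here is simply blank text, a placeholder for a real check.
    """
    if not stored:
        return 1.0
    seen: set[str] = set()
    good = 0
    for record in stored:
        is_duplicate = record["id"] in seen
        is_garbage = not record["text"].strip()
        seen.add(record["id"])
        if not is_duplicate and not is_garbage:
            good += 1
    return good / len(stored)


# Hypothetical data for one known subset of pages.
stored_ids = {"r1", "r2", "r3", "r5"}
rescraped_ids = {"r1", "r2", "r3", "r4"}
print(completeness(stored_ids, rescraped_ids))  # 0.75 (r4 was missed)
print(jaccard(stored_ids, rescraped_ids))       # 0.6 (r4 missed, r5 not seen in the re-scrape)

stored = [
    {"id": "r1", "text": "Great pierogi"},
    {"id": "r2", "text": "Slow service"},
    {"id": "r2", "text": "Slow service"},  # duplicate
    {"id": "r3", "text": "   "},           # garbage
]
print(correctness(stored))  # 0.5
```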
Putting It Together
• Targets do not want to be scraped
• Major sites employ:
– IP throttling
– Traffic fingerprinting
• 3rd party proxies are expensive
Image from the movie “UHF”, Metro-Goldwyn-Mayer
Putting It Together
• What of timeliness?
• It’s an optimization problem
– Polling frequency determines latency
– But polling has a cost
– “Good” is a tradeoff
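A back-of-the-envelope model shows why “good” is a tradeoff. Assuming, for illustration only, that each poll of a source costs c, that polling every T seconds picks up everything published since the previous poll, and that reviews appear uniformly at random in time:

```latex
\begin{aligned}
\mathbb{E}[\text{latency}] &= \frac{T}{2}, \qquad \text{worst-case latency} = T,\\
\text{cost per day} &= c \cdot \frac{86400}{T},\\
\mathbb{E}[\text{latency}] \cdot \text{cost per day} &= 43200\,c .
\end{aligned}
```

The product is constant, so halving the expected latency doubles the polling bill: polling hourly (T = 3600 s) means 24 polls per day and about 30 minutes of expected latency, while polling every 15 minutes means 96 polls per day and about 7.5 minutes.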
Putting It Together
• So then, timeliness…?
• First, build a cost model
– Review acquisition cost
– Break it down by source
• Next, put together SLAs
– Reflect cost in pricing!
– Adjust scheduler by SLA
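A minimal sketch of a per-source cost model feeding an SLA-driven scheduler. Everything here is an assumption for illustration: the `Source` and `Sla` types, the per-poll costs, the review volumes, and the tier names. The scheduler uses the fact that polling every T seconds bounds worst-case latency by T, so the polling interval is set directly from the SLA.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Source:
    name: str
    cost_per_poll: float    # assumed: proxy + compute cost per poll, in dollars
    reviews_per_day: float  # assumed: observed average of new reviews


@dataclass
class Sla:
    max_latency_seconds: int  # worst-case delay before a new review is acquired


def polling_interval(sla: Sla) -> int:
    """Polling every T seconds bounds worst-case latency by T, so T = the SLA."""
    return sla.max_latency_seconds


def daily_cost(source: Source, interval_seconds: int) -> float:
    """Cost model: polls per day times this source's per-poll cost."""
    return (SECONDS_PER_DAY / interval_seconds) * source.cost_per_poll


def cost_per_review(source: Source, interval_seconds: int) -> float:
    """Acquisition cost broken down per review, for this source and schedule."""
    return daily_cost(source, interval_seconds) / source.reviews_per_day


# Hypothetical sources and SLA tiers.
sources = [
    Source("source-a", cost_per_poll=0.002, reviews_per_day=500),
    Source("source-b", cost_per_poll=0.01, reviews_per_day=20),
]
tiers = {"standard": Sla(6 * 3600), "premium": Sla(15 * 60)}

for tier, sla in tiers.items():
    interval = polling_interval(sla)
    for source in sources:
        print(f"{tier:8} {source.name}: ${daily_cost(source, interval):.3f}/day, "
              f"${cost_per_review(source, interval):.5f}/review")
```

With these made-up numbers the premium tier costs 24 times as much per source as the standard tier, and the cost per review varies widely between sources; those are the figures that would feed into pricing.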
Recap
1. “Correct” and “Good” are separate dimensions
2. Users are part of your system
3. Systems are not snapshots. Measure continuously
Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)
QUESTIONS?
Thank you for listening
tomer@tomergabel.com
@tomerg
http://www.tomergabel.com
This work is licensed under a Creative
Commons Attribution-ShareAlike 4.0
International License.

Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New VersionPixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
saimabibi60507
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 

Nondeterministic Software for the Rest of Us

  • 1. NONDETERMINISTIC SOFTWARE FOR THE REST OF US An exercise in frustration by Tomer Gabel @ GeeCON 2018, Krakow
  • 2. Case Study #1 • Delver, circa 2007 • We built a search engine • What’s expected? – Performant (<1 sec) – Reliable – Useful
  • 3. Let me take you back… • We applied good old-fashioned engineering • It was kind of great! – Reliability – Fast iteration – Built-in regression suite Spec Tests Code Deployment
  • 4. Let me take you back… • So yeah, we coded it • And it worked… sort of – It was highly available – It responded within SLA – … but with crap results • Green tests aren’t everything!
  • 5. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed)
  • 6. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed) – Huge input space (e.g. machine vision) Image: Cristian David
  • 7. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed) – Huge input space (e.g. machine vision) – Resource-constrained (e.g. Lyft or Uber) Image: rideshareapps.com
  • 8. “CORRECT” AND “GOOD” ARE SEPARATE DIMENSIONS Takeaway #1
  • 9. Getting Started • For any product of any scale, always ask: – What does success look like? Image: Hole in the Wall, FremantleMedia North America
  • 10. Getting Started • For any product of any scale, always ask: – What does success look like? – How can I measure success? Image: Hole in the Wall, FremantleMedia North America
  • 11. Getting Started • For any product of any scale, always ask: – What does success look like? – How can I measure success? • You’re an engineer! – Intuition can’t replace data – QA can’t save your butt Image: Hole in the Wall, FremantleMedia North America
  • 12. What should you measure? • (Un)fortunately, you have customers • Analyze their behavior – What do they want? – What influences your quality of service? • For a search engine… Query Skim Decide Follow Refinement Paging
  • 13. USERS ARE PART OF YOUR SYSTEM Takeaway #2
  • 14. What should you measure? • (Un)fortunately, you have customers • Analyze their behavior – What do they want? – What influences your quality of service? • For a search engine… Query Skim Decide Follow Refinement Paging Signal Signal Signal
  • 15. What should you measure? Paging – “Not relevant enough” Query Skim Decide Follow Refinement Paging
  • 16. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Query Skim Decide Follow Refinement Paging
  • 17. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Clickthrough – “Bingo!” Query Skim Decide Follow Refinement Paging
  • 18. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Clickthrough – “Bingo!” Bonus: Abandonment – “You suck” Query Skim Decide Follow Refinement Paging
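The signals on slides 14 to 18 can be turned into numbers with very little code. The sketch below is a hypothetical illustration, not the Delver implementation: it assumes each search session has already been reduced to an ordered list of actions (`query`, `page`, `refine`, `click`), and it treats a session without any clickthrough as abandoned, which is a simplification. It reports the share of sessions in which each signal appears.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical action vocabulary; real logs would also carry timestamps,
# result positions, user IDs and so on.
QUERY, PAGE, REFINE, CLICK = "query", "page", "refine", "click"
SIGNALS = ("paging", "refinement", "clickthrough", "abandonment")


@dataclass
class Session:
    events: List[str]  # ordered user actions within one search session


def session_signals(session: Session) -> Counter:
    """Count the behavioral signals observed in a single session."""
    signals: Counter = Counter()
    for event in session.events:
        if event == PAGE:
            signals["paging"] += 1        # "Not relevant enough"
        elif event == REFINE:
            signals["refinement"] += 1    # "Not what I meant"
        elif event == CLICK:
            signals["clickthrough"] += 1  # "Bingo!"
    # Simplifying assumption: no click at all means the user gave up.
    if signals["clickthrough"] == 0:
        signals["abandonment"] += 1       # "You suck"
    return signals


def signal_rates(sessions: List[Session]) -> Dict[str, float]:
    """Fraction of sessions in which each signal occurred at least once."""
    if not sessions:
        return {name: 0.0 for name in SIGNALS}
    seen: Counter = Counter()
    for session in sessions:
        for name, count in session_signals(session).items():
            if count > 0:
                seen[name] += 1
    return {name: seen[name] / len(sessions) for name in SIGNALS}
```

Tracked over time, these rates act as a quality metric in their own right: a rising paging or refinement rate after a ranking change hints that relevance got worse, even while every test stays green.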
  • 19. Is this starting to look familiar? It should.
  • 20. Well now! • We’ve been having this conversation for years • Mostly with… – Product managers – Business analysts – Data engineers • Guess what? Product Changes R&D Deployment Measurement Analysis
  • 21. Well now! • We’ve been having this conversation for years • Mostly with… – Product managers – Business analysts – Data engineers • Guess what? Product Changes R&D Deployment Measurement Analysis Informed by BI
  • 22. What can we learn from BI? ➢ Be mindful of your users ➢ Talk to your analysts! • Analysis • Experimentation • Iteration
  • 23. What can we learn from BI? ➢ Invest in A/B tests ➢ Prove your improvements! • Analysis • Experimentation • Iteration
  • 24. What can we learn from BI? • Analysis • Experimentation • Iteration ➢ Establish your baseline ➢ Invest in metric collection and dashboards
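Proving an improvement usually means running a statistical test on the A/B experiment. As one common choice (not necessarily what the speaker had in mind), the sketch below applies a two-sided two-proportion z-test to clickthrough rates in a control arm (A) and a variant arm (B). It uses only the standard library; the function name and the example figures are illustrative.

```python
import math
from typing import Tuple


def two_proportion_z_test(clicks_a: int, sessions_a: int,
                          clicks_b: int, sessions_b: int) -> Tuple[float, float]:
    """Two-sided z-test for H0: CTR(A) == CTR(B). Returns (z, p_value)."""
    p_a = clicks_a / sessions_a
    p_b = clicks_b / sessions_b
    # Under H0 both arms share one rate, estimated from the pooled data.
    pooled = (clicks_a + clicks_b) / (sessions_a + sessions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    if se == 0:
        raise ValueError("degenerate data: pooled CTR is 0 or 1")
    z = (p_b - p_a) / se
    # 2 * (1 - Phi(|z|)) simplifies to erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Example: 10% vs 11% clickthrough over 10,000 sessions per arm.
z, p = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Here a one-point lift in clickthrough yields z of roughly 2.3 and a p-value of roughly 0.02, below the conventional 0.05 threshold; the same lift measured over a few hundred sessions per arm would not be distinguishable from noise, which is why the slide asks you to invest in the testing infrastructure.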
  • 25. SYSTEMS ARE NOT SNAPSHOTS. MEASURE CONTINUOUSLY Takeaway #3
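Establishing a baseline and measuring continuously can start as simply as comparing each day's metric with its own recent history. The sketch below is one hypothetical approach, not a method from the talk: it flags a metric (say, daily clickthrough rate) when it moves more than `k` sample standard deviations away from the mean of a trailing window. The window length and the default `k = 3` are assumptions to tune per metric.

```python
import statistics
from typing import Sequence


def deviates_from_baseline(history: Sequence[float], today: float,
                           k: float = 3.0) -> bool:
    """True if `today` lies more than k sample standard deviations from
    the mean of the trailing `history` window."""
    if len(history) < 2:
        raise ValueError("need at least two historical points for a baseline")
    baseline = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return today != baseline
    return abs(today - baseline) > k * spread


# Hypothetical daily clickthrough rates for the past week.
last_week = [0.10, 0.11, 0.10, 0.09, 0.10, 0.11, 0.10]
print(deviates_from_baseline(last_week, 0.105))  # False: normal variation
print(deviates_from_baseline(last_week, 0.05))   # True: sharp drop
```

Because the window slides forward every day, the baseline follows slow, legitimate drift in user behavior while still catching sudden regressions, which fits the point that a system is not a snapshot.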
  • 26. Hold on to your hats … this isn’t about search engines
  • 27. Case Study #2 • newBrandAnalytics, circa 2011 • A social listening platform – Finds user-generated content (e.g. reviews) – Provides operational analytics
  • 28. Social Listening Platform • A three-stage pipeline Acquisition • 3rd party ingestion • BizDev • Web scraping Analysis • Manual tagging/training • NLP/ML models Analytics • Dashboards • Ad-hoc query/drilldown • Reporting
  • 29. Social Listening Platform • A three-stage pipeline • My team focused on data acquisition • Let’s discuss web scraping – Structured data extraction – At scale – Reliability is paramount Acquisition • 3rd party ingestion • BizDev • Web scraping Analysis • Manual tagging/training • NLP/ML models Analytics • Dashboards • Ad-hoc query/drilldown • Reporting
  • 30. Large-Scale Scraping • A two-pronged problem • Target sites… – Can change at the drop of a hat – Actively resist scraping! • Both are external constraints • Neither can be unit-tested
  • 31. Optimizing for User Happiness • Users consume reviews • What do they want? – Completeness (no missed reviews) – Correctness (no duplicates/garbage) – Timeliness (near real-time) TripAdvisor Twitter Yelp … Data Acquisition Reports Notifications Data Lake
  • 32. Putting It Together • How do we measure completeness? • Manually – Costly, time-consuming – Sampled (by definition) Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)
  • 33. Putting It Together • How do we measure completeness? • Manually – Costly, time-consuming – Sampled (by definition) • Automatically – Re-scrape a known subset – Produce similarity score
  • 34. Putting It Together • How do we measure completeness? • Manually – Costly, time-consuming – Sampled (by definition) • Automatically – Re-scrape a known subset – Produce similarity score • Same with correctness
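The automated check (re-scrape a known subset, then produce a similarity score) might look like the sketch below. It assumes, hypothetically, that each review carries a stable ID from its source site and that a trusted reference scrape of the same subset is available. Completeness is the share of reference reviews the production pipeline also captured; Jaccard similarity is one possible overall similarity score; the duplicate and unknown-ID rates are rough stand-ins for correctness.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class QualityReport:
    completeness: float    # share of reference reviews production also captured
    jaccard: float         # overall set similarity between the two scrapes
    duplicate_rate: float  # share of production records repeating an earlier ID
    unknown_rate: float    # share of distinct production IDs not in the reference


def compare_scrapes(reference_ids: Iterable[str],
                    production_ids: Iterable[str]) -> QualityReport:
    """Score a production scrape against a trusted re-scrape of the same subset."""
    reference = set(reference_ids)
    production_list = list(production_ids)
    production = set(production_list)
    overlap = reference & production
    union = reference | production
    # Every occurrence of an ID beyond the first counts as a duplicate.
    duplicates = sum(n - 1 for n in Counter(production_list).values())
    return QualityReport(
        completeness=len(overlap) / len(reference) if reference else 1.0,
        jaccard=len(overlap) / len(union) if union else 1.0,
        duplicate_rate=duplicates / len(production_list) if production_list else 0.0,
        unknown_rate=len(production - reference) / len(production) if production else 0.0,
    )
```

One caveat: a review posted between the two scrapes shows up as missing or unknown without any bug, so the reference scrape should run close in time to the production one, and unknown IDs are better treated as candidates for inspection than as proven garbage.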
  • 35. Putting It Together • Targets do not want to be scraped • Major sites employ: – IP throttling – Traffic fingerprinting • 3rd party proxies are expensive Image from the movie “UHF”, Metro-Goldwyn-Mayer
  • 36. Putting It Together • What of timeliness? • It’s an optimization problem – Polling frequency determines latency – But polling has a cost – “Good” is a tradeoff
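The timeliness tradeoff can be made concrete with a simple model, assumed here rather than taken from the talk: a source is polled every T seconds at a cost of c per poll, and a new review appears at a uniformly random moment between polls. The wait until the next poll is then uniform on [0, T], so, ignoring processing time:

```latex
\mathbb{E}[\text{latency}] = \frac{T}{2}, \qquad
\max \text{latency} = T, \qquad
\text{cost per unit time} = \frac{c}{T}
\quad\Longrightarrow\quad
\text{cost per unit time} \times \mathbb{E}[\text{latency}] = \frac{c}{2}
```

Under this model the product of polling cost and expected latency is fixed for a given source: halving latency doubles the bill, which is why "good" has to be chosen as a tradeoff rather than maximized.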
  • 37. Putting It Together • So then, timeliness…? • First, build a cost model – Review acquisition cost – Break it down by source • Next, put together SLAs – Reflect cost in pricing! – Adjust scheduler by SLA
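The two steps above, a per-source cost model followed by an SLA-driven scheduler, can be sketched as follows. Everything here is a hypothetical illustration: the field names, the flat cost per request (standing in for proxies and compute), and the assumption that an SLA is a worst-case latency from publication to acquisition. If a source is polled every T seconds, a new review waits at most T for the next poll, so meeting a worst-case SLA of L seconds with P seconds of processing means T must not exceed L - P.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class SourceStats:
    """Hypothetical per-source accounting for one billing period."""
    requests: int            # scrape requests issued
    cost_per_request: float  # assumed flat cost per request (proxies, compute)
    reviews_acquired: int    # new reviews found


def cost_per_review(stats: SourceStats) -> float:
    """Review acquisition cost for one source."""
    if stats.reviews_acquired == 0:
        return float("inf")  # paying for polls that found nothing
    return stats.requests * stats.cost_per_request / stats.reviews_acquired


def polling_interval(sla_max_latency_s: float, processing_s: float) -> float:
    """Longest polling interval that still meets a worst-case latency SLA."""
    interval = sla_max_latency_s - processing_s
    if interval <= 0:
        raise ValueError("SLA cannot be met: processing alone exceeds it")
    return interval


def daily_polling_cost(interval_s: float, requests_per_poll: int,
                       cost_per_request: float) -> float:
    """Daily cost of polling one source at the given interval."""
    polls_per_day = SECONDS_PER_DAY / interval_s
    return polls_per_day * requests_per_poll * cost_per_request


# A one-hour SLA versus a fifteen-minute SLA, with 5 minutes of processing.
for sla in (3_600, 900):
    interval = polling_interval(sla, processing_s=300)
    print(sla, interval, daily_polling_cost(interval, 10, 0.001))
```

With these made-up numbers the fifteen-minute tier polls every 600 seconds instead of every 3,300, so it costs 5.5 times as much per source; that difference is what belongs in the price.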
  • 38. Recap 1. ”Correct” and “Good” are separate dimensions 2. Users are part of your system 3. Systems are not snapshots. Measure continuously Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)
  • 39. QUESTIONS? Thank you for listening [email protected] @tomerg http://www.tomergabel.com This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.