Recommendations and User Understanding
          at StumbleUpon

Chief Data Scientist Summit, San Diego, February 2013

                      Debora Donato
                  Principal Data Scientist

                              Slides courtesy of
       Vishal Vaingankar, Tim Abraham, Roberto Sanabria, Ulas Bardak
StumbleUpon’s Mission

Help users find content they did not expect to find
 Be the best way to discover new
and interesting things from across
             the Web.
How StumbleUpon works
1. Register
2. Tell us your interests
3. Start Stumbling and rating web pages

We use your interests and behavior to recommend new content for you!
StumbleUpon
•   Single item type
•   << 100K items
•   < 250 categories
•   Hand-labeled
•   ~27M users

•   No serendipity
•   Many at a time
•   Not personalized*
•   Repeats

•   +100M items
•   >600 recs/mo.
•   Auto features
•   ~200 methods

•   Mostly about presentation
•   Social recs only
•   10 million recs/month

•   Hand-labeled
•   Item-item similarity based methods
Data-driven culture

[Diagram labels: Data science; Applied Research; Analytics]

15% of the total workforce
Extensive A/B Testing




A/B tests on metrics such as session length, retention,
and rating behavior
Outline of the talk
• The recommendation pipeline

• Showcases:
  – Mobile optimization
  – Power User Understanding
  – Lists
Discovery is very different from search


Discovery at StumbleUpon                  Search
     Serendipitous                     Intent driven
      One at a time                   List of articles
     Never repeats                    Always repeats
   Constantly adapting                 Fixed results
     Tailored for you                  Impersonal

    There is an ongoing shift from search to discovery
StumbleUpon Overview
[Diagram labels: (1) Users Discovery and Automated Feeds; Ingestion Pipeline; (2) Sampling; Pass? Yes; URL Index; (3) Rec Engine]
Grow User’s Interest Graph:
              Implicit + Explicit

[Diagram: interest graph centered on the User node. Other node labels: Experts, Friends, Likeminded Users, News, Trending, Food/Cooking, Italian Recipes, Cars, Vintage Cars, nasa.gov, 1x.com]
Mobile Optimization
Changing Ecosystem

[Figure: percent of total stumbles (y-axis, 0% to 100%) by source (mobile, desktop) over date (x-axis, 2011-01 to 2013-01)]
Webpages on Desktop Vs. Mobile
Webpages on Desktop Vs. Mobile
Finding mobile optimized content

Content Features
  HTML tags
  #links
  #images
  #videos

  P(URL_good | {f1, f2, …}) = ?

User Feedback

  P(URL_good | {f1, f2, …}) = ?
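The content features named on this slide (HTML tags, #links, #images, #videos) could be extracted along the following lines. This is a minimal sketch using only the Python standard library; the class name `ContentFeatureCounter`, the function `content_features`, and the choice to count only `<a href>`, `<img>`, and `<video>` tags are assumptions for illustration, not the features StumbleUpon actually used.

```python
from html.parser import HTMLParser


class ContentFeatureCounter(HTMLParser):
    """Counts tag-level features of a page (hypothetical feature set)."""

    def __init__(self):
        super().__init__()
        self.counts = {"tags": 0, "links": 0, "images": 0, "videos": 0}

    def handle_starttag(self, tag, attrs):
        # Every opening tag counts toward the overall tag total.
        self.counts["tags"] += 1
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.counts["links"] += 1
        elif tag == "img":
            self.counts["images"] += 1
        elif tag == "video":
            self.counts["videos"] += 1


def content_features(html):
    """Return the feature counts f1, f2, ... for one HTML document."""
    parser = ContentFeatureCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

The resulting counts would be the inputs f1, f2, … to whatever model estimates P(URL_good | {f1, f2, …}); the slide leaves that model open.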
User Feedback signals to determine mobile optimization

[Figure: CDF of thumbed-up stumbles (y-axis, with 0.05 marked) vs. time in seconds (x-axis, with the skip threshold marked)]

URL is skipped when timespent <= skip_threshold

Skip_rate = # skips / # stumbles
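The skip rule and ratio above can be sketched as follows. The sketch assumes the skip threshold is read off the CDF of time spent on thumbed-up stumbles at the 0.05 level, which is one reading of the figure and is not stated on the slide; the function names and the nearest-rank quantile are likewise illustrative choices.

```python
import math


def skip_threshold(thumbed_up_times, quantile=0.05):
    """Time (sec) below which only `quantile` of thumbed-up stumbles fall.

    Assumption: the threshold is a low quantile of the thumbed-up CDF.
    """
    if not thumbed_up_times:
        raise ValueError("need at least one thumbed-up stumble")
    ordered = sorted(thumbed_up_times)
    # Nearest-rank quantile: smallest value whose empirical CDF >= quantile.
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def skip_rate(times_spent, threshold):
    """Skip_rate = # skips / # stumbles for one URL.

    A stumble counts as a skip when timespent <= threshold.
    """
    if not times_spent:
        return 0.0
    skips = sum(1 for t in times_spent if t <= threshold)
    return skips / len(times_spent)
```

Deriving the threshold from stumbles that users liked keeps a quick glance at a good page from being mistaken for a skip, under the assumption stated above.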
Cross-device skip rate prediction
[Figure: scatter plot of Mobile Skiprate (y-axis) vs. Desktop Skiprate (x-axis), with regions labeled "URLs good on both devices", "URLs worse on mobile vs desktop", and "URLs bad on both devices"]

E[Mobile_skiprate] = Desktop_skiprate x Slope + Bias
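The linear relation E[Mobile_skiprate] = Desktop_skiprate x Slope + Bias can be fitted with ordinary least squares, as in the sketch below. The fitting method, the function names, and the `margin` used to flag URLs that skip more on mobile than expected are assumptions made for illustration; the slide only gives the form of the model.

```python
def fit_line(desktop_rates, mobile_rates):
    """Least-squares fit of mobile = slope * desktop + bias."""
    n = len(desktop_rates)
    if n != len(mobile_rates) or n < 2:
        raise ValueError("need two equal-length samples with >= 2 points")
    mean_x = sum(desktop_rates) / n
    mean_y = sum(mobile_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in desktop_rates)
    if sxx == 0:
        raise ValueError("desktop skip rates are all identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(desktop_rates, mobile_rates))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    return slope, bias


def worse_on_mobile(urls, slope, bias, margin=0.1):
    """Flag URLs whose observed mobile skip rate exceeds the expected one.

    urls: iterable of (url, desktop_skiprate, mobile_skiprate).
    margin: hypothetical tolerance above the fitted line.
    """
    flagged = []
    for url, desktop, mobile in urls:
        expected = desktop * slope + bias
        if mobile - expected > margin:
            flagged.append(url)
    return flagged
```

A URL with little mobile traffic could use the fitted line to predict its mobile skip rate from desktop data, while a URL far above the line is a candidate for "worse on mobile vs desktop".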
AB RESULTS
User Understanding
Usage mining
Power user definition
• Is a loyal user who has been
  stumbling, even occasionally,
  for years?

• Is a user who regularly
  stumbles (daily or weekly)?

• Is a user who is able to
  discover good content?

• Or one who interacts (rates,
  creates lists, shares content,
  invites friends)?
Stumble rate




•   Sample of ~5M users active in the last 3 months
•   Excluded users that had < 10 DOA
•   Global avg: 39.2 SPD
•   Top 10% avg: 71 SPD
•   25% of users have SPD >= 31.7

•   max dist. cut off: 25.2 SPD
•   50% dist cut off: 31.7 SPD
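The summary statistics above could be computed roughly as follows. The sketch assumes that SPD means stumbles per active day and DOA means days of activity; the slide does not expand either abbreviation, and the function name and input shape are hypothetical.

```python
def stumble_rate_summary(users, min_active_days=10, top_fraction=0.10):
    """Summarize stumbles per day over a user sample.

    users: iterable of (total_stumbles, active_days) pairs.
    Users with fewer than `min_active_days` days of activity are excluded,
    mirroring the "< 10 DOA" filter (assumed meaning of DOA).
    """
    rates = sorted(
        (stumbles / days for stumbles, days in users if days >= min_active_days),
        reverse=True,
    )
    if not rates:
        raise ValueError("no users left after filtering")
    top_n = max(1, int(len(rates) * top_fraction))
    return {
        "users_kept": len(rates),
        "global_avg": sum(rates) / len(rates),
        "top_avg": sum(rates[:top_n]) / top_n,
    }
```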
Activity Day Rate




ADR = # active_days / account_age

•   Max error: ~70%, 1.3% of the observations above that rate.
•   Intercept: ~85%, 0.25% of the observations above that rate.
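A minimal sketch of the ADR ratio, assuming account age is measured in whole days since signup and that an active day is any calendar day with at least one recorded activity; the function and parameter names are hypothetical.

```python
from datetime import date


def activity_day_rate(active_dates, signup, today):
    """ADR = # active_days / account_age.

    active_dates: dates on which the user was active (duplicates allowed).
    Only distinct days in [signup, today) are counted, so ADR stays in [0, 1].
    """
    account_age = (today - signup).days
    if account_age <= 0:
        raise ValueError("account must be at least one day old")
    active_days = len({d for d in active_dates if signup <= d < today})
    return active_days / account_age
```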
Ranking users and content
[Diagram: nodes indexed 1, …, i, …, m and 1, …, j, …, n, linked by weights r_ij; link types: content discovery and content “likes”]
Normalizations

• By the total number of objects discovered

• By the total number of ratings

• By the total number of Stumbles of the
  pages

• By taking into account the time of the rating
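Two of the normalizations listed above can be combined in a simple discoverer score, sketched below. This is an illustration only: the slides do not say how the scores are computed, and the function name, the input dictionaries, and the choice to average per-page like rates are all assumptions.

```python
from collections import defaultdict


def discoverer_scores(discovered_by, likes, stumbles):
    """Score each user by the quality of the pages they discovered.

    discovered_by: dict page -> user who discovered it.
    likes:         dict page -> number of likes the page received.
    stumbles:      dict page -> number of times the page was stumbled.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for page, user in discovered_by.items():
        shown = stumbles.get(page, 0)
        # Normalize by the total number of stumbles of the page.
        like_rate = likes.get(page, 0) / shown if shown else 0.0
        totals[user] += like_rate
        counts[user] += 1
    # Normalize by the total number of objects the user discovered.
    return {user: totals[user] / counts[user] for user in totals}
```

Without the second normalization, a user who submits many mediocre pages would outrank one who submits a few excellent ones.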
Lists
Lists
• Released in
  September 2012

• 45,000 lists
  created in the
  first months

• 2.9M total lists by
  February 2013
Lists by the numbers

• Percentage of users who created more
  than 1 list in their first week of activity:
  10%

• Percentage of users who added at least 2
  pages to a list in their first week of activity:
  15%
URLs distribution


[Figure: number of URLs in list (y-axis, 0 to 20) by quantile (x-axis, 0% to 100%)]
Content diversity
List distribution by number of topics


[Figure: count of lists (y-axis, 0e+00 to 1e+05) by number of topics in lists (x-axis, 0 to 151)]
Topic Classification - Minos

Preprocessing pipeline:

1. Cleanup: remove stopwords, numbers
2. Stem: remove suffixes
3. Build n-grams: combinations of sequential words
4. Wiki check: eliminate tokens not existing in English Wikipedia as articles

Naive Bayes classification of a document W = (w_1, …, w_n) into topic C_i:

\[ p(C_i \mid w_1, w_2, \ldots, w_n) \propto p(w_1, w_2, \ldots, w_n \mid C_i) \, p(C_i) \]

\[ p(W \mid C_i) = \prod_{k=1}^{n} p(w_k \mid C_i) \]

\[ p(C_i \mid W) \propto p(C_i) \prod_{k=1}^{n} p(w_k \mid C_i) \]
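The n-gram step and the Naive Bayes scoring rule p(C_i) × ∏ p(w_k | C_i) can be sketched as below. This is not the Minos implementation: the class and function names are hypothetical, Laplace smoothing (`alpha`) and log-space scoring are added assumptions to keep the sketch numerically safe, and the stopword, stemming, and Wikipedia steps are left out.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, max_n=2):
    """Sequential word combinations of length 1..max_n."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayesTopics:
    """Multinomial Naive Bayes over tokens (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing (assumed)
        self.doc_counts = Counter()              # topic -> number of documents
        self.token_counts = defaultdict(Counter)  # topic -> token frequencies
        self.vocab = set()

    def fit(self, docs):
        """docs: iterable of (tokens, topic) pairs."""
        for tokens, topic in docs:
            self.doc_counts[topic] += 1
            self.token_counts[topic].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """log p(C_i) + sum_k log p(w_k | C_i) for every topic C_i."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for topic, n_docs in self.doc_counts.items():
            counts = self.token_counts[topic]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[topic] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Summing logarithms instead of multiplying probabilities avoids underflow on long documents and leaves the ranking of topics unchanged.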
List Recommendation




                      ?
List Recommendation


[Diagram labels:
 Left: Vintage Cars, Action movies, Astronomy, Robotics
 Bottom: Movies, Cars, Space, Science
 Right: (Astronomy, Space Exploration, Physics, Classic Movies) and (Neuroscience, Astronomy, Space Exploration, Comedy Movies)]
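One simple way to rank candidate lists for a user is by topic overlap, sketched below with Jaccard similarity. The slides do not state which method StumbleUpon used, so the similarity measure, the function names, and the input shape are assumptions for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of topics."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_lists(user_topics, candidate_lists, top_k=1):
    """Return the names of the top_k lists most similar to the user's topics.

    candidate_lists: dict list_name -> iterable of topics in that list.
    """
    ranked = sorted(
        candidate_lists.items(),
        key=lambda item: jaccard(user_topics, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

For example, a user interested in Astronomy, Robotics, and Vintage Cars would be matched first with the list sharing the most of those topics relative to its size.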
Many other interesting problems…

•   Dupe detection
•   Anti-spam
•   Biases, mood
•   News
•   Adult content
•   Metrics
•   Trending
•   Many more…

Editor's Notes

  • #13: I want to step back a bit and ask… what
  • #28: I want to step back a bit and ask… what
  • #32: Lists are a new reality and since the fast adoption by the users
  • #33: Lists can group very distinct topics, as in the case of “Save for later”, and although 60% of the lists are described by only 1 topic, there are cases in which