
Technology Review

Wednesday, June 02, 2010
How iTunes Genius Really Works
An Apple engineer discloses how the company's premier recommendation engine parses millions of iTunes libraries.
By Christopher Mims

Ever since the feature debuted in 2008, there's been a lot of speculation about how iTunes Genius accomplishes its playlist-building magic. Now an engineer at Apple who works on the iTunes Genius team has revealed some tantalizing clues--a rare disclosure for the famously secretive company.
Recapitulating what Steve Jobs has said previously about iTunes Genius, Apple engineer Erik Goldman writes in his post on Quora that the starting point for the Genius service is a packet of usage data--what songs a user has in his or her library (and, presumably, how often he or she plays them)--sent from the iTunes application, which is then "folded into a larger database of users and songs."

Basically, your library of tracks is compared to every other Genius user's library of tracks. Apple then runs a set of previously secret algorithms to generate statistics for each song; Goldman describes these as straightforward recommendation algorithms, similar to those used by services like Netflix when it suggests movies for a user to watch or add to his queue. "These statistics are computed globally at regular intervals and stored in a cache," notes Goldman, because data on the similarity of any two songs changes slowly--it's assumed the only reason it changes at all is the shifting tastes of the listening public and the introduction of new tracks and artists.
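To make the batch-and-cache idea concrete, here is a minimal sketch in Python. Apple's actual statistics are secret; this stand-in uses a simple Jaccard-style co-occurrence score (of the users who have either song, what fraction have both?), and the library data and song names are invented for illustration.

```python
from itertools import combinations

# Hypothetical user libraries: user name -> set of songs owned.
libraries = {
    "user_a": {"song1", "song2", "song3"},
    "user_b": {"song1", "song2"},
    "user_c": {"song2", "song3"},
}

def compute_similarity_cache(libraries):
    # Jaccard-style co-occurrence for every pair of songs: of the users
    # who have either song, what fraction have both? (A stand-in for
    # whatever statistics Apple actually computes.)
    cache = {}
    all_songs = set().union(*libraries.values())
    for s1, s2 in combinations(sorted(all_songs), 2):
        both = sum(1 for lib in libraries.values() if s1 in lib and s2 in lib)
        either = sum(1 for lib in libraries.values() if s1 in lib or s2 in lib)
        cache[(s1, s2)] = both / either
    return cache

# Recomputed globally at regular intervals; lookups between runs hit the cache.
cache = compute_similarity_cache(libraries)
print(cache[("song1", "song2")])  # 2 of the 3 users with either song have both
```

The point of the cache is exactly what Goldman describes: pair similarities drift slowly, so it is cheap to serve stale values between periodic global recomputations.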
Goldman jokes that if he told you how Genius works, he'd have to kill you (or at the least, have a
squad of police officers raid your brain to retrieve Apple's rightful property), but he continues to
describe how the program works anyway.
To uncover part of how iTunes Genius works, says Goldman, "look at information retrieval algorithms, especially those that leverage the vector-space model." But before you can compare factors, such as the frequency of a particular artist or genre in a user's library or playlists, across iTunes libraries via a vector-space model, you need a clever weighting scheme that gives more weight to the things that really matter.
A simple way to properly weight factors for comparison is what's known as term frequency-inverse document frequency (tf-idf). It's simply a way to compare how often a particular factor occurs in a single document (or song, or library) to how often that factor occurs in a larger body, such as the sum of all iTunes libraries stored by the Genius servers. Thus, a factor that occurs fairly often in a given user's library--for example, an affinity for an obscure indie band--will tend to be a more powerful determinant, unless it also happens to occur quite often in the total set of data, as would be the case if the factor were an affinity for the Beatles.
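The indie-band-versus-Beatles intuition falls straight out of the tf-idf formula. A minimal sketch, with made-up libraries represented as lists of artist names (the artist names and data are purely illustrative):

```python
import math

# Hypothetical "documents": each user's library as a list of artist plays.
libraries = [
    ["obscure indie band", "the beatles", "obscure indie band"],
    ["the beatles", "pop star"],
    ["the beatles", "pop star", "pop star"],
]

def tf_idf(artist, library, all_libraries):
    # Term frequency: how prominent the artist is in this one library.
    tf = library.count(artist) / len(library)
    # Inverse document frequency: the rarer the artist is across all
    # libraries, the more weight it carries.
    n_containing = sum(1 for lib in all_libraries if artist in lib)
    idf = math.log(len(all_libraries) / n_containing)
    return tf * idf

# The obscure band is a strong signal; the Beatles appear in every
# library, so their idf is log(3/3) = 0 and they carry no weight at all.
print(tf_idf("obscure indie band", libraries[0], libraries))
print(tf_idf("the beatles", libraries[0], libraries))
```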
Once you've got your tf-idf weights sorted, you can represent them in a vector space model as
vectors.

In this example (courtesy of Wikipedia), two different documents (or songs) have all their various tf-idf weights represented as a single vector (e.g., d1), which can then be compared to a second document/vector (e.g., d2) and a query (q)--such as "which of these two songs is most like the one for which I've just clicked the Genius button?" Whichever document vector is closer in angle to the query vector is more similar.
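"Closer in angle" is usually measured with cosine similarity. A small sketch of the comparison the article describes, using hypothetical three-component tf-idf vectors (the numbers are invented):

```python
import math

def cosine_similarity(v1, v2):
    # Cosine of the angle between two vectors: 1.0 means they point the
    # same way, 0.0 means they are orthogonal (nothing in common).
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Hypothetical tf-idf vectors for two songs (d1, d2) and a query song (q).
d1 = [0.9, 0.1, 0.0]
d2 = [0.1, 0.8, 0.3]
q  = [0.8, 0.2, 0.1]

# d1 is closer in angle to the query, so it is the better recommendation.
best = max((d1, d2), key=lambda d: cosine_similarity(q, d))
print(best is d1)  # True
```

Using the angle rather than the raw distance means a user with a huge library and a user with a tiny one can still be judged similar if their tastes point the same way.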
Digging deeper into the iTunes Genius system, Goldman points to its use of latent-factor algorithms. "Latent-factor algorithms, in particular, tend to work very well on huge data sets with an enormous number of dimensions and a lot of noise," he says.
Latent factors are what shakes out when you do a particular kind of statistical analysis, called a
factor analysis, on a set of data, looking for the hidden, unseen variables that cause the variation in
all the different variables you're examining. Let's say that the variability in a dozen different
variables turns out to be caused by just four or five "hidden" variables--those are your latent
factors. They cause many other variables to move in more or less lock-step.
Discovering the hidden or "latent" factors in your data set is a handy way to reduce the size of the
problem that you have to compute, and it works because humans are predictable: people who like
Emo music are sad, and sad people also like the soundtracks to movie versions of vampire novels
that are about yearning, etc. You might think of it as the mathematical expression of a stereotype--
only it works.
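One common way to extract latent factors from a ratings matrix is a truncated singular value decomposition. A toy sketch, assuming a tiny invented user-by-song ratings matrix (Apple has not said which factorization Genius uses):

```python
import numpy as np

# Toy user-x-song ratings matrix: rows are users, columns are songs.
# By construction, the first two users share one taste and the last two
# share another, so two hidden factors explain nearly all the variation.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Truncated SVD: keep only the k strongest latent factors.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Two factors reconstruct the 4x4 matrix to within 0.5 per entry:
# the dozen visible ratings move in lock-step with two "taste" dimensions.
print(np.round(R_approx, 1))
```

The payoff is the dimensionality reduction the article describes: instead of comparing users song by song, you compare their handful of latent-factor coordinates.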
If you want to go really deep on this subject, Goldman suggests you read the papers that came out
of the million-dollar Netflix Prize, which was won by a combination of teams led by engineers from
AT&T. Their challenge was to improve Netflix's recommendation engine, and one of their primary
innovations was reducing the computational intensity of the algorithms used in recommendation
engines.
Previously, the amount of computation required to do a pairwise comparison of any two items in
Netflix's (and presumably Apple's) library scaled as a quadratic function of the number of
comparisons to be performed. But the AT&T team figured out how to re-write a fundamental
algorithm to make the problem scale only linearly with the amount of data involved. So, whatever
Apple's new data center is for, it's probably not to calculate Genius results.

Comments

Clarification
The article says "library scaled as a quadratic function of the number of comparisons". There are at most a quadratic number of pairwise comparisons you can do, so this must mean that the algorithm was comparing comparisons? Or should it say scaled quadratically with the amount of data?
-- snedunuri, 06/04/2010

Massachusetts Institute of Technology © 2010 Technology Review. All Rights Reserved.
