Application of Dimensionality Reduction in Recommender System - A Case Study
Figure 1: Illustration of the neighborhood formation process. The distance between the
target user and every other user is computed and the closest-k users are chosen as the
neighbors (for this diagram k = 5).
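The neighborhood formation pictured in Figure 1 can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses plain Euclidean distance, whereas the systems discussed here use Pearson correlation or cosine similarity as the proximity measure; the function name and toy matrix are invented for the example.

```python
import numpy as np

def form_neighborhood(ratings, target, k=5):
    """Pick the k users closest to `target` in the ratings space.

    `ratings` is a (num_users x num_items) array; proximity here is
    plain Euclidean distance, standing in for Pearson correlation
    or cosine similarity.
    """
    dists = np.linalg.norm(ratings - ratings[target], axis=1)
    dists[target] = np.inf          # a user is not his own neighbor
    return np.argsort(dists)[:k]    # indices of the k closest users

# toy example: 6 users x 4 items
R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5],
              [5, 5, 2, 1],
              [4, 4, 1, 1]], dtype=float)
print(form_neighborhood(R, target=0, k=2))  # the two users most similar to user 0
```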
communities cannot depend on each person knowing the others. Several systems use statistical techniques to provide personal recommendations of documents by finding a group of other users, known as neighbors, that have a history of agreeing with the target user. Usually, neighborhoods are formed by applying proximity measures such as the Pearson correlation.

The largest Web sites operate at a scale that stresses the direct implementation of collaborative filtering. Model-based techniques (Fayyad et al., 1996) have the potential to contribute to recommender systems that can operate at the scale of these sites. However, these techniques must be adapted to the real-time needs of the Web, and they must be tested in realistic problems derived from Web access patterns. The present paper describes our experimental results in applying a model-based technique, Latent Semantic Indexing (LSI), that uses a dimensionality reduction technique, Singular Value Decomposition (SVD), to our recommender system. We use two data sets in our experiments to test the performance of the model-based technique: a movie dataset and an e-commerce dataset.

The contributions of this paper are:

Section 5 concludes the paper and provides directions for future research.

Usenet news and movies. Ringo (Shardanand et al. 1995) and Video Recommender (Hill et al. 1995) are email and web systems that generate recommendations on music and movies respectively. Here we present the schematic diagram of the architecture of the GroupLens Research collaborative filtering engine in figure 2. The user interacts with a Web interface. The Web server software communicates with the recommender system to choose products to suggest to the user. The recommender system, in this case a collaborative filtering system, uses its database of ratings of products to form neighborhoods and make recommendations. The Web server software displays the recommended products to the user.

[Diagram: the customer's WWW request passes through a Web server with a dynamic HTML generator, which exchanges ratings and recommendations with the recommender system's ratings engine; the engine is backed by a correlation database and a ratings database.]
Figure 2. Recommender System Architecture

2 Existing Recommender Systems Approaches and their Limitations

Most collaborative filtering based recommender systems build a neighborhood of likeminded customers. The neighborhood formation scheme usually uses Pearson correlation or cosine similarity as a measure of proximity (Shardanand et al. 1995, Resnick et al. 1994). Once these systems determine the proximity neighborhood they produce two types of recommendations.

1. Prediction of how much a customer C will like a product P. In case of a correlation based algorithm, the prediction on product P for customer C is computed by taking a weighted sum of co-rated items between C and all his neighbors and then adding C's average rating to that. This can be expressed by the following formula (Resnick et al., 1994):

$$C^{pred}_P = \bar{C} + \frac{\sum_{J \in raters} (J_P - \bar{J})\, r_{CJ}}{\sum_{J \in raters} |r_{CJ}|}$$

where $J_P$ is neighbor J's rating on product P, $\bar{J}$ and $\bar{C}$ are the average ratings of J and C, and $r_{CJ}$ is the correlation between C and J.
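The weighted-sum prediction above can be sketched as follows. This is a minimal illustration of the Resnick et al. (1994) formula, not the GroupLens production code; the function name and toy numbers are invented for the example.

```python
def predict(c_avg, neighbor_ratings, neighbor_avgs, correlations):
    """Correlation-weighted prediction for one product P.

    neighbor_ratings[j]  -- rating J_P of neighbor j on product P
    neighbor_avgs[j]     -- neighbor j's average rating (J bar)
    correlations[j]      -- correlation r_CJ between C and neighbor j
    """
    num = sum((jp - jbar) * r
              for jp, jbar, r in zip(neighbor_ratings,
                                     neighbor_avgs, correlations))
    den = sum(abs(r) for r in correlations)
    # weighted sum of mean-offset ratings, shifted by C's own average
    return c_avg + num / den

# two neighbors who both rated P above their own averages
print(predict(3.0, [5, 4], [3.5, 3.0], [0.8, 0.5]))
```

Because each neighbor's average is subtracted before weighting, the formula corrects for neighbors who rate systematically high or low before pulling C's prediction away from C's own average.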
Each entry in our data matrix R represents a rating on a 1-5 scale, except that in cases where the user i didn't rate movie j the entry r_ij is null. We then performed the following experimental steps.

We computed the average ratings for each user and for each movie and filled the null entries in the matrix by replacing each null entry with the column average for the corresponding column. Then we normalized all entries in the matrix by replacing each entry r_ij with (r_ij - r̄_i), where r̄_i is the average of the ith row. Then MATLAB was used to compute the SVD of the filled and normalized matrix R, producing the three SVD component matrices U, S and V'. S is the matrix that contains the singular values of matrix R sorted in decreasing order. S_k was computed from S by retaining only the k largest singular values and replacing the rest of the singular values with 0. We computed the square root of the reduced matrix and computed the matrix products U_k S_k^1/2 and S_k^1/2 V'_k as mentioned above. We then multiplied the matrices U_k S_k^1/2 and S_k^1/2 V'_k, producing a 943 x 1682 matrix, P. Since the inner product of a row from U_k S_k^1/2 and a column from S_k^1/2 V'_k gives us a prediction score, each entry of P represents a prediction score.

4.3.2 Top-N recommendation experiment:

We started with a matrix as in the previous experiment but converted the rating entries (i.e., non-zero entries) to "1". Then we produced top-10 product recommendations for each customer based on the following two schemes:

§ High dimensional neighborhood: In this scheme we built the customer neighborhood in the original customer-product space and used most frequent item recommendation to produce the top-10 product list. We then used our F1 metric to evaluate the quality.

§ Low dimensional neighborhood: We first reduce the dimensionality of the original space by applying SVD and then used the U_k S_k^1/2 matrix (i.e., the representation of customers in k dimensional space) to build the neighborhood. As before we used most frequent item recommendation to produce the top-10 list and evaluated it by using the F1 metric.

In this experiment our main focus was on the E-commerce data. We also report our findings when we apply this technique on our movie preference data.

[Charts: (a) SVD prediction quality variation with number of dimensions, plotting mean absolute error (MAE) against k (2-100) for x = 0.2, 0.5 and 0.8; (b) SVD as prediction generator with k fixed at 14, comparing the MAE of SVD against Pure-CF.]
Figure 3. (a) Determination of optimum value of k. (b) SVD vs. CF-Predict prediction quality

4.4 Results

4.4.1 Prediction experiment results

Figure 3(b) charts our results for the prediction experiment. The data sets were obtained from the same sample of 100,000 ratings, by varying the sizes of the training and test data sets (recall that x is the ratio between the size of the training set and the size of the entire data set). Note that the different values of x were used to determine the sensitivity of the different schemes to the sparsity of the training set.

4.4.2 Top-N recommendation experiment results

We determined the optimum x ratio for both of our data sets in the high dimensional and low dimensional cases. At first we run the high dimensional experiment for different x ratios and then we perform low dimensional experiments for different x values for a fixed dimension (k) and compute the F1 metric. Figure 4 shows our results; we observe that the optimum x values are 0.8 and 0.6 for the movie data and the E-commerce data respectively.

[Charts: F1 metric against x (0.2-0.9) for the high dimensional (High-dim) and low dimensional (Low-dim) schemes, on the movie (ML) and commerce (EC) data sets.]
Figure 4. Determination of the optimum value of x. a) for the Movie data b) for the Commerce data

Once we obtain the best x value, we run the high dimensional experiment for that x and compute the F1 metric. Then we run our low-dimensional experiments for that x ratio, but vary the number of dimensions, k. Our results are presented in figures 5 and 6. We represent the corresponding high dimensional results (i.e., results from CF-recommend) in the charts by drawing vertical lines at their corresponding values.
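The preprocessing and factorization steps of section 4.3 can be sketched as follows. This is a small-scale illustration using NumPy in place of MATLAB, with a toy matrix standing in for the 943 x 1682 MovieLens matrix; adding the row averages back at the end is an assumption consistent with the normalization step, which subtracted them.

```python
import numpy as np

# toy ratings matrix: rows = users, cols = movies, np.nan = null entry
R = np.array([[5., 3., np.nan, 1.],
              [4., np.nan, np.nan, 1.],
              [1., 1., np.nan, 5.],
              [1., np.nan, np.nan, 4.],
              [np.nan, 1., 5., 4.]])

# 1. fill each null entry with the average of its column
col_avg = np.nanmean(R, axis=0)
filled = np.where(np.isnan(R), col_avg, R)

# 2. normalize: subtract each user's (row) average
row_avg = filled.mean(axis=1, keepdims=True)
norm = filled - row_avg

# 3. SVD of the filled, normalized matrix
U, s, Vt = np.linalg.svd(norm, full_matrices=False)

# 4. keep only the k largest singular values
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# 5. split S_k into sqrt(S_k) * sqrt(S_k) and form the two factors
root = np.sqrt(sk)
user_factors = Uk * root               # U_k S_k^(1/2)
item_factors = root[:, None] * Vtk     # S_k^(1/2) V'_k

# 6. prediction matrix P: inner products, with row averages added back
P = row_avg + user_factors @ item_factors
```

With all singular values retained the product reproduces the filled matrix exactly; truncating to k dimensions gives the rank-k approximation the prediction scores are read from.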
[Figure 5: F1 metric against dimension k (10-100) for the movie data; a line marks the high dimensional value at x = 0.8.]

[Figure 6: F1 metric against dimension k (50-700) for the commerce data; a line marks the high dimensional value at x = 0.6.]
For the e-commerce data the recommendation quality keeps on growing with increasing dimensions. The movie experiment reveals that the low dimensional results are better than the high dimensional counterpart at all values of k. In case of the e-commerce experiment the high dimensional result is always better, but as more and more dimensions are added the low dimensional values improve. However, we increased the dimension values up to 700, but the low dimensional values were still lower than the high dimensional value. Beyond 700 the entire process becomes computationally very expensive. Since the commerce data is very high dimensional (6502x23554), probably such a small k value (up to 700) is not sufficient to provide a useful approximation. For the movie data set, in contrast, the low dimensional recommendation quality was better than the corresponding high dimensional scheme. It indicates that neighborhoods formed in the reduced dimensional space are better than their high dimensional counterparts.²

² We're also working with experiments to use the reduced dimensional neighborhood for prediction generation using the classical CF algorithm. So far, the results are encouraging.
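The F1 values plotted in these experiments come from comparing each customer's top-10 list against the products that customer actually bought or rated. A minimal sketch of that evaluation, assuming the standard harmonic-mean form of F1 (the metric's exact definition falls outside this excerpt, and the function name is invented):

```python
def f1_for_customer(recommended, actual):
    """F1 = harmonic mean of precision and recall for one top-N list."""
    hits = len(set(recommended) & set(actual))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)   # fraction of the list that was right
    recall = hits / len(actual)           # fraction of purchases recovered
    return 2 * precision * recall / (precision + recall)

# a top-10 list containing 2 of the customer's 4 actual purchases
print(f1_for_customer(list(range(10)), [0, 1, 42, 99]))
```

Averaging this score over all customers yields a single F1 value per scheme, which is what the charts compare across values of x and k.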
5 Conclusions

Recommender systems are a powerful new technology for extracting additional value for a business from its customer databases. These systems help customers find products they want to buy from a business. Recommender systems benefit customers by enabling them to find products they like. Conversely, they help the business by generating more sales. Recommender systems are rapidly becoming a crucial tool in E-commerce on the Web.

Recommender systems are being stressed by the huge volume of customer data in existing corporate databases, and will be stressed even more by the increasing volume of customer data available on the Web. New technologies are needed that can dramatically improve the scalability of recommender systems.

Our study shows that Singular Value Decomposition (SVD) may be such a technology in some cases. We tried several different approaches to using SVD for generating recommendations and predictions, and discovered one that can dramatically reduce the dimension of the ratings matrix from a collaborative filtering system. The SVD-based approach was consistently worse than traditional collaborative filtering in case of an extremely sparse e-commerce dataset. However, the SVD-based approach produced results that were better than a traditional collaborative filtering algorithm some of the time in the denser MovieLens data set. This technique leads to very fast online performance, requiring just a few simple arithmetic operations for each recommendation. Computing the SVD is expensive, but can be done offline. Further research is needed to understand how often a new SVD must be computed, or whether the same quality can be achieved with incremental SVD algorithms (Berry et al., 1995).

Future work is required to understand exactly why SVD works well for some recommender applications, and less well for others. Also, there are many other ways in which SVD could be applied to recommender systems problems, including using SVD for neighborhood selection, or using SVD to create low-dimensional visualizations of the ratings space.

6 Acknowledgements

Funding for this research was provided in part by the National Science Foundation under grants IIS 9613960, IIS 9734442, and IIS 9978717 with additional funding by Net Perceptions Inc. This work was also supported by NSF CCR-9972519, by Army Research Office contract DA/DAAG55-98-1-0441, by the DOE ASCI program and by Army High Performance Computing Research Center contract number DAAH04-95-C-0008. We thank anonymous reviewers for their valuable comments.

References

1. Berry, M. W., Dumais, S. T., and O'Brian, G. W. 1995. "Using Linear Algebra for Intelligent Information Retrieval". SIAM Review, 37(4), pp. 573-595.

2. Billsus, D., and Pazzani, M. J. 1998. "Learning Collaborative Information Filters". In Proceedings of Recommender Systems Workshop. Tech. Report WS-98-08, AAAI Press.

3. Bhattacharyya, S. 1998. "Direct Marketing Response Models using Genetic Algorithms." In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 144-148.

4. Brachman, R. J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E. 1996. "Mining Business Databases." Communications of the ACM, 39(11), pp. 42-48, November.

5. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. 1990. "Indexing by Latent Semantic Analysis". Journal of the American Society for Information Science, 41(6), pp. 391-407.

6. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., Eds. 1996. "Advances in Knowledge Discovery and Data Mining". AAAI press/MIT press.

7. Goldberg, D., Nichols, D., Oki, B. M., and Terry, D. 1992. "Using Collaborative Filtering to Weave an Information Tapestry". Communications of the ACM. December.

8. Good, N., Schafer, B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. 1999. "Combining Collaborative Filtering With Personal Agents for Better Recommendations." In Proceedings of the AAAI-'99 conference, pp. 439-446.

9. Heckerman, D. 1996. "Bayesian Networks for Knowledge Discovery." In Advances in Knowledge Discovery and Data Mining. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., Eds. AAAI press/MIT press.
10. Herlocker, J., Konstan, J., Borchers, A., and
Riedl, J. 1999. "An Algorithmic Framework for
Performing Collaborative Filtering." In
Proceedings of ACM SIGIR'99. ACM press.
11. Hill, W., Stead, L., Rosenstein, M., and Furnas,
G. 1995. “Recommending and Evaluating
Choices in a Virtual Community of Use”. In
Proceedings of CHI ’95.
12. Le, C. T., and Lindgren, B. R. 1995.
“Construction and Comparison of Two Receiver
Operating Characteristics Curves Derived from
the Same Samples”. Biom. J. 37(7), pp. 869-877.
13. Ling, C. X., and Li C. 1998. “Data Mining for
Direct Marketing: Problems and Solutions.” In
Proceedings of the 4th International Conference
on Knowledge Discovery and Data Mining, pp.
73-79.
14. Resnick, P., Iacovou, N., Suchak, M., Bergstrom,
P., and Riedl, J. 1994. “GroupLens: An Open
Architecture for Collaborative Filtering of
Netnews." In Proceedings of CSCW '94, Chapel
Hill, NC.
15. Sarwar, B., M., Konstan, J. A., Borchers, A.,
Herlocker, J., Miller, B., and Riedl, J. 1998.
“Using Filtering Agents to Improve Prediction
Quality in the GroupLens Research
Collaborative Filtering System.” In Proceedings
of CSCW ’98, Seattle, WA.
16. Sarwar, B.M., Konstan, J.A., Borchers, A., and
Riedl, J. 1999. "Applying Knowledge from KDD
to Recommender Systems." Technical Report TR
99-013, Dept. of Computer Science, University
of Minnesota.
17. Schafer, J. B., Konstan, J., and Riedl, J. 1999.
“Recommender Systems in E-Commerce.” In
Proceedings of ACM E-Commerce 1999
conference.
18. Shardanand, U., and Maes, P. 1995. “Social
Information Filtering: Algorithms for
Automating ‘Word of Mouth’.” In Proceedings
of CHI ’95. Denver, CO.
19. Yang, Y., and Liu, X. 1999. "A Re-examination
of Text Categorization Methods." In Proceedings
of ACM SIGIR'99 conference, pp. 42-49.
20. Zytkow, J. M. 1997. “Knowledge = Concepts: A
Harmful Equation.” In Proceedings of the Third
International Conference on Knowledge
Discovery and Data Mining.