Figure 1: Comparisons between the collaborative filtering model (SASRec), the modality-aware model (i.e., MoRec), and the LLM-based model (i.e., TALLRec) under the cold/warm scenarios on the Amazon Movies/Video Games datasets (Hit@1).

Despite the effectiveness of modality-aware recommender systems in cold scenarios, the recent emergence of Large Language Models (LLMs), known for their rich pre-trained knowledge and advanced language understanding capabilities, has attracted significant interest in the recommendation domain as a way to effectively extract and integrate modality information [37, 48]. Early studies on LLM-based recommendation [12, 16, 44] have employed OpenAI-GPT with In-context Learning [4]. This approach adapts to new tasks or information based on the context provided within the input prompt and demonstrates the potential of LLMs as recommender systems. Moreover, to bridge the gap between the training tasks of LLMs and recommendation tasks, TALLRec [2] fine-tunes LLMs with recommendation data using LoRA [18]. This approach has empirically demonstrated that, in cold scenarios and cross-domain scenarios, fine-tuned LLMs outperform traditional collaborative filtering models.

Although modality-aware and LLM-based recommender systems have proven effective in cold scenarios with limited user-item interactions, we argue that these methods suffer from a lack of collaborative knowledge due to their heavy reliance on textual information [51]. Consequently, when abundant user-item interactions are available (i.e., the warm scenario), modality-aware and LLM-based recommenders are rather inferior to simple traditional collaborative filtering models. As shown in Figure 1, while the modality-aware recommender (i.e., MoRec) and the LLM-based recommender (i.e., TALLRec) significantly outperform the traditional collaborative filtering model (i.e., SASRec [20]) in the cold scenario, they are outperformed by the traditional collaborative filtering model in the warm scenario. This is mainly because the textual information becomes less important in the warm scenario, where ID-based collaborative filtering models excel at modeling popular items [6, 51]. However, while excelling in the cold scenario is crucial, the majority of user interactions and of the revenue are generated by already existing and active items (i.e., warm items) in real-world recommendation applications, which contribute up to 90% of interactions in offline industrial data [8, 49]. Furthermore, as demonstrated by DCBT [49], modeling both warm and cold items is essential for improving overall user engagement, as evidenced by A/B testing with real-world industrial data. This implies that the warm scenario should not be overlooked.

In this paper, we propose an efficient all-round LLM-based recommender system, called A-LLMRec (All-round LLM-based Recommender system), that excels not only in the cold scenario but also in the warm scenario (hence, an all-round recommender system). Our main idea is to enable an LLM to directly leverage the collaborative knowledge contained in a pre-trained state-of-the-art collaborative filtering recommender system (CF-RecSys) so that the emergent ability [45] of the LLM, as well as the high-quality user/item embeddings already trained by the state-of-the-art CF-RecSys, can be jointly exploited. More precisely, we devise an alignment network that aligns the item embeddings of the CF-RecSys with the token space of the LLM, aiming at transferring the collaborative knowledge learned by a pre-trained CF-RecSys to the LLM and enabling it to understand and utilize the collaborative knowledge for the downstream recommendation task.

The key innovation of A-LLMRec is that it requires the fine-tuning of neither the CF-RecSys nor the LLM, and that the alignment network is the only neural network that is trained in A-LLMRec, which comes with the following two crucial advantages:
(1) (Model-agnostic) A-LLMRec allows any existing CF-RecSys to be integrated, which implies that services using their own recommender models can readily utilize the power of the LLM. Besides, any updates to the recommender models can easily be reflected by simply replacing the old models, which makes the model practical in reality.
(2) (Efficiency) A-LLMRec is efficient in that the alignment network is the only trainable neural network, whereas TALLRec [2] requires fine-tuning the LLM with LoRA [18]. As a result, A-LLMRec trains approximately 2.53 times faster and performs inference 1.71 times faster than TALLRec, while also outperforming both TALLRec and the CF-RecSys in both cold and warm scenarios.

Our extensive experiments on various real-world datasets demonstrate the superiority of A-LLMRec, revealing that aligning high-quality user/item embeddings with the token space of the LLM is the key to solving not only cold/warm scenarios but also few-shot, cold-user, and cross-domain scenarios. Lastly, beyond the recommendation task, we perform a language generation task, i.e., favorite genre prediction, to demonstrate that A-LLMRec can generate natural language outputs based on its understanding of users and items through the collaborative knowledge aligned from the CF-RecSys. Our main contributions are summarized as follows:
• We present an LLM-based recommender system, called A-LLMRec, that directly leverages the collaborative knowledge contained in a pre-trained state-of-the-art recommender system.
• A-LLMRec requires the fine-tuning of neither the CF-RecSys nor the LLM, while only requiring an alignment network to be trained to bridge between them.
• Our extensive experiments demonstrate that A-LLMRec outperforms not only the conventional CF-RecSys in the warm scenario but also the LLMs in the cold scenario.

2 RELATED WORK

2.1 Collaborative Filtering
Collaborative Filtering (CF) is the cornerstone of recommendation systems, fundamentally relying on leveraging users' historical
preferences to inform future suggestions. The key idea is to rely on similar users/items for recommendations. The emergence of matrix factorization marked a significant advancement in CF, as evidenced by numerous studies [19, 22, 38], demonstrating its superiority in capturing the latent factors underlying user preferences. This evolution continued with the introduction of Probabilistic Matrix Factorization (PMF) [5, 33] and Singular Value Decomposition (SVD) [30, 54], which integrate probabilistic and decomposition techniques to further refine the predictive capabilities of CF models. AutoRec [39] and Neural Matrix Factorization (NMF) [15] utilized deep learning to enhance CF by capturing complex user-item interaction patterns. Recently, [7, 21, 34, 36] proposed modeling collaborative filtering based on sequential interaction histories. Caser [41] and NextItNet [50] utilize Convolutional Neural Networks (CNNs) [23] to capture local sequence information, treating an item sequence as an image. While these methods effectively capture user preferences using interaction histories, including user and item IDs, they overlook the potential of the modality information of users/items, which could enhance model performance and offer a deeper analysis of user behaviors.

2.2 Modality-aware Recommender Systems
Modality-aware recommenders utilize modality information such as item titles, descriptions, or images to enhance recommendation performance, mainly under cold scenarios. Initially, CNNs were used to extract visual features, modeling human visual preferences based on the Mahalanobis distance [31]. With advancements in pre-trained modality encoders such as BERT [9, 27, 29, 47, 51] and ResNet/Vision-Transformer [10, 11], modality-aware recommender systems have accelerated research by utilizing modality knowledge in recommendation tasks. For example, NOVA [27] and DMRL [28] proposed non-invasive fusion and disentangled fusion of modality, respectively, by carefully integrating pure item embeddings and text-integrated item embeddings using the attention mechanism. MoRec [51] leverages modality encoders to project raw modality features, thereby replacing the item embeddings used in collaborative filtering models. As for pre-training-based models, Liu et al. [29] construct user-user and item-item co-interaction graphs to extract collaborative knowledge, which is then integrated with user/item text information through an attention mechanism in an auto-regressive manner, and CTRL [25] pre-trains collaborative filtering models on paired tabular and textual data through a contrastive learning objective, subsequently fine-tuning them for recommendation tasks. Most recently, RECFORMER [24] proposed to model user preferences and item features as language representations based on the Transformer architecture by formulating sequential recommendation as a next-sentence prediction task.

2.3 LLM-based Recommender Systems
Recent studies have attempted to adopt the LLM as a recommender system. More precisely, [12, 16, 44] utilize LLMs with In-context Learning [4], adapting to new tasks or information based on the context provided within the input prompt. For example, Sanner et al. [37] employ In-context Learning for recommendation tasks, exploring various prompting styles such as completion, instructions, and few-shot prompts based on item texts and user descriptions. Gao et al. [12] assign the role of a recommender expert to rank items that meet users' needs through prompting and conduct zero-shot recommendations. These studies empirically demonstrated the potential of LLMs, with their rich item information and natural language understanding, in the recommendation domain. However, these approaches often underperform traditional recommendation models [20, 40] due to the gap between the natural language downstream tasks used for training LLMs and the recommendation task [2]. To bridge this gap, TALLRec [2] employs the Parameter-Efficient Fine-Tuning (PEFT) method known as LoRA [18]. This methodology enables TALLRec to demonstrate enhanced efficacy, surpassing traditional collaborative filtering recommendation models, particularly in mitigating the challenges posed by the cold-start dilemma and in navigating the complexities of cross-domain recommendation scenarios. However, it is important to note that, since TALLRec simply converts the conventional recommendation task into an instruction text and uses it for fine-tuning, it still fails to explicitly capture the collaborative knowledge that is crucial in warm scenarios.

3 PROBLEM FORMULATION
In this section, we introduce a formal definition of the problem, including the notations and the task description.

Notations. Let $\mathcal{D}$ denote the historical user-item interaction dataset $(\mathcal{U}, \mathcal{I}, \mathcal{T}, \mathcal{S}) \in \mathcal{D}$, where $\mathcal{U}$, $\mathcal{I}$, $\mathcal{T}$, and $\mathcal{S}$ denote the set of users, items, item titles/descriptions, and item sequences, respectively. $\mathcal{S}^u = (i^u_1, i^u_2, \cdots, i^u_k, \cdots, i^u_{|\mathcal{S}^u|}) \in \mathcal{S}$ is the sequence of item interactions of a user $u \in \mathcal{U}$, where $i^u_k$ denotes the $k$-th interaction of user $u$ and corresponds to the index of the interacted item in the item set $\mathcal{I}$. Moreover, each item $i \in \mathcal{I}$ is associated with title and description text $(t^i, d^i) \in \mathcal{T}$.

Task: Sequential Recommendation. The goal of sequential recommendation is to predict the next item to be interacted with by a user based on the user's historical interaction sequence. Given a set of user historical interaction sequences $\mathcal{S} = \{\mathcal{S}^1, \mathcal{S}^2, \cdots, \mathcal{S}^{|\mathcal{U}|}\}$, where $\mathcal{S}^u$ denotes the sequence of user $u$, the subset $\mathcal{S}^u_{1:k} \subseteq \mathcal{S}^u$ represents the sequence of user $u$ from the first to the $k$-th item, denoted as $\mathcal{S}^u_{1:k} = (i^u_1, i^u_2, \cdots, i^u_k)$. Given an item embedding matrix $\mathbf{E} \in \mathbb{R}^{|\mathcal{I}| \times d}$, the embedding matrix of the items in $\mathcal{S}^u_{1:k}$ is denoted by $\mathbf{E}^u_{1:k}$.
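To make the notation concrete, the following is a minimal sketch (illustrative only, not part of the paper's implementation) of how a user sequence, its prefix $\mathcal{S}^u_{1:k}$, and the item embedding lookup can be represented; the variable names and sizes are arbitrary.

```python
import numpy as np

# Illustrative notation sketch: a user's interaction sequence S^u is a list of
# item indices into the item set I, and E is the item embedding matrix.
num_items, d = 1000, 50                  # |I| and the CF embedding dimension (assumed)
E = np.random.randn(num_items, d)        # E in R^{|I| x d}

S_u = [3, 17, 42, 7, 99]                 # S^u = (i^u_1, ..., i^u_{|S^u|})
k = 3
S_u_prefix = S_u[:k]                     # S^u_{1:k} = (i^u_1, i^u_2, i^u_3)

# Embedding matrix of the items in S^u_{1:k} (one row per interacted item).
E_u_prefix = E[S_u_prefix]               # shape (k, d)

# Sequential recommendation: given S^u_{1:k}, predict the next item i^u_{k+1}.
next_item = S_u[k]
print(E_u_prefix.shape, next_item)
```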
Figure 2: (a) is the overview of A-LLMRec. (b) and (c) are the detailed architectures of Stage 1 and Stage 2, respectively.
capture both collaborative and textual knowledge. We employ a pre-trained Sentence-BERT (SBERT) [35] model, which is fine-tuned during training, to extract text embeddings from the textual information associated with items.³ Then, we introduce two encoders, i.e., an item encoder $f_I^{enc}$ and a text encoder $f_T^{enc}$, each containing a 1-layer Multi-Layer Perceptron (MLP), to align the item embeddings from a frozen CF-RecSys with the text embeddings from SBERT. Given an item $i$, the item encoder $f_I^{enc}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d'}$ encodes an item embedding $\mathbf{E}_i \in \mathbb{R}^{d}$ into a latent item embedding $\mathbf{e}_i \in \mathbb{R}^{d'}$, i.e., $\mathbf{e}_i = f_I^{enc}(\mathbf{E}_i)$, while the text encoder $f_T^{enc}: \mathbb{R}^{768} \rightarrow \mathbb{R}^{d'}$ encodes a text embedding $\mathbf{Q}_i \in \mathbb{R}^{768}$ from SBERT, whose output dimension size is 768, into a latent text embedding $\mathbf{q}_i \in \mathbb{R}^{d'}$, i.e., $\mathbf{q}_i = f_T^{enc}(\mathbf{Q}_i)$. Then, we perform latent space matching between the item embeddings and the text embeddings.

³ Although using a larger language model, such as OPT [53] or LLaMA [42], would further enhance the quality of the text embeddings, we adopt SBERT for efficiency.

Here, $f_I^{dec}$ and $f_T^{dec}$ are the decoders added to the encoders $f_I^{enc}$ and $f_T^{enc}$, respectively. In Section 5.3.1, we empirically demonstrate the benefit of introducing the reconstruction losses.
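As a concrete illustration of the two encoders described above, the following is a minimal PyTorch sketch (not the authors' released code): $f_I^{enc}$ maps a frozen CF item embedding in $\mathbb{R}^{d}$ and $f_T^{enc}$ maps a 768-dimensional SBERT embedding into a shared latent space $\mathbb{R}^{d'}$. The dimensions $d{=}50$ and $d'{=}128$ follow Table 3; the activation function is an assumption.

```python
import torch.nn as nn

d, d_prime, sbert_dim = 50, 128, 768   # d and d' follow Table 3; 768 is the SBERT output size

class ItemEncoder(nn.Module):
    """f_I^enc: R^d -> R^{d'}, a 1-layer MLP over frozen CF item embeddings."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, d_prime), nn.LeakyReLU())

    def forward(self, E_i):                 # E_i: (batch, d)
        return self.mlp(E_i)                # e_i: (batch, d')

class TextEncoder(nn.Module):
    """f_T^enc: R^768 -> R^{d'}, a 1-layer MLP over SBERT text embeddings."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(sbert_dim, d_prime), nn.LeakyReLU())

    def forward(self, Q_i):                 # Q_i: (batch, 768)
        return self.mlp(Q_i)                # q_i: (batch, d')

# Decoders f_I^dec and f_T^dec map the latent embeddings back for the
# reconstruction losses (their exact form is defined earlier in the paper).
f_I_enc, f_T_enc = ItemEncoder(), TextEncoder()
f_I_dec = nn.Linear(d_prime, d)
f_T_dec = nn.Linear(d_prime, sbert_dim)
```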
4.1.2 Recommendation Loss. Besides aligning the collaborative knowledge from the user-item interactions with the textual knowledge from the associated text information, we introduce a recommendation loss to explicitly incorporate the collaborative knowledge while informing the model about the recommendation task. Specifically, the recommendation loss is defined as follows [20]:

$$\mathcal{L}_{\text{rec}} = -\sum_{\mathcal{S}^u \in \mathcal{S}} \Big[\log\big(\sigma\big(s\big(\mathbf{x}^u_{|\mathcal{S}^u|-1},\ f_I^{dec}(f_I^{enc}(\mathbf{E}_{i^u_{|\mathcal{S}^u|}}))\big)\big)\big) + \log\big(1-\sigma\big(s\big(\mathbf{x}^u_{|\mathcal{S}^u|-1},\ f_I^{dec}(f_I^{enc}(\mathbf{E}_{i^{u,-}_{|\mathcal{S}^u|}}))\big)\big)\big)\Big] \tag{5}$$

where $\mathbf{x}^u_{|\mathcal{S}^u|-1} = \text{CF-RecSys}(\mathcal{S}^u_{1:|\mathcal{S}^u|-1}) \in \mathbb{R}^{d}$ is the user representation extracted from the collaborative filtering recommender system, i.e., the CF-RecSys, obtained after the user $u$ has interacted with the last item in the sequence $\mathcal{S}^u_{1:|\mathcal{S}^u|-1}$, $\mathbf{E}_{i^{u,-}_{|\mathcal{S}^u|}} \in \mathbb{R}^{d}$ is the embedding of a negative item of $i^u_{|\mathcal{S}^u|}$, i.e., $i^{u,-}_{|\mathcal{S}^u|}$, and $s(\mathbf{a}, \mathbf{b})$ is the dot product between $\mathbf{a}$ and $\mathbf{b}$.

4.1.3 Final Loss of Stage-1. Finally, the final objective of Stage-1, i.e., $\mathcal{L}_{\text{stage-1}}$, is the sum of the matching loss defined in Equation 2, the reconstruction losses defined in Equations 3 and 4, and the recommendation loss in Equation 5:

$$\mathcal{L}_{\text{stage-1}} = \mathcal{L}_{\text{matching}} + \alpha\,\mathcal{L}_{\text{item-recon}} + \beta\,\mathcal{L}_{\text{text-recon}} + \mathcal{L}_{\text{rec}} \tag{6}$$

where $\alpha$ and $\beta$ are the coefficients that control the importance of each term. Note that, for efficiency in training, we only consider the last item in $\mathcal{S}^u$ for each user $u$ to minimize $\mathcal{L}_{\text{stage-1}}$. However, considering all items in the sequence further enhances the recommendation performance, as will be shown in Section 5.4.2.
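For concreteness, a hedged sketch of the Stage-1 recommendation loss (Equation 5) and the combined objective (Equation 6) is given below. The matching and reconstruction losses are only passed in as precomputed values here, since their definitions (Equations 2–4) appear earlier in the paper; `f_I_enc` and `f_I_dec` are assumed to be the encoder/decoder sketched above, and the user representation is assumed to come from the frozen CF-RecSys.

```python
import torch

def stage1_rec_loss(x_user, E_pos, E_neg, f_I_enc, f_I_dec):
    """Equation 5: BCE-style loss with a dot-product score s(a, b).

    x_user : (batch, d)  user representation from the frozen CF-RecSys
    E_pos  : (batch, d)  CF embedding of the next (positive) item
    E_neg  : (batch, d)  CF embedding of a sampled negative item
    """
    pos_score = (x_user * f_I_dec(f_I_enc(E_pos))).sum(-1)   # s(x_u, decoded positive)
    neg_score = (x_user * f_I_dec(f_I_enc(E_neg))).sum(-1)   # s(x_u, decoded negative)
    return -(torch.log(torch.sigmoid(pos_score) + 1e-9)
             + torch.log(1 - torch.sigmoid(neg_score) + 1e-9)).sum()

def stage1_loss(l_matching, l_item_recon, l_text_recon, l_rec, alpha=0.5, beta=0.5):
    """Equation 6: L_stage-1 = L_matching + alpha*L_item-recon + beta*L_text-recon + L_rec.
    The defaults alpha = beta = 0.5 follow Table 3 (Movies and TV / Video Games)."""
    return l_matching + alpha * l_item_recon + beta * l_text_recon + l_rec
```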
4.1.4 Joint Collaborative-Text Embedding. Having trained the autoencoder based on Equation 6, we consider $\mathbf{e}_i = f_I^{enc}(\mathbf{E}_i)$ as the joint collaborative-text embedding (shortly, joint embedding) of item $i$, which will be passed to the LLM as input. The joint embedding introduces the collaborative and textual knowledge to the LLM, which will be described in Section 4.2.

It is important to note that, when encountering new items that have not been seen during the training of the collaborative filtering recommender, we can instead rely on the text encoder $f_T^{enc}$ to extract the joint collaborative-text embedding, i.e., $\mathbf{q}_i = f_T^{enc}(\mathbf{Q}_i)$. Since the two encoders $f_I^{enc}$ and $f_T^{enc}$ are jointly trained to match their latent spaces, we expect the joint embedding $\mathbf{q}_i$ to not only capture the textual knowledge but also to implicitly capture the collaborative knowledge. In summary, we use $\mathbf{e}_i = f_I^{enc}(\mathbf{E}_i)$ as the joint collaborative-text embedding by default, but we use $\mathbf{q}_i = f_T^{enc}(\mathbf{Q}_i)$ when item $i$ lacks interactions, i.e., in the cold-item, few-shot, and cross-domain scenarios, which will be demonstrated in the experiments in Section 5.2.2, Section 5.2.4, and Section 5.2.5, respectively.
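A small sketch of the default/fallback rule described above (illustrative only; `item_has_interactions`, `E`, and `Q` are assumed lookups for whether an item was seen by the CF-RecSys, its CF embedding, and its SBERT embedding).

```python
def joint_embedding(item_id, E, Q, f_I_enc, f_T_enc, item_has_interactions):
    """Return the joint collaborative-text embedding of an item.

    Warm items use e_i = f_I^enc(E_i) (the default); items unseen by the CF-RecSys
    (cold-item, few-shot, cross-domain scenarios) fall back to q_i = f_T^enc(Q_i),
    which lives in the same latent space because the two encoders are matched.
    """
    if item_has_interactions(item_id):
        return f_I_enc(E[item_id])    # e_i: collaborative + textual knowledge
    return f_T_enc(Q[item_id])        # q_i: textual, implicitly collaborative
```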
4.2 Alignment between Joint Collaborative-Text Embedding and LLM (Stage-2)
Recall that in Stage-1 we obtained the joint collaborative-text embeddings by aligning the collaborative knowledge with the item textual information. Our goal in Stage-2 is to align these joint embeddings with the token space of the LLM (Section 4.2.1), and to design a prompt that allows the LLM to solve the recommendation task by leveraging the learned collaborative knowledge (Section 4.2.2). Figure 2 shows the overall architecture of Stage-2. Note that the component trained in Stage-1 that is also utilized in Stage-2, i.e., $f_I^{enc}$, is frozen in Stage-2.
4.2.1 Projecting Collaborative Knowledge onto the Token Space of the LLM. We first project the user representations $\mathbf{x}_u \in \mathbb{R}^{d}$ and the joint collaborative-text embeddings $\mathbf{e}_i \in \mathbb{R}^{d'}$ obtained from Stage-1 onto the token space of the LLM, i.e., $\mathbb{R}^{d_{token}}$. By doing so, we allow the LLM to take them as inputs. More precisely, we introduce two 2-layer MLPs, i.e., $F_U: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d_{token}}$ and $F_I: \mathbb{R}^{d'} \rightarrow \mathbb{R}^{d_{token}}$, to project the user representations and the joint collaborative-text embeddings to the token space of the LLM, respectively, as follows:

$$\mathbf{O}_u = F_U(\mathbf{x}_u), \qquad \mathbf{O}_i = F_I(\mathbf{e}_i) \tag{7}$$

where $\mathbf{O}_u \in \mathbb{R}^{d_{token}}$ and $\mathbf{O}_i \in \mathbb{R}^{d_{token}}$ are the projected embeddings of the representation of user $u$ and of the joint collaborative-text embedding of item $i$; they can now be used as inputs to LLM prompts, which allows the LLM to perform recommendation without any fine-tuning.

LLM Input: [User Representation] is a user representation. This user has watched [HISTORY (Item Titles, Item Emb)] in the past. Recommend a movie for this user to watch next from the following set of movie titles, [CANDIDATE (Item Titles, Item Emb)]. The recommendation is
LLM Output: [Next Item Title]

Figure 3: An example prompt of A-LLMRec designed for the Amazon Movies dataset. For other datasets, we keep the same format but adjust the verbs and nouns to fit the context (e.g., 'watched' → 'bought', 'movie' → 'item').

4.2.2 Prompt Design for Integrating Collaborative Knowledge. Prompt engineering helps in understanding the capabilities and limitations of LLMs, enabling them to perform complex tasks such as question answering and arithmetic reasoning [4, 46]. Recent studies on LLM-based recommender systems have shown that carefully crafted prompts enhance the performance of LLMs [2, 16, 37]. However, as existing LLM-based recommender systems focus on cold scenarios with few user-item interactions, their prompts mainly consider ways to incorporate modality information (e.g., item description text) while overlooking the collaborative knowledge. To this end, we introduce a novel approach to prompt design for LLM-based recommender systems, which combines collaborative knowledge with recommendation instructions (see Figure 3). This is done by directly incorporating the user representations $\mathbf{O}_u$ and the joint collaborative-text embeddings $\mathbf{O}_i$ into the textual prompts in the token embedding space. In other words, as $\mathbf{O}_u$ and $\mathbf{O}_i$ have been projected into the LLM token space, they can be treated as ordinary tokens used by the LLM and readily incorporated within a prompt. To facilitate the LLM's understanding of the given user, which is crucial for personalized recommendation, we place the projected user representation $\mathbf{O}_u$ at the beginning of the prompt to provide the LLM with information about the user, which is analogous to soft prompts [26]. Moreover, we add the projected joint embedding of an item, $\mathbf{O}_i$, next to its title. This structured prompt then serves as an input to the LLM, with the expected output being recommendations tailored to the user. The learning objective of Stage-2 is given as follows:

$$\max_{\theta}\ \sum_{\mathcal{S}^u \in \mathcal{S}} \sum_{k=1}^{|y^u|} \log\big(P_{\theta,\Theta}(y^u_k \mid p^u, y^u_{<k})\big) \tag{8}$$

where $\theta$ denotes the learnable parameters of $F_U$ and $F_I$, $\Theta$ denotes the frozen parameters of the LLM, and $p^u$ and $y^u$ are the input prompt and the next item title of user $u$, respectively. $y^u_k$ is the $k$-th token of $y^u$, and $y^u_{<k}$ represents the tokens before $y^u_k$. Note that we only use the last item of each user sequence to train Equation 8, for efficiency.
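To illustrate Stage-2 concretely, the following is a hedged PyTorch sketch of the two 2-layer projection MLPs (Equation 7), the soft-prompt construction that splices the projected user/item embeddings among ordinary token embeddings, and the next-item-title objective (Equation 8) with the LLM kept frozen. The way the projected embeddings are interleaved with token embeddings, the use of `inputs_embeds`, and the token dimension of 4096 for OPT-6.7B are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

d, d_prime, d_token = 50, 128, 4096      # d_token: LLM token embedding size (assumed for OPT-6.7B)

def make_projector(in_dim):
    """A 2-layer MLP projecting onto the LLM token space (F_U or F_I, Eq. 7)."""
    return nn.Sequential(nn.Linear(in_dim, d_token), nn.LeakyReLU(),
                         nn.Linear(d_token, d_token))

F_U, F_I = make_projector(d), make_projector(d_prime)

def stage2_loss(llm, tokenizer, x_user, e_items, prompt_parts, target_title):
    """Equation 8: maximize log P(y^u_k | p^u, y^u_{<k}) w.r.t. F_U and F_I only.

    prompt_parts: a list mixing text chunks, the slot ("user",) for O_u at the
    start of the prompt, and slots ("item", idx) for each O_i next to its title.
    Assumes llm.requires_grad_(False) was called, so only F_U and F_I get gradients.
    """
    embed = llm.get_input_embeddings()                 # frozen token embedding table
    pieces = []
    for part in prompt_parts:
        if part == ("user",):
            pieces.append(F_U(x_user).unsqueeze(0))              # O_u as one soft token
        elif isinstance(part, tuple) and part[0] == "item":
            pieces.append(F_I(e_items[part[1]]).unsqueeze(0))    # O_i as one soft token
        else:
            ids = tokenizer(part, return_tensors="pt").input_ids[0]
            pieces.append(embed(ids))                            # ordinary text tokens
    target_ids = tokenizer(target_title, return_tensors="pt").input_ids[0]
    pieces.append(embed(target_ids))                             # next-item title y^u
    inputs = torch.cat(pieces, dim=0).unsqueeze(0)               # (1, L, d_token)

    labels = torch.full((1, inputs.size(1)), -100, dtype=torch.long)
    labels[0, -target_ids.size(0):] = target_ids                 # only score the title tokens
    out = llm(inputs_embeds=inputs, labels=labels)               # LLM parameters stay frozen
    return out.loss
```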
Table 1: Overall model performance (Hit@1) over various datasets. The best performance is denoted in bold.
Collaborative filtering Modality-aware LLM-based
NCF NextItNet GRU4Rec SASRec MoRec CTRL RECFORMER LLM-Only TALLRec MLP-LLM A-LLMRec
Movies and TV 0.4273 0.5855 0.5215 0.6154 0.4130 0.3467 0.4865 0.0121 0.2345 0.5838 0.6237
Video Games 0.3159 0.4305 0.4026 0.5402 0.4894 0.2354 0.4925 0.0168 0.4403 0.4788 0.5282
Beauty 0.2957 0.4231 0.4131 0.5298 0.4997 0.3963 0.4878 0.0120 0.5542 0.5548 0.5809
Toys 0.1849 0.1415 0.1673 0.2359 0.1728 0.1344 0.2871 0.0141 0.0710 0.3225 0.3336
Table 2: Statistics of the datasets after preprocessing. Avg. Len denotes the average sequence length of users.

Datasets        #Users    #Items    #Interactions   Avg. Len
Movies and TV   297,498   59,944    3,409,147       11.46
Video Games     64,073    33,614    598,509         8.88
Beauty          9,930     6,141     63,953          6.44
Toys            30,831    61,081    282,213         9.15

Table 3: Hyperparameter specifications of A-LLMRec.

Datasets        LR (stage 1)   LR (stage 2)   Emb. dim d (CF-RecSys)   Emb. dim d' (f_I^enc, f_T^enc)   alpha   beta
Movies and TV   0.0001         0.0001         50                       128                              0.5     0.5
Video Games     0.0001         0.0001         50                       128                              0.5     0.5
Beauty          0.0001         0.0001         50                       128                              0.5     0.2
Toys            0.0001         0.0001         50                       128                              0.5     0.2
5 EXPERIMENTS

5.1 Experimental Setup
Datasets. For comprehensive evaluations, we used four datasets from the Amazon datasets [13, 32], i.e., Movies and TV, Video Games, Beauty, and Toys, which consist of comprehensive textual information including "title" and "description." Note that we deliberately selected datasets with varying statistics in terms of the number of users and items to conduct an extensive analysis of the models. The statistics for each dataset after preprocessing are presented in Table 2, and we describe the data preprocessing details as follows:
• Movies and TV To evaluate the models on a large scale, we select about 300K users and 60K items. Following existing studies [20, 51], we removed users and items with fewer than 5 interactions.
• Video Games To evaluate the models on moderate-scale data, which is smaller than the Movies and TV dataset, we select about 64K users and 33K items, removing users and items with fewer than 5 interactions, as in the Movies and TV dataset.
• Beauty To compose a small and cold dataset, we select about 9K users and 6K items, removing users and items with fewer than 4 interactions. To retain some information from user-item feedback, we categorized user ratings by treating items rated above 3 as positive and all others, including non-interacted items, as negative.
• Toys To evaluate the models when the number of items is larger than the number of users, unlike the other datasets, we select about 30K users and 61K items, with the number of items being twice as large as the number of users, and remove users and items with fewer than 4 interactions. Similar to the Beauty dataset, to preserve some information from user-item feedback, we categorize positive and negative items with a rating threshold of 3.
Baselines. We compare A-LLMRec with the following baselines, which can be categorized into three types: collaborative filtering recommender systems (NCF [15], NextItNet [50], GRU4Rec [17], and SASRec [20]), modality-aware recommender systems (MoRec [51], CTRL [25], and RECFORMER [24]), and LLM-based recommender systems (LLM-Only, TALLRec [2], and MLP-LLM). For more detail regarding the baselines, please refer to Appendix A.
Evaluation Setting. We divide user sequences into training, validation, and test sets. For each user sequence, the most recently interacted item, denoted as $i^u_{|\mathcal{S}^u|}$, is used as the test set, while the second most recent interaction, $i^u_{|\mathcal{S}^u|-1}$, is used as the validation set. The remaining sequence of items is used as the training set. To evaluate the performance of sequential recommendation models, we add 19 randomly selected non-interacted items to the test set, so that the test set of each user contains 1 positive item and 19 negative items. For quantitative comparison, we employ a widely used metric, Hit Ratio at 1 (Hit@1), for all experiments.
Implementation Details. Although A-LLMRec is model-agnostic, in this work we adopt OPT-6.7B [53] as the backbone LLM and SASRec [20] as the pre-trained CF-RecSys. For fair comparisons, we also used OPT-6.7B as the backbone LLM for the other LLM-based models (i.e., LLM-Only, TALLRec [2], and MLP-LLM). Moreover, we use SASRec as the CF-RecSys in the other modality-aware models (i.e., MoRec [51] and CTRL [25]), and fix the dimension of item and model embeddings to 50 for all methods and datasets. For RECFORMER [24], we follow the paper and employ Longformer [3] as the backbone network. We set the batch size to 128 for all collaborative filtering-based and modality-aware models. Moreover, the batch size is set to 32 for Stage-1 of A-LLMRec, and 4 for MLP-LLM, TALLRec, and Stage-2 of A-LLMRec. We trained Stage-1 of A-LLMRec for 10 epochs and Stage-2 of A-LLMRec for 5 epochs, and TALLRec is trained for a maximum of 5 epochs. We use the Adam optimizer to train the models on all datasets. For hyperparameters, we tune the model over the following ranges: learning rates $\eta_1$ and $\eta_2$ for each training stage in {0.01, 0.001, 0.0005, 0.0001}, and coefficients $\alpha$ and $\beta$ each in {0.1, 0.2, 0.5, 0.75, 1.0}; we report the best-performing hyperparameters for each dataset in Table 3. We use four NVIDIA GeForce A6000 48GB GPUs to train the LLM-based models on the Movies and TV dataset, and one NVIDIA GeForce A6000 48GB GPU for the other datasets, covering both LLM-based and other models.
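A minimal sketch of the evaluation protocol described above (leave-one-out with 19 sampled negatives and Hit@1); the scoring function is a stand-in for whichever model is being evaluated, and the sampling details are illustrative.

```python
import random

def hit_at_1(score_fn, user, pos_item, all_items, interacted, num_neg=19, seed=0):
    """Return 1 if the held-out positive item is ranked first among itself
    plus `num_neg` randomly sampled non-interacted items, else 0."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted]
    candidates = [pos_item] + rng.sample(pool, num_neg)       # 1 positive + 19 negatives
    best = max(candidates, key=lambda item: score_fn(user, item))
    return int(best == pos_item)
```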
5.2 Performance Comparison
For comprehensive evaluations of A-LLMRec, we perform evaluations under various scenarios, i.e., the general scenario (Sec. 5.2.1),
the cold/warm item scenario (Sec. 5.2.2), the cold user scenario (Sec. 5.2.3), the few-shot training scenario (Sec. 5.2.4), and the cross-domain scenario (Sec. 5.2.5).

Table 4: Results (Hit@1) on the cold/warm item scenario. A-LLMRec (SBERT) is a variant of A-LLMRec that uses q instead of e for inference.

                   Movies and TV       Video Games        Beauty
                   Cold      Warm      Cold      Warm     Cold      Warm
SASRec             0.2589    0.6787    0.1991    0.5764   0.1190    0.6312
MoRec              0.2745    0.4395    0.2318    0.4977   0.2145    0.5425
CTRL               0.1517    0.3840    0.2074    0.2513   0.1855    0.4711
RECFORMER          0.3796    0.5449    0.3039    0.5377   0.3387    0.5133
TALLRec            0.2654    0.2987    0.3950    0.4897   0.5462    0.6124
A-LLMRec           0.5714    0.6880    0.4263    0.5970   0.5605    0.6414
A-LLMRec (SBERT)   0.5772    0.6802    0.4359    0.5792   0.5591    0.6405

Table 5: Results (Hit@1) on the cold user scenario.

             Movies and TV   Video Games   Beauty
SASRec       0.2589          0.4048        0.4459
MoRec        0.3918          0.3572        0.4815
CTRL         0.2273          0.1737        0.3902
RECFORMER    0.4481          0.3989        0.4644
TALLRec      0.2143          0.3895        0.5202
MLP-LLM      0.4909          0.3960        0.5276
A-LLMRec     0.5272          0.4160        0.5337

Table 6: Results (Hit@1) on the few-shot training scenario on various datasets (K: num. users in the training set).

                K     SASRec   MoRec    TALLRec   A-LLMRec   A-LLMRec (SBERT)
Movies and TV   256   0.2111   0.2208   0.1846    0.2880     0.2963
                128   0.1537   0.1677   0.1654    0.2518     0.2722
Video Games     256   0.1396   0.1420   0.2321    0.2495     0.2607
                128   0.1089   0.1157   0.1154    0.1608     0.1839
Beauty          256   0.2243   0.2937   0.3127    0.3467     0.3605
                128   0.1813   0.2554   0.2762    0.3099     0.3486

5.2.1 Overall Performance. The results of the recommendation task on the four datasets are given in Table 1. We have the following observations: 1) A-LLMRec outperforms other LLM-based recommenders,
implying again that using the text encoder to extract the joint text-collaborative knowledge is useful when items lack interactions. 3) Under the few-shot scenario, LLM-based models outperform the CF-RecSys, i.e., SASRec, due to the textual understanding of the LLM, which helps extract information from the text of unseen items, while the CF-RecSys suffers from the lack of collaborative knowledge regarding unseen/new items.

Table 7: Results (Hit@1) on a cross-domain scenario (i.e., Pre-trained: Movies and TV, Evaluation: Video Games).

                              SASRec   MoRec    RECFORMER   TALLRec   A-LLMRec   A-LLMRec (SBERT)
Movies and TV → Video Games   0.0506   0.0624   0.0847      0.0785    0.0901     0.1203

indicates the reduction of collaborative knowledge between items and users, which is crucial for recommendation tasks. 4) Lastly, we kept SBERT frozen while training A-LLMRec. We observe that freezing SBERT leads to poor performance across all datasets. This implies that fine-tuning SBERT facilitates the text embeddings' adaptation to the recommendation task.

Table 9: Ablation study on Stage-2 of A-LLMRec (Hit@1).

Row   Ablation                                Movies and TV   Video Games   Beauty   Toys
(1)   A-LLMRec                                0.6237          0.5282        0.5809   0.3336
(2)   A-LLMRec w/o user representation        0.5925          0.5121        0.5547   0.3217
(3)   A-LLMRec w/o joint embedding            0.1224          0.4773        0.5213   0.2831
(4)   A-LLMRec with random joint embedding    0.1200          0.4729        0.5427   0.0776
REFERENCES
[1] Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2017. Controlling Popularity Bias in Learning-to-Rank Recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys '17). Association for Computing Machinery, New York, NY, USA, 42–46. https://doi.org/10.1145/3109859.3109912
[2] Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TALLRec: An effective and efficient tuning framework to align large language model with recommendation. arXiv preprint arXiv:2305.00447 (2023).
[3] Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).
[4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[5] Allison JB Chaney, David M Blei, and Tina Eliassi-Rad. 2015. A probabilistic model for using social networks in personalized item recommendation. In Proceedings of the 9th ACM Conference on Recommender Systems. 43–50.
[6] Jiawei Chen, Hande Dong, Yang Qiu, Xiangnan He, Xin Xin, Liang Chen, Guli Lin, and Keping Yang. 2021. AutoDebias: Learning to Debias for Recommendation (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 21–30. https://doi.org/10.1145/3404835.3462919
[7] Chen Cheng, Haiqin Yang, Michael R. Lyu, and Irwin King. 2013. Where you like to go next: successive point-of-interest recommendation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI '13). AAAI Press, 2605–2611.
[8] Robert G. Cooper and Scott J. Edgett. 2012. Best Practices in the Idea-to-Launch Process and Its Governance. Research Technology Management 55, 2 (2012), 43–54. https://www.jstor.org/stable/26586220
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
[10] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations.
[11] Xiaoyu Du, Zike Wu, Fuli Feng, Xiangnan He, and Jinhui Tang. 2022. Invariant Representation Learning for Multimedia Recommendation. In Proceedings of the 30th ACM International Conference on Multimedia (MM '22). Association for Computing Machinery, New York, NY, USA, 619–628. https://doi.org/10.1145/3503161.3548405
[12] Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-REC: Towards interactive and explainable LLMs-augmented recommender system. arXiv preprint arXiv:2303.14524 (2023).
[13] Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 507–517. https://doi.org/10.1145/2872427.2883037
[14] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 639–648.
[15] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
[16] Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, and Julian McAuley. 2023. Large language models as zero-shot conversational recommenders. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 720–730.
[17] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
[18] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations.
[19] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 263–272.
[20] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.
[21] Sein Kim, Namkyeong Lee, Donghyun Kim, Minchul Yang, and Chanyoung Park. 2023. Task Relation-aware Continual User Representation Learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23). Association for Computing Machinery, New York, NY, USA, 1107–1119. https://doi.org/10.1145/3580305.3599516
[22] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[23] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, Vol. 25. Curran Associates, Inc.
[24] Jiacheng Li, Ming Wang, Jin Li, Jinmiao Fu, Xin Shen, Jingbo Shang, and Julian McAuley. 2023. Text Is All You Need: Learning Language Representations for Sequential Recommendation (KDD '23). Association for Computing Machinery, New York, NY, USA, 1258–1267. https://doi.org/10.1145/3580305.3599519
[25] Xiangyang Li, Bo Chen, Lu Hou, and Ruiming Tang. 2023. CTRL: Connect Tabular and Language Model for CTR Prediction. arXiv preprint arXiv:2306.02841 (2023).
[26] Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
[27] Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, and Lifeng Shang. 2021. Noninvasive self-attention for side information fusion in sequential recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4249–4256.
[28] Fan Liu, Huilin Chen, Zhiyong Cheng, Anan Liu, Liqiang Nie, and Mohan Kankanhalli. 2022. Disentangled multimodal representation learning for recommendation. IEEE Transactions on Multimedia (2022).
[29] Zhuang Liu, Yunpu Ma, Matthias Schubert, Yuanxin Ouyang, and Zhang Xiong. 2022. Multi-Modal Contrastive Pre-training for Recommendation. In Proceedings of the 2022 International Conference on Multimedia Retrieval (ICMR '22). Association for Computing Machinery, New York, NY, USA, 99–108. https://doi.org/10.1145/3512527.3531378
[30] Chih-Chao Ma. 2008. A guide to singular value decomposition for collaborative filtering. Computer (Long Beach, CA) 2008 (2008), 1–14.
[31] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. 2015. Image-Based Recommendations on Styles and Substitutes (SIGIR '15). Association for Computing Machinery, New York, NY, USA, 43–52. https://doi.org/10.1145/2766462.2767755
[32] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 43–52.
[33] Andriy Mnih and Russ R Salakhutdinov. 2007. Probabilistic matrix factorization. Advances in Neural Information Processing Systems 20 (2007).
[34] Yunhak Oh, Sukwon Yun, Dongmin Hyun, Sein Kim, and Chanyoung Park. 2023. MUSE: Music Recommender System with Shuffle Play Recommendation Enhancement. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23). Association for Computing Machinery, New York, NY, USA, 1928–1938. https://doi.org/10.1145/3583780.3614976
[35] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019).
[36] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW '10). Association for Computing Machinery, New York, NY, USA, 811–820. https://doi.org/10.1145/1772690.1772773
[37] Scott Sanner, Krisztian Balog, Filip Radlinski, Ben Wedin, and Lucas Dixon. 2023. Large language models are competitive near cold-start recommenders for language- and item-based preferences. In Proceedings of the 17th ACM Conference on Recommender Systems. 890–896.
[38] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. 285–295.
[39] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web. 111–112.
[40] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1441–1450.
[41] Jiaxi Tang and Ke Wang. 2018. Personalized top-N sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 565–573.
[42] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
[43] Maksims Volkovs, Guangwei Yu, and Tomi Poutanen. 2017. DropoutNet: Addressing Cold Start in Recommender Systems. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/dbd22ba3bd0df8f385bdac3e9f8be207-Paper.pdf
[44] Lei Wang and Ee-Peng Lim. 2023. Zero-Shot Next-Item Recommendation using Large Pretrained Language Models. arXiv preprint arXiv:2304.03153 (2023).
[45] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. 2022. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682 (2022).
[46] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
[47] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video (MM '19). Association for Computing Machinery, New York, NY, USA, 1437–1445. https://doi.org/10.1145/3343031.3351034
[48] Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, et al. 2023. A Survey on Large Language Models for Recommendation. arXiv preprint arXiv:2305.19860 (2023).
[49] Jieyu Yang, Liang Zhang, Yong He, Ke Ding, Zhaoxin Huan, Xiaolu Zhang, and Linjian Mo. 2023. DCBT: A Simple But Effective Way for Unified Warm and Cold Recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23). Association for Computing Machinery, New York, NY, USA, 3369–3373. https://doi.org/10.1145/3539618.3591856
[50] Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, and Xiangnan He. 2019. A simple convolutional generative network for next item recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 582–590.
[51] Zheng Yuan, Fajie Yuan, Yu Song, Youhua Li, Junchen Fu, Fei Yang, Yunzhu Pan, and Yongxin Ni. 2023. Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited (SIGIR '23). Association for Computing Machinery, New York, NY, USA, 2639–2649. https://doi.org/10.1145/3539618.3591932
[52] Sukwon Yun, Kibum Kim, Kanghoon Yoon, and Chanyoung Park. 2022. LTE4G: Long-tail experts for graph neural networks. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2434–2443.
[53] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).
[54] Xun Zhou, Jing He, Guangyan Huang, and Yanchun Zhang. 2015. SVD-based incremental approaches for recommender systems. J. Comput. System Sci. 81, 4 (2015), 717–733.
Figure 5: A-LLMRec, LLM-Only, and TALLRec on the favorite genre prediction task (Movies and TV dataset used).

LLM Input: This user has watched [HISTORY (Item Titles)] in the past. Recommend a movie for this user to watch next from the following set of movie titles, [CANDIDATE (Item Titles)]. The recommendation is
LLM Output: [Next Item Title]

Figure 6: An example prompt designed for the Amazon Movies dataset used by the LLM-based models, i.e., the TALLRec and LLM-Only models.

A BASELINES
(1) Collaborative filtering recommender systems
• NCF [15] combines neural networks (MLPs) to capture the collaborative information. Note that NCF is a two-tower model comprised of separate components for the user and item embedding matrices.
• NextItNet [50] proposes a temporal convolutional network that utilizes 1D-dilated convolutional layers and residual connections to capture the long-term dependencies inherent in the interaction sequence.
• GRU4Rec [17] adopts RNNs to model user behavior sequences for session-based recommendations.
• SASRec [20] is our main baseline, a state-of-the-art collaborative filtering recommender system (CF-RecSys) that adopts a self-attention encoding method to model user preferences from user behavior sequences.
(2) Modality-aware recommender systems
• MoRec [51] employs a pre-trained SBERT to utilize the text information of items to generate the initial item embeddings used in collaborative filtering models. We utilize SASRec as the backbone model of MoRec.
• CTRL [25] employs a two-stage learning process: the first stage involves contrastive learning on the textual information of items to initialize the backbone model, and the second stage fine-tunes the model on recommendation tasks. We use SASRec as the backbone model of CTRL.
• RECFORMER [24] models user preferences and item features using the Transformer architecture, transforming sequential recommendation into a task of predicting the next item as if predicting the next sentence, by converting item attributes into a sentence format.
(3) LLM-based recommender systems
• LLM-Only utilizes an open-source LLM, OPT [53], with prompts related to recommendation tasks, as shown in Figure 6. In our experiments, we adopt the 6.7B version of OPT for all LLM-based recommendations.
• TALLRec [2] is our main baseline, which learns the recommendation task based on prompts consisting solely of text and fine-tunes the LLM using LoRA. Their approach involves providing the user interaction history and one target item, and determining whether a user will prefer this target item. This simpler task necessitates only a brief prompt for the LLM. In contrast, our recommendation task requires a more extensive prompt; this adjustment results in a smaller batch size (the same as A-LLMRec) for training TALLRec. We use the prompt shown in Figure 6.
• MLP-LLM is an additionally designed LLM-based recommendation model for analysis. Compared with A-LLMRec, this model directly connects the user and item embeddings from the frozen CF-RecSys to the LLM using only MLP layers, instead of the auto-encoders in A-LLMRec that involve various techniques to align the collaborative knowledge of the CF-RecSys with the LLM. Note that we use the prompt shown in Figure 3.

B LANGUAGE GENERATION TASK
In Figure 5, we present additional favorite genre prediction results for the experiment shown in Section 5.4.4. As mentioned in Section 5.4.4, TALLRec could not generate valid natural language outputs because of its fine-tuning via the instruction tuning process, which leaves the LLM of TALLRec able to answer only the particular prompts used during instruction tuning. The additional results indicate that A-LLMRec can generate the favorite genres of users based on its understanding of the aligned user representations and item embeddings, while LLM-Only fails to do so.

C REPRODUCIBILITY
For implementing the baselines, we followed the official code published by the authors, as detailed in Table 12. Refer to our source code and the accompanying instructions to reproduce the results reported in the experiments.

Table 12: Source code links of the baseline methods.

Methods     Source code
SASRec      https://github.com/pmixer/SASRec.pytorch
NextItNet   https://github.com/syiswell/NextItNet-Pytorch
GRU4Rec     https://github.com/hungpthanh/GRU4REC-pytorch
RECFORMER   https://github.com/AaronHeee/RecFormer
TALLRec     https://github.com/SAI990323/TALLRec
A-LLMRec    https://github.com/ghdtjr/A-LLMRec