
Data-efficient Fine-tuning for LLM-based Recommendation

Xinyu Lin, National University of Singapore, Singapore ([email protected])
Wenjie Wang*, National University of Singapore, Singapore ([email protected])
Yongqi Li, The Hong Kong Polytechnic University, Hong Kong SAR, China ([email protected])
Shuo Yang, The University of Hong Kong, Hong Kong SAR, China ([email protected])
Fuli Feng*, University of Science and Technology of China, Hefei, China ([email protected])
Yinwei Wei, Monash University, Melbourne, Australia ([email protected])
Tat-Seng Chua, National University of Singapore, Singapore ([email protected])

arXiv:2401.17197v2 [cs.IR] 4 Jun 2024

ABSTRACT
Leveraging Large Language Models (LLMs) for recommendation has recently garnered considerable attention, where fine-tuning plays a key role in LLMs' adaptation. However, the cost of fine-tuning LLMs on rapidly expanding recommendation data limits their practical application. To address this challenge, few-shot fine-tuning offers a promising approach to quickly adapt LLMs to new recommendation data. We propose the task of data pruning for efficient LLM-based recommendation, aimed at identifying representative samples tailored for LLMs' few-shot fine-tuning. While coreset selection is closely related to the proposed task, existing coreset selection methods often rely on suboptimal heuristic metrics or entail costly optimization on large-scale recommendation data.

To tackle these issues, we introduce two primary objectives for the data pruning task in the context of LLM-based recommendation: 1) high accuracy aims to identify the influential samples that can lead to high overall performance; and 2) high efficiency underlines the low costs of the data pruning process. To pursue the two objectives, we propose a novel data pruning method incorporating two scores, namely influence score and effort score, to efficiently identify the influential samples. Particularly, the influence score is introduced to accurately estimate the influence of removing each sample on the overall performance. To achieve low costs of the data pruning process, we employ a small-sized surrogate model to replace LLMs to obtain the influence score. Considering the potential gap between the surrogate model and LLMs, we further propose an effort score to prioritize some hard samples specifically for LLMs. We instantiate the proposed method on two competitive LLM-based recommender models, and empirical results on three real-world datasets validate the effectiveness of our proposed method. In particular, our method uses only 2% of the samples to surpass full-data fine-tuning, reducing time costs by 97%.

CCS CONCEPTS
• Information systems → Recommender systems.

KEYWORDS
Data Pruning, LLM-based Recommendation, Efficient Fine-tuning

ACM Reference Format:
Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, and Tat-Seng Chua. 2024. Data-efficient Fine-tuning for LLM-based Recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14–18, 2024, Washington, DC, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3626772.3657807

1 INTRODUCTION
Leveraging Large Language Models (LLMs) for recommendation has demonstrated promising efficacy across various tasks, including Click-Through Rate (CTR) prediction [4], sequential recommendation [35], and explainable recommendation [11]. To build LLM-based recommender models, it is crucial to fine-tune LLMs on recommendation data for two primary reasons: 1) there exists a significant gap between previous LLMs' tuning tasks and the recommendation tasks [4], and 2) the rapid and continuous update of recommendation data necessitates frequent fine-tuning of LLMs [38]. For example, there are approximately 160 million new videos and 942 billion interactions emerging on TikTok per day¹. Thus, frequent fine-tuning is imperative to incorporate up-to-date item information and enhance user behavior comprehension.

* Corresponding author. This work is supported by the CCCD Key Lab of Ministry of Culture and Tourism.
¹ https://www.tiktok.com/transparency/.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SIGIR '24, July 14–18, 2024, Washington, DC, USA
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0431-4/24/07
https://doi.org/10.1145/3626772.3657807

Figure 1: (a) reveals that BIGRec achieves remarkable performance with only hundreds of samples (few-shot performance on MicroLens-50K). (b) shows the low costs of surrogate models, comparing the training costs of an LLM (BIGRec) and a surrogate model (SASRec), measured on NVIDIA RTX A5000 on Games:
    Model    GPU (GiB)        Time
    BIGRec   18.60 × 4 GPU    36.87h
    SASRec   1.61 × 1 GPU     0.45h
    % Red.   97.84%           98.78%

However, fine-tuning LLMs on large-scale recommendation data demands substantial computational resources and time costs [26], thereby diminishing the practicality of LLM-based recommender models in real-world applications. As such, it is essential to enhance the fine-tuning efficiency of LLM-based recommender models.

Fortunately, the rich world knowledge encoded in LLMs offers a promising solution for efficient fine-tuning: few-shot fine-tuning. Previous studies have uncovered that LLMs have the potential to quickly adapt to recommendation tasks by fine-tuning on randomly sampled few-shot data [3, 4, 27] (Figure 1(a)), significantly reducing training time and computational costs. Despite its efficiency, randomly sampled data may lack sufficient representativeness to enable LLMs to effectively comprehend new items and user behaviors. To combat this issue, we introduce the task of data pruning for efficient LLM-based recommendation, which aims to identify representative samples tailored for LLMs' few-shot fine-tuning.

A closely related literature to this data pruning task is coreset selection [13]. It tries to select a small but representative subset from the full data, aiming to achieve comparable performance. Existing coreset selection methods generally fall into two categories²: 1) Heuristic methods select hard or diverse samples based on pre-defined metrics [30, 34, 49]. Such heuristic methods do not estimate the impact of selected samples on empirical risk, possibly leading to suboptimal coreset selection. 2) Optimization-based methods mainly optimize the selection of subsets to minimize the empirical risk [5, 50]. However, these methods are inapplicable to large-scale recommendation datasets due to the complex and costly bi-level or discrete optimization problem [17]. Worse still, both heuristic and optimization-based methods rely on the model well-trained by the full data to select the coreset, e.g., calculating pre-defined scores or optimizing the data subset based on the well-trained model (cf. Section 2). As such, it is infeasible to directly apply these methods for LLM-based recommendation because of the high training costs of LLMs on the large-scale recommendation data.

To overcome the above issues, we summarize two principal objectives for data pruning in the context of LLM-based recommendation: 1) high accuracy, which focuses on selecting the samples that can lead to low empirical risk; and 2) high efficiency, which emphasizes the low costs of the data pruning process, i.e., eliminating the dependency of well-trained LLMs on the full data. Nevertheless, pursuing the two objectives faces two challenges:
• To achieve high accuracy, it is essential to measure the influence of removing each training sample on the empirical risk. However, assessing the influence of all samples is costly, as it requires leaving-one-out retraining for each sample [43].
• To achieve high efficiency, one possible solution is to train a surrogate model for sample selection, e.g., a small-sized traditional recommender model, which can drastically reduce the GPU memory usage and the training time compared to LLMs (see Figure 1(b)). However, there exists a gap between LLMs and surrogate models, attributable to their divergent capabilities in learning user behaviors (refer to Figure 3). As such, influential samples selected by surrogate models might deviate from the ones on LLMs, potentially hurting the adaptation of LLMs.

To address the challenges, we propose a novel Data pruning method to Efficiently identify the influentiAl samples for LLM-based Recommender fine-tuning (shorted as DEALRec). DEALRec leverages two scores, namely influence score and effort score, to identify the influential samples. The influence score is formulated to estimate the influence of removing each training sample on the empirical risk. It is calculated by extending the influence function [15] via chain rules and second-order optimization techniques [24]. To efficiently calculate the influence score for all samples, DEALRec employs a simple yet effective symmetric property to accelerate the calculation, requiring only one estimation for all samples (cf. Section 3.1). Thereafter, DEALRec uses a traditional recommender model as a surrogate model to obtain the influence score and introduces the effort score to mitigate the gap between the surrogate model and LLMs. The effort score is obtained by calculating the gradient norm of a sample loss w.r.t. the parameters of LLMs, intuitively measuring the effort of LLMs to fit a specific sample. By regularizing the influence score with the effort score, DEALRec identifies the influential samples that encompass both the representativeness of the full data and the significance to LLMs. We instantiate DEALRec on two LLM-based recommender models and conduct extensive experiments on three real-world datasets, validating the superiority of DEALRec in terms of both efficiency and accuracy. The code and datasets are available at https://github.com/Linxyhaha/DEALRec.

In summary, this work offers three major contributions:
• We introduce a data pruning task to identify the influential samples tailored for efficient LLM-based recommender fine-tuning, unlocking the remarkable potential of applying LLM-based recommender models to real-world platforms.
• We propose a novel data pruning method to discover the influential samples for LLM-based recommendation, which effectively and efficiently assesses the influence of removing a sample on empirical risk.
• We conduct extensive experiments on three real-world datasets, demonstrating the effectiveness of DEALRec in achieving both high efficiency and accuracy.

2 TASK FORMULATION
In this section, we first introduce LLM-based recommender models and uncover the challenge of real-world applicability. Thereafter, we formulate the task of data pruning for LLM-based recommendation and compare the related work on coreset selection.

² More detailed related work is discussed and compared in Sections 4 and 5.

• LLM-based recommender models. To leverage the competent capabilities of LLMs, LLM-based recommendation typically utilizes powerful LLMs directly as the recommender models. Since LLMs are not particularly trained on recommendation data, fine-tuning is the necessary and key step for LLMs to learn the item knowledge and understand user behavior. Let U and I denote the sets of users and items, respectively. We present each training sample, i.e., user sequence, as s = (x, y), where x = [i_1, i_2, . . . , i_{|x|}] is the user's historical interactions in chronological order, and y is the next interacted item of the user³, where {i_1, . . . , i_{|x|}, y} ⊂ I. Formally, given the user sequences of the training set D = {s_u | u ∈ U}, the target is to fine-tune an LLM for recommendation tasks. The learnable parameters (φ ∈ Φ) of an LLM are optimized by minimizing the negative log-likelihood of the next interacted item y conditioned on input x:

    \min_{\phi \in \Phi} \Big\{ \mathcal{L}_\phi^{LLM} = -\sum_{t=1}^{|y|} \log P_\phi(y_t \mid y_{<t}, x) \Big\},   (1)

where y_t denotes the t-th token of y, and y_{<t} represents the token sequence preceding y_t.

³ Our main focus lies in sequential recommendation, which holds notable practical significance by intricately considering the temporal aspect in real-world scenarios.
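To make Eq. (1) concrete, below is a minimal sketch of this next-item loss computed with an off-the-shelf causal language model, masking the prompt tokens so that only the target item's tokens contribute. The checkpoint ("gpt2"), prompt template, and item titles are illustrative placeholders, not the backbone or format used in the paper.

```python
# Minimal sketch of the next-item NLL in Eq. (1), assuming a Hugging Face causal LM.
# The checkpoint, prompt template, and item strings below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # stand-in for the backbone LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The user has played: Stardew Valley; Hades. Next, the user will play:"  # x
target = " Celeste"                                        # y, the next interacted item

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
target_ids = tokenizer(target, return_tensors="pt").input_ids

input_ids = torch.cat([prompt_ids, target_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.size(1)] = -100                     # ignore prompt positions in the loss

# The built-in cross-entropy averages -log P(y_t | y_<t, x) over the target tokens;
# multiplying by the number of target tokens recovers the sum in Eq. (1).
out = model(input_ids=input_ids, labels=labels)
nll_sum = out.loss * target_ids.size(1)
print(float(nll_sum))
```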
While fine-tuning LLMs has demonstrated effectiveness in recommendation tasks [29], its practical application is hindered by the high resource costs required by LLMs and the continuous influx of new recommendation data [38]. Hence, it is essential to enhance the efficiency of LLM-based recommender fine-tuning.

• Data pruning for efficient LLM-based recommendation. To achieve efficient LLM-based recommendation, a promising approach is to reduce the costs by few-shot fine-tuning with randomly selected samples [4]. Nevertheless, the random samples might lose some crucial information for LLMs to acquire the latest information on user behavior or items, e.g., trending items. In this light, we introduce the task of data pruning for efficient LLM-based recommendation, which aims to identify a set of representative samples particularly for LLMs' few-shot fine-tuning. Formally, given all training samples D = {s_u | u ∈ U}, the target of data pruning is to select a subset S ⊂ D, such that the LLMs trained on the subset S can yield good performance on the testing set. The size of S is controlled by the given selection ratio r, i.e., |S| = r|D|.

• Retrospect of coreset selection. As the closely related work to this data pruning task, coreset selection methods generally fall into two groups:
1) Heuristic methods [7, 10, 44] typically design some heuristic strategies to select samples based on an empirical minimizer:

    S = H(\hat{\theta}, D), \quad \text{s.t.} \quad \hat{\theta} = \arg\min_{\theta \in \Theta} L(\theta, D),   (2)

where L(·) is the loss function of the task, e.g., image classification [16] or CTR prediction [14], and H(·) denotes the heuristic strategy, such as selecting samples with larger prediction entropy [7] or clustering the samples based on the sample representations [6]. However, this group of methods designs the strategy H(·) intuitively and fails to explicitly consider the influence of a sample on the empirical risk. This might lead to suboptimal selection, thereby declining the performance of the model trained by the selected subset.
2) Optimization-based methods [5, 22, 23, 48] mainly utilize bi-level optimization techniques to learn the best subset chosen for training:

    S^{*} = \arg\min_{S \subset D} L(\hat{\theta}, D), \quad \text{s.t.} \quad \hat{\theta} = \arg\min_{\theta \in \Theta} L(\theta, S).   (3)

Besides, there is also some work that employs discrete optimization problems based on the empirical minimizer θ̂ in Eq. (2). Nevertheless, they struggle to be applied to large-scale datasets, e.g., recommendation data, due to the complex solving of the optimization problem [17].

Furthermore, as shown in Eq. (2)-(3), previous coreset selection methods usually require the model to be trained over the original training samples D, which however is infeasible for LLM-based recommender models due to the continuous influx of data and the high resource costs of LLMs (cf. Section 1).

• Drawing upon the above insights, we consider two objectives for data pruning: 1) high accuracy emphasizes the low empirical risk of the model trained on the selected samples, and 2) high efficiency focuses on the low costs of the data pruning process, breaking free from the heavy fine-tuning of LLMs for data pruning.

3 DEALREC
To pursue efficient LLM-based recommendation, we propose a novel data pruning method DEALRec, which involves two key components, i.e., the influence score to estimate the influence on empirical risk, and the effort score as a regularization to mitigate the gap between the surrogate model and LLMs. The overview of our method is presented in Figure 2.

Figure 2: Overview of DEALRec. DEALRec first trains a surrogate model on the full training samples. Subsequently, it calculates the influence score, which is then regularized by the effort score, to identify influential samples.

3.1 Influence Score
To achieve good overall performance with the model trained on the pruned dataset S, the key lies in the ability to assess the influence on the empirical risk, i.e., overall performance, caused by removing a sample in training. However, simply assessing the influence by removing each sample is impractical, because it requires brute-force leaving-one-out retraining for n = |D| times. To overcome this challenge, we propose an efficient approximation of the influence for all samples by extending influence on parameter change (i.e., a classic result from the influence function [24]) via the chain rule and second-order optimization techniques. We further utilize the symmetric property to speed up the calculation of the influence score.

• Influence on parameter change. To estimate the influence on empirical risk for each sample, we first start with the classic result [28] from research on influence functions [8], which gives us the estimation of the parameter change caused by upweighting a sample s for training. Considering a training sample s is upweighted by a small ε, the empirical minimizer can be rewritten as:

    \hat{\theta}_{\epsilon,s} = \arg\min_{\theta \in \Theta} \frac{1}{n}\sum_{s_i \in D} L(s_i, \theta) + \epsilon L(s, \theta).   (4)

According to [28], the influence of upweighting a sample s on the parameter change is then given as:

    \mathcal{I}_{param}(s) = \frac{\mathrm{d}\hat{\theta}_{\epsilon,s}}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\nabla_\theta L(s, \hat{\theta}),   (5)

where H_{θ̂} = \frac{1}{n}\sum_{s_i \in D}\nabla_\theta^2 L(s_i, \hat{\theta}) is the Hessian and positive definite by assumption, I_param(s) ∈ R^m, and m is the number of parameters. Notably, assigning -1/n to ε is equivalent to removing the sample s from training. As such, the parameter change of removing a training sample s can be linearly approximated as:

    \hat{\theta}_{-s} - \hat{\theta} \approx -\frac{1}{n}\mathcal{I}_{param}(s) = \frac{1}{n} H_{\hat{\theta}}^{-1}\nabla_\theta L(s, \hat{\theta}),   (6)

where \hat{\theta}_{-s} = \arg\min_{\theta \in \Theta}\sum_{s_i \in D, s_i \neq s} L(s_i, \theta).
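As a small sanity check on Eq. (5)-(6), the sketch below compares the influence-function estimate of the parameter change with actual leave-one-out retraining on an l2-regularized logistic regression standing in for a surrogate model. The data, regularization strength, and removed sample index are illustrative assumptions.

```python
# Tiny check of Eq. (6): theta_hat_{-s} - theta_hat ≈ (1/n) H^{-1} grad L(s, theta_hat).
# Uses an l2-regularized logistic regression as a stand-in surrogate; all settings are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.3 * rng.normal(size=n) > 0).astype(float)

def loss(theta, Xs, ys):                      # mean logistic loss + l2 regularizer
    z = Xs @ theta
    return np.mean(np.logaddexp(0.0, -z) + (1 - ys) * z) + 0.5 * lam * theta @ theta

def grad_single(theta, x, yi):                # gradient of one sample's (regularized) loss
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return (p - yi) * x + lam * theta

def hessian(theta, Xs):                       # Hessian of the mean loss
    p = 1.0 / (1.0 + np.exp(-(Xs @ theta)))
    return (Xs * (p * (1 - p))[:, None]).T @ Xs / len(Xs) + lam * np.eye(d)

theta_hat = minimize(loss, np.zeros(d), args=(X, y)).x

s = 7                                          # index of the sample to "remove"
est_change = np.linalg.solve(hessian(theta_hat, X), grad_single(theta_hat, X[s], y[s])) / n

mask = np.arange(n) != s                       # brute-force leave-one-out retraining
theta_minus_s = minimize(loss, theta_hat, args=(X[mask], y[mask])).x
print(est_change)                              # Eq. (6) estimate
print(theta_minus_s - theta_hat)               # actual change; should be close
```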
Based on Eq. (6), an intuitive approach to assess the sample influence for model training is to utilize the L2 norm of a sample's influence on parameter change or an additional discrete optimization problem as proposed in [50]. Nevertheless, large parameter changes do not necessarily lead to performance improvements. Besides, calculating Eq. (6) for all training samples can be computationally costly [17] and is infeasible for recommendation data. To alleviate the issues, we propose an efficient approximation for the influence of removing a sample on the empirical risk.

• Influence on empirical risk. Based on the parameter change obtained via the influence function, we can then estimate the influence of upweighting a training sample s by a small ε on the loss of an arbitrary sample s′:

    \mathcal{I}_{upweight,loss}(s, s') \overset{def}{=} \frac{\mathrm{d} L(s', \hat{\theta}_{\epsilon,s})}{\mathrm{d}\epsilon}\Big|_{\epsilon=0}
        = \nabla_\theta L(s', \hat{\theta})^{\top} \frac{\mathrm{d}\hat{\theta}_{\epsilon,s}}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} \quad \text{(chain rule)}
        = -\nabla_\theta L(s', \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\nabla_\theta L(s, \hat{\theta}).   (7)

Similarly, the influence of removing a training sample s on the loss of an arbitrary sample s′ can be linearly approximated as:

    \mathcal{I}_{remove,loss}(s, s') = \frac{1}{n}\nabla_\theta L(s', \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\nabla_\theta L(s, \hat{\theta}).   (8)

We can then obtain the influence of removing a sample s on the empirical risk (i.e., the influence score) by

    \mathcal{I}_{remove,loss}(s, D) = -\frac{1}{n}\frac{\mathrm{d}\big(\frac{1}{n}\sum_i L(s_i, \hat{\theta}_{\epsilon,s})\big)}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} = \Big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\Big]^{\top} \frac{1}{n} H_{\hat{\theta}}^{-1}\nabla_\theta L(s, \hat{\theta}).   (9)

However, it is non-trivial to directly obtain H_{θ̂}^{-1}, as forming and inverting H_{θ̂} = \frac{1}{n}\sum_{s_i \in D}\nabla_\theta^2 L(s_i, \hat{\theta}) requires O(nm² + m³) with n training samples and θ ∈ R^m. This results in cumbersome calculation of influence scores for all training samples.

• Efficient estimation of influence score. To achieve efficient computation of the influence score, we utilize stochastic-based Hessian-Vector Products (HVP) [1] to efficiently approximate H_{θ̂}^{-1}∇_θ L(s, θ̂). The idea of stochastic-based HVP estimation is to iteratively obtain an unbiased estimator of H_{θ̂} and approach the unbiased estimation of the HVP, i.e., H_{θ̂}^{-1}∇_θ L(s, θ̂). Specifically, we omit the θ̂ subscript for clarity and write the first j terms in the Taylor expansion of H^{-1} as H_j^{-1} \overset{def}{=} \sum_{i=0}^{j}(I - H)^i, which can be further rewritten recursively as H_j^{-1} = I + (I - H)H_{j-1}^{-1}. From the validity of the Taylor expansion, we have H_j^{-1} → H^{-1} as j → ∞. Thereafter, denoting ∇_θ L(s, θ̂) as v, the update iteration for the estimated H_{θ̂}^{-1}∇_θ L(s, θ̂) at step t can be written as:

    \tilde{H}_t^{-1} v = v + \big(I - \nabla_\theta^2 L(s_t)\big)\tilde{H}_{t-1}^{-1} v,   (10)

where s_t is a training sample randomly drawn from D, and ∇_θ² L(s_t) is an unbiased estimator of H at step t for fast-to-compute HVP [24]. Despite that stochastic-based HVP can alleviate the computation burdens of the estimation, calculating the influence score for each sample is still costly due to the independent n estimations of H_{θ̂}^{-1}∇_θ L(s, θ̂) for each s ∈ D (refer to Eq. (9)).

To further enhance the efficiency of acquiring influence scores for all samples, we use the symmetric property to rewrite Eq. (9) into:

    \mathcal{I}_{remove,loss}(s, D) = \frac{1}{n}\nabla_\theta L(s, \hat{\theta})^{\top}\underbrace{H_{\hat{\theta}}^{-1}\Big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\Big]}_{\text{(constant vector)}}.   (11)

The reformulation is based on the assumption that L(·) has continuous second-order derivatives, which is consistent with the assumption for the influence function [24], leading to the fact that H_{θ̂} is symmetric. Since H_{θ̂}^{-1}[Σ_i (1/n)∇_θ L(s_i, θ̂)] ∈ R^m is a constant vector for any sample s ∈ D, we can efficiently obtain influence scores for all samples by only applying HVP estimation once for H_{θ̂}^{-1}[Σ_i (1/n)∇_θ L(s_i, θ̂)]. The detailed HVP estimation process is illustrated in Algorithm 1.

Algorithm 1 Procedure of HVP Estimation
Input: Original training dataset D, parameters of a well-trained model θ̂, iteration number T.
1: Compute \sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta}) over all i ∈ {1, . . . , n}.
2: Initialize \tilde{H}_0^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big] = \sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta}).
3: for all t ∈ {1, . . . , T} do
4:     Randomly sample a training sample s_t ∈ D;
5:     Calculate \nabla_\theta^2 L(s_t) as the unbiased estimator of H;
6:     \tilde{H}_t^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big] ← \sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta}) + \big(I - \nabla_\theta^2 L(s_t)\big)\tilde{H}_{t-1}^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big];   ⊲ Eq. (10)
7: \tilde{H}^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big] ← \tilde{H}_T^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big].
Output: Unbiased estimation \tilde{H}^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big].
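Below is a hedged PyTorch sketch of Algorithm 1 and Eq. (11): it runs the stochastic HVP recursion of Eq. (10) once on the averaged training gradient and then scores every sample with a single dot product. The surrogate model, its loss function, and the damping/scaling constants are illustrative assumptions (the damping trick is a common stabilization choice, not something specified in the paper).

```python
# Sketch of Algorithm 1 (stochastic HVP) and Eq. (11), assuming a small differentiable
# surrogate model; the model, loss, and damping/scaling choices here are illustrative.
import torch

def flat_grad(loss, params, create_graph=False):
    # params: list of tensors with requires_grad=True that the loss depends on.
    grads = torch.autograd.grad(loss, params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

def hvp(loss_fn, sample, params, v):
    # Hessian-vector product of one sample's loss with v, via double backprop.
    g = flat_grad(loss_fn(sample), params, create_graph=True)
    return flat_grad(g @ v, params)

def constant_vector(loss_fn, dataset, params, T=100, damping=0.01, scale=10.0):
    # Steps 1-2: average gradient over D, used to initialize the recursion.
    v = None
    for s in dataset:
        g = flat_grad(loss_fn(s), params)
        v = g if v is None else v + g
    v = v / len(dataset)
    h_inv_v = v.clone()
    # Steps 3-7: h_inv_v <- v + (I - H_t) h_inv_v, with H_t estimated from one random sample.
    # Damping and scaling are an assumed stabilization trick, not part of the paper.
    for _ in range(T):
        s_t = dataset[torch.randint(len(dataset), (1,)).item()]
        h_est = hvp(loss_fn, s_t, params, h_inv_v)
        h_inv_v = v + (1 - damping) * h_inv_v - h_est / scale
    return h_inv_v / scale          # estimate of H^{-1} [sum_i (1/n) grad L(s_i)]

def influence_scores(loss_fn, dataset, params, h_inv_v):
    # Eq. (11): I(s, D) = (1/n) grad L(s)^T (constant vector), for every s in D.
    n = len(dataset)
    return [float(flat_grad(loss_fn(s), params) @ h_inv_v) / n for s in dataset]
```

In DEALRec's setting, `loss_fn` and `params` would come from the well-trained surrogate recommender (e.g., a SASRec-style model), and the resulting scores would feed into the overall score of Eq. (13).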

Figure 3: (a) depicts the different learning ability due to the prior knowledge in LLMs (some users are easier for LLMs to learn). (b) presents the distributions of effort scores of the LLM and the surrogate model on the Games dataset⁴, showing the different learning ability of the two models.

⁴ We obtain the effort scores for the surrogate model by calculating the gradient norm of the parameters of the surrogate model (Eq. (12)).

3.2 Gap Regularization
As shown in Eq. (11), assessing the influence score of a sample requires the optimized parameters θ̂ well-trained over all training samples D. Nevertheless, this poses challenges for LLM-based recommender models due to the continuous influx of large-scale new data in real-world scenarios. In this light, we propose to utilize a surrogate model to replace the LLMs and introduce an effort score as a gap regularization to complement the learning ability gap between LLMs and the surrogate models.

• Surrogate model. To reduce the costs, we propose utilizing a surrogate model, e.g., a small-sized traditional recommender model, to compute the influence scores. Nevertheless, since LLMs acquire rich world knowledge during the pre-training stage, they intricately possess different learning abilities compared to the surrogate model (Figure 3(a)). Therefore, the influential samples identified by the surrogate model might deviate from the ones for LLMs.

• Effort score. To compensate for the gap, we introduce the effort score, which aims to capture significant samples particularly for LLMs. Specifically, we define the effort score of a sample, i.e., a user sequence s, as:

    \delta_s = \|\nabla_\phi \mathcal{L}^{LLM}(s)\|_2,   (12)

where φ denotes the learnable parameters of LLMs⁵. Intuitively, it measures the learning effort of LLMs to fit a specific user sequence, and a larger score indicates a harder sample for LLMs to learn. To elaborate, Eq. (12) measures the change in the model parameters, which can be interpreted as the discrepancy from the current knowledge encoded in LLMs' parameters to the latest item knowledge or user behavior. As such, the effort score can emphasize significant samples particularly for LLMs, supplementing the different learning ability of the surrogate model (Figure 3(b)).

⁵ The learnable parameters can be either the whole parameters of LLMs or the learnable parameters from parameter-efficient training, e.g., LoRA [19].
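A hedged sketch of the effort score in Eq. (12): compute the Eq. (1) loss of one user sequence and take the L2 norm of its gradient with respect to the trainable (e.g., LoRA) parameters. The `sequence_loss` helper and the way trainable parameters are gathered are assumptions for illustration.

```python
# Sketch of the effort score (Eq. (12)): gradient norm of one sample's LLM loss
# w.r.t. the trainable parameters (e.g., LoRA adapters). `sequence_loss` is a
# hypothetical helper that returns the Eq. (1) loss for one user sequence.
import torch

def effort_score(model, sample, sequence_loss) -> float:
    trainable = [p for p in model.parameters() if p.requires_grad]  # e.g., only LoRA weights
    model.zero_grad(set_to_none=True)
    loss = sequence_loss(model, sample)          # negative log-likelihood of the next item
    grads = torch.autograd.grad(loss, trainable)
    sq = sum(float((g ** 2).sum()) for g in grads)
    return sq ** 0.5                             # delta_s = || grad_phi L^LLM(s) ||_2
```

Per Eq. (13), this δ_s is then scaled by λ and added to the influence score.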
• Overall score. By injecting the signals of LLMs' learning ability into the calculation of the influence score, we can obtain the final score of each user sequence for LLM-based recommender fine-tuning:

    I_s = \underbrace{\frac{1}{n^2}\nabla_\theta L(s, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\sum_i \nabla_\theta L(s_i, \hat{\theta})}_{\text{(influence score)}} + \lambda\underbrace{\|\nabla_\phi \mathcal{L}^{LLM}(s)\|_2}_{\text{(effort score)}},   (13)

where λ is a hyper-parameter to balance the strength of the gap regularization. Notably, the gap regularization would suppress the easy samples with smaller effort scores while emphasizing the samples that are more difficult to learn, i.e., larger effort scores. Intuitively, DEALRec identifies the influential samples with two key considerations: 1) the influence score focuses on selecting the representative samples from the full dataset, capturing collaborative filtering information for low empirical risk; and 2) the effort score highlights the non-trivial samples that are significant to the learning of LLMs. The effectiveness of the two scores is empirically validated in Section 4.3.1.

3.3 Few-shot Fine-tuning
Based on the final influential score obtained via Eq. (13), we can select a subset of data S for LLMs' few-shot fine-tuning, given an expected selection ratio r.

• Few-shot data coverage. A straightforward approach is to select the data greedily, i.e., rank the samples based on the overall scores, and then select the top-r percentage of the training data. However, greedily selecting the samples with higher scores might result in very similar samples with low data coverage, which leads to: 1) Inadequacy of samples from other areas, thus hurting the bounded empirical risk [57] and lowering the overall performance (cf. Section 4.2). 2) Poor utilization of training samples because of the redundant samples with similar patterns, thereby causing suboptimal selection for few-shot fine-tuning.

• Coverage-enhanced sample selection. To address the above issues, we follow [57] to select the users based on the idea of stratified sampling. The core idea is to maintain the budget for the samples in different areas of the training distribution, such that the data coverage will be improved to ensure a high-probability bound for the empirical risk (refer to [57] for detailed proof). In detail, we first divide the samples into K groups according to their overall scores. We then iteratively sample n_s user sequences from the group with the fewest samples and discard that group after sampling, where n_s is the average sampling budget for all groups

and is initialized with ⌊r|D|/K⌋. If the group size is smaller than the average sampling budget, we select all users from this group and update the average sampling budget for the remaining groups (see Algorithm 2).

Algorithm 2 Procedure of DEALRec
Input: Original training dataset D, randomly initialized parameters of the surrogate model θ, pre-trained parameters of the LLM φ.
1: \hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{n}\sum_{s_i \in D} L(s_i, \theta).
2: Obtain the estimated H_{\hat{\theta}}^{-1}\big[\sum_i \frac{1}{n}\nabla_\theta L(s_i, \hat{\theta})\big] via HVP estimation.
3: for all i ∈ {1, . . . , n} do
4:     I_{s_i} = \frac{1}{n^2}\nabla_\theta L(s_i, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\big[\sum_j \nabla_\theta L(s_j, \hat{\theta})\big] + \lambda\|\nabla_\phi \mathcal{L}^{LLM}(s_i)\|_2;   ⊲ Eq. (13)
5: G = {G_1, . . . , G_K} ← split training samples D into K groups according to the final score I_s with even range width.
6: S ← ∅, B ← ⌊r|D|/K⌋.
7: while G ≠ ∅ do
8:     k* = arg min_k |G_k|;
9:     S_{k*} ← randomly select min{B, |G_{k*}|} samples from G_{k*};
10:    S ← S ∪ S_{k*}; G ← G \ {G_{k*}};
11:    B ← ⌊(r|D| − |S|) / |G|⌋;   ⊲ Update sampling budget
Output: Selected samples S for few-shot fine-tuning.
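A hedged NumPy sketch of the coverage-enhanced selection in Algorithm 2 (steps 5-11): overall scores are split into K equal-width bins, and the per-group budget is re-balanced as the smallest groups are consumed. The group count, selection ratio, and the random scores in the usage example are illustrative.

```python
# Sketch of coverage-enhanced sample selection (Algorithm 2, steps 5-11).
# `scores` would be the overall scores I_s from Eq. (13); values here are illustrative.
import numpy as np

def coverage_enhanced_selection(scores: np.ndarray, ratio: float, K: int = 50, seed: int = 0):
    rng = np.random.default_rng(seed)
    n_select = int(ratio * len(scores))
    # Step 5: split samples into K groups with even range width over the score values.
    edges = np.linspace(scores.min(), scores.max(), K + 1)
    group_ids = np.clip(np.digitize(scores, edges[1:-1]), 0, K - 1)
    groups = [np.where(group_ids == k)[0] for k in range(K)]
    groups = [g for g in groups if len(g) > 0]           # drop empty bins (sketch-level choice)

    selected, budget = [], n_select // len(groups)       # Step 6
    while groups:                                        # Steps 7-11
        k_star = min(range(len(groups)), key=lambda k: len(groups[k]))
        g = groups.pop(k_star)
        take = min(budget, len(g))
        selected.extend(rng.choice(g, size=take, replace=False).tolist())
        if groups:                                       # update remaining budget
            budget = (n_select - len(selected)) // len(groups)
    return np.array(selected)

# Usage with made-up scores for 10,000 user sequences and a 2% selection ratio.
fake_scores = np.random.default_rng(1).normal(size=10_000)
subset = coverage_enhanced_selection(fake_scores, ratio=0.02)
print(len(subset))
```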
Based on the selected few-shot samples S, we optimize the learnable parameters (φ ∈ Φ) of LLMs:

    \hat{\phi} = \arg\min_{\phi \in \Phi} \frac{1}{|S|}\sum_{s_i \in S} \mathcal{L}_\phi^{LLM}(s_i).   (14)

• Instantiation. To instantiate DEALRec on LLM-based recommender models, we first employ a surrogate model to train on the original training samples D and calculate the influence score for all samples via Eq. (11), where L(·) can be any form of the loss function from the surrogate model, e.g., BPR [37]. We then obtain the effort score for LLMs via Eq. (12), where φ can be the learnable parameters from any backend LLM-based recommender model. Eventually, we apply the stratified sampling to select the samples for LLMs' few-shot fine-tuning. The detailed data pruning process of DEALRec is demonstrated in Algorithm 2.

4 EXPERIMENT
We conduct extensive experiments on three real-world datasets to answer the following research questions:
• RQ1: How does our proposed DEALRec perform compared to the coreset selection baselines for LLM-based recommendation and the models trained with full data?
• RQ2: How do the different components of DEALRec (i.e., influence score, gap regularization, and stratified sampling) affect the performance, and is DEALRec generalizable to different surrogate models?
• RQ3: How does DEALRec perform under different selection ratios and how does DEALRec improve the overall performance?

4.1 Experimental Settings
4.1.1 Datasets. We conduct experiments on three real-world recommendation datasets: 1) Games is from the Amazon review datasets⁶, which covers interactions between users and video games with rich textual features. 2) MicroLens-50K⁷ is a newly released micro-video recommendation dataset [33]. It contains 50k users' interactions with micro-videos and their associated multimodal features. 3) Book is also from the Amazon review datasets, containing users' interactions with extensive books. For Games and Book, we follow previous work and discard the interactions with ratings < 4. For the three datasets, we sort all user-item interactions according to the global timestamps, and then split the interactions into training, validation, and testing sets with the ratio of 8:1:1.

Table 1: Statistics of the three datasets.
Datasets        # Users   # Items   # Interactions   Density
Games           49,156    17,332    342,329          0.04%
MicroLens-50K   49,887    19,217    359,048          0.04%
Book            88,263    86,272    5,303,707        0.07%

⁶ https://jmcauley.ucsd.edu/data/amazon/.
⁷ https://github.com/westlake-repl/MicroLens/.

Besides, we consider two different fine-tuning settings as follows: 1) Few-shot fine-tuning fine-tunes LLM-based recommender models with limited samples at a fixed size, e.g., 1024-shot, obtained via different data pruning methods. 2) Full fine-tuning utilizes all samples to fine-tune LLM-based recommender models without data pruning.

4.1.2 Baselines. We compare DEALRec with random sampling and several competitive coreset selection methods, including difficulty-based methods and diversity-based methods: 1) Random obtains the data subset via random sampling, which is a popular and strong baseline in data-efficient training [13]. 2) GraNd [34] is a representative coreset selection method that selects the difficult samples with larger gradient norms during training. 3) EL2N [34] proposes to select the difficult samples with larger errors between the labels and the prediction from the model trained by the original dataset. 4) CCS [57] is a competitive method that selects the samples considering both high data coverage and sample importance. We use EL2N as the importance metric for CCS. 5) TF-DCon [49] is a recently proposed data pruning method for content-based recommendation, which clusters the user sequences based on the user representations obtained from both well-trained recommender models and LLMs for selection. 6) RecRanker [30] proposes a sampling strategy to select high-quality user sequences. It selects the users with more interactions for better user modeling and utilizes a cluster-based sampling strategy to enhance user diversity. We do not perform optimization-based methods for comparison because of the inapplicability of complex bi-level or discrete optimization for LLMs on large-scale recommendation data (cf. Section 2). We instantiate our proposed DEALRec and all baselines on two competitive backend LLM-based recommender models: 1) BIGRec [3] utilizes the item title to present the user sequence for recommendation generation; 2) TIGER [35] learns extra tokens from item features to present items, and then converts the user sequence into the sequence of the new item tokens for next-item generation.

• Evaluation. We employ the widely used metrics Recall@K and NDCG@K to evaluate the models [18], with K set to 10 and 20 for Games, and K = 20 and 50 for MicroLens-50K and Book⁸.

⁸ We report metrics@20 and @50 because of the challenging modeling of user behavior on book and micro-video recommendations, where the temporal shifts of user interests and the item features are stronger and thus more difficult to capture [45, 46].
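For reference, a hedged sketch of Recall@K and NDCG@K under the next-item evaluation used here, assuming a single ground-truth item y per test sequence as in the sample definition s = (x, y); the ranking is assumed to be a list of item ids sorted by model score.

```python
# Sketch of Recall@K and NDCG@K for next-item evaluation, assuming one ground-truth
# item y per test sequence (as in the sample definition s = (x, y)).
import math

def recall_at_k(ranked_items: list, target, k: int) -> float:
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items: list, target, k: int) -> float:
    if target in ranked_items[:k]:
        rank = ranked_items.index(target)          # 0-based position in the ranking
        return 1.0 / math.log2(rank + 2)           # ideal DCG is 1 for a single target
    return 0.0

# Toy usage: the model ranks item ids, the ground-truth next item is 42.
ranking = [7, 42, 3, 19, 5]
print(recall_at_k(ranking, 42, k=5), ndcg_at_k(ranking, 42, k=5))  # 1.0 0.63...
```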
4.1.3 Implementation. As for the two backend LLM-based recommender models, we follow the original settings in their papers for implementation. We employ LLaMA-7B for BIGRec and a transformer-based architecture for TIGER as in their paper [35]. All fine-tuning experiments are conducted on four NVIDIA RTX A5000 GPUs. Besides, we adopt the parameter-efficient fine-tuning technique LoRA [19] to fine-tune BIGRec and fully fine-tune the parameters of TIGER. We utilize SASRec [20], a representative sequential recommender model, as the surrogate model in DEALRec. We set the iteration number T for HVP estimation at 5000, and search the regularization strength λ in {0.1, 0.3, 0.5, 1.0, 2.0}. For cluster-based methods, the number of clusters K is explored in {25, 50, 75}. As for the coreset selection methods that require the training of LLMs, we consider a feasible implementation [7] by executing them on the same surrogate model as DEALRec.

Table 2: Overall performance comparison between the baselines and DEALRec instantiated on two competitive LLM-based
recommender models on three datasets. For each backend model, the bold results highlight the best results while the second-best
ones are underlined. ∗ implies the improvements over the second-best results are statistically significant (𝑝-value < 0.01) under
one-sample t-tests. We run all experiments for 3 times with different random seeds and report the averaged results.
Columns per backend and method: R@10, R@20, N@10, N@20 on Games (1024-shot, r=2%) | R@20, R@50, N@20, N@50 on MicroLens-50K (1024-shot, r=2%) | R@20, R@50, N@20, N@50 on Book (1024-shot, r=1%).

Backend: BIGRec
TF-DCon   0.0102 0.0157 0.0062 0.0078 | 0.0066 0.0099 0.0027 0.0034 | 0.0104 0.0144 0.0083 0.0092
RecRanker 0.0112 0.0166 0.0074 0.0090 | 0.0024 0.0042 0.0011 0.0014 | 0.0108 0.0145 0.0090 0.0097
CCS       0.0164 0.0246 0.0097 0.0122 | 0.0096 0.0131 0.0041 0.0049 | 0.0110 0.0145 0.0088 0.0096
GraNd     0.0158 0.0250 0.0098 0.0125 | 0.0014 0.0032 0.0006 0.0010 | 0.0102 0.0136 0.0080 0.0087
EL2N      0.0154 0.0256 0.0098 0.0128 | 0.0096 0.0045 0.0041 0.0016 | 0.0107 0.0149 0.0085 0.0094
Random    0.0163 0.0241 0.0100 0.0122 | 0.0108 0.0151 0.0044 0.0054 | 0.0099 0.0134 0.0083 0.0090
DEALRec   0.0181* 0.0276* 0.0115* 0.0142* | 0.0124* 0.0160* 0.0055* 0.0064* | 0.0117* 0.0155* 0.0096* 0.0104*

Backend: TIGER
TF-DCon   0.0051 0.0074 0.0033 0.0040 | 0.0006 0.0057 0.0002 0.0013 | 0.0028 0.0051 0.0020 0.0027
RecRanker 0.0028 0.0045 0.0019 0.0024 | 0.0043 0.0064 0.0011 0.0014 | 0.0027 0.0052 0.0018 0.0025
CCS       0.0050 0.0084 0.0031 0.0041 | 0.0026 0.0061 0.0010 0.0013 | 0.0026 0.0048 0.0018 0.0024
GraNd     0.0042 0.0053 0.0027 0.0030 | 0.0006 0.0014 0.0003 0.0005 | 0.0008 0.0020 0.0006 0.0010
EL2N      0.0034 0.0048 0.0024 0.0029 | 0.0011 0.0016 0.0004 0.0004 | 0.0005 0.0015 0.0004 0.0007
Random    0.0062 0.0102 0.0039 0.0051 | 0.0037 0.0059 0.0011 0.0014 | 0.0033 0.0066 0.0022 0.0031
DEALRec   0.0074* 0.0114* 0.0062* 0.0074* | 0.0058* 0.0076* 0.0020* 0.0020* | 0.0039* 0.0076* 0.0026* 0.0037*

Table 3: Performance comparison between DEALRec under 1024-shot fine-tuning and the full fine-tuning of the BIGRec in
terms of both accuracy and time costs. “%Improve.” denotes the relative improvement achieved by DEALRec compared to the
full fine-tuning. Models are trained for 50 epochs with the early stopping strategy.
Games MicroLens-50K Book
R@10↑ R@20↑ N@10↑ N@20↑ Time↓ R@20↑ R@50↑ N@20↑ N@50↑ Time↓ R@20↑ R@50↑ N@20↑ N@50↑ Time↓
Full 0.0169 0.0233 0.0102 0.0120 36.87h 0.0081 0.0136 0.0038 0.0053 66.64h 0.0076 0.0108 0.0060 0.0068 84.77h
DEALRec 0.0181 0.0276 0.0115 0.0142 1.67h 0.0124 0.0160 0.0055 0.0064 1.23h 0.0117 0.0155 0.0096 0.0104 1.93h
% Improve. 7.10% 18.45% 12.75% 18.33% -95.47% 53.09% 17.65% 44.74% 20.75% -98.15% 53.95% 43.52% 60.00% 52.94% -97.72%

4.2 Overall Performance (RQ1)
The results of the baselines and DEALRec with two competitive backend LLM-based recommender models on three datasets under few-shot fine-tuning (1024 samples) are presented in Table 2, from which we have the following observations:
• All methods with BIGRec typically yield better performance than those with TIGER, which is attributed to two reasons: 1) BIGRec employs a larger LLM (i.e., LLaMA-7B) compared to TIGER, thereby benefiting from the stronger generalization ability of large-sized LLMs [27]; and 2) BIGRec leverages item titles to present the user sequence, leading to better utilization of world knowledge in LLMs. In contrast, TIGER learns extra item tokens for LLMs. This might result in cold-start item issues since only limited item tokens are learned while others remain randomly initialized under the few-shot fine-tuning setting.
• Among all coreset selection baselines, difficulty-based methods (GraNd, EL2N) generally perform better than diversity-based methods (TF-DCon, RecRanker). This is reasonable since diversity-based methods merely heuristically encourage selecting users with divergent preferences, which lacks the assessment of their contributions to model training. In contrast, GraNd and EL2N use pre-defined metrics to measure the sample difficulty and select the samples with larger scores, which encourages selecting the samples that are more informative for models' optimization. Besides, CCS improves EL2N in most cases, as it maintains easy samples for selection, thus compensating the knowledge of recommendation data from high-density areas.
• Another interesting observation is that random sampling yields competitive performance or even outperforms other coreset selection methods in some cases, which might be attributed to two possible reasons: 1) Uniformly selected user sequences preserve high coverage of the original training distribution compared to other baselines, which ensures a high probability of a guaranteed bound for low empirical risk [57]. This observation is also consistent with the findings in [13]. 2) The inferior performance of some coreset selection methods also might be caused by the implementation settings (Section 4.1.3), where they may suffer from the learning ability gap between the surrogate model and LLMs (cf. Section 3.2).
• DEALRec significantly outperforms all coreset selection methods across the three datasets. The consistent performance improvements on both backend models validate the superiority of DEALRec in identifying influential samples for LLMs' adaptation to the recommendation data. The superior performance is attributed to: 1) the accurate and efficient estimation of the influence on empirical risk, i.e., overall performance, by removing a sample in training; and 2) the gap regularization based on the effort score to penalize the easy samples for LLMs. By emphasizing the non-trivial samples specifically for LLMs, gap regularization alleviates the learning ability gap between the surrogate model and the LLMs.

Figure 4: Ablation study of the influence score, effort score, and coverage-enhanced sample selection strategy ("Greedy", "w/o IS", "w/o δ_s", and DEALRec; Recall@20 and NDCG@20 on (a) Games and (b) MicroLens-50K).

Figure 5: Performance of DEALRec with different selection ratio r w.r.t. accuracy and efficiency on Games: (a) effect of r w.r.t. Recall; (b) effect of r w.r.t. time costs.

• Comparison with full fine-tuning. We further compare DEALRec with BIGRec under full training w.r.t. accuracy and efficiency, as presented in Table 3. We can find that: 1) DEALRec achieves higher performance compared to the model trained by full data, indicating the effectiveness of DEALRec for high accuracy. The inferior performance of BIGRec under full training also implies that not all user sequences are informative for model training, or are even harmful to the training, e.g., false negative interactions. This has also been observed in CTR prediction [48] and has been discussed in [2] from the view of data redundancy. 2) DEALRec significantly reduces the time costs for LLMs' fine-tuning (97.11% reduction of fine-tuning costs on average). With the remarkably declined training costs, DEALRec has the potential to facilitate real-world applications of LLM-based recommender models.

4.3 In-depth Analysis
4.3.1 Ablation Study (RQ2). To study the effectiveness of each component of DEALRec, i.e., influence score, effort score, and coverage-enhanced sample selection strategy, we separately remove the influence score (IS) and the effort score δ_s, referred to as "w/o IS" and "w/o δ_s", respectively. Besides, we replace the coverage-enhanced sample selection strategy by greedily selecting the samples with higher scores, denoted as "Greedy". From the results presented in Figure 4, we can observe that removing either the influence score or the effort score causes performance drops. This validates the effectiveness of 1) the assessment of overall performance change caused by removing samples from training; and 2) the additional signals of learning ability captured from LLMs as regularization, alleviating the gap between the surrogate model and the LLMs. Moreover, simply selecting the samples with higher overall scores might weaken the learning of distinct user behaviors and item knowledge (inferior performance of "Greedy"), as discussed in Section 3.3.

Table 4: Performance comparison between DEALRec with different surrogate models and BIGRec under full training. "Time" presents the time costs for training the surrogate model on a single NVIDIA RTX A5000.
           R@10↑   R@20↑   N@10↑   N@20↑   Time↓
Full       0.0169  0.0233  0.0102  0.0120  /
BERT4Rec   0.0175  0.0258  0.0103  0.0128  0.76h
SASRec     0.0181  0.0276  0.0115  0.0142  0.45h
DCRec      0.0211  0.0283  0.0117  0.0137  0.61h

4.3.2 Robustness on different surrogate models (RQ2). To further assess the generalization ability of DEALRec on different surrogate models, we employ three representative sequential recommender models, i.e., BERT4Rec [41], SASRec [20], and DCRec [51], as the surrogate models, respectively. From the results in Table 4, we can find that: 1) DEALRec with the three surrogate models consistently outperforms BIGRec under full fine-tuning. This demonstrates the strong robustness of DEALRec on different surrogate models. 2) Different surrogate models cause some fluctuations in accuracy. This is reasonable because different model architectures express user behavior and item knowledge differently, possibly resulting in varied selected samples, which will affect the performance. 3) SASRec exhibits the least time costs for training and achieves competitive performance among the three surrogate models. Therefore, SASRec could be a good choice of the surrogate model for DEALRec in real-world deployments.

4.3.3 Effect of selection ratio r (RQ3). To investigate the effect of the selection ratio r on DEALRec in both accuracy and efficiency, we vary the ratio r from 0.2% (128-shot) to 4% (4096-shot) and present the results in Figure 5. It is observed that: 1) The recommendation accuracy rapidly improves as the number of selected samples increases from 0.2% to 1%, surpassing the full training when r = 1%. Besides, if we continuously increase the selection ratio from 2% to 4%, the benefits from additional samples gradually diminish and only minor improvements in accuracy are observed. We suspect that the gap between LLMs and the recommendation data mainly resides in a small subset of the representative user behaviors, which is what DEALRec aims to identify. 2) Meanwhile, although the time costs for fine-tuning LLMs gradually increase because of additional samples, the cost reduction compared to the full training still reaches over 94%. 3) Empirically, setting r = 1% is recommended to achieve comparable performance to full fine-tuning and low costs.

4.3.4 User group evaluation (RQ3). To study how DEALRec achieves superior overall performance, we test DEALRec over user sequences of different difficulties. Specifically, we calculate the loss of each user sequence via the model trained by randomly selected few-shot samples; we then divide the users into three groups according to their loss values, from the easier samples with smaller loss (Group 1) to the harder samples with larger loss (Group 3). The results of each group of DEALRec and Random on Games are presented in Figure 6. We can find that 1) the performance of both DEALRec and Random gradually declines from Group 1 to Group 3, because users with larger loss are more difficult to predict. Nevertheless, 2) DEALRec consistently outperforms Random in

0.052
Games Games 5.2 Coreset Selection
Coreset selection has been widely studied in both traditional
↑11.42% ↑8.41%
Random Random
DEALRec
0.02
DEALRec machine learning and deep learning [47, 50], benefiting many
0.038
downstream tasks such as data-efficient learning [44], neural
↑37.50%
architecture search [40], and active learning [39]. It aims to select a
0.024
↑37.29% 0.01 small but representative subset from the full data that can lead
↑7.02%
to comparable model performance. Previous work mainly falls
↑5.98% into two groups: 1) Heuristic methods [7, 10, 44] typically assume
0.01
Group 1 Group 2 Group 3
0
Group 1 Group 2 Group 3 difficult or diverse samples are informative for model training. 2)
(a) Performance w.r.t. Recall@20 (b) Performance w.r.t. NDCG@20 Optimization-based methods [21, 25, 50] leverages the bi-level or
Figure 6: Performance of DEALRec over easy to difficult discrete optimization techniques to optimize the data subset that
samples (Group 1 to Group 3). can minimize the empirical risk. However, heuristic methods might
be suboptimal since they overlook the impact of selected samples on
(Recall@10) Games (Recall@20) (NDCG@10) Games (NDCG@20) empirical risk. And optimization-based methods fail to be applied
Recall@10 0.031 0.02 NDCG@10 0.019 to LLM-based recommendation due to the cumbersome calculation
for complex optimization. Furthermore, previous methods usually
0.023
Recall@20 NDCG@20
rely on the training of the model on full data for selection, which is
0.018 0.023 0.012 0.010 infeasible for LLM-based recommendation (cf. Section 2).
• Data Condensation [56] is another potential solution to
0.013 0.015 0.004 0.000
achieve data-efficient training. However, it is intrinsically different
0.1 0.3 0.5 1 2 0.1 0.3 0.5 1 2 from our proposed task of data pruning. While it aims to synthesize
(a) Effect of 𝝀 w.r.t. Recall (b) Effect of 𝝀 w.r.t. NDCG a small but informative dataset [55], our task targets to identify
Figure 7: Performance of DEALRec with different 𝜆. existing samples that are representative. Besides, previous work
mainly works for continuous data, which is inapplicable to LLM-
each group, which validates the effectiveness of DEALRec in based recommendation [48]. TF-DCon [49] is recently proposed for
considering the influence on overall performance. content-based recommendation and we compare it in Section 4.2.

4.3.5 Effect of regularization strength 𝝀. We vary 𝜆 from 0.1 to 2 for DEALRec and evaluate the performance as shown in Figure 7. From the figures, we can find that: 1) as we incrementally increase the value of 𝜆, the overall accuracy generally improves. This is attributed to the gap between the surrogate model and LLMs as discussed in Section 3.2, emphasizing the necessity of regularizing the influence score to align it with the learning ability of the LLMs. 2) However, blindly pursuing a larger 𝜆 is not necessarily beneficial; we should carefully balance the performance-driven influential samples identified by the surrogate model against the samples that are difficult for the LLMs.
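To illustrate where 𝜆 enters, a minimal sketch of the score combination is given below (this reflects our reading of Section 3 and is not the paper's exact formulation; all names are illustrative):

    # Sketch: the surrogate-based influence score is regularized by the
    # LLM-oriented effort score, with lambda_ controlling the regularization
    # strength studied in this section.
    import numpy as np

    def overall_score(influence_score, effort_score, lambda_=0.5):
        """influence_score, effort_score: per-sample scores as 1-D arrays."""
        return influence_score + lambda_ * effort_score

    # A larger lambda_ puts more weight on samples that are hard for the LLM
    # itself; a smaller lambda_ trusts the surrogate model's influence estimate.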
5 RELATED WORK

5.1 LLM-based Recommendation
Leveraging LLMs for recommendation has gained remarkable attention recently [36, 52], showcasing their potential across various recommendation tasks [4, 12, 27]. Some early studies explore the recommendation ability of powerful LLMs through in-context learning [9, 42]. Nevertheless, the performance of LLMs is limited without extra fine-tuning on domain-specific recommendation data [4]. To fully leverage the potential of LLMs for recommendation, a line of work studies various fine-tuning strategies tailored for recommendation tasks [12, 26, 31, 32, 53, 54]. However, fine-tuning LLMs requires extensive computational resources and time, thus hindering real-world applications. Therefore, it is crucial to enhance the fine-tuning efficiency of LLM-based recommender models. In this work, we propose the task of data pruning for efficient LLM-based recommendation, aiming to identify representative samples for LLMs' few-shot fine-tuning.

5.2 Coreset Selection
Coreset selection has been widely studied in both traditional machine learning and deep learning [47, 50], benefiting many downstream tasks such as data-efficient learning [44], neural architecture search [40], and active learning [39]. It aims to select a small but representative subset from the full data that can lead to comparable model performance. Previous work mainly falls into two groups: 1) heuristic methods [7, 10, 44], which typically assume that difficult or diverse samples are informative for model training; and 2) optimization-based methods [21, 25, 50], which leverage bi-level or discrete optimization techniques to optimize the data subset that minimizes the empirical risk. However, heuristic methods might be suboptimal since they overlook the impact of the selected samples on the empirical risk, while optimization-based methods fail to be applied to LLM-based recommendation due to the cumbersome calculation required for complex optimization. Furthermore, previous methods usually rely on training the model on the full data for selection, which is infeasible for LLM-based recommendation (cf. Section 2).

• Data Condensation [56] is another potential solution to achieve data-efficient training. However, it is intrinsically different from our proposed task of data pruning. While it aims to synthesize a small but informative dataset [55], our task targets identifying existing samples that are representative. Besides, previous work mainly handles continuous data, which is inapplicable to LLM-based recommendation [48]. TF-DCon [49] is recently proposed for content-based recommendation, and we compare against it in Section 4.2.

6 CONCLUSION
In this work, we proposed the task of data pruning for efficient LLM-based recommendation, which aims to identify representative samples tailored for LLMs' few-shot fine-tuning. Furthermore, we posited two objectives for this data pruning task: 1) high accuracy targets selecting the samples that can lead to low empirical risk; and 2) high efficiency strives to keep the cost of the data pruning process low. To this end, we proposed a novel data pruning method, namely DEALRec, to efficiently identify the influential samples with two scores. 1) The influence score is formulated to estimate the influence of sample removal on the empirical risk; it is extended from the influence function and accelerated through the symmetric property. 2) We introduced a small-sized surrogate model to calculate the influence score efficiently and proposed the effort score to bridge the gap between the surrogate model and LLMs. Empirical results validate the effectiveness of DEALRec in achieving both high efficiency and high accuracy.
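For reference, the classical influence-function estimates [8, 24] that the influence score builds on can be written as follows (the notation here is ours for illustration; DEALRec extends and accelerates these estimates rather than using them verbatim):

\[
\hat{\theta}_{-z} - \hat{\theta} \;\approx\; \tfrac{1}{n} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}),
\qquad
\mathcal{I}(z, z') \;=\; -\nabla_{\theta} L(z', \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}),
\]

where \(\hat{\theta}\) is the minimizer of the empirical risk over the n training samples, \(H_{\hat{\theta}}\) is the Hessian of the empirical risk at \(\hat{\theta}\), \(\hat{\theta}_{-z}\) approximates the parameters after removing sample z, and \(\mathcal{I}(z, z')\) measures how upweighting z changes the loss on z'.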
This work proposes the data pruning task for LLM fine-tuning, opening up a new research direction for efficient LLM-based recommendation and leaving many promising directions for future work. 1) It is worthwhile to apply DEALRec to more LLM-based recommender models and more cross-domain datasets, improving fine-tuning performance with limited resources. 2) Due to the limited context window length of LLMs, it is promising to select the informative interacted items within users' interaction sequences for LLMs' fine-tuning. 3) Enhancing the inference efficiency of LLM-based recommender models is also a crucial problem for their real-world deployment.
SIGIR ’24, July 14–18, 2024, Washington, DC, USA Xinyu Lin et al.

REFERENCES
[1] Naman Agarwal, Brian Bullins, and Elad Hazan. 2016. Second-order stochastic optimization in linear time. stat 1050 (2016), 15.
[2] Sharat Agarwal, Himanshu Arora, Saket Anand, and Chetan Arora. 2020. Contextual diversity for active learning. In ECCV. Springer, 137–153.
[3] Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yancheng Luo, Fuli Feng, Xiangnan He, and Qi Tian. 2023. A bi-step grounding paradigm for large language models in recommendation systems. arXiv:2308.08434.
[4] Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Tallrec: An effective and efficient tuning framework to align large language model with recommendation. In RecSys. ACM.
[5] Zalán Borsos, Mojmir Mutny, and Andreas Krause. 2020. Coresets via bilevel optimization for continual learning and streaming. NeurIPS 33 (2020), 14879–14890.
[6] Chengliang Chai, Jiayi Wang, Nan Tang, Ye Yuan, Jiabin Liu, Yuhao Deng, and Guoren Wang. 2023. Efficient coreset selection with cluster-based methods. In KDD. ACM, 167–178.
[7] C Coleman, C Yeh, S Mussmann, B Mirzasoleiman, P Bailis, P Liang, J Leskovec, and M Zaharia. 2020. Selection via Proxy: Efficient Data Selection for Deep Learning. In ICLR.
[8] R Dennis Cook. 1977. Detection of influential observation in linear regression. Technometrics 19, 1 (1977), 15–18.
[9] Sunhao Dai, Ninglu Shao, Haiyuan Zhao, Weijie Yu, Zihua Si, Chen Xu, Zhongxiang Sun, Xiao Zhang, and Jun Xu. 2023. Uncovering ChatGPT's capabilities in recommender systems. In RecSys. ACM, 1126–1132.
[10] Vitaly Feldman and Chiyuan Zhang. 2020. What neural networks memorize and why: Discovering the long tail via influence estimation. NeurIPS 33 (2020), 2881–2891.
[11] Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-rec: Towards interactive and explainable LLMs-augmented recommender system. arXiv:2303.14524.
[12] Yuqi Gong, Xichen Ding, Yehui Su, Kaiming Shen, Zhongyi Liu, and Guannan Zhang. 2023. An Unified Search and Recommendation Foundation Model for Cold-Start Scenario. In CIKM. 4595–4601.
[13] Chengcheng Guo, Bo Zhao, and Yanbing Bai. 2022. DeepCore: A comprehensive library for coreset selection in deep learning. In DEXA. Springer, 181–195.
[14] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. In IJCAI. 1725–1731.
[15] Frank R Hampel. 1974. The influence curve and its role in robust estimation. Journal of the American Statistical Association 69, 346 (1974), 383–393.
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. IEEE, 770–778.
[17] Muyang He, Shuo Yang, Tiejun Huang, and Bo Zhao. 2023. Large-scale Dataset Pruning with Dynamic Uncertainty. arXiv:2306.05175.
[18] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and powering graph convolution network for recommendation. In SIGIR. 639–648.
[19] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. arXiv:2106.09685.
[20] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In ICDM. IEEE, 197–206.
[21] Krishnateja Killamsetty, Sivasubramanian Durga, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. 2021. Grad-match: Gradient matching based data subset selection for efficient deep model training. In ICML. PMLR, 5464–5474.
[22] Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. 2021. Glister: Generalization based data subset selection for efficient and robust learning. In AAAI, Vol. 35. 8110–8118.
[23] Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer. 2021. Retrieve: Coreset selection for efficient and robust semi-supervised learning. NeurIPS 34 (2021), 14488–14501.
[24] Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In ICML. PMLR, 1885–1894.
[25] Suraj Kothawade, Vishal Kaushal, Ganesh Ramakrishnan, Jeff Bilmes, and Rishabh Iyer. 2022. PRISM: A Unified Framework of Parameterized Submodular Information Measures for Targeted Data Subset Selection and Summarization. In AAAI.
[26] Lei Li, Yongfeng Zhang, and Li Chen. 2023. Prompt distillation for efficient LLM-based recommendation. In CIKM. 1348–1357.
[27] Xinyu Lin, Wenjie Wang, Yongqi Li, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. 2023. A multi-facet paradigm to bridge large language model and recommendation. arXiv:2310.06491.
[28] Robert F Ling. 1984. Residuals and influence in regression.
[29] Qijiong Liu, Nuo Chen, Tetsuya Sakai, and Xiao-Ming Wu. 2024. ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models. In WSDM. ACM.
[30] Sichun Luo, Bowei He, Haohan Zhao, Yinya Huang, Aojun Zhou, Zongpeng Li, Yuanzhang Xiao, Mingjie Zhan, and Linqi Song. 2023. RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation. arXiv:2312.16018.
[31] Zheqi Lv, Wenqiao Zhang, Zhengyu Chen, Shengyu Zhang, and Kun Kuang. 2024. Intelligent Model Update Strategy for Sequential Recommendation. In WWW. ACM.
[32] Zheqi Lv, Wenqiao Zhang, Shengyu Zhang, Kun Kuang, Feng Wang, Yongwei Wang, Zhengyu Chen, Tao Shen, Hongxia Yang, Beng Chin Ooi, et al. 2023. DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization. In WWW. ACM, 3077–3085.
[33] Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, and Fajie Yuan. 2023. A Content-Driven Micro-Video Recommendation Dataset at Scale. arXiv:2309.15379.
[34] Mansheej Paul, Surya Ganguli, and Gintare Karolina Dziugaite. 2021. Deep learning on a data diet: Finding important examples early in training. NeurIPS 34 (2021), 20596–20607.
[35] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q Tran, Jonah Samost, et al. 2023. Recommender Systems with Generative Retrieval. In NeurIPS. Curran Associates, Inc.
[36] Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2024. Representation learning with large language models for recommendation. In WWW. ACM.
[37] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. AUAI Press, 452–461.
[38] Noveen Sachdeva, Mehak Dhaliwal, Carole-Jean Wu, and Julian McAuley. 2022. Infinite recommendation networks: a data-centric approach. NeurIPS 35 (2022), 31292–31305.
[39] Ozan Sener and Silvio Savarese. 2018. Active learning for convolutional neural networks: A core-set approach.
[40] Jae-hun Shim, Kyeongbo Kong, and Suk-Ju Kang. 2021. Core-set sampling for efficient neural architecture search. arXiv:2107.06869.
[41] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In CIKM. 1441–1450.
[42] Weiwei Sun, Lingyong Yan, Xinyu Ma, Pengjie Ren, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent. In EMNLP. ACL, 14918–14937.
[43] Haoru Tan, Sitong Wu, Fei Du, Yukang Chen, Zhibin Wang, Fan Wang, and Xiaojuan Qi. 2023. Data Pruning via Moving-one-Sample-out. arXiv:2310.14664.
[44] Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, and Geoffrey J Gordon. 2018. An empirical study of example forgetting during deep neural network learning. arXiv:1812.05159.
[45] Wenjie Wang, Xinyu Lin, Liuhui Wang, Fuli Feng, Yunshan Ma, and Tat-Seng Chua. 2023. Causal Disentangled Recommendation Against User Preference Shifts. TOIS (2023).
[46] Wenjie Wang, Xinyu Lin, Liuhui Wang, Fuli Feng, Yinwei Wei, and Tat-Seng Chua. 2023. Equivariant Learning for Out-of-Distribution Cold-start Recommendation. In MM. 903–914.
[47] Kai Wei, Rishabh Iyer, and Jeff Bilmes. 2015. Submodularity in data subset selection and active learning. In ICML. PMLR, 1954–1963.
[48] Jiahao Wu, Wenqi Fan, Shengcai Liu, Qijiong Liu, Rui He, Qing Li, and Ke Tang. 2023. Dataset condensation for recommendation. arXiv:2310.01038.
[49] Jiahao Wu, Qijiong Liu, Hengchang Hu, Wenqi Fan, Shengcai Liu, Qing Li, Xiao-Ming Wu, and Ke Tang. 2023. Leveraging Large Language Models (LLMs) to Empower Training-Free Dataset Condensation for Content-Based Recommendation. arXiv:2310.09874.
[50] Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, and Ping Li. 2023. Dataset pruning: reducing training data by examining generalization influence.
[51] Yuhao Yang, Chao Huang, Lianghao Xia, Chunzhen Huang, Da Luo, and Kangyi Lin. 2023. Debiased Contrastive Learning for Sequential Recommendation. In WWW. 1063–1073.
[52] Honglei Zhang, He Liu, Haoxuan Li, and Yidong Li. 2024. TransFR: Transferable Federated Recommendation with Pre-trained Language Models. arXiv:2402.01124.
[53] Honglei Zhang, Fangyuan Luo, Jun Wu, Xiangnan He, and Yidong Li. 2023. LightFR: Lightweight federated recommendation with privacy-preserving matrix factorization. TOIS 41, 4 (2023), 1–28.
[54] Junjie Zhang, Ruobing Xie, Yupeng Hou, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2023. Recommendation as instruction following: A large language model empowered recommendation approach. arXiv:2305.07001.
[55] Bo Zhao and Hakan Bilen. 2023. Dataset condensation with distribution matching. In WACV. IEEE, 6514–6523.
[56] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. 2020. Dataset Condensation with Gradient Matching. In ICLR.
[57] Haizhong Zheng, Rui Liu, Fan Lai, and Atul Prakash. 2022. Coverage-centric Coreset Selection for High Pruning Rates. In ICLR.