
Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval


Xiaojie Sun, Keping Bi, Jiafeng Guo
CAS Key Lab of Network Data Science and Technology, ICT, CAS
University of Chinese Academy of Sciences
Beijing, China
[email protected]

Xinyu Ma, Yixing Fan
CAS Key Lab of Network Data Science and Technology, ICT, CAS
University of Chinese Academy of Sciences
Beijing, China
[email protected]

Hongyu Shan, Qishen Zhang, Zhongyi Liu
Ant Group
Beijing, China
{xinzong,qishen.zqs,zhongyi.lzy}@alibaba-inc.com
ABSTRACT
Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not capture the various semantic similarities between the values sufficiently. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training so that their semantic similarities can be naturally captured in the PLMs. To facilitate effective retrieval with the aspect strings, we propose mutual prediction objectives between the text of the item aspect and content. In this way, our model makes more sufficient use of aspect information than conducting undifferentiated masked language modeling (MLM) on the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product and mini-program search) show that our approach can outperform competitive baselines both treating aspect values as classes and conducting the same MLM for aspect and content strings. Code and the related dataset are available at https://github.com/sunxiaojie99/ATTEMPT.

CCS CONCEPTS
• Information systems → Information retrieval.

KEYWORDS
Dense Retrieval, Multi-Aspect, Pre-training

ACM Reference Format:
Xiaojie Sun, Keping Bi, Jiafeng Guo, Xinyu Ma, Yixing Fan, Hongyu Shan, Qishen Zhang, and Zhongyi Liu. 2023. Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), October 21–25, 2023, Birmingham, United Kingdom. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3583780.3615157

This work is licensed under a Creative Commons Attribution International 4.0 License.
CIKM '23, October 21–25, 2023, Birmingham, United Kingdom
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0124-5/23/10.
https://doi.org/10.1145/3583780.3615157

1 INTRODUCTION
Dense retrieval models [9–11, 26, 28, 29] have achieved compelling performance with pre-trained language models (PLMs) [5, 24] as the backbone. Most studies on dense retrieval focus on unstructured data consisting of plain text, while little attention has been paid to structured item retrieval such as product and people search. In these scenarios, additional aspect information beyond the query or item content is critical for relevance matching, such as brand-nike or affiliation-Stanford. However, little work has explored how to use such aspects effectively in dense retrieval models.

A typical way of leveraging aspect information for multi-aspect retrieval is to refine the item representations with an auxiliary aspect prediction objective [12]. Specifically, for each aspect of an item, the item content is used to predict its annotated value IDs during training. This approach has two major disadvantages: 1) It considers the values of an aspect as isolated classes and learns the embeddings of value IDs from scratch, ignoring their semantic relations. For example, among the category values, "Hunting & Fishing" is more related to "Sports & Outdoors" while unrelated to "Pet Supplies". However, such semantic relations may not be captured sufficiently if we treat them as independent classes.
2) It does not use query/item aspects such as category, brand, color, etc. during test time, which limits the potential retrieval gains. Although it may be costly to obtain query aspects during online service, item aspects can be extracted offline, and it is easy to also use them during inference if they are already used in training.

In this paper, we propose a method of pre-training with Aspect-contenT TExt Mutual PredicTion (ATTEMPT) to address the above limitations. Specifically, ATTEMPT leverages aspect values as text strings and concatenates them with the content using leading indicator tokens in between. For more effective retrieval, rather than simply conducting undifferentiated MLM on the concatenated aspect and content text, we specifically design an aspect-content mutual prediction objective: it keeps the entire aspect/content tokens and predicts the masked ones in the content/aspects. Also, to suit the scenario where the overhead of obtaining query aspects online is high, we set the query aspect text to empty during inference. Our method has several advantages over the common approach: 1) In ATTEMPT, the text of an aspect value reuses the token embeddings from the powerful PLMs, so the semantic relations between values can be naturally captured. 2) Being concatenated with the content, the item aspects can also take effect for relevance matching during test time. 3) The aspect-content mutual prediction objective promotes sufficient interactions between the aspect and content at the token level, producing better item representations for retrieval, which is confirmed by extensive experimental results.

As far as we know, there are no suitable large-scale public datasets for multi-aspect retrieval. We construct such a dataset by crawling the item categories from their pages to complement the aspects in the Amazon ESCI dataset [19]. Our experiments on this refined dataset and a real-world commercial mini-program dataset show that ATTEMPT can significantly outperform competitive baselines that either predict the classes of aspect values or conduct the same MLM for aspect and content strings.

2 RELATED WORK
There are three threads of work related to our study. (1) Multi-aspect Retrieval. Some work exploited multi-aspect information to rank products or entities before PLMs appeared [1, 2, 21]. In the era of PLMs [6], there was limited research on multi-aspect retrieval until Kong et al. [12] first attempted it. They learn aspect embeddings by predicting their value IDs with item contents and fuse them to yield an item embedding. Later, Shan et al. [23] proposed a fine-tuning method that uses local aspect-level matching signals to enhance the global query-item embedding matching. (2) Multi-field Retrieval. How to effectively leverage multiple fields (e.g., title, body, etc.) in a document has been a long-standing research topic. The most famous method is BM25F [22]. Methods leveraging multiple fields have also been proposed before and after PLMs appeared [3, 18, 27, 30]. Multi-fields are unstructured text in nature, and the essential issue is how to weigh them differently during matching. Aspects, unlike fields, usually have a fixed value set that is much smaller than the space of field text. Thus, their core challenges are different. (3) Pre-trained Models for Dense Retrieval. Many studies have explored promoting the capabilities of PLMs for dense retrieval, including introducing extra training objectives [4, 13, 15–17], special masking schemes [25], and model architecture changes [7]. Our method is grounded on the basic dual BERT encoders [5].

[Figure 1: The mutual prediction MLM in ATTEMPT. The aspect and content texts are colored green and purple. The illustrated encoder input is [CLS] [A1] <Multi-level Categories> [A2] <Brand> [SEP] [C] <Title + Description> [SEP], instantiated with the aspects "Clothing, Women, Shoes, Athletic, Running" and "adidas" and the content "adidas Women Cloudfoam Pure Running Shoe White 6.5 US…". The aspect-to-content direction masks content tokens while keeping the aspect text intact; the content-to-aspect direction masks aspect tokens while keeping the content intact.]

3 METHODOLOGY
3.1 Preliminary
For a query $q$ or a candidate item $i$, we represent the content text (e.g., query string, title, description) as $t_c$ and the aspect text (e.g., values for brand, color, and category) as $t_a$. Assuming $q$ or $i$ has $k$ aspects, $t_a$ is further denoted as $t_{a_1}, \ldots, t_{a_k}$. Each aspect $a_j$ ($1 \le j \le k$) has a finite vocabulary of aspect values, denoted as $V_{a_j}$. Previous work [12] incorporates aspect information by predicting the IDs corresponding to the annotated values of each aspect $a_j$ within the space $V_{a_j}$. In contrast, we propose to pre-train the encoder by conducting mutual prediction between the texts $t_a$ and $t_c$.

3.2 ATTEMPT
To model the semantic relationships between the various values of an aspect naturally, we treat the aspect values as text strings and concatenate them with the content text. To sufficiently capture the interactions between item aspects and contents, we introduce mutual prediction objectives as illustrated in Figure 1.
Encoder Input. To indicate different types of text segments, we prepend an indicator token $[A_j]$ ($1 \le j \le k$) and $[C]$ to the aspect text $t_{a_j}$ and the original content $t_c$, respectively; e.g., an encoder input is $[A_1]t_{a_1}[A_2]t_{a_2}[A_3]t_{a_3}[SEP][C]t_c[SEP]$. When a query/item does not have certain aspect information, the corresponding aspect text will be empty. In this case, the indicator tokens can still learn some implicit representations of the query/item content. Note that during relevance matching, we always keep the query aspect text empty to suit practical retrieval scenarios where the overhead of obtaining query aspects is high, and also to avoid potential semantic drift. Table 3 will show that the query-side indicator tokens ($[A_j]$ ($1 \le j \le k$), $[C]$) alone, learned during pre-training, are beneficial for retrieval. Since the other parts of ATTEMPT are exactly the same between $q$ and $i$, we take $i$ as an example for illustration.
Content Masked Language Modeling (MLM). To capture the interactions between the content tokens without any auxiliary information, ATTEMPT conducts MLM on the item content. It randomly masks tokens in the content text and predicts the masked tokens with the context-dependent representations encoded by Transformer layers [5]. The corresponding loss function is:

$$\mathcal{L}_{MLM}(\hat{t}_c) = -\sum_{w \in m(\hat{t}_c)} \log P\big(w \mid \hat{t}_c \setminus m(\hat{t}_c)\big), \qquad (1)$$

where $\hat{t}_c$ denotes the text produced by randomly masking some tokens in the text $t_c$, $m(\hat{t}_c)$ denotes the masked tokens, and $\hat{t}_c \setminus m(\hat{t}_c)$ denotes the remaining tokens in $\hat{t}_c$.
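To make the encoder input and the content masking above concrete, the following minimal sketch builds the concatenated aspect-content sequence with indicator tokens and masks only the content tokens (Section 4.3 reports a maximum length of 156 and a 0.15 mask ratio for item content). The helper names, the use of the HuggingFace tokenizer, and the two-aspect example are illustrative assumptions on our part, not the authors' released implementation.

```python
# Sketch (not the official ATTEMPT code): building the concatenated
# aspect-content input and a content-only mask for the MLM / a2c losses.
import random
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# Indicator tokens for k aspects plus the content segment (illustrative names).
indicator_tokens = ["[A1]", "[A2]", "[C]"]
tokenizer.add_special_tokens({"additional_special_tokens": indicator_tokens})

def build_input(aspect_texts, content_text):
    """Concatenate: [CLS] [A1] t_a1 [A2] t_a2 [SEP] [C] t_c [SEP].
    Missing aspects stay as empty strings, keeping only the indicator token."""
    aspect_part = " ".join(
        f"{ind} {txt}" for ind, txt in zip(indicator_tokens[:-1], aspect_texts)
    )
    text = f"{aspect_part} [SEP] [C] {content_text}"
    return tokenizer(text, truncation=True, max_length=156, return_tensors="pt")

def mask_content_only(input_ids, mask_ratio=0.15):
    """Randomly replace content tokens (those after the [C] indicator) with
    [MASK]; aspect tokens and all indicator/special tokens stay intact."""
    ids = input_ids.clone()
    labels = ids.clone().fill_(-100)                 # -100 = ignored by the MLM loss
    special = set(tokenizer.all_special_ids)
    c_id = tokenizer.convert_tokens_to_ids("[C]")
    c_pos = (ids[0] == c_id).nonzero()[0].item()     # position of the [C] indicator
    for pos in range(c_pos + 1, ids.size(1)):
        if ids[0, pos].item() in special:
            continue
        if random.random() < mask_ratio:
            labels[0, pos] = ids[0, pos]             # remember the original token
            ids[0, pos] = tokenizer.mask_token_id    # then mask it in the input
    return ids, labels

enc = build_input(["Clothing, Women, Shoes, Athletic, Running", "adidas"],
                  "adidas Women Cloudfoam Pure Running Shoe White 6.5 US")
masked_ids, mlm_labels = mask_content_only(enc["input_ids"])
```

Masking only the aspect tokens (for the content-to-aspect direction below) would follow the same pattern, restricted to positions between the first indicator token and [SEP].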


Aspect-to-Content MLM Prediction. We take the entire aspect text as context when predicting the masked tokens in the content text. Under this context, the prediction of masked content tokens has extra evidence for consideration and can act differently than content MLM alone. The aspect-to-content (a2c) loss $\mathcal{L}_{a2c}$ is:

$$\mathcal{L}_{a2c}(t_a \oplus \hat{t}_c) = -\sum_{w \in m(\hat{t}_c)} \log P\big(w \mid t_a \oplus \hat{t}_c \setminus m(\hat{t}_c)\big), \qquad (2)$$

where $\oplus$ means concatenation. In particular, the leading tokens $[A_j]$ ($1 \le j \le k$) and $[C]$ in the input will not be masked.
Content-to-Aspect MLM Prediction. The idea of content-to-aspect prediction is similar to the aspect classification in [12]: both use the original content to predict the aspects. However, ATTEMPT predicts the masked words in the aspect text rather than the value classes (IDs), which encodes the aspect information in a softer manner. Specifically, the content-to-aspect (c2a) loss is:

$$\mathcal{L}_{c2a}(\hat{t}_a \oplus t_c) = -\sum_{w \in m(\hat{t}_a)} \log P\big(w \mid \hat{t}_a \setminus m(\hat{t}_a) \oplus t_c\big). \qquad (3)$$

Overall Learning Objective. By introducing $\mathcal{L}_{a2c}$ and $\mathcal{L}_{c2a}$, ATTEMPT can incorporate the aspect information into the item representation sufficiently through bidirectional interactions. In summary, our overall pre-training objective is:

$$\mathcal{L}_{overall} = \mathcal{L}_{MLM}(\hat{t}_c) + \lambda \big( \mathcal{L}_{a2c}(t_a \oplus \hat{t}_c) + \mathcal{L}_{c2a}(\hat{t}_a \oplus t_c) \big), \qquad (4)$$

where $\lambda$ is a hyper-parameter.
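As a worked illustration of Eq. (4), the sketch below combines the three losses in one pre-training step using a BERT masked-LM head, with $\lambda = 1.0$ as reported in Section 4.3. The batch field names and masking conventions (labels of -100 are ignored) are assumptions layered on the previous sketch, not the official implementation.

```python
# Sketch (assumed, not the official code): one ATTEMPT pre-training step.
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))  # tokenizer from the previous sketch
lam = 1.0                                      # lambda in Eq. (4)

def mlm_loss(input_ids, labels, attention_mask):
    # BertForMaskedLM returns the mean cross entropy over positions whose
    # label is not -100, i.e., exactly the masked tokens.
    return model(input_ids=input_ids,
                 attention_mask=attention_mask,
                 labels=labels).loss

def pretrain_step(batch):
    # 1) Content MLM, Eq. (1): content tokens masked, no aspect context.
    l_mlm = mlm_loss(batch["content_masked_ids"], batch["content_labels"],
                     batch["content_attention_mask"])
    # 2) Aspect-to-content, Eq. (2): full aspect text kept, content tokens masked.
    l_a2c = mlm_loss(batch["a2c_masked_ids"], batch["a2c_labels"],
                     batch["a2c_attention_mask"])
    # 3) Content-to-aspect, Eq. (3): full content kept, aspect tokens masked;
    #    the indicator tokens [A_j] and [C] are never masked.
    l_c2a = mlm_loss(batch["c2a_masked_ids"], batch["c2a_labels"],
                     batch["c2a_attention_mask"])
    return l_mlm + lam * (l_a2c + l_c2a)       # Eq. (4)
```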
4 EXPERIMENTAL SETUP
4.1 Datasets
We conduct model comparisons on two real-world datasets.
Multi-Aspect Amazon ESCI Dataset (MA-Amazon). Amazon ESCI Product Search [19] originally has multilingual real-world queries; product information such as brand, color, title, and description; and 4-level relevance labels: Exact, Substitute, Complement, and Irrelevant. We only use the English part and enrich the dataset by collecting multi-level product categories from the item pages. We merge all the items and get a corpus of 482K unique items, which is used for pre-training. For fine-tuning, we divide the original training set into training and validation sets by queries, and keep the test set, yielding 17K, 3.5K, and 8.9K queries respectively. As in [19], we treat Exact as relevant and the other labels as irrelevant during training and for recall calculation. MA-Amazon only has item aspect information, and the coverage of brand, color, and category levels 1-2-3-4 is 94%, 67%, and 87%-87%-85%-71%, respectively.
Alipay Search Dataset. Alipay is a mini-program (app-like service) search dataset with binary manual relevance annotations. The pre-training query/item corpus has 1.3M/1.8M distinct queries/items with aspect information, i.e., brand (44%/0.6% coverage on query/item) and three-level categories (91%-90%-56%/90%-90%-62% coverage for category 1-2-3 of query/item). The fine-tuning dataset consists of 60K/3.3K/3.3K unique queries in the training/validation/test set. Note that the queries for validation and testing do not appear in the pre-training query corpus.

4.2 Baselines
We compare ATTEMPT with the following pre-training methods (-C means that the input takes the same concatenation strategy for aspect and content text as ATTEMPT): (1) BIBERT [14, 20]: BIBERT, the backbone of ATTEMPT, is a prevalent dense retrieval method for plain text. It employs MLM [5] to pre-train the encoder using the content text of the query/item. (2) Condenser [7]: It adds a short circuit between the tokens (except CLS) of a lower layer and a higher layer of BERT [5] to enhance the final CLS representation. (3) BIBERT-C: It differs from BIBERT only in the encoder input; it uses the aspect text in the same way as ATTEMPT during pre-training and fine-tuning. (4) BIBERT-C(A): It refines BIBERT-C by assigning a higher mask ratio specifically to the aspect text, which is consistent with ATTEMPT. (5) MTBERT [-C] [12]: It conducts k additional aspect classification tasks on the CLS during pre-training. (6) MADRAL [-C] [12]: It initiates extra multiple aspect embeddings, learns them by predicting the value classes of each aspect, and fuses them to produce the final item representation.

4.3 Implementation and Evaluation Details
We implemented ATTEMPT and all the baselines ourselves. For all methods, the encoder is shared between queries and items.
Pre-training. The maximum token length is 156. The learning rate and number of epochs for the MA-Amazon/Alipay dataset are set to 1e-4/5e-5 and 20/10, respectively. We initialize the BERT parameters with Google's public checkpoint and use the Adam optimizer with a linear warm-up. For all -C baselines and ATTEMPT, the mask ratios are set to 0.15/0.3 for item/query content to account for the shorter query length. They all have the same mask ratio between aspect and content text except for BIBERT-C(A) and ATTEMPT, where the mask ratio for aspect text is 0.6. $\lambda$ in Eq. (4) is set to 1.0. We fine-tune the pre-trained model checkpoints every two epochs and select the best one on the validation dataset.
Fine-tuning. On both datasets, all models are trained for 20 epochs with the Tevatron toolkit [8]. We use a learning rate of 5e-6 and a batch size of 64. All methods are trained with a softmax cross-entropy loss with in-batch negatives and one hard negative (a sketch is given at the end of this subsection). Note that we have not used the auxiliary classification objective for MTBERT and MADRAL since no significant improvements are achieved.
Metrics. We report recall@100, recall@500, and ndcg@50. When calculating ndcg on MA-Amazon, following [19], we set the gains of E, S, C, and I to 1.0, 0.1, 0.01, and 0.0, respectively. We perform two-tailed t-tests (p-value ≤ 0.05) to test for significant differences.
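For concreteness, the following is a minimal sketch of the fine-tuning objective described above: a softmax cross-entropy loss over in-batch negatives plus one hard negative per query, in the style of Tevatron dual-encoder training. The CLS pooling, dot-product scoring, and tensor names are our assumptions rather than the exact released setup.

```python
# Sketch (assumed): dual-encoder fine-tuning loss with in-batch negatives
# and one hard negative per query.
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb, pos_emb, hard_neg_emb):
    """q_emb: (B, d) query embeddings; pos_emb: (B, d) positive item embeddings;
    hard_neg_emb: (B, d) one mined hard negative per query."""
    item_emb = torch.cat([pos_emb, hard_neg_emb], dim=0)   # (2B, d) candidate items
    scores = q_emb @ item_emb.t()                          # (B, 2B) dot-product scores
    # The positive for query b is item b; the other in-batch positives and all
    # hard negatives act as negatives in the softmax.
    target = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, target)

# Toy usage with random embeddings (batch size 4, dimension 8):
q, p, n = torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 8)
loss = contrastive_loss(q, p, n)
```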

5 EXPERIMENTAL RESULTS
5.1 Main Results
The overall performance is shown in Table 1.

Table 1: Overall performance. The best results are in bold. † indicates significant differences between ATTEMPT and the best baselines in the first/second/third group.

Method         MA-Amazon                        Alipay
               r@100    r@500    ndcg@50        r@100    r@500    ndcg@50
BIBERT         0.6075   0.7795   0.3929         0.4464   0.6284   0.2033
Condenser      0.6091†  0.7801†  0.3960†        0.4520†  0.6423†  0.2072†
MTBERT         0.6137†  0.7852†  0.3969†        0.4498   0.6280   0.2064
MADRAL         0.6088   0.7815   0.3950         0.4506†  0.6383†  0.2057†
BIBERT-C       0.6137   0.7814   0.4005         0.4517   0.6291   0.2103
BIBERT-C(A)    0.6137   0.7841   0.4019         0.4611   0.6432†  0.2091
MTBERT-C       0.6142   0.7839   0.3997         0.4391   0.6189   0.2026
MADRAL-C       0.6169†  0.7850†  0.4041†        0.4376   0.6141   0.2044
ATTEMPT        0.6233   0.7924   0.4097         0.4667   0.6592   0.2113


We have the following observations: (1) Generally, methods using aspect information outperform those that do not, confirming the importance of aspects in relevance matching. Notably, MADRAL performs worse than MTBERT on MA-Amazon, possibly because the pre-training data is insufficient to learn the aspect embeddings from scratch. (2) Models treating aspect information as text strings (BIBERT-C/-C(A)) surpass those considering aspect values as discrete classes (MTBERT and MADRAL). When the aspect text has a larger mask ratio than the content (in BIBERT-C(A)), the retrieval performance is boosted. This shows that the aspect text should be given special care to encourage sufficient learning. (3) When aspect text concatenation is incorporated into methods using aspect values for classification (MTBERT and MADRAL), the retrieval performance does not always improve. This could be because the input aspect text becomes a shortcut for the models to predict its corresponding class ID. When the pre-training data is large (e.g., on Alipay), such a relation is more likely to be grasped by the models, deterring the learning of beneficial interactions. (4) More powerful pre-training methods (e.g., Condenser) sometimes perform better than methods using aspects (MTBERT and MADRAL on Alipay). Note that the benefit from advanced pre-training techniques is orthogonal to the aspect information, and they can be combined for even better performance; we leave this study to future work. (5) Overall, our ATTEMPT achieves the best performance on both datasets, showing the efficacy of its pre-training objective specifically designed for the concatenated text of aspect and content.

5.2 Further Analysis
We also probe ATTEMPT from various perspectives to verify its effectiveness. For reproducibility, our analysis is based on MA-Amazon. The only exception is the ablation study of query/item aspects, since only Alipay has both of them.

Table 2: Study of various component choices on MA-Amazon. † indicates significant improvements over BIBERT.

                       r@100    r@500    ndcg@50
BIBERT                 0.6075   0.7795   0.3929
ATTEMPT                0.6233†  0.7924†  0.4097†
only brand             0.5977   0.7710   0.3859
only color             0.5867   0.7626   0.3773
only cate1-4           0.6212†  0.7893†  0.4050†
brand+color+cate1      0.6192†  0.7863†  0.4040†
brand+color+cate1-2    0.6199†  0.7898†  0.4073†
brand+color+cate1-3    0.6223†  0.7910†  0.4092†
ATTEMPT −L_c2a         0.6211†  0.7882†  0.4068†
ATTEMPT −L_a2c         0.6127†  0.7846†  0.3997†
ATTEMPT −L_mlm         0.6145†  0.7851†  0.4013†
BIBERT+AGREE           0.6246†  0.7913†  0.4112†
ATTEMPT+AGREE          0.6393†  0.8019†  0.4257†

Ablation Study of Aspects. We study the effects of various aspects in ATTEMPT (brand, color, and category from level 1 to 4) in Table 2. We find that: (1) When using each aspect alone, only the category information enhances model performance. This might be because brand and color are often included in the item content already, while the category is extra meta information. The observation that category matters the most is consistent with [12]. (2) Combining all aspects outperforms using category only, indicating that brand and color may take better effect when interacting with the category. (3) More levels of category information lead to better performance, except that three and four levels have similar results. While adding more category levels provides richer information, the reduced coverage (refer to Section 4.1) might limit the benefits.
Ablation Study of Loss Function. We remove each of the three losses from the overall loss to see how important it is. In Table 2, we find that: (1) The bidirectional prediction losses are beneficial to ATTEMPT, and excluding either leads to a performance drop. (2) The aspect-to-content (a2c) prediction is the most helpful, indicating that using aspects as context for content MLM prediction is a feasible way to infuse the aspect information into an item. (3) The performance also drops considerably when the vanilla MLM loss is eliminated, indicating that the original content semantics, unaffected by external information, are also important.
Combination with Advanced Fine-tuning Techniques. AGREE [23] is a recently proposed fine-tuning method that incorporates a local aspect-query matching loss with the original global query-item matching loss. AGREE has not studied how to utilize query aspects, which suits MA-Amazon well since it does not have query aspects. Since AGREE concatenates the item aspects with the content, it is easy to integrate AGREE during fine-tuning after pre-training with ATTEMPT. The last block in Table 2 shows the performance of AGREE alone and of combining both. It shows that, based on better fine-tuning techniques, ATTEMPT can achieve better performance. Notably, combining AGREE with methods that conduct aspect classification will not necessarily lead to better performance (see MTBERT-C and MADRAL-C in Table 1).

Table 3: Ablation study of query/item aspects on Alipay. † indicates significant improvements over BIBERT.

                   r@100    r@500    ndcg@50
BIBERT             0.4464   0.6284   0.2033
ATTEMPT            0.4667†  0.6592†  0.2113†
ATTEMPT only_d     0.4563†  0.6437†  0.2105†
ATTEMPT only_q     0.4526   0.6366†  0.2059

Ablation Study of Query/Item Aspects. We examine the influence of the query and item aspects in Table 3. It shows that both query aspects and item aspects contribute to retrieval performance, and the item aspects are more important. Since we only use item aspects during relevance matching, query aspects only take effect during pre-training and could have fewer contributions.

6 CONCLUSION
In this paper, we propose an effective pre-training method that uses aspects as text strings and conducts mutual prediction between the aspect and content text for multi-aspect retrieval. In contrast to previous approaches that treat aspect values as categorical IDs, ATTEMPT can capture the semantic relations between aspects through their text strings and perform finer-grained interactions between item aspect and content by mutual prediction. Our experiments on two real-world datasets show that ATTEMPT can outperform multiple competitive baselines significantly. Moreover, we release our enriched Multi-Aspect Amazon Product Search dataset to encourage research on multi-aspect dense retrieval.

ACKNOWLEDGMENTS
This work was funded by the National Natural Science Foundation of China (NSFC) under Grant No. 61902381, the Youth Innovation Promotion Association CAS under Grant No. 2021100, the projects under Grants No. JCKY2022130C039 and 2021QY1701, and the Lenovo-CAS Joint Lab Youth Scientist Project. This work was also supported by Ant Group through the Ant Innovative Research Program.


REFERENCES
[1] Qingyao Ai, Vahid Azizi, Xu Chen, and Yongfeng Zhang. 2018. Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation. Algorithms 11, 9 (2018), 137. https://doi.org/10.3390/a11090137
[2] Qingyao Ai, Yongfeng Zhang, Keping Bi, and W. Bruce Croft. 2020. Explainable Product Search with a Dynamic Relation Embedding Model. ACM Trans. Inf. Syst. 38, 1 (2020), 4:1–4:29. https://doi.org/10.1145/3361738
[3] Saeid Balaneshinkordan, Alexander Kotov, and Fedor Nikolaev. 2018. Attentive Neural Architecture for Ad-hoc Structured Document Retrieval. In CIKM 2018. ACM, 1173–1182. https://doi.org/10.1145/3269206.3271801
[4] Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2020. Pre-training Tasks for Embedding-based Large-scale Retrieval. In ICLR 2020. https://openreview.net/forum?id=rkg-mA4FDr
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805 (2018). http://arxiv.org/abs/1810.04805
[6] Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, and Jiafeng Guo. 2022. Pre-training Methods in Information Retrieval. Found. Trends Inf. Retr. 16, 3 (2022), 178–317. https://doi.org/10.1561/1500000100
[7] Luyu Gao and Jamie Callan. 2021. Condenser: A Pre-training Architecture for Dense Retrieval. In EMNLP 2021. Association for Computational Linguistics, 981–993. https://doi.org/10.18653/v1/2021.emnlp-main.75
[8] Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan. 2022. Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval. CoRR abs/2203.05765 (2022). https://doi.org/10.48550/arXiv.2203.05765
[9] Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. 2021. Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling. In SIGIR 2021. ACM, 113–122. https://doi.org/10.1145/3404835.3462891
[10] Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2019. Real-time Inference in Multi-sentence Tasks with Deep Pretrained Transformers. CoRR abs/1905.01969 (2019). http://arxiv.org/abs/1905.01969
[11] Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In SIGIR 2020. ACM, 39–48. https://doi.org/10.1145/3397271.3401075
[12] Weize Kong, Swaraj Khadanga, Cheng Li, Shaleen Kumar Gupta, Mingyang Zhang, Wensong Xu, and Michael Bendersky. 2022. Multi-Aspect Dense Retrieval. In KDD 2022. ACM, 3178–3186. https://doi.org/10.1145/3534678.3539137
[13] Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent Retrieval for Weakly Supervised Open Domain Question Answering. In ACL 2019. Association for Computational Linguistics, 6086–6096. https://doi.org/10.18653/v1/p19-1612
[14] Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. 2021. Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool Publishers. https://doi.org/10.2200/S01123ED1V01Y202108HLT053
[15] Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, and Arnold Overwijk. 2021. Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder. In EMNLP 2021. Association for Computational Linguistics, 2780–2791. https://doi.org/10.18653/v1/2021.emnlp-main.220
[16] Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, and Xueqi Cheng. 2022. Pre-train a Discriminative Text Encoder for Dense Retrieval via Contrastive Span Prediction. In SIGIR 2022. ACM, 848–858. https://doi.org/10.1145/3477495.3531772
[17] Xinyu Ma, Ruqing Zhang, Jiafeng Guo, Yixing Fan, and Xueqi Cheng. 2022. A Contrastive Pre-training Approach to Discriminative Autoencoder for Dense Retrieval. In CIKM 2022. ACM, 4314–4318. https://doi.org/10.1145/3511808.3557527
[18] Zhen Qin, Zhongliang Li, Michael Bendersky, and Donald Metzler. 2020. Matching Cross Network for Learning to Rank in Personal Search. In WWW 2020. ACM / IW3C2, 2835–2841. https://doi.org/10.1145/3366423.3380046
[19] Chandan K. Reddy, Lluís Màrquez, Fran Valero, Nikhil Rao, Hugo Zaragoza, Sambaran Bandyopadhyay, Arnab Biswas, Anlu Xing, and Karthik Subbian. 2022. Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search. CoRR abs/2206.06588 (2022). https://doi.org/10.48550/arXiv.2206.06588
[20] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In EMNLP-IJCNLP 2019. Association for Computational Linguistics, 3980–3990. https://doi.org/10.18653/v1/D19-1410
[21] Ridho Reinanda, Edgar Meij, and Maarten de Rijke. 2015. Mining, Ranking and Recommending Entity Aspects. In SIGIR 2015. ACM, 263–272. https://doi.org/10.1145/2766462.2767724
[22] Stephen E. Robertson, Hugo Zaragoza, and Michael J. Taylor. 2004. Simple BM25 Extension to Multiple Weighted Fields. In CIKM 2004. ACM, 42–49. https://doi.org/10.1145/1031171.1031181
[23] Hongyu Shan, Qishen Zhang, Zhongyi Liu, Guannan Zhang, and Chenliang Li. 2023. Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval. In Proceedings of the ACM Web Conference 2023. 3173–3181.
[24] Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. CoRR abs/1904.09223 (2019). http://arxiv.org/abs/1904.09223
[25] Shitao Xiao, Zheng Liu, Yingxia Shao, and Zhao Cao. 2022. RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder. In EMNLP 2022. Association for Computational Linguistics, 538–548. https://aclanthology.org/2022.emnlp-main.35
[26] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In ICLR 2021. https://openreview.net/forum?id=zeFrfgyZln
[27] Hamed Zamani, Bhaskar Mitra, Xia Song, Nick Craswell, and Saurabh Tiwary. 2018. Neural Ranking Models with Multiple Document Fields. In WSDM 2018. ACM, 700–708. https://doi.org/10.1145/3159652.3159730
[28] Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, and Shaoping Ma. 2021. Optimizing Dense Retrieval Model Training with Hard Negatives. In SIGIR 2021. ACM, 1503–1512. https://doi.org/10.1145/3404835.3462880
[29] Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, and Shaoping Ma. 2020. RepBERT: Contextualized Text Embeddings for First-Stage Retrieval. CoRR abs/2006.15498 (2020). https://arxiv.org/abs/2006.15498
[30] Hongchun Zhang, Tianyi Wang, Xiaonan Meng, and Yi Hu. 2019. Improving Semantic Matching via Multi-Task Learning in E-Commerce. In eCom@SIGIR 2019 (CEUR Workshop Proceedings, Vol. 2410). CEUR-WS.org. http://ceur-ws.org/Vol-2410/paper2.pdf
