
Efficient Conversion Prediction in E-Commerce Applications with Unsupervised Learning

Péter Szabó and Béla Genge
Department of Electrical Engineering and Information Technology
University of Medicine, Pharmacy, Science and Technology of Târgu Mureș
Târgu Mureș, Romania
Email: [email protected]

Abstract—Unsupervised machine learning became a ubiquitous method in e-commerce solutions that strive to provide personalized recommendations for their users. Most of those solutions embrace collaborative filtering (CF) to predict conversions, which are the beneficial user events, such as a purchase. Traditionally, the predictions were made based on rating data. However, e-commerce users seldom leave ratings. Instead, we must rely on user events, such as viewing an item or adding it to the cart. The event-based approach seems counter-intuitive, because the operation time of recommender systems increases exponentially with the number of data points. One of the main contributions of this paper is the UX value function. It reduces all events between an item and a user to a single user experience number, which also depends on the sequentiality of the events. We present a method to calculate this number in linear time. Then we use a deep neural network to predict the likelihood of conversions based on this number, to prove the practical solvability of the problem in a scalable manner, with a relatively fast learning speed and good prediction accuracy. We have conducted an extensive experimental analysis on Kechinov's 'eCommerce Events History in Cosmetics Shop' dataset, containing 8,738,120 user events. The results of those experiments prove the efficiency and applicability of the developed approach.

Index Terms—machine learning, e-commerce, conversion, prediction, recommender, collaborative filtering

I. INTRODUCTION

In recent years a new research topic emerged: the use of unsupervised machine learning to create recommendations and improve the user experience (UX) [1]. Quantifying UX is a difficult task, and often leads to inaccurate results [2]–[4]. On the other hand, it is possible to accurately record the user events that happen during the entire interaction between the user and the application. Examples of such events include viewing an item (a product selected by the user), adding or removing the product from the cart, or purchasing an item.

Many researchers considered collaborative filtering (CF) to be the best approach for recommender systems [5]–[7]. CF represents a collection of algorithms that filter items a user might like based on reactions from other users. Traditionally, CF algorithms use the existing ratings to predict the preferences of the users [8]. However, e-commerce users seldom leave ratings, yet they require shopping recommendations tailored to their needs and expectations. To address this, we present a methodology that relies on user events instead of ratings. Our event-based approach has led to a significant increase in the number of data points, which seems counter-intuitive, because the operation time of recommender engines increases exponentially with the number of data points [9]. Our main contribution is a solution reducing all events between an item and a user (the event chain) to a single UX value in the [0, 1) range. This number also depends on the sequentiality of the events. After determining the UX value for all event chains, we store them in a sparse matrix. This matrix can be generated from any event dataset in linear time. Using a random split of the sparse matrix, we trained a deep neural network to predict unknown (null) values. We have implemented a machine learning model featuring the Adam optimizer and dropout regularization.

The machine learning model was introduced to prove the practical solvability of the problem in a scalable manner, with a relatively fast learning speed and good prediction accuracy. To do so, we applied this model to Kechinov's 'eCommerce Events History in Cosmetics Shop' dataset [10] of over eight million events.

We have structured the remainder of this paper as follows: After an overview of related works in Section II, we present the developed approach in Section III, followed by Section IV, which documents the numerical evaluation of the developed approach, applied to a real-world dataset. We conclude the paper with Section V.

II. RELATED WORKS

Recently, numerous researchers adopted user events to create better experiences, usually in the form of recommendations. To this end, we mention the work of Vijayakumar et al. [11], which used a heat map of visited travel locations to create travel recommendations for tourists. Next, Szabo et al. used machine learning and behavioral analysis for user-tailored viewer experience [12]. Deng et al. proposed a unified framework of representation learning and matching function learning to pair users with items, which can also be applied to event data [13].
CF is computationally intensive, especially on large datasets. Several solutions were suggested to improve CF. The typicality-based CF determines the neighbors from user groups based on their typicality degree [5]. Another example is the demographic content-based collaborative recommendation system framework. This is a three-step process, which starts with K-means clustering based on the user's demographic information. Then, it predicts the rating using a hybrid of Pearson correlation similarity and cosine similarity. Finally, it gives recommendations using content-based CF [6]. For real-world use-cases, a weighted average of the typicality-based CF and demographic-based CF can be used to produce the best recommendation result for the user [5]. In 2019, a critical analysis of 131 CF-related articles from 36 journals concluded that recommendation systems require further research [14].

Traditional CF suffers from the 'cold-start problem,' meaning that new items do not get recommended until someone reviews them [15]. For rating-based recommenders, the cold-start problem can mean weeks or, in certain cases, months of delay. Recently this has been identified as a research topic worth exploring, and we find several suggestions on how to alleviate these issues [16]–[18]. Castillejo et al. suggest using data from the users' social network in a CF recommender system [19] to tackle the cold-start problem. Contrary to those studies, the approach developed in this paper addresses the cold-start problem by relying on user events. User events are logged as soon as the items are added to an e-commerce platform, or a new user visits it for the first time; therefore the cold start has almost zero impact on UX.

Compared to the aforementioned studies, the methodology documented in this paper distinguishes itself by tackling scalability via a new data reduction approach. Overall, it is a faster and more accurate prediction methodology, as demonstrated in Section IV.

III. DEVELOPED APPROACH

The main goal of the developed approach is to predict the probability of a conversion type event happening for an item unknown to the user. To this end, a conversion is defined as an event directly beneficial for both the user and the platform. A purchase event is the most common type of conversion, and the only conversion event discussed in this paper. But other conversion events are also possible, such as signing up for a newsletter or depositing money in a digital wallet.

We summarise the developed approach as the subsequent execution of the following main steps:
1) Calculate the likelihood of conversion for all event types.
2) Compute the User Experience (UX) value for all user-item event chains and store it as a user-item sparse matrix.
3) Train a neural network with the Adam optimizer to minimize the mean squared error.
Then, the resulting machine learning model is used to predict the likelihood of purchase for all unknown user-item pairs.

In order to formally define the approach, we first introduce a few notations. Let $E$ denote the ordered set of all user events (the dataset), and $e \in E$ an element of $E$. Let $e_i$ denote the $i$-th user event in $E$. It is assumed that, given two user events $e_i$ and $e_j$, if $i < j$, then user event $e_i$ precedes user event $e_j$ in time.

Next, we assume that events can be categorized according to the user input. For example, if the user clicks on an item in a category listing to see the item in detail, this is called a 'view' event. An event category is, hereinafter, called an event type. Let $T$ denote the set of event types found in $E$, and let the event type of a user event $e_i$ be $t_i$, $\forall t_i \in T$. Let $c \in E$ denote a conversion event. The event type of $c$ is denoted by $t_c$, such that $t_c \in T$.

A. Calculate the Probability of Conversion

Let $P(t_i)$ be the probability of the event type in the complete sample space, and $P(t_i \cap t_c)$ the probability of the expected favorable outcome. We assume that the probability of conversion depends on the $t_i$ event, that is, $\forall t_i\, (t_i \in T \implies P(t_c) \neq P(t_c \mid t_i))$. The computation of the probability of the conversion event type ($t_c$) given event type $t_i$ is therefore defined as:

$$P(t_c \mid t_i) = \begin{cases} 0, & \text{if } P(t_i) = 0 \\ \dfrac{P(t_i \cap t_c)}{P(t_i)}, & \text{if } P(t_i) \in (0, 1] \end{cases} \qquad (1)$$

B. Compute the UX Value for an Event Chain

All events between a specific user and a specific item form the UX event chain, where events are ordered based on their recency (timestamp) such that the first element is the oldest event, and the last element is the most recent event. The UX value function transforms the last element of the UX event chain into a scalar value.

Let $u$ denote a specific user, and $o$ a specific item, the object of the interaction between the human agent and the software. Let $E^{u,o} \subseteq E$ denote an event chain associated to user $u$ and specific to item $o$. Then, let $e^{u,o}_i \in E^{u,o}$ be the $i$-th element of $E^{u,o}$. According to these definitions, the members of the $E^{u,o}$ event chain differ only in event type. As a result, the UX function is defined as:

$$UX(e^{u,o}_i) = \begin{cases} 0, & \text{if } i = 0 \\ \tanh\!\left(UX(e^{u,o}_{i-1}) + P(t_c \mid t_i)\right), & \text{if } i > 0 \end{cases} \qquad (2)$$

In the above equation, the hyperbolic tangent is used to assure that the function's return value is always in the [0, 1) range: since its argument is a sum of non-negative terms (a UX value and a probability), $\tanh$ maps it into [0, 1). Therefore it acts as the UX value's normalizer. Lastly, we define the $Y_{u,o}$ sparse matrix to contain the UX value of the last $e^{u,o}$ event, as defined in the following equation:

$$Y_{u,o} \Leftarrow UX(e^{u,o}_{\max(i)}) \qquad (3)$$

Accordingly, $Y_{u,o}$ contains the UX value of the last $e^{u,o}$ event. The main benefit of this approach is that it has a computational complexity of $O(|E|)$, where $|E|$ is the number of elements in set $E$; in other words, it has a linear computation time.
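To make the two steps concrete, the following Python sketch derives the conversion probabilities of Eq. (1) from event counts and folds each user-item event chain into a UX value per Eq. (2)–(3). This is an illustration, not the authors' published code; in particular, the paper does not spell out how $P(t_i \cap t_c)$ is counted, so here we take it to be the share of events of type $t_i$ whose user-item chain eventually contains a purchase, which is an assumption.

```python
import math
from collections import Counter, defaultdict

# Toy event log: (user_id, product_id, event_type), already sorted by event_time.
events = [
    (1, 10, "view"), (1, 10, "cart"), (1, 10, "purchase"),
    (2, 10, "view"), (2, 11, "view"), (2, 11, "cart"),
]
CONVERSION = "purchase"

# Eq. (1): estimate P(conversion | event type) from counts.
# P(t_i) ~ share of events of type t_i; P(t_i ∩ t_c) ~ share of events of
# type t_i whose user-item chain eventually contains a conversion (assumed).
type_counts = Counter(t for _, _, t in events)
converted_chains = {(u, o) for u, o, t in events if t == CONVERSION}
joint_counts = Counter(t for u, o, t in events if (u, o) in converted_chains)
n = len(events)
p_conv = {
    t: (joint_counts[t] / n) / (type_counts[t] / n) if type_counts[t] else 0.0
    for t in type_counts
}

# Eq. (2)-(3): fold every chain through tanh; only the final value is kept.
ux = defaultdict(float)   # sparse: missing (user, item) pairs stay 0
for u, o, t in events:    # single pass over the dataset -> O(|E|) time
    ux[(u, o)] = math.tanh(ux[(u, o)] + p_conv[t])

print(ux[(1, 10)], ux[(2, 11)])
```

Because the fold visits each event exactly once, the linear-time claim above is visible directly in the loop structure.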
Another benefit of the approach is that it yields a sparse user × item matrix, most of the values being zero, which can be constructed and stored efficiently.

C. Three-way Data Split

In the next step of the developed approach, we split the $Y$ sparse matrix into three matrices with the same dimensions as $Y$, using a three-way data split.

For colossal datasets, generally, holdout validation is used [20], [21]. We used a special case of holdout validation, the three-way data split. With this approach, only the final, trained model is evaluated using the test set, and the validation set is used during hyperparameter optimization only.

Let $p_{\text{train}}$, $p_{\text{val}}$, and $p_{\text{test}}$ be the probabilities of an event chain being in the $Y_{\text{train}}$, $Y_{\text{val}}$, and $Y_{\text{test}}$ matrices, respectively. We apply the three-way data split using the following equation:

$$p_{\text{train}} + p_{\text{val}} + p_{\text{test}} = 1, \qquad p_{\text{train}} Y_{\text{train}} + p_{\text{val}} Y_{\text{val}} + p_{\text{test}} Y_{\text{test}} = Y \qquad (4)$$

D. A Neural Network With Adam Optimizer

In the final step of the developed approach, we train a neural network in order to predict the likelihood of conversion for the user-item pairs with unknown UX values.

Non-negative matrix factorization is NP-hard, as proven by Vavasis [22]. Therefore, it is unlikely that there is an exact algorithm that runs in polynomial time. Algorithms that run in exponential time cannot reasonably be considered for real-world applications due to the large dimensions of the datasets. Fortunately, it is not critical to get perfect user-item recommendations for an e-commerce site. Instead, it is enough to get recommendations with a high probability, if the solution scales well to a large number of users and items. Therefore, within the developed approach, we used a gradient descent-based optimization of an objective function.

To solve the problem of predicting the missing values in the sparse matrix, we define the objective function to be minimized, known as the loss function. The data preparation procedure described in the previous section ensures that there are no considerable outliers in the data set. The main reason why we did not consider mean absolute error (MAE) as the loss function is that user event data is bound to have small perturbations. MAE is not as stable as mean squared error (MSE); therefore, a small perturbation will have a more significant effect on MAE than on MSE.

Let $Y$ be the ground truth, $\hat{Y}$ the prediction of the algorithm about the ground truth, and $q$ the number of predicted data points. The loss function we used is the MSE calculated on $q$ data points, known as the mean squared prediction error (MSPE), defined in the following equation:

$$MSPE(\hat{Y}, Y) = \frac{1}{q} \sum_{a=n+1}^{n+q} (Y_a - \hat{Y}_a)^2 \qquad (5)$$

Next, we proceed with the definition of the neural network training algorithm. Let $backpropagation()$ be a function for the backward propagation of errors [23], and let $Adam(lr, \beta_1, \beta_2)$ be the Adam optimizer function used to minimize the loss function. The Adam optimizer, as introduced by Kingma et al. [24], assumes the following hyperparameters: let $lr \in (0, 1)$ be the learning rate, and $\beta_1, \beta_2 \in [0, 1)$ the decay rates for the moment estimates. To minimize the loss function, the Adam optimizer combines the best properties of the AdaGrad and RMSProp algorithms, and it was demonstrated that it can handle sparse gradients on noisy problems [24].

To be able to train the algorithm, let $epochs \in \mathbb{N}_{>0}$ denote the number of times the entire dataset $Y_{\text{train}}$ is passed both forward and backward through the neural network. Let $batch\_size \in \mathbb{N}_{>0}$ denote the number of samples evaluated before the model's internal parameters are updated.

When we want to prevent over-fitting for Adam, L2 regularization is not effective, as demonstrated by Loshchilov et al. [25]. To prevent over-fitting we could use decoupled weight decay regularization [25] or dropout [26]. Weight decay penalizes large weights, forcing all weights to be close to 0. Due to the relatively high number of training examples and parameters, dropout regularization is a better option. Dropout means dropping units and their connections from the neural network during training, with probability $P_0$, to prevent co-adapting [27]. Let $D()$ be a dropout regularization function with probability $P_0 \in [0, 1)$ of eliminating units and their connections from the neural network during training [26].

Our algorithm will run until $Y_{\text{train}}$ is passed forward and backward through the neural network $epochs$ times. During each pass, $Y_{\text{train}} / batch\_size$ batches are taken from $Y_{\text{train}}$.

Let $Embedding(x, y)$ be a lookup function that retrieves embeddings, where $x$ is the size of the dictionary of embeddings and $y$ is the size of each embedding vector. Let $n_f \in \mathbb{N}_{>0}$ be the number of factors. Let $user_f$ be the user factors, and $users$ the users found in the batch, so that $user_f \Leftarrow Embedding(users, n_f)$. Let $user_b$ be the user bias, so that $user_b \Leftarrow Embedding(users, 1)$. Moreover, let $item_f$ be the item factors, and $items$ the items found in the batch, so that $item_f \Leftarrow Embedding(items, n_f)$. Let $item_b$ be the item bias, so that $item_b \Leftarrow Embedding(items, 1)$.

In the developed algorithm, the loss is calculated using the MSPE function (Eq. (5)). Then, $backpropagation()$ achieves the backward propagation of errors. Finally, the Adam optimizer function is used to minimize the loss function. After each epoch, the validation loss is calculated and stored using $MSPE(\hat{Y}, Y_{\text{val}})$. A decreasing validation loss indicates that the algorithm is learning to predict the conversions. The resulting solution is summarized as Algorithm 1.
Algorithm 1 The training and validation algorithm
Require: Y_train, Y_val ∈ R^(users×items)   ▷ Sparse matrices
Require: lr ∈ (0, 1)                        ▷ Step size
Require: epochs ∈ N>0                       ▷ Passes through Y_train
Require: n_f ∈ N>0                          ▷ Number of factors
Require: batch_size ∈ N>0                   ▷ Examples per iteration
Require: β1, β2 ∈ [0, 1)                    ▷ Decay rates for moment estimates
Require: P0 ∈ [0, 1)                        ▷ Dropout probability
for i ← 1, epochs do
    for all batch ∈ Y_train / batch_size do
        user_f ⇐ Embedding(users, n_f)
        item_f ⇐ Embedding(items, n_f)
        user_b ⇐ Embedding(users, 1)
        item_b ⇐ Embedding(items, 1)
        Ŷ ⇐ Σ(D(user_f) × D(item_f)) + user_b + item_b
        loss ⇐ MSPE(Ŷ, Y_train)
        backpropagation(loss)
        Adam(lr, β1, β2)
    loss_val ⇐ MSPE(Ŷ, Y_val)

Fig. 1. Data preparation procedure used in the experiment. (Pipeline recovered from the figure: 'eCommerce Events History in Cosmetics Shop' dataset, 8,738,120 events (rows) × 9 columns → drop users with < 5 events, keep columns event_type, product_id, user_id, sort by event_time → 7,780,863 events (rows) × 3 columns → calculate the likelihood of conversion for all event types (ux_constants) → apply the UX value function to all user-item event chains → sparse matrix (shape: 177,592 × 44,780) → three-way data split, 70%-15%-15% train-test-validation → prepared data, ready for machine learning.)

Fig. 2. Logged user events in the dataset (percentage and number): viewed the item, 45.07% (3,938,296 events); added to cart, 29.12% (2,544,192 events); removed from cart, 19.31% (1,687,591 events); purchased, 6.50% (568,041 events); all events, 100% (8,738,120 events).

IV. EXPERIMENTAL ASSESSMENT

A. Implementation Details

We have used Python 3.7.7 to implement the solution described in the previous section. The Jupyter Notebooks and the outputs can be found in the project's GitHub repository: https://github.com/WSzP/uxml-ecommerce.

For the implementation of machine learning algorithms, we used the PyTorch open source library [28] (version 1.5) and the PyTorch Lightning lightweight wrapper [29] (version 0.7.4) to achieve transparency, understandability, and reproducibility.

All measurements were made on a desktop system with the following configuration: Intel Core i9-9900K CPU @ 3.60GHz; 64 GiB memory; NVIDIA GeForce RTX 2080 Ti GPU with 11 GiB dedicated memory and 7.5 compute capability [30].

The data preparation used for the experimental assessment in this paper is depicted in Fig. 1, and it is detailed in the following subsections. After the data preparation, evaluation metrics are introduced, which are then used to test the data reduction against the ground truth, and later to test the efficiency of the predictions. Then, Algorithm 1 is applied to predict conversions.
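As an illustration of how Algorithm 1 maps onto PyTorch, the following minimal sketch trains a biased matrix factorization model with dropout, Adam, and an MSE loss over the observed entries (which corresponds to the MSPE of Eq. (5)). The class name, toy tensors, and the random 70%-15%-15% split of observed entries are our assumptions; the repository code may differ in structure and uses PyTorch Lightning.

```python
import torch
from torch import nn

class UXFactorization(nn.Module):
    """Biased matrix factorization with dropout, mirroring Algorithm 1."""
    def __init__(self, n_users, n_items, n_f=20, p0=0.2):
        super().__init__()
        self.user_f = nn.Embedding(n_users, n_f)  # user factors
        self.item_f = nn.Embedding(n_items, n_f)  # item factors
        self.user_b = nn.Embedding(n_users, 1)    # user bias
        self.item_b = nn.Embedding(n_items, 1)    # item bias
        self.drop = nn.Dropout(p0)                # dropout regularization D()

    def forward(self, users, items):
        # Ŷ = Σ(D(user_f) · D(item_f)) + user_b + item_b
        dot = (self.drop(self.user_f(users)) * self.drop(self.item_f(items))).sum(1)
        return dot + self.user_b(users).squeeze(1) + self.item_b(items).squeeze(1)

# Toy observed entries of the sparse Y matrix: (user, item) -> UX value.
users = torch.randint(0, 100, (1000,))
items = torch.randint(0, 50, (1000,))
y = torch.rand(1000)

# Three-way split of the observed entries (Eq. (4)), here 70%-15%-15%.
perm = torch.randperm(1000)
tr, va = perm[:700], perm[700:850]   # the test set would be perm[850:]

model = UXFactorization(100, 50)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
mspe = nn.MSELoss()                  # MSE over the predicted points = MSPE

for epoch in range(10):
    model.train()
    for batch in tr.split(256):      # mini-batches taken from Y_train
        opt.zero_grad()
        loss = mspe(model(users[batch], items[batch]), y[batch])
        loss.backward()              # backpropagation()
        opt.step()                   # Adam(lr, beta1, beta2) update
    model.eval()
    with torch.no_grad():            # validation loss after each epoch
        val_loss = mspe(model(users[va], items[va]), y[va])
```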
B. The Dataset and Parameter Selection

To validate the methodology documented in this paper, a user-event dataset of considerable size was needed. In December 2019, Kechinov published the 'eCommerce Events History in Cosmetics Shop' dataset [10]. All the results presented in this paper are based on that user data, containing user events from October and November 2019. The dataset has the shape of 8,738,120 rows and 9 columns. The distribution of events is shown in Fig. 2.

The tuned hyperparameters used for training were batch size: 1024; learning rate: 0.001; β1: 0.9; β2: 0.999; n_f: 20; dropout P0: 0.2. We found that increasing β1, β2, or decreasing the learning rate does not improve prediction accuracy, and also leads to a slower convergence. For n_f, Google recommends the fourth root of the number of categories as a 'general rule of thumb' [31], which is 20 (rounded down). We evaluated other integer values, and indeed 20 proved to be the best. When it comes to determining the batch size for GPU-based training, the first approach is to try larger batch sizes, because that allows faster training due to the parallelism of the GPU. However, increasingly large batch sizes have a negative effect on generalization, resulting in worse validation and test error. Please note that the 1024 batch size is only indicative of GPU training on devices similar to the test device.

For dropout, on the observed dataset the F1 results kept increasing until P0 = 0.1, and only a slight decrease was observed at 0.2. We decided to use 0.2 because it had a considerably better MSPE.

C. Data Reduction

Data reduction is often used to solve the scalability problem of real-world recommender engines [14], [32], [33]. This is a crucial problem for e-commerce sites with a large and rapidly growing user base and a wide variety of products.

It is possible to drop all users who had a low number of events. The recommendation accuracy for those users would be very close to random due to the low amount of data. We wanted to create personalized experiences for users with at least five events; until a user dispatches five events, no personalization is attempted. Five events can happen even after a minute of using an e-commerce platform, so this has no significant drawback for user experience, but it leads to a significant reduction, dropping 75.1% of the users, as shown in Fig. 3.
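A sketch of this reduction step is shown below, assuming the dataset's column schema from Fig. 1 (event_time, event_type, product_id, user_id); the CSV filename is a placeholder.

```python
import pandas as pd

# Load the raw event log ("events.csv" is a placeholder filename).
df = pd.read_csv("events.csv")

# Keep only the columns needed for UX values, sorted chronologically.
df = df.sort_values("event_time")[["event_type", "product_id", "user_id"]]

# Drop users with fewer than 5 events: their recommendations would be
# close to random, and removing them shrinks the user dimension sharply.
counts = df["user_id"].value_counts()
df = df[df["user_id"].isin(counts[counts >= 5].index)]
```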
Fig. 3. Users categorised by the number of events they dispatched (number of users plotted against events per user, from 1 to 5+).

Fig. 4. MSPE loss change (train and validation) while training Algorithm 1 for 500 epochs (1,260,998 batches), plotted against the batch number.
Removing 75.1% of the users only leads to a 10.95% reduction in the number of events used. However, it leads to a 75.1% reduction in the user dimension of the sparse matrix, thus improving the execution speed of the solution significantly.

Using Eq. (1), we calculated the probability of conversion for each event type in the dataset, in other words, the likelihood of the event resulting in a purchase. For 'view' events this value is 0.053, for 'add to cart' events 0.195, and for 'remove from cart' events 0.042. Using these values, we applied the UX value function to all event chains, where the events were ordered based on their timestamp (as recorded). Finally, we applied the three-way data split.
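As a brief illustration of Eq. (2) with these constants (our arithmetic, not a figure reported in the paper), consider a chain consisting of a 'view' followed by an 'add to cart':

$$UX(e_1) = \tanh(0 + 0.053) \approx 0.053, \qquad UX(e_2) = \tanh(0.053 + 0.195) \approx 0.243,$$

so the value stored in the sparse matrix for that user-item pair is approximately 0.243.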
D. Testing the Data-reduction Against the Ground-truth

While in the implementation we calculated considerably more metrics (available in the repository), in this paper we present only the following metrics: root mean squared error (RMSE) and mean absolute error (MAE), calculated for q data points. The full test dataset precision was calculated by dividing the number of true positives (ground truth conversions predicted to be conversions) by the number of all positive results. Recall was calculated by dividing the number of true positives by the total of true positives and false negatives. Finally, F1 is the harmonic mean of precision and recall.

We analyzed the results of the data preparation against the ground truth to evaluate how accurately the UX event chain function represented the multitude of user events with a single value; in other words, how much information is lost due to the reduction.

Close to zero MAE (0.0036) and RMSE (0.0602) prove that the result of the data reduction can be used as a basis of training. Our assumption is reinforced by having only 0.3% false positives. Precision is 0.9827, recall 0.9930, while F1 = 0.987.

E. Accuracy and Performance Assessment

Applying the UX value function to all event chains using a single thread of the test CPU took approximately 141 s, including the userID and itemID change. If needed, the process could be optimized with multiprocessing. With multiprocessing (using 16 CPU cores), we achieved a 21.8 s runtime. Training Algorithm 1 for 100 epochs took 1518.01 s, while 500 epochs took 8348.41 s.

To be able to compare the efficiency of the proposed solution, we have reconstructed a baseline matrix factorization model by summarizing the findings of five research papers [34]–[38]. The model was adapted to the outputs of the data preparation (sparse matrices), and uses stochastic gradient descent (SGD) as published by L. Bottou [39]. It is an unbiased implementation with no regularization, the simplest form of SGD for the conversion prediction problem.

The results of training the model with the Adam optimizer for 100 and 500 epochs demonstrate that predicting purchase behaviours is possible in a reasonable time, using consumer-grade hardware. It is worth noting that the training could have been stopped after 100 epochs, as it started to produce diminishing returns, as plotted in Fig. 4. The resulting metrics are summarized in Table I.

TABLE I
EVALUATION OF ALGORITHM 1, COMPARED TO THE BASELINE.

             Baseline      Algorithm 1
             100 epochs    100 epochs    500 epochs
RMSE         1.144820      0.296800      0.261031
MAE          0.589327      0.193500      0.175807
Precision    0.136623      0.552044      0.596072
Recall       0.073204      0.295909      0.316774
F1           0.095329      0.385292      0.413696

Based on the RMSE and MAE values, we can claim that our solution predicts conversions with remarkable accuracy and an adequate precision and F1 score.

One of the reasons for the significantly better prediction accuracy is the use of the Adam optimizer with dropout regularization. Moreover, adding user bias and item bias also contributed to the better results. The data reduction due to the UX value function made the most significant contribution to the relatively fast learning time of 25 minutes on a consumer-grade GPU.
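For reference, the evaluation metrics above can be computed as in the following sketch. Treating a user-item pair as a predicted conversion when its value crosses a fixed threshold is our assumption; the paper does not state its thresholding rule, so the cutoff below is purely illustrative.

```python
import numpy as np

def evaluate(y_true, y_pred, threshold=0.5):
    """RMSE, MAE, and thresholded precision/recall/F1 (threshold assumed)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    t, p = y_true >= threshold, y_pred >= threshold
    tp = np.sum(t & p)                      # true positives
    precision = tp / max(np.sum(p), 1)      # tp / all predicted positives
    recall = tp / max(np.sum(t), 1)         # tp / (tp + false negatives)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return rmse, mae, precision, recall, f1

print(evaluate(np.array([0.9, 0.1, 0.8]), np.array([0.7, 0.2, 0.3])))
```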
V. CONCLUSIONS

The solution presented in this paper creates accurate predictions of what the users want to purchase, based on the events they dispatch while using an e-commerce application.

One of the main contributions of this paper is the UX value function, a method to reduce all events between a user-item pair to a single scalar value in the [0, 1) range, which also depends on the sequentiality of the events. Calculating this UX value for all events runs in linear time. The machine learning model was introduced to prove the practical solvability of the problem in a scalable manner, with a relatively fast learning speed and good prediction accuracy.

As future work, to apply this method to a significantly larger dataset, we intend to improve data reduction efficiency even further. Moreover, we also need to find a faster data loading mechanism. Recently a new variant of Adam, AMSGrad, was developed [40]–[42]. Unfortunately, preliminary evaluations suggest that it produces relatively similar, but slightly worse, results for conversion predictions.

REFERENCES

[1] A. Even, "Analytics: Turning data into management gold," Applied Marketing Analytics, vol. 4, no. 4, pp. 330–341, 2019.
[2] J. Sauro and J. R. Lewis, Quantifying the User Experience: Practical Statistics for User Research. Morgan Kaufmann, 2016.
[3] P. W. Szabo, User Experience Mapping. Packt Publishing, 2017.
[4] J. R. Lewis, "Measuring user experience with 3, 5, 7, or 11 points: Does it matter?" Human Factors, p. 0018720819881312, 2019.
[5] B. L. Velammal, "Typicality-based collaborative filtering for book recommendation," Expert Systems, vol. 36, no. 3, p. e12382, 2019.
[6] M. Patil and M. Rao, "Studying the contribution of machine learning and artificial intelligence in the interface design of e-commerce site," in Smart Intelligent Computing and Applications. Springer, 2019, pp. 197–206.
[7] M. V. R. Senthilkumar, C. R. Kiron, M. Vishnuvarthan et al., "A new approach to product recommendation systems," 2019.
[8] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, "Collaborative filtering recommender systems," in The Adaptive Web. Springer, 2007, pp. 291–324.
[9] J.-W. Choi, S.-K. Yun, and J.-B. Kim, "Improvement of data sparsity and scalability problems in collaborative filtering based recommendation systems," in International Conference on Applied Computing and Information Technology. Springer, 2019, pp. 17–31.
[10] M. Kechinov, "eCommerce events history in cosmetics shop - November 2019," Dec. 2019, Kaggle dataset, provided by the REES46 Marketing Platform. [Online]. Available: https://bit.ly/Kechinov
[11] V. Vijayakumar, S. Vairavasundaram, R. Logesh, and A. Sivapathi, "Effective knowledge based recommender system for tailored multiple point of interest recommendation," International Journal of Web Portals (IJWP), vol. 11, no. 1, pp. 1–18, 2019.
[12] P. W. Szabo and Z. L. Janosi, "Using machine learning and behavioural analysis for user-tailored viewer experience," in Proceedings of the 2019 IBC Show, ser. IBC2019, 2019.
[13] Z.-H. Deng et al., "DeepCF: A unified framework of representation learning and matching function learning in recommender system," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 61–68, Jul. 2019.
[14] M. K. Najafabadi, A. H. Mohamed, and M. N. Mahrin, "A survey on data mining techniques in recommender systems," Soft Computing, vol. 23, no. 2, pp. 627–654, 2019.
[15] Y. Shao and Y.-h. Xie, "Research on cold-start problem of collaborative filtering algorithm," in Proceedings of the 2019 3rd International Conference on Big Data Research, 2019, pp. 67–71.
[16] N. Silva et al., "The pure cold-start problem: A deep study about how to conquer first-time users in recommendations domains," Information Systems, vol. 80, pp. 1–12, 2019.
[17] Y. Zhu, J. Lin, S. He, B. Wang, Z. Guan, H. Liu, and D. Cai, "Addressing the item cold-start problem by attribute-driven active learning," IEEE Transactions on Knowledge and Data Engineering, 2019.
[18] S. Natarajan, S. Vairavasundaram, S. Natarajan, and A. H. Gandomi, "Resolving data sparsity and cold start problem in collaborative filtering recommender system using linked open data," Expert Systems with Applications, vol. 149, p. 113248, 2020.
[19] E. Castillejo, A. Almeida, and D. López-de-Ipiña, "Social network analysis applied to recommendation systems: Alleviating the cold-user problem," in International Conference on Ubiquitous Computing and Ambient Intelligence. Springer, 2012, pp. 306–313.
[20] S. Yadav and S. Shukla, "Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification," in 2016 IEEE 6th International Conference on Advanced Computing (IACC), Feb. 2016, pp. 78–83.
[21] A. K. Nandi and H. Ahmed, Classification Algorithm Validation. IEEE, 2019, pp. 307–319.
[22] S. A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2010.
[23] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Neural Networks for Perception. Elsevier, 1992, pp. 65–93.
[24] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[25] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[26] C. Wei, S. Kakade, and T. Ma, "The implicit and explicit regularization effects of dropout," 2020.
[27] N. Srivastava, G. Hinton, A. Krizhevsky et al., "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[28] A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems 32, H. Wallach et al., Eds. Curran Associates, Inc., 2019, pp. 8024–8035. [Online]. Available: https://bit.ly/pytorchcite
[29] W. Falcon et al., "PyTorch Lightning," https://bit.ly/pt-lightning, 2019.
[30] Nvidia, "CUDA C++ programming guide," Nvidia Developer Documentation, 2019. [Online]. Available: https://bit.ly/cudacpp
[31] Google, "Introducing TensorFlow feature columns," Google Developers Blog, 2017. [Online]. Available: https://bit.ly/tf-fc
[32] Z. Wang, X. Yu, N. Feng, and Z. Wang, "An improved collaborative movie recommendation system using computational intelligence," Journal of Visual Languages & Computing, vol. 25, no. 6, pp. 667–675, 2014.
[33] X. Zhao, "A study on e-commerce recommender system based on big data," in 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA). IEEE, 2019, pp. 222–226.
[34] H.-J. Xue, X. Dai, J. Zhang, S. Huang, and J. Chen, "Deep matrix factorization models for recommender systems," in IJCAI, 2017, pp. 3203–3209.
[35] Q.-Y. Hu, Z.-L. Zhao, C.-D. Wang, and J.-H. Lai, "An item orientated recommendation algorithm from the multi-view perspective," Neurocomputing, vol. 269, pp. 261–272, 2017.
[36] C.-D. Wang, Z.-H. Deng, J.-H. Lai, and S. Y. Philip, "Serendipitous recommendation in e-commerce using innovator-based collaborative filtering," IEEE Transactions on Cybernetics, vol. 49, no. 7, pp. 2678–2692, 2018.
[37] L. Huang, Z.-L. Zhao, C.-D. Wang, D. Huang, and H.-Y. Chao, "LSCD: Low-rank and sparse cross-domain recommendation," Neurocomputing, vol. 366, pp. 86–96, 2019.
[38] G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. W. Schuller, "A deep matrix factorization method for learning attribute representations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 417–429, 2016.
[39] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of COMPSTAT'2010. Springer, 2010, pp. 177–186.
[40] S. J. Reddi, S. Kale, and S. Kumar, "On the convergence of Adam and beyond," in International Conference on Learning Representations, 2018.
[41] P. T. Tran et al., "On the convergence proof of AMSGrad and a new version," IEEE Access, vol. 7, pp. 61706–61716, 2019.
[42] T. Tan, S. Yin, K. Liu, and M. Wan, "On the convergence speed of AMSGrad and beyond," in 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2019, pp. 464–470.
