Received July 28, 2020, accepted August 31, 2020, date of publication September 18, 2020, date of current version September 30, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.3024649

Log-Based Session Profiling and Online Behavioral Prediction in E–Commerce Websites
JAVIER FABRA, PEDRO ÁLVAREZ, AND JOAQUÍN EZPELETA
Department of Computer Science and Systems Engineering, Aragón Institute of Engineering Research, Universidad de Zaragoza, 50009 Zaragoza, Spain
Corresponding author: Javier Fabra ([email protected])
This work was supported in part by the Spanish Ministry of Economy and Competitiveness under Project TIN2017-84796-C2-2-R, and in
part by the Aragonese Government under Project DisCo-T21-20R.

ABSTRACT Improvements to customer experience give companies a competitive advantage, as
understanding customers’ behaviors allows e-commerce companies to enhance their marketing strategies by means of
recommendation techniques and the customization of products and services. This is not a simple task, and
it becomes more difficult when working with anonymous sessions since no historical information of the
user can be applied. In this article, analysis and clustering of the clickstreams of past anonymous sessions
are used to synthesize a prediction model based on a neural network. The model allows for prediction of
a user’s profile after a few clicks of an online anonymous session. This information can be used by the
e-commerce’s decision system to generate online recommendations and better adapt the offered services to
the customer’s profile.

INDEX TERMS Behavior prediction, user profiling, log analysis, clustering, neural networks, model
checking.

I. INTRODUCTION

E-commerce is a very effective way to bring customers to your business and offer them a 24-7
service. Furthermore, the possibility of keeping the customer connected to the business in a
non-face-to-face manner has become a necessity, as reflected by the recent crisis caused by the
coronavirus (COVID-19). Analysis of the behaviors and interests of customers is crucial to improve
the systems that support e-commerce, with the aim of providing customized services and products to
increase the conversion ratio related to purchases [1], [2] and enhance loyalty in certain
strategic sectors [3].

Enhancement of a customer’s experience gives companies a competitive advantage, as generic
marketing makes brands forgettable. The target for a commerce company is loyal and engaged
customers who come back and buy again. Prediction of customer behaviors and tastes, as well as
individual analysis, is a very complex task that requires a time-consuming integration process.
Under a typical configuration, information on the activity of visitors and customers of an
e-commerce website is stored in the server logs. To classify the users of the website, clustering
techniques are frequently applied to create a segmentation on the available data [4], [5].
Frequently, a series of metrics that revolve around the concept of user sessions have previously
been generated. The clusters obtained are then used to define profiles according to the user’s
browsing history [5]. This history also allows the clusters to be characterized so that it is
possible to understand customers’ previous actions (transactional data) or their demographic
profiles [6]. Nevertheless, it is exponentially more valuable to provide insights concerning what
customers will do in the future.

The renewed interest in artificial intelligence techniques has led to a proliferation of methods
for predicting the future behavior of e-commerce customers [7], [8]. Most of these methods address
the prediction problem from the perspective of aggregated data, providing high-level predictions
as a result. Obviously, it is very interesting to know the probability of a customer’s purchase
during the next visit to the website or whether she/he will be interested in buying a specific
product in the future. Nevertheless, the challenge is to make progress towards predictive
analytics that offer fine-grained results and increase the dynamism of the business [9], [10].
This new generation of prediction techniques must help adopters to discover potential customers,
prevent churn, configure the website’s layouts to maximize sales, or offer customized
recommendations.

The associate editor coordinating the review of this manuscript and approving it for publication
was Mansoor Ahmed.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
171834 VOLUME 8, 2020
J. Fabra et al.: Log-Based Session Profiling and Online Behavioral Prediction in E–Commerce Websites

Achieving these goals requires that the prediction models be integrated into the company’s
decision-making systems and that their predictions be validated to adapt those models to the
changing conditions of the business and the evolution of customers’ habits.

In this article, existing research proposals in the field of customer behavior prediction are
reviewed. This review shows the necessity of addressing the challenges towards building
fine-grained predictive models and applying them to real scenarios. The research presented in this
article advances in this direction by focusing on the behavioral analysis of unregistered
customers of an e-commerce, and it shows that it is possible to accurately predict the customer
profile of a user session while browsing the website. With respect to existing techniques, the
proposed solution addresses the following contributions:
• the prediction model works with incomplete unregistered user sessions;
• the different customers’ profiles are explicitly considered as part of the predictions, thus
providing a deeper understanding of users’ browsing and purchasing behaviors;
• those profiles are interpreted and validated from a business perspective to match predictions
with desirable customer behaviors;
• finally, the integration process of a prototype of the solution into a website based on the
Magento e-commerce technology is detailed.

The proposal requires building prediction models that are used along with clickstream techniques
to analyze the customers’ behaviors at runtime. To do that, methodologies for server log
processing, clustering algorithms and artificial intelligence techniques are combined in a
three-phase process. First, the log files are processed to discover the customer profiles. These
profiles are then validated and interpreted from a business perspective, associating each one with
a set of behavioral patterns that characterize the users belonging to that profile. After that, a
prediction model is created and trained to evaluate users’ pattern-based behavior and determine
the customer’s profile. Alternately, once the models are available, predictions are conducted
using the user’s clickstream. This allows the system to perform, after a small number of events,
precise predictions concerning the segment the session is probably going to fall into.

From this point, the results from the predictions can be used to adapt the customer’s session so
as to reinforce the prediction or attempt to move the session towards a more interesting segment,
according to the e-commerce website’s interests. Unlike other existing solutions, predictions are
based on the current behaviors of users and not limited by the purchasing probability.

The remainder of this article is organized as follows. Section II focuses on related work. The
process for predicting the customer’s profile and a real scenario are introduced in Section III.
The preprocessing of the log files is presented in Section IV. After that, the clustering process
is carried out in Section V. Section VI details the customer profiling and the validation process.
The behavior prediction methods used and the obtained results are presented in Section VII. The
integration of the prediction system into an e-commerce platform is then detailed in Section VIII.
Finally, Section IX outlines some conclusions of this article and addresses future research lines.

II. RELATED WORK

Before making predictions about customers’ future behavior, it is necessary to discover the
different profiles of users that visit the e-commerce website. The process of profiling consists
of two stages: the characterization of customers’ past behaviors and the grouping of customers who
behave similarly. In this section, the most relevant research approaches related to these two
stages will be detailed and analyzed.

Regarding the first stage, most research techniques create customers’ behavioral descriptions from
the website’s log files or the database of customer transactions. The contents of these
descriptions can vary depending on the intended use of the analysis results. Customer personal
data [11], their RFM (Recency, Frequency and Monetary) values [12]–[16], their browsing behaviors
[17], [18] or purchasing habits [19]–[21], or the products they have shown interest in [22]–[25]
are typically used for the creation of such descriptions. The concept of session plays a relevant
role in this characterization due to the fact that a description is calculated for each customer
session. For this reason, the existing approaches are essentially interested in the analysis of
registered users. The sessions of these users are clearly identified and directly recorded in the
website’s log files. As an exception, [22] is the only work dealing with unregistered users. In
this case, a process of reconstructing sessions based on clickstream analysis is required [4].

Once customers’ descriptions have been created, they are usually grouped using either clustering
methods [13], [15], [16], [19], [23], [26] or classification methods [17], [18], [25]. As a
result, the application of these techniques generates a set of clusters that must be subsequently
interpreted to understand the particular behaviors of each class of customers. An expert-guided
analysis of the computed clusters is proposed by some of the approaches [12], [20], [23], [26].
Such a task is rather complicated and time-consuming, and therefore alternatives that
automatically extract knowledge from clusters’ descriptions should be studied. [11], [19] use
association rules for automating these interpretations. Nevertheless, they require advance
knowledge of the interesting attributes to define suitable rules. However, the clusters obtained
must also be validated. Ideally, the clusters’ validation should consist of matching users’ future
behaviors according to clusters. Some works have proposed frameworks to provide an incremental
clustering to dynamically maintain the customer profiles [27], [28]. Despite the efforts to
interpret and validate clusters, these are still open challenges.

Customer segmentation techniques are needed to build models that help to make predictions
regarding customers’ future behaviors. To give a compact view of the existing research in the
field of prediction, a set of criteria that help us to classify these works has been established.
The result of this classification is presented in Table 1. Let us now detail the classification
criteria used.

Prediction goals. Some models address the challenge of distinguishing between buying and
non-buying sessions (B/NB, two possible prediction outcomes) [4], [29]–[35], [7], [8].
Alternatively, other works concentrate on calculating the probability that a customer buys either
a specific product (B-Prod) [20], [36]–[38] or a class of products (B-CProd) [39], [40], makes a
purchase in the next visit to the online store (Next) [41]–[43], or repurchases in a future
session (ReP) [44], [45]. Time constraints have also been considered as a part of some prediction
models to estimate the purchasing probability of a user for the next day (Next-D) [46], for the
next year (Next-Y) [47], or over time (NoT) [40]. Likewise, customers’ profiles have been used to
distinguish between VIP and non-VIP customers (V/NV) [48]. Notice that the predictions in the
cited research require identified users to predict future behavior once the corresponding past
behavior has been analyzed.

Data source. Customers’ past behavior is usually extracted from log files generated by Web servers
(Log-based proposals, Log) or transaction data recorded in the seller’s ERP/CRM systems (database
approaches, CTD). These data sources are processed to discover and select the features/attributes
that will be used to create prediction models. As an exception, [34], [45], [47] propose the use
of questionnaires for gathering information regarding customers’ preferences and behaviors in the
hiring of (banking) services.

Types of customers. Most works make their predictions based on registered customers’ past
behaviors. Only four works estimate the purchasing probability for unregistered customers [4],
[29], [31], [32]. These solutions apply clickstream analysis to reconstruct users’ sessions and
discover users’ behaviors during their navigation through the e-commerce website.

Selection of a predictor. The selection of features is a critical issue for the creation of an
accurate prediction model. It consists of extracting/computing a set of relevant attributes from
the data source. As shown in Table 1, the most common attributes are customers’ personal (P) or
demographic (D) data, product interest scores (PI), customers’ navigation (NB) or purchasing
behaviors (PB), or historical purchasing data (HP) (the RFM value or payments, for instance).
Nevertheless, some proposals select alternative interesting attributes, such as the use of
shopping carts (SC) [32], [43], seller’s reputations and facilities (SRF) [45], customers’
opinions (CO) [47], changes in user behavior (ChB) [46] or interactions of users with Web pages
and their elements (Int) [7]. From a methodological point of view, two approaches should be
emphasized. Firstly, [44] applies feature engineering to automate the selection and ranking of a
large number of features to improve prediction tasks. Secondly, a method based on association
rules is proposed in [4], [31] as an alternative to traditional techniques.

Prediction techniques. Three types of techniques have been widely applied in the prediction of
customers’ purchasing behavior: classification methods, regression analysis, and algorithmic
techniques. Among the classification methods, the most common are Neural Networks (NN), Decision
Trees (DT), Support Vector Machines (SVM), Random Forest (RF) and Naive Bayes models (NBM). The
approaches based on these methods make their predictions using a single classifier (S-Class, [33],
[34], [43], [46]) or by combining multiple classifiers to improve the accuracy of the results
(M-Class, [7], [8], [29]–[31], [40], [42], [44], [45]). In the last case, a combination algorithm
integrating the predictions of the different classifiers is needed. Genetic algorithms [30] (GA),
the Artificial Bee Colony (ABC) algorithm [45], Bootstrap Aggregation (BA) [48] and strategies
based on majority voting [20], [31] (MV) have also been used as combination methods. As an
exception, [44] assigns weights to models manually (MN).

Alternately, regression analysis has been used to determine the purchasing probability using
logistic regression [32], [35], [38], [41], [47]. This statistical model requires an a priori
analysis of the predictors to be used and the correlations between them to make accurate
predictions. In [20], a hybrid solution is presented. Logistic regression and classification
methods are combined to improve the purchasing predictions of a concrete e-commerce. Although the
results notably increase the prediction coverage, prediction accuracy is not clearly improved with
respect to other approaches based on the use of a unique technique.

Finally, different algorithms have been proposed to study specific purchasing behaviors [4], [36],
[37]. These solutions define probability models that are evaluated in conjunction with association
rules to extract the knowledge of interest for e-commerce managers. [36] attempts to discover the
most profitable products and customers. It searches for potential customers interested in
purchasing a star product in the near future and analyses those buyers’ personal profiles. [37]
determines the best time (the peak hour) for a customer to purchase a product. This time-based
information is used to deliver personalized marketing messages to increase sales. [4] uses rules
to estimate the purchasing probability of a user session depending on the pages that were visited
in the past and the time spent on them.

The nature of approaches. Some of the research works are Application-oriented Approaches (AoA) in
the sense that they apply existing prediction methods to solve some concrete problem, usually in
the domain of e-commerce or e-banking services. Adopting a different point of view, some works
(let us call them Methodology-oriented Approaches, abbreviated as MoA) concentrate on defining new
methods/algorithms for predicting future customers’ purchasing behaviors. Generally, these types
of works also validate

TABLE 1. Comparative analysis of the methods for predicting the purchasing probability.

their solutions by applying them to real application cases (MoA/AoA).

Integration into e-commerce websites. Prediction methods help explain customers’ behaviors. This
understanding can be used to improve the design and contents of websites, perform various
recommendation techniques, increase the effectiveness of marketing campaigns, or customize the
service for the user, for instance. In spite of these possibilities, most solutions have not been
integrated into a real e-commerce system, except for [7], which developed a prototype of a system
that can be installed on users’ mobile devices. Therefore, the integration of the predictions in
the lifecycle of e-commerce websites is an open challenge that should be addressed.

Validation of results by experts. Prediction models and algorithms are usually trained and tested
using the data recorded in server logs or transaction databases. Moreover, different metrics
(recall, precision, etc.) have been defined to evaluate the quality of the predictions.
Nevertheless, other supplementary methods should be applied to evaluate the real usefulness of
predictions to create new business value and opportunities. These methods could consist of a
qualitative validation of results based on expert opinions, for instance. Alternately, because the
prediction models are not usually integrated in real systems, the predictions are not validated
with customers’ future behaviors. As an exception, [47] assesses the validation of predictions
(the probability that a customer buys during the next year) by comparing them with the purchases
made during the year following the publication of the paper.

Discovery of extra knowledge. Many of the proposals aim to classify customers’ behaviors.
Nevertheless, the reasons that lead customers to exhibit that behavior are not usually studied.
Some works apply association rules (AsR) to analyze the sessions with high purchasing probability
to discover behavioral patterns and the reasons that lead to the purchase of some products [4],
[31], [36], [48]. The knowledge discovered is limited and consists of simple relationships between
pairs of navigation/purchasing events. As an alternative, [47] builds a Behavioral Scoring Model
(BSC). These models have been widely used to identify frequent user behaviors in the field of
financial services, but their applicability to online commerce must still be investigated.

III. A PROCESS FOR PREDICTING CUSTOMERS’ PROFILES

Our goal is to create a model that helps to predict the possible future behavior of a customer
session while browsing an e-commerce website. This prediction can be used to influence customers’
actions (for example, to improve purchase intentions and/or probability) or to provide them with
customized contents or products. The proposed approach consists of applying a process in three
phases: preprocessing of the server log files, discovery of the customer profiles, and synthesis
of the behavioral model. The final result is a prediction model that is integrated into the
e-commerce’s decision system to personalize the services it offers to customers.

The process followed in this article is similar to that used in [49] in the field of predictive
business process monitoring. In that case, a two-phase approach analyzes incomplete traces of
business processes to predict at runtime whether their execution outcomes will be as expected.
These predictions help to minimize the likelihood of violation of business constraints specified
using Linear Temporal Logic (LTL). That technique is applied over medical processes with a
well-defined structure. It is a relevant difference with respect
to our approach, in which a user can navigate freely through the website’s structure.

A. BUILDING A PREDICTION MODEL

Figure 1 shows the three-phase process followed in this article. Firstly, the raw e-commerce logs
are preprocessed to discard uninteresting requests, identify user sessions and prepare the log
contents to enable their analysis. The result of the preprocessing phase is a collection of
sessions.

A session is an ordered sequence of user interactions with the system (events) that take place
within a time frame. A session can contain multiple page views, events, social interactions and
e-commerce transactions corresponding to actions such as visiting a page, executing a search,
adding/deleting a product to/from the cart or completing the payment process, for instance. A
session can be interpreted in terms of users’ behaviors. The process is designed to be useful for
systems with either logged or anonymous access. In the first case, a session is clearly
established in terms of a sequence of events corresponding to the logged session. In the case of
anonymous access, a sessionization process is required to establish which events in a sequence can
be considered as belonging to the same session, as will be detailed later.

Afterwards, the established sessions are used to determine the e-commerce’s customer profiles.
This second phase starts by computing a vector of features for each session (the features creation
task). A feature provides a high-level and (usually) quantitative description of the user’s
behavior during the session: the total session time, the number and type of visited pages, or the
number of times that the resources were used (the search engine, the cart, the wishlist, etc.),
for instance. In the case of logged e-commerce websites, features can be enhanced with demographic
and geographic data, buying patterns of previous user sessions, or the purchase history, for
instance. The features are then processed by the clustering task to group in the same cluster
those sessions that present similar features (that are assumed to be strongly related to the
user’s behavior). These first two tasks can be executed several times to improve the features’
expressivity and to find a more adequate (optimal) number of clusters as well.

Once the session clusters have been computed, they are interpreted from a business perspective.
The business analyst is responsible for mapping these clusters to customer profiles. This profile
discovery is complex and requires knowledge of the website’s structure, the customer’s interests
and purchase habits, and the types of users that typically interact with the e-commerce. Finally,
the resulting profiles must be validated. This task consists of checking if the behavior of a
cluster’s session corresponds to the behavioral description of its profile. To do that, some type
of study of the conformance of the sessions in a cluster and the intuitive description established
for it must be carried out. For instance, if a cluster is described as corresponding to spurious
website users, one can expect those sessions to enter the system from a different point than the
home URL (maybe coming from an external search engine) and also be short sessions.

The aim of the prediction model synthesis phase is to generate a prediction model so as to be able
to establish, after a few session events, the cluster to which a live session is probably going to
belong. The inputs for this phase are the set of clusters and some behavioral indexes for each
session that typically are associated with initial stages of the session. The prediction model
will be used to analyze the event stream of each session and, after the considered initial stage,
predict the cluster of the considered session. Different (artificial intelligence or statistical)
techniques can be applied to obtain the prediction model. In this work, neural network pattern
recognition techniques are applied during the process, but it could be easily adapted to different
alternatives. For that, a vector of features is obtained for the first k events of each session.
The features and clusters feed an artificial neural network synthesis method, which is trained and
validated (in the Model training and Quality analysis tasks, respectively). The resulting
artificial neural network is the model used to predict the session behavior.

Finally, the model obtained is integrated into the e-commerce’s prediction system. E-commerce data
logs are processed during the customer’s navigation and transformed into events of interest. These
events represent the customer’s actions during the browsing (visiting a product category or the
product itself, using the search engine, adding/deleting a product to/from the cart, or completing
the purchase, among others). The prediction system interprets the event stream and determines the
most probable customer profile, which will be used by the e-commerce system to make some decisions
or recommendations adapted to the session behavior.

B. THE UP&SCRAP USE CASE

The process presented to obtain a prediction model will be applied over a real e-commerce website.
Specifically, it will be applied over the website of Up&Scrap¹, a scrapbooking company with more
than 25,000 clients all around the world. In this subsection, the structure and contents of this
website are introduced.

The structure of the website of Up&Scrap is organized around two different types of sections (main
and secondary). Each section is then split into several subsections to refine the product
classification. Figure 2 depicts the structure of the website. Similar taxonomies have been
proposed by different authors but including only main sections [23], [50]. From the homepage
(level 0), different sections can be accessed (level 1). Two different types of sections can be
distinguished. Main sections organize products according to their functionality and utility. The
website provides a menu to access this main categorization of products. There are eight different
sections (papers, decorations, stamps, tools, project life-smash, albums, home decor-DIY, and
gifts), which are divided into subcategories. Alternately, there are secondary

¹ http://www.upandscrap.com

FIGURE 1. Sketch of the process for identifying and predicting the customers’ profiles.
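
As a rough illustration of two steps of the process sketched in Figure 1, the following Python fragment groups events into sessions and computes a feature vector from the first k events of a session. The 30-minute inactivity threshold per IP address is the one adopted in this article for anonymous sessions. The tuple layout of the events, the function names `sessionize` and `prefix_features`, and the particular features (per-type event counts plus elapsed seconds) are assumptions made for the example and are not necessarily the features used in this work.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Inactivity threshold used in the article to split anonymous sessions.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (ip, timestamp, event_type) tuples into sessions.

    A new session starts whenever more than `gap` elapses between two
    consecutive events coming from the same IP address.
    Returns a list of (ip, [(timestamp, event_type), ...]) pairs.
    """
    by_ip = defaultdict(list)
    for ip, ts, event_type in events:
        by_ip[ip].append((ts, event_type))

    sessions = []
    for ip, items in by_ip.items():
        items.sort(key=lambda item: item[0])  # order events by timestamp
        current = [items[0]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] > gap:
                sessions.append((ip, current))
                current = []
            current.append(cur)
        sessions.append((ip, current))
    return sessions


def prefix_features(session_events, k, event_types):
    """Hypothetical feature vector for the first k events of a session:
    one count per event type, followed by the elapsed time in seconds."""
    prefix = session_events[:k]
    counts = Counter(event_type for _, event_type in prefix)
    elapsed = (prefix[-1][0] - prefix[0][0]).total_seconds() if prefix else 0.0
    return [counts.get(t, 0) for t in event_types] + [elapsed]
```

Vectors of this kind, paired with the cluster label of each past session, would be the training input of whichever classifier is chosen for the prediction model.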

sections, which classify products according to other complementary criteria, such as designers,
themes/collections, brands, offers, or new products, for instance. The secondary menu provides
access to six different secondary sections. The website also includes a search engine that allows
users to directly look for products without using the proposed sections. As a consequence, the
website structure allows users to reach products following many different navigation paths, as
well as offering direct access via the search engine.

FIGURE 2. Abstract view of the structure of the Up&Scrap website.

In the following, let us detail the process of identifying and predicting the customers’ profiles
for Up&Scrap through the different phases depicted in Figure 1.

IV. PREPROCESSING OF LOG FILES

The website logs follow the Common Log Format standard (CLF) [51] and provide raw information such
as the IP address from which the session was established, the timestamp of the request, the page
URL, or the HTTP status returned to the client, for instance. The data handled in this article
correspond to two months of use of the system, and the log contains 8,607,625 events. The log
corresponds to non-logged user sessions, and there is no user information. Figure 3 shows a piece
of the raw web log of the considered scenario.

First, a cleaning and filtering process was applied to remove records that are undesired or
uninteresting for the behavioral analysis of users. Specifically, events corresponding to the
following criteria were removed: automatic requests, such as the ones performed by robots, spiders
and crawlers; requests with erroneous status codes that are not relevant for navigational
patterns; requests of irrelevant HTTP methods (only GET and POST requests have been considered
because they are unique requests directly from users); and finally, requests asking for multimedia
contents automatically generated by the browser. After these steps, the log was reduced to
5,875,479 records, 68.26% of the original size.

After that, a log preparation stage was carried out. The aim of this process is to prepare the log
file for the clustering process. Two types of actions are performed to this end. In the
categorization sub-phase, each record is analyzed to identify high-level events and extract
meaningful information. Additionally, log contents are reduced in the simplification sub-phase to
increase the effectiveness of the post-processing.

During the log categorization, different events can be identified by analyzing the CLF log
contents. For such purpose, each event is automatically classified by considering

FIGURE 3. An extract from the raw log of the web server, where IP addresses have been anonymized.
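
A raw log such as the one in Figure 3 can be parsed and cleaned along the following lines. This is a minimal sketch, assuming the entries follow the Common Log Format, optionally extended with referrer and user-agent fields. The regular expression, the bot keywords, the list of static-resource suffixes, and the choice of treating status codes of 400 and above as erroneous are illustrative assumptions rather than the exact rules applied to the Up&Scrap log.

```python
import re
from datetime import datetime

# Common Log Format, optionally followed by the "combined" referrer and
# user-agent fields. The exact layout of the site's log is an assumption.
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

BOT_HINTS = ("bot", "spider", "crawl")  # hypothetical keyword list
STATIC_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico")  # hypothetical


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def keep(record):
    """Apply the four cleaning criteria described in the text."""
    agent = (record.get("agent") or "").lower()
    path = record["url"].split("?", 1)[0].lower()
    if any(hint in agent for hint in BOT_HINTS):   # robots, spiders, crawlers
        return False
    if record["status"] >= 400:                    # erroneous status codes (assumed cutoff)
        return False
    if record["method"] not in ("GET", "POST"):    # irrelevant HTTP methods
        return False
    if path.endswith(STATIC_SUFFIXES):             # resources fetched by the browser
        return False
    return True
```

Filtering then amounts to keeping the parsed records for which `keep` returns `True`; lines that fail to parse would need separate handling.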

whether it is a GET or POST request and analyzing its URL in terms of the presence and/or absence
of specific keywords and resources (that is, the words between slash characters). These requests
are classified based on their depth and whether they correspond to a main or a secondary section.
In the case of the Up&Scrap website, its structure is organized in two levels (N = 2). The
different events have then been separated into 63 different types, such as Visit main section L1,
Visit secondary section L2², Visit product, Login, Logout, Add product to the wishlist, Add
product to the cart, etc. These events refer to different actions and can affect different
sections of the website. However, not all event types are interesting for the analysis because
some of them provide superfluous information. They can, for instance, refer to user account
management or legal warnings.

Therefore, in the simplification stage, some of these event types are discarded, and only the
event types that are interesting for the type of analysis that is going to be conducted are
considered, with the aim of reducing the amount of information included in the log by filtering
the records that do not contain relevant information. To do that, the following filters are
applied. First, sessions with fewer than three requests are discarded because they do not contain

(accessing the results of the search engine is considered to be visiting a secondary section),
Visit_main_section_L2, Visit_secondary_section_L2, Visit_homepage (an event type representing that
the homepage has been visited), Visit_product (an event type representing that the URL of a
product has been visited), Add_wishlist_products_to_the_cart, Add_product_to_the_cart,
Add_product_to_the_wishlist, Buy_products_in_the_cart, Delete_product_from_the_cart, and finally,
Update_product_from_the_cart. The last filter removes duplicated events, which reduced the log
size to 1,331,697 records. Figure 4 shows the processed log of the extract depicted in Figure 3. A
more detailed description of the entire process can be found in [52].

After that, a sessionization process was conducted to group those events that could be considered
as belonging to the same session; because this study deals with non-logged sessions, additional
criteria had to be applied to define the start and end events of each session. For that, a session
was defined as the ordered sequence of events from the same IP for which no more than 30 minutes
passed between any two consecutive events. This is a common characterization that has been used in
log file analysis to discover knowledge by several authors [53]–[55]. As a result, 138,085 anonymous
valuable information and mainly correspond to users that do sessions were identified.
not have an interest in the website contents.
Second, some events are discarded because they do not
V. CLUSTERING PROCESS
provide valuable information for the analysis. Because the
The next step consists of the clustering of sessions with some
goal of analyzing the logs is to extract information regarding
common characteristics. To do that, for each session, a set of
users’ behaviors and preferences when buying products,
global properties was extracted. The set of properties that can
there are many events that can be considered superfluous,
be interesting is strongly related to the problem domain. In the
such as events related to user account management or rating
domain of this work, it has to be dependent on the website
products. In this case, a set of 12 types of events that
structure because the structure will constrain the types of
considered relevant for the analysis has been identified, and
sequences of events a user can execute. For the structure
the remaining ones have been filtered. In the following,
in Figure 2, the properties described in Table 2 have been
they are detailed according to the different sets iden-
considered.
tified: Visit_main_section_L1, Visit_secondary_section_L1
For each session, the set of corresponding values is used
2 Note that L1 and L2 are related to the two levels of the Up&Scrap to generate the vector of features. This vector is useful
website for providing a high-level view of the session (abstracting


unimportant details) and facilitating their interpretation by the business analyst. Of the 15 properties identified in Table 2, 9 are left, which are those most relevant for prediction issues. Of these 9 properties, some can be grouped/added for the clustering process. The other properties that have not been used in this phase can be useful and interesting later to perform certain validation processes. Specifically, the following features have been considered: MAIN, which groups visits to the main category (MAIN = ML1 + ML2); SECONDARY, which groups visits to the secondary category (SECONDARY = SL1 + SL2); MARKETING, which groups marketing-related events (MARKETING = OFFER + NOV); INTEREST, which groups events that indicate interest (INTEREST = WISH + PROD + CART); and finally, the SEARCH feature, which corresponds to the property of the same name. As a result, a feature vector for each session is obtained. As the next step, the sessions are clustered.

FIGURE 4. Log generated after the preprocessing phase from the web server log’s extract depicted in Figure 3.

TABLE 2. Properties considered as characterizing user sessions.

As the clustering technique, k-Means has been applied to the vector of features. The objective of this algorithm is to partition a set of n elements into k groups of "near" elements: each element belongs to the group whose mean is closest. The algorithm requires definition of the number of clusters, k. To find the optimal value of k, Knime [56] and R [57] were used.
In Knime, a workflow has been developed that performs an iterative process and calculates the entropy generated by the selection of different values of k. Entropy is a measure of the variation of the attributes in the data set for each cluster; the closer the value is to 0, the greater the similarity of the data. Conversely, the further the value is from 0, the greater the differences between the data (and thus the worse the results obtained during the clustering process). A lower entropy determines the optimal number of clusters in which the data should be grouped. In the case of the R software, the NbClust package [58] was used. This package provides 30 indexes to determine the optimal number of clusters and proposes the best grouping scheme based on the different results obtained. This includes very well-known methods such as Silhouette or Pamk (which uses the PAM or Clara algorithms, along with the Silhouette method).
The clustering process was conducted using the sessions whose lengths were greater than one event, feeding the process with the described features. One-event sessions can be considered as noise traces. From the original dataset, which contained 138,085 sessions, there are 101,917 traces whose lengths are longer than one event (LONG > 1). Both Knime and R provided us with the same optimal number of clusters, k = 4.
Table 3 provides information regarding the results of the clustering and the mean values for the features of each cluster with respect to the considered sessions. Each element in the table corresponds to the mean value in the considered set. For instance, the normalized global mean value of the MAIN value is 0.3538, while that constrained to Cluster 1 is 0.0348. Cluster 1 contains 20,273 sessions (19.9% of the total number of sessions), cluster 2 contains 38,670 sessions (37.9%), cluster 3 contains 15,573 sessions (15.3%), and finally, cluster 4 contains the remaining 27,401 sessions (26.9%).
The analysis of the features of each cluster with respect to the initial set of data shows that there is a set of feature values that stands out for each cluster (the values have


been highlighted in bold in Table 3), which can be used to establish the users’ profiles. Cluster 1 shows normalized mean values of 260% for the SECONDARY and 409% for the MARKETING attributes, respectively. Cluster 2 shows a ratio of visits to the main page (MAIN) of 189% with respect to the global average. Cluster 3 stands out in the SECONDARY and SEARCH values, with averages of 189% and 469% with respect to the global average, respectively. The fact that the search events are part of the secondary ones is clearly reflected in the correlated values of such attributes in this cluster. Finally, cluster 4 shows that INTEREST stands out with values that represent 197% with respect to the global average. There are other coincidences regarding the outstanding features among the clusters, which will help in the characterization process.

TABLE 3. Normalized clustering results for k = 4.

VI. CUSTOMER PROFILING AND VALIDATION
The analysis of the data obtained from the clustering process allows us to perform an initial profiling phase [59]. Customer profiling is the subdivision of a market into discrete customer groups that share similar characteristics [60], [61]. This process allows for identification of common characteristics among different users and potential customers, as well as the proposal of retargeting strategies. Customer profiling requires, as key steps, the division of the market into meaningful and measurable segments (clusters) according to customers’ needs, past behaviors or demographic profiles (if available), as well as determination of the profit potential of each cluster by analyzing those aspects and characteristics that stand out in each one.

A. CLUSTER INTERPRETATION
Let us now perform a cluster interpretation focusing on the salient features of the clusters obtained in the previous section, as well as the results shown in Table 3. In addition, the information that the clusters offer us allows for the addition of certain characteristics based on the properties calculated previously. The values that stand out above the others for each feature appear in bold in Table 3.
As shown, cluster 1 stands out in the features of MARKETING and secondary sections (SECONDARY). These values indicate that this cluster groups the customers who usually access (or repeat their visit to) the website via a campaign or marketing source (which correspond to secondary items). Those customers also focus on secondary items (SECONDARY feature) and do not tend to move out of the visited category or explore among different categories, instead remaining in a narrow navigation area. They have a low browsing dispersion. Analyzing session duration, it was found that these customers’ sessions are short.
The second cluster stands out in the MAIN property, which indicates that the users falling into this cluster represent first-time customers or customers that spend time browsing the website. They probably are users that land on the website for exploratory or purchasing purposes. These users have long sessions with high dispersion (low focus on the same level/category of items), and they show a moderate ratio of purchases.
A detailed view of cluster 3 emphasizes that, as in cluster 1, the secondary-section property stands out. However, this cluster also highlights the search engine property (SEARCH), which suggests that the population that falls into cluster 3 corresponds to those customers that browse using the search engine (search-based navigation). There are three main options that probably explain this behavior: possible ignorance of the website map, the aim of looking for very specific items, or a non-specific purchasing/browsing focus. Additionally, depending on the session duration, two different subclasses of customers with this profile can be distinguished: one with sessions that have a short time between events, which represent customers that usually will not finally purchase; and another one with very specific customers who visit the product page to finally purchase it. The latter are represented by sessions that are characterized by a longer time between events.
Finally, customers grouped in cluster 4 stand out for the INTEREST property, which groups the wishlist, visits to detail pages of the product, and actions in the shopping cart. This indicates that these customers may have a clear idea about the website and the products and categories they are interested in. It can be observed also that these customers focus more on main sections than on secondary ones, visit more product pages, and spend some time on them. Usually, they enter through the homepage, and their sessions end with just purchasing products or keeping products in the cart. Sessions belonging to this profile are long and concentrated on interest-related events.
Based on the main characteristics of each cluster, let us provide a named classification. This helps to create a conceptual separation among the groups, similarly to other approaches [62]:
• customers in cluster 1 correspond to repeat or geek customers;
• customers in cluster 2 correspond to explorer customers;


• customers in cluster 3 correspond to searcher customers (or narrow searcher customers for very specific ones); and finally,
• customers in cluster 4 correspond to potential or prospective buyers.

B. CLUSTER VALIDATION
The complete data (features along with the initial properties) from the clustering process was used to validate the clusters using model checking techniques [63]. To this end, this study proposes a set of validation queries related to results obtained from the clustering process.
Events have been defined as propositional variables and grouped into formulas, as depicted in Table 4, for better understanding. The meaning of every event can be easily deduced from the event name. In addition, &, | and ! correspond, respectively, to the and, or and not logic connectives.
Some of these queries are detailed below in natural language:
• How do users access the website (Q1: How many sessions directly access the website through a MARKETING event?).
• How do users access main sections (Q2: How many sessions visit neither the main L1 nor main L2 sections?).
• What is the relation between how users access the website and purchasing (Q3: How many sessions access through a MARKETING event and then have a PURCHASE event?).
• How do users use the search engine of the website (Q4: How many sessions never use the search engine? Q5,6: How many sessions use the search engine at least three (Q5) or four times (Q6)? Q7: How many sessions feature intensive use of the search engine (at least 4 times) and purchase items?).
• How do customers purchase products (Q8: How many sessions have a PURCHASE event? Q9: How many sessions operate with the cart and then checkout? Q10: How many sessions have two or more purchase events?)
• What is the relation between user navigation and purchases (Q11: How many sessions iterate at least five times between main and secondary sections and do not purchase in the end? Q12: How many sessions visit five or more product pages?)
• How do users purchase and interact with the cart (Q13: How many sessions add products to the cart but do not purchase in the end? Q14: How many sessions add at least two products to the cart? Q15,16: How many sessions have two PURCHASE events, but the cart remains untouched (Q15)/is modified (Q16) between the checkouts? Q17: How many sessions contain three or more PURCHASE events?)
Table 5 shows the results obtained for each cluster (ci), reflecting the percentage of sessions that the validation query (Qj) fulfills with respect to the total number of sessions of the cluster, as well as the queries described in the LTL version proposed in [63]. In the table, operators G, F, and X have the usual LTL interpretation: Always, Eventually and Next, respectively, with H, O, and Y being their past counterparts. In addition, x, y, z, appearing in queries Q15 and Q16, correspond to freeze operators, allowing us to talk about specific positions in the session and providing the capacity to relate attributes of different session events.
The answers to the questions are different depending on the clusters. The characteristics that stand out for a cluster with respect to the others have been highlighted in green in Table 5, while those less prominent but equally important features have been highlighted in yellow.
As shown, queries Q1 and Q2 especially highlight sessions in cluster 1, corresponding to the repeat or geek customers according to the initial interpretation. This indicates that the marketing campaigns mainly target this type of client (Q1, 24%) and that they mainly access the secondary sections of the website (Q2, 85%). Query Q2 also shows that the searcher customers (cluster 3) have a high percentage (44%) centered on the secondary sections. This makes sense because the use of the search engine allows for refinement of the navigation on the website, giving direct access to brands and products (which are located in the secondary categories). In addition, query Q2 indicates that the explorer customers (cluster 2) always visit the main categories at both the L1 and L2 levels, which means that customers within this profile explore the website through these more general categories.
The third query (Q3) tells us the impact that marketing campaigns have on purchases. As can be seen, the percentages are very low and only highlight two clusters: cluster 4, with 3% of sessions that end up buying from marketing campaigns, and, on the contrary, cluster 1, where there are no sessions in which a marketing campaign produces a purchase. This has a direct relationship with the initial interpretation of the clusters because cluster 4 corresponds to a buyer client profile (hence the highest percentage), while cluster 1 represents users who access the website, especially to browse secondary sections, but without a buyer profile.
Queries Q4 through Q7 allow us to study the use of the search engine of the website and its relationship with purchases. Query Q4 tells us that the sessions of clusters 1, 2 and 4 do not use the search engine very often (between 93% and 71% of the sessions never use it), while in the sessions of cluster 3, the search engine is an event that does appear very frequently (92% of the sessions). This behavior verifies that the sessions in cluster 3 correspond to a searcher customer profile. Queries Q5 and Q6 allow us to go into detail regarding those sessions that use the search engine, noting that a high percentage (Q6, 28%) use it at least four times during a session. Finally, the query Q7 allows us to see the relationship between the use of the search engine and the purchase events. As shown in the table, the low percentages obtained (2% and 3% for clusters 3 and 4, respectively) indicate that the use of the search engine does not have an evident impact on the purchasing processes.


TABLE 4. Definition of formulas used in validation queries.

TABLE 5. Results for the profile’s validation process.

Queries Q8 through Q17 allow us to study the purchase process and its relationship with other events. Query Q8 tells us where the sessions that make purchases are at some point. As can be observed, sessions in cluster 1 (repeat or geek customers) never make a purchase on the website. Customers from the other profiles do make purchases, but where they concentrate is in cluster 4 (13% vs. 3% and 4% of clusters 2 and 3, respectively), which effectively corresponds to potential/prospective buyers. The same behavior is repeated in queries Q9 and Q10, which confirms the results obtained.
Queries Q11 and Q12 are very interesting to elucidate the intention of the clients of the clusters. Query Q11 allows us to observe that cluster-2 sessions stand out for navigating between main and secondary sections but do not end up buying (46%). In addition, in this cluster, there is a significant number of sessions that perform at least five views of specific products (Q12, 15%). This corresponds to an explorer profile, which validates our initial hypothesis that the cluster-2 sessions correspond to the explorer customers. Alternately, query Q12 gives us more information about the buyers (cluster 4): a high percentage (38%) of the sessions have at least five visits to the products, which is natural in a purchase profile.
Queries Q13 through Q17 are oriented toward the interaction with the cart. As shown, the greatest interaction with the cart occurs in the sessions of cluster 4 (Q13, 28%; Q14, 21%), where a purchase is not made in the end. This explains why the profile of cluster 4 includes prospective buyers: they are buyers who have not finished deciding but who probably end up buying the products in the next session. Queries Q15 and Q16 allow us to study what happens between two purchase events: as can be seen, there is a significant percentage (Q15, 7%) of sessions in cluster 4 in which the customer does not modify the cart between one checkout and another.


This indicates that the client started the checkout, decided to go back to review a product, but finally ended up not modifying the content of the cart. It is very interesting to be able to observe this behavior from anonymous sessions where there is no information associated with the detail of the checkout process.
Finally, query Q17 confirms that cluster 4 corresponds to buyers (or potential buyers) because 8% of their sessions make three purchase events at some time. The percentages in the other clusters are much lower or nonexistent (2% in clusters 2 and 3, and no sessions in cluster 1).
This process of validating the clusters through the use of queries with temporal logic allows us to validate the initial hypothesis about the clusters obtained, as well as providing additional information (use of website components such as the search engine, impact of marketing campaigns, details of the purchase process, etc.) that can be very valuable for the business expert.
From the validation, it can be concluded that the four clusters correspond to the interpretation given in the previous subsection.

VII. BEHAVIOR PREDICTION
Previous sections have established the interest in grouping users’ behaviors by means of clustering techniques. At this point, a few interesting questions concerning the relationship between clusters and the users’ behaviors appear. Is it possible to predict the cluster to which a session will belong by analyzing a few initial events? If so, how many events are required to get a good prediction? How accurate is that prediction?
Answering the previous questions is useful when one wants to modify the user’s behavior to reach some desired objectives. Let us consider, for instance, the case in which very few users of a given cluster buy products. Predicting the case after a few events is essential to apply recommendation policies with the aim of redirecting the user session towards a different cluster, one more related to the desired objective. On the contrary, if the user behavior so far predicts that the session is going to belong to a cluster strongly related with buyers, the interest will be in ensuring that she or he does not abandon the behavior associated with that cluster.
Table 6 shows some global data regarding the clusters and also the relations between clusters and buying sessions. Each row corresponds to a cluster. The columns correspond, respectively, to the number of sessions, the percentage of sessions with respect to the total number of sessions in the log, the number of buying sessions, the percentage of buying sessions with respect to the total number of sessions in the log, and finally, the percentage of buying sessions with respect to the total number of buying sessions. Notice that most of the buying sessions are concentrated in clusters 4 (64.81%) and 2 (23.63%). Let us consider that, after a few initial events of a session (5, for instance), the system is able to detect that the session is very likely going to belong to cluster 1 or 3. This means that it will be quite probable that the user will buy no product, and then some suggestions could be proposed to drive the user towards one of the clusters with a larger buying probability (namely, 2 or 4).

TABLE 6. Some data about sessions and clusters.

With the aim of correlating initial user behavior and clusters, in this study some machine learning techniques have been applied. The process carried out is as follows:
• First, a vector of features for each session has been computed. The features are based on the same values used for clustering (MAIN, SECONDARY, MARKETING, INTEREST and SEARCH, as described in Section V), but the counting of event occurrences is constrained to the first n events, with n varying from 3 to 8. Different prediction models using the first 3 to 8 events are going to be obtained.
• As the second step, a multilayer feed-forward network has been built. It is composed of 5 hidden layers, with 10 hidden neurons per layer. The learning algorithm applied is RProp [64], constraining its execution to up to 100 learning iterations. The network has been trained with the scaled conjugate gradient back-propagation method using 75% of randomly selected sessions (76,478 session features).
• Finally, the remaining 25% sessions (25,479 session features) have been used to test the quality of the resulting pattern recognition method, whose results are commented on in the following.
The results obtained are summarized in different tables. Table 7 corresponds to the confusion matrices of the prediction models based on 3 (left) and 4 events (right); Table 8 corresponds to the use of 5 and 6 events; and Table 9 corresponds to the cases of 7 and 8 events. As an example, let us describe the case of 5 events (left part of Table 8).
Rows AC1 through AC4 correspond to the clusters to which input sessions belong (actual clusters), while columns PC1 through PC4 correspond to the predicted clusters according to the trained neural network (predicted clusters). Concentrating on a row, the diagonal element corresponds to the correctly predicted sessions, while the rest are false negatives (that is, input features that should be predicted as belonging to the cluster corresponding to the row but that have been predicted as belonging to a different one). For instance, row 2 in Table 8 (left) shows that 8,238 (true positives) cluster-2 sessions were properly predicted as belonging to that cluster, while 269, 254 and 906 cluster-2 sessions were predicted as belonging to clusters 1, 3 and 4, respectively (false negatives). The value in the Rec. column


corresponds to what is called the recall value, as a measure of the quality of the prediction for the considered cluster, and is computed with the following formula:

\[ \text{recall} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false negatives}} \]

TABLE 7. Confusion matrices for the test phase when computing the feature for the 3 (left) and 4 (right) first events.

TABLE 8. Confusion matrices for the test phase when computing the feature for the 5 (left) and 6 (right) first events.

TABLE 9. Confusion matrices for the test phase when computing the feature for the 7 (left) and 8 (right) first events.

Let us now concentrate on columns. Column values out of the diagonal correspond to false positives: sessions predicted as belonging to the cluster associated with the column while actually belonging to a different cluster. For instance, considering column 2 of the same table, 38, 236 and 1,305 sessions were predicted as belonging to cluster 2 when they actually belonged to clusters 1, 3 and 4, respectively. The value in the Prec. row corresponds to what is called precision, which is computed with the following formula:

\[ \text{precision} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false positives}} \]

Precision and recall provide insight into the prediction quality for each cluster. To measure the global quality, accuracy and Cohen’s kappa statistics are typically considered. Accuracy is computed, for the entirety of the data, as:

\[ \text{accuracy} = \frac{\#\text{true positives}}{\#\text{instances}} \]

Accuracy provides an intuitive global view of the quality of the predictions. Cohen’s kappa is used to measure to what degree two different systems of prediction are in agreement. In this case, it is used to compare the accuracy of the prediction system (observed accuracy) with respect to the accuracy of a random system (expected accuracy) [65], [66].
The kappa value has been computed according to the following formula [66]:

\[ K = \frac{N \sum_{i=1}^{k} x_{ii} - \sum_{i=1}^{k} x_{i\cdot}\, x_{\cdot i}}{N^{2} - \sum_{i=1}^{k} x_{i\cdot}\, x_{\cdot i}} \]

where x_{ii} is the number of cases in the i-th position of the main diagonal, N = 25,479 is the number of sessions, k = 4 is the number of clusters, and x_{\cdot i}, x_{i\cdot} are the total number of sessions in the i-th column and row, respectively.

TABLE 10. Accuracy and Cohen’s kappa values for the predictions based on 3 to 8 events.

Table 10 shows, for the different models, the values of the accuracy and Cohen’s kappa statistics. Depending on the authors and the problem domain, there are different scales dividing the kappa value domain, from non-agreement to almost perfect agreement. What values of the kappa statistic are interesting? There are different interpretations. [65] established negative values as indicating that there is no agreement, 0.01-0.20 as having little agreement, 0.21-0.40 as fair agreement, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1.00 as almost perfect agreement. Alternatively, [67] considers that scale to be unacceptable for some domains (in healthcare research, for instance), proposing an alternative scale: 0-0.20 as no agreement, 0.21-0.39 as minimal, 0.40-0.59 as weak, 0.60-0.79 as moderate, 0.80-0.90 as strong, and above 0.9 as almost perfect.
As expected, the quality of the prediction improves when more initial events are considered for the prediction.


The online analysis system can make a first initial prediction as soon as a minimal number of initial events have occurred (for example, 4) and then pass the information to the component in charge of applying some recommendation policies. If a new event occurs, the prediction with the corresponding model can be reported, and so on.
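As a rough illustration of this online use, the sketch below counts feature occurrences over the first n events of an in-progress session and dispatches to the model associated with n (3 to 8). It is only a sketch under stated assumptions: the event-to-feature mapping is partial and partly hypothetical (`Search_products` and `Visit_offer` are invented names), and `toy_model` is an arbitrary stand-in rule, not the trained neural network described above.

```python
from typing import Callable, Dict, List, Optional, Sequence

FEATURES = ("MAIN", "SECONDARY", "MARKETING", "INTEREST", "SEARCH")

# Partial, assumed mapping from event types to the five clustering features.
EVENT_TO_FEATURE: Dict[str, str] = {
    "Visit_main_section_L1": "MAIN",
    "Visit_main_section_L2": "MAIN",
    "Visit_secondary_section_L1": "SECONDARY",
    "Visit_secondary_section_L2": "SECONDARY",
    "Visit_product": "INTEREST",
    "Add_product_to_the_wishlist": "INTEREST",
    "Add_product_to_the_cart": "INTEREST",
    "Search_products": "SEARCH",   # hypothetical event name
    "Visit_offer": "MARKETING",    # hypothetical event name
}

MIN_EVENTS, MAX_EVENTS = 3, 8  # one model per prefix length, as in the text

Model = Callable[[List[int]], int]  # feature vector -> predicted cluster


def prefix_features(events: Sequence[str], n: int) -> List[int]:
    """Count feature occurrences over the first n events of a session."""
    counts = {name: 0 for name in FEATURES}
    for event in events[:n]:
        feature = EVENT_TO_FEATURE.get(event)
        if feature is not None:
            counts[feature] += 1
    return [counts[name] for name in FEATURES]


def predict_online(events: Sequence[str], models: Dict[int, Model]) -> Optional[int]:
    """Return a cluster prediction, or None while too few events are known."""
    n = min(len(events), MAX_EVENTS)
    if n < MIN_EVENTS:
        return None
    return models[n](prefix_features(events, n))


def toy_model(features: List[int]) -> int:
    """Stand-in rule (NOT the paper's network): any INTEREST event -> 4."""
    return 4 if features[FEATURES.index("INTEREST")] > 0 else 2


toy_models: Dict[int, Model] = {n: toy_model for n in range(MIN_EVENTS, MAX_EVENTS + 1)}

session_so_far = ["Visit_main_section_L1", "Visit_main_section_L2"]
print(predict_online(session_so_far, toy_models))  # too few events yet
session_so_far.append("Visit_product")
print(predict_online(session_so_far, toy_models))
```

Keeping one model per prefix length mirrors the text: each new event simply moves the session to the next model until the eighth event, after which the last model keeps being used.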

VIII. INTEGRATION OF THE PREDICTION SYSTEM INTO AN E-COMMERCE WEBSITE
Finally, let us depict the integration process of a prototype
of the solution into the Up&Scrap website and discuss its
practical implications from a business perspective.

A. DESCRIPTION OF THE TECHNICAL PROPOSAL

The Up&Scrap website was developed using the Magento
Commerce Platform [68]. Magento provides sophisticated
functionality to create customized and secured e-commerce
websites, analyze the business to accelerate the sales,
and manage an enterprise’s product catalog, inventory or
marketing channels, among others. Moreover, its open-source nature has encouraged the development of a wide variety of extensions and themes that help programmers to improve website capabilities and presentation, respectively. These features have led Magento to become one of the most popular solutions on the market.

FIGURE 5. Integration of the prediction system into a Magento e-commerce website.

The goal is now to improve customers’ shopping experience by predicting their future behavior. This requires the integration of the new prediction system based on the previously computed models into the backend of the Up&Scrap website. A reusable system design is proposed to favor its future integration into other Magento websites. The extensible architecture and the technological stack of Magento have helped us to address this design issue.
Figure 5 shows the high-level design of the proposed prediction system. On the left part of the figure, an abstraction of the initial Magento-based e-commerce system for Up&Scrap is represented. It consists of a customized instance of the layered framework of Magento 2.0 (colored in orange). This version of Magento allows us to store data logs about customers’ navigation through the website, system’s internal processes and application server’s performance. These logs comply with the PSR-3 standard and were generated and stored as files by using Monolog, a standard logging library for PHP [69].

Kafka [70] is used to process these messages, organizing them into (incomplete) sessions and generating an event stream that describes what is happening in each of these in-process sessions.
The neural network built for predicting the clusters to which sessions belong is integrated into the prediction system, which is subscribed to the Kafka event stream and uses these events to run the trained neural network. The prediction consists of determining to which cluster a session is more likely to belong based on the incomplete information available thus far. These online predictions are then sent to a decision-maker tool, developed as a Magento extensible module that offers its functionality as a service contract. This design decision facilitates its integration into the service layer of the backend’s e-commerce system. Internally, predictions can be used to provide Web content that changes based on the behavior, preferences, and interests of the customer or to send personalized offers/discounts by e-mail to customers during their browsing, among other things.

B. PRACTICAL IMPLICATIONS OF THE INTEGRATION
Monolog allows the programming of advanced logging The introduction of changes in a business must be a
strategies. A new Monolog handler is required. This handler progressive process oriented towards consolidation of the
is responsible for registering customers’ navigation and improvements. In the case of Up&Scrap, the results of the
actions through the website and sending the corresponding customer profiling and the prediction system have been used
log messages to the preprocessing server via a socket con- to understand the current state of its e-business and propose
nection. This logging handler is integrated into the backend’s a roadmap of changes that increases the sales. This roadmap
presentation layer. The role of the preprocessing server is has been structured in three phases.
similar to the sessionization component described in Figure1, The beginning phase is based on the knowledge extracted
that is, it discards uninteresting requests and identifies from the customer profiles. In this phase, the business experts
user sessions. Nevertheless, in this case, the messages of are mainly interested in corroborating their intuitions about
interest related to a session will be created progressively Up&Scrap customers’ behaviors and adopting corrective
as the customer is browsing the website. Then, Apache actions that improve the organization of the contents and the


navigation structure of the e-commerce website. Therefore, the
changes are directed to improve the user experience during the
e-shopping. The analytics tools integrated into the current version
of the e-commerce site provide high-level insights regarding the
customers’ navigation habits, but they were not useful for performing
fine-grained analysis of those behaviors. The techniques applied to
the creation and validation of profiles have been demonstrated to be
useful to extract those low-level insights from the server logs.
Moreover, these techniques can be reused to validate the results of
the changes proposed, analyzing the logs stored after those changes.

In the second phase, the results of the prediction system are used to
turn the Up&Scrap e-commerce website into a dynamic application able
to offer a more personalized service to the customers. The interest
of the company is especially focused on improving the mechanisms of
online marketing as a way of influencing the customers’ behavior
during their navigation. The predictions are interpreted to determine
the contents that are introduced in the banners and pop-up messages
shown dynamically to each customer. These contents provide feedback
to the user about the products, offers and services (for instance,
workshops) that could be of interest to her/him. The advantage of
this customized marketing strategy is that the e-commerce system does
not require significant technical changes. Moreover, these types of
improvements are also applied to the marketing by e-mail, sending
more personalized product recommendations and offers that induce the
customers to buy in future sessions.

Finally, the third phase is the most ambitious and involves a change
in the Up&Scrap technological infrastructure. The goal is for the
e-commerce system to dynamically adapt its contents and navigation
structure to each customer during the shopping. These adaptations
would be based on the results of the prediction system and directed
to maximize the probability that the customer buys in that session.
Unfortunately, the Up&Scrap e-commerce site (and any solution based
on Magento or other similar technologies) has a static nature that
does not allow modification of its contents and structure at runtime
and with the desired flexibility. Therefore, this phase would involve
the development of a new version of the e-commerce system and a set
of base technologies that support the dynamism required.

IX. CONCLUSION

This article has concentrated on the prediction of users’ behaviors
in e-commerce websites. First, session traces have been grouped
according to the similarity of a set of quantitative session
parameters. After that, the resulting values for each cluster have
been compared with the entire log dataset to define a user profile
for each cluster. The profiles have been validated (or refined) by a
closer inspection of the sessions in clusters so as to confirm or
contradict the (intuitive) cluster profiling description. A later
training phase has generated the prediction model that will be used
for the online prediction. As a result, the cluster to which the
session is going to belong can be predicted after a few initial
session events.

Despite the process being applicable to a wide domain, each step
requires specific adaptations when considering its application to
specific cases: in the preprocessing phase, where some events are
discarded for different reasons that cause events to be considered as
non-user events, or non-interesting events, for instance; in the
clustering phase, where the vector of features as well as an adequate
number of clusters must be established; in the profiling phase, where
the ratios between global and cluster values are chosen to define the
cluster profiles; in the validation phase, where the cluster-profile
relations are evaluated and, perhaps, changed; and in the prediction
model synthesis phase, where the number of initial events is chosen
as an adequate parameter to obtain an accurate online prediction.
Every decision taken is arguable. However, some of the steps are
easily automated. Most clustering tools are able to find an optimal
number of clusters, and this is also the case when looking for an
adequate number of initial events for prediction, for instance.

Further research is required, and different techniques could be
applied and adopted for the clustering and prediction model synthesis
phases. With respect to the vector of features, the attributes
considered are mainly quantitative but do not consider causal
relations among events in a sequence. For instance, when counting the
number of times two events, a and b, appear in a (partial) session,
the possibility of a always appearing after b, or vice versa, is not
distinguished, which could hint at very different behaviors, thus
corresponding to different profiles. In this sense, the inclusion in
the features of such types of relations (or more complex ones) using
the answers to temporal logic formulas describing such relations, for
instance, could be a way of improving the results.

REFERENCES

[1] R. Kohavi, “Mining E-commerce data: The good, the bad, and the ugly,” in Proc. Pacific-Asia Conf. Knowl. Discovery Data Mining, 2001, pp. 8–13.
[2] N. Verma and J. Singh, “An intelligent approach to big data analytics for sustainable retail environment using apriori-MapReduce framework,” Ind. Manage. Data Syst., vol. 117, no. 7, pp. 1503–1520, Aug. 2017.
[3] I. Ullah, B. Raza, A. K. Malik, M. Imran, S. U. Islam, and S. W. Kim, “A churn prediction model using random forest: Analysis of machine learning techniques for churn prediction and factor identification in telecom sector,” IEEE Access, vol. 7, pp. 60134–60149, 2019.
[4] G. Suchacka and G. Chodak, “Using association rules to assess purchase probability in online stores,” Inf. Syst. e-Bus. Manage., vol. 15, no. 3, pp. 751–780, 2017.
[5] Q. Su and L. Chen, “A method for discovering clusters of e-commerce interest patterns using click-stream data,” Electron. Commerce Res. Appl., vol. 14, no. 1, pp. 1–13, Jan. 2015.
[6] R. E. Bucklin and C. Sismeiro, “Click here for Internet insight: Advances in clickstream data analysis in marketing,” J. Interact. Marketing, vol. 23, no. 1, pp. 35–48, Feb. 2009.
[7] L. Guo, L. Hua, R. Jia, B. Zhao, X. Wang, and B. Cui, “Buying or browsing?: Predicting real-time purchasing intent using attention-based deep network with multiple behavior,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 1984–1992.
[8] D. Koehn, S. Lessmann, and M. Schaal, “Predicting online shopping behaviour from clickstream data using deep learning,” Expert Syst. Appl., vol. 150, Jul. 2020, Art. no. 113342.


[9] K. Močarníková and M. Greguš, “Conceptualization of predictive analytics by literature review,” in Data-Centric Business and Applications (Lecture Notes on Data Engineering and Communications Technologies), vol. 30, N. Kryvinska and M. Greguš, Eds. Cham, Switzerland: Springer, 2020, doi: 10.1007/978-3-030-19069-9_8.
[10] J. Chen and A. Abdul, “A session-based customer preference learning method by using the gated recurrent units with attention function,” IEEE Access, vol. 7, pp. 17750–17759, 2019.
[11] W. Niyagas, A. Srivihok, and S. Kitisin, “Clustering e-banking customer using data mining and marketing segmentation,” ECTI Trans. Comput. Inf. Technol., vol. 2, no. 1, pp. 63–69, Jan. 1970.
[12] H. C. C. Chai, “Online auction customer segmentation using a neural network model,” Int. J. Appl. Sci. Eng., vol. 3, no. 2, pp. 101–110, 2005.
[13] M. Namvar, M. R. Gholamian, and S. KhakAbi, “A two phase clustering method for intelligent customer segmentation,” in Proc. Int. Conf. Intell. Syst., Model. Simulation, Jan. 2010, pp. 215–219.
[14] L. B. Romdhane, N. Fadhel, and B. Ayeb, “An efficient approach for building customer profiles from business data,” Expert Syst. Appl., vol. 37, no. 2, pp. 1573–1585, Mar. 2010.
[15] M. Walters and J. Bekker, “Customer super-profiling demonstrator to enable efficient targeting in marketing campaigns,” South Afr. J. Ind. Eng., vol. 28, no. 3, pp. 113–127, Nov. 2017.
[16] A. Beheshtian-Ardakani, M. Fathian, and M. Gholamian, “A novel model for product bundling and direct marketing in e-commerce based on market segmentation,” Decis. Sci. Lett., vol. 7, no. 1, pp. 39–54, 2018.
[17] J. X. Yu, Y. Ou, C. Zhang, and S. Zhang, “Identifying interesting customers through Web log classification,” IEEE Intell. Syst., vol. 20, no. 3, pp. 55–59, May 2005.
[18] P.-H. Chou, P.-H. Li, K.-K. Chen, and M.-J. Wu, “Integrating Web mining and neural network for personalized e-commerce automatic service,” Expert Syst. Appl., vol. 37, no. 4, pp. 2898–2910, Apr. 2010.
[19] J. Wilson, S. Chaudhury, and B. Lall, “Clustering short temporal behaviour sequences for customer segmentation using LDA,” Expert Syst., vol. 35, no. 3, Jun. 2018, Art. no. e12250.
[20] S. Peker, A. Kocyigit, and P. E. Eren, “LRFMP model for customer segmentation in the grocery retail industry: A case study,” Marketing Intell. Planning, vol. 35, no. 4, pp. 544–559, Jun. 2017.
[21] M. R. Flores-Méndez, M. Postigo-Boix, J. L. Melús-Moreno, and B. Stiller, “A model for the mobile market based on customers profile to analyze the churning process,” Wireless Netw., vol. 24, no. 2, pp. 409–422, Feb. 2018.
[22] I. S. Y. Kwan, J. Fong, and H. K. Wong, “An e-customer behavior model with online analytical mining for Internet marketing planning,” Decis. Support Syst., vol. 41, no. 1, pp. 189–204, Nov. 2005.
[23] Q. Su and L. Chen, “A method for discovering clusters of e-commerce interest patterns using click-stream data,” Electron. Commerce Res. Appl., vol. 14, no. 1, pp. 1–13, Jan. 2015.
[24] S. Dhaliwal, N. N. Van, M. Dhaliwal, J. Rokne, R. Alhajj, and T. Ozyer, “Integrating SOM and fuzzy k-means clustering for customer classification in personalized recommendation system for non-text based transactional data,” in Proc. 8th Int. Conf. Inf. Technol. (ICIT), May 2017, pp. 901–908.
[25] S. Palaniappan, A. Mustapha, C. F. Mohd Foozy, and R. Atan, “Customer profiling using classification approach for bank telemarketing,” Int. J. Inform. Vis., vol. 1, nos. 4–2, p. 214, Nov. 2017.
[26] K. Kalaidopoulou, S. Triantafyllou, A. Griva, and K. Pramatari, “Identifying customer satisfaction patterns via data mining: The case of greek e-shops,” in Proc. 11th Medit. Conf. Inf. Syst. (MCIS), 2017, pp. 1–12.
[27] C. Haruechaiyasak, C. Tipnoe, S. Kongyoung, C. Damrongrat, and N. Angkawattanawit, “A dynamic framework for maintaining customer profiles in E-commerce recommender systems,” in Proc. IEEE Int. Conf. e-Technol., e-Commerce e-Service, Mar. 2005, pp. 768–771.
[28] O. Nasraoui, M. Soliman, E. Saka, A. Badia, and R. Germain, “A Web usage mining framework for mining evolving user profiles in dynamic Web sites,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 202–215, Feb. 2008.
[29] G. Suchacka, M. Skolimowska-Kulig, and A. Potempa, “Classification of E-Customer sessions based on support vector machine,” in Proc. Eur. Council Modelling Simulation (ECMS), May 2015, pp. 594–600.
[30] E. Kim, W. Kim, and Y. Lee, “Combination of multiple classifiers for the customer’s purchase behavior prediction,” Decis. Support Syst., vol. 34, no. 2, pp. 167–175, Jan. 2003.
[31] E. Suh, S. Lim, H. Hwang, and S. Kim, “A prediction model for the purchase probability of anonymous customers to support real time Web marketing: A case study,” Expert Syst. Appl., vol. 27, no. 2, pp. 245–255, Aug. 2004.
[32] J. Qiu, Z. Lin, and Y. Li, “Predicting customer purchase behavior in the e-commerce context,” Electron. Commerce Res., vol. 15, no. 4, pp. 427–452, Dec. 2015.
[33] R. Jia, R. Li, M. Yu, and S. Wang, “E-commerce purchase prediction approach by user behavior data,” in Proc. Int. Conf. Comput., Inf. Telecommun. Syst. (CITS), Jul. 2017, pp. 1–5.
[34] L. M. Badea, “Predicting consumer behavior with artificial neural networks,” Procedia Econ. Finance, vol. 15, pp. 238–246, Jan. 2014.
[35] M. Zeng, H. Cao, M. Chen, and Y. Li, “User behaviour modeling, recommendations, and purchase prediction during shopping festivals,” Electron. Markets, vol. 29, no. 2, pp. 263–274, Jun. 2019.
[36] H.-J. Chang, L.-P. Hung, and C.-L. Ho, “An anticipation model of potential customers’ purchasing behavior based on clustering analysis and association rules analysis,” Expert Syst. Appl., vol. 32, no. 3, pp. 753–764, Apr. 2007.
[37] N. Vanessa and A. Japutra, “Contextual marketing based on customer buying pattern in grocery E-Commerce: The case of Bigbasket.com (India),” ASEAN Marketing J., vol. 9, no. 1, pp. 56–67, 2018.
[38] N. Nishimura, N. Sukegawa, Y. Takano, and J. Iwanaga, “A latent-class model for estimating product-choice probabilities from clickstream data,” Inf. Sci., vol. 429, pp. 406–420, Mar. 2018.
[39] Y.-T. Wen, P.-W. Yeh, T.-H. Tsai, W.-C. Peng, and H.-H. Shuai, “Customer purchase behavior prediction from payment datasets,” in Proc. 11th ACM Int. Conf. Web Search Data Mining (WSDM), 2018, pp. 628–636.
[40] C. Huang, X. Wu, X. Zhang, C. Zhang, J. Zhao, D. Yin, and N. V. Chawla, “Online purchase prediction via multi-scale modeling of behavior dynamics,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 2613–2622.
[41] D. Van den Poel and W. Buckinx, “Predicting online-purchasing behaviour,” Eur. J. Oper. Res., vol. 166, no. 2, pp. 557–575, Oct. 2005.
[42] K. Shapoval and T. Setzer, “Next-purchase prediction using projections of discounted purchasing sequences,” Bus. Inf. Syst. Eng., vol. 60, no. 2, pp. 151–166, Apr. 2018.
[43] J. Li, L. Tang, A. Wang, and Z. Xu, “Online-purchasing behavior forecasting with a firefly algorithm-based SVM model considering shopping cart use,” EURASIA J. Math., Sci. Technol. Educ., vol. 13, no. 12, Nov. 2017, pp. 7967–7983.
[44] G. Liu, T. T. Nguyen, G. Zhao, W. Zha, J. Yang, J. Cao, M. Wu, P. Zhao, and W. Chen, “Repeat buyer prediction for E-Commerce,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 155–164.
[45] A. Kumar, G. Kabra, E. K. Mussada, M. K. Dash, and P. S. Rana, “Combined artificial bee colony algorithm and machine learning techniques for prediction of online consumer repurchase intention,” Neural Comput. Appl., vol. 31, no. S2, pp. 877–890, Feb. 2019.
[46] D. Li, G. Zhao, Z. Wang, W. Ma, and Y. Liu, “A method of purchase prediction based on user behavior log,” in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Nov. 2015, pp. 1031–1039.
[47] K. K. Boyer and G. T. M. Hult, “Customer behavior in an online ordering application: A decision scoring model,” Decis. Sci., vol. 36, no. 4, pp. 569–598, Nov. 2005.
[48] B. Shim, K. Choi, and Y. Suh, “CRM strategies for a small-sized online shopping mall based on association rules and sequential patterns,” Expert Syst. Appl., vol. 39, no. 9, pp. 7736–7742, Jul. 2012.
[49] C. D. Francescomarino, M. Dumas, F. M. Maggi, and I. Teinemaa, “Clustering-based predictive process monitoring,” IEEE Trans. Services Comput., vol. 12, no. 6, pp. 896–909, Nov. 2019.
[50] Y. S. Kim and B.-J. Yum, “Recommender system based on click stream data using association rule mining,” Expert Syst. Appl., vol. 38, no. 10, pp. 13320–13327, Sep. 2011.
[51] Common Log Format (CLF). (1995). The World Wide Web Consortium (W3C). [Online]. Available: http://www.w3.org/Daemon/User/Config/Logging.html#common-logfileforma
[52] S. Hernandez, P. Alvarez, J. Fabra, and J. Ezpeleta, “Analysis of Users’ behavior in structured e-Commerce websites,” IEEE Access, vol. 5, pp. 11941–11958, 2017.
[53] Google Analytics Help Center. Accessed: Sep. 2020. [Online]. Available: https://support.google.com/analytics
[54] G. Suchacka and G. Chodak, “Practical aspects of log file analysis for E-commerce,” in Proc. Int. Conf. Comput. Netw., 2013, pp. 562–572.
[55] M. Adnan, M. Nagi, K. Kianmehr, R. Tahboub, M. Ridley, and J. Rokne, “Promoting where, when and what? An analysis of Web logs by integrating data mining and social network techniques to guide ecommerce business promotions,” Social Netw. Anal. Mining, vol. 1, no. 3, pp. 173–185, Jul. 2011.


[56] KNIME. Accessed: Sep. 2020. [Online]. Available: https://www.knime.com
[57] The R Project for Statistical Computing. Accessed: Sep. 2020. [Online]. Available: https://www.r-project.org
[58] NbClust: Determining the Best Number of Clusters in a Data Set. Accessed: Sep. 2020. [Online]. Available: https://cran.r-project.org/web/packages/NbClust/index.html
[59] R.-S. Wu and P.-H. Chou, “Customer segmentation of multiple category data in e-commerce using a soft-clustering approach,” Electron. Commerce Res. Appl., vol. 10, no. 3, pp. 331–341, May 2011.
[60] Bain&Company. (2018). Management Tools-Customer Segmentation. [Online]. Available: https://www.bain.com/insights/management-tools-customer-segmentation
[61] J. S. E. Almquist and N. Bloch, “The elements of value,” in Harvard Business Review. Brighton, MA, USA: Harvard Business Publishing, 2016.
[62] Business2community. (2016). The Ultimate Guide to eCommerce Customer Segmentation. [Online]. Available: https://www.business2community.com/ecommerce/ultimate-guide-ecommerce-customer-segmentation-01624275
[63] J. M. Couvreur and J. Ezpeleta, “A linear temporal logic model checking method over finite words with correlated transition attributes,” in Data-Driven Process Discovery and Analysis. SIMPDA (Lecture Notes in Business Information Processing), vol. 340, P. Ceravolo, M. van Keulen, and K. Stoffel, Eds. Cham, Switzerland: Springer, 2019, doi: 10.1007/978-3-030-11638-5_5.
[64] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in Proc. IEEE Int. Conf. Neural Netw., vol. 1, Mar. 1993, pp. 586–591.
[65] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, no. 1, pp. 37–46, Apr. 1960.
[66] A. Bendavid, “Comparison of classification accuracy using Cohen’s weighted kappa,” Expert Syst. Appl., vol. 34, no. 2, pp. 825–832, Feb. 2008.
[67] M. L. McHugh, “Interrater reliability: The kappa statistic,” Biochemia Medica, vol. 22, pp. 276–282, Oct. 2012.
[68] (2020). Magento E-commerce Platform. [Online]. Available: https://magento.com/
[69] Monolog: Sends Your Logs to Files, Sockets, Inboxes, Databases and Various Web Services. Accessed: Sep. 2020. [Online]. Available: https://packagist.org/packages/monolog/monolog
[70] (2020). Apache Kafka: A Distributed Streaming Platform. [Online]. Available: https://kafka.apache.org/

JAVIER FABRA received the Ph.D. degree in computer science from the
University of Zaragoza, Spain, in 2010. He has been an Associate
Professor with the Department of Computer Science and Systems
Engineering, University of Zaragoza, Spain, since 2008. His main
research interests include data mining analysis techniques in the
context of service-oriented computing and cloud architectures.

PEDRO ÁLVAREZ received the Ph.D. degree in computer science
engineering from the University of Zaragoza, Zaragoza, Spain, in
2004. He has been a Lecture Professor with University of Zaragoza,
since 2000. His current research interests include two main aspects
on integration problems of network-based systems and the use of novel
techniques and methodologies for solving them and the application of
formal analysis techniques to the mining of event logs and databases.

JOAQUÍN EZPELETA received the M.S. degree in mathematics and the
Ph.D. degree in computer science from the University of Zaragoza,
Spain. He is currently a Professor with the Department of Computer
Science and Systems Engineering, University of Zaragoza, where he
conducts lectures on formal methods for sequential and concurrent
programming and service-oriented architectures. His research
interests include problems of modeling, analysis, and control
synthesis for concurrent systems, the application of formal
techniques to help in the development of correct distributed systems
based on Internet and cloud technologies, and further the parallel
processing of data and compute-intensive problems.

No ratings yet
Blockchain For Transportation:: Where The Future Starts
14 pages
Marelli Motori Digital Avr: Functions and Key Features Inputs
No ratings yet
Marelli Motori Digital Avr: Functions and Key Features Inputs
2 pages
SC
No ratings yet
SC
33 pages
Internship Java[1]
No ratings yet
Internship Java[1]
37 pages
DP-203 Updated Dumps - Data Engineering On Microsoft Azure
No ratings yet
DP-203 Updated Dumps - Data Engineering On Microsoft Azure
60 pages
Hudson Clare Resume
No ratings yet
Hudson Clare Resume
2 pages
A Short Synthesis of Benzomorphane Analgesics ( ) - Metazocine and ( ) - Phenazocine
No ratings yet
A Short Synthesis of Benzomorphane Analgesics ( ) - Metazocine and ( ) - Phenazocine
7 pages
APA Referencing Summary
No ratings yet
APA Referencing Summary
27 pages
C8516-CLT-QMT-0002 Rev.02-Inspection Checklist For Precast Gully Installation
No ratings yet
C8516-CLT-QMT-0002 Rev.02-Inspection Checklist For Precast Gully Installation
1 page
TCPL New
No ratings yet
TCPL New
2 pages
SIST-EN-15085-3-2023
No ratings yet
SIST-EN-15085-3-2023
15 pages
Department of Education: Republic of The Philippines
No ratings yet
Department of Education: Republic of The Philippines
7 pages
Differential Amplifiers Problems of MOS
No ratings yet
Differential Amplifiers Problems of MOS
16 pages
Bank Secrecy Law
No ratings yet
Bank Secrecy Law
6 pages

I. INTRODUCTION applied to create a segmentation on the available data [4], [5].

VOLUME 8, 2020 171835

the IP address from which the session was established,

TABLE 2. Properties considered as characterizing user sessions.

TABLE 3. Normalized clustering results for k = 4.

TABLE 4. Definition of formulas used in validation queries.

TABLE 5. Results for the profile’s validation process.

The online analysis system can make a first initial prediction

VIII. INTEGRATION OF THE PREDICTION SYSTEM INTO

A. DESCRIPTION OF THE TECHNICAL PROPOSAL
