
2015 - A Location-Item-Time Sequential Pattern Mining Algorithm

This document presents a Location-Item-Time sequential pattern mining algorithm to discover visitors' spatial and temporal behavior patterns in a theme park. It then uses the frequent patterns to develop a route recommendation system that provides personalized route suggestions for visitors under their time, location and item constraints.


Knowledge-Based Systems 73 (2015) 97–110

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

A Location-Item-Time sequential pattern mining algorithm for route recommendation

Chieh-Yuan Tsai (a, b, *), Bo-Han Lai (a)
(a) Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan
(b) Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taiwan

Article info
Article history: Received 14 January 2014; received in revised form 5 September 2014; accepted 26 September 2014; available online 5 October 2014
Keywords: Recommendation systems; Sequential pattern; Sequence mining; Behavior computing; Theme park

Abstract
To survive in a rapidly changing environment, theme parks need to provide high-quality services that match visitor tastes and preferences. Understanding the spatial and temporal behavior of visitors could enhance attraction management and the geographical distribution of visitors. To fulfill this need, this research defines a Location-Item-Time (LIT) sequence to describe visitors' spatial and temporal behavior. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining algorithm is developed to discover frequent LIT sequential patterns. Next, a route suggestion procedure is proposed to retrieve suitable LIT sequential patterns for visitors under the constraints of their intended visiting time, favorite regions, and favorite recreation facilities. A simplified theme park is used as an example to show the feasibility of the proposed system. The experimental results show that the system can help managers understand visitors' behavior and provide appropriate visiting experiences for visitors.
© 2014 Elsevier B.V. All rights reserved.

* Corresponding author at: Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan.
E-mail address: [email protected] (C.-Y. Tsai).
http://dx.doi.org/10.1016/j.knosys.2014.09.012
0950-7051/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

A theme park is an aggregation of attractions including architecture, landscape, rides, shows, food services, costumed personnel and retail shops. Well-known examples include Disney World, Disneyland, Universal Studios and Six Flags. Although the theme park industry has enjoyed steady attendance growth in the past several decades, the theme park market has entered a mature stage and is no longer experiencing high growth [5,6]. To survive in a rapidly changing environment, theme parks need to provide high-quality services that match visitor tastes and preferences. Understanding the spatial and temporal behavior of visitors could enhance the management of attractions and help extend the geographical distribution of visitors within regions.

In the past decade, recommendation techniques have become a popular way of offering a variety of products, services and items to customers in the tourism industry [4,7,13]. Personalized tourism services aim to help users find what they are looking for by comparing the user profile with reference characteristics. Wang et al. [19] presented semantic web technologies for providing personalized access to digital museum collections. Niaraki and Kim [12] proposed a generic ontology-based architecture using a multi-criteria decision making technique to design a personalized route planning system. Schiaffino and Amandi [14] developed an expert software agent in the tourism and travel domain, named Traveler. This agent combines collaborative filtering with content-based recommendations and demographic information about customers to make recommendations. García-Crespo et al. [3] presented the SPETA system, which uses knowledge of the user's current location and preferences, as well as a history of past locations, to provide the type of recommendation services that tourists expect from a real tour guide. Tsai and Lo [17] took previous popular visiting behaviors as the foundation and developed a sequential pattern based route suggestion system to generate personalized tours. Tsai and Chung [16] developed a route recommendation system that provides personalized visiting routes for tourists in theme parks by considering a set of visiting sequences. Based on the retrieved visiting behavior data and the facility queuing situation, their system can generate a proper route suggestion for visitors.

The above recommendation systems have proven to be efficient tools, whether by designing user interfaces that interact smoothly with the environment, by providing convenient information query tools, or by suggesting a set of associated products (or services). However, they share three major problems. First, these systems simply return a set of suggested facilities (items) in a sequential order, but fail to illustrate the complete visiting path for visitors. For example, their systems might suggest that a visitor visit items k1, k4, and k8 in order (i.e., k1 → k4 → k8). However, the actual path
to complete the route should contain "by-pass items", such as k1 → k4 → k7 → k8, k1 → k4 → k6 → k8, or even k1 → k4 → k7 → k6 → k8. Without complete path information, a visitor might get confused and spend much more time finishing the route. Second, previous systems seldom take geographic constraints into consideration, so their suggested routes are often trivial and impractical. For example, previous studies might suggest to a visitor a long route k1 → k2 → k6 → k4 → k7 → k10 → k8 → k12. However, this route is trivial and hard to follow, since k1, k2, and k4 are in region A, k6, k7 and k8 are in region B, and k10 and k12 are in region C. In fact, a theme park consists of several regions, each containing dozens of facilities and shops. It would be more useful to give visitors a non-trivial suggestion such as A(k1, k4, k2) → B(k8, k6) → C(k10, k12). Third, previous studies seldom took time constraints into consideration when they provided route suggestions. For example, previous systems simply suggest a route such as k1 → k4 → k8. However, once the time intervals between items are revealed, this route becomes k1 → (1 h) → k4 → (1 h) → k8. If the intended visiting time of a visitor is 90 min, this suggestion is unacceptable, since the visitor cannot finish the route on time. On the other hand, if the intended visiting time is 300 min, this suggestion is not suitable either. Without the time intervals between items in the suggestion, tourists cannot tell whether they can complete the suggested route on time.

To solve the above problems, this research defines a Location-Item-Time (LIT) sequence to describe visitors' spatial and temporal behavior. To the best of our knowledge, this study is the first work to include location (region), item, and time-interval information simultaneously in a sequence. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to discover frequent LIT sequential patterns. Finally, the route suggestion procedure is proposed to retrieve suitable LIT sequential patterns under the constraints of the visitor's intended visiting time, favorite regions and the related visiting times, and favorite recreation facilities.

This paper is organized as follows. Section 2 reviews previous works related to sequential pattern mining and suggestion. Section 3 introduces the framework of the proposed route recommendation system. Section 4 demonstrates a case to show the feasibility of the proposed system. Finally, Section 5 summarizes the conclusions and points out possible future directions.

2. Literature review

Yavas et al. [20] proposed a three-phase mobility prediction algorithm for predicting user movement in a mobile computing system. Their algorithm enables the system to allocate resources to users efficiently and to produce more accurate answers to location-dependent queries that refer to future positions of mobile users. Cho et al. [2] proposed a sequential rule-based recommendation method that considers the evolution of customers' purchase sequences. The purchase transaction records of a customer over a certain period are used to build a customer profile. Then, a collaboration-based system finds a set of similar customers by calculating the correlations among customer profiles. Tan et al. [15] proposed a new approach to building a personalized recommendation system based on access sequential patterns, named the Frequent Accessed Sequence Tree (FAS-Tree). All frequent access sequential patterns are compressed into the FAS-Tree, which greatly reduces storage. During the personalized recommendation stage, it is only necessary to traverse sub-paths of the FAS-Tree referring to page views in the active window to find matching patterns, without the need to generate association rules. Yun and Chen [21] developed an algorithm for mining mobile sequential patterns that better reflects customer usage patterns in the mobile commerce environment by taking both the moving patterns (locations) and the purchase patterns (items) of customers into consideration. Tseng and Lin [18] proposed a novel data mining method, namely SMAP-Mine, that can efficiently discover mobile users' sequential movement patterns associated with requested services. Through empirical evaluation under various simulation conditions, SMAP-Mine is shown to deliver excellent performance in terms of accuracy, execution efficiency and scalability. Meanwhile, the proposed prediction strategies are also verified to be effective in terms of precision, hit ratio and applicability.

Li et al. [8] proposed a Multi-Stage Collaborative Filtering (MSCF) process to provide a location-aware event recommendation service in a mobile environment. The first stage of MSCF performs People-to-People Collaborative Filtering (P2P-CF), while Event-to-Event Collaborative Filtering (E2E-CF) discovers the sequential rules of event participation in the second stage. Liu and Chang [9] proposed a route recommendation system that guides the user through a series of locations. Their system used sequential pattern mining to extract popular route patterns from a large set of historical user route records. Then, the system recommends routes by matching the user's current route with the set of popular route patterns. Liu et al. [10] proposed a novel hybrid recommendation approach that combines the segmentation-based sequential rule (SSR) method with the segmentation-based KNN-CF (SKCF) method. To enhance the quality of product recommendations, their method considers customers' purchase sequences over time and their purchase data for the current period. Hung and Peng [6] proposed a Regression-based approach for mining User Movement Patterns (RUMP). The Large Sequence (LS) algorithm extracts the call detail records, and the Time Clustering (TC) algorithm determines the number of regression functions. Then, the Movement Function (MF) algorithm generates the movement function representing the movement patterns of mobile users. Lu et al. [11] proposed a hybrid semantic recommendation approach that integrates item-based CF similarity with item-based semantic similarity techniques. The hybrid semantic recommendation approach has been implemented in an Intelligent Business Partner Locator recommendation system prototype named BizSeeker. Similarly, Zhang et al. [22] developed a hybrid recommendation approach that combines user-based and item-based collaborative filtering techniques with fuzzy set techniques and a knowledge base for mobile product and service recommendation. The approach is implemented in a personalized recommender system for telecom products/services called FTCP-RS. Although the above sequential pattern algorithms are efficient in their respective environments, none of them takes location, item, and time-interval information into consideration at the same time.

3. Research method

3.1. Environment assumption and system overview

Typically, a theme park is divided into several regions, and each region contains a set of recreation facilities. It is assumed that each region is fully covered by RFID readers. In addition, RFID readers are installed at the entrance of each recreation facility and at the entrance and exit of the park. When a visitor wearing an RFID-tagged wristband enters a region or the entrance of a facility, the RFID readers record the RFID tag code, region id, facility id, and time into a route database. The recording process continues until the visitor leaves the park. Let's take the layout in Fig. 1 as an example. At timestamp t1, a visitor passes the entrance k11 of the park in region B. Then, she moves to region A at timestamp t2, region F at timestamp t3, and region G at timestamp t4. In region G, she takes facility k1. After that, she moves to region K at timestamp t5 and region O at timestamp t6. In region O, she takes facilities k2 and k3. The recording process continues until she leaves the park from the exit
k12 in region B. Finally, the route sequence <(B, t1, {k11}), (A, t2, ∅), (F, t3, ∅), (G, t4, {k1}), (K, t5, ∅), (O, t6, {k2, k3}), (K, t7, ∅), (G, t8, ∅), (B, t9, {k12})> is collected and stored in the route database.

Fig. 1. An illustrative example for route sequence generation.

Whenever a visitor wants to request a route suggestion, he/she can go to a kiosk machine in the park and input his/her preference information into the route recommendation system. The preference includes the intended total visiting time, favorite regions, intended visiting time in the favorite regions, and favorite recreation facilities. The route recommendation system consists of two major modules. The first module generates a set of frequent Location-Item-Time (LIT) sequential patterns from the route database using the proposed Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure. The second module evaluates the similarity between the visitor's preference and the candidate LIT routes and retrieves the top-ranking routes for the visitor. The framework of the proposed system is shown in Fig. 2.

3.2. Location-Item-Time (LIT) sequential patterns

Let N = {n1, n2, ..., ng} be the set of cells (regions) in the theme park and K = {k1, k2, ..., kh} be the set of items (facilities, entrance, and exit). In the route database RD, a record is represented by <sid, rs>, where sid is the identifier of the record and rs is a route sequence. Formally, rs is represented as <(B1, t1, itemset1), (B2, t2, itemset2), ..., (Bn, tn, itemsetn)>, where (Bi, ti, itemseti) is an event; Bi is the visited region and Bi ∈ N; ti is the timestamp at which region Bi is first entered, and t_(i−1) ≤ ti for 2 ≤ i ≤ n; itemseti is the set of items visited in region Bi and itemseti ⊆ K. Without timestamp information, <Bi, itemseti> is called a transaction if itemseti is a non-empty set.

Definition 1. A transaction pattern is defined as <Bi; z>, where z is a non-empty subset of itemseti. A transaction pattern <Bi; z> is called a k-transaction pattern if the length of z is k.

Example 1. There are two route sequences, sid 300 and sid 600, in the route database RD shown in Table 1. Six itemsets {k11}, {k1}, {k3}, {k4}, {k5}, and {k12} can be found in sid 300, while four itemsets {k11}, {k1}, {k2, k3}, and {k12} can be found in sid 600. Therefore, the transaction patterns <B; {k11}>, <G; {k1}>, <O; {k3}>, <L; {k4}>, <Q; {k5}>, and <B; {k12}> can be extracted from sid 300. Similarly, the transaction patterns <B; {k11}>, <G; {k1}>, <O; {k2, k3}>, <O; {k2}>, <O; {k3}>, and <B; {k12}> can be extracted from sid 600. Altogether, seven 1-transaction patterns, <B; {k11}>, <G; {k1}>, <O; {k3}>, <L; {k4}>, <Q; {k5}>, <B; {k12}> and <O; {k2}>, and one 2-transaction pattern, <O; {k2, k3}>, are obtained.

Let Δt = t_(i+1) − t_i be the time interval between two successive events, where 1 ≤ i ≤ n − 1, and let T_c, 1 ≤ c ≤ r, be a set of given constants. Then the time interval Δt can be mapped to one of the elements of the set of discrete time intervals TI = {I1, I2, ..., Ir} by

$$
\mathrm{DiscTI}(\Delta t) =
\begin{cases}
I_1, & \text{if } 0 < \Delta t \le T_1,\\
I_j, & \text{if } T_{j-1} < \Delta t \le T_j \text{ for } 1 < j \le r.
\end{cases}
\tag{1}
$$

For example, assume T1 = 10, T2 = 20, T3 = 30, T4 = 40, T5 = 50, and T6 = 60. Then the set of discrete time intervals is TI = {I1, I2, I3, I4, I5, I6}, where I1: 0 < Δt ≤ 10, I2: 10 < Δt ≤ 20, I3: 20 < Δt ≤ 30, I4: 30 < Δt ≤ 40, I5: 40 < Δt ≤ 50, and I6: 50 < Δt ≤ 60.

Definition 2. Let C = {c1, c2, ..., cn} be the set of transaction patterns and TI = {I1, I2, ..., Ir} be the set of discrete time intervals. A sequence β = (D1, e1, D2, e2, ..., D_(q−1), e_(q−1), D_q) is a Location-Item-Time (LIT) sequence if D_s ∈ C for 1 ≤ s ≤ q and e_s ∈ TI for 1 ≤ s ≤ q − 1.
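Because the bounds T1 < T2 < ... < Tr are sorted, Eq. (1) reduces to a binary search. The following minimal Python sketch assumes that interval Ij is represented by the integer j and that a gap outside (0, Tr] is left unmapped, since Eq. (1) does not assign it to any interval; the name disc_ti is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence

# Upper bounds T1..T6 from the example above (minutes).
T: Sequence[int] = (10, 20, 30, 40, 50, 60)


def disc_ti(dt: float, bounds: Sequence[float] = T) -> Optional[int]:
    """Map a gap dt to the index j of interval I_j (Eq. (1)).

    I_1 is 0 < dt <= T_1 and I_j is T_(j-1) < dt <= T_j. Returning None
    for dt <= 0 or dt > T_r is an assumption of this sketch.
    """
    if dt <= 0 or dt > bounds[-1]:
        return None
    # bisect_left returns the first position whose bound is >= dt,
    # which makes every interval closed on the right, as in Eq. (1).
    return bisect_left(bounds, dt) + 1


# Gaps between successive events of route sequence sid 600 in Table 1.
times = [7, 8, 21, 30, 41, 44, 51, 54, 58]
print([disc_ti(b - a) for a, b in zip(times, times[1:])])  # prints [1, 2, 1, 2, 1, 1, 1, 1]
```

A gap of exactly 10 min falls in I1 and a gap of 11 min in I2, matching the right-closed intervals listed above.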
Fig. 2. Two modules in the proposed route recommendation system. (Diagram labels: First Module: Route database; Location-Item-Time (LIT) mining procedure; The set of LIT sequential patterns. Second Module: Visitors input preferences; Route recommendation procedure; Return suitable route suggestion.)

3.3. Location-Item-Time mining procedure

Similar to the work of Yun and Chen [21], the proposed LIT sequential pattern mining method consists of three phases: the large-transaction generation phase, the large-transaction transformation phase, and the LIT sequential pattern generation phase.

3.3.1. Large-transaction generation phase

The large-transaction generation phase generates the large transactions from the route database RD. Fig. 3 shows the pseudo-code of the large-transaction generation algorithm. This algorithm consists of two main steps. The first step (lines 1–11) derives all k-transaction patterns from RD according to Definition 1 and calculates the support count of each k-transaction pattern. The second step (lines 12–16) finds the set of large k-transaction patterns: if the support count of a k-transaction pattern is greater than or equal to the user-specified minimum support count (called min_sup_count), the k-transaction pattern is called a large k-transaction pattern. Next, the itemsets in all large k-transaction patterns are replaced by unique symbols. The set of all large k-transaction patterns after symbol replacement is called the set of large 1-sequential patterns.

Table 1
A simple route database, RD.

Sid Route sequence


300 <(B, 8, {k11}), (G, 9, {k1}), (F, 11, ∅), (K, 24, ∅), (O, 25, {k3}), (P, 35, ∅), (L, 37, {k4}), (Q, 39, {k5}), (M, 40, ∅), (H, 45, ∅), (D, 46, ∅), (C, 51, ∅), (B, 54, {k12})>
600 <(B, 7, {k11}), (A, 8, ∅), (F, 21, ∅), (G, 30, {k1}), (K, 41, ∅), (O, 44, {k2, k3}), (K, 51, ∅), (G, 54, ∅), (B, 58, {k12})>

Fig. 3. Pseudo-code of large-transaction generation algorithm.

Fig. 4. Six route sequences in the RD.

Example 2. Let's take the six route sequences in Fig. 4 as an example to explain the large-transaction generation phase. If min_sup_count is set to 2, 12 candidate 1-transaction patterns and 2 candidate 2-transaction patterns are found, as shown in Fig. 5(a). If the sup_count of a transaction pattern is less than min_sup_count, the transaction pattern is deleted. Therefore, the transaction patterns <P; {k9}>, <Q; {k6}>, and <Q; {k5, k6}> are deleted. Next, the itemsets in all large k-transaction patterns are replaced by unique symbols, as shown in Fig. 5(b). The set of all large k-transaction patterns after symbol replacement, called the large 1-sequential patterns, is summarized in Fig. 5(c).
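The two steps of the generation algorithm (Fig. 3 is not reproduced here) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pseudo-code: the event layout (region, timestamp, itemset), the function names, the rule that a pattern counts at most once per record, and the order in which the symbols g1, g2, ... are handed out are all assumptions. Because it runs on the two route sequences of Table 1 rather than the six sequences of Fig. 4, the symbols it assigns do not coincide with those of Fig. 5.

```python
from collections import Counter
from itertools import combinations

# Route database RD of Table 1: sid -> list of events (region, timestamp, itemset).
# An empty itemset (written ∅ in Table 1) is an empty frozenset.
E = frozenset()
RD = {
    300: [("B", 8, frozenset({"k11"})), ("G", 9, frozenset({"k1"})), ("F", 11, E),
          ("K", 24, E), ("O", 25, frozenset({"k3"})), ("P", 35, E),
          ("L", 37, frozenset({"k4"})), ("Q", 39, frozenset({"k5"})), ("M", 40, E),
          ("H", 45, E), ("D", 46, E), ("C", 51, E), ("B", 54, frozenset({"k12"}))],
    600: [("B", 7, frozenset({"k11"})), ("A", 8, E), ("F", 21, E),
          ("G", 30, frozenset({"k1"})), ("K", 41, E),
          ("O", 44, frozenset({"k2", "k3"})), ("K", 51, E), ("G", 54, E),
          ("B", 58, frozenset({"k12"}))],
}


def transaction_patterns(route):
    """Step 1: every transaction pattern <B; z> of one route (Definition 1)."""
    patterns = set()
    for region, _t, itemset in route:
        items = sorted(itemset)
        for k in range(1, len(items) + 1):          # k-transaction patterns
            for z in combinations(items, k):
                patterns.add((region, frozenset(z)))
    return patterns


def large_transactions(rd, min_sup_count):
    """Step 2: keep patterns whose support count reaches min_sup_count and
    give each large pattern a unique symbol g1, g2, ..."""
    support = Counter()
    for route in rd.values():
        # One set per record, so a pattern counts at most once per route.
        support.update(transaction_patterns(route))
    large = [p for p, count in support.items() if count >= min_sup_count]
    # Hypothetical numbering rule: shorter itemsets first, then region, then items.
    large.sort(key=lambda p: (len(p[1]), p[0], sorted(p[1])))
    symbols = {p: f"g{i}" for i, p in enumerate(large, start=1)}
    return symbols, support


symbols, support = large_transactions(RD, min_sup_count=2)
for (region, itemset), g in symbols.items():
    print(f"<{region}; {sorted(itemset)}> -> {g} (sup_count {support[(region, itemset)]})")
```

On Table 1 the sketch finds the eight transaction patterns of Example 1 and keeps <B; {k11}>, <B; {k12}>, <G; {k1}> and <O; {k3}>, each with support count 2; <O; {k2, k3}> and the remaining patterns occur in only one record.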

(a) Candidate transaction patterns (* marks a pattern deleted because its sup_count is below min_sup_count)

Candidate 1-transaction patterns
Cell  Itemset   Sup_count
G     {k1}      6
O     {k2}      3
O     {k3}      6
*P    {k9}      1
L     {k4}      5
Q     {k5}      3
*Q    {k6}      1
M     {k7}      2
U     {k8}      2
R     {k10}     2
B     {k11}     6
B     {k12}     6

Candidate 2-transaction patterns
Cell  Itemset   Sup_count
*Q    {k5, k6}  1
O     {k2, k3}  2

(b) Large transactions with their unique symbols and (c) large 1-sequential patterns
Cell  Itemset   Large transaction  Large 1-sequential pattern  Sup_count
G     {k1}      {G; g1}            <G; g1>                     6
O     {k2}      {O; g2}            <O; g2>                     3
O     {k3}      {O; g3}            <O; g3>                     6
L     {k4}      {L; g4}            <L; g4>                     5
Q     {k5}      {Q; g5}            <Q; g5>                     3
M     {k7}      {M; g6}            <M; g6>                     2
U     {k8}      {U; g7}            <U; g7>                     2
R     {k10}     {R; g8}            <R; g8>                     2
B     {k11}     {B; g9}            <B; g9>                     6
B     {k12}     {B; g10}           <B; g10>                    6
O     {k2, k3}  {O; g11}           <O; g11>                    2

Fig. 5. Large 1-sequential patterns.
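With the symbol table of Fig. 5(b) available, the large-transaction transformation phase described in Section 3.3.2 can be sketched in Python as follows. It is a minimal sketch, not the pseudo-code of Fig. 6: the tuple layout, the name transform, the reading of "has been visited" as "already appears in String", and treating an event whose items contain no large transaction like an empty event are assumptions, chosen so that the results agree with Tables 2 and 3.

```python
from itertools import combinations

# Large 1-sequential patterns of Fig. 5(b): (region, itemset) -> unique symbol.
SYMBOLS = {
    ("G", frozenset({"k1"})): "g1", ("O", frozenset({"k2"})): "g2",
    ("O", frozenset({"k3"})): "g3", ("L", frozenset({"k4"})): "g4",
    ("Q", frozenset({"k5"})): "g5", ("M", frozenset({"k7"})): "g6",
    ("U", frozenset({"k8"})): "g7", ("R", frozenset({"k10"})): "g8",
    ("B", frozenset({"k11"})): "g9", ("B", frozenset({"k12"})): "g10",
    ("O", frozenset({"k2", "k3"})): "g11",
}

E = frozenset()
# Route sequence sid 600 of Table 1 as (region, timestamp, itemset) events.
ROUTE_600 = [("B", 7, frozenset({"k11"})), ("A", 8, E), ("F", 21, E),
             ("G", 30, frozenset({"k1"})), ("K", 41, E),
             ("O", 44, frozenset({"k2", "k3"})), ("K", 51, E), ("G", 54, E),
             ("B", 58, frozenset({"k12"}))]


def transform(route, symbols):
    """Return (maximal large-transaction sequence, path) for one route."""
    string, ml, path = [], [], []  # String, ML and Path of Section 3.3.2
    for region, t, itemset in route:
        items = sorted(itemset)
        # Symbols of every non-empty subset z of the itemset that is large.
        gs = tuple(symbols[(region, frozenset(z))]
                   for k in range(1, len(items) + 1)
                   for z in combinations(items, k)
                   if (region, frozenset(z)) in symbols)
        if gs:
            ml.append(((region, gs), t))
            path.extend(string)      # flush the buffered string into Path
            string = [region]
        elif region not in string:
            string.append(region)    # region not yet in the buffer
        else:
            del string[string.index(region) + 1:]  # drop everything after it
    return ml, "".join(path + string)


ml, path = transform(ROUTE_600, SYMBOLS)
print(ml)
print(path)
```

For sid 600 the sketch yields the maximal large-transaction sequence and the path BAFGKOKGB shown in the last row of Table 2. Regions are single letters in this example, so joining them reproduces the path strings of Table 3.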

3.3.2. Large-transaction transformation phase

The large-transaction transformation phase transforms route sequences into maximal large-transaction sequences. Fig. 6 shows the pseudo-code of the large-transaction transformation algorithm. Lines 2–4 initialize the variables String, ML, and Path as empty values. String is a temporary variable storing the on-going string in a buffer; ML represents the on-going maximal large-transaction sequence; Path represents the on-going path of the maximal large-transaction sequence. For each event (Bi, ti, itemseti) in route sequence rs, itemseti might be non-empty or empty. If itemseti is non-empty (lines 7–14), the algorithm checks whether <Bi, z> exists in the set of large 1-sequential patterns C′, where z is a non-empty subset of itemseti. If it does exist in C′, the algorithm appends its unique symbol g to Gi. After all z are checked, <(Bi, Gi), ti> is appended to ML and String is appended to Path. Finally, the algorithm sets String to <Bi>. If itemseti is empty (lines 16–20), the algorithm checks whether <Bi> has already been visited, i.e., whether it already appears in String. If <Bi> has not been visited, the algorithm appends <Bi> to String. Otherwise, everything after the first <Bi> in String is deleted. Through this phase, each record of the form <sid, rs> in RD is transformed into the form <sid, maximal large-transaction sequence, path>, which is stored in the transformed route database TRD.

Example 3. Based on the large 1-sequential patterns shown in Fig. 5, Table 2 illustrates the operations of the large-transaction transformation phase for route sequence sid 600. The first column is the sequence of movements, the second column is the visited region, the third column is the visiting time, and the fourth column lists the recreation facilities used by the visitor. The fifth column gives the on-going large transaction in the buffer and the sixth column gives the on-going string in the buffer. The seventh column shows the maximal large-transaction sequence and the eighth column shows its path. After this series of transformations, the maximal large-transaction sequence for sid 600 becomes <<(B; g9), 7>, <(G; g1), 30>, <(O; g2, g3, g11), 44>, <(B; g10), 58>> and its path is BAFGKOKGB. Through the same process, all route sequences in the RD of Fig. 4 are transformed into the maximal large-transaction sequences in the TRD shown in Table 3.

3.3.3. Location-Item-Time sequential pattern generation phase

Next, a LIT sequential pattern algorithm is developed to generate all large LIT sequential patterns from the TRD. Similar to Chen et al. [1], the proposed LIT sequential pattern algorithm, called the LIT-PrefixSpan algorithm, is based on the PrefixSpan mining concept. Before introducing the LIT-PrefixSpan algorithm, the following definitions are given.

Definition 3. For a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2), t2>, ..., <(Bn; zn), tn>) and a Location-Item-Time (LIT) sequence β = (D1, e1, D2, e2, ..., D_(q−1), e_(q−1), D_q), β is said to be contained in α, or β is a LIT subsequence of α, if there exist integers 1 ≤ j1 < j2 < ... < jq ≤ n such that
1. D1 = (B_j1; z_j1), D2 = (B_j2; z_j2), ..., Dq = (B_jq; z_jq);
2. t_ji − t_j(i−1) satisfies the condition of time interval e_(i−1) for 2 ≤ i ≤ q.

Definition 4. support_count_TRD(α) = |{(sid, maximal large-transaction sequence, path) | (sid, maximal large-transaction sequence, path) ∈ TRD ∧ α is contained in the maximal large-transaction sequence}|. A LIT sequence α is called a LIT sequential pattern if the percentage of records in TRD containing α is greater than or equal to a pre-defined minimum support, called min_sup. That is, α is a LIT sequential pattern in TRD if support_count_TRD(α) ≥ |TRD| × min_sup or, with an absolute threshold, support_count_TRD(α) ≥ min_sup_count. A LIT sequence of length l is denoted as an l-LIT sequence.

Definition 5. Given a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2), t2>, ..., <(Bn; zn), tn>) and a LIT sequence β = (D1, e1, D2, e2, ..., D_(q−1), e_(q−1), D_q) (q ≤ n), β is a LIT prefix of α if and only if (1) D_i = (B_i; z_i) for 1 ≤ i ≤ q; and (2) t_i − t_(i−1) satisfies the condition of e_(i−1) for 1 < i ≤ q.

Definition 6. Given a maximal large-transaction sequence α = (<(B1; z1), t1>, <(B2; z2), t2>, ..., <(Bn; zn), tn>) and a LIT sequence β = (D1, e1, D2, e2, ..., D_(q−1), e_(q−1), D_q) (q ≤ n) such that β is a subsequence of α, let i1 < i2 < ... < iq be the indexes of the large-transaction patterns in α that match the large-transaction patterns of β. A subsequence α′ = (<(B′1; z′1), t′1>, <(B′2; z′2), t′2>, ..., <(B′p; z′p), t′p>) of α, where p = q + n − iq, is called a projection of α with respect to β if and only if (1) β is a LIT prefix of α′ and (2) the last n − iq
large-transaction patterns of α′ are the same as the last n − iq large-transaction patterns of α.

Definition 7. Let α′ = (<(B′1; z′1), t′1>, <(B′2; z′2), t′2>, ..., <(B′p; z′p), t′p>) be the projection of α with respect to a LIT prefix β = (D1, e1, D2, e2, ..., D_(q−1), e_(q−1), D_q). Then γ = (<(B′_(q+1); z′_(q+1)), t′_(q+1)>, <(B′_(q+2); z′_(q+2)), t′_(q+2)>, ..., <(B′p; z′p), t′p>) is the postfix of α with respect to prefix β.

Fig. 6. Pseudo-code of large-transaction transformation algorithm.

Table 2
Process of producing the maximal large-transaction sequence for sid 600.

Move Cell Time Items Large-transaction String Maximal large-transaction sequence Path
1 B 7 k11 <B; g9> B <(B; g9), 7> –
2 A 8 – – BA – –
3 F 21 – – BAF – –
4 G 30 k1 <G; g1> G <(B; g9), 7>, <(G; g1), 30> BAF
5 K 41 – – GK – –
6 O 44 k2, k3 <O; g2, g3, g11> O <(B; g9), 7>, <(G; g1), 30>, <(O; g2, g3, g11), 44> BAFGK
7 K 51 – – OK – –
8 G 54 – – OKG – –
9 B 58 k12 <B; g10> B <(B; g9), 7>, <(G; g1), 30>, <(O; g2, g3, g11), 44>, <(B; g10), 58> BAFGKOKGB

Table 3
Transformed route database, TRD.

Sid Maximal large-transaction sequence Path
100 <(B; g9), 4>, <(G; g1), 5>, <(O; g2, g3, g11), 14>, <(L; g4), 20>, <(Q; g5), 22>, <(M; g6), 38>, <(U; g7), 52>, <(B; g10), 60> BGKOTPLQRMQVUPKGB
200 <(B; g9), 1>, <(G; g1), 20>, <(O; g3), 39>, <(L; g4), 46>, <(Q; g5), 47>, <(M; g6), 50>, <(B; g10), 60> BAFGKOTPLQRMIHCB
300 <(B; g9), 8>, <(G; g1), 9>, <(O; g3), 25>, <(L; g4), 37>, <(Q; g5), 39>, <(B; g10), 54> BGFKOPLQMHDCB
400 <(B; g9), 2>, <(G; g1), 7>, <(O; g3), 17>, <(L; g4), 27>, <(R; g8), 46>, <(U; g7), 53>, <(O; g2), 56>, <(B; g10), 60> BFGKOTPLQRQVUTOKGB
500 <(B; g9), 1>, <(G; g1), 2>, <(O; g3), 14>, <(L; g4), 19>, <(R; g8), 40>, <(B; g10), 60> BGKOTPLQRNIDCB
600 <(B; g9), 7>, <(G; g1), 30>, <(O; g2, g3, g11), 44>, <(B; g10), 58> BAFGKOKGB

The pseudo-code of the proposed LIT-PrefixSpan algorithm is illustrated in Fig. 7. The α-projected database, defined as the collection of postfixes of the maximal large-transaction sequences in TRD with respect to α, is denoted as TRD|α. The major difference between LIT-PrefixSpan and I-PrefixSpan is that LIT-PrefixSpan includes both cells and items in the transaction pattern. Therefore, a table LIT_Table is used to store this type of relation, where a column corresponds to a large-transaction pattern and a row corresponds to a time interval in TI = {I1, I2, ..., Ir}. Each cell LIT_Table(Ii, c′i) records the number of transactions in TRD|α that contain the transaction pattern c′i and for which the time difference between this transaction pattern and the last transaction pattern of α lies within Ii. Processing every transaction in TRD|α sequentially enables LIT_Table to be formed and the frequent cells to be
C.-Y. Tsai, B.-H. Lai / Knowledge-Based Systems 73 (2015) 97–110 103

identified. If the cell LIT_Table(Ii, c0i ) is a frequent cell, (Ii, c0i ) can be the intended visiting time in FRi. For example,
appended to a to yield a LIT sequential pattern a0 , and to construct VP = <420, <G, {k1}, 90>, <O, {k2, k3}, 120>> indicates that a visitor
the α′-projected database TRD|α′. Recursively discovering the LIT sequential patterns in TRD|α′ finally yields all LIT sequential patterns in the TRD.

Fig. 7. Pseudo-code of the LIT-PrefixSpan algorithm.

Example 4. Suppose TI = {I1, I2, I3, I4, I5, I6}, where I1: 0 < Δt ≤ 10, I2: 10 < Δt ≤ 20, I3: 20 < Δt ≤ 30, I4: 30 < Δt ≤ 40, I5: 40 < Δt ≤ 50, and I6: 50 < Δt ≤ 60. Consider the TRD shown in Table 3 with min_sup_count set to 2. At the beginning, α is empty, and the frequent transaction patterns <B; g9>, <G; g1>, <O; g2>, <O; g3>, <O; g11>, <L; g4>, <Q; g5>, <M; g6>, <U; g7>, <R; g8>, and <B; g10> are discovered. Appending each of these frequent transaction patterns to the empty α yields 11 different α′. Table 4 summarizes the LIT sequential pattern mining result. The total number of LIT sequential patterns is 68 (= 11 + 25 + 23 + 8 + 1), since there are 11 1-LIT sequential patterns, 25 2-LIT sequential patterns, and so on.

3.4. Route recommendation procedure

When a visitor requests a route suggestion, he/she is asked to enter his/her personal preferences into the route recommendation system at the kiosk. The visitor's preferences can be represented as a VP vector:

$$ VP = \langle ITVT, \langle FR_1, FItems_1, IRVT_1 \rangle, \langle FR_2, FItems_2, IRVT_2 \rangle, \ldots \rangle \qquad (2) $$

where ITVT is the intended total visiting time, FRi is the i-th favorite region, FItemsi is the set of favorite facilities in FRi, and IRVTi is the intended visiting time for region i. … intends to spend 420 min in the theme park. In addition, he/she would like to spend 90 min in region G and take recreation facility k1 there, and 120 min in region O and take recreation facilities k1 and k3 there. Note that the more information a visitor enters, the more satisfactory the suggestion he/she obtains.

3.4.1. Time constraint

The number of LIT sequential patterns generated by the LIT-PrefixSpan algorithm might be large. However, a LIT sequential pattern is a candidate LIT route only if it satisfies the following rules. First, the pattern should include the entrance and the exit. Second, it should satisfy the time constraint provided by the visitor. As mentioned in Section 3.2, a time interval Δt can be transformed into one of the elements of the set of discrete time intervals TI = {I1, I2, ..., Ir} according to Eq. (1). Therefore, the lower and upper bounds of a time interval Ij are derived using Eqs. (3) and (4), respectively:

$$ f_{LB}(I_j) = \begin{cases} 0, & \text{if } j = 1 \\ T_{j-1}, & \text{if } 1 < j \le r \end{cases} \qquad (3) $$

$$ f_{UB}(I_j) = \begin{cases} T_1, & \text{if } j = 1 \\ T_j, & \text{if } 1 < j \le r \end{cases} \qquad (4) $$

Let a LIT sequential pattern β be represented as (Δ1, e1, Δ2, e2, ..., Δq−1, eq−1, Δq). The total visiting time of β can then be represented as the interval VT^β = (VT^β_LB, VT^β_UB].
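The interval bookkeeping of this subsection can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' C# implementation: it assumes that TI is described by its upper boundaries T1 < T2 < … < Tr with T0 = 0, so that Ij = (Tj−1, Tj] (our reading of Eq. (1), which is defined in Section 3.2), and that the time intervals e1, …, eq−1 of a pattern are stored as their 1-based ranks. All function names are hypothetical. `f_lb` and `f_ub` follow Eqs. (3) and (4); summing them over the gaps of a pattern gives the range (VT^β_LB, VT^β_UB] that Eqs. (5) and (6) formalize, and a pattern passes the time constraint when ITVT lies inside that range.

```python
# Minimal sketch of Eqs. (3)-(6); all helper names are hypothetical.
# Assumption: TI is given by upper boundaries T1 < T2 < ... < Tr with T0 = 0,
# so interval I_j covers (T_{j-1}, T_j].


def interval_rank(dt: float, bounds: list[float]) -> int:
    """Return the 1-based rank j of the interval I_j that contains dt."""
    lower = 0.0
    for j, upper in enumerate(bounds, start=1):
        if lower < dt <= upper:
            return j
        lower = upper
    raise ValueError(f"duration {dt} lies outside TI")


def f_lb(j: int, bounds: list[float]) -> float:
    """Eq. (3): lower bound of I_j (0 for j = 1, otherwise T_{j-1})."""
    return 0.0 if j == 1 else float(bounds[j - 2])


def f_ub(j: int, bounds: list[float]) -> float:
    """Eq. (4): upper bound of I_j (T_1 for j = 1, otherwise T_j)."""
    return float(bounds[j - 1])


def visiting_time_range(gaps: list[int], bounds: list[float]) -> tuple[float, float]:
    """Sum the per-gap bounds over e_1..e_{q-1}, as in Eqs. (5) and (6)."""
    lb = sum(f_lb(j, bounds) for j in gaps)
    ub = sum(f_ub(j, bounds) for j in gaps)
    return lb, ub


def meets_time_constraint(gaps: list[int], bounds: list[float], itvt: float) -> bool:
    """Keep a pattern only when VT_LB <= ITVT <= VT_UB."""
    lb, ub = visiting_time_range(gaps, bounds)
    return lb <= itvt <= ub


if __name__ == "__main__":
    ti_bounds = [30, 60, 90, 120]          # I1..I4 with 30-min steps
    gaps = [1, 3, 4, 1, 2]                 # e = I1, I3, I4, I1, I2
    print(visiting_time_range(gaps, ti_bounds))         # (180.0, 330.0)
    print(meets_time_constraint(gaps, ti_bounds, 320))  # True
    print(interval_rank(70, ti_bounds))                 # 3
```

With the 30-min intervals that Example 5 uses, a pattern whose gaps are I1, I3, I4, I1, I2 receives the range (180, 330], so a visitor with ITVT = 320 min keeps it; the entrance/exit rule is a separate check that this sketch does not cover.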



The lower bound of VT^β is derived as:

$$ VT^{\beta}_{LB} = \sum_{s=1}^{q-1} f_{LB}(e_s) \qquad (5) $$

and the upper bound of VT^β is defined as:

$$ VT^{\beta}_{UB} = \sum_{s=1}^{q-1} f_{UB}(e_s) \qquad (6) $$

If VT^β_LB ≤ ITVT and VT^β_UB ≥ ITVT, we say that the LIT sequential pattern β satisfies the visitor's time constraint, where ITVT is the visitor's intended total visiting time in Eq. (2).

Table 4
LIT sequential pattern mining result.

k | Number of k-LIT sequential patterns | LIT sequential pattern | Sup_Count
1 | 11 | <B; g9> | 6
  |    | <G; g1> | 6
  |    | … | …
  |    | <B; g10> | 6
2 | 25 | <B; g9>, I6, <B; g10> | 5
  |    | <B; g9>, I2, <O; g3> | 3
  |    | … | …
  |    | <R; g8>, I2, <B; g10> | 2
3 | 23 | <B; g9>, I1, <G; g1>, I6, <B; g10> | 3
  |    | <B; g9>, I1, <G; g1>, I2, <O; g3> | 2
  |    | … | …
  |    | <L; g4>, I1, <O; g5>, I2, <B; g10> | 2
4 | 8  | <B; g9>, I1, <G; g1>, I4, <R; g8>, I2, <B; g10> | 2
  |    | <B; g9>, I1, <G; g1>, I1, <O; g3>, I4, <U; g7> | 2
  |    | … | …
  |    | <G; g1>, I3, <L; g4>, I1, <Q; g5>, I2, <B; g10> | 2
5 | 1  | <B; g9>, I1, <G; g1>, I1, <O; g3>, I4, <U; g7>, I1, <B; g10> | 2

Example 5. Suppose TI = {I1, I2, I3, I4}, where I1: 0 < Δt ≤ 30, I2: 30 < Δt ≤ 60, I3: 60 < Δt ≤ 90, and I4: 90 < Δt ≤ 120, and five LIT sequential patterns are shown in the first three columns of Table 5. According to Eqs. (3)–(6), the total visiting time of each pattern can be derived, as shown in the last column of Table 5. For instance, pattern 3 has the time intervals I1, I3, I4, I1, and I2, so its total visiting time is (0 + 60 + 90 + 0 + 30, 30 + 90 + 120 + 30 + 60] = (180, 330]. If a visitor's intended total visiting time ITVT is 320 min, LIT sequential patterns 1, 2, and 3 are considered candidate routes, since LIT sequential patterns 4 and 5 do not satisfy the visitor's time constraint.

3.4.2. Similarity measurement

The similarity between VP = <ITVT, <FR1, FItems1, IRVT1>, <FR2, FItems2, IRVT2>, ...> and a candidate LIT route β = ((B1; z1), e1, (B2; z2), e2, ..., (Bq−1; zq−1), eq−1, (Bq; zq)) is evaluated based on the following concepts. First, the intended visiting time for region i, IRVTi, in VP is mapped to one of the elements of TI = {I1, I2, ..., Ir} according to Eq. (1) for all i. Second, when conducting the similarity evaluation, <FRi, FItemsi, IRVTi> in VP and <(Bj; zj), ej> in β are treated as comparison units. Third, the similarity between <FItemsi, IRVTi> and <zj, ej> is evaluated only if FRi and Bj are the same region. Based on these concepts, the similarity between the i-th unit in VP and the j-th unit in β is defined as:

$$ Sim_{i,j} = \begin{cases} w_1 \cdot 1 + w_2 \cdot ISim(i,j) + w_3 \cdot TSim(i,j), & \text{if } FR_i = B_j \\ 0, & \text{if } FR_i \neq B_j \end{cases} \qquad (7) $$

where w1, w2, and w3 are the important degrees for the region, facility, and time-interval considerations, respectively, and w1 + w2 + w3 = 1. ISim(i, j) is the itemset similarity between FItemsi and zj, which is defined as:

$$ ISim(i,j) = \frac{|FItems_i \cap z_j|}{|FItems_i|} \qquad (8) $$

where ∩ is the set intersection operator and |·| is the cardinality of a set. In addition, TSim(i, j) is the time-interval similarity between IRVTi and ej, which is defined as:

$$ TSim(i,j) = 1 - \frac{|f(IRVT_i) - f(e_j)|}{f(I_r)} \qquad (9) $$

where |·| is the absolute value operator and f(Ib) is the rank of the time interval Ib in TI, defined as f(Ib) = b for b = 1, ..., r (IRVTi is first mapped to its interval in TI). With Eq. (7), the similarity between VP and β is defined as:

$$ Sim(VP, \beta) = \frac{1}{|VP|} \sum_{i=1}^{|VP|} \sum_{j=1}^{|\beta|} Sim_{i,j} \qquad (10) $$

where |·| is the length of a sequence (for VP, the number of <FRi, FItemsi, IRVTi> units). After the similarities between VP and all candidate routes are derived, the routes are sorted in decreasing order of similarity and returned to the kiosk machine as suggested routes. If more than one candidate route has the same similarity value, the route with the larger total number of facilities is ranked higher.

Example 6. Assume that LIT sequential patterns 1, 2, and 3 in Example 5 are candidate LIT routes and that the preference of a visitor is VP = <300, <O, {k3}, 70>, <Q, {k5, k6}, 100>>. According to the discrete time-interval definition in Example 5, VP is transformed into <300, <O, {k3}, I3>, <Q, {k5, k6}, I4>>. With w1 = w2 = w3 = 1/3 and the units of candidate LIT route #1 between the entrance and the exit indexed j = 1, ..., 4, we have Sim1,1 = 1/3 + 1/3 × (1/1) + 1/3 × (1 − |f(I3) − f(I2)|/f(I4)) = 11/12; Sim1,2 = 0; Sim1,3 = 0; Sim1,4 = 0; Sim2,1 = 0; Sim2,2 = 0; Sim2,3 = 1/3 + 1/3 × (1/2) + 1/3 × (1 − |f(I4) − f(I1)|/f(I4)) = 7/12; Sim2,4 = 0. Hence, the total similarity between VP and candidate LIT route #1 is ((11/12 + 0 + 0 + 0) + (0 + 0 + 7/12 + 0))/2 = 0.75. With a similar process, we have Sim(VP, #1) = 0.75, Sim(VP, #2) = 0.458, and Sim(VP, #3) = 0.75. Candidate LIT routes #1 and #3 thus have the same total similarity score. When the total similarity scores are the same, the total numbers of facilities are compared. The total numbers of facilities for candidate LIT routes #1 and #3 are 4 and 5, respectively. Therefore, candidate LIT route #3 is ranked first. Table 6 shows the final ranking result for the three candidate LIT routes. Based on candidate LIT route #3 and its path, the route recommendation system suggests that the visitor pass entrance k11 in region B. After time-interval I1, the visitor is suggested to move to region G and take k1. Then, after time-interval I3, he/she is suggested to take k4 in region L, and so on.

4. Implementation and experiment results

The proposed route recommendation system is implemented in C# and tested on a PC with a Core i5 2.80 GHz CPU and 4 GB of memory.

4.1. Case description and route generator

In this study, a simplified theme park is used as an example to illustrate the feasibility of the proposed system. As shown in Fig. 8, there are seven thematic regions and thirty-four recreation facilities (k1 to k34). For example, thematic region B contains facilities k1, k2, and k3, while thematic region H contains facilities k31, k32, k33, and k34. To simulate visiting behavior, a route generator is developed. In the generator, visitors start their visit at the entrance (k35) and finish at the exit (k36). The regions that visitors pass through must be adjacent. The total visiting time of a route sequence is randomly determined by a uniform distribution within 780 (min), since the park operates from 9:00 a.m. to 10:00 p.m. The time a visitor takes to move to the next region is randomly generated from a uniform distribution between 15 (min) and 30 (min). In addition, the time a visitor spends

Table 5
Five LIT sequential patterns.

No. | LIT sequential pattern | Path | Total visiting time (min)
1 | <B; k11>, I1, <O; k3>, I2, <L; k4>, I4, <Q; k6>, I1, <M; k7>, I4, <B; k12> | BAKOKLQMHCB | (210, 360]
2 | <B; k11>, I1, <G; k1>, I3, <O; k2, k3>, I4, <L; k4>, I4, <B; k12> | BGKOTPLHCB | (240, 360]
3 | <B; k11>, I1, <G; k1>, I3, <L; k4>, I4, <Q; k6>, I1, <O; k2, k3>, I2, <B; k12> | BGLQPOKFB | (180, 330]
4 | <B; k11>, I1, <G; k1>, I1, <O; k3>, I4, <U; k8>, I4, <B; k12> | BGKOTUQLHCB | (180, 300]
5 | <B; k11>, I1, <G; k1>, I3, <L; k4>, I1, <Q; k5>, I2, <B; k12> | BGLQMHCB | (90, 210]

Table 6
Three LIT candidate routes and their rankings.

No. | Candidate LIT route | Path | Total visiting time (min) | Total similarity score | Total facility number | Final rank
3 | <B; k11>, I1, <G; k1>, I3, <L; k4>, I4, <Q; k6>, I1, <O; k2, k3>, I2, <B; k12> | BGLQPOKFB | (180, 330] | 0.75 | 5 | 1
1 | <B; k11>, I1, <O; k3>, I2, <L; k4>, I4, <Q; k6>, I1, <M; k7>, I4, <B; k12> | BAKOKLQMHCB | (210, 360] | 0.75 | 4 | 2
2 | <B; k11>, I1, <G; k1>, I3, <O; k2, k3>, I4, <L; k4>, I4, <B; k12> | BGKOTPLHCB | (240, 360] | 0.458 | 4 | 3

taking a recreation facility is randomly generated from a uniform distribution between 30 (min) and 90 (min).
According to tourism reports, the five must-visit recreation facilities are k4, k12, k13, k25, and k26, and the seven popular facilities are k2, k6, k17, k22, k23, k24, and k32. Therefore, if a generated route sequence does not contain one of the five must-visit recreation facilities, the route is discarded. Likewise, if a generated route sequence does not contain one of the seven popular recreation facilities, the route sequence is discarded with a probability of 80%. In addition, the average number of visitors to the theme park is 26,000 per day. Therefore, 26,000 route sequences are generated to simulate the visiting behaviors of visitors in one day.

4.2. Route recommendation

Before executing the proposed LIT-PrefixSpan mining procedure, the minimum support and the set of discrete time intervals must be determined. For simplicity, the time intervals in this study are set to an equal length of 40 min, and the minimum support is set to 0.02%. That is, the set of discrete time intervals is TI = {I1, I2, I3, ..., I20}, where I1: 0 < Δt ≤ 40, I2: 40 < Δt ≤ 80, I3: 80 < Δt ≤ 120, ..., I20: 760 < Δt ≤ 800. Based on these settings, 380,735 LIT sequential patterns are discovered by the LIT-PrefixSpan mining procedure.
Assume a new visitor intends to spend 420 min (7 h) and wishes to take recreation facility k12 in region D and recreation facility k22 in region F. In addition, he/she wishes to spend 150 min in region D and 120 min in region F. Thus, the visitor preference VP is <420, <D, {k12}, 150>, <F, {k22}, 120>>. The important degrees for region (w1), facility (w2), and time interval (w3) in Eq. (7) are all set to 1/3. Based on the set of discrete time intervals TI, the total visiting time (VT^β) of each LIT sequential pattern can be calculated. After deleting the sequential patterns that do not contain the entrance and exit, as well as the patterns that do not satisfy the time constraint, 5471 candidate LIT routes are found. Table 7 shows the ranking information of the candidate LIT routes derived by the route recommendation generation module.
Fig. 9 shows the top-ranked visiting route. The recommendation system suggests that the visitor start the trip from the entrance in region A. Within 40 min (time-interval I1), the visitor is suggested to take recreation facility k2 in region B. After 120–160 min (time-interval I4), the system suggests that the visitor take k12 and k13 in region D. Again, after 120–160 min (time-interval I4), the visitor is suggested to take k22 in region F. Finally, after 120–160 min (time-interval I4), the visitor is suggested to leave the theme park from the exit in region A by passing through regions C, B, and A sequentially.
To validate the proposed route recommendation module, the different visitor preferences shown in Table 8 are tested. Case I is the case introduced above and serves as the benchmark case. For Case II, a shorter intended total visiting time (300 min) is entered, so fewer recreation facilities are expected to be suggested. Fig. 10(b) shows the suggested route <A; k35>, I1, <D; k12, k13>, I4, <F; k22>, I4, <A; k36> with the path ABDFCBA. For Case III, the visitor only enters the constraints of taking k12 in region D and spending 150 min in region D. Since fewer constraints are provided in Case III, the similarity between the visitor's preference and many candidate routes is 1. Fig. 10(c) shows one of the candidate routes suggested by the system, <A; k35>, I1, <B; k1, k2, k3>, I6, <D; k12, k13>, I4, <A; k36>. For Case IV, the intended total visiting time is the same as in Case I, but the other preferences differ. Fig. 10(d) shows that the route recommendation system suggests three recreation facilities (k3, k12, k32) in three regions (B, D, H) for Case IV.

4.3. Experimental designs

In the proposed route recommendation system, different parameter settings might affect the final suggestion results. Therefore, a set of experiments is conducted to observe the effects of these parameters. Unless otherwise noted, the parameter settings and the visitor preference are the same as in Case I in Section 4.2.

4.3.1. Discussion of data size

As discussed in Section 3.3, the LIT-PrefixSpan mining procedure module consists of three major phases: the large-transaction generation phase (Phase I), the large-transaction transformation phase (Phase II), and the Location-Item-Time sequential pattern generation phase (Phase III). To observe how the number of route sequences (data size) affects the LIT-PrefixSpan mining procedure module, the data size is varied from 10,000 to 26,000. Table 9 summarizes the execution time of each phase in the LIT-PrefixSpan mining procedure module. As the number of route sequences increases, the total execution time of the module grows roughly linearly, although the run with 13,000 route sequences (7243.01 s) is faster than the run with 10,000 (8241.55 s). In addition, the execution time of Phase III is significantly longer than that of the other two phases. Although the LIT-PrefixSpan mining procedure module takes a long time to execute (17,288.83 s for 26,000 route sequences), it is typically run daily or weekly rather than for every request.

4.3.2. Discussion of minimum support

To examine how the minimum support in the LIT-PrefixSpan mining procedure module affects the result, the minimum support ranging

Fig. 8. Layout of the implementation example.

Table 7
Ranking information of each candidate LIT route.

Ranking | Candidate LIT route | Total visiting time (min) | Total similarity score | Total facility number | Sup. | Path
1 | <A; k35>, I1, <B; k2>, I4, <D; k12, k13>, I4, <F; k22>, I4, <A; k36> | (360, 520] | 0.991667 | 4 | 5 | ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA
2 | <A; k35>, I1, <B; k2>, I4, <D; k12>, I4, <F; k22>, I4, <A; k36> | (360, 520] | 0.991667 | 3 | 5 | ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA
3 | <A; k35>, I1, <B; k2>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> | (360, 520] | 0.983333 | 4 | 5 | ABDFCBA, ABDFCBA, ABDFGDBA, ABDFCBA, ABDFCBA
4 | <A; k35>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> | (360, 480] | 0.983333 | 3 | 8 | ABDFCBA, ABDFGDBA, …, ABDFCBA
… | … | … | … | … | … | …
5471 | <A; k35>, I1, <B; k1>, I11, <A; k36> | (400, 480] | 0 | 1 | 522 | ABDEBA, ABDCBA, …, ABDEHEBA

from 0.02% to 0.2% is tested. Fig. 11(a) shows the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module, and Fig. 11(b) shows the number of candidate LIT routes in the route recommendation generation module under different minimum supports. When the minimum support increases, both the number of LIT sequential patterns

Fig. 9. Visiting sequence recommendation based on the visitor's preference.

and the number of candidate LIT routes decrease. If the minimum support value is set to 0.02%, the first module generates 380,735 LIT sequential patterns and the second module generates 5471 candidate LIT routes. However, if the minimum support is 0.2%, only 14,831 LIT sequential patterns are generated by the first module, and 108 candidate LIT routes by the second module. Therefore, based on the observations from Fig. 11, a minimum support value of 0.02% is suggested in this case.
Fig. 12 shows the execution time of the LIT-PrefixSpan mining procedure module and the route recommendation generation module under different minimum supports. When the minimum support increases, the execution time of both modules decreases. Note that the execution time of the route recommendation generation module is 1.27 s when the minimum support is 0.2%. This execution time should be acceptable for visitors making online recommendation requests.

Table 8
Different visitors' preference settings.

Case | ITVT (min) | <FRi, {FItemsi}, IRVTi>
I | 420 | <D, {k12}, 150>, <F, {k22}, 120>
II | 300 | <D, {k12}, 150>, <F, {k22}, 120>
III | 420 | <D, {k12}, 150>
IV | 420 | <H, {k32}, 120>, <B, {k1}, 90>

4.3.3. Discussion of time-interval range

To observe how the time-interval range affects the proposed route recommendation system, time-interval ranges from 10 min to 120 min are tested. Fig. 13(a) summarizes the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module, and Fig. 13(b) summarizes the number of candidate LIT routes generated by the route recommendation generation module. As shown in Fig. 13, when the time-interval range increases, both the number of LIT sequential patterns and the number of candidate LIT routes increase. That is, if the time-interval range is large, the times between two events are more likely to fall into the same time interval. Thus, more route sequences map to the same LIT sequential pattern, and the minimum support threshold is easier to satisfy. For example, assume that there are ten route sequences of <(D, 28, {k12}), (A, 45, {k35})> and ten route sequences of <(D, 40, {k12}), (A, 85, {k35})>, whose time gaps are 45 − 28 = 17 min and 85 − 40 = 45 min, respectively. If the time-interval range is set to 30 min, two different LIT sequential patterns, <(D; k12), I1, (A; k35)> and <(D; k12), I2, (A; k35)>, are found, where I1: 0–30 and I2: 30–60. However, if the time-interval range is set to 60 min, only the single LIT sequential pattern <(D; k12), I1, (A; k35)> is generated, since I1: 0–60. Based on the observations from Fig. 13, a time-interval range between 40 min and 60 min is suggested to ensure the quality of the suggested routes.

4.3.4. Discussion of w1, w2, and w3

In Eq. (7), w1, w2, and w3 are the important degrees for the region, facility, and time-interval considerations, respectively. To observe how the important degree values affect the route ranking, three more experiments are conducted. As shown in Table 10, no matter how the important degree values are changed, the top-four ranked candidate LIT routes remain the same. The reason is that the region comparison is conducted first, according to the third rule of the similarity measurement design in Section 3.4.2. That is, if the region in the VP vector is not the same as the region in the candidate LIT route, the similarity between the facilities and time intervals of the two regions is not counted. This design makes the important degree values less influential in the proposed system. Based on the observations from Table 10, the important degree for

Fig. 10. Route recommendation results: (a) Case I, (b) Case II, (c) Case III, (d) Case IV.

region, facility, and time interval is set to 1/3, 1/3, and 1/3, respectively, in this study.

Table 9
Execution time (in seconds) of each phase in the LIT-PrefixSpan mining procedure module.

Number of route sequences | 10,000 | 13,000 | 16,000 | 19,000 | 22,000 | 26,000
Phase I | 0.58 | 0.74 | 0.89 | 1.16 | 1.22 | 1.44
Phase II | 0.96 | 1.09 | 1.30 | 1.66 | 1.82 | 2.08
Phase III | 8240.01 | 7241.19 | 10563.63 | 11088.18 | 15220.72 | 17285.32
Total | 8241.55 | 7243.01 | 10565.82 | 11090.99 | 15223.77 | 17288.83

5. Conclusions and further study

In the past decade, recommendation techniques have become popular for providing a variety of products, services, and items to potential visitors in the tourism industry. Many recommendation systems have proven to be efficient tools by designing user interfaces that interact smoothly with the environment, providing convenient information query tools, or suggesting sets of associated products (or services). However, three major problems remain. First, these systems simply return a set of suggested facilities (items) in sequential order, but fail to illustrate the complete visiting path for visitors. Second, previous systems seldom take geographic constraints into consideration, so their suggested routes might be trivial and hard to follow. Third, previous studies seldom take the time interval between items into consideration. To solve these problems, this research defines a Location-Item-Time (LIT) sequence to describe visitors' spatial and temporal behavior. To the best of our knowledge, this study is the first work to include location (region), item, and time-interval information in a sequence at the same time. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to discover frequent LIT sequential patterns. Next, the route suggestion procedure is developed to retrieve suitable LIT sequential patterns. The experimental results show that managers can clearly understand their visitors through the proposed Location-Item-Time sequential patterns.
Although the case of a theme park is illustrated in this paper, the proposed three-phase methodology can be applied to any field where records of location, item, and time are available. For example, in

Fig. 11. (a) Number of LIT sequential patterns and (b) number of candidate LIT routes under different minimum supports.
Fig. 12. Execution time of the two modules under different minimum supports.
Fig. 13. (a) Number of LIT sequential patterns and (b) number of candidate LIT routes under different time-interval ranges.

mobile commerce environment, a customer moves among cellular grids and makes transactions in the corresponding cell through a mobile device. Through the proposed recommendation system, a customer can obtain real-time store/shopping suggestions on the

Table 10
Route ranking using different important degrees.

w1 w2 w3 Ranking Candidate LIT route Total similarity score


1/3 1/3 1/3 1 <A; k35>, I1, <B; k2>, I4, <D; k12, k13>, I4, <F; k22>, I4, <A; k36> 0.9917
2 <A; k35>, I1, <B; k2>, I4, <D; k12>, I4, <F; k22>, I4, <A; k36> 0.9917
3 <A; k35>, I1, <B; k2>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.9833
4 <A; k35>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.9833
5 <A; k35>, I1, <B; k3>, I4, <D; k12>, I3, <F; k22>, I4, <A; k36> 0.9833
6 <A; k35>, I1, <B; k2>, I3, <D; k12>, I4, <F; k22>, I5, <A; k36> 0.9833
0.8 0.1 0.1 1 <A; k35>, I1, <B; k2>, I4, <D; k12, k13>, I4, <F; k22>, I4, <A; k36> 0.9975
2 <A; k35>, I1, <B; k2>, I4, <D; k12>, I4, <F; k22>, I4, <A; k36> 0.9975
3 <A; k35>, I1, <B; k2>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.995
4 <A; k35>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.995
5 <A; k35>, I1, <B; k3>, I4, <D; k12>, I3, <F; k22>, I4, <A; k36> 0.995
6 <A; k35>, I1, <B; k2>, I3, <D; k12>, I4, <F; k22>, I5, <A; k36> 0.995
0.1 0.8 0.1 1 <A; k35>, I1, <B; k2>, I4, <D; k12, k13>, I4, <F; k22>, I4, <A; k36> 0.9975
2 <A; k35>, I1, <B; k2>, I4, <D; k12>, I4, <F; k22>, I4, <A; k36> 0.9975
3 <A; k35>, I1, <B; k2>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.995
4 <A; k35>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.995
5 <A; k35>, I1, <B; k3>, I4, <D; k12>, I3, <F; k22>, I4, <A; k36> 0.995
6 <A; k35>, I1, <B; k2>, I3, <D; k12>, I4, <F; k22>, I5, <A; k36> 0.995
0.1 0.1 0.8 1 <A; k35>, I1, <B; k2>, I4, <D; k12, k13>, I4, <F; k22>, I4, <A; k36> 0.98
2 <A; k35>, I1, <B; k2>, I4, <D; k12>, I4, <F; k22>, I4, <A; k36> 0.98
3 <A; k35>, I1, <B; k2>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.96
4 <A; k35>, I3, <D; k12, k13>, I4, <F; k22>, I5, <A; k36> 0.96
5 <A; k35>, I1, <B; k2>, I3, <D; k12>, I4, <F; k22>, I5, <A; k36> 0.96
6 <A; k35>, I3, <D; k12>, I4, <F; k22>, I5, <A; k36> 0.96

mobile device before he/she moves to the next cellular grid. Similarly, in a grocery store, a customer moves around store aisles and picks up his/her target products. The recommendation system can provide the customer with an efficient moving path and suggest other popular products to increase cross-selling opportunities.
Some potential extensions of this research are as follows. First, in some cases, the entrance and exit of a facility might belong to different regions. It would be worthwhile to address such irregular layouts in the future. Second, the minimum support, time-interval range, and important degrees currently have to be decided by users. Further studies can explore how to automate these parameter settings by adopting optimization techniques. Third, when visiting a theme park, visitors might prefer some facilities over others. As such, further studies could ask visitors to input facility priorities and rearrange the route according to those priorities. Finally, the proposed system assumes that a visitor makes a recommendation request at the time he/she enters the park. It is possible, however, that a visitor wants to make a recommendation request anytime and anywhere in the park. Future studies might record the visitor's requested location and time so that the system can provide more flexible suggestions.

Acknowledgement

This work was partially supported by the National Science Council of Taiwan, R.O.C., No. 102-2221-E-155-041-MY3.