Purpose-of-Visit-Driven Semantic Similarity Analysis On Semantic Trajectories For Enhancing The Future Location Prediction-2
Abstract: The number of people who use or even depend on Location Based Services (LBS) is growing rapidly every year. In order to offer timely and user-tailored services, providers rely increasingly on forward-looking algorithms. For this reason, location prediction plays a key role in LBS. Recent approaches in location prediction leverage semantics in order to overcome drawbacks that characterise conventional non-semantic systems. However, when it comes to modelling locations, the majority of them constrain themselves to static semantic constructs and hierarchies, without taking the current situation and, most importantly, the users' varying personal perception into account. In this work, we introduce a novel dynamic approach that explicitly takes the variation of the users' perception into consideration when describing locations, in order to improve the overall prediction performance. For this purpose, we explicitly consider time and purpose of visit by building so-called Purpose-of-Visit-Dependent Frames (PoVDF). Our framework is hybrid and combines a data-driven model with a knowledge-driven model. To fuse these two models, we define a Purpose-of-Visit-Driven Semantic Similarity (PoVDSS) metric and use it as the fusing component between them. We conducted a user study to evaluate our approach on a real data set and compared it with two state-of-the-art semantic and non-semantic algorithms. Our evaluation shows that our approach yields a location prediction accuracy of up to 80%.

I. INTRODUCTION

According to recent statistics published by eMarketer [1], 242 million people in the USA are expected to use location-based services (LBS) in 2018, almost twice the number of LBS users in 2013 (123 million). While typical context-aware applications, such as LBS, focus on reacting to users' location-dependent needs, current approaches strive to behave in a forward-looking manner in order to raise the quality of their service. This proactivity is a key feature, which leads to a great number of benefits including higher efficiency and an improved human-machine interaction. Space10, the innovation lab of IKEA, is currently attempting to determine the way Artificial Intelligence (A.I.) should interfere in our lives by conducting an ongoing online survey¹. Based on their results so far, 62% of the participants feel that an A.I. should fulfil their needs before even asking and 74% state that an A.I. should prevent them from making mistakes, thus confirming the significance of anticipatory behaviour. Consequently, the significance and value of location prediction with respect to the domain of location-based services and of context-aware systems in general is self-explanatory.

¹ http://www.doyouspeakhuman.com/

Location information reveals to us humans more than just the whereabouts. It indirectly gives insight into the what and the when. For instance, the location "night club" is usually put in context with some overall, high semantic level purposes, such as "socializing" and "having fun", and a set of elementary, lower semantic level activities, like "drinking", "dancing", and "meeting friends". Moreover, a human would additionally associate some corresponding temporal information with it, such as "night", "weekend", or "once a week". In order for us humans to be capable of interpreting locations at such a high level and associating them with all this additional information, we rely both on a broad framework of semantics hidden behind them and on a large portion of world and common sense knowledge. At the same time, each human takes his/her own personal experience and knowledge into consideration.

A generic semantic framework together with a common sense knowledge base provides a mutual basis among different people for interpreting things similarly. In comparison, personal experience can rather lead to different interpretations among people. Let us clarify this in the location scenario by going back to the example mentioned before. In that example, the location "night club" was interpreted from the perspective of a guest, which is the most common one. But for the barkeeper, who may have to open the place at noon in order for the beverage suppliers to replace the empty
978-1-5386-3227-7/18/$31.00 ©2018 IEEE 100
CoMoRea'18 - 14th Workshop on Context and Activity Modeling and Recognition
bottles and carries the responsibility that everything works alright during the night, the club is a working location, bound to a completely different set of high and low level activities. A similar ambiguous effect would arise in the case of a restaurant between a guest and the cook or the waiter working there.

Let us now consider another example. People visiting a conference or having a business lunch at a hotel would experience the hotel during that time from a perspective similar to that of the working receptionist, since their visits are both of professional nature. However, if the same people enjoyed a drink at the bar of the same hotel after their business is over, they would rather associate the hotel with a night life location, like the "night club" mentioned before. This example highlights another important issue, namely that people tend to perceive, interpret and associate locations to each other dynamically, depending on the situation in which they find themselves. Nathan et al. underpin this idea by interpreting movement between locations as the outcome of the synergy of four components [2]: the individual's internal state, its motion capacity, its navigation capacity and a group of potential external factors.

Location prediction algorithms that utilize semantics and rely on so-called semantic trajectories go beyond plain numerical data, like GPS tracks and Cell-ID sequences. Fig. 1 illustrates an example of such a semantic trajectory. The use of semantics provides a number of advantages. Spaccapietra et al. highlighted in [3] how important it is to utilize semantics when analyzing trajectories, while at the same time pointing out the variation of their underlying purpose and semantic meaning. The most significant advantage is the fact that semantic trajectories are capable of capturing the essence of human movement patterns. This can be particularly helpful for a location predictor in places that have not been visited before by the users and for which there are therefore no GPS recordings available on which the predictor could be trained. Beyond that, semantic enhanced location prediction systems gain transparency through the use of semantics. Because the data collected and processed by the respective systems are human-understandable, the user has the opportunity to better understand how such systems work and why certain predictions come into effect. This leads on the one hand to a better human-machine relationship and on the other hand indirectly assists the compliance with the data protection regulations. Finally, and not unimportant in the age of the Internet of Things (IoT), semantics and particularly the associated knowledge graphs and ontologies, which are meant as tools for sharing knowledge in the first place, provide a robust basis for machine-to-machine (M2M) communication. This is vital for creating independent, fully autonomous and intelligent environments.

Fig. 1: Example of a 1-day long Semantic Trajectory. (Image based on: https://commons.wikimedia.org/wiki/File:San Francisco downtown.jpg)

However, to the best of our knowledge, none of the semantic trajectory based approaches up to this point has taken the varying human perception of locations into account. Instead, they constrain themselves to static semantic location types and inflexible associations between locations and users, as we will see in the related work chapter (section II). In this paper, we introduce a semantic trajectory based location prediction approach that explicitly considers the dynamic, purpose-of-visit-driven varying interpretation and clustering of locations in order to achieve a higher performance. Our approach relies on the hypothesis that locations resemble one another from the point of view of the user in relation to the purpose of visit and that similar locations also show similar transitions. For this purpose, we combine two different modeling techniques, a data-driven and a knowledge-driven one, by using semantic similarity as a fusing component. We designed and carried out a user study in order to investigate to what degree users associate locations with more than one purpose of visit and thus derive an indication of the semantic ambiguity of locations in a real world scenario. Furthermore, we used the same collected real data set to evaluate our approach. We can show that our framework is able to converge more towards human movement patterns and can lead to a higher performance compared to other semantic trajectory based approaches.

This paper is structured as follows. Section II provides a brief summary of the most relevant semantic-enhanced location prediction approaches. Next, section III describes our approach in detail, while in section IV we give insight into our user study and the resulting data and discuss our outcomes and the performance of our framework thoroughly. Finally, in section V, we provide a short overview of our work and a summary of our major results.

II. RELATED WORK

Our focus in this paper lies on semantic-enhanced location prediction. However, in this section, we will also go very briefly through some non-semantic, but well-known work on location prediction.

Ashbrook et al. introduce in [4] a method for learning semantically significant locations, like home, work and others, from the users' GPS trajectories, on top of which a next place predictor is proposed using a simple 1st-degree Markov Chain model. Focusing solely on next place prediction, Gambs et al. [5] use higher degree Markov Chains, whereby the last n places are used to make a prediction about the next one. Gao et al. [6] extend the Markov Chains by adding temporal context.
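To make the first-order Markov Chain idea concrete, the following minimal sketch trains a next place predictor from a single sequence of place labels. It is only an illustration under our own assumptions: the function names and the example trajectory are hypothetical and are not taken from [4], [5] or [6].

```python
from collections import Counter, defaultdict


def train_first_order(trajectory):
    """Count transitions between consecutive places and normalise each row."""
    counts = defaultdict(Counter)
    for src, dst in zip(trajectory, trajectory[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def predict_next(model, current):
    """Return the most probable next place, or None for an unseen place."""
    row = model.get(current)
    if not row:
        return None
    return max(row, key=row.get)


# Hypothetical semantic trajectory; the labels are made up for illustration.
trajectory = ["home", "work", "restaurant", "work", "home", "work", "gym", "home"]
model = train_first_order(trajectory)
print(predict_next(model, "home"))
```

A place that never occurs as a transition source, such as "park" here, yields no prediction at all. This is the kind of gap that semantic approaches try to close.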
In contrast, Ying et al. [7] use their own Geographic Seman-
support the location prediction. Ridhawi et al. apply a similar
algorithm for tracking and predicting users indoors in order to
[Fig. 3 block diagram; component labels: Context & User Data, RawDB, SADB, LODB, Ontology, OKB, Reasoner, Learning/Propagation, Purpose-of-Visit-Driven Semantic Similarity Analysis (PoVDSSA), Probabilistic Graph/Model, Training/Identification, Customization/Optimization, PoVDSSA-based Graph/Model, Intention/Future Location Prediction]
Fig. 3: Purpose-of-Visit-Driven Semantic Similarity Analysis based location prediction (PoVDSSA) framework, whereby RawDB refers to the raw data Database, SADB to the Semantically Annotated Database, LODB to a Linked Open Database, OKB to the Ontology-based Knowledge Base and PoVDSSA to the Purpose-of-Visit-Driven Semantic Similarity Analysis Component.
user, one for each timeslot and day of week. Fig. 5 illustrates a part of the location transition matrix. Each row and column represents a single semantic location. Fig. 6 and 7 illustrate the multi-dimensionality of our predictor.

C. Purpose-of-Visit-Driven Semantic Similarity Analysis (PoVDSSA)

Humans tend to employ cognitive frames, that is certain mental constructs, in order to interpret things, entities and their experiences about them [15]. Taxonomization and the building of groups and relations between them and the included entities help clarify concepts and are therefore of high importance to us. In order to build groups, humans draw on the fundamental notion of similarity. Two objects are similar when they share the same characteristics. While this definition refers rather to the similarity between two physical objects, it can analogously be extended to a more general one that expresses a characteristic-based similarity between two objects in a knowledge graph or an ontology. This kind of similarity can then be referred to as semantic similarity. Likavec et al., inspired by Tversky's work [16], defined and investigated such a property-based semantic similarity among ontological objects in [17]. In our work, we adopt Likavec's method and define a similar equation to semantically cluster the visited locations of the users based on the purpose of visit and the corresponding time. So, we treat both the time and the purpose of being at a location as characteristic features of that particular location, which in turn reflects the PoVDF concept mentioned in section III-A. Equation 2 illustrates the property-based semantic similarity adapted to our use case:

$$\mathrm{Sim}(l_1, l_2) = \frac{CP(l_1, l_2)}{DP(l_1) + DP(l_2) + CP(l_1, l_2)} \tag{2}$$

whereby $l_1$ and $l_2$ represent two different locations, $CP$ refers to the common purposes of visiting the particular locations and $DP$ gives the distinctive purposes that are associated only with the one location and do not appear in conjunction with the other. However, on closer inspection of formula 2, it becomes apparent that it describes solely the similarity between two particular stays $l_1$ and $l_2$ at the locations $L_1$ and $L_2$ and not the overall location similarity. In order to calculate the overall semantic similarity, we compute the average pairwise similarity of all existing stays at $L_1$ and $L_2$ as shown in formula 3. This reflects our definition of a Purpose-of-Visit-Driven Semantic Similarity (PoVDSS).

$$PoVDSS(L_1, L_2)_{pair\varnothing} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} \mathrm{Sim}(l_i, l_j)}{M \cdot N} \tag{3}$$

whereby $M$ and $N$ provide the number of stays at the location $L_1$ and $L_2$ respectively. It must be noted here that our PoV-driven semantic similarity goes through all different location and purpose-of-visit hierarchy levels that are modelled in our ontology and calculates a max-min normalized aggregated value between 0 and 1.

By clustering locations in this way we are able to go beyond a simple type-specific categorization of locations and even cluster locations of different types together. Let us consider an example to clarify this statement. The locations "park", "gym" and "restaurant" would probably land in three different categories if we tried to cluster them by taking only the location type into account. In contrast, our approach provides a more dynamic clustering by additionally considering the purpose of visit. In this case, "park" and "gym" would be considered temporarily similar if the person visits the park for jogging, because jogging is a fitness activity and thus common to the gym's overall purpose of visit. Analogously, "park" would be found similar to the "restaurant" if the person has a picnic at that park.

The PoVD similarity analysis takes place each time a prediction is to be made. It investigates how similar the current location is to each of the other locations/location types found in our propagated knowledge base. Each time, we weight the Markov matrix' transition probability row of the location $L_{maxSim}$ with the highest similarity to the current location $L_{cur}$ (disregarding the current location itself, because it yields the absolute similarity of 1.0) by multiplying it with the maximum similarity score $SimScore$ itself. Finally, we use the resulting row to update the transition probability row that corresponds to the current location by applying the following formula:

$$TP(L_{cur})_{i,new} = TP(L_{maxSim}) \times SimScore + offset \times TP(L_{cur})_{i,old} \tag{4}$$

The updating algorithm is described in detail in Algo. 1.

Algorithm 1: Markov transition probability updating process.
Data: Stay at current location l ∈ K, current context C (purpose of visit, time, ...), Multi-Markov-Chain model M, all locations K = [k_1, ..., k_n]
Result: Updated transition probabilities for location l
    minsim ← 0.1, ..., 0.9;
    probabilities[l→∗] ← M.getProbabilities(l, C);
    probabilities ← probabilities[l→∗];
    sim_{l,∗} [sim_{l,k_1}, ..., sim_{l,k_n}] ← getSimilarities(l, C);
    sim_{l,∗}.sortReverse();
    while sim_{l,∗}.hasNext() do
        // location k ∈ K shows the highest similarity score to l
        sim_{l,k} ← sim_{l,∗}.next();
        if sim_{l,k} ≥ minsim then
            probabilities[k→∗] ← M.getProbabilities(k, C);
            probabilities ← updateProbabilities(probabilities[l→∗], probabilities[k→∗], sim_{l,k});
            break;
        end
    end
    return probabilities;

So, the updated transition probabilities for the current location depend, on the one hand, on the transition probabilities of the most similar location and the corresponding similarity score. On the other hand, they still depend on the old values, although much less now, due to the offset factor. This provides us with a smooth and adaptable rewarding-penalizing function.

IV. EVALUATION AND DISCUSSION

A well known issue in research, and especially in the LBS community, is the lack of appropriate training and evaluation
Fig. 5: Part of location transition matrix. Fig. 6: Timeslot-specific transition matrices. Fig. 7: Day-specific sets of transition matrices.
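The similarity of Equations 2 and 3 and the row update of Equation 4 and Algorithm 1 in section III-C can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes that CP and DP are counted as sizes of plain purpose sets, it leaves out the time feature, the ontology hierarchy levels and the max-min normalisation, and it renormalises the updated row so that it sums to 1. All function names, the example data and the default values of `min_sim` and `offset` are hypothetical.

```python
def stay_similarity(purposes_a, purposes_b):
    """Equation 2 with CP and DP taken as set sizes (an assumption)."""
    a, b = set(purposes_a), set(purposes_b)
    common = len(a & b)
    denom = len(a - b) + len(b - a) + common
    return common / denom if denom else 0.0


def povdss(stays_l1, stays_l2):
    """Equation 3: average pairwise similarity over all stays at two locations."""
    if not stays_l1 or not stays_l2:
        return 0.0
    total = sum(stay_similarity(s1, s2) for s1 in stays_l1 for s2 in stays_l2)
    return total / (len(stays_l1) * len(stays_l2))


def update_row(row_cur, row_sim, sim_score, offset):
    """Equation 4 per target location, then renormalised (an assumption)."""
    targets = set(row_cur) | set(row_sim)
    raw = {
        t: row_sim.get(t, 0.0) * sim_score + offset * row_cur.get(t, 0.0)
        for t in targets
    }
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()} if total else dict(row_cur)


def updated_probabilities(current, matrix, stays, min_sim=0.1, offset=0.5):
    """Algorithm 1: borrow the row of the most similar location above min_sim."""
    row_cur = matrix.get(current, {})
    scores = {
        other: povdss(stays[current], stays[other])
        for other in stays
        if other != current  # the location itself would trivially score highest
    }
    if not scores:
        return dict(row_cur)
    best = max(scores, key=scores.get)
    if scores[best] < min_sim:
        return dict(row_cur)
    return update_row(row_cur, matrix.get(best, {}), scores[best], offset)


# Hypothetical stays (sets of purposes) and transition rows for one context.
stays = {
    "park": [{"jogging", "fitness"}],
    "gym": [{"fitness", "training"}],
    "restaurant": [{"eating"}],
}
matrix = {
    "park": {"home": 1.0},
    "gym": {"home": 0.5, "restaurant": 0.5},
    "restaurant": {"home": 1.0},
}
new_row = updated_probabilities("park", matrix, stays)
```

In this toy data the park shares the purpose "fitness" with the gym, so the gym row is blended into the park row and the transition from park to restaurant, which was never observed directly, receives a non-zero probability.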
datasets. The few good ones that exist, lack either in con-

[Fig. 8 plot: bar chart over semantic location labels; legend: # Semantic location labels (all users)]
Fig. 8: Semantic label distribution of locations.

as an activity-based energy efficient and phone battery saving tracking algorithm. The users were able to pause or completely close the app at any time. Each user was assigned an anonymous and randomly generated identification number (ID) in order to preserve the users' privacy. The data were first collected and stored locally on the mobile phone of the user, and sent encrypted to our server at the end of the study. Three Amazon vouchers were raffled among the participants in order to increase their motivation. During the study we asked the users to:

• Define whether the particular location had been visited for another purpose up to that point
• Rate how important the particular location is to them, and
• Provide additional descriptive information, like "sitting in a coffee shop with a friend after work", etc.

[Fig. 9 plot: per-user bar chart (User1 to User6 legible); legend: # Locations, # Multi-purpose locations]
Fig. 9: Number of multipurpose locations proportional to the total number of locations per user.

After analyzing our data, we arrived at a total of 431 different purpose-of-visit entries with respect to the following 9 high level location types: "home", "transportation", "night life", "shopping", "services", "food", "free time", "education/university" and "work". Fig. 8 shows the distribution of all annotated locations during our user study among all users. Fig. 9 shows, per user, the total number of the various (unique) semantic locations in proportion to the number of locations for which more than one purpose of visit was filled out.

We used k-fold cross validation as our evaluation method and tested the following k-parameter values: [5, 10, 15, 20]. We compared our approach to a multi-dimensional semantic 1st order Markov Chain Model, as well as to the well known semantic trajectory-based approach of Ying [7], which we used here as reference. Fig. 10 shows how our PoVDSSA-based approach performs in comparison to Ying's approach among various k values with respect to accuracy, precision, recall and f-score. We can see that our approach outperforms Ying's framework almost every time in all four metrics. It provides a clearly higher accuracy and recall than Ying's. This means that our system is not only more accurate, but its estimations also scatter less, especially when k is high. This can be attributed to the fact that our approach is able to replace and fill in missing transitions of the current location through existing transitions coming from the corresponding similar location.

Furthermore, on the other hand, we can see that our purpose-of-visit based approach shows an overall better performance than Markov in most of the cases. Fig. 11 gives the performance of

4 http://www.awareframework.com
[Fig. 10 plots: four panels (Precision, Recall, Accuracy, F-Score) over k = 5, 10, 15, 20, value axis from 0 to 1; legend labels: Ying, semMarkov, PoVDSSA]
Fig. 10: Comparison of our approach (PoVDSSA) to Ying's approach with regard to accuracy, precision, recall and f-score.

[Fig. 11 plot (truncated): mean value of recall, value axis up to 1; legible legend labels: Markov, PoVDSSA, 1dim semMarkov, PoVDSSA Multi-dim - Activity, semMultiMarkov (activity, time of day)]

study. We evaluated our framework in contrast to Ying's well known approach and to a conventional Markov Chain model. We show that our approach outperforms both Ying's and the Markov-based approach. However, some drawbacks, due mostly to the lack of a bigger dataset, can also be identified. In the future we plan to investigate further semantic similarity algorithms and their integration in the overall prediction model, while investigating other machine learning models at the same time.

REFERENCES

[1] eMarketer. (2015) Key trends in mobile advertising. [Online]. Available: https://www.statista.com/statistics/436071/location-based-service-users-usa/
[2] R. Nathan, W. M. Getz, E. Revilla, M. Holyoak, R. Kadmon, D. Saltz, and P. E. Smouse, "A movement ecology paradigm for unifying organismal movement research," Proceedings of the National Academy of Sciences, vol. 105, no. 49, pp. 19 052–19 059, 2008.
[3] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, and C. Vangenot, "A conceptual view on trajectories," Data Knowl. Eng., vol. 65, no. 1, pp. 126–146, Apr. 2008.
[4] D. Ashbrook and T. Starner, "Learning significant locations and predicting user movement with gps," in Wearable Computers, 2002.(ISWC 2002). Proceedings. Sixth International Symposium on. IEEE, 2002,