Local Event Retrieval

This document discusses an approach for local event retrieval using social media as a sensor to detect real-world events in real-time. It presents a framework that ranks tuples of location and time based on how likely they represent the starting time and location of a relevant event for a given query. The framework defines two components - one based on topic relevance between tweets from a location and time to the query, and another based on changes in tweeting rate that may indicate an event. Tweets are aggregated using CombSUM voting to estimate topic relevance scores, while tweeting rate changes are quantified to estimate event likelihood scores, which are combined in a linear ranking function.

Uploaded by

jeyalakshmi

Local Event Retrieval

Introduction
• It has been suggested that a large proportion of queries submitted
to web search engines have a “local intent” and that these queries
make up the majority of searches submitted from mobile phones.
• Examples of information needs expressed by such queries include
“what is happening near me?” or “finding restaurants in the Covent
Garden district”.
• The prevalence of such queries highlights the importance of
building effective local search tools that serve this type of
information need.
• In this section, an approach for local event retrieval is presented
in which social media is treated as a social sensor to detect events
in real time.
Social Sensors for Local Event Retrieval
• Our motivation stems from the fact that communities of users on
Twitter often share messages about local events as they progress.
• The plot shows how local events are reflected in social media: it
gives the volume of tweets that were posted within London and contain
the phrase “beach boys” over a period of 12 days, where “Beach Boys”
is the name of a rock band that held a concert in London’s Royal
Albert Hall during that period.
• We observe that just before and during the concert, tweets
mentioning the “beach boys” within London spiked.
• This indicates that the concert, as a real-world event, was
reflected in the tweeting activity within the city.
Figure: Volume of tweets in London that contain the phrase “beach boys” over time.
Attempts to Harness Social Media for IR
• Previous attempts include (i) identifying social media content relevant to
known events and (ii) detecting unknown events using user-generated content
in social media.
• In the first case, social media content is identified to provide users with
more information about a planned event (e.g. a festival or a football match).
• Users would be able, for example, to access tweets about ticket prices before
the event, or Flickr photos posted by attendees after the event.
• The second case is more challenging as there is no prior knowledge about
the events.
• While some approaches have focused on detecting news-related events or
simply clustering social media content based on a database of targeted
events, a recent work has devised methods for retrieving global events from
Twitter archives that correspond to an arbitrary query (event type), a
problem which the authors called “structured event retrieval” over Twitter.
• Unlike that work, which focused on non-local events, we make use of
the opportunities that social media can bring to local search services.
• In particular, we define a new localized IR task that extends the
aforementioned structured event retrieval task.
• The task we propose aims at identifying and ranking local events
based on social media activities in the area where the events
occur.
• In other words, we use social media as a social sensor to detect
local events in real time.
• The work presented here advances the state of the art in
detecting and locating unknown events in social media and
proposes a new IR task of local event retrieval.
Problem Formulation
• Our overall goal is to identify and rank local events
happening in the real-world as a response to a user query.
• For a formal definition of a local event, we adopt a definition
that has been previously used in the new event detection
broadcast news task of the TDT (Topic Detection and
Tracking) evaluation forum.
• This definition states that an event is something that occurs
in a certain place at a certain time. Formally, we consider a
set of locations L = {l1, l2, . . .} that are of interest to the user.
• The granularity of locations can vary from buildings and
streets to entire cities.
Contd..
• For example, we might consider each location to represent an area
in a city in which the user is located.
• The city in this case is considered to be divided into equally sized
areas specified by polygons of geographical coordinates, or we can
use the divisions defined by the local authority such as postcodes or
boroughs.
• Each location li at a certain time tj is denoted by the tuple <li, tj>.
• We define the problem of local event retrieval as follows.
• For a user interested in local events within locations L (explicitly
defined or implicitly inferred from the current user’s location), the
event retrieval framework aims to score tuples <li, tj> according to
how likely tj represents a starting time of an event within the
location li that matches the user query.
Contd..
• An event is considered relevant if it matches the explicit
query of the user and/or the implicit context of the user (the
time of the query, the location of the user and/or her profile).
• In other words, the event retrieval framework defines a
ranking function that gives a score R(q, li, tj) for each tuple
<li, tj> with regard to the user’s query q.
• Examples of events to retrieve include festivals, football
matches or security incidents.
• When expressed explicitly by a user, a query is assumed to be
in the form of a bag of words (e.g. “live music”,
“conference”).
• When using Twitter as a social sensor, a location li at a certain time tj is
characterised by the tweeting activities observed at that location within a
given timeframe (tj − tj−1).
• The tweeting activities are represented with a set of tweets originating from
that location shared publicly within the given timeframe (tj −tj−1).
• This set of tweets is denoted by Ti,j . Note that the fixed timeframe is defined
using an arbitrary sampling rate θ; ∀j : tj − tj−1 = θ.
• An event happening in the real-world is represented by a tuple <l, ts, tf >;
where l is the location where the event is taking place, ts is the starting time
and tf is the finishing time.
• Our aim is to use the tweeting activities as the main source of evidence to
define the ranking function R(q, li, tj).
• More specifically, to define the ranking function we use the set of tweets
Ti,j together with the time series of tweet sets <. . ., Ti,j−2, Ti,j−1, Ti,j>
observed in the location li up to the current time tj.
• This allows us to identify sudden changes in the tweeting activities, which
may have been triggered by the occurrence of an event.
• Moreover, the event retrieval framework can identify a subset of the tweet set
Ti,j that matches the query, which may help the user in the event information
seeking process.
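The grouping of tweets into the sets Ti,j can be sketched as follows. This is a minimal illustration, not part of the original framework: the Tweet class, the function names and the choice of t_j = j · θ with half-open frames (t_{j−1}, t_j] are all assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    text: str
    location: str     # hypothetical location id l_i, e.g. a borough name
    timestamp: float  # seconds since an arbitrary origin


def bucket_tweets(tweets, theta):
    """Group tweets into sets T[(l_i, j)].

    Assumption: frame j covers the half-open interval (t_{j-1}, t_j]
    with t_j = j * theta, where theta is the sampling rate.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    buckets = defaultdict(list)
    for tweet in tweets:
        j = math.ceil(tweet.timestamp / theta)
        buckets[(tweet.location, j)].append(tweet)
    return dict(buckets)


def tweet_set_series(buckets, location, j, k):
    """Return [T_{i,j-k}, ..., T_{i,j-1}, T_{i,j}]; empty frames become []."""
    return [buckets.get((location, step), []) for step in range(j - k, j + 1)]
```

With θ = 60 seconds, for example, tweets posted at 30 s and 60 s fall into frame 1 and a tweet posted at 61 s falls into frame 2.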
A Framework for Event Retrieval
• The framework aims to define an effective ranking function that scores tuples
of time and location according to how likely they represent the starting time
and the location of a relevant event for a given query.
• Note that with regards to the previous definition of the local event retrieval
problem in Section 3.3.2, as a first step, we are not aiming to determine the
finishing time of an event.
• As discussed in Section 3.3.2, here we aim to use tweets as the main source
of evidence to score the tuples. In particular, we define two components built
on this evidence:
1. The first component is based on the intuition that social media may reflect
real-world events, hence when an event occurs somewhere we expect to find
topically related social posts about it originating from the location where it
occurs. To instantiate this component, for each location at a given time, i.e.
for each tuple <li, tj>, we measure how much the tweets Ti,j corresponding to
the tuple are topically related to the query q.
2. The second component is based on the intuition that events trigger an
increasing tweeting activity [66] causing peaks of tweeting rates during the
event (bursts). For this component, we aim to quantify the change in the
tweeting rate, the volume of tweets over time, observed at <li, tj> when
compared to previous observations over time at the same location. In other
words, we aim to measure the unusual tweeting behaviour that may indicate an
occurrence of an event. To compute the tweeting rate, we can either consider
all the tweets posted within the given timeframe at the given location or only
a subset of those which are relevant to the user query, e.g. tweets which
contain terms of the query.
• Following this, the ranking function can be defined as a linear combination of
the previous two components as follows:
R(q, li, tj) ∝ (1 − λ) · S(q, Ti,j) + λ · E(q, li, tj)    (3.1)
• where S(q, Ti,j) is the score of the tweet set Ti,j that quantifies how much
they are topically related to the query q; E(q, li, tj) is a score proportionate
to the change in the tweeting rate with regards to the query q at the given
time tj within the location li; and 0 ≤ λ ≤ 1 is a parameter to control the
contribution for each component in the linear combination in Equation (3.1).
• Next, we show how we approach the problem of quantifying each component.
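The linear combination in Equation (3.1) can be illustrated with a short sketch. The function names and the dictionary layout are hypothetical, and the example assumes that S and E have already been computed and brought to comparable scales, which the slides do not specify.

```python
def rank_score(s, e, lam=0.5):
    """Combine topical score s and change score e as (1 - lam) * s + lam * e."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * s + lam * e


def rank_tuples(candidates, lam=0.5):
    """Rank <l_i, t_j> tuples by their combined score, best first.

    candidates maps a (location, frame) key to a pair (S, E); this layout
    is an assumption made for the example.
    """
    scored = [(key, rank_score(s, e, lam)) for key, (s, e) in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Setting lam = 0 ranks purely by topical relevance, while lam = 1 ranks purely by the change in tweeting rate.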
Aggregating Tweets
• To estimate S(q, Ti,j) in Equation (3.1), we propose to borrow ideas and
techniques originally designed for the IR problem of expert search.
• In expert search, a profile of an expert candidate is typically represented by
the documents associated to the candidate [8, 41]. Similarly, the tuple <li, tj>
is associated with a set of tweets.
• Inspired by [41], the score of each tuple (candidate) can be estimated by
aggregating the retrieval scores (votes) for each tweet (document) associated
to it.
• In [41], several voting techniques were used to aggregate the scores. We use
the intuitive, yet effective, CombSUM voting technique, which estimates the
final score of the tweet set representing a tuple (candidate) as follows:
S(q, Ti,j) = Σ_{t ∈ Rel(q) ∩ Ti,j} Score(q, t)    (3.2)
• where Rel(q) is the subset of tweets that match the query q and Score(q, t)
is the individual retrieval score obtained by a traditional bag-of-words ranking
function, e.g. BM25 [53]. Higher scores represent more topically related
tweets for the considered tuple.
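The CombSUM aggregation in Equation (3.2) can be sketched as below. Note that the scoring function here is a deliberately simple stand-in that counts shared terms; it is not BM25, and both function names are hypothetical.

```python
def term_overlap_score(query, tweet_text):
    """Stand-in for Score(q, t): the number of distinct query terms in the tweet.

    This is NOT BM25; it only keeps the example self-contained.
    """
    query_terms = set(query.lower().split())
    tweet_terms = set(tweet_text.lower().split())
    return float(len(query_terms & tweet_terms))


def combsum(query, tweet_texts, score=term_overlap_score):
    """Sum the retrieval scores of the tweets in T_{i,j} that match the query."""
    total = 0.0
    for text in tweet_texts:
        s = score(query, text)
        if s > 0:  # the tweet belongs to Rel(q)
            total += s
    return total
```

Any bag-of-words ranking function can be passed in through the score parameter in place of the stand-in.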
Change Point Analysis
• The problem of quantifying the score E(q, li, tj) in
Equation (3.1) maps well to change point analysis.
• Change point analysis aims at identifying points in
time series data where the statistical properties
change.
• It has been previously applied to detect events in
continuous streams of data.
• For example, Guralnik et al. developed change point
detection techniques that can accurately detect
events in traffic sensor data.
Contd..
• In our case, the change point analysis can be applied on the tweeting rate in a location
li to quantify the probability that the tweeting rate at a certain time tj represents a
change point when compared retrospectively to previous points in time tj−1, tj−2, . .,
tj−k.
• We apply Grubbs’ test as a change point detection technique as it is
computationally inexpensive and has been successfully applied in a similar context,
namely first story detection from Twitter and Wikipedia.
• Given a location li, at each point in time (e.g. at one-minute intervals) we maintain a
moving window of k points (e.g. k minutes) over the previous observations.
• We apply Grubbs’ test to each moving window to determine whether the tweeting rate of
the last point is an outlier that stands out with respect to the tweeting rates of previous
observations.
• With Grubbs’ test, rj is an outlier if v = (rj − x̄j,k)/σ² > z,
• where x̄j,k is the mean tweeting rate in the window (tj−k, tj),
• σ is the standard deviation of the tweeting rates in the window (tj−k, tj), and
• z is a fixed threshold.
• Note that this test gives a binary decision for each point in time.
• We smooth this binary decision into a normalised score and use it for the second
component of Equation (3.1) as follows:
E(q, li, tj) = Ec(tj) = 1 − e^(−(ln 2 / z) · v)    (3.3)
• where 0 ≤ Ec(tj) ≤ 1 represents a score of a change point using Grubbs’ test.
• Note that when v = z, the resulting score in Equation (3.3) is equal to 0.5.
• The tweeting rate rj can be estimated in two different ways:
• (i) By simply using the volume of tweets posted in the given location within
the timeframe corresponding to tj, i.e. rj = |Ti,j|. We call this a query
independent (QI) tweeting rate; and
• (ii) By using the score of the voting technique described above, i.e.
rj = S(q, Ti,j). We call this a query dependent (QD) tweeting rate.
• It should be noted that this framework can operate in a real-time fashion
on top of the SMART architecture, where social feeds are incrementally indexed
such that the retrieval components are able to provide the freshest results.
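The change point component can be sketched as below, using the query independent (QI) tweeting rate as input. Several choices here are assumptions rather than details from the slides: the statistic divides by the sample standard deviation, as in the standard form of Grubbs' test (the slide's formula appears to divide by σ squared); the window includes the last point; and negative values of v are clamped to zero so that the score stays within [0, 1]. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def grubbs_statistic(rates):
    """Return v for the last rate in the window [r_{j-k}, ..., r_j]."""
    if len(rates) < 2:
        raise ValueError("need at least two observations")
    sigma = stdev(rates)  # sample standard deviation (assumption)
    if sigma == 0:
        return 0.0  # flat window: the last point cannot stand out
    return (rates[-1] - mean(rates)) / sigma


def change_score(v, z):
    """Smooth the outlier decision into a score: 1 - exp(-(ln 2 / z) * v)."""
    if z <= 0:
        raise ValueError("z must be positive")
    v = max(v, 0.0)  # assumption: a drop in the rate is not scored as an event
    return 1.0 - math.exp(-(math.log(2) / z) * v)


def qi_rates(tweet_sets):
    """Query independent rates: r_j = |T_{i,j}| for each frame in the window."""
    return [len(tweet_set) for tweet_set in tweet_sets]
```

A query dependent (QD) variant would pass the CombSUM scores of the frames to grubbs_statistic in place of the tweet counts.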
