Event Detection
Table 1: Tagged events for gold standard

Event             Set A25/10   Set B25    Set C10    Total
Pass                      49        36         50       135
Reception                 17        17         12        46
Clearance                  0         5          1         6
Shot on Target             2         3          2         7
Total Events              68        61         65       194
Played Time         3:08 min  6:42 min   3:20 min  13:10 min
Excluded Time       0:58 min  1:49 min   1:36 min   4:23 min
Total Time          2:10 min  4:53 min   1:44 min   8:47 min

4 FEATURE COMPUTATION
Multiple features of the tracked objects characterize specific events in soccer matches. These objects move on the soccer pitch and influence each other mutually. Events occur when one or multiple features show a specific characteristic. In this section, we present the definitions of the implemented features. All features are computed based on the positional data described in the previous section. The positional data is received per tracked object in a 2-by-n matrix, where n is the number of collected data points in a specific time period. Each column vector represents the position of the object o at time t.

    Pos_{o,n} = [ x_{o,t_1}  x_{o,t_2}  ...  x_{o,t_n} ;
                  y_{o,t_1}  y_{o,t_2}  ...  y_{o,t_n} ]    (1)

We can derive the following definitions from the received positional data. The position of object o at time t is defined as p(o, t), its horizontal position as p_x(o, t), and its vertical position as p_y(o, t). Based on the spatio-temporal data, we calculated the time-dependent movement features velocity, acceleration, and change of direction. In this context, we concentrated primarily on features of the ball, because it represents the main interaction point in the game.

To determine the velocity over two consecutive positions p(o, t_1) and p(o, t_2) with t_2 = t_1 + 1, we initially compute the Euclidean distance of these points. Based on the distance, we can compute the average velocity, i.e., the rate of change of position over time, as defined in Equation 3.

    dist(o, t_1) = sqrt( (p_x(o, t_2) − p_x(o, t_1))^2 + (p_y(o, t_2) − p_y(o, t_1))^2 ),  with t_2 = t_1 + 1    (2)

    v(o, t) = Δdist(o, t) / Δt    (3)

Accordingly, Equation 4 determines the acceleration as the rate of change of the velocity over time.

    a(o, t) = Δv(o, t) / Δt    (4)

While objects move on the soccer pitch, they will eventually change their direction d.

    d(o, t_1) = p(o, t_2) − p(o, t_1),  with t_2 = t_1 + 1    (5)

A linear movement results in no significant change of the direction feature, whereas rapid movement tends to have a notable change of direction. We computed the change of direction as visualized in Figure 2.

[Figure 2: Direction change of object. The figure shows three consecutive positions P1, P2, P3 in the x/y plane with direction vectors d1, d2 and the direction change angle dc1.]

Given the three position data points P_0 = p(o, t_0), P_1 = p(o, t_1), and P_2 = p(o, t_2), the first direction vectors are defined as d_0 = d(o, t_0) and d_1 = d(o, t_1). The angle created by d_0 and d_1 is the change of direction dc_1. Possible values for direction changes are in the range from 0 to 180 degrees. To determine the direction change value, the arccos function is applied to the quotient of the scalar product of d_0 and d_1 and the product of the lengths of d_0 and d_1. The direction change dc of object o at time t_{n+1} is defined in the following way:

    dc(o, t_{n+1}) = arccos( (d(o, t_n) · d(o, t_{n+1})) / (|d(o, t_n)| · |d(o, t_{n+1})|) )    (6)

5 EVENT DETECTION
In the following section, we present our approach to recognizing events based on the features introduced in the previous section. The most central object of a soccer match is the ball: it is the object that shows the most frequent and most rapid movements on the pitch. Therefore, we computed all features based on the spatio-temporal data of the ball and created a vector for every time t containing all corresponding feature values.

Velocity and acceleration describe the current momentum. Acceleration peaks are an indicator for interactions with the ball. The direction change feature covers ball interactions with high intensity (e.g., passes) as well as ball interactions with little intensity (e.g., ball touches during dribbling). Each vector describes an instant of the soccer match, and consecutive vectors can represent a certain event. Depending on the type of the event, features become more or less important and take characteristic values. To determine the specific events, we trained and used an artificial neural network.

Neural networks are biologically inspired models that can model complex non-linear functions [15]. They consist of several connected layers of artificial neurons, as shown in Figure 3. A neuron is a single computational function that maps several components x_i of an input vector x (and a bias term b) to a single output value a, called the activation. It computes a weighted linear combination z of the inputs by using different weights w_i for each input and transforms it using a non-linear activation function.
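The movement features defined in Equations 2-6 can be sketched in a few lines of NumPy. This is a minimal illustration under the paper's definitions (positions as a 2-by-n matrix, Δt fixed at one sample), not the authors' implementation; the function name is our own:

```python
import numpy as np

def movement_features(pos):
    """pos: 2-by-n matrix of x/y positions for one object,
    one column per sample (Equation 1)."""
    # Equation 5: direction vectors d(o, t) = p(o, t+1) - p(o, t)
    diffs = np.diff(pos, axis=1)
    # Equation 2: Euclidean distance between consecutive positions
    dist = np.linalg.norm(diffs, axis=0)
    # Equations 3 and 4: velocity and acceleration, with Δt = 1 sample
    v = dist
    a = np.diff(v)
    # Equation 6: angle between consecutive direction vectors, in degrees
    d0, d1 = diffs[:, :-1], diffs[:, 1:]
    cos = (d0 * d1).sum(axis=0) / (
        np.linalg.norm(d0, axis=0) * np.linalg.norm(d1, axis=0))
    # clip guards against floating-point values slightly outside [-1, 1]
    dc = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return v, a, dc
```

For a trajectory that runs straight and then turns by a right angle, the direction change feature yields 0 followed by 90 degrees, matching the stated 0-180 range.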
MiLeTS’17, August 2017, Halifax, Nova Scotia Canada
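The neuron computation described above, a weighted linear combination z of the inputs plus a bias, passed through a non-linearity, can be written compactly as follows. The sigmoid is an illustrative choice of activation function and the function names are our own; the paper does not specify these details:

```python
import numpy as np

def sigmoid(z):
    # one common non-linear activation function (illustrative choice)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # z: weighted linear combination of the inputs x_i with weights w_i plus bias b
    z = np.dot(w, x) + b
    # a: the neuron's single output value, called the activation
    return sigmoid(z)

def dense_layer(x, W, b):
    # a fully connected layer applies many such neurons (the rows of W)
    # to the same input vector x
    return sigmoid(W @ x + b)
```

With all weights and the bias at zero, z = 0 and the sigmoid activation is 0.5, which is a convenient sanity check.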
In general, there are several parameters in a neural network which have an effect on the learning outcome. One of these is the architecture of the network. The number of neurons in the hidden layer is not fixed and can be adjusted to the given data set. Neural networks with a higher number of neurons have the ability to represent the data characteristics more precisely, but they also run the risk of overfitting the training data [15]. Therefore, we tried to find a number of hidden neurons that produces the highest general accuracy.

Another factor that has to be taken into account is the learning rate. It controls the rate at which the weights are updated on the basis of new information in the learning process. Low values result in a network that adapts very slowly. However, if these values are too high, the learning process may not converge [15]. To avoid overfitting, we implemented a technique called dropout. Hereby, a random subset of activations is set to zero for each training instance. This mechanism helps to prevent co-adaptation of neurons on the training data [18]. A dropout rate that is too high, however, complicates the effective learning of the network.

To find an optimal model, we used a grid-search approach to test multiple parameter configurations. Here, we list the parameters and the specified values we used:

∙ Number of Hidden Units: We used different value ranges for 10 Hz and 25 Hz to account for their different input sizes to the network. Values (25 Hz): 1 to 50. Values (10 Hz): 1 to 20.
∙ Learning Rate: Values: 0.1, 0.05, 0.01, 0.005, 0.001
∙ Dropout: Values: 0, 0.01, 0.05, 0.1, 0.2

These parameters are augmented by the fixed parameter for the window size as described in the previous section. The grid-search implementation uses parallel processing to test different configurations concurrently and speed up the process. The results of the grid search for the presented parameters are evaluated in the following section.

6 EVALUATION
In this section, we present the evaluation results of our approach. Based on the presented data (see Section 3), we optimized the configuration of our neural network and compared the accuracy of the different settings. For the evaluation, we focused on pass events, which occur most frequently in the gold standard. A pass event consists of two consecutive actions: a kick and a reception.

6.1 Preliminaries
To evaluate the quality of our event detection, we used the gold standard to generate test and training instances. These instances are labeled windows over the feature data at specific time points. The system is trained on the training instances and then presented with the unknown test instances, which it has to label. Based on the label assigned by the system and the true event, we can compute the precision and recall scores to quantify the detection quality.

    precision = true positives / (true positives + false positives)    (11)

    recall = true positives / (true positives + false negatives)    (12)

We also computed the F1-score, which is the harmonic mean of precision and recall, and use it as our main evaluation metric to compare different network settings.

    F1 = 2 · (precision · recall) / (precision + recall)    (13)

6.2 Model Optimization
Since we have data sets with different temporal resolutions, we conducted the parameter search for each data set separately. For the grid-search approach, we selected the labeled data of the first minutes of the game Berlin vs. Mainz (A_25, A_10). Based on these data sets, we generated the training and testing instances. We tested each configuration using a five-fold cross validation with a 60/40 split between training and testing instances. To compare the accuracy of the different configurations, the F1-score was used. Table 2 shows the configurations that achieved the highest scores for the given data set.

Table 2: Optimal parameter configurations for the different temporal resolutions

Parameter                 25 Hz    10 Hz
Number of Hidden Units       50       20
Learning Rate              0.01     0.01
Dropout                    0.05     0.01

6.3 Model Comparison
Using the configurations presented in the previous section, we analyzed the accuracy of the different models in more detail. To compare the performance of the 10 Hz and 25 Hz models, we tested each one using a 100-fold cross validation. Analogous to the configuration computation, we used a 60/40 split between training and testing instances. Afterwards, we calculated an overall precision, recall, and F1-score for each iteration and averaged them over all iterations. The results are shown in Table 3. The evaluation shows that the model trained and tested on the 10 Hz data performs much better than the 25 Hz model, with an averaged F1-score of 0.89 and averaged precision and recall scores of 0.89 and 0.90, respectively. The 25 Hz model only achieves averaged precision, recall, and F1-scores of 0.52, 0.52, and 0.49.

In the next step, we compared the performance of the two models per class. Therefore, we calculated the precision, recall, and F1-scores for each class separately over all iterations. The results are shown in Table 4. As expected, we observed that the 10 Hz model has a higher accuracy for both classes compared to the 25 Hz model. For kick events, both models achieved
a similar recall value of 0.91 and 0.92, respectively. However, the 10 Hz model had a better precision score of 0.95 than the 25 Hz model, which had a precision score of 0.73. This results in an F1-score of 0.93 for the 10 Hz model and one of 0.81 for the 25 Hz model. Both models detect most of the true kick events in the data. However, the lower precision of the 25 Hz model implies that this model is more likely to detect a false kick event. In addition, the comparison for the reception event showed a more diverging picture. The 10 Hz model achieved precision, recall, and F1-scores of 0.82, 0.89, and 0.85. The 25 Hz model, however, did not perform as well, with only 0.32 for precision and 0.14 for recall, and an averaged F1-score of 0.18.

Table 3: Precision, recall and F1-score, averaged over all classes

Data set   Precision   Recall   F1-Score
25 Hz           0.52     0.52       0.49
10 Hz           0.89     0.90       0.89

Table 4: Precision, recall and F1-score per class

Data set   Event       Precision   Recall   F1-Score
25 Hz      Kick             0.73     0.91       0.81
           Reception        0.32     0.14       0.18
10 Hz      Kick             0.95     0.92       0.93
           Reception        0.82     0.89       0.85

This great difference in performance could be due to the fixed size of the gold standard and the fact that the 25 Hz model is more complex to train due to its larger structure. In this experiment, we used the data of matches A_10 and A_25, for which the gold standard holds 85 labeled kick events but only 34 reception events. One reason for the performance differences could be this unequal distribution of kick and reception events.

To summarize the previous experiment, we can state that a model trained and tested on 10 Hz data achieved a higher accuracy compared to one trained and tested on the 25 Hz data, using the given gold standard. This could be due to the fact that the 10 Hz data has been smoothed and therefore has fewer outliers.

In a further set of experiments, we evaluated the model across different matches. For this evaluation, we focused on the 10 Hz model, because it produced the best results in the previous experiments. The motivation for this evaluation is that the characteristics of a game (e.g., playing speed, team tactics) could vary between different matches and teams.

First, we trained and tested the model on the merged data of two different matches (A_10 and C_10) to set a baseline, merging the data sets of the matches Berlin vs. Mainz and Berlin vs. Braunschweig. Afterwards, we extracted training and testing instances based on a 60/40 split. The presented results for this merged matches strategy are averaged over 100 iterations with a random selection of training and testing instances.

The second evaluated strategy is the across matches strategy. In this case, the training instances for the model were randomly selected from data set A_10, and the model was afterwards tested on instances of data set C_10. The results of the two strategies are listed in Table 5, together with the single match results of the previous section.

Table 5: Comparison of overall results for different training and testing strategies for two matches

Strategy          Precision   Recall   F1-Score
Merged Matches         0.81     0.75       0.75
Across Matches         0.65     0.73       0.66
Single Match           0.89     0.90       0.89

In general, we observed that the scores for the merged matches strategy have a higher accuracy compared to the results of the across matches strategy. However, both have a lower performance in comparison to the single match strategy, where only one single match was used for training and testing. This suggests that we have to consider differences in playing style between different matches.

When we drill down and evaluate the performance per class, the results show that the scores for kick events are generally higher than those for reception events in all evaluated strategies. For the kick events, both strategies produced results comparable to those of the model evaluated on a single match. One exception was the recall of the across matches strategy, which was slightly lower with a value of 0.72, compared to 0.93 and 0.92 for the other strategies. The implication of this is that the across matches strategy will not detect as many of the real kick events. In comparison to the scores of the kick class, the scores of the reception class are much lower. While for the merged matches strategy the receptions have a precision of 0.75 and a recall of 0.56, for the across matches strategy they have a precision of 0.39 and a recall of 0.75.

To conclude, the results showed that neural networks present a viable model to detect events in soccer data. Our experiments showed that keeping the complexity of the model low, in combination with smoothed data, helps to achieve better results. The best results were achieved when the model was trained and tested on data of a single match or of merged matches. For that reason, we recommend using merged data of different matches as training instances to classify completely new matches.
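The detection-quality metrics of Equations 11-13 reduce to a few lines of code. The function name and the example counts below are illustrative, not taken from the experiments:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute detection-quality scores from raw counts of
    true positives (tp), false positives (fp), and false negatives (fn)."""
    # Equation 11: fraction of detected events that are real events
    precision = tp / (tp + fp)
    # Equation 12: fraction of real events that were detected
    recall = tp / (tp + fn)
    # Equation 13: F1-score, the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a detector that finds 8 of 10 true events while raising 2 false alarms, i.e. precision_recall_f1(8, 2, 2), scores 0.8 on all three metrics.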
Table 6: Comparison of results by event for different training/testing strategies for two matches

Strategy          Event       Precision   Recall   F1-Score
Merged Matches    Kick             0.86     0.93       0.90
                  Reception        0.75     0.56       0.61
Across Matches    Kick             0.92     0.72       0.81
                  Reception        0.39     0.75       0.51
Single Match      Kick             0.95     0.92       0.93
                  Reception        0.82     0.89       0.85

The two-dimensional positional data cannot capture ball movements on the z-axis. This fact leads to small inaccuracies in the computed features. For that reason, the incorporation of z values could improve the accuracy of the features and consequently lead to an improvement of the results. As our experiments have shown, there is a difference in detection quality between the models working with the smoothed 10 Hz data and the 25 Hz data. We have to further analyze whether this is due to the fact that the gold standard includes only a manageable number of events, or whether the data smoothing supports the learning capabilities. Accordingly, the effects of different smoothing functions on the accuracy of the results could be evaluated.

8 CONCLUSION
In this paper, we presented a system that is able to detect events from spatio-temporal soccer data. Using two-dimensional positional data, we computed velocity, acceleration, and change of angle features to capture time-dependent movement information from the data. On these features, we then trained a neural network to detect kick and reception events and optimized its parameters through a grid-search approach. We evaluated and compared the event detection performance on raw 25 Hz data and smoothed 10 Hz data. Our experiments showed that the neural network trained and tested on 10 Hz data achieved an F1-score of 0.89, whereas a network for 25 Hz data achieved only a score of 0.49. Both models achieved high scores for kick events; however, the 10 Hz model performed substantially better on reception events, with an F1-score of 0.85 compared to 0.18 for the 25 Hz model. The evaluation of the precision, recall, and F1-scores showed that neural networks are a viable model to detect events in spatio-temporal soccer data. Further experiments showed that training and testing on different matches has a significant effect on the accuracy of the results. This indicates that different matches and teams have different game characteristics, which influence the detection performance. To minimize those effects, the training data should consist of data from different matches.

REFERENCES
[1] Shun-ichi Amari, Andrzej Cichocki, Howard Hua Yang, et al. 1996. A new learning algorithm for blind signal separation. Advances in neural information processing systems (1996), 757–763.
[2] Alina Bialkowski, Patrick Lucey, Peter Carr, Yisong Yue, Sridha Sridharan, and Iain Matthews. 2014. Large-scale analysis of soccer matches using spatiotemporal tracking data. In 2014 IEEE International Conference on Data Mining. IEEE, 725–730.
[3] Christopher M Bishop. 2006. Pattern recognition. Machine Learning 128 (2006), 1–58.
[4] Christopher Carling, A Mark Williams, and Thomas Reilly. 2005. Handbook of soccer match analysis: A systematic approach to improving performance. Psychology Press.
[5] Alexandre de Brébisson and Pascal Vincent. 2015. An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family. arXiv preprint arXiv:1511.05042 (2015).
[6] Peter Dizikes. 2013. Sports analytics: a real game-changer. Massachusetts Institute of Technology, MIT News Mar 4 (2013).
[7] Wenchao Jiang and Zhaozheng Yin. 2015. Human Activity Recognition using Wearable Sensors by Deep Convolutional Neural Networks. In Proceedings of the 23rd ACM international conference on Multimedia. ACM, 1307–1310.
[8] Payal Khanwani, Susmita Sridhar, and Mrs K Vijaylakshmi. 2010. Automated Event Detection of Epileptic Spikes using Neural Networks. International Journal of Computer Applications 2, 4 (2010).
[9] Ho-Chul Kim, Oje Kwon, and Ki-Joune Li. 2011. Spatial and spatiotemporal analysis of soccer. In Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems. ACM, 385–388.
[10] Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, and Irfan Essa. 2010. Motion fields to predict play evolution in dynamic sport scenes. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 840–847.
[11] Wen-Nung Lie, Ting-Chih Lin, and Sheng-Hsiung Hsia. 2004. Motion-based event detection and semantic classification for baseball sport videos. In Multimedia and Expo, 2004. ICME'04. 2004 IEEE International Conference on, Vol. 3. IEEE, 1567–1570.
[12] Patrick Lucey, Dean Oliver, Peter Carr, Joe Roth, and Iain Matthews. 2013. Assessing team strategy using spatiotemporal data. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1366–1374.
[13] Rob Mackenzie and Chris Cushion. 2013. Performance analysis in football: A critical review and implications for future research. Journal of sports sciences 31, 6 (2013), 639–676.
[14] Andrew Miller, Luke Bornn, Ryan Adams, and Kirk Goldsberry. 2014. Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball. In ICML. 235–243.
[15] Thomas M Mitchell. 1997. Machine learning. New York (1997).
[16] Keven Richly, Max Bothe, Tobias Rohloff, and Christian Schwarz. 2016. Recognizing Compound Events in Spatio-Temporal Football Data. In International Conference on Internet of Things and Big Data (IoTBD).
[17] Sami Saalasti. 2003. Neural networks for heart rate time series analysis. University of Jyväskylä.
[18] Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
[19] Chinh T Vu, Raheem A Beyah, and Yingshu Li. 2007. Composite event detection in wireless sensor networks. In 2007 IEEE International Performance, Computing, and Communications Conference. IEEE, 264–271.
[20] Kasun Wickramaratna, Min Chen, Shu-Ching Chen, and Mei-Ling Shyu. 2005. Neural network based framework for goal event detection in soccer videos. In Seventh IEEE International Symposium on Multimedia (ISM'05). IEEE, 8–pp.
[21] Zengyuan Yue, Holger Broich, Florian Seifriz, and Joachim Mester. 2008. Mathematical analysis of a soccer game. Part I: Individual and collective behaviors. Studies in applied mathematics 121, 3 (2008), 223–243.