Soccer Object Motion Recognition Based on 3D Convolutional Neural Networks
15439/2018F48
Computer Science and Information Systems pp. 129–134, ISSN 2300-5963, ACSIS, Vol. 17 (Communication Papers, Poznań, 2018)
Abstract—Due to the development of video understanding and big data analysis using deep learning techniques, intelligent machines have replaced tasks that people performed in the past in various fields such as traffic, surveillance, and security. In the sports field, especially in soccer, quantitative analysis of players and games through deep learning and big data analysis is also being attempted. However, because of the nature of soccer analysis, sophisticated automatic analysis is still difficult due to technical limitations. In this paper, we propose a deep learning based motion recognition technique which is the basis of high-level automatic soccer analysis. For sophisticated motion recognition, we maximize recognition accuracy by sequentially processing the data in three steps: data acquisition, data augmentation, and 3D CNN based motion classifier learning. As the experimental results show, the proposed method guarantees real-time speed and satisfactory accuracy.

I. INTRODUCTION

In the past, professional sports were a human-oriented area. Player training was done through subjective guidance based on the know-how and experience of the manager and coaching staff. Game judgements were made through the intuition and observation of the referee, and the occasional misjudgement by the referee was accepted as part of sports. In addition, sports audiences could enjoy sports only through unilateral delivery of sports contents. In recent years, however, many changes have been made in professional sports as a result of quantitative analysis through sports science and ICT technology. Managers and coaching staff can use data- and video-based match analysis tools (e.g., the Dartfish video analysis tool [1]) to check objective player performance and condition in detail, and to adjust player training methods or tactics. Technology also assists referees, for example through high-speed camera readings (e.g., Hawk-Eye technology [2]), and produces engaging content with visualization tools (e.g., freeD technology in the NFL [3]) that give a sense of immersion. These sports analytics technologies are being developed to reflect people's needs in many directions; accordingly, the sports analysis market reached $4.7 billion in 2017 [4].

This trend has also affected the professional soccer market. The German national team used Match Insights, a technology of the big data analytics company SAP, to improve home team performance and analyze the strengths and weaknesses of away teams on its way to winning the 2014 World Cup [5]. In addition to SAP, many international companies such as Chyronhego, OPTA, Deltatre, GPSports, and StatSport offer technologies and services for quantitative analysis of soccer matches and players.

In general, quantitative analysis of a soccer game consists of three steps: multi-object tracking, event analysis, and tactical analysis. Multi-object tracking can be performed automatically thanks to technological advances. However, event analysis and tactical analysis, which require understanding the high-level semantics of a given match, still depend on the manual work of expert groups, and only the big data extracted by hand is secondarily processed and visualized. There are many reasons why these steps are not automated, but one of the biggest is that a soccer event can be recognized only from the motion information of the players or referees. For example, it is necessary to recognize a tackle motion of a player, a hand movement of the head referee, and a flag motion of the assistant referee before the tackle event, the foul event, and the offside event can be recognized. To solve this problem, this paper proposes a soccer object motion recognition technique.

This paper is composed as follows. In Sec. II, we describe related research. In Sec. III, we propose a soccer object motion recognition pipeline based on 3D convolutional neural networks (CNN). Sec. IV shows the experimental results of the proposed method. Finally, Sec. V gives the concluding remarks.

II. RELATED WORKS

Motion recognition is a computer vision field that recognizes human pose or action. The general process of motion recognition is as follows: 1) extracting the feature points necessary for motion recognition from a given input source; 2) analyzing the pattern of the obtained feature points; 3) calculating similarity with a predefined motion list; and 4) determining the final motion that has the highest similarity for the given input source. Video-based motion recognition is thus a kind of image classification technology, in that its purpose is to determine the final motion based on similarity with a predefined motion list.
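The four-step process above can be sketched as a nearest-template classifier. The feature vectors and motion templates below are hypothetical placeholders (a real system would use hand-crafted or learned features, as discussed next); the sketch only illustrates steps 3 and 4, similarity scoring against a predefined motion list followed by choosing the best match:

```python
import math

# Hypothetical motion templates: one feature vector per predefined motion.
# In practice these would come from hand-crafted or learned features.
MOTION_TEMPLATES = {
    "walk": [0.9, 0.1, 0.0],
    "run":  [0.2, 0.9, 0.1],
    "kick": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Step 3: similarity between an input feature vector and a template."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_motion(feature_vector):
    """Step 4: return the predefined motion with the highest similarity."""
    return max(MOTION_TEMPLATES,
               key=lambda m: cosine_similarity(feature_vector, MOTION_TEMPLATES[m]))

print(classify_motion([0.15, 0.85, 0.05]))  # closest to the "run" template
```

Swapping in a learned feature extractor changes only how `feature_vector` is produced; the similarity-and-argmax decision stage stays the same.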
Conventional motion recognition technologies can be subdivided according to several criteria. As a first criterion, they can be classified into two-dimensional (2D) and three-dimensional (3D) motion recognition according to the dimension of the input source. 2D motion recognition operates on 2D video sources taken with general camera equipment [6], [7], [8]. 3D motion recognition operates on stereoscopic video sources taken with special equipment such as Microsoft's Kinect [9], [10]. As a second criterion, they can be classified into techniques that recognize human action from the whole-body pose and techniques that recognize a gesture of a specific part of the human body. Motion recognition based on human pose tries to recognize motions such as arm movements, arm extension, waist bending, and jumping from video sources of human action [6], [7], [8], [9], [10]. Motion recognition for a specific gesture recognizes a partial movement of a specific body part (hands, legs, etc.) [11], [12]. The third criterion is the feature extraction method, which is divided into hand-crafted and data-driven feature extraction. A feature point is a clue used to distinguish different labels when performing motion classification, and the accuracy of motion classification depends on the quality of the feature points. In the hand-crafted approach, the user manually designs and extracts feature points according to a given classification purpose [6], [7], [8], [9], [10], [11], [12]. Hand-crafted feature extraction is advantageous when direct design by the user is easy and the patterns of the motions to be classified are monotonous, but its performance drops significantly for motions with complex patterns. More recently, data-driven feature extraction methods automatically learn the feature points necessary for classification from given information (video clips and labels) [13]. Although this approach requires a large amount of computation and huge input data for learning, it performs much better than hand-crafted feature extraction in terms of accuracy and execution speed.

According to the above classification criteria, we can specify the category of motion recognition technique needed to solve the problem defined in this paper. We use 2D video sources taken from camera equipment installed in the stadium. The regions of the field players and referees are tracked during the game, and the goal is to recognize motion based on the tracking data. In addition, it is possible to construct large-sized learning data, which is suitable for data-driven feature learning and extraction. According to this analysis, the motion recognition technology proposed in this paper can be specified as 1) 2D video source based, 2) data-driven feature extraction, and 3) recognition of human pose.

III. PROPOSED METHOD

In this Section, we describe in detail a method of performing motion recognition on object regions tracked from a soccer game video. Figure 1 depicts the system outline of the proposed method. For motion recognition specialized for soccer objects, we constructed the motion recognition system through three steps: data acquisition, data processing, and motion classifier learning [14]. A detailed description of each step is given in the subsections below.

A. Data Acquisition

Data acquisition is performed first to recognize the motion. To do this, we need to define motion classification criteria. We classify the motions of each soccer object and generate learning data based on the following principles:
• The objects are categorized into field player, head referee, and assistant referee.
• All the motions that each object can take on the field must be included in the motion list.
• For the same motion, data is secured for at least four body directions of the object.

TABLE I
DEFINED MOTION LIST FOR EACH SOCCER OBJECT

Field player   Head referee       Assistant referee
Stand          -                  Sidle
Walk           Walk               Walk
Run            Run                Run
Kick           One arm pointing   Flag up
Tackle/Lie     Card               Flag chest
Throw in       -                  Flag side
JIWON LEE ET AL: SOCCER OBJECT MOTION RECOGNITION BASED ON 3D CONVOLUTIONAL NEURAL NETWORKS 131
TABLE II
PERFORMANCE VARIATIONS OF MOTION CLASSIFIER FOR EACH GPU

GPU type               ops    fps
GeForce GTX 1070        800   32
GeForce GTX TITAN X     930   37.2
NVIDIA TITAN X         1200   48

Fig. 6. A sample screenshot of four cameras in the test video clip ((c) third camera, (d) fourth camera shown).
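A regularity worth noting in Table II: for all three GPUs the reported fps is exactly ops / 25, which would be consistent with the classifier costing a fixed ~25 operations per processed frame. This is an inference from the numbers, not a figure stated in the paper:

```python
# (ops, fps) pairs as reported in Table II.
measurements = {
    "GeForce GTX 1070":    (800, 32.0),
    "GeForce GTX TITAN X": (930, 37.2),
    "NVIDIA TITAN X":      (1200, 48.0),
}

for gpu, (ops, fps) in measurements.items():
    # Hypothesized relation: fps = ops / 25 on every GPU in the table.
    assert abs(ops / 25 - fps) < 1e-9, gpu
    print(f"{gpu}: {ops} ops -> {ops / 25} fps")
```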
TABLE V
CONFUSION MATRIX OF MOTION CLASSIFIER FOR ASSISTANT REFEREE

Out\GT       Stand   Walk    Run   Kick  Tackle/Lie  Throw in
Stand        1,779      0      0      1           0         0
Walk            58    325  1,158    261           6         0
Run              0      0      0      4           0         0
Kick           569    694    702  1,440       1,147       648
Tackle/Lie      70      1      0    207         768       357
Throw in         9    254     37    308          35       989
Accuracy     0.716  0.255      0  0.648       0.393     0.496

ACKNOWLEDGMENT

This research is supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program 2016 (R2016030044, Development of Context-Based Sport Video Analysis, Summarization, and Retrieval Technologies).
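The Accuracy row of each confusion matrix is the diagonal entry of a ground-truth column divided by that column's total. A short check that reproduces the Accuracy row of Table V from its raw counts:

```python
# Table V as reported: rows = classifier output, columns = ground truth (Out\GT).
labels = ["Stand", "Walk", "Run", "Kick", "Tackle/Lie", "Throw in"]
confusion = [
    [1779,   0,    0,    1,    0,   0],
    [  58, 325, 1158,  261,    6,   0],
    [   0,   0,    0,    4,    0,   0],
    [ 569, 694,  702, 1440, 1147, 648],
    [  70,   1,    0,  207,  768, 357],
    [   9, 254,   37,  308,   35, 989],
]

def per_class_accuracy(matrix):
    """Diagonal entry of each ground-truth column divided by the column sum."""
    n = len(matrix)
    return [matrix[c][c] / sum(matrix[r][c] for r in range(n)) for c in range(n)]

for label, acc in zip(labels, per_class_accuracy(confusion)):
    print(f"{label}: {acc:.3f}")  # 0.716, 0.255, 0.000, 0.648, 0.393, 0.496
```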
TABLE IV
CONFUSION MATRIX OF MOTION CLASSIFIER FOR HEAD REFEREE

Out\GT              Walk     Run  One arm pointing   Card
Walk              11,244     202               376     17
Run                1,210  10,579               446    132
One arm pointing     395     499             8,254    143
Card                 388     295               402    546
Accuracy           0.850   0.914             0.871  0.652

REFERENCES

[1] DartFish sports analysis tool. [Online]. Available: http://www.dartfish.com
[2] Hawk-Eye Innovations. [Online]. Available: https://www.hawkeyeinnovations.com
[3] FreeD on NFL. [Online]. Available: https://newsroom.intel.com/news/intel-nfl-kickoff-freed-technology-11-stadiums-create-immersive-highlights-2017-season/
[4] "Sports analytics: market shares, strategies, and forecasts, worldwide, 2015 to 2021," Wintergreen Research, 472 pages, May 2015.
[5] A. Ghosh, "How 'Match Insight' is changing soccer," 6 Aug. 2014. [Online]. Available: https://blogs.sap.com/2014/08/06/how-software-is-making-football-even-more-beautiful/
[6] C. P. Huang, C. H. Hsieh, K. T. Lai, and W. Y. Huang, "Human action recognition using histogram of oriented gradient of motion history image," in Proc. International Conference on Instrumentation, Measurement, Computer, Communication and Control, pp. 353-356, Oct. 2011.
[7] L. Hu, W. Liu, B. Li, and W. Xing, "Robust motion detection using histogram of oriented gradients for illumination variations," in Proc. ICIMA 2010, pp. 443-447, May 2010.
[8] P. Banerjee and S. Sengupta, "Human motion detection and tracking for video surveillance," in National Conference for Communication, 2008.
[9] O. Patsadu, C. Nukoolkit, and B. Watanapa, "Human gesture recognition using Kinect camera," in Proc. JCSSE 2012, pp. 28-32, May 2012.
[10] E. E. Stone and M. Skubic, "Fall detection in homes of older adults using the Microsoft Kinect," IEEE Jour. Biomedical and Health Informatics, vol. 19, no. 1, pp. 290-301, Mar. 2014.
[11] N. C. Kiliboz and U. Gudukbay, "A hand gesture recognition technique for human computer interaction," Jour. Visual Communication and Image Representation, vol. 28, pp. 97-104, Apr. 2015.
[12] M. B. Brahem, B. J. Menelas, and M. D. Otis, "Use of a 3DOF accelerometer for foot tracking and gesture recognition in mobile HCI," Procedia Computer Science, vol. 19, pp. 453-460, 2013.
[13] Y. LeCun, Y. Bengio, and G. E. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, May 2015.
[14] J. Lee, Y. Kim, M. Jeong, C. Kim, D. Nam, J. Lee, S. Moon, and W. Yoo, "3D convolutional neural networks for soccer object motion recognition," in Proc. ICACT 2018, pp. 354-358, Feb. 2018.
[15] W. Kim, S. Moon, J. Lee, D. Nam, and C. Jung, "Multiple player tracking in soccer videos: an adaptive multiscale sampling approach," Multimedia Systems, pp. 1-13, Feb. 2018.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS 2012, pp. 1-9, Dec. 2012.
[17] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, Mar. 2012.
[18] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proc. ICCV 2015, pp. 4489-4497, Dec. 2015.
[19] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467v2, 2016.