Intelligent Systems and Applications: Proceedings of the 2020 Intelligent Systems Conference (IntelliSys) Volume 2 Kohei Arai

The document is a collection of proceedings from the 2020 Intelligent Systems Conference (IntelliSys), edited by Kohei Arai, Supriya Kapoor, and Rahul Bhatia. It includes contributions from various researchers on intelligent systems and applications, showcasing advancements in fields such as artificial intelligence and machine learning. The conference featured 545 submissions from over 50 countries, with 177 papers accepted for publication, reflecting the international significance of the topics discussed.

Advances in Intelligent Systems and Computing 1251

Kohei Arai
Supriya Kapoor
Rahul Bhatia Editors

Intelligent
Systems and
Applications
Proceedings of the 2020 Intelligent
Systems Conference (IntelliSys)
Volume 2
Advances in Intelligent Systems and Computing

Volume 1251

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft
computing including neural networks, fuzzy systems, evolutionary computing and the
fusion of these paradigms, social intelligence, ambient intelligence, computational
neuroscience, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning
paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
** Indexing: The books of this series are submitted to ISI Proceedings,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/11156

Kohei Arai • Supriya Kapoor • Rahul Bhatia
Editors

Intelligent Systems
and Applications
Proceedings of the 2020 Intelligent Systems
Conference (IntelliSys) Volume 2

Editors
Kohei Arai
Saga University
Saga, Japan

Supriya Kapoor
The Science and Information (SAI) Organization
Bradford, West Yorkshire, UK

Rahul Bhatia
The Science and Information (SAI) Organization
Bradford, West Yorkshire, UK

ISSN 2194-5357 ISSN 2194-5365 (electronic)

Advances in Intelligent Systems and Computing
ISBN 978-3-030-55186-5 ISBN 978-3-030-55187-2 (eBook)
https://doi.org/10.1007/978-3-030-55187-2
© Springer Nature Switzerland AG 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Editor’s Preface

This book contains the scientific contributions included in the program of the
Intelligent Systems Conference (IntelliSys) 2020, which was held on September
3–4, 2020, as a virtual conference. The Intelligent Systems Conference is a
prestigious annual conference on intelligent systems and artificial intelligence
and their applications to the real world.
This conference not only presented state-of-the-art methods and valuable
experience from researchers in the related research areas, but also provided the
audience with a vision of further development in these fields. We have gathered a
multi-disciplinary group of contributions from both research and practice to discuss
how intelligent systems are today architected, modeled, constructed,
tested and applied in various domains. The aim was to further increase the body of
knowledge in this specific area by providing a forum to exchange ideas and discuss
results.
The program committee of IntelliSys 2020 represented 25 countries, and authors
submitted 545 papers from 50+ countries. This certainly attests to the widespread,
international importance of the theme of the conference. Each paper was reviewed
on the basis of originality, novelty and rigor. After the reviews, 214 papers were
accepted for presentation, of which 177 are being published in the
proceedings.
The conference would truly not function without the contributions and support
received from authors, participants, keynote speakers, program committee
members, session chairs, organizing committee members, steering committee members
and others in their various roles. Their valuable support, suggestions, dedicated
commitment and hard work have made IntelliSys 2020 successful. We warmly
thank and greatly appreciate the contributions, and we kindly invite all to continue
to contribute to future IntelliSys conferences.


It has been a great honor to serve as the General Chair for the IntelliSys 2020 and
to work with the conference team. We believe this event will certainly help further
disseminate new ideas and inspire more international collaborations.
Kind Regards,

Kohei Arai
Conference Chair
Contents

CapsNet vs CNN: Analysis of the Effects of Varying Feature Spatial Arrangement
Ugenteraan Manogaran, Ya Ping Wong, and Boon Yian Ng

Improved 2D Human Pose Tracking Using Optical Flow Analysis
Aleksander Khelvas, Alexander Gilya-Zetinov, Egor Konyagin, Darya Demyanova, Pavel Sorokin, and Roman Khafizov

Transferability of Fast Gradient Sign Method
Tamás Muncsan and Attila Kiss

Design of an Automatic System to Determine the Degree of Progression of Diabetic Retinopathy
Hernando González, Carlos Arizmendi, and Jessica Aza

Adaptive Attention Mechanism Based Semantic Compositional Network for Video Captioning
Zhaoyu Dong, Xian Zhong, Shuqin Chen, Wenxuan Liu, Qi Cui, and Luo Zhong

Estimated Influence of Online Management Tools on Team Management Based on the Research with the Use of the System of Organizational Terms
Olaf Flak

Liveness Detection via Facial Expressions Queue
Bat-Erdene Batsukh

Java Based Application Development for Facial Identification Using OpenCV Library
Askar Boranbayev, Seilkhan Boranbayev, and Askar Nurbekov


Challenges in Face Recognition Using Machine Learning Algorithms: Case of Makeup and Occlusions
Natalya Selitskaya, Stanislaw Sielicki, and Nikolaos Christou

The Effects of Social Issues and Human Factors on the Reliability of Biometric Systems: A Review
Mohammadreza Azimi and Andrzej Pacut

Towards Semantic Segmentation Using Ratio Unpooling
Duncan Boland and Hossein Malekmohamadi

Adaptive Retraining of Visual Recognition-Model in Human Activity Recognition by Collaborative Humanoid Robots
Vineet Nagrath, Mossaab Hariz, and Mounim A. El Yacoubi

A Reasoning Based Model for Anomaly Detection in the Smart City Domain
Patrick Hammer, Tony Lofthouse, Enzo Fenoglio, Hugo Latapie, and Pei Wang

Document Similarity from Vector Space Densities
Ilia Rushkin

Food Classification for Inflammation Recognition Through Ingredient Label Analysis: A Real NLP Case Study
Stefano Campese and Davide Pozza

Classification Based Method for Disfluencies Detection in Spontaneous Spoken Tunisian Dialect
Emna Boughariou, Younès Bahou, and Lamia Hadrich Belguith

A Comprehensive Methodology for Evaluating Conversation-Based Interfaces to Relational Databases (C-BIRDs)
Majdi Owda, Amani Yousef Owda, and Fathi Gasir

Disease Normalization with Graph Embeddings
D. Pujary, C. Thorne, and W. Aziz

Quranic Topic Modelling Using Paragraph Vectors
Menwa Alshammeri, Eric Atwell, and Mhd Ammar Alsalka

Language Revitalization: A Benchmark for Akan-to-English Machine Translation
Kingsley Nketia Acheampong and Nathaniel Nii Oku Sackey

A Machine Learning Platform for NLP in Big Data
Mauro Mazzei

Recent News Recommender Using Browser’s History
Samer Sawalha and Arafat Awajan

Building a Wikipedia N-GRAM Corpus
Jorge Ramón Fonseca Cacho, Ben Cisneros, and Kazem Taghva

Control Interface of an Automatic Continuous Speech Recognition System in Standard Arabic Language
Brahim Fares Zaidi, Malika Boudraa, Sid-Ahmed Selouani, Mohammed Sidi Yakoub, and Ghania Hamdani

Emotion Detection Throughout the Speech
Manuel Rodrigues, Dalila Durães, Ricardo Santos, and Cesar Analide

Understanding Troll Writing as a Linguistic Phenomenon
Sergei Monakhov

Spatial Sentiment and Perception Analysis of BBC News Articles Using Twitter Posts Mining
Farah Younas and Majdi Owda

Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity
Kazuaki Kashihara, Jana Shakarian, and Chitta Baral

Predicting University Students’ Public Transport Preferences for Sustainability Improvement
Ali Bakdur, Fumito Masui, and Michal Ptaszynski

Membrane Clustering Using the PostgreSQL Database Management System
Tamás Tarczali, Péter Lehotay-Kéry, and Attila Kiss

STAR: Spatio-Temporal Prediction of Air Quality Using a Multimodal Approach
Tien-Cuong Bui, Joonyoung Kim, Taewoo Kang, Donghyeon Lee, Junyoung Choi, Insoon Yang, Kyomin Jung, and Sang Kyun Cha

Fair Allocation Based Soft Load Shedding
Sarwan Ali, Haris Mansoor, Imdadullah Khan, Naveed Arshad, Safiullah Faizullah, and Muhammad Asad Khan

VDENCLUE: An Enhanced Variant of DENCLUE Algorithm
Mariam S. Khader and Ghazi Al-Naymat

Detailed Clustering Based on Gaussian Mixture Models
Nikita Andriyanov, Alexander Tashlinsky, and Vitaly Dementiev

Smartphone Applications Developed to Collect Mobility Data: A Review and SWOT Analysis
Cristina Pronello and Pinky Kumawat

A Novel Approach for Heart Disease Prediction Using Genetic Algorithm and Ensemble Classification
Indu Yekkala and Sunanda Dixit

An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-component Key Indexes
Alexander B. Veretennikov

A Feedback Integrated Web-Based Multi-Criteria Group Decision Support Model for Contractor Selection Using Fuzzy Analytic Hierarchy Process
Abimbola H. Afolayan, Bolanle A. Ojokoh, and Adebayo O. Adetunmbi

AIS Ship Trajectory Clustering Based on Convolutional Auto-encoder
Taizheng Wang, Chunyang Ye, Hui Zhou, Mingwang Ou, and Bo Cheng

An Improved Q-Learning Algorithm for Path Planning in Maze Environments
Shimin Gu and Guojun Mao

Automatic Classification of Web News: A Systematic Mapping Study
Mauricio Pandolfi-González, Christian Quesada-López, Alexandra Martínez, and Marcelo Jenkins

Big Data Clustering Using MapReduce Framework: A Review
Mariam S. Khader and Ghazi Al-Naymat

A Text Extraction-Based Smart Knowledge Graph Composition for Integrating Lessons Learned During the Microchip Design
Hasan Abu Rasheed, Christian Weber, Johannes Zenkert, Peter Czerner, Roland Krumm, and Madjid Fathi

Clustering Approach to Topic Modeling in Users Dialogue
E. Feldina and O. Makhnytkina

Knowledge-Based Model for Formal Representation of Complex System Visual Models
Andrey I. Vlasov, Ludmila V. Juravleva, and Vadim A. Shakhnov

Data Mining Solutions for Direct Marketing Campaign
Torubein Fawei and Duke T. J. Ludera

A Review of Phishing URL Detection Using Machine Learning Classifiers
Sajjad Jalil and Muhammad Usman

Data Mining and Machine Learning Techniques for Bank Customers Segmentation: A Systematic Mapping Study
Maricel Monge, Christian Quesada-López, Alexandra Martínez, and Marcelo Jenkins

Back to the Past to Charter the Vinyl Electronic Market: A Data Mining Approach
Sara Lousão, Pedro Ramos, and Sérgio Moro

Learning a Generalized Matrix from Multi-graphs Topologies Towards Microservices Recommendations
Ilias Tsoumas, Chrysostomos Symvoulidis, and Dimosthenis Kyriazis

Big Data in Smart Infrastructure
Will Serrano

Academic Articles Recommendation Using Concept-Based Representation
Dina Mohamed, Ayman El-Kilany, and Hoda M. O. Mokhtar

Self-organising Urban Traffic Control on Micro-level Using Reinforcement Learning and Agent-Based Modelling
Stefan Bosse

The Adoption of Electronic Administration by Citizens: Case of Morocco
Fadwa Satry and Ez-zohra Belkadi

Author Index

CapsNet vs CNN: Analysis of the Effects
of Varying Feature Spatial Arrangement

Ugenteraan Manogaran, Ya Ping Wong, and Boon Yian Ng

Faculty of Computing and Informatics, Multimedia University,
Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia
[email protected], [email protected]

Abstract. Despite its success in recent years, the convolutional neural
network (CNN) has a major limitation: it cannot retain the spatial
relationships between learned features in deeper layers. The capsule network
with dynamic routing (CapsNet) was introduced in 2017 with the
speculation that it can overcome this limitation. In our research,
we created a suitable collection of datasets and implemented a simple
CNN model and a CapsNet model of similar complexity to test this
speculation. Experimental results show that both the implemented CNN
and CapsNet models are able to capture the spatial relationships
between learned features. Counterintuitively, our experiments show that
our CNN model outperforms our CapsNet model on our datasets. This
implies that the speculation does not seem to be entirely correct. A
possible reason is that our datasets are too simple and therefore require
only a simple CNN model. We recommend that future research use
deeper models and more complex datasets to test the speculation.

Keywords: CapsNet · Convolutional Neural Network · Spatial relationship

1 Introduction

Ever since Krizhevsky et al. [1] demonstrated the outstanding performance of
a convolutional neural network (CNN) model on ImageNet, CNN has become
the center of attention for computer vision researchers working on problems such
as image segmentation, object detection, object localization, image
classification, and image retrieval. Some of the well-known CNN models are AlexNet [1],
GoogleNet [2], VGG-16 [3], YOLO [4], and RCNN [5].
A significant advantage of CNN is its ability to maintain translation
invariance for feature detection [6]. This means that the position in the input
image of an object known to the CNN does not affect its performance: the
CNN is able to recognize the object regardless of where it appears in the image.
This is achieved by the pooling layers, which are also responsible for
reducing the size of the feature maps.
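
The role of pooling described above can be illustrated with a small sketch. The code is not taken from the paper: it is a plain-Python toy, and the helper name `max_pool_2x2` and the 4 × 4 feature maps are assumptions made for illustration. It shows that a non-overlapping 2 × 2 max pooling layer produces the same output when an activation shifts by one pixel inside the same pooling window, while halving the size of the feature map.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling over a 2D list (even height and width assumed)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# A 4x4 feature map with a single strong activation at (0, 0) ...
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# ... and the same activation shifted one pixel to the right, to (0, 1).
shifted = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
print(pooled_original == pooled_shifted)  # both are [[9, 0], [0, 0]]
```

The invariance in this toy is only local: a shift that moves the activation into a neighbouring pooling window changes the pooled output, which is why several pooling stages are typically stacked.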
© Springer Nature Switzerland AG 2021
K. Arai et al. (Eds.): IntelliSys 2020, AISC 1251, pp. 1–9, 2021.
https://doi.org/10.1007/978-3-030-55187-2_1

However, this advantage of CNN inevitably leads to its drawbacks. Two
major drawbacks are the lack of rotational invariance [8] and the failure to retain
spatial relationships between features [7]. The lack of rotational invariance
causes a CNN to produce false negatives when an object known to
the network is rotated beyond a certain extent. The failure to retain spatial
relationships between features, on the other hand, causes the network
to produce false positives [9]. Even though CNNs achieve state-of-the-art
results on many challenges despite these drawbacks, the drawbacks can become
serious concerns in applications such as security systems.
To overcome the lack of rotational invariance, augmenting the training data
with rotations became standard practice in the training of CNN models [10].
However, this also increases the training time considerably. Therefore, to
address the drawbacks of CNN, Sabour et al. [11] proposed a novel neural
network architecture known as capsule network with dynamic routing (CapsNet).
Unlike CNN, CapsNet produces outputs in the form of vectors instead of scalars.
This allows CapsNet to retain the properties of an object such as rotation,
skewness, and thickness.
It has also been explicitly speculated in a number of research papers [9–11]
that CapsNet would be able to retain the spatial relationships between features,
in contrast to CNN. In other words, an object with features in the wrong places,
such as a face with eyes below the nose instead of above it, would still be
a face to a CNN. CapsNet, however, is speculated to be able to identify it as a
non-face.
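
The speculated failure mode can be made concrete with a deliberately extreme toy, which is not the model studied in this paper. The sketch below assumes hypothetical part detectors (`detector_responses`) whose response maps are reduced by global max pooling (`global_max_pool`). Under that assumption, a correctly arranged face and a scrambled face yield identical pooled features, so any classifier that sees only the pooled values cannot tell them apart.

```python
def detector_responses(part_positions, size=4):
    """Hypothetical detector output: 1.0 at the location of each part, 0.0 elsewhere."""
    maps = {}
    for part, (row, col) in part_positions.items():
        fmap = [[0.0] * size for _ in range(size)]
        fmap[row][col] = 1.0
        maps[part] = fmap
    return maps


def global_max_pool(feature_maps):
    """Keep only the strongest response of each feature map, discarding its location."""
    return {part: max(max(row) for row in fmap) for part, fmap in feature_maps.items()}


# Illustrative (row, col) positions; row 0 is the top of the image.
normal_face = {"eyes": (0, 1), "nose": (2, 1)}     # eyes above the nose
scrambled_face = {"eyes": (3, 1), "nose": (1, 1)}  # eyes below the nose

pooled_normal = global_max_pool(detector_responses(normal_face))
pooled_scrambled = global_max_pool(detector_responses(scrambled_face))
print(pooled_normal == pooled_scrambled)  # prints True
```

A real CNN pools over small local windows rather than the whole image, so it discards less positional information than this toy does.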
To the best of our knowledge, no research literature has presented
comprehensive experiments and analysis on this speculation. In
this paper we present our experiments and analysis to test both CapsNet and
CNN on this speculation. We generated our own dataset for the purpose of our
experiment and implemented a CapsNet and a CNN model. The models were
evaluated on the generated dataset in such a way that the speculation
is tested.

2 Related Works

The concept of CNN was first proposed by LeCun et al. [12] in 1989. However,
due to the lack of computational power and the limited availability of datasets,
it was not until recent years that researchers were able to develop feasible models
using modern high-performance computers. One notable breakthrough was the work of
Krizhevsky et al. [1], which achieved state-of-the-art performance in the ImageNet
challenge [13] in 2012. Since then, numerous studies have been conducted to
develop more advanced CNN models, which have been used successfully in real-world
applications such as speech recognition [14], gait recognition [15], steering controls in
self-driving cars [16], human crowd detection [17], and medical image
segmentation [18].
Despite successful demonstrations of CNN models, one of the pioneers,
Geoffrey Hinton, argued that current CNNs “are misguided in what they are
trying to achieve” [19] because of the use of pooling layers for subsampling.
As a result, the models lose the ability to compute precise spatial relationships
between learned features in the deeper layers. When a pooling layer is used
between convolutional layers, only the most active neuron in a local region of
a feature map is retained while the rest of the neurons are disregarded.
Disregarding these neurons causes the loss of spatial information about the features.
Furthermore, due to the use of scalars instead of vectors, properties of
features such as orientation, thickness, and skewness are lost. Therefore, Hinton
et al. [19] proposed to group neurons together as vectors and use them to
represent the features of an object. These vectors are called capsules.
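
To illustrate how a vector can carry more than a scalar activation, the sketch below applies the squashing non-linearity used in the dynamic-routing paper [11], v = (‖s‖² / (1 + ‖s‖²)) · (s / ‖s‖), to toy capsule vectors. The vector values and the helper names `squash` and `length` are illustrative assumptions; the small `eps` term is added here only to avoid division by zero. The length of the squashed vector behaves like a presence score between 0 and 1, while its direction is left unchanged and can encode properties such as orientation or thickness.

```python
import math


def squash(s, eps=1e-9):
    """Squashing non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


weak = squash([0.1, 0.0, 0.0])     # short input vector: feature probably absent
strong = squash([10.0, 0.0, 0.0])  # long input vector: feature probably present
tilted = squash([3.0, 4.0])        # direction (3 : 4) is preserved

print(length(weak), length(strong))
print(tilted[0] / tilted[1])
```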
In 2017, Hinton and his team [11] proposed an architecture called capsule
network with dynamic routing (CapsNet) that performed better than CNN
on the MNIST dataset. It achieved state-of-the-art results on MNIST with a
test error of only 0.25%. The CapsNet model that achieved 99.23% accuracy on the
expanded MNIST test set reached an accuracy of 79% on the affNIST test set, while a
CNN that achieved 99.22% accuracy on the expanded MNIST test set
achieved only 66% accuracy on the affNIST test set. This suggests that CapsNet is more
robust to affine transformations.
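
For readers unfamiliar with dynamic routing, the following is a simplified plain-Python sketch of the routing-by-agreement iteration described in [11]; it is not the implementation used in this study. The prediction vectors in `u_hat` are made-up numbers (in a real CapsNet they come from learned transformation matrices), and the function names are illustrative. Each iteration turns the routing logits into coupling coefficients with a softmax, forms each output capsule as a squashed weighted sum of predictions, and then raises the logits of predictions that agree with that output.

```python
import math


def squash(s, eps=1e-9):
    """Squashing non-linearity from [11]; eps avoids division by zero."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iterations=3):
    """u_hat[i][j] is the prediction of input capsule i for output capsule j."""
    num_in, num_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * num_out for _ in range(num_in)]  # routing logits, start at zero
    for _ in range(num_iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(num_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement step: add the dot product of each prediction with the output.
        for i in range(num_in):
            for j in range(num_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Three input capsules, two output capsules, 2D vectors (all values made up).
# The predictions for output 0 agree; the predictions for output 1 conflict.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
outputs, couplings = dynamic_routing(u_hat)
print(couplings)
```

In this toy input, routing shifts the coupling of every input capsule towards output capsule 0, the one whose predictions agree.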
We implemented CapsNet based on the original research paper [11]. The
reconstruction network described in that paper shows that
CapsNet preserves the properties of the features well, as shown in Fig. 1.

Fig. 1. Original (Top Row) vs Reconstructed (Bottom Row) images.

Following the success of CapsNet on MNIST, studies have been conducted to
push the capability of CapsNet. LaLonde et al. [20] proposed a novel
architecture called SegCaps, which is based on CapsNet, to perform image
segmentation. SegCaps outperformed previous state-of-the-art networks on the
LUNA16 subset of the LIDC-IDRI database even though it had a lower number
of parameters. In a different study [7], a CapsNet model outperformed a CNN
model of similar complexity in the Human Action Recognition task on the KTH and
UCF Sports datasets.
One of the speculated properties of a CapsNet model is its ability to retain
spatial relationships between learned features, unlike a CNN model [9–11]. In
other words, the relative positions of features are insignificant to a CNN model.
This causes a CNN model to produce false positives such as labelling an image
of a face with eyes below the nose as a face. In contrast to CNN, CapsNet is
speculated to be able to avoid such false positives. However, to date, no studies
have been conducted to test this speculation. In this paper, we seek to test this
speculation to gain deeper insights into CapsNet.

3 Methodology
We implemented a CNN model and a CapsNet model for this study. In general, a
CNN model consists of several convolutional layers with pooling layers between
them followed by fully-connected layers as explained in Sect. 3.2. A CapsNet
model consists of several convolutional layers without any pooling layers followed
by a primary capsule layer and a digit capsule layer as explained in Sect. 3.3.
Both of the models were designed to have the same number of layers in order
for them to be comparable.
In order to test the speculation, we need to design a dataset in such a way
that there are two classes of images containing the same features but the features
from different classes have different spatial arrangements. Training our models
directly on such a dataset may not yield any insight into the models as the
models will learn to identify the shape of the entire objects successfully instead
of the distinct features as intended.
Therefore, we prepared two groups of datasets: the first group
contains images of only distinct features, while the second group contains objects
formed by the composition of the features. Our models are first trained on the
dataset from Group 1. Once the training is completed, the weights of the
convolutional layers in both models are frozen while the weights of the rest of
the layers are re-trained on the dataset from Group 2. This ensures that
our models learn the distinct features first before learning to identify the objects
using the learned features. This strategy is known as transfer learning.
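
The freezing step can be sketched in a framework-independent way. The snippet below is a toy under stated assumptions, not the training code of this study: the layer names, weight values, gradients, and the helper `sgd_step` are all hypothetical. It shows one gradient-descent update in which layers marked as frozen keep their Group 1 weights while the remaining layers are updated with Group 2 gradients.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Return parameters after one SGD step; frozen layers are copied unchanged."""
    updated = {}
    for name, weights in params.items():
        if trainable[name]:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)
    return updated


# Toy parameters standing in for a model pre-trained on Group 1 (values made up).
params = {
    "conv1": [0.5, -0.2],
    "conv2": [0.1, 0.3],
    "conv3": [-0.4, 0.7],
    "fc1": [0.2, 0.2],
    "fc2": [0.9, -0.1],
}
# Freeze the convolutional layers; only the other layers are re-trained on Group 2.
trainable = {name: not name.startswith("conv") for name in params}
# Made-up gradients from a Group 2 mini-batch.
grads = {name: [1.0, 1.0] for name in params}

retrained = sgd_step(params, grads, trainable)
print(retrained)
```

In TensorFlow's Keras API, one common way to obtain the same effect is to set a layer's `trainable` attribute to `False` before compiling the model. For the CapsNet model, the re-trained part would be the capsule layers rather than fully-connected layers.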
Below we describe in detail the dataset generation, the testing of the
convolutional neural network model, and the testing of the capsule network with
dynamic routing model. Since our datasets consist only of objects with simple features,
relatively simple models should be sufficient to achieve good accuracy on the
evaluations.

3.1 Dataset Generation

Our dataset consists of two groups. Figure 2 shows samples from Group 1, which
contains images of arrows and non-arrows. Figure 3 shows samples from Group
2, which contains images of equilateral triangles and rectangles. Each image is
of 64 × 64 pixels.
We chose to use generated images in our dataset because there is too much
ambiguity in real-life images. Furthermore, simple polygon objects were chosen
as they are well-defined mathematically. This enables us to test out different
ideas on how to design our experiments. Table 1 shows the organization of our
datasets.
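The text does not describe how the images were drawn, so the following is only a minimal sketch of how 64 × 64 images of simple polygons could be generated. The function names, the filled (rather than outlined) shapes, and the positions and sizes are all assumptions made for illustration.

```python
import math

SIZE = 64  # image side in pixels, as stated in the text

def blank():
    """Return an empty SIZE x SIZE binary image."""
    return [[0] * SIZE for _ in range(SIZE)]

def draw_rectangle(img, top, left, height, width):
    """Fill an axis-aligned rectangle with ones."""
    for r in range(top, top + height):
        for c in range(left, left + width):
            img[r][c] = 1
    return img

def draw_equilateral_triangle(img, apex_row, apex_col, side):
    """Fill an upward-pointing equilateral triangle with the given side."""
    height = int(round(side * math.sqrt(3) / 2))
    for dr in range(height):
        # Half-width grows linearly from 0 at the apex to side/2 at the base.
        half = (dr / max(height - 1, 1)) * side / 2
        lo = int(round(apex_col - half))
        hi = int(round(apex_col + half))
        for c in range(lo, hi + 1):
            img[apex_row + dr][c] = 1
    return img

# Hypothetical positions and sizes; a real generator would randomize them.
rect = draw_rectangle(blank(), top=20, left=16, height=10, width=30)
tri = draw_equilateral_triangle(blank(), apex_row=10, apex_col=32, side=24)
```

Composite objects could then be produced by drawing both primitives into the same image with different relative placements.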

Fig. 2. Samples from Group 1: (a) arrows, (b) non-arrows

Fig. 3. Samples from Group 2: (a) triangles, (b) rectangles

Table 1. Organization of our datasets

Dataset   Description                        Subset           Number of images
Group 1   Contains images of arrows and      Training Set 1   500
          non-arrows                         Training Set 2   1000
                                             Training Set 3   2000
                                             Testing Set      2000
Group 2   Contains images of equilateral     Training Set 1   500
          triangles and rectangles           Training Set 2   1000
                                             Training Set 3   2000
                                             Testing Set      2000

3.2 Convolutional Neural Network (CNN)

We implemented a CNN model using TensorFlow that has 3 convolutional layers
and 2 fully-connected layers. A max pooling layer follows each convolutional
layer. The Rectified Linear Unit (ReLU) was used as the activation function on
every layer except the output layer. Dropout was also applied to prevent the
model from overfitting.
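To make the effect of the three pooling layers concrete, the sketch below tracks the spatial size of a 64 × 64 input through the network. The kernel sizes, padding and pooling window are not given in the text, so the sketch assumes "same"-padded convolutions (which leave the size unchanged) and 2 × 2 max pooling with stride 2.

```python
def spatial_size_after_pooling(size, num_pool_layers, pool=2):
    """Spatial side length after repeated non-overlapping pooling.

    Assumes 'same'-padded convolutions (no size change) and pool x pool
    max pooling with a stride equal to the pool size.
    """
    for _ in range(num_pool_layers):
        size //= pool
    return size

# Side length of the feature maps after 0, 1, 2 and 3 pooling layers.
sizes = [spatial_size_after_pooling(64, n) for n in range(4)]
```

Under these assumptions each feature map shrinks from 64 × 64 to 8 × 8 before the fully-connected layers, which is the coarsening of spatial information that the discussion of pooling refers to.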
As described at the beginning of this section, we carried out our experiment
by first training our CNN model on the dataset from Group 1. Once the model was
trained, we re-trained the weights of its fully-connected layers on the dataset
from Group 2 while keeping the weights of the convolutional layers frozen. After
each training phase, the model was evaluated on the testing set of the
corresponding group.

3.3 Capsule Network with Dynamic Routing (CapsNet)

Our CapsNet model was also implemented using TensorFlow. We implemented
3 convolutional layers, 1 primary capsule layer and 1 digit capsule layer. The
architecture of our CapsNet is similar to that of the original paper [11],
except that we added an extra convolutional layer and used 16-D capsules in the
primary capsule layer and 32-D capsules in the digit capsule layer. We used the
activation function proposed in the paper. To prevent overfitting, a
reconstruction network [11] was used. No pooling layers were used.
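The activation function proposed in [11] is the "squashing" non-linearity, which keeps the direction of a capsule's output vector and maps its length into the range [0, 1). A minimal sketch in plain Python follows; the `eps` term is an addition for numerical safety at zero length and is not part of the published formula, and the 16-D test vectors are hypothetical.

```python
import math

def squash(s, eps=1e-9):
    """Squash vector s: v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

short = squash([0.1] * 16)   # a weak 16-D capsule output
long_ = squash([10.0] * 16)  # a strong 16-D capsule output
```

Short vectors are shrunk towards zero length and long vectors approach unit length, so the length of a capsule's output can be read as the probability that the entity it represents is present.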
As with the CNN, our experiment was carried out by first training our CapsNet
model on the dataset from Group 1. Once the model was trained, we re-trained
the weights of its primary capsule layer and digit capsule layer on the dataset
from Group 2 while keeping the weights of the convolutional layers frozen. After
each training phase, the model was evaluated on the testing set of the
corresponding group.

4 Experimental Results and Discussion

Training and evaluation of the models were performed on a workstation running
Ubuntu 16.04, equipped with 16 GB of RAM, a Core i7-6700K processor, and two
NVIDIA GTX 1080 Ti GPUs. The models were trained using the training subsets and
were evaluated on their respective testing sets. The evaluation results in
terms of accuracy (acc), precision (prec), recall (rec) and F1-score (F1) for
both models are shown in Table 2 below.
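For reference, the four reported metrics follow from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the counts passed in are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion counts."""
    total = tp + fp + fn + tn
    acc = (tp + tn) / total
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)  # harmonic mean of prec and rec
    return acc, prec, rec, f1

# Hypothetical counts for 200 test images.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=30, tn=70)
```

As a check on the last relation, a precision of 88.7% and a recall of 87.2% give an F1-score of about 87.9%.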

Table 2. (a) Evaluation Results for CapsNet. (b) Evaluation Results for CNN

(a)
                                   Subset 1                Subset 2                Subset 3
(%)                                Acc   Prec  Rec   F1    Acc   Prec  Rec   F1    Acc   Prec  Rec   F1
Group 1: Triangles vs Rectangles   88.4  88.7  87.2  87.9  91    92.5  89    90.7  90.4  92.8  87.6  90.1
Group 2: Arrows vs Non-Arrows      67.6  70.6  59.7  64.7  77.8  86.6  62.1  72.3  80.4  83.9  75.1  79.3

(b)
                                   Subset 1                Subset 2                Subset 3
(%)                                Acc   Prec  Rec   F1    Acc   Prec  Rec   F1    Acc   Prec  Rec   F1
Group 1: Triangles vs Rectangles   98.5  99.4  97.1  98.2  99.3  99.5  98.9  99.2  99.6  99.8  99.2  99.6
Group 2: Arrows vs Non-Arrows      92.6  95    87.2  90.9  95.8  95.2  95.8  95.5  96.6  96.8  92.5  94.6

All the images were shuffled within their respective sets and normalized before
they were used for training and evaluation. From Table 2(a), it is evident that
CapsNet is able to determine whether a given image contains an arrow or a
non-arrow by computing the spatial relationship between the learned features.
Table 2(b) shows that CNN achieved near-perfect accuracies. This is due to the
fact that the generated datasets do not contain any real-world noise.
We expected the CNN model to perform worse than CapsNet based on the
speculation stated earlier but it can be seen from the results that CNN actually
performed better than CapsNet. This might be due to the dataset being too
simple hence not requiring a deeper CNN model.
The use of pooling layers between the convolutional layers should cause the
loss of spatial information about the features in a CNN. Hence, it might be
that our model is not deep enough for this loss to matter. We expected our CNN
model to perform poorly, at least to some degree, due to the use of 3 pooling
layers, but the results show that this is not the case. We chose a CNN model
with only 3 pooling layers because of the simplicity of the datasets. From the
results, it is evident that retaining the spatial relationship between features
is not a serious issue for a relatively shallow model, such as a model with
only 3 pooling layers. However, it remains an open question whether a deeper
CNN model would perform well on a more complex dataset.
In our experiment, the objects in the images are formed by composing simple
features. There is only one equilateral triangle and one rectangle in every
image. Given the success of CNN, identifying such simple generated objects
without real-world noise is a rather trivial task for CNN. This could be another
reason for the high accuracy that the CNN models achieved in this experiment
despite the use of pooling layers. Our implementations are publicly available
on GitHub (see footnote 1).

5 Conclusions and Future Work

In this work, we have designed an experiment to test the speculation that Cap-
sNet is able to retain the spatial relationship between features better than CNN.
In order to carry out the experiment, we have generated our own datasets.
From our results, both the shallow CNN and CapsNet models have shown
the capability to retain the spatial relationship between features. However, the
speculation that CapsNet is able to retain spatial relationship between features
better than CNN does not seem to be true for shallow models on simple datasets.
It is still uncertain whether this speculation is true for deeper models on more
complex datasets and on noisy datasets.
Considering the fact that CNN has been developed extensively since its
invention in 1989 [12], it is possible that our experiment was too simple for
CNN. CapsNet, on the other hand, is still at a rudimentary stage, and the fact
that its performance level is close to CNN in this experiment suggests that
CapsNet has great potential.
Future research in this area should consider the usage of more complex features
to represent the objects in the datasets, as well as deeper models, in order to
further understand the capabilities and limitations of these models. Gaining
deeper insight into how these models retain the spatial relationship between
features will better guide future developments.

1 https://github.com/MMU-VisionLab/CapsNet-vs-CNN

Acknowledgment. The authors are grateful to the Ministry of Higher Education,
Malaysia and Multimedia University for the financial support provided by the
Fundamental Research Grant Scheme (MMUE/150030) and MMU Internal Grant Scheme
(MMUI/170110).

References
1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
2. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich,
A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 1–9 (2015)
3. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
4. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified,
real-time object detection. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 779–788 (2016)
5. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accu-
rate object detection and semantic segmentation. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
6. Nair, P., Doshi, R., Keselj, S.: Pushing the limits of capsule networks. Technical
note (2018)
7. Algamdi, A.M., Sanchez, V., Li, C.T.: Learning temporal information from spatial
information using CapsNets for human action recognition. In: 2019 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP
2019, pp. 3867–3871 (2019)
8. Xi, E., Bing, S., Jin, Y.: Capsule network performance on complex data. arXiv
preprint arXiv:1712.03480 (2017)
9. Xiang, C., Zhang, L., Tang, Y., Zou, W., Xu, C.: MS-CapsNet: a novel multi-scale
capsule network. IEEE Signal Process. Lett. 25(12), 1850–1854 (2018)
10. Chidester, B., Do, M.N., Ma, J.: Rotation equivariance and invariance in convolu-
tional neural networks. arXiv preprint arXiv:1805.12301 (2018)
11. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In:
Advances in Neural Information Processing Systems, pp. 3856–3866 (2017)
12. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.,
Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural
Comput. 1(4), 541–551 (1989)
13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-
scale hierarchical image database. In: IEEE Conference on Computer Vision and
Pattern Recognition (2009)

14. Palaz, D., Magimai-Doss, M., Collobert, R.: Analysis of CNN-based speech recog-
nition system using raw speech as input. In: Sixteenth Annual Conference of the
International Speech Communication Association (2015)
15. Zhang, C., Liu, W., Ma, H., Fu, H.: Siamese neural network based gait recognition
for human identification. In: IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), pp. 2832–2836 (2016)
16. Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal,
P., Zhang, X.: End to end learning for self-driving cars. arXiv preprint
arXiv:1604.07316 (2016)
17. Tzelepi, M., Tefas, A.: Human crowd detection for drone flight safety using
convolutional neural networks. In: 25th European Signal Processing Conference
(EUSIPCO), pp. 743–747. IEEE (2017)
18. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks
for volumetric medical image segmentation. In: IEEE Fourth International Con-
ference on 3D Vision (3DV), pp. 565–571 (2016)
19. Hinton, G.E., Krizhevsky, A., Wang, S.D.: Transforming auto-encoders. In: Pro-
ceedings of the 21th International Conference on Artificial Neural Networks-
Volume Part I, pp. 44–51 (2011)
20. LaLonde, R., Bagci, U.: Capsules for object segmentation. arXiv preprint
arXiv:1804.04241 (2018)
Improved 2D Human Pose Tracking
Using Optical Flow Analysis

Aleksander Khelvas¹(✉), Alexander Gilya-Zetinov¹, Egor Konyagin²,
Darya Demyanova¹, Pavel Sorokin¹, and Roman Khafizov¹

¹ Moscow Institute of Physics and Technologies, Dolgoprudnii, Russian Federation
[email protected]
² Skolkovo Institute of Science and Technology,
Bolshoy Boulevard 30, bld. 1, Moscow, Russian Federation
http://www.mipt.ru

Abstract. In this paper, we propose a novel human body pose refinement
method that relies on an existing single-frame pose detector and uses an
optical flow algorithm in order to increase the quality of output
trajectories. First, a pose estimation algorithm such as OpenPose is applied
and the error of keypoint position measurement is calculated. Then, the
velocity of each keypoint in frame coordinate space is estimated by an
optical flow algorithm, and results are merged through a Kalman filter.
The resulting trajectories for a set of experimental videos were calculated
and evaluated by metrics, which showed a positive impact of optical flow
velocity estimations. Our algorithm may be used as a preliminary step
to further joint trajectory processing, such as action recognition.

Keywords: Video processing · Human pose detection · Skeleton motion

1 Introduction
Human motion tracking is an important application of machine vision algorithms
that can serve many business purposes. The most popular tasks in the digital
world include distributed video surveillance, digital marketing, and human
tracking in industrial environments.
This task can be addressed at different levels of detail. The high-level
approach is object detection, where the position of a human as a whole object
is extracted and its bounding box in 2D or 3D space is estimated.
A more interesting approach is to detect a human pose in motion. This task is
more complicated because a human pose has substantially more dimensions than a
bounding box.
Recent advances in deep learning have resulted in efficient single-frame pose
tracking algorithms, such as [6,14]. By applying them sequentially to a video
stream, a set of trajectories for joints may be obtained. However, since these
algorithms usually analyze input frames independently, the obtained
trajectories usually have various artifacts, such as discontinuities or missing
points.
In the reported research, we address the task of enhancing the joint
trajectories obtained for multiple persons in a scene by leveraging temporal
information through an optical flow algorithm.

© Springer Nature Switzerland AG 2021
K. Arai et al. (Eds.): IntelliSys 2020, AISC 1251, pp. 10–22, 2021.
https://doi.org/10.1007/978-3-030-55187-2_2

2 Related Work

The task of retrieving pose dynamics for all persons in a video may be
considered a variant of the multiple object tracking (MOT) task, where the
tracked objects are not persons but individual pose keypoints. There are two
major paradigms in the field of MOT: detection-based tracking and
detection-free tracking [11]. In the first case, a machine vision algorithm
capable of detecting individual objects is applied to every frame separately,
and the individual detections are then linked into trajectories. The second
approach has no detection algorithm and instead relies on temporal changes in
the video stream to detect objects. With the development of efficient real-time
object detection algorithms in recent years, the detection-based approach has
become dominant in the literature. However, independent analysis of video
frames results in an inevitable loss of the information conveyed by temporal
changes in the video. This information may be relevant to object detection and
could help improve tracker performance. Various approaches have been suggested
to combine individual-frame and temporal features.
For example, in [12] a novel approach to combining temporal and spatial
features was proposed by adding a recurrent temporal component to a
convolutional neural network (CNN) designed to detect objects in a single
frame. The outputs of the object detection network for sequential frames were
fed into a recurrent neural network (RNN). The resulting architecture was then
trained to predict the refined tracking locations.
In [1] a tracker using prior information about possible person pose dynamics
is proposed. This information is modelled as a hierarchical Gaussian process
latent variable model, which makes it possible to impose some temporal
coherency on the detected articulations.
In [17] a method leveraging optical flow for pose tracking is proposed. The
velocities obtained from flow data are used to generate the expected
coordinates of a pose in the next frame. The predicted coordinates are later
used to form tracks by greedy matching.
Our research is based on OpenPose as a body pose detector, proposed in
[3]. It is a real-time solution capable of detecting the 2D poses of multiple
people in an image. It uses a non-parametric representation, referred to as
Part Affinity Fields (PAFs), to learn to associate body parts with individuals
in the image. This bottom-up system achieves high accuracy and real-time
performance, regardless of the number of people in the image.

3 Definitions
First, let us define several frames of reference (FoR) for our research, which
are shown in Fig. 1.

Fig. 1. Frames of reference for the calculation of 2D skeleton parameters

$U, V$: this frame of reference is associated with a virtual or real motionless
camera.
$U_{cf}, V_{cf}$: this frame of reference is associated with the frames of the
video. If the camera is motionless, this FoR is the same for all frames in the
video, which is the common case for video surveillance systems used for
security or marketing.
$U_{pfk}, V_{pfk}$: this frame of reference is associated with object $k$
detected in frame $f$.
We omit the index $f$ when processing video of scenes viewed by a motionless
camera.

4 Method

Our goal is to propose a novel algorithm for robust tracking of multiple person
poses in a video stream by leveraging both temporal and spatial features of the
data. To achieve this, we combine predictions made by single-frame person and
pose detection algorithms (such as YOLO and OpenPose) with optical-flow-based
estimates through a Kalman filter.
The complete algorithm is described below and shown in Fig. 2.

Fig. 2. Full algorithm for 2D skeleton model calculation and filtration

1. Video preliminary processing step produces a set of frames with normalized
brightness/contrast and calculates $G_f$ values.

2. Objects detection step provides a set of bounding boxes for each person,
detected by YOLO or some other object detection algorithm.

3. Pose detection ROI generation step provides a set of input frame regions for
further pose detection.

4. 2D pose estimation and person identification step computes a set of vectors
$B^{fp} = \{u_1^{fp}, v_1^{fp}, u_2^{fp}, v_2^{fp}, \ldots, u_N^{fp}, v_N^{fp}\}$,
where $N = 25$ is the number of joints for the selected model of the human body
(the BODY-25 model provided by the OpenPose solution).

5. Optical Flow calculation step applies an optical flow estimation algorithm to
the input frame, producing pixel velocity vectors for every joint position
returned by the pose detector.

6. Kalman Filtration step calculates the time series for filtered movement
vectors
$\hat{B}^{fp} = \{\hat{u}_1^{fp}, \hat{v}_1^{fp}, \hat{u}_2^{fp}, \hat{v}_2^{fp}, \ldots, \hat{u}_N^{fp}, \hat{v}_N^{fp}\}$.
Let us discuss each step in detail.
After the necessary source-specific video pre-processing, the next step is to
extract poses from single video frames where possible. We selected an
OpenPose-based solution as the human pose detector. OpenPose is a multi-person
real-time pose keypoint detection algorithm, initially presented in [3]. An
open-source implementation of OpenPose is used, providing pose keypoints in the
BODY-25 format. An example of a reduced model with 18 keypoints is shown in
Fig. 3.

Fig. 3. Keypoints of a human pose model used in OpenPose
(CMU-Perceptual-Computing-Lab, 2017)

However, direct application of the OpenPose library to a high-resolution 4K
video stream is not practical. Since the memory requirements of the algorithm
grow linearly with the input image area, and about 3 GB is consumed at the
default resolution of 656 × 656, a distributed video processing system would be
needed. Downscaling the input image results in a drastic loss of detected pose
quality. We solve this problem by splitting the image into a set of overlapping
regions, invoking the detector on these regions in parallel, and combining the
detection results afterwards. The efficiency of the algorithm can be boosted
substantially if the crowd in the video is sparse, which is often the case for
video surveillance systems. Instead of processing the whole input frame, we
employ an object detection algorithm to detect persons first and then build a
set of regions that cover all persons' bounding boxes. We used a YOLO (You Only
Look Once)-based solution, as it performs fast detection of objects in 4K
images [13]. We observed that YOLO object detections are also useful for
eliminating false positives generated by OpenPose.
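The region-splitting idea can be sketched as follows for the simpler case in which the whole frame is tiled. This is an illustration only: the 128-pixel minimum overlap, the 3840 × 2160 interpretation of "4K", and all function names are assumptions, and the authors' actual region builder (which covers only the detected bounding boxes) is not described in enough detail to reproduce.

```python
import math

def tile_starts(length, tile, min_overlap):
    """Evenly spaced start offsets of tiles that cover [0, length)."""
    if length <= tile:
        return [0]
    step = tile - min_overlap                      # largest allowed stride
    count = math.ceil((length - tile) / step) + 1  # tiles needed on this axis
    return [round(i * (length - tile) / (count - 1)) for i in range(count)]

def overlapping_tiles(width, height, tile=656, min_overlap=128):
    """Return (left, top, right, bottom) boxes covering the whole frame."""
    return [(x, y, x + tile, y + tile)
            for y in tile_starts(height, tile, min_overlap)
            for x in tile_starts(width, tile, min_overlap)]

def estimated_memory_gb(width, height, ref_gb=3.0, ref_side=656):
    """Linear-in-area extrapolation of the 3 GB at 656 x 656 figure."""
    return ref_gb * (width * height) / (ref_side * ref_side)

# Assumed 4K frame size of 3840 x 2160 pixels.
tiles = overlapping_tiles(3840, 2160)
full_frame_gb = estimated_memory_gb(3840, 2160)
```

Under these assumptions a full 4K frame would need several tens of gigabytes if processed in one pass, whereas each tile stays at the default input resolution of the detector.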
For every frame in the input frame sequence, the algorithm first applies a
YOLO-based object detector trained on the COCO dataset [10]. The result of YOLO
processing is a set of bounding boxes, with top left and bottom right corners
defined in the $O_{cf}, U_{cf}, V_{cf}$ FoR. A set of regions fully covering
these rectangles is generated, with resolution matching the selected input
resolution of the OpenPose network.
The result of video processing in the detection stage is a list of persons'
bounding boxes for each frame. For each detected object we have the bounding
box coordinates $u_1, v_1, u_2, v_2$, the detection confidence, and, if a
skeleton was successfully matched with the YOLO object, the OpenPose keypoint
data as the vector $B^{fp}$. Additionally, we calculate an approximate standard
deviation of the coordinates of each keypoint by integrating over the part
heatmaps returned by OpenPose. These values are later used as input for the
Kalman filter as a measurement error estimate.
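The text only states that the standard deviation is obtained by integrating over the part heatmaps. One plausible reading, sketched below, treats the normalized heatmap as a probability distribution over pixel positions and computes its weighted mean and standard deviation per axis; the exact formula used by the authors may differ, and the tiny heatmap is a made-up example.

```python
import math

def keypoint_mean_and_std(heatmap):
    """Weighted mean and standard deviation of (u, v) over a 2D heatmap.

    heatmap[v][u] holds a non-negative confidence for pixel (u, v).
    """
    total = sum(sum(row) for row in heatmap)
    mean_u = sum(w * u for row in heatmap for u, w in enumerate(row)) / total
    mean_v = sum(w * v for v, row in enumerate(heatmap) for w in row) / total
    var_u = sum(w * (u - mean_u) ** 2
                for row in heatmap for u, w in enumerate(row)) / total
    var_v = sum(w * (v - mean_v) ** 2
                for v, row in enumerate(heatmap) for w in row) / total
    return (mean_u, mean_v), (math.sqrt(var_u), math.sqrt(var_v))

# Hypothetical 3 x 4 heatmap with confidence spread over two pixels.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
(mean_u, mean_v), (std_u, std_v) = keypoint_mean_and_std(heatmap)
```

The squared standard deviations can then serve as the position measurement variances in the Kalman filter.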
To further refine poses extracted from single frames, the algorithm uses an
optical flow solution. Optical flow is a technology used in various fields of
computer vision to find the displacement of individual pixels between two video
frames. It is usually expressed as a 2D vector field
$\{v_{flow} = (dx, dy)^T\}$ defined for every pixel of the initial frame
$I_n(x, y)$. The corresponding pixel in the later frame is
$I_{n+m}(x + dx, y + dy)$. Many different approaches to optical flow
calculation are proposed in the literature. In our work, we use several open
source optical flow implementations provided by the OpenCV library.
The first one, presented in [7], is called Dense Inverse Search. It belongs to
the family of 'dense optical flow' algorithms and is notable for its low
computational complexity while preserving good scores in standard optical flow
benchmarks. The other one is called DeepFlow and was presented in [15].
An example soccer game frame used for visualizing the optical flow calculated
by the two algorithms is presented in Fig. 4.

Fig. 4. Example soccer game frame used for visualizing the optical flow
calculated by two different algorithms

For Optical Flow visualization we use the HSV model. Hue represents the
motion vector angle and saturation encodes the motion vector length.
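This colour mapping can be sketched for a single flow vector using Python's standard `colorsys` module. The normalisation by a maximum length `max_len` and the fixed value channel of 1.0 are assumptions, since the text specifies only the roles of hue and saturation.

```python
import colorsys
import math

def flow_to_rgb(dx, dy, max_len):
    """Map one flow vector to RGB: hue from angle, saturation from length."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    hue = angle / (2 * math.pi)                   # angle mapped to [0, 1]
    sat = min(math.hypot(dx, dy) / max_len, 1.0)  # length, clipped at 1
    return colorsys.hsv_to_rgb(hue, sat, 1.0)     # value fixed at 1 (assumed)

still = flow_to_rgb(0.0, 0.0, max_len=10.0)   # no motion: zero saturation
right = flow_to_rgb(10.0, 0.0, max_len=10.0)  # fast motion along +x
```

With this choice, static regions appear white and the colour becomes more vivid as the motion gets faster, while the hue indicates the direction of motion.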
The result of the Dense-Inverse-Search algorithm for Optical Flow calculation
is presented in Fig. 5.

Fig. 5. Example of optical flow, calculated by Dense-Inverse-Search algorithm

The result of the DeepFlow algorithm for Optical Flow calculation is presented
in Fig. 6.

Fig. 6. Example of optical flow, calculated by DeepFlow algorithm

By applying the selected algorithm to every video frame in the input stream
and taking the pixel velocity estimates at the keypoints generated by OpenPose,
we obtain a new velocity measurement.
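A minimal sketch of this sampling step is given below. It assumes a dense flow field stored as `flow[y][x] = (dx, dy)` in pixels, a nearest-pixel lookup at each keypoint, and a hypothetical frame rate of 25 fps for converting displacements to pixels per second; none of these details are specified in the text.

```python
def keypoint_velocities(flow, keypoints, frame_gap=1, fps=25.0):
    """Sample a dense flow field at keypoint positions.

    flow[y][x] is a (dx, dy) displacement in pixels over frame_gap frames.
    Returns velocities in pixels per second; keypoints that fall outside
    the frame get None.
    """
    dt = frame_gap / fps
    height, width = len(flow), len(flow[0])
    velocities = []
    for u, v in keypoints:
        x, y = int(round(u)), int(round(v))  # nearest-pixel lookup
        if 0 <= x < width and 0 <= y < height:
            dx, dy = flow[y][x]
            velocities.append((dx / dt, dy / dt))
        else:
            velocities.append(None)
    return velocities

# 3 x 3 flow field with uniform motion of 2 px right and 1 px down per frame.
flow = [[(2.0, 1.0)] * 3 for _ in range(3)]
vels = keypoint_velocities(flow, [(1.2, 0.9), (5.0, 5.0)])
```

Bilinear interpolation or averaging over a small window around the keypoint would be natural refinements when the flow field is noisy.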
We also need to build trajectories from individual detections in order to
perform matching of detected poses belonging to the same person in different
frames [2,9].
To combine the pose keypoint measurements generated by OpenPose with the
corresponding pixel velocities estimated through optical flow, we use a Kalman
filter.

The Kalman filter was first proposed in 1960 [5] and has become the industry
standard for tasks that involve fusing measurements from sensors of different
types. Its application requires the specification of a motion model for the
modeled object. Several common motion models are used when the model of the
real motion is difficult or impossible to formalize [8]. Popular choices for
the 2D case include a constant Cartesian velocity model and a non-linear polar
velocity model employing an extended Kalman filter [2]. Alternative and more
complex models can be implemented when 3D pose information is available, but
their assessment lies beyond the scope of this work.
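In our experiments we used the constant Cartesian velocity model applied
independently to joint coordinates in the 2D video frame FoR. In this instance,
the state vector consists of 4 components representing the estimated position
and velocity of a pose keypoint:
$s(t) = (\hat{u}_t, \hat{v}_t, \hat{\dot{u}}_t, \hat{\dot{v}}_t)^T$. The state
at the moment $t + 1$ may be linked to the state values at the moment $t$ with
the following equation:

$$
s(t+1) =
\begin{pmatrix}
1 & 0 & \delta t & 0 \\
0 & 1 & 0 & \delta t \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \hat{u}_t \\ \hat{v}_t \\ \hat{\dot{u}}_t \\ \hat{\dot{v}}_t \end{pmatrix}
+
\begin{pmatrix}
0.5\,\delta t^2 & 0 \\
0 & 0.5\,\delta t^2 \\
\delta t & 0 \\
0 & \delta t
\end{pmatrix}
\begin{pmatrix} \nu_x(t) \\ \nu_y(t) \end{pmatrix}
\tag{1}
$$

and the process noise covariance matrix:

$$
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & \bar{a}_u & 0 \\
0 & 0 & 0 & \bar{a}_v
\end{pmatrix}
\tag{2}
$$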

For every experiment configuration, two independent trajectory estimation
passes were performed: with and without optical flow velocity measurements.
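Because the u and v coordinates decouple in the constant Cartesian velocity model of Eq. (1), the filter can be sketched for a single coordinate with a two-component state (position, velocity). The code below is an illustration under stated assumptions rather than the authors' implementation: the noise variances `q_vel`, `r_pos` and `r_vel`, the initial covariance, and the sample data are hypothetical; process noise is added only to the velocity term, following Eq. (2); and position and velocity measurements are applied as two sequential scalar updates, which presumes their errors are independent.

```python
def predict(x, P, dt, q_vel):
    """Constant-velocity prediction for one coordinate; x = [pos, vel]."""
    p, v = x
    x_new = [p + dt * v, v]
    # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(0, q_vel)
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1]
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q_vel
    return x_new, [[p00, p01], [p10, p11]]

def update_scalar(x, P, z, idx, r):
    """Kalman update with a scalar measurement z of state component idx."""
    s = P[idx][idx] + r                 # innovation variance
    k = [P[0][idx] / s, P[1][idx] / s]  # Kalman gain
    resid = z - x[idx]
    x_new = [x[0] + k[0] * resid, x[1] + k[1] * resid]
    # P <- (I - K H) P, where H selects component idx
    P_new = [[P[i][j] - k[i] * P[idx][j] for j in range(2)] for i in range(2)]
    return x_new, P_new

def track(positions, velocities, dt=1.0, q_vel=0.1, r_pos=4.0, r_vel=1.0,
          use_flow=True):
    """Filter one coordinate; velocities are used only when use_flow is set."""
    x, P = [positions[0], 0.0], [[r_pos, 0.0], [0.0, 100.0]]
    out = [x[0]]
    for z_pos, z_vel in zip(positions[1:], velocities[1:]):
        x, P = predict(x, P, dt, q_vel)
        x, P = update_scalar(x, P, z_pos, 0, r_pos)      # detector position
        if use_flow:
            x, P = update_scalar(x, P, z_vel, 1, r_vel)  # optical flow velocity
        out.append(x[0])
    return out

# Hypothetical noisy detections of a keypoint moving at 2 px per frame.
noisy_pos = [0.0, 3.0, 3.0, 7.0, 7.0, 11.0]
flow_vel = [2.0] * 6
with_flow = track(noisy_pos, flow_vel, use_flow=True)
without_flow = track(noisy_pos, flow_vel, use_flow=False)
```

Running `track` twice, with `use_flow` switched on and off, mirrors the two estimation passes described above and lets the two trajectories be compared directly.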

5 Results

To evaluate the performance of the proposed solution for different
applications, we prepared a set of video fragments:

1. a fragment of a 4K soccer video broadcast
2. a video from an indoor surveillance system in a supermarket
3. a video from an outdoor surveillance system
The soccer match video had an additional pre-processing step: the fan stands
were cropped out.
Figure 7 presents skeletons for the soccer and supermarket cases.
Examples of filtered trajectories are presented in Figs. 8 and 9.
Figure 8(a) presents selected and filtered U coordinate trajectories of an
ankle of walking person on outdoor CCTV camera. Periodic increased difference
oppression of the people, resistance to lawful power, and a refuge
from justice for the wrongdoer. This was entirely incompatible with
the great reforms insisted upon by Edward I., and passed into law by
parliament; law and order became the rule and not the exception,
and the position of the castle grew anomalous.

SKIPTON CASTLE,
YORKSHIRE.
With the ascendancy of an efficient administration of justice came
the desire for comfort and a display of luxury, and probably no one
who has become acquainted with the internal disposition of an early
castle will qualify the assertion that the acme of discomfort and
inconvenience must have prevailed within them.
Consequent upon this alteration in the economic conditions of the
nation, the need for the impregnable stronghold of the past ages
ceased to exist, and in many parts of England, but more especially in
the south and east, the existing structures were largely altered or
added to in order to afford conditions suitable to the changed
amenities of social life. These alterations in nearly every case were
made at the sacrifice of efficiency, and many castles which had
played a notable part in the history of the nation became merely the
residences of their lords, who made no attempt to put them to their
original uses in time of war. Arundel, the great midland castles of
Warwick, Kenilworth, and many others, fall under this category.
So far as gunpowder is concerned the part which it played in causing
the abandonment of the feudal castle is strangely varied and
dependent upon local circumstances. A well-found castle with an
efficient and adequate garrison, supported by an army in active
operation in the field, had no more to fear from an attack in the
fifteenth century than it had in the thirteenth, perhaps not so much.
Very few bombards of the period mentioned could throw stone shot
weighing over 150 lbs., whereas the medieval trebuchet could hurl a
missile of twice that weight, or even more, and to almost as great a
distance. The effect of low-trajectory cannon upon castle walls in the
fifteenth century under ordinary conditions may almost be left out of
consideration, so small was the calibre. It is true that Sir Ralph Grey,
when besieged in Bamborough Castle in 1464, was forced to
surrender in a short space of time by the army of the Kingmaker,
who used his basilisks, aspiks, serpentines, dragons, syrens, and
sakers with excellent effect; but we may justly claim that this was an
exception, the configuration of the ground enabling Warwick to place
his pieces close up to the walls, while Grey could look for no effective
relief from a sympathetic army outside. Ten years afterwards the
Castle of Harlech, under the able governance of Davydd ap Ifan, held
out against all the force that Edward IV. could bring to bear upon it,
and was the last of the castles garrisoned by Lancastrians to render
up its keys.
But perhaps the greatest argument against the belief that the
"venomous saltpetre" was the chief cause of the decline in
castellation is that of the gallant resistance made by many of these
old strongholds in the Great Civil War. At that time the newest of the
castles was, perhaps, about two hundred years old and had not been
constructed entirely for defence; the older structures were in many
cases devoid of woodwork which had perished through age and
neglect. Yet these ancient buildings, now once more called upon to
play their part in deadly strife, in many cases showed a resistance to
attack which was simply marvellous, sometimes, as in the case of
Pembroke, defying the ordnance brought to bear upon them. If a
Royalist army of respectable proportions happened to be in the
vicinity of a beleaguered fortress, the Parliamentarians appeared to
regard its reduction as an impossibility, and in the first place devoted
their entire attention to the dispersal of the field force. It is true that
the condition of the unmetalled trackways, which were dignified by
the name of roads, at that time, presented almost insuperable
obstacles to the passage of heavy ordnance, and the advance of a
cumbrous baggage train was at times an impossibility.
But even if cannon of respectable proportions could be brought
against a castle in the Great Civil War, the effects produced were in
many cases out of all proportion to the enormous trouble involved.
Thus at the first siege of Pontefract Castle in 1644 a cannon
throwing a 42-lb. shot was used in conjunction with another of 36
lbs. and two of 24 lbs., the least being 9 lbs., and yet the siege failed
chiefly by reason of the small effect produced by the 1400 projectiles
which were fired into it. Again although Scarborough Castle was
quite ruinous in 1644 when its siege commenced, and in addition
was ill-supplied with ammunition or food, yet it gallantly sustained a
siege lasting for twelve months.
It may therefore be conceded from the foregoing that the assertion
respecting gunpowder causing the disuse of the castle in the British
Isles must be taken with a large degree of reservation, since many
other causes have to be considered, and even those who maintain
the assertion must admit that the reason assigned took an
unconscionably long time in effecting its object.
IGHTHAM MOTE, KENT.
In the very few castles which saw their origin during the fourteenth
and fifteenth centuries in Britain, domestic comforts and attempts at
effective defensive works appear to have run side by side, often to
the almost total exclusion of the latter. The substitution of brick for
stone masonry in many of these was in itself a startling change, but
when combined with this, large and lofty apartments were
introduced, many with magnificent carved and moulded wooden
ceilings, windows of large dimensions filled with beautiful tracery
characteristic of Perpendicular architecture, walls hung with rich
tapestry and decorated with gorgeous heraldic devices and trophies
of arms, costly furniture and other fittings betokening an advanced
education in domestic requirements,—the feeling was borne in upon
the minds of the nation that the feudal castle, as such, had seen its
day, and that the age of the baronial residence and the manorial
dwelling-house had superseded it.
In these later castellated residences the kitchens, larders, cellars,
dining halls, residential rooms and general offices became matters of
supreme moment, the defensive works of secondary importance, but
designed nevertheless with a view to impressiveness and an
assumption of strength which they rarely possessed. Within these
lordly halls the noble owners held high revel, while troops of
servitors, henchmen, and servants of every degree swarmed in the
passages and halls in marked contradistinction to the old time grim
men-at-arms, bearded archers, and steel-clad retainers of the feudal
fortress.
There was naturally a period of transition during which the
characteristics of the Castle predominated over the domestic
influences, and those which sprang into existence during the reigns
of Henry IV. and V. very ably show this feature. To this intermediate
period we may ascribe those structures which were chiefly reared by
the spoils acquired upon the Continent by soldiers of fortune who
"followed the wars," and returning to their native land built palatial
residences for themselves, out of their lawful, or it may be, ill-
acquired, gains. Many of these were based upon designs which the
adventurers had seen abroad, thus our first example, Bodiam, is a
replica of many castles which were to be found at the time of its
erection in Gascony. Bodiam Castle is one of the finest in Sussex, and
certainly one of the most picturesque in England; it is situated upon
the Rother, which here forms the boundary between Sussex and
Kent. The building owes its origin to Sir Edward Dalyngrugge, who
had served in France and Spain under the Black Prince with singular
credit to himself and marked advantage to his worldly estate. A
portion of this superfluous wealth was expended in erecting Bodiam
Castle, which, while affording every comfort as a residence,
possessed most of the essential qualities for effective defence.
It presents a singularly beautiful and romantic spectacle at the
present time, the towers and enceinte being entire, while a wealth of
foliage and the wide waters of the surrounding moat afford a coup
d'œil seldom equalled and probably not excelled in England. The
licence to crenellate dates from 1386; the building was erected in the
middle of a lake connected with the river, thus forming a broad and
deep moat. A causeway, defended by an ingenious system of bridges
and small gateways, leads across the latter, and terminates in a small
barbican, now partly dismantled; the entrance is between two tall
square towers which present beautiful examples of machicolation
upon their summits. Upon the opposite, or south face, is the postern
leading to the moat and defended by a massive square tower, being
one of nine in all surrounding the enclosure. The interior is now
simply an empty shell, all the domestic buildings having been
destroyed by Sir William Waller in 1643, after the siege of Arundel,
although the Chapel and the chief apartments are capable of being
located. We have therefore simply the outer walls remaining of a
particularly fine castle of the Perpendicular period.
The entrance consists of a vaulted passage with many openings for
the discharge of missiles upon assailants while they were
endeavouring to overcome the three portcullises and the massive
wooden gate defending it. In addition to ordinary loopholes there are
round holes for the discharge of harquebuses. The castle underwent
a siege by the Earl of Surrey in the reign of Richard III. in
consequence of a descendant of Sir Thomas Lewkenor, into whose
hands it had passed, proving obnoxious to the King.
Shirburn Castle is also of the same type and very similar to Bodiam;
it dates from the year 1377 and was erected by Warine de Lisle who
had gained wealth and distinction under Edward III. It stands in the
Chiltern Hills near Stokenchurch and is a large square pile
surrounded by a broad moat.
WRESSLE CASTLE, YORKSHIRE.
Wressle Castle, Yorkshire.—The Castle of Wressle lies to the south-
east of York, near the junction of the Derwent with the Ouse, the
navigation of which it was probably designed to protect. Sir Thomas
Percy, the brother of the first Earl of Northumberland, is reputed to
have been the founder. It fell to the Crown, and Henry IV. granted it
to his son John, Earl of Bedford, and after his demise to Sir Thomas
Percy, son of Henry, the second Earl of Northumberland. The Percies
seem to have maintained their Court in the Castle with a
magnificence befitting their illustrious race, and during their
occupation the Castle saw the most glorious portion of its history.
In 1642 and 1648 it was garrisoned by the Parliamentarians and
shortly afterwards was ordered to be dismantled. Three sides of the
quadrangle were thrown down, leaving only the south façade. It was
in the possession of the Seymour family from 1682 to 1750, when it
again passed into the hands of descendants of the Percy family, and
now is owned by Lord Leconfield.
The building originally possessed five towers, one at each corner and
another over the entrance on the south side, which still remains,
together with the curtain wall and flanking towers. These present a
very imposing appearance, but the general effect of the ruins
suggests the castellated mansion of the Perpendicular period more
than the grim sternness of a medieval castle. The square corner
towers appear singularly inadequate for an effective flanking fire,
and no doubt the building relied for defence chiefly upon the broad
moat which encompassed it upon three sides and the deep dry ditch
defending the approach.
Hever undoubtedly owes its fame partly to its magnificent
gatehouse, which forms by far the most impressive part of the
structure, and partly to the rich store of human interest imparted by
its intimate connection with the ill-fated Anne Boleyn. It was built in
the reign of Edward III. by Sir William de Hever, whose daughter
brought it to her husband, Lord Cobham. In the time of Henry VI.,
Sir Geoffrey Boleyn, Lord Mayor of London, an opulent mercer,
purchased it, and added greatly to the existing buildings, the work
being subsequently finished by his grandson, Sir Thomas, the father
of Anne.
HEVER CASTLE, KENT.
The latter was born in 1501, and brought up at Hever under a
French governess. After she attracted the notice of the King, her
father was created Viscount Rochford, and Earl of Wiltshire and
Ormond, while Anne was made Marchioness of Pembroke. It was in
the garden at Hever that Henry first saw her, and subsequently his
wooing of that unfortunate queen occurred there. After the execution
of Anne and her brother, the castle went to the Crown and was
settled on Anne of Cleves. In 1557 Sir Edward Waldegrave purchased
it, and it passed to Sir William Humfreys and subsequently to Sir T.
Waldo, whose descendant is the present owner.
The Castle is surrounded by a double moat, fed by the river Eden; it
is a small castellated house of the fifteenth century, the chief feature
being the superb entrance, battlemented and machicoulied, and
containing three portcullis grooves in the main passage. The
buildings completing the rectangle are chiefly of the Elizabethan
period, but have been very extensively restored by the present
owner.
Maxstoke is one of the very few castles which have come down to us
without the expression "dismantled by order of Parliament" being
applied to them. It affords us an idea of the beauty the face of England
would present, so far as magnificent castles are concerned, if the
forces of destruction and revolution had never been let loose upon
our fair isle. It dates from 1346, when William de Clynton, Earl of
Huntingdon, obtained licence to crenellate. The Duke of Buckingham
owned and occupied it in 1444; he was killed at Northampton in
1460, and his son Humphrey, Earl of Stafford, having died of wounds
received at the First Battle of St. Albans in 1455, his grandson Henry
succeeded him but was beheaded without trial at Salisbury in 1483.
Edward Stafford, however, succeeded to the estates in the reign of
Henry VII.; his death by beheading occurred on Tower Hill in 1521.
Maxstoke came to the Crown but was given by Henry VIII. to Sir
William Compton, from whose descendants it was purchased by the
family of Dilke in whose possession it still remains.

MAXSTOKE CASTLE,
WARWICKSHIRE.
The gatehouse is in excellent preservation, the entrance being
flanked by hexagonal towers, while the archway contains the
grooves for the portcullis, and also the old gates themselves, plated
with iron and bearing the arms of the Stafford family. A fine groined
roof is inside the gatehouse, while the battlements have an alur
behind them. The walls of the enceinte and the four towers at the
corners are in good preservation, and show marks of the wooden
buildings formerly erected against them for accommodating the
soldiers. The Chapel and a number of the domestic apartments are
original, dating from the time of Edward III.
Raglan, one of the most imposing ruins in the British Isles, was
erected shortly after 1415 by Sir William ap Thomas, who had
returned rich in honours and also in worldly wealth from many a
stricken field, the last being that of Agincourt. He married the
daughter of Sir David Gam, and commenced the erection of the
magnificent building which combines in such an excellent manner the
characteristics of a mansion and a fortress. If either predominates it
is undoubtedly the warlike portion since, presumably, the builder
could not at once forget his bellicose proclivities. His son was made a
baron by Edward IV. and afterwards Earl of Pembroke, and was
beheaded at Northampton, 1469. The Castle came into the
possession of the Somersets in 1503, the ancestors of the present
Duke of Beaufort. The fifth earl carried out extensive work upon the
pile, but shortly afterwards the demolition of the Castle was ordered
by the parliament. Probably the most striking feature of the Castle is
the detached Keep lying to the left of the main entrance, and called
the Yellow Tower. It is surrounded by a wide and deep moat, and
was undoubtedly a formidable obstacle before being slighted. It
underwent a vigorous siege in 1646, when Sir Thomas Fairfax
assailed it with a large force. The garrison ran short of ammunition,
and, the north wall being breached, a capitulation ensued.
Herstmonceaux Castle.—One of the finest examples of the later
castles is Herstmonceaux, in Sussex, dating from the year 1440. It
has been described as "the most perfect example of the mansion of
a feudal lord in the south of England," and, when visited by Walpole
in 1752, was in a perfect state of preservation; Grose, writing a few
decades later, gives a vivid description of all the principal
apartments, which seem to have suffered but little at that time. Now,
however, when there is some rumour prevailing of an intended
restoration, the building is in ruins,—roofless, ivy-grown, and in
many parts dismantled by the falling-in of roofs and floors. It is built
of the small bricks then in use, two inches or less in thickness; they
were brought to England from Belgium, the art of brick-making,
strange to say, having apparently been lost since the departure of
the Romans. Belgian workmen were also brought over to erect it.
Sir Roger Fiennes, an Agincourt veteran, was the founder, and
probably the site had borne a previous fortalice. Like Bodiam,
erected some half-century previously, the plan is quadrilateral,
almost square, with four octagonal towers at the corners and three
of pentagonal plan strengthening the curtain walls. The gateway is
one of the finest and most impressive in existence; the towers which
flank it rise over 80 feet in height, cylindrical at the upper parts and
superposed upon 50 feet of octagonal bases, with smaller turrets
rising still higher above them. A magnificent range of machicoulis
with crenellation above protects the towers and the curtain between,
the merlons being pierced with oillets. A moat, long since dry,
encircles the building, a bridge spanning it at the principal entrance.
There are three tiers of cross loopholes, and below occur openings
for matchlocks to defend the bridge. With the exception of the grand
towers of the south gateway and the shells of some adjoining
buildings, there are only broken arches and shattered walls, piers,
and buttresses now to be seen, and it is only by the descriptions left
by Grose and Walpole that the ichnography of the interior can be
traced. Wyatt the architect is responsible for the vandalism
committed, as he dismantled the Castle to furnish material for the
owner's new residence adjacent.
HERSTMONCEAUX CASTLE, SUSSEX.
Although Herstmonceaux has never undergone any struggles in the
"fell arbitrament of war," yet painful memories cling to the ruins.
Thomas Fiennes, the ninth Lord Dacre, succeeded to the estate at
the age of seventeen. The youth had already laid the foundation of a
brilliant career at Court when an escapade, planned by himself and
some madcap companions, whereby they essayed to play the rôle of
poachers upon a neighbouring estate, led to the death of a keeper
whom they encountered. His three companions were arrested and
hanged for murder near Deptford; Dacre was also tried and
condemned, and the sentence was duly executed at Tyburn in 1541,
the young man being twenty-five years old at the time.
Tattershall Castle, on the Witham in Lincolnshire, is contemporary
with Herstmonceaux, and constructed likewise of Flemish brick
bonded with exquisite workmanship. The tower still standing
contains four stories with a total altitude of 112 feet; large Gothic-
headed windows occur filled with Perpendicular tracery, and these
windows are repeated on a smaller scale in the four octagonal
towers which clamp the angles of the building. Massive timber balks
once supported the various floors, and a number of carved chimney-
pieces are to be found. The walls are about 14 feet thick at the base,
and many passages and apartments have been made in their
thickness. The well in the base is covered by a massive arched crypt,
upon which the Castle has been erected. But perhaps the most
notable feature in this beautiful relic of the past is the grand and
markedly-perfect system of machicolation combined with the
bretasche, which is exemplified in the cornice surmounting the tops
of the curtain walls. Upon massive stone corbels is built a substantial
stone wall pierced with square apertures for an all-round fire with
various arms; in the floor of the alur are the openings for dropping
missiles upon assailants at the base of the walls; above this again
are the merlons and embrasures giving upon the battlement walk.
The Castle was erected by Ralph, Lord Cromwell, treasurer to King
Henry VI., whose vast wealth sought for an opening in which to
display itself, and probably could not have done so more effectively
than in the rearing of a magnificent pile of buildings of which but a
small portion, the tower described, now remains. In its later years it
suffered a partial dismantling during the Commonwealth period,
followed by a rifling in the eighteenth century similar to that which
overtook the sister castle of Herstmonceaux.
After the middle of the fifteenth century castles were no longer built,
and we have to look to the fortified manor-house such as was
designed by the Lord Cromwell above mentioned at Wingfield,
Derbyshire, or that at Oxburgh in Norfolk; these when surrounded by
moats were capable of being placed in a good state of defence, and
many a thrilling tale is told of the sieges they underwent during the
Civil War when the stout resistance they made was nearly or quite
equal to the defence of the massive ramparts and cyclopean bastions
of the earlier castle-builder.
PENSHURST PLACE. KENT.
Penshurst Place.—This was originally an embattled mansion of the
fourteenth century, and gradually expanded by constant additions
into an excellent example of a combined castle and a manorial
dwelling-house. The licence to crenellate is dated the fifteenth year
of Edward III., and stands in the name of Sir John de Pulteneye. This
opulent knight erected a stately mansion in the form of an irregular
square as to plan. It reverted to the Crown in the reign of Henry VI.
and was held by the Duke of Bedford, Regent for a time, and then by
his brother, Humphrey, Duke of Gloucester. The Staffords held it
afterwards, but at the decease of the Duke of Buckingham Edward
VI. gave it to Ralph Fane and then to Sir William Sydney, one of the
heroes of Flodden Field. Its associations with Sir Philip Sydney form
one of its chief claims upon the public. The spacious Hall measures
60 feet in length by the same in height; it is 40 feet wide, and is a
grand example of fourteenth-century architecture. The beautiful
windows reach from the floor to a considerable height, the roof is
open, there is a minstrels' gallery, and an elaborate arrangement for
the fire in the middle of the Hall. Adjacent is a range of buildings
much altered in the Elizabethan period, containing state rooms, the
Queen's drawing-room, etc. Portions of the wall of enceinte are to be
found upon the south and east.
Ightham Mote.—This building is undoubtedly one of the most perfect
examples of the combination of domestic convenience with an
efficient system of defence to be found in England. It stands about
two miles from Ightham village in Kent in a deep hollow, through
which runs a rivulet flowing into the moat surrounding the House,
from which the latter takes its name. Ivo de Haut possessed the
Mote in the reign of Henry II.; it reverted to the Crown for a time in
the reign of Richard III., but was restored to the family, and
subsequently passed through the hands of many owners.
The House appears to be of three distinct periods, Edward II., Henry
VII., and Elizabeth. The Hall is of the first period; it has a slender
stone arch to carry the roof and contains many ancient features;
some of the original shingles, for example, are still in existence,
though a modern roof covers them. Other objects are a Chapel,
original, and the Gateway Tower with the gateway itself and the
doors.
There are many examples in England of the simple manorial hall of
purely domestic type whose owners found it expedient, at some
critical period, to fortify in some manner, and these additions are of
the greatest interest to the antiquarian. Perhaps the best example to
be found is that of Stokesay, near Ludlow, which is a unique
specimen of a small mansion of the thirteenth century subsequently
fortified. The licence is dated 1291, and a stone wall is mentioned;
only a few yards remain of this.
A wide ditch surrounds the area, and a high tower, similar to two
towers joined together, affords the required defence. It is embattled,
the merlons being pierced, while the embrasures have the ancient
shutters still depending. It dates from the end of the thirteenth
century. The Hall stands adjacent and vies with that at Winchester in
being the most perfect example of a thirteenth-century hall
remaining to us. It is about 50 feet long by 30 wide and over 30 feet
in height. The windows are in the Early English style, and the corbels carrying
the roof are of the same period. The lord's apartment overlooked the
Hall. It has been occupied by the de Says, the Verduns, and ten
generations of the Ludlows, the first of whom built the crenellated
parts. The prompt surrender of the Cavalier garrison to the
Parliamentarian army is no doubt responsible for the fact that no
destruction of the House occurred at that critical time.
The examples given of the Castellated Mansion and fortified Manor-
House are necessarily meagre in number, and many more, such as
Broughton Castle in Oxfordshire, Sudley in Gloucestershire, Wingfield
Manor, Derbyshire; Hilton, Durham; Hampton Court, Hereford;
Whitton, Durham, etc., call for remark if the exigencies of space
permitted.
CHAPTER X
THE CASTLES OF SCOTLAND

Prehistoric and other Earthworks.—The numerous remains of
strongholds and defensive works of a prehistoric character readily fall
as a rule under one of the divisions used in describing the English
examples. They are usually of a circular or oval formation, and where
irregular the shape has been determined by the site.
The Hill-forts, known as Vitrified Forts, are, however, not represented
in England, and, although found in a few places upon the Continent,
appear to have been chiefly developed in Scotland. By some means,
not definitely determined as yet, the walls of these strongholds have
been subjected to intense heat, whereby the stones have become
plastic, and amalgamated when cool into one coherent mass. It is
unnecessary to dilate upon the obvious advantages which a
homogeneous defence of this nature would possess. These forts
chiefly lie in a broad band between the Moray Firth and Argyle and
Wigtown, and are generally constructed of igneous rocks; when
provided with a suitable flux of alkali in the form of wood-ashes or
seaweed a comparatively moderate heat would be sufficient to cause
fusion. The walls of Vitrified Forts are of about half the thickness of
unvitrified, and appear to belong to the Late Celtic Age.
Brochs are also peculiar to Scotland. They are massive, tower-like
buildings, chiefly occurring in the northern counties and upon the
islands; they are remarkably similar in outline and construction, and
they have been ascribed chronologically to the period immediately
before or after the Roman occupation of Britain, and as being
essentially Celtic. The Broch of Mousa is generally believed to be the
most perfect example extant; it is in Shetland, and consists of a wall
15 feet thick enclosing a court 20 feet in diameter. The wall is about
45 feet in height and contains a solitary entrance, narrow and low. In
the thickness of the wall, and approached by three internal openings,
are chambers, while a spiral staircase leads upwards to where
passages constructed in the walls are served by the stairway. Other
Brochs which have been examined appear to possess a similarity of
plan, but some have subsidiary defences in the shape of external
walls, ramparts, and fosses; thus the example at Clickamin, Lerwick,
was surrounded by a stone wall. That found upon Cockburn Law, and
known as Odin's, or Edin's Hold, is of note by reason of the double
rampart of earth surrounding it. It is one of the largest as yet
discovered, the wall being 17 feet thick and the area 56 feet wide.
Probably the many hut circles which surround this Broch are of later
date and were formed from its ruins. The great thickness of the wall
is exceeded, however, by the Broch at Torwoodlee, Selkirkshire, by 6
inches.
With the advent of the historical period firmer ground is reached,
and there are numerous evidences that the Motte and Bailey Castle
was introduced at an early period into Scotland. During the second
half of the eleventh century this was the prevailing type as in
England.
It has been found possible to divide the era of castellation proper in
the northern kingdom into four distinct periods:
First Period, 1100-1300.—The roving spirit and warlike disposition of
the Normans prompted their adventurers to penetrate into the
fastnesses of the North, where the innovations they introduced made
them acceptable in the main to the inhabitants. They taught the
latter how to raise towers of a design based upon the Rectangular
Keep, with thick cemented walls, and many of the great fortresses,
such as Edinburgh, Stirling, and Dumbarton, originated at this time.
The early type of Keep was quadrangular in plan with towers at the
angles, which were sometimes detached from the main building and
placed upon short curtain walls; but some were naturally modified or
specially adapted to the site like those of Home and Loch Doon. The
use of water as a defence was recognised at an early stage; some
towers were placed on islands in lakes, and most of them were
furnished with moats and ditches. At this period castles were seldom
placed upon high promontories. The workmanship was as a rule
poor, rough, and crude, but some exceptions occur like Kildrummie
and Dirleton.
Second Period, 1300-1400.—The years of this century were marked
in Scotland by anarchy, war, and bloodshed, which devastated the
kingdom and placed the arts of peace in complete abeyance, while
poverty was universal. The period was consequently unfavourable for
the erection of Scottish castles upon a large scale, but many scores
of small Keeps sprang into existence. Bruce was antagonistic to the
building of large and roomy castles, arguing that their capture by an
invader would give him a standing in the country which otherwise he
would not possess.
The towers erected were based upon the Norman Keep; they were of
stone throughout, so that their destruction by fire was impossible.
Their walls were so thick and massive that restoration after a siege
was easy. The basement was always vaulted, and was intended for
storage purposes and the herding of cattle in an emergency. As a
general rule it had no interior communication with the upper floors,
but trap-doors are not unknown. The entrance to the building was on
the first storey through a narrow door reached by a ladder; it gave
upon the Hall, the chief apartment, where all dined in common, and
the household slept, a subsidiary half floor being constructed above
for this purpose.
BARTIZAN.
The second floor was the private apartment of the chieftain and his
family, and was also provided with a wooden gallery for sleeping
purposes. The roof was a pointed arch resting solidly upon the walls
and covered with stone slabs. At the angles of the building bartizans
were usually built, although rounded corners like those at Neidpath
and Drum sometimes occur. In the massive walls spiral staircases,
small rooms, cupboards, and other conveniences were arranged.
Round the Tower a wall was generally erected, within which the
stables, offices, and kitchens were built. In the wall of the Tower
itself, and sometimes below the level of the ground, the universal
"pit" or prison was built, ventilated by a shaft carried upwards in the
thickness of the wall. At times the battlements were provided with
parapets resting upon corbels but executed in a crude manner.
BOTHWELL CASTLE,
LANARKSHIRE.
The century in question saw numerous castles of this type come into
existence, all based upon the same plan, that of the king differing
only in size from that of the small chieftain. The largest are from 40
to 60 feet square, but the majority are much smaller. These Keeps
formed nuclei for subsequent additions as at Loch Leven, Craigmillar,
Campbell, and Aros, and many of them served as ordinary residences
down to the seventeenth century, long after the tide of war had
passed.
Third Period, 1400-1550.—With the coming of peace and a period of
commercial and industrial prosperity, the nobles of Scotland were
able to observe the progress of castellation around them in England
and France, and began to adopt the styles which they found in those
countries. A type of castle appeared based like that of Bodiam upon
a French ideal,—the building of a high embattled wall strengthened
with towers around a quadrangular space. This plan, derived from
the Concentric ideal, was adopted for the largest castles, such as
Stirling, which is the most perfect example of a courtyard plan, and
Tantallon.
In the smaller castles the Hall is placed in the centre with the
kitchen, pantry, and buttery adjoining it, and the lord's solar and
private apartments at the daïs end. The wine-vaults and cellars are
built beneath, while the bedrooms occur above. In contrast to the
English buildings of the period, the question of defence was the
dominating idea in spite of the altered conditions of better living and
increased luxury. Many plain and simple Keeps were also built during
this period.
Fourth Period, after 1550.—The development of artillery led to
alterations being made in castellation, while the progress of the
Reformation gradually introduced the fortified mansion and Manor-
House. Many small Keeps, or Peel Towers, were built, however,
chiefly on the Border. Ornamentation up to this period had been
conspicuously absent, but now it assumed a very high importance.
Corbelling became almost a mania,—floors, windows, parapets,
chimneys, and other details projecting to an excessive distance in
order to enhance the effect. The bartizans were covered with high
conical roofs, and turrets similarly ornamented became a prominent
style. The accommodation in the upper floors was greatly increased
when compared with the basement, through the excess of corbelling.
Gables were furnished with crow-steps, while machicolation became
at times almost fantastic. Gargoyles shaped like cannon in stone are
a marked feature of the period.
Bothwell Castle, Lanarkshire (1st Period)
Bothwell Castle is generally termed the grandest ruin of a thirteenth-
century castle in Scotland. It belonged in the thirteenth century to
the Murray family; was captured by Edward I. and given to Aymer de
Valence, Earl of Pembroke. The English had possession until the year
1337 when, after capturing it, the Scots dismantled it. From the
Douglas family it passed by marriage to the Earls of Home. It is
placed upon a rocky promontory above the Clyde, and consists of an
oblong courtyard with high curtain walls and strengthening towers,
round or square, while a large circular donjon lies at the west end.
The latter bestrides the enceinte and is separated from the bailey by
a moat; it is of noble proportions, 60 feet in diameter and 90 feet
high, with walls 15 feet thick. The Tower forcibly suggests that at
Coucy in many particulars. The Hall and various other apartments
occupy the eastern portion of the Bailey.
Neidpath Castle (2nd Period)
Neidpath Castle is situated upon elevated land overlooking a winding
of the Tweed. It was built upon the L plan, probably in the
fourteenth century, being a main central tower of the Keep type with
a square projection of considerable size attached to one side. The
walls are 11 feet in thickness and the original door was on the
basement floor facing the river, a departure from the general rule. A
spiral stair gave access to the upper storeys. The Tower was
originally of enormous strength, being really two immense vaults
superposed upon each other, but other, wooden, floors have been
inserted between. The parapet and corners are rounded similar to
those at Drum Castle. It was greatly altered and added to in the
seventeenth century. No particular history attaches to the building,
which belonged to the Hays of Yester for centuries; it has only
undergone one siege, that by Cromwell, when it surrendered after a
short defence.
NEIDPATH CASTLE, PEEBLES.
Edinburgh Castle (3rd Period)
The site of Edinburgh Castle has undoubtedly been occupied by
some description of fortress from the most remote antiquity. The
Romans occupied it and subsequently Malcolm Canmore fortified it as
an aid towards keeping the English out of Scotland. In 1291 Edward
I. besieged and took it in fifteen days; he recaptured it in
1294. In 1313 it fell into the hands of Bruce by a daring escalade,
and was stripped of its defences. Edward III. rebuilt it, and placed a
strong garrison there, but the Scots took it four years later. David II.
refortified it and rendered it so strong that neither Richard II. nor
Henry IV. had any success in their attempts to take it. Since that
period it has undergone a number of sieges.
It is built upon the courtyard plan, and is one of the survivors of the
four chief fortresses in the country, the others being Stirling,
Roxburgh, and Berwick.
The moat at the entrance is now dry and filled up, and the Gateway
there is modern. The Argyle Tower (sometimes called the St. David's
Tower) is a portion of the old castle, as are also the ruins of the
Wellhouse Tower, while St. Margaret's Chapel is the oldest building
and also the oldest church in Scotland, containing Early Norman
work and probably also Saxon. The general aspect of the Castle
suffers much from a picturesque point of view by the addition of the
great demi-lune battery and ranges of modern buildings.
Stirling Castle (3rd Period)
The commanding rock upon which Stirling Castle is placed was
originally an old hill fort, but in the twelfth century was one of the
four chief castles. Thus in 1304 it held out for three months against
Edward I. and a powerful army. So important was it considered that
Edward II. attempted to relieve it, and this led to Bannockburn.
Baliol occupied it, and King David only captured it after a long and
obstinate siege. At the Stuart period it became a Royal Castle and
the favourite residence of the Scottish kings. The present walls are
undoubtedly raised upon the old foundations, but, so far as antiquity
is concerned, the oldest part of the Castle remaining is the
Parliament Hall opening from the Inner Ward which is of late
Perpendicular architecture. The Palace is of the Renaissance, and
dates from 1594.
EDINBURGH CASTLE, FROM THE TERRACE OF
HERIOT'S HOSPITAL.
Dunnottar Castle, Kincardineshire (3rd Period)
One mile south of Stonehaven stands Dunnottar Castle, upon a flat
platform of rock with the North Sea washing three of the precipitous
sides. A small isthmus, not much above the level of the sea,
connects it to the mainland.
The oldest parts of the Castle date from c. 1382. The entrance is at
the base of the rock upon the land side, where an outwork of
remarkable strength is placed. After ascending a steep incline a
tunnel 26 feet long is reached, also defended, and a second similar
defence occurs beyond, thus the approach was of an extremely
formidable character.
The Keep stands at the south-west corner, and is of the L shape, four
storeys in height, and was built early in the fifteenth century. The stables
and domestic buildings are of a later date, and arranged round part
of an irregular courtyard. The Castle, although credited with being
one of the most impregnable in Scotland, and to which the Scottish
regalia was entrusted for safe keeping during the Commonwealth,
was captured by Sir William Wallace in 1297, whose troops scaled
the precipices and put the English garrison of 4000 men to the
sword. In 1336 Edward III. refortified it, but the Scots took it as
soon as he had left the kingdom. General Lambert blockaded the
Castle in 1652, and eventually captured it.
Tantallon Castle (3rd Period)
Tantallon Castle is of the courtyard type, similar to Caerlaverock and
Doune, and was erected about the end of the fourteenth century.
Situated upon a rocky precipitous site, with three sides washed by
the North Sea, it was only imperative to construct defences upon the
fourth or west side. A deep ditch cut in the rock, curtain walls 12 feet
thick and 50 feet high, battlemented, with a level court in front,
beyond which was another deep ditch,—these were the defences
deemed all-sufficient to baffle intruders. The Keep also acted as a
flanking defence to the curtain walls, and contained the only
entrance, which passed completely through it. Many traces exist of
the work carried out in the early part of the sixteenth century in the
endeavour to make it impregnable to artillery. The buildings now
occupy only two sides of the interior quadrangle, the rest having
been dismantled.
DUNNOTTAR CASTLE, KINCARDINESHIRE.
In the rich history of the Castle we find that in 1528 James V.
invested it with 20,000 men and a formidable battering train, the
structure itself being supplied with large artillery. The siege lasted
twenty days and proved unavailing, the great thickness of the walls
resisting the efforts of the gunners. It underwent another siege in
1639 when the Earl of Angus made a stand in it against the
Covenanters. General Monk invested it and found after two days that
his mortars had no effect; he then tried heavy siege guns which
breached the wall, but the garrison retreated into the central tower
where they were safe, and were allowed to capitulate upon good
terms. The fortress fell into ruin in the beginning of the eighteenth
century.
CHAPTER XI
THE SIEGE AND DEFENCE OF A MEDIEVAL CASTLE

A work upon castellation would undoubtedly be incomplete if it
omitted to deal with the interesting subject of the means by which
the medieval knight defended his castle, and of the methods he
employed for attacking his neighbour's, or an enemy's town, whether
in a private feud or legitimate warfare.
Through the almost universal habit of perusing medieval romances
the general public has formed a mental picture of the hero and his
followers riding round a castle and summoning it to surrender, or
challenging the garrison to emerge from their retreat and essay
mortal combat in the open. As the engineer and captain of the
sappers and miners, the director of the artillery, the designer of
movable towers, and the general head of the various artifices
calculated to bring the besieged to their senses, the hero is less well
known.
The coup de main method of attack has probably been the same in
most ages, and undoubtedly was the chief means resorted to by
primitive man. His missile weapons during the Stone, Bronze, and
Early Iron Ages were of no use against earth ramparts crowned by
thick palisading; sling-stones, arrows, and spears were only
efficacious against the bodies of his enemies, and hand-to-hand
combat was therefore a necessity. Hence we may imagine a
concentration against a presumably weak point, a sudden rush, the
plunge into the dry ditch and a rapid scramble up the scarp towards
the palisades under a shower of arrows, stones, and other missiles;
the mad escalade of the defences surmounting the earthwork and
the fierce resistance of the defenders, followed by a successful entry
or a disastrous repulse and retreat.
Precisely the same course was pursued in the medieval period when
a rapid bridging of the moat by planks and beams would be
attempted, scaling ladders would be reared, and, protected by their
shields from the rain of missiles, the assailants, covered by their
archers' fire of arrows and bolts upon the ramparts, would mount
their ladders and attempt to effect a lodgment upon the walls. And,
although weapons and conditions have changed, the assault to-day
is made upon the self-same methods.
If, instead of the coup de main, a sustained siege is decided upon
the knight will order his "gyns" to be brought up to the front, and
large and heavy ones to be built upon the spot. From the time when
Uzziah "made in Jerusalem engines, invented by cunning men, to be
on the towers and upon the bulwarks, to shoot arrows and great
stones withal," [1] down to the invention of cannon, the ingenuity of
man has been exercised in devising machines for hurling missiles to
a distance.
The Greeks, Romans, and other nations of antiquity brought them to
perfection, and marvellous results were obtained in ancient sieges;
the vivid account by Plutarch of the great engines used at the attack
upon Syracuse, B.C. 214-212, reads almost like romance. Caesar
frequently mentions this artillery, and especially the portable balistae
for throwing arrows and casting stones; they were fitted with axles
and wheels and manœuvred like batteries of cannon at the present
day. Larger engines were constructed as required like those of the
medieval period.
[1] 2 Chron. xxvi. 15.
TANTALLON CASTLE, HADDINGTONSHIRE.
The ancient engines were distinct from those of a later age in
depending for their efficacy upon the forces of tension and torsion
as compared with that of counterpoise in the middle ages. The art of
preparing the sinews of animals so as to preserve their elastic
powers was known to the ancients, and great bundles so treated
were utilised in different ways in the various engines. Experiments
on sinews, ropes of hair, and other materials at the present day have
proved that loss of elasticity soon occurs, whereas we learn that
such was not the fact in classical times with their special method of
preparation. By fixing an endless skein in a suitable frame, stretching
it tightly and then twisting the skein in the centre by means of a
beam of wood, the necessary torsion was obtained; if a missile were
placed upon the beam when drawn back and the beam released, the
projectile would be hurled to a distance proportionate to the velocity
of the arm and the weight of the missile.
The principle may readily be gleaned from the accompanying
diagram which exemplifies the two vertical skeins used in a portable
balista for throwing arrows; by being fixed in a suitable frame an
action like that of the bow could be obtained. By using immense
coils of twisted sinew the nations of antiquity, and especially the
Greeks, threw stones weighing 50 lbs. or more to a distance of from
400 to 500 yards, and as a general rule with marvellous accuracy,
while lighter missiles are stated to have been hurled to between 700
and 800 yards. These engines received the general name of
"catapults," although the Romans generally referred to them under
the term "tormentum," in reference to the twisted sinews, thongs,
and hair, of which the skeins were made. Broadly speaking, catapults
shot darts, arrows, and the falarica,—a long iron-headed pole;
balistas projected stones or similar missiles, though the names are
often interchanged by the chroniclers. Some time after the fall of the
Roman empire the secret of preparing the sinews was lost.
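As a rough check upon the ranges quoted above, the following sketch computes the least launch speed a missile would need in order to cover a given distance. It assumes flight in a vacuum over level ground at an elevation of 45 degrees, for which the ideal range is R = v^2 / g. Air resistance, the height of the engine, and the true angle of discharge are all ignored, and the function name is merely illustrative, so the figures are lower bounds and not reconstructions of any actual engine.

```python
import math

G = 9.81        # gravitational acceleration, m/s^2
YARD = 0.9144   # metres per yard

def min_launch_speed(range_yards: float) -> float:
    """Least speed (m/s) needed to reach range_yards over level ground.

    Assumption: no air resistance and a 45-degree launch, for which
    the ideal range is R = v**2 / g, hence v = sqrt(R * g).
    """
    if range_yards < 0:
        raise ValueError("range must be non-negative")
    return math.sqrt(range_yards * YARD * G)

if __name__ == "__main__":
    # Ranges quoted in the text for heavy stones and for lighter missiles.
    for yards in (400, 500, 700, 800):
        speed = min_launch_speed(yards)
        print(f"{yards} yards: at least {speed:.0f} m/s")
```

Under these idealised assumptions the quoted ranges correspond to launch speeds of roughly 60 to 85 metres per second; any real engine would have required somewhat more.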

DIAGRAM ILLUSTRATING THE PRINCIPLE OF CONSTRUCTION IN
CLASSICAL ENGINES.
The Trebuchet.—Another force was called into play for medieval
artillery. This was the counterpoise, or gravitation, and the principle
upon which all large engines or "gyns" were constructed during the
middle ages. A long wooden arm was pivoted in a framework so that
a short and a long portion projected upon either side; to the shorter
part a great weight in a swinging cradle was fixed which necessarily
raised the longer arm to the vertical position. If the latter were
drawn backwards and downwards the great weight was accordingly
raised, and upon release the long arm would sweep upwards in a
curve and project any missile attached to it. By fixing a sling of
suitable length to the arm the efficiency was immensely increased
(see Title-page). Such was the principle of the "trebuchet," the
enormous engines which carried devastation and destruction to
medieval castles. The French are said to have introduced these in
the twelfth century, and by the end of the thirteenth they were the
most formidable siege engines of the time.
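The counterpoise principle described above may be put into figures by a simple energy balance: the potential energy given up by the falling weight is, at best, handed over to the missile. The sketch below is a simplified model only. The efficiency factor, the masses, the drop height, and the function names are hypothetical values chosen for illustration, not measurements of any historical engine, and the flight is again taken to be in a vacuum over level ground.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def release_speed(counter_mass: float, drop_height: float,
                  missile_mass: float, efficiency: float = 0.5) -> float:
    """Missile speed (m/s) at release under a simple energy balance.

    Assumption: a fixed share (efficiency) of the potential energy lost
    by the counterweight, M * g * h, becomes kinetic energy of the
    missile, (1/2) * m * v**2. The default of 0.5 is a guess.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must lie in (0, 1]")
    if missile_mass <= 0:
        raise ValueError("missile mass must be positive")
    energy = efficiency * counter_mass * G * drop_height
    return math.sqrt(2 * energy / missile_mass)

def level_range(speed: float, angle_deg: float = 45.0) -> float:
    """Ideal range (m) over level ground, ignoring air resistance."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / G

if __name__ == "__main__":
    # Hypothetical engine: 5000 kg counterweight falling 3 m, 50 kg stone.
    v = release_speed(counter_mass=5000, drop_height=3, missile_mass=50)
    print(f"release speed: {v:.1f} m/s, ideal range: {level_range(v):.0f} m")
```

In this model the range at 45 degrees reduces to 2 x efficiency x M x h / m, so that doubling the counterweight or the height of its fall doubles the ideal range, while any gain from the sling can enter only through the assumed efficiency.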

STIRLING CASTLE, STIRLINGSHIRE.

The transition period in England between the classical weapons and
the trebuchet was the twelfth century and the early part of the
thirteenth. The veterans from the crusades undoubtedly introduced
the torsion and tension engines, but found that the home-made
article could not compete in efficiency with the Oriental examples
and therefore the advent of the trebuchet was welcomed. Roughly
speaking, the original balistas or catapults depending upon torsion,
and throwing shafts rather than balls, were not so frequently in use
as those engines which depended upon tension and threw heavy
stones. In the early part of the thirteenth century the balista catapult
came into vogue once more; it was of the cross-bow type, and at