
Safety Science 167 (2023) 106285

A machine vision-based method for crowd density estimation and evacuation simulation

Shijie Huang, Jingwei Ji*, Yu Wang, Wenju Li, Yuechuan Zheng

Jiangsu Key Laboratory of Fire Safety in Urban Underground Space (China University of Mining and Technology), No. 1 Daxue Road, Xuzhou, Jiangsu 221116, China

* Corresponding author. E-mail address: [email protected] (J. Ji).

https://doi.org/10.1016/j.ssci.2023.106285
Received 6 April 2023; Received in revised form 1 June 2023; Accepted 30 July 2023

Keywords: Machine vision; Pedestrian positioning; Cellular automata; Evacuation simulation; Crowd density estimation

Abstract

In this study, we propose a machine vision-based method for crowd density estimation and evacuation simulation to help reduce the occurrence of stampedes in crowded public places. The method consists of a pedestrian detection model, a pedestrian positioning algorithm, and a cellular automata evacuation model (CAEM). In the pedestrian detection model, an adaptive 2D Gaussian kernel is used to generate crowd density heatmaps for crowd density estimation. The model achieved a real-time detection accuracy of 96.5% on images of evacuation scenes after being trained on a dataset of 66,808 pedestrian images using YOLOv3. The pedestrian positioning algorithm was developed to determine the coordinates of pedestrians in real-world scenarios; experiments show that the average positioning errors in the x and y directions are 0.177 m and 0.176 m, respectively. The pedestrian coordinates were then input into the Python-based CAEM for evacuation simulation, which was compared with the simulation results of Pathfinder, a piece of software that delivers agent-based evacuation simulation. The difference in evacuation time calculated by the two models was found to be less than a second, which indicates a high degree of consistency.

1. Introduction

Due to economic growth and urbanization, there has been a considerable increase in the number of public places, such as subway stations, stadiums, school buildings and large office buildings. In such public places, usually with high crowd densities, there is often a high risk of crowd stampede, especially in emergencies like fire, which in most cases can cause massive injuries, fatalities and property losses. To take the 2022 Seoul stampede as an example, at least 153 people were killed and 82 others were injured after a large crowd pushed forward on a narrow street at Halloween festivities. In fact, such casualties can be reduced and even avoided if crowd density can be effectively monitored and estimated. Therefore, in recent years, how to use cameras for crowd density estimation and how to conduct evacuation simulation of crowds have become the main concerns of researchers.

In recent years, thanks to the development of computer vision (Pulli et al., 2012; Shi et al., 2016; Voulodimos et al., 2018) and machine learning algorithms (Clancy et al., 2007; Jordan and Mitchell, 2015; Liakos et al., 2018), research on crowd density estimation has been boosted. Commonly used methods in previous research can be divided into three categories, namely detection-based methods, regression-based methods, and convolutional neural network-based (CNN-based) methods (Jiang et al., 2021; Sindagi and Patel, 2018; Yu et al., 2021).

As for detection-based methods, the computer scans and identifies the pedestrians in a crowd image one by one by sliding window detectors over the image, thereby obtaining crowd density data based on the number of pedestrians detected. Such methods require first manually extracting the shape features (Sabzmeydani and Mori, 2007) and texture features (Rabaud and Belongie, 2006) of the crowd, and then training a pedestrian detector based on the features extracted. Machine learning methods such as support vector machines (Hearst et al., 1998) and Boosting algorithms (Viola et al., 2005) are often adopted to improve detection performance.

In a regression-based method, the computer directly learns the mapping relationship between the input crowd images and the total number of people in the crowds in two steps: extracting low-level features of the crowd (such as edge features and gradient features) and establishing a regression model (Wang, 2020). First, the computer extracts features from the input images, and then it learns the features using different regression methods. Commonly used regression methods include Bayesian algorithms (Chan and Vasconcelos, 2012) and neural network regression algorithms (Marana et al., 1998).

In CNN-based methods, the computer builds a crowd detection model by learning the various features of crowd images, or the nonlinear relationship between crowd images and density maps, through a CNN. After detecting pedestrians in the image, a Gaussian kernel is used to smooth the two-dimensional coordinates marking the pedestrian positions so as to generate crowd density maps (Lempitsky and Zisserman, 2010). Zhang et al. (2015) proposed Crowd-CNN, a deep convolutional neural network in which the computer is trained to predict crowd densities and crowd counts; by changing the data input, the model can find better local optima. Sindagi and Patel (2017) proposed a contextual pyramid CNN (CP-CNN), which is able to produce high-quality crowd density maps and accurate crowd counts by integrating global information and local contextual information of crowd features. Xu et al. (2019) developed an algorithm called Digcrowd based on the YOLO algorithm; it uses depth information to segment images into far-field and near-field regions, detects pedestrians in each region separately, and proved able to produce high-quality crowd density maps.

When it comes to evacuation simulation, the main method adopted in existing research is to establish evacuation simulation models, and scholars have proposed dozens of them (Duives et al., 2013; Muramatsu and Nagatani, 2000; Zheng et al., 2009). Generally speaking, personnel evacuation dynamics models can be divided into two main categories: continuous models and discrete models. Continuous models typically define the behavior of evacuees through functions; common examples include social force models (Helbing et al., 2000; Helbing and Molnár, 1995; Hou et al., 2014) and fluid dynamics models (Helbing, 1998; Henderson, 1971). Discrete models feature the temporal or spatial discretization of evacuation sites, which allows simulated evacuees to move autonomously; representative examples include agent-based models (Chen et al., 2015; Vainstein et al., 2014) and cellular automata evacuation models (CAEM) (Fu et al., 2015; Kirchner and Schadschneider, 2002; Li et al., 2019; Varas et al., 2007).

However, existing research is insufficient in certain respects. For one thing, the crowd density estimation algorithms developed in previous studies only enable real-time crowd density estimation and usually do not offer evacuation simulation. For another, most traditional evacuation simulation models are only suitable for stampede risk assessment of a building or for simulating past incidents, and cannot perform real-time evacuation simulations based on the distribution of pedestrians during emergencies. It is therefore expected that a combination of the two types of model would help reduce the probability of stampedes and the resulting losses.

To complement existing research, this study proposes a machine vision-based method for crowd density estimation and evacuation simulation. Integrating a pedestrian detection model, a pedestrian positioning algorithm, and a cellular automata evacuation model (CAEM), the method we developed can produce real-time crowd density heatmaps and perform evacuation simulations based on the real-world positions of pedestrians.

2. General introduction to the method

The method proposed in this paper consists of two components: crowd density estimation and evacuation simulation. For the crowd density estimation component, the YOLOv3 object detection algorithm is exploited to detect pedestrians in real-time camera video streams, and markers are then placed on the detected pedestrians. A Gaussian kernel is used to smooth the pedestrian markers so as to generate crowd density heatmaps. Simultaneously, the crowd density in the scene is calculated and the level of crowding is categorized following the scheme of Rahmalan et al. (2006). When the level of crowding reaches High, an alert is issued and the evacuation simulation component is initiated. In the evacuation simulation component, a pedestrian positioning algorithm is employed to convert pedestrian positions in images to real-world positions. The position information is then input into the CAEM for real-time evacuation simulation. The CAEM can calculate the optimal evacuation path, the evacuation time for each pedestrian, and the evacuation flow rates at the different exits of the evacuation site, and it can also produce real-time crowd density heatmaps while simulating evacuation. Fig. 1 is a graphical description of the makeup of our method.

Fig. 1. Makeup of the method.

This method can play a positive role in safety assessment and planning, emergency management, facility optimization, and traffic management. For example, by obtaining real-time pedestrian location data, it is possible to assess the safety of existing buildings, public spaces, or evacuation routes. This can help planners, architects, or safety experts identify potential risk areas and bottlenecks, propose improvement measures, and optimize evacuation plans to ensure the safe evacuation of individuals in emergency situations. In the event of a crisis, obtaining real-time pedestrian locations can assist emergency management personnel in better understanding the distribution and movement trends of people, enabling them to make timely decisions and take appropriate evacuation measures, thereby improving the efficiency and accuracy of emergency response. Additionally, analyzing pedestrian location data can provide information on people flow, crowding levels, and so on, which can assist administrators of commercial centers, transportation hubs, sports venues, and other locations in optimizing facility layouts, formulating more effective traffic management strategies, and enhancing the efficiency and quality of service.

Fig. 2. Structure of the YOLOv3 feature extraction network.

3. The component of crowd density estimation

In this study, the YOLOv3 algorithm is employed to detect pedestrians in real-time images of evacuation scenes. Pedestrian positions are then marked on the images, generating crowd density heatmaps. When the level of crowding reaches High, an alert is issued. By observing the colors on the heatmap, administrators can identify areas with higher crowd density and take timely measures to evacuate the crowd.

3.1. The YOLOv3 algorithm

The YOLOv3 algorithm, based on convolutional neural networks, is adopted in this study to detect pedestrians in real-time video streams (Redmon and Farhadi, 2018). It is a recent object detection algorithm that is highly suitable for embedded devices such as industrial control computers, owing to its good balance between detection accuracy and detection speed.

The process of detecting pedestrians using this algorithm is as follows: first, the RTSP video stream from the camera is input into the pre-trained YOLOv3 model, and the model then feeds the video stream frame by frame into the feature extraction network so that the features of pedestrians can be extracted (see Fig. 2).

The feature extraction network is composed of multiple convolutional blocks and residual blocks. Each convolutional block consists of a convolutional layer, a batch normalization layer and a Leaky ReLU layer, and it is used to extract pedestrian features such as texture, color, shape and body proportion. Each residual block consists of two convolutional blocks and an add layer; residual blocks are often used to counter problems such as vanishing or exploding gradients in deep neural networks. The network also includes an upsampling layer and a tensor concatenation layer. These different layers, when combined, can downsample video frames by a factor of 32, 16, or 8, thereby generating three feature maps.

The three feature maps contain 13 × 13, 26 × 26, and 52 × 52 grids, respectively, with each grid cell containing 3 anchor boxes of different sizes, which are obtained by clustering the ground truth in the dataset and are used for the rough estimation of pedestrian positions. Each anchor box carries four coordinates, one object score, and one pedestrian class confidence score. The object score indicates the confidence that there is an object in the anchor box, and the pedestrian class confidence score indicates the confidence that a detected object is a pedestrian. The anchor box sizes corresponding to each downsampling factor are shown in Table 1.

Table 1
Anchor box sizes for each downsampling factor.

Downsampling factor    Size (pixels × pixels)
32×                    116 × 90    156 × 198    373 × 326
16×                    30 × 61     62 × 45      59 × 119
8×                     10 × 13     16 × 30      33 × 23

A pedestrian image is given in Fig. 3 to exemplify the feature maps obtained when different downsampling factors are adopted. The red box represents the target center; the yellow box represents the ground truth; the blue box represents the anchor box; and the black box represents a grid unit in the feature map.

Then, these feature maps are passed to the logical layer, where the sizes of three bounding boxes, the rectangular boxes used to describe the pedestrian position in the image, are calculated for each grid unit based on the sizes of the corresponding anchor boxes:

$$\begin{cases} b_x = \sigma(t_x) + c_x \\ b_y = \sigma(t_y) + c_y \\ b_w = p_w e^{t_w} \\ b_h = p_h e^{t_h} \end{cases} \tag{1}$$

where p_w and p_h represent the width and height of the anchor box, respectively; σ represents the sigmoid function; c_x and c_y indicate the coordinates of the upper left corner of the grid unit; t_x and t_y represent the center offsets of the bounding box; t_w and t_h represent the scaling ratios of the width and height; and b_x, b_y, b_w, b_h represent the position and size of the bounding box in the feature map. Using this formula, the position of each bounding box in the feature map can be calculated. The total number of bounding boxes is (13 × 13 + 26 × 26 + 52 × 52) × 3.

Finally, to reduce the number of bounding boxes, non-maximum suppression is performed on the remaining bounding boxes in two steps. First, bounding boxes with object scores greater than 0.6 and pedestrian class confidence scores greater than 0.5 are selected. Then, the overlaps among these bounding boxes are compared, and the bounding boxes with the smallest overlap with their adjacent bounding boxes are retained so as to avoid detecting the same pedestrian twice. In this way, the detection of pedestrians in the video stream is achieved.


Fig. 3. Feature maps for different downsampling factors.
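To make the decoding and suppression steps concrete, the following minimal sketch implements formula (1) and the two-step filtering described above in NumPy. The 0.6 object-score and 0.5 class-confidence thresholds come from the text; the function names and the IoU form of the overlap comparison are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def decode_boxes(t_xy, t_wh, grid_xy, anchor_wh):
    """Formula (1): map raw offsets to box centres/sizes on the feature map."""
    b_xy = 1.0 / (1.0 + np.exp(-t_xy)) + grid_xy   # b_x, b_y = sigmoid(t) + c
    b_wh = anchor_wh * np.exp(t_wh)                # b_w, b_h = p * exp(t)
    # Return corner format [x1, y1, x2, y2] for the overlap test below.
    return np.concatenate([b_xy - b_wh / 2, b_xy + b_wh / 2], axis=-1)

def suppress(boxes, obj_scores, cls_scores, iou_thr=0.5):
    """Step 1: keep boxes with object score > 0.6 and class score > 0.5.
    Step 2: greedily drop boxes that overlap a higher-scoring box."""
    mask = (obj_scores > 0.6) & (cls_scores > 0.5)
    boxes, scores = boxes[mask], (obj_scores * cls_scores)[mask]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order, keep = scores.argsort()[::-1], []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(i)
        lt = np.maximum(boxes[i, :2], boxes[rest, :2])  # intersection corners
        rb = np.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = np.clip(rb - lt, 0, None).prod(axis=1)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou < iou_thr]                     # drop heavy overlaps
    return boxes[keep]
```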

3.2. Crowd density heatmap

After completing pedestrian detection, the central coordinates of the bounding box for each pedestrian are used as pedestrian markers. In the 2D image, we use x to represent a pixel in the image. Assuming that there is a pedestrian marker at pixel x_i, there is a function δ(x − x_i) for the marker, where δ(x − x_i) = 1 when x = x_i and δ(x − x_i) = 0 otherwise. If there are M pedestrian markers in the image, the crowd density function of the image can be expressed by the following formula:

$$H(x) = \sum_{i=1}^{M} \delta(x - x_i) \tag{2}$$

To represent it as a continuous function, it is necessary to smooth it with an adaptive 2D Gaussian kernel. Specifically, a convolution is performed between H(x) and the Gaussian kernel, and the crowd density function can then be expressed as formula (3), where σ represents the diffusion parameter of the Gaussian kernel, for which the variance of the Gaussian kernel is often used:

$$F(x) = H(x) \ast G_{\sigma}(x), \quad \text{with } G_{\sigma}(x) = e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{3}$$

However, due to the perspective effect, setting the diffusion parameter of the Gaussian kernel to the variance would cause an issue: the size of a region in pixel coordinates becomes smaller as the distance from the camera increases, so the crowd density calculated in pixel coordinates would be higher. To address this problem, this study adopts a method proposed by Zhang et al. (2016), which is widely used and has been proven to effectively mitigate the impact of the perspective effect. Specifically, we calculate the distances between each pedestrian marker x_i and its k nearest neighbors and compute the average distance

$$\bar{d}_i = \frac{1}{k} \sum_{j=1}^{k} d_{ij}$$

Based on the average distance, we can infer that the pixels associated with pedestrian x_i correspond approximately to a circular region with a radius proportional to d̄_i. To estimate the crowd density around x_i, H(x) is convolved with an adaptive 2D Gaussian kernel whose diffusion coefficient is proportional to d̄_i:

$$\sigma_i = \beta \bar{d}_i \tag{4}$$

After multiple experiments, it was found that k = 12 and β = 0.2 are the optimal values. The final crowd density function can be expressed as formula (5); the crowd density heatmap is then obtained by using the Matplotlib library to map F(x) to colors, with low-risk areas mapped to blue, medium-risk areas to green, and high-risk areas to red.

$$F(x) = H(x) \ast G_{\sigma_i}(x) \tag{5}$$

To assist administrators in quickly identifying the actual locations of crowded areas, the opacity of the heatmap is set to 0.7 and the heatmap is overlaid onto the original image. By employing this method, administrators can swiftly pinpoint areas of high crowd density and promptly guide people to evacuate.
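For illustration, the adaptive-kernel construction of formulas (2) to (5) can be sketched as follows. The marker coordinates are assumed to be in pixels; the name `density_heatmap` and the `default_sigma` fallback for an isolated marker are our own choices, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def density_heatmap(points, shape, k=12, beta=0.2, default_sigma=15.0):
    """Formula (5): one delta impulse per pedestrian marker, each smoothed
    by a geometry-adaptive Gaussian with sigma_i = beta * (mean distance to
    the k nearest neighbours); k = 12 and beta = 0.2 as in the text."""
    h, w = shape
    density = np.zeros((h, w), dtype=np.float32)
    points = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    n = len(points)
    if n == 0:
        return density
    if n > 1:
        tree = cKDTree(points)
        # k + 1 because the nearest "neighbour" of a point is itself.
        dists, _ = tree.query(points, k=min(k + 1, n))
        sigmas = beta * dists[:, 1:].mean(axis=1)
    else:
        sigmas = np.array([default_sigma])  # no neighbours: fixed fallback
    for (x, y), sigma in zip(points, sigmas):
        delta = np.zeros((h, w), dtype=np.float32)
        delta[min(int(round(y)), h - 1), min(int(round(x)), w - 1)] = 1.0
        density += gaussian_filter(delta, sigma)
    return density
```

The overlay described above then amounts to drawing this array over the frame with a colormap at 0.7 opacity, e.g. `plt.imshow(density, cmap='jet', alpha=0.7)`.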
3.3. Experimental test of the crowd density estimation

To demonstrate the effectiveness of the crowd density estimation component, the following experiment was carried out. First, only images containing humans were extracted from the COCO dataset (Lin et al., 2014) to train the pedestrian detection model. A total of 66,808 images were extracted, of which 64,115 were used as the training set and 2,693 as the validation set. Next, the above datasets were input into the YOLOv3 algorithm for training, with the learning rate set to 0.01 and the cosine annealing hyperparameter set to 0.2. After 20 rounds of training, the following results were obtained on the validation set (see Fig. 4): the model's recognition accuracy rate reached over 80%, and the [email protected] value was close to 0.8 ([email protected] refers to the accuracy rate when the recall rate is 50% on the precision-recall curve, a common metric for evaluating model performance). This proved that the model had achieved a certain degree of usability, so the training was stopped and the model with the maximum [email protected] value was selected as the final pedestrian detection model.

Fig. 4. Training results of the YOLOv3 pedestrian detection algorithm.

To validate the recognition performance and superiority of the model in practice, 25 images containing dense crowds were captured for testing purposes. Based on the scheme of Rahmalan et al. (2006), the captured images were categorized into 5 levels of crowding (see Table 2), with each level consisting of five images.

Table 2
Relationship between crowd density and the level of crowding.

Level of crowding    Crowd density (people/m²)
Very Low             density < 0.5
Low                  0.5 ≤ density < 0.8
Moderate             0.8 ≤ density < 1.27
High                 1.27 ≤ density < 2.0
Very High            density ≥ 2.0

The captured images were recognized using the trained pedestrian detection model and another object detection algorithm called Faster R-CNN (Girshick, 2015), which is also based on convolutional neural networks. The results are shown in Fig. 5, which displays the actual number of pedestrians present in each image, the number of pedestrians detected by YOLOv3, the number detected by Faster R-CNN, and the pedestrian density for each image. The models accurately identify the pedestrians in the images, with YOLOv3 having a maximum error of 5 and an average error of 1.16, while Faster R-CNN has a maximum error of 4 and an average error of 0.92.

Fig. 5. Pedestrian detection results.

To evaluate the model performance more effectively, we use recognition accuracy as the evaluation metric, with the following formula:

$$\text{accuracy} = \frac{TP}{P} \tag{6}$$

where TP represents the number of correctly recognized pedestrians, and P represents the actual number of pedestrians.

The accuracy of YOLOv3 and Faster R-CNN at each level of crowding is calculated and presented in Table 3. The average accuracy of YOLOv3 and Faster R-CNN is 96.5% and 96.8%, respectively, so both models achieve recognition accuracy high enough to be practically usable. Faster R-CNN is slightly more accurate than YOLOv3, but it takes an average of 1.7 s to recognize an image, whereas YOLOv3 requires only 0.05 s, a significant difference in recognition speed. Therefore, considering both accuracy and recognition speed, this study ultimately selects the YOLOv3 pedestrian detection model.

Table 3
Recognition accuracy of YOLOv3 and Faster R-CNN.

Level of crowding    Accuracy of YOLOv3    Accuracy of Faster R-CNN
Very Low             98.2%                 98.2%
Low                  97.9%                 96.8%
Moderate             97%                   97.8%
High                 94.8%                 96.1%
Very High            94.6%                 95.4%

Finally, two images with a very high level of crowding were selected, and their corresponding crowd density heatmaps were generated using formula (5). In the heatmaps, the red areas represent high-risk zones in the current image, the green areas represent moderate-risk zones, and the blue areas represent low-risk zones. To avoid misleading interpretations in scenes with low average crowd density, the heatmaps also include annotations indicating the current scene's average crowd density and level of crowding (see Fig. 6).

Fig. 6. Calculation of crowd density heatmaps.

We have found that in these images, despite the severe overlap of pedestrians, the model accurately identifies the pedestrians in the scene and assigns them corresponding IDs. For example, in Image (a), there are a total of 37 people, and the YOLOv3 model detects 36, an accuracy rate of 97.2%. In Image (d), there are a total of 62 people, and the model detects 57, an accuracy rate of 91.9%. Furthermore, the crowd density heatmap accurately identifies the high-risk areas. This validates the effectiveness of the crowd density warning that triggers the evacuation simulation component.
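The thresholds of Table 2 translate directly into the alert logic that couples the two components; a minimal sketch follows (the function names are ours):

```python
def crowding_level(density):
    """Map crowd density (people/m^2) to the levels of Table 2."""
    if density < 0.5:
        return "Very Low"
    if density < 0.8:
        return "Low"
    if density < 1.27:
        return "Moderate"
    if density < 2.0:
        return "High"
    return "Very High"

def should_alert(density):
    """An alert is issued and the evacuation simulation is initiated
    once the level of crowding reaches High."""
    return crowding_level(density) in ("High", "Very High")
```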

4. The component of evacuation simulation

Based on the crowd density estimation component, we have observed that when a scene reaches a high level of crowding, the gaps between individuals become smaller, increasing the likelihood of collisions. Therefore, the method developed in this paper issues an alert and initiates evacuation simulation when the scene reaches a high level of crowding. In the evacuation simulation component, the first step is to obtain the positions of pedestrians in the three-dimensional world using the pedestrian positioning algorithm developed in this study. Then, the position information is input into the CAEM for evacuation simulation.

4.1. Pedestrian positioning algorithm

To obtain the positional information of pedestrians, it is necessary to convert their positions in the image into positions in the real world. A common method for this is the camera pinhole model, a mathematical model that can achieve bidirectional conversion between coordinates in the image and coordinates in the real world. It represents the camera's imaging process using pixel coordinates, image coordinates, camera coordinates, and world coordinates, as shown in Fig. 7.

Fig. 7. Mechanism of the pinhole model.

The position of the pedestrian in the image is represented by the pixel coordinate system, a two-dimensional coordinate system measured in pixels whose origin is at the upper left corner of the image. The position of the pedestrian in the real world is represented by the world coordinate system, usually measured from a reference point in the real world. The image coordinate system is similar to the pixel coordinate system, except that its origin is located at the center of the image. The camera coordinate system is a three-dimensional coordinate system with the camera as the origin. Assuming that the coordinates of the point at the foot of the pedestrian in the world coordinate system are (x_w, y_w, z_w) and the corresponding point in the pixel coordinate system is (u′, v′), the pinhole model converts between these two coordinates as follows:

$$z_c \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} = K[R|t] \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \tag{7}$$

Here, d_x and d_y represent the width and height of a pixel, respectively; (u_0, v_0) represents the coordinates of the origin of the image coordinate system in the pixel coordinate system; f represents the camera focal length; R represents the camera rotation matrix; t represents the camera translation vector; [R|t] represents the camera extrinsic matrix; K represents the camera intrinsic matrix; and z_c represents the depth of the pixel point from the camera.

According to formula (7), bidirectional conversion between pixel coordinates and world coordinates requires only the three parameters K, [R|t], and z_c. K and [R|t] can be obtained through camera calibration with the most commonly used method, Zhang's calibration method (Zhang, 2000), which requires a calibration board and at least six images taken from different positions; the camera's intrinsic and extrinsic matrices are then obtained by processing the images. However, z_c cannot be obtained with a single camera. To solve this problem, this article proposes a pedestrian positioning algorithm.

If Zhang's method is used for calibration, the world coordinate system lies in the plane of the calibration board, so the z-axis coordinate of all points on the calibration board is 0. When the calibration board is placed on the ground for calibration, the z-axis coordinates of the points at the bottom of a pedestrian's feet are also 0. This means that if the determination of pedestrian coordinates is transformed into the determination of the coordinates of the points at the bottom of the pedestrian's feet, only x_w and y_w need to be determined. Also, considering that the CAEM is a two-dimensional model and does not require the height or z-axis coordinate of the object, in this study we have decided to place the calibration board flat on the ground for calibration and determine the position of each pedestrian using the world coordinates of the bottom points of their feet.

The specific process is as follows. We use the midpoint of the bottom edge of the pedestrian bounding box, as determined by YOLOv3, to represent the pixel coordinate (u′, v′) at the pedestrian's feet, and set

$$K[R|t] = \begin{bmatrix} l_1 & l_2 & l_3 & l_4 \\ l_5 & l_6 & l_7 & l_8 \\ l_9 & l_{10} & l_{11} & l_{12} \end{bmatrix}$$

Then, substituting z_w = 0 into formula (7) yields formula (8):

$$\begin{cases} l_1 x_w + l_2 y_w + l_4 = z_c u' \\ l_5 x_w + l_6 y_w + l_8 = z_c v' \\ l_9 x_w + l_{10} y_w + l_{12} = z_c \end{cases} \tag{8}$$

Next, substituting z_c = l_9 x_w + l_{10} y_w + l_{12} into formula (8) yields formula (9):

$$\begin{cases} (l_1 - l_9 u')x_w + (l_2 - l_{10} u')y_w = l_{12} u' - l_4 \\ (l_5 - l_9 v')x_w + (l_6 - l_{10} v')y_w = l_{12} v' - l_8 \end{cases} \tag{9}$$

Finally, converting formula (9) into matrix form yields formula (10):

$$\begin{bmatrix} l_1 - l_9 u' & l_2 - l_{10} u' \\ l_5 - l_9 v' & l_6 - l_{10} v' \end{bmatrix} \begin{bmatrix} x_w \\ y_w \end{bmatrix} = \begin{bmatrix} l_{12} u' - l_4 \\ l_{12} v' - l_8 \end{bmatrix} \tag{10}$$

According to formula (10), once the pixel coordinates of the bottom of the pedestrian's feet, the camera intrinsic matrix K, and the camera extrinsic matrix [R|t] are known, the world coordinates of the pedestrian can be obtained. As Fig. 8 shows, the black checkerboard is the calibration board; the world coordinates of the pedestrian are (x_w, y_w, 0); and the corresponding pixel coordinates are (u′, v′).

Fig. 8. The calibration process.
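Since formula (10) is a 2 × 2 linear system, the positioning step reduces to a single solve. The sketch below assumes K and [R|t] are available from calibration; the function name is ours, and l1 to l12 denote the entries of the projection matrix K[R|t] as above.

```python
import numpy as np

def pixel_to_world(u, v, K, Rt):
    """Solve formula (10) for the ground-plane position (x_w, y_w), z_w = 0.

    K  : 3x3 intrinsic matrix
    Rt : extrinsic matrix [R|t], given as 3x4 (or 4x4, top rows used)
    """
    M = K @ np.asarray(Rt, dtype=float)[:3, :]   # 3x4 projection matrix
    l = M.flatten()                              # l[0] = l1, ..., l[11] = l12
    A = np.array([[l[0] - l[8] * u, l[1] - l[9] * u],
                  [l[4] - l[8] * v, l[5] - l[9] * v]])
    b = np.array([l[11] * u - l[3],
                  l[11] * v - l[7]])
    xw, yw = np.linalg.solve(A, b)
    return xw, yw
```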


4.2. Cellular evacuation simulation model

The cellular automata evacuation model is a mathematical model used to simulate the evacuation of large crowds in response to disasters or emergencies. In such a model, an evacuation area is divided into contiguous grid cells, with each cell representing a portion of the area. Individuals in the crowd are represented as intelligent agents or particles moving on the grid, with their movements determined by rules and conditions specified in the model. These rules may include the mobility of the agents, their perception of the environment, and their interaction with other agents. The behavior of the agents is updated in discrete time steps, allowing the model to simulate the evacuation process over time.

This study developed an improved cellular automata evacuation model (CAEM) based on this principle. The innovation of the CAEM lies in the improvement of the pedestrian movement laws, allowing multiple pedestrians to occupy the same grid cell, whereas traditional cellular automaton evacuation models accommodate only one pedestrian per cell. This approach introduces more diversified movement directions for pedestrians than the traditional eight-way cellular automaton, making it more realistic.

The CAEM developed in this study implements the core algorithm logic in Python and uses the PYQT5 graphics library for graphical and data visualization. Its main functions include analyzing evacuation based on the real-time locations of people; calculating the shortest evacuation path, the evacuation time, and the exit flow rates; and producing crowd density heatmaps in real time during simulation. The model consists of three parts: architectural space visualization, pedestrian moving potential calculation, and establishment of pedestrian movement laws.

Architectural space visualization is the conversion of a three-dimensional architectural space into a two-dimensional image. This requires collecting data on the geometry of the architectural space, the location and width of the evacuation exits, and the position and size of obstacles. After collecting all the features, the architectural space is divided into multiple 1 m × 1 m grid cells, with green cells representing obstacles, green lines representing exits, and white cells representing areas that pedestrians can walk through, as shown in Fig. 9.

Fig. 9. Architectural space visualization diagram.

Pedestrian moving potential calculation: pedestrians have two kinds of potential in the moving field, local moving potential and global moving potential. When pedestrians move, they move from a grid cell with higher potential to another with lower potential.

Local moving potential is determined by the distance between the grid cell and the building exit. We define the local moving potential of a grid cell adjacent to a building exit as 1. The local moving potential of grid cells adjacent to these cells increases incrementally until all cells in the evacuation area are assigned corresponding potential values. The potential of all obstacle cells is infinite (inf).

Taking a building with two 3 m wide evacuation exits and a 2 m × 2 m obstacle on a 10 m × 10 m area as an example, the local moving potentials for exit A and exit B are shown in Fig. 10. The cell marked with a red circle, M, has a local moving potential of 4 and 6 in relation to exit A and exit B, respectively.

Fig. 10. Distribution of local moving potential for different exits.

Global moving potential refers to the moving potential of a grid cell when there are multiple exits at an evacuation site; its value is the minimum of the cell's local moving potentials with respect to all exits. An example is shown in Fig. 11: the local moving potential of grid cell M with respect to exit A and exit B is 4 and 6, respectively, so the global moving potential of this cell is 4. If a pedestrian is in grid cell M, he or she will choose exit A for evacuation. A sketch of this calculation is given below.

Fig. 11. Distribution of global moving potential at a multi-exit evacuation site.
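The paper does not spell out the expansion rule for assigning potentials, so the sketch below assumes a four-neighbour breadth-first flood fill from the exit-adjacent cells, which reproduces the incremental assignment described above; the global potential is then the cell-wise minimum over all exits.

```python
from collections import deque

def local_potential(grid, exit_cells):
    """Cells adjacent to an exit get potential 1, their neighbours 2, and
    so on; obstacle cells stay at infinity.
    grid: 2D list, 0 = walkable, 1 = obstacle
    exit_cells: list of (row, col) walkable cells adjacent to the exit."""
    rows, cols = len(grid), len(grid[0])
    pot = [[float("inf")] * cols for _ in range(rows)]
    queue = deque()
    for r, c in exit_cells:
        pot[r][c] = 1
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0
                    and pot[nr][nc] > pot[r][c] + 1):
                pot[nr][nc] = pot[r][c] + 1
                queue.append((nr, nc))
    return pot

def global_potential(grid, exit_groups):
    """Cell-wise minimum of the local potentials of all exits; a pedestrian
    heads for the exit that realises this minimum."""
    fields = [local_potential(grid, ex) for ex in exit_groups]
    return [[min(f[r][c] for f in fields) for c in range(len(grid[0]))]
            for r in range(len(grid))]
```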


Fig. 12. Schematic diagram of collision detection algorithm.


To establish the pedestrian movement laws, a circle with a diameter of 40 cm is used to represent each pedestrian, and the maximum allowed crowd density per square meter is set to 5 people. First, based on the pedestrian IDs from YOLOv3, the speed and direction of each pedestrian's next movement are calculated. The pedestrian's movement speed is determined using an empirical formula derived from the evacuation simulation experiments of Xie et al. (2016) with university students:

$$v(i) = \begin{cases} 0.033\,D(i)^2 - 0.636\,D(i) + 3.362 & (D(i) > 4) \\ 1.4 & (D(i) \leq 4) \end{cases} \tag{11}$$

where v(i) represents the walking speed of pedestrian i, measured in m/s, and D(i) represents the crowd density of the grid cell where pedestrian i is located, measured in people/m².

Next, the calculation of the next movement direction begins. During this process, each pedestrian i selects a movement direction from the eight possible directions (up, down, left, right, upper-left, lower-left, upper-right, and lower-right) based on the global moving potential. Then, a collision detection algorithm is applied to check whether moving in that direction would result in overlap with the other i − 1 pedestrians. If overlap occurs, the position is adjusted to avoid collisions. Fig. 12 illustrates this process.

Taking pedestrian i as an example, with its center (x_i, y_i) located in grid 3 and pedestrian i − 1 with its center (x_{i−1}, y_{i−1}) in grid 2, the radius of a pedestrian is denoted as r. If pedestrian i moves with a speed of v(i) towards the upper-right direction, it will move to grid 2 with a new center at (x_old, y_old). In this case, the distance between (x_old, y_old) and (x_{i−1}, y_{i−1}) is less than 2r, which indicates that if pedestrian i moves in that direction, it will overlap with pedestrian i − 1. To avoid this situation, the new center of pedestrian i must lie on the circumference of a circle with (x_{i−1}, y_{i−1}) as the center and a radius of 2r, and it must also lie on the circumference of a circle with (x_i, y_i) as the center and a radius of v(i). To satisfy these two conditions, the intersection points (x_p1, y_p1) and (x_p2, y_p2) of these two circles are calculated, and the next movement position of pedestrian i is adjusted to the nearest intersection. The formula for solving the intersection points is given by Pedoe (2013):

$$\begin{cases} x_{p1} = x_i + \dfrac{a}{d}(x_{i-1} - x_i) - \dfrac{h}{d}(y_{i-1} - y_i) \\[4pt] y_{p1} = y_i + \dfrac{a}{d}(y_{i-1} - y_i) + \dfrac{h}{d}(x_{i-1} - x_i) \\[4pt] x_{p2} = x_i + \dfrac{a}{d}(x_{i-1} - x_i) + \dfrac{h}{d}(y_{i-1} - y_i) \\[4pt] y_{p2} = y_i + \dfrac{a}{d}(y_{i-1} - y_i) - \dfrac{h}{d}(x_{i-1} - x_i) \end{cases} \tag{12}$$

with $d = \sqrt{(x_{i-1} - x_i)^2 + (y_{i-1} - y_i)^2}$, $a = \dfrac{v(i)^2 - 4r^2 + d^2}{2d}$, and $h = \sqrt{v(i)^2 - a^2}$.

Finally, the above steps are repeated until all pedestrians have successfully evacuated, as shown in the solution flowchart (see Fig. 13). This movement rule effectively ensures that pedestrians do not overlap during the movement process and allows for more diverse movement directions, aligning with reality. Additionally, the approach demonstrates high computational efficiency: experimental tests have shown that in a scenario with dimensions of 30 m × 30 m, two 3 m wide exits, and two 2 m × 2 m obstacles, calculating the optimal evacuation path for 1,000 people requires only 1.7 s.


Fig. 13. Algorithm for the determination of the shortest evacuation path.
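Formulas (11) and (12) combine into a small pair of helpers: one for the density-dependent speed and one for relocating a colliding pedestrian onto an intersection of the two circles. This is an illustrative sketch with r = 0.2 m (the 40 cm pedestrian diameter); the surrounding movement loop and the flowchart logic of Fig. 13 are omitted.

```python
import math

def walking_speed(density):
    """Formula (11): empirical speed (m/s) as a function of local crowd
    density D (people/m^2), after Xie et al. (2016)."""
    if density <= 4:
        return 1.4
    return 0.033 * density ** 2 - 0.636 * density + 3.362

def circle_intersections(xi, yi, xj, yj, v, r=0.2):
    """Formula (12): intersections of the circle of radius v around
    pedestrian i at (xi, yi) and the circle of radius 2r around the
    blocking pedestrian j at (xj, yj)."""
    d = math.hypot(xj - xi, yj - yi)
    a = (v ** 2 - 4 * r ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(v ** 2 - a ** 2, 0.0))
    ux, uy = (xj - xi) / d, (yj - yi) / d          # unit vector i -> j
    p1 = (xi + a * ux - h * uy, yi + a * uy + h * ux)
    p2 = (xi + a * ux + h * uy, yi + a * uy - h * ux)
    return p1, p2
```

The caller then compares the two returned intersections and moves pedestrian i to the one nearer its originally intended position.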

4.3. Evacuation simulation

To check the effectiveness of the evacuation simulation component, a contrastive experiment was performed between our evacuation simulation component and Pathfinder, one of the most widely used pieces of agent-based evacuation simulation software, developed by Thunderhead Engineering in the United States. We chose a corridor as the experiment setting, 5 m × 5 m in size and with two 2 m wide exits. The experimental instruments included a 1080p camera and a 60 cm × 45 cm calibration board.

First, 10 images of the calibration board were taken from different directions using the camera and then processed with the calibration toolbox in Matlab to obtain 10 datasets of camera intrinsic and extrinsic matrices. The calibration was evaluated by computing the reprojection error, that is, the distance, calculated with the calibration parameters, between the projected point and the actually detected corner point for each corner on the calibration board; it is generally acknowledged that a calibration is favorable when the average error is below 0.2 pixels. The results showed that the average error of our calibration was 0.10 pixels, indicating that the calibration was effective (see Fig. 14).

Next, the moment the last image of the calibration board was shot, we fixed the camera in its position. A world coordinate system was then established by taking the upper left corner of the calibration board as the origin, with the x-axis and y-axis both parallel to the calibration board plane, as shown in Fig. 15. The camera's intrinsic matrix K and extrinsic matrix [R|t] in this case are:

$$K = \begin{bmatrix} 1596.98 & 0 & 901.61 \\ 0 & 1681.60 & 709.43 \\ 0 & 0 & 1 \end{bmatrix}, \quad [R|t] = \begin{bmatrix} 0.8803 & 0.0397 & 0.0533 & -954.62 \\ 0.1405 & 0.3974 & -0.9412 & 307.92 \\ -0.0637 & 0.0514 & 0.3012 & 3909.37 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Then, the RTSP video stream from the camera was input into the trained pedestrian detection model for real-time detection of pedestrians. During the forward movement of four pedestrians, a snapshot was taken for experimentation. The recognition results are shown in Fig. 16, where the left image is the original photo and the right image is the model's output. The model accurately identifies each person and assigns them corresponding IDs.

Fig. 16. Results of pedestrian detection and numbering.

Taking the midpoint of the bottom edge of the pedestrian bounding box as the pixel coordinate of each pedestrian, we calculated the corresponding world coordinates using formula (10). Comparison of the actual measured coordinates with the calculated coordinates showed that the maximum error of the pedestrian positioning algorithm was within 0.3 m, with average errors of 0.12 m in the x-axis direction and 0.18 m in the y-axis direction (see Table 4). Since the unit used by the CAEM is the meter, the world coordinate accuracy obtained with our pedestrian positioning algorithm meets the requirements of the CAEM.

Table 4
Errors between actual coordinates and calculated coordinates.

ID    Pixel coordinates    Actual world coordinates    Computed coordinates    Error (m)
0     (727, 710)           (0.53, -1.27, 0)            (0.65, -1.00, 0)        (0.12, 0.27)
1     (485, 772)           (-0.075, -0.67, 0)          (-0.051, -0.39, 0)      (0.024, 0.28)
2     (1302, 724)          (2.32, -1.37, 0)            (2.18, -1.47, 0)        (-0.14, -0.10)
3     (984, 896)           (1.12, -0.067, 0)           (1.30, -0.017, 0)       (0.18, -0.05)

Finally, an evacuation simulation was performed by inputting the world coordinates calculated above into the CAEM and Pathfinder, respectively. The initial distribution of the "pedestrians" is shown in Fig. 17. In the CAEM, the blue dashed line represents the wall; the green line represents the exit; and a purple circle with a diameter of 0.4 m represents a pedestrian. In Pathfinder, the green line represents the exit, and a pedestrian is represented by a purple circle with a shoulder width of 40 cm and a moving speed of 1.3 m/s (see Fig. 18).

Fig. 17. Initial distribution of pedestrians in the CAEM.

Fig. 18. Initial distribution of pedestrians in Pathfinder.


Fig. 14. Calibration error.

Fig. 15. The world coordinate system.
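The authors performed this calibration with the Matlab toolbox; for completeness, an equivalent sketch in Python with OpenCV is given below. The checkerboard corner count, square size and image paths are assumptions for illustration, and `cv2.calibrateCamera` returns the RMS reprojection error used as the quality measure above.

```python
import glob
import cv2
import numpy as np

# Checkerboard with 9x6 inner corners; the square size (here 5 cm) only
# scales the translation. Both values are illustrative assumptions.
pattern, square = (9, 6), 0.05
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.jpg"):          # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    ok, corners = cv2.findChessboardCorners(gray, pattern)
    if ok:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]

# rms is the average reprojection error in pixels.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size,
                                                 None, None)
R, _ = cv2.Rodrigues(rvecs[-1])        # extrinsics of the last (ground) view
Rt = np.hstack([R, tvecs[-1]])         # 3x4 [R|t], usable by pixel_to_world
print(f"reprojection error: {rms:.3f} px")
```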

The experimental results showed that the total evacuation time for the CAEM was 4.1 s, while that of Pathfinder was 4.3 s; by comparison, there is not much difference in evacuation time. We infer that the slightly shorter evacuation time of the CAEM can be attributed to Xie's speed formula, which yields a slightly larger speed than Pathfinder's when the crowd density is relatively small. To further verify that the pedestrians' choice of exit satisfies the "shortest path" principle, we drew each pedestrian's evacuation path; it can be seen that each simulated "pedestrian" chose the nearest exit, indicating agreement with the "shortest path" principle (see Fig. 19).

Fig. 19. Each pedestrian's evacuation path.

In summary, the pedestrian positioning algorithm proposed in this article meets the accuracy requirements of the CAEM, and the performance of the CAEM in the evacuation simulation is similar to that of Pathfinder. In addition, the calculated evacuation paths satisfy the "shortest path" principle. These results indicate that the evacuation simulation component is truly feasible.

5. Experimental verification

Considering that the experiments above have demonstrated that the method proposed in this study is suitable for scenarios of low crowd density, we conducted a reduced-scale experiment to verify the applicability of the method for scenarios with high crowd density. The reasons for a reduced-scale experiment are twofold: more detailed measurements and analysis can be made at reduced scale, and such an experiment is safer and more efficient. The experiment setting was made by scaling down an authentic setting at a ratio of 17:1. The experimental instruments included a 1080p camera, a 10 cm × 10 cm calibration board, and 20 miniature pedestrians 1/17 the size of an authentic human figure. The process of the experiment is as follows.

1. Construction of the simulated evacuation site

First, the size of the reduced-scale evacuation site was found through measurement to be 23.5 cm × 23.5 cm, with a 5.8 cm × 5.8 cm obstacle and a 5.8 cm wide exit. We then modeled in the CAEM an evacuation site 17 times the size of the reduced-scale site. The maps of local moving potential and global moving potential are shown in Figs. 20 and 21, where the green line represents the exit; the green rectangle represents the obstacle; and the blue line represents the wall.

Fig. 20. Evacuation site.

Fig. 21. Distribution of global moving potential.

2. Camera calibration

Nineteen images of the calibration board were taken from different angles and then processed using the calibration toolbox in Matlab to obtain the intrinsic and extrinsic matrices of the camera. The camera calibration errors are shown in Fig. 22: the maximum error of 0.08 pixels and the average error of 0.07 pixels indicate that the calibration was acceptable.

Fig. 22. Calibration errors.


We maintained the final position of the camera and established a world coordinate system in which the upper left corner of the calibration board was the origin, with the x-axis and y-axis both parallel to the plane of the calibration board, as shown in Fig. 23. At this moment, the intrinsic matrix K and the extrinsic matrix [R|t] of the camera are:

$$K = \begin{bmatrix} 1908.45 & 0 & 998.18 \\ 0 & 2475 & 497.07 \\ 0 & 0 & 1 \end{bmatrix}, \quad [R|t] = \begin{bmatrix} 0.9985 & 0.0035 & 0.0551 & -91.37 \\ 0.0424 & 0.5770 & -0.8055 & 41.63 \\ -0.0354 & 0.1066 & 0.5900 & 422.96 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Fig. 23. The world coordinate system.

3. Crowd density estimation

The camera's video stream was input into the pedestrian detection model, and the detection results are shown in Fig. 24. The model accuracy was found to be 100%, meaning that every pedestrian was correctly detected and assigned a corresponding ID.

The center coordinates of the pedestrian bounding boxes were used as pedestrian markers, and the crowd density was estimated via formula (5). In the crowd density heatmap produced (see Fig. 25), the areas with high crowd density at the evacuation site are clearly marked and correspond to the real-world setting. This indicates that the crowd density estimation component of our method is effective.


Fig. 24. Image comparison.

Fig. 25. Crowd density heatmap of the experimental setting.

4. Pedestrian coordinates calculation

Taking the midpoint of the bottom edge of the pedestrian bounding box as the pixel coordinates of the pedestrian's feet, we converted the pixel coordinates to world coordinates using formula (10) and then magnified the results by a factor of 17 to obtain the pedestrian's world coordinates. To determine the positioning error accurately, we calculated the error for each pedestrian in the x and y directions, as shown in Fig. 26. The maximum error in the x direction is 0.42 m, the average error is 0.177 m, and the variance is 0.038. In the y direction, the maximum positioning error is 0.4 m, the average error is 0.176 m, and the variance is 0.042. The maximum error did not exceed 0.45 m; since the unit used in the CAEM is the meter, we consider that the results of the positioning algorithm meet the accuracy requirements.

Fig. 26. Positioning errors.

5. Evacuation simulation

The pedestrian positions calculated in the previous step were imported into the CAEM and Pathfinder respectively for evacuation simulation. As shown in Fig. 27, in both Pathfinder and the CAEM, pedestrians are represented by purple circles with a diameter of 40 cm. The evacuation time for Pathfinder is 16.8 s, and that for the CAEM is 16.0 s, a difference of 0.8 s. As shown in Fig. 28, the distance between pedestrians in the CAEM is significantly smaller than in Pathfinder at 6 s, 9 s, and 12 s; moreover, the CAEM encountered severe congestion between 6 s and 12 s. We speculate that this can be explained by the characteristics of the CAEM: the simulated evacuees do not consciously maintain a safe distance from others, which easily results in relatively higher crowd densities. As a result, the evacuation flow rates at the exits increase, thereby accelerating the evacuation process. To confirm this hypothesis, we calculated the flow rate of pedestrians at the exits.

It can be seen that the CAEM takes 7 s to reach the peak exit flow rate (2 people/s), while Pathfinder takes only 5 s. This finding validates the hypothesis of this study that the CAEM tends to reach higher crowd densities, resulting in a quicker attainment of the peak exit flow rate, which better aligns with pedestrian behavior in hazardous situations.


Fig. 27. Evacuation process.


Fig. 28. The pedestrian flow rate at exits.

Fig. 29. Evacuation paths of pedestrians.

conjunction with Fig. 27, it can be noted that there are no instances of 6. Conclusion
pedestrian overlap during the evacuation process. These findings
demonstrate the effectiveness of the collision detection algorithm This study applies computer vision technology to the field of emer­
employed in this study. gency management and proposes a method for crowd density estimation
By conducting a comparative experiment with Pathfinder, the study and evacuation simulation using a monocular camera. The effectiveness
validates the rationality and effectiveness of the evacuation simulation of this method has been validated through experiments, providing
presented in this paper. valuable insights and contributions to existing research.
Lastly, it is worth noting that during the first 2 s, the exit flow rate is In terms of crowd density estimation, we trained a YOLOv3 pedes­
zero, and the exit is too narrow, resulting in intense collisions among trian detection model on a self-made dataset and evaluated its detection
pedestrians during the evacuation process. From this, it can be inferred accuracy under different levels of crowding. The experimental results
that the exit setup is not reasonable. To address this issue, the following show that the model achieves recognition accuracies of 98.2%, 97.9%,
solutions are recommended: (1) adding an exit on the left or right of the 97%, 94.8% and 94.6% for different levels of crowding, with an average
site; (2) designating emergency management staff to help with evacu­ accuracy of 96.5%. Moreover, The model can process a single image in
ation at the current exit; (3) adding guardrails at the exit to help pe­ just 0.05 s. By combining a two-dimensional adaptive Gaussian kernel,
destrians evacuate along the specified route. high-quality crowd density heatmaps are generated to calculate the
Based on a series of experiments, it is found that the crowd density current scene’s level of crowding in real-time. The heatmap also high­
estimation component of our method enables effective real-time lights congested areas within the scene, effectively estimating crowd
pedestrian detection, production of crowd density heatmaps. In addi­ density.
tion, the evacuation simulation performance of our evacuation model is For pedestrian localization, we designed a pedestrian positioning
similar to that of Pathfinder, a piece of commonly used evacuation algorithm based on the pinhole model and Zhang’s camera calibration
simulation software. CAEM demonstrates higher crowd density and method. The algorithm’s accuracy was verified through experiments,
greater diversity in movement directions during the evacuation process, with average errors of 0.177 m and 0.176 m in the x and y directions,
which means that our evacuation model can better simulate evacuation respectively, meeting the precision requirements of CAEM.
in real scenarios. Based on these results, it is reasonable to say that the To simulate the real-time evacuation based on the positions of pe­
method in this study is feasible. destrians, a visually intuitive CAEM was developed using Python and
PYQT5. By integrating the pedestrian positioning algorithm, CAEM

16
S. Huang et al. Safety Science 167 (2023) 106285

enables real-time simulation based on pedestrian positions. When a Helbing, D., 1998. A Fluid Dynamic Model for the Movement of Pedestrians. https://doi.
org/10.48550/arXiv.cond-mat/9805213.
disaster occurs, it quickly calculates the optimal evacuation paths for
Helbing, D., Molnár, P., 1995. Social force model for pedestrian dynamics. Phys. Rev. E
pedestrians, assisting management personnel in guiding the evacuation 51, 4282–4286. https://doi.org/10.1103/PhysRevE.51.4282.
process. To validate the rationality of CAEM, a comparative experiment Helbing, D., Farkas, I., Vicsek, T., 2000. Simulating dynamical features of escape panic.
was conducted with the widely used evaluation simulation software, Nature 407, 487–490. https://doi.org/10.1038/35035023.
Henderson, L.F., 1971. The Statistics of Crowd Fluids. Nature 229, 381–383. https://doi.
Pathfinder. The evacuation times calculated by the two methods differ org/10.1038/229381a0.
by less than 1 s, demonstrating a high level of consistency. Hou, L., Liu, J.-G., Pan, X., Wang, B.-H., 2014. A social force evacuation model with the
This research addresses the limitations of traditional evacuation leadership effect. Phys. Stat. Mech. Its Appl. 400, 93–99. https://doi.org/10.1016/j.
physa.2013.12.049.
models, which cannot perform real-time evacuation simulations based Jiang, N., Zhou, H., Yu, F., 2021. Review of Computer Vision Based Object Counting
on current distribution of individuals during emergencies. By moni­ Methods. Laser Optoelectron. Prog. 58, 43–59.
toring and locating crowds in real-time, building management personnel Jordan, M.I., Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and
prospects. Science 349, 255–260. https://doi.org/10.1126/science.aaa8415.
can identify high-risk areas prone to stampedes or congestion and take Kirchner, A., Schadschneider, A., 2002. Simulation of evacuation processes using a
timely measures to reduce crowd density. In the event of emergencies bionics-inspired cellular automaton model for pedestrian dynamics. Phys. Stat.
such as fires or earthquakes, our proposed method can quickly obtain Mech. Its Appl. 312, 260–276. https://doi.org/10.1016/S0378-4371(02)00857-9.
Lempitsky, V., Zisserman, A., 2010. Learning To Count Objects in Images, in: Advances in
the initial positions of pedestrians and calculate the optimal evacuation Neural Information Processing Systems. Curran Associates, Inc.
paths for each individual based on the specific layout and features of the Li, Y., Chen, M., Dou, Z., Zheng, X., Cheng, Y., Mebarki, A., 2019. A review of cellular
scene. This can assist management personnel and emergency response automata models for crowd evacuation. Phys. Stat. Mech. Its Appl. 526, 120752
https://doi.org/10.1016/j.physa.2019.03.117.
teams in making informed decisions, enhancing pedestrian safety, and
Liakos, K.G., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine Learning in
minimizing the potential for casualties or stampedes. Agriculture: A Review. Sensors 18, 2674. https://doi.org/10.3390/s18082674.
Furthermore, this method contributes to the objective evaluation of building capacity and the suitability of evacuation exit distribution, providing feedback for optimizing site design and strengthening emergency management procedures.
In conclusion, this study combines machine vision technology with evacuation simulation, addressing the limitations of traditional evacuation models and making contributions to the field of emergency management.
CRediT authorship contribution statement

Shijie Huang: Conceptualization, Writing – original draft, Writing – review & editing, Visualization, Validation, Methodology. Jingwei Ji: Supervision, Methodology, Conceptualization. Yu Wang: Writing – review & editing, Investigation, Data curation. Wenju Li: Writing – review & editing, Data curation. Yuechuan Zheng: Writing – review & editing, Data curation.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments

The study was supported by the Fundamental Research Funds for the Central Universities (No. 2022ZZCX05K02).
References

Chan, A.B., Vasconcelos, N., 2012. Counting people with low-level features and Bayesian regression. IEEE Trans. Image Process. 21, 2160–2177. https://doi.org/10.1109/TIP.2011.2172800.
Chen, D., Wang, L., Zomaya, A.Y., Dou, M., Chen, J., Deng, Z., Hariri, S., 2015. Parallel simulation of complex evacuation scenarios with adaptive agent models. IEEE Trans. Parallel Distrib. Syst. 26, 847–857. https://doi.org/10.1109/TPDS.2014.2311805.
Clancy, C., Hecker, J., Stuntebeck, E., O'Shea, T., 2007. Applications of machine learning to cognitive radio networks. IEEE Wirel. Commun. 14, 47–52. https://doi.org/10.1109/MWC.2007.4300983.
Duives, D.C., Daamen, W., Hoogendoorn, S.P., 2013. State-of-the-art crowd motion simulation models. Transp. Res. Part C Emerg. Technol. 37, 193–209. https://doi.org/10.1016/j.trc.2013.02.005.
Fu, Z., Zhou, X., Zhu, K., Chen, Y., Zhuang, Y., Hu, Y., Yang, L., Chen, C., Li, J., 2015. A floor field cellular automaton for crowd evacuation considering different walking abilities. Phys. Stat. Mech. Its Appl. 420, 294–303. https://doi.org/10.1016/j.physa.2014.11.006.
Girshick, R., 2015. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448.
Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B., 1998. Support vector machines. IEEE Intell. Syst. Their Appl. 13, 18–28. https://doi.org/10.1109/5254.708428.
Helbing, D., 1998. A Fluid Dynamic Model for the Movement of Pedestrians. https://doi.org/10.48550/arXiv.cond-mat/9805213.
Helbing, D., Molnár, P., 1995. Social force model for pedestrian dynamics. Phys. Rev. E 51, 4282–4286. https://doi.org/10.1103/PhysRevE.51.4282.
Helbing, D., Farkas, I., Vicsek, T., 2000. Simulating dynamical features of escape panic. Nature 407, 487–490. https://doi.org/10.1038/35035023.
Henderson, L.F., 1971. The Statistics of Crowd Fluids. Nature 229, 381–383. https://doi.org/10.1038/229381a0.
Hou, L., Liu, J.-G., Pan, X., Wang, B.-H., 2014. A social force evacuation model with the leadership effect. Phys. Stat. Mech. Its Appl. 400, 93–99. https://doi.org/10.1016/j.physa.2013.12.049.
Jiang, N., Zhou, H., Yu, F., 2021. Review of Computer Vision Based Object Counting Methods. Laser Optoelectron. Prog. 58, 43–59.
Jordan, M.I., Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and prospects. Science 349, 255–260. https://doi.org/10.1126/science.aaa8415.
Kirchner, A., Schadschneider, A., 2002. Simulation of evacuation processes using a bionics-inspired cellular automaton model for pedestrian dynamics. Phys. Stat. Mech. Its Appl. 312, 260–276. https://doi.org/10.1016/S0378-4371(02)00857-9.
Lempitsky, V., Zisserman, A., 2010. Learning To Count Objects in Images. In: Advances in Neural Information Processing Systems. Curran Associates, Inc.
Li, Y., Chen, M., Dou, Z., Zheng, X., Cheng, Y., Mebarki, A., 2019. A review of cellular automata models for crowd evacuation. Phys. Stat. Mech. Its Appl. 526, 120752. https://doi.org/10.1016/j.physa.2019.03.117.
Liakos, K.G., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine Learning in Agriculture: A Review. Sensors 18, 2674. https://doi.org/10.3390/s18082674.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft COCO: Common Objects in Context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (Eds.), Computer Vision – ECCV 2014, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 740–755. https://doi.org/10.1007/978-3-319-10602-1_48.
Marana, A.N., Costa, L.F., Lotufo, R.A., Velastin, S.A., 1998. On the efficacy of texture analysis for crowd monitoring. In: Proceedings SIBGRAPI'98. International Symposium on Computer Graphics, Image Processing, and Vision (Cat. No.98EX237), pp. 354–361. https://doi.org/10.1109/SIBGRA.1998.722773.
Muramatsu, M., Nagatani, T., 2000. Jamming transition in two-dimensional pedestrian traffic. Phys. Stat. Mech. Its Appl. 275 (1–2), 281–291.
Pedoe, D., 2013. Geometry: A comprehensive course. Courier Corporation.
Pulli, K., Baksheev, A., Kornyakov, K., Eruhimov, V., 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 61–69. https://doi.org/10.1145/2184319.2184337.
Rabaud, V., Belongie, S., 2006. Counting Crowded Moving Objects. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), pp. 705–711. https://doi.org/10.1109/CVPR.2006.92.
Rahmalan, H., Nixon, M.S., Carter, J.N., 2006. On crowd density estimation for surveillance, pp. 540–545. https://doi.org/10.1049/ic:20060360.
Redmon, J., Farhadi, A., 2018. YOLOv3: An Incremental Improvement. https://doi.org/10.48550/arXiv.1804.02767.
Sabzmeydani, P., Mori, G., 2007. Detecting Pedestrians by Learning Shapelet Features. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. https://doi.org/10.1109/CVPR.2007.383134.
Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L., 2016. Edge Computing: Vision and Challenges. IEEE Internet Things J. 3, 637–646. https://doi.org/10.1109/JIOT.2016.2579198.
Sindagi, V.A., Patel, V.M., 2017. Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs. https://doi.org/10.48550/arXiv.1708.00953.
Sindagi, V.A., Patel, V.M., 2018. A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognit. Lett. 107, 3–16. https://doi.org/10.1016/j.patrec.2017.07.007.
Vainstein, M.H., Brito, C., Arenzon, J.J., 2014. Percolation and cooperation with mobile agents: Geometric and strategy clusters. Phys. Rev. E 90, 022132. https://doi.org/10.1103/PhysRevE.90.022132.
Varas, A., Cornejo, M.D., Mainemer, D., Toledo, B., Rogan, J., Muñoz, V., Valdivia, J.A., 2007. Cellular automaton model for evacuation process with obstacles. Phys. Stat. Mech. Its Appl. 382, 631–642. https://doi.org/10.1016/j.physa.2007.04.006.
Viola, P., Jones, M.J., Snow, D., 2005. Detecting Pedestrians Using Patterns of Motion and Appearance. Int. J. Comput. Vis. 63, 153–161. https://doi.org/10.1007/s11263-005-6644-8.
Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E., 2018. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 1–13. https://doi.org/10.1155/2018/7068349.
Wang, L., 2020. Image Crowd Counting based on Convolutional Neural Network (Ph.D. thesis). University of Science and Technology of China. https://doi.org/10.27517/d.cnki.gzkju.2020.000373.
Xie, X., Ji, J., Wang, Z., Lu, L., Yang, S., 2016. Experimental study on the influence of the crowd density on walking speed and stride length. J. Saf. Environ. 16, 232–235. https://doi.org/10.13637/j.issn.1009-6094.2016.04.047.
Xu, M., Ge, Z., Jiang, X., Cui, G., Lv, P., Zhou, B., Xu, C., 2019. Depth Information Guided Crowd Counting for complex crowd scenes. Pattern Recognit. Lett. 125, 563–569. https://doi.org/10.1016/j.patrec.2019.02.026.
Yu, Y., Zhu, H., Qian, J., Pan, C., Miao, D., 2021. Survey on Deep Learning Based Crowd Counting. J. Comput. Res. Dev. 58, 2724–2747.
Zhang, Z., 2000. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1330–1334. https://doi.org/10.1109/34.888718.
Zhang, C., Li, H., Wang, X., Yang, X., 2015. Cross-Scene Crowd Counting via Deep Convolutional Neural Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 833–841.
Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y., 2016. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 589–597. https://doi.org/10.1109/CVPR.2016.70.
Zheng, X., Zhong, T., Liu, M., 2009. Modeling crowd evacuation of a building based on seven methodological approaches. Build. Environ. 44, 437–445. https://doi.org/10.1016/j.buildenv.2008.04.002.