Computer Vision for Population Density and Evacuation

Safety Science

Keywords: Machine vision; Pedestrian positioning; Cellular automata; Evacuation simulation; Crowd density estimation

Abstract

In this study, we propose a machine vision-based method for crowd density estimation and evacuation simulation to help reduce the occurrence of stampedes in crowded public places. The method consists of a pedestrian detection model, a pedestrian positioning algorithm, and a cellular automata evacuation model (CAEM). In the pedestrian detection model, an adaptive 2D Gaussian kernel is used to generate crowd density heatmaps for crowd density estimation. The model achieved a real-time detection accuracy of 96.5% on images of evacuation scenes after being trained on a dataset of 66,808 pedestrian images using YOLOv3. The pedestrian positioning algorithm was developed to determine the coordinates of pedestrians in real-world scenarios; experiments show that the average positioning errors in the x and y directions are 0.177 m and 0.176 m, respectively. The pedestrian coordinates were then input into the Python-based CAEM for evacuation simulation, whose results were compared with those of Pathfinder, a piece of software that delivers agent-based evacuation simulation. The difference in evacuation time calculated by the two models was found to be less than a second, which indicates a high degree of consistency.
methods include Bayesian algorithms (Chan and Vasconcelos, 2012) and neural network regression algorithms (Marana et al., 1998).

In CNN-based methods, the computer builds a crowd detection model by learning the various features of crowd images, or the nonlinear relationship between crowd images and density maps, through a CNN. After detecting pedestrians in the image, a Gaussian kernel is used to smooth the two-dimensional coordinates marking the pedestrian positions so as to generate crowd density maps (Lempitsky and Zisserman, 2010). Zhang et al. (2015) proposed Crowd-CNN, a deep convolutional neural network in which the computer is trained to predict crowd densities and crowd quantities. By changing the data input, the model can find better local optimal solutions. Sindagi and Patel (2017) proposed a context semantic extraction network based on a pyramid network (CP-CNN), which is able to produce high-quality crowd density maps and accurate crowd quantity data by integrating global information and local context information of crowd features. Xu et al. (2019) developed an algorithm called Digcrowd based on the YOLO algorithm. This algorithm uses depth information to segment images into far-field and near-field regions and detects pedestrians in each region separately. It proved able to produce high-quality crowd density maps through calculation.

When it comes to evacuation simulation, the main method adopted in existing research is to establish evacuation simulation models. Scholars have proposed dozens of evacuation models (Duives et al., 2013; Muramatsu and Nagatani, 2000; Zheng et al., 2009). Generally speaking, personnel evacuation dynamics models can be divided into two main categories: continuous models and discrete models. Continuous models typically define the behavior of evacuees through functions. Common continuous models include social force models (Helbing et al., 2000; Helbing and Molnár, 1995; Hou et al., 2014), fluid dynamics models (Helbing, 1998; Henderson, 1971), etc. Discrete models feature the temporal or spatial discretization of evacuation sites, which allows simulated evacuees to move autonomously. Representative discrete models include agent-based models (Chen et al., 2015; Vainstein et al., 2014) and cellular automata evacuation models (CAEM) (Fu et al., 2015; Kirchner and Schadschneider, 2002; Li et al., 2019; Varas et al., 2007).

However, existing research is insufficient in certain aspects. For one thing, crowd density estimation algorithms developed in previous studies only enable real-time crowd density estimation and usually do not have the functionality of evacuation simulation. For another, most traditional evacuation simulation models are only suitable for stampede risk assessment of a building or simulation of incidents, and cannot perform real-time evacuation simulations based on the distribution of pedestrians during emergencies. Therefore, it is supposed that a combination of the two types of model would help reduce the probability of stampedes and the resulting losses.

To complement existing research, this study proposes a machine vision-based method for crowd density estimation and evacuation simulation. Integrating a pedestrian detection model, a pedestrian positioning algorithm, and a cellular automata evacuation model (CAEM), the method we developed can produce real-time crowd density heatmaps and perform evacuation simulations based on the real-world positions of pedestrians.

2. General introduction to the method

The method proposed in this paper consists of two components: crowd density estimation and evacuation simulation. For the crowd density estimation component, the YOLOv3 object detection algorithm is exploited to detect pedestrians in real-time camera video streams, and then markers are placed on the detected pedestrians. A Gaussian kernel is used to smooth the pedestrian markers so as to generate crowd density heatmaps. Simultaneously, the crowd density in the scene is calculated and the level of crowding is categorized according to the scheme of Rahmalan et al. (2006). When the level of crowding reaches "High", an alert is issued and the evacuation simulation component is initiated. With regard to the evacuation simulation component, a pedestrian positioning algorithm is employed to convert pedestrian positions in images to real-world positions. The position information is then input into the CAEM for real-time evacuation simulation. The CAEM can calculate the optimal evacuation path, the evacuation time for each pedestrian, and the evacuation flow rates at the different exits of the evacuation site, and it can also produce real-time crowd density heatmaps while simulating evacuation. Fig. 1 is a graphical description of the makeup of our method.

This method can play a positive role in safety assessment and planning, emergency management, facility optimization, and traffic management. For example, by obtaining real-time pedestrian location data, it is possible to assess the safety of existing buildings, public spaces, or evacuation routes. This can help planners, architects, or safety experts identify potential risk areas and bottlenecks, propose improvement measures, and optimize evacuation plans to ensure the safe evacuation of individuals in emergency situations. In the event of a crisis, obtaining real-time pedestrian locations can assist emergency management personnel in better understanding the distribution and movement trends of people, enabling them to make timely decisions and take appropriate evacuation measures, thereby improving the efficiency and accuracy of emergency response. Additionally, analyzing pedestrian location data can provide information on people flow, crowding levels, etc., which can assist administrators of commercial centers, transportation hubs, sports venues, and other locations in optimizing facility layouts, formulating more effective traffic management strategies, and enhancing the efficiency and quality of service.
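As a concrete illustration of this workflow, the following is a minimal Python sketch of the monitoring loop described above. The detector, coordinate converter, and simulator are passed in as callables because the actual YOLOv3 wrapper, positioning algorithm, and CAEM interface are not reproduced here; the function names and signatures are assumptions made for illustration, not the authors' implementation. The density thresholds follow Table 2 in Section 3.3.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

def crowding_level(density: float) -> str:
    """Map crowd density (people/m^2) to a level of crowding,
    following the scheme of Rahmalan et al. (2006) (see Table 2)."""
    if density < 0.5:
        return "Very Low"
    if density < 0.8:
        return "Low"
    if density < 1.27:
        return "Moderate"
    if density < 2.0:
        return "High"
    return "Very High"

def process_frame(frame,
                  detect: Callable[..., List[Box]],
                  pixel_to_world: Callable[[Tuple[float, float]], Tuple[float, float]],
                  simulate: Callable[[Sequence[Tuple[float, float]]], None],
                  scene_area_m2: float) -> str:
    """One pass of the two-component method: detect pedestrians,
    estimate the level of crowding, and hand real-world positions
    to the evacuation simulation once crowding reaches 'High'."""
    boxes = detect(frame)  # hypothetical YOLOv3 wrapper
    level = crowding_level(len(boxes) / scene_area_m2)
    if level in ("High", "Very High"):
        # The midpoint of each box's bottom edge approximates the feet
        # position (see Section 4).
        feet = [((x1 + x2) / 2.0, y2) for (x1, y1, x2, y2) in boxes]
        simulate([pixel_to_world(p) for p in feet])
    return level
```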
3.2. Crowd density heatmap

After completing pedestrian detection, the central coordinates of the bounding box for each pedestrian are used as pedestrian markers. In the 2D image, we use $x$ to represent a pixel. Assuming that there is a pedestrian marker at pixel $x_i$, then there is a function $\delta(x - x_i)$ for the marker: when $x = x_i$, $\delta(x - x_i) = 1$; otherwise $\delta(x - x_i) = 0$. If there are $M$ pedestrian markers in the image, the crowd density function of the image can be expressed by the following formula:

$$H(x) = \sum_{i=1}^{M} \delta(x - x_i) \tag{2}$$

To represent it as a continuous function, it is necessary to use an adaptive 2D Gaussian kernel for smoothing. Specifically, a convolution is performed between $H(x)$ and the Gaussian kernel. Then, the crowd density function can be expressed as formula (3), where $\sigma$ represents the diffusion parameter of the Gaussian kernel, for which the variance of the Gaussian kernel is often used:

$$F(x) = H(x) \ast G_\sigma(x), \quad \text{with } G_\sigma(x) = e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{3}$$

However, due to the perspective effect, setting the diffusion parameter of the Gaussian kernel to the variance would cause an issue: the size of a region in pixel coordinates becomes smaller as its distance from the camera increases, so the crowd density calculated using pixel coordinates would be higher. To address this problem, this study adopts a method proposed by Zhang et al. (2016), which is widely used and has been proven to effectively mitigate the impact of the perspective effect. Specifically, we calculate the distance between each pedestrian marker $x_i$ and its $k$ nearest neighbors and compute the average distance $\bar{d}_i = \frac{1}{k}\sum_{j=1}^{k} d_{ij}$. Based on the average distance, we can infer that the pixels associated with pedestrian $x_i$ correspond approximately to a circular region with a radius proportional to $\bar{d}_i$. To estimate the crowd density around $x_i$, $H(x)$ is convolved with an adaptive 2D Gaussian kernel whose diffusion coefficient is proportional to $\bar{d}_i$. The diffusion coefficient of this Gaussian kernel can be expressed by formula (4):

$$\sigma_i = \beta \bar{d}_i \tag{4}$$

After multiple experiments, it was found that $k = 12$ and $\beta = 0.2$ are the optimal values. The final crowd density function can then be expressed as formula (5):

$$F(x) = H(x) \ast G_{\sigma_i}(x) \tag{5}$$

The crowd density heatmap is obtained by using the Matplotlib library to map $F(x)$ to colors, where low-risk areas are mapped to blue, medium-risk areas to green, and high-risk areas to red. To assist administrators in quickly identifying the actual locations of crowded areas, the opacity of the heatmap is set to 0.7 and it is overlaid onto the original image. By employing this method, administrators can swiftly pinpoint areas of high crowd density and promptly guide people to evacuate.
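To make formulas (2)-(5) concrete, below is a minimal sketch of the geometry-adaptive kernel construction of Zhang et al. (2016). It assumes the pedestrian markers are given as an N×2 array of (x, y) pixel positions and uses SciPy's KD-tree and Gaussian filter in place of an explicit convolution; this is a common implementation shortcut, not the authors' exact code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def adaptive_density_map(points: np.ndarray, shape: tuple,
                         k: int = 12, beta: float = 0.2) -> np.ndarray:
    """Build F(x): one unit impulse per pedestrian marker (formula (2)),
    each smoothed by a Gaussian whose sigma_i = beta * mean distance to
    the k nearest markers (formulas (4) and (5)).

    points : (N, 2) array of (x, y) marker positions in pixels
    shape  : (height, width) of the output map
    """
    density = np.zeros(shape, dtype=np.float64)
    n = len(points)
    if n == 0:
        return density
    if n > 1:
        tree = cKDTree(points)
        dists, _ = tree.query(points, k=min(k + 1, n))  # column 0 is the point itself
        mean_d = dists[:, 1:].mean(axis=1)
    else:
        mean_d = np.array([min(shape) / 4.0])  # arbitrary fallback for a lone marker
    for (x, y), d_i in zip(points, mean_d):
        impulse = np.zeros(shape, dtype=np.float64)
        impulse[min(int(y), shape[0] - 1), min(int(x), shape[1] - 1)] = 1.0
        density += gaussian_filter(impulse, sigma=beta * d_i)  # adaptive kernel
    return density
```

The resulting array can then be mapped to colors and overlaid on the frame with Matplotlib, e.g. `plt.imshow(frame)` followed by `plt.imshow(density, cmap='jet', alpha=0.7)`, matching the 0.7 opacity used in the text.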
3.3. Experimental test of the crowd density estimation

To demonstrate the effectiveness of the crowd density estimation component, the following experiment was carried out. Firstly, only images containing humans were extracted from the COCO dataset (Lin et al., 2014) to train the pedestrian detection model. A total of 66,808 images were extracted, of which 64,115 were used as the training set and 2,693 as the validation set. Next, the above datasets were input into the YOLOv3 algorithm for training, with the learning rate set to 0.01 and the cosine annealing hyperparameter set to 0.2. After 20 rounds of training, the following results were obtained on the validation set (see Fig. 4): the model's recognition accuracy rate reached over 80%, and the [email protected] value was close to 0.8 ([email protected] refers to the accuracy rate when the recall rate is 50% on the precision-recall balance curve, a common metric for evaluating model performance). This proved that the model had achieved a certain degree of usability, so the training was stopped and the model with the maximum [email protected] value was selected as the final pedestrian detection model.

To validate the recognition performance and superiority of the model in practice, 25 images containing dense crowds were captured for testing purposes. Based on the scheme of Rahmalan et al. (2006), the captured images were categorized into 5 levels of crowding (see Table 2), with each level consisting of five images.

Table 2
Relationship between crowd density and the level of crowding.

Level of crowding    Crowd density (people/m²)
Very Low             density < 0.5
Low                  0.5 ≤ density < 0.8
Moderate             0.8 ≤ density < 1.27
High                 1.27 ≤ density < 2.0
Very High            density ≥ 2.0

The captured images were recognized using the trained pedestrian detection model and another object detection algorithm called Faster R-CNN (Girshick, 2015), which is based on convolutional neural networks. The results are shown in Fig. 5, which displays the actual number of pedestrians present in each image, the number of pedestrians detected by YOLOv3, the number of pedestrians detected by Faster R-CNN, and the pedestrian density for each image. Both models accurately identify the pedestrians in the images, with YOLOv3 having a maximum error of 5 and an average error of 1.16, while Faster R-CNN has a maximum error of 4 and an average error of 0.92.

To evaluate the model performance more effectively, we use recognition accuracy as the evaluation metric, defined by formula (6):

$$\text{accuracy} = \frac{TP}{P} \tag{6}$$

where $TP$ represents the number of correctly recognized pedestrians, and $P$ represents the actual number of pedestrians.

The accuracy of YOLOv3 and Faster R-CNN at each level of crowding is calculated and presented in Table 3. The average accuracy of YOLOv3 and Faster R-CNN is 96.5% and 96.8%, respectively. Both models achieve recognition accuracy high enough to be practically usable. Faster R-CNN has slightly higher accuracy than YOLOv3, but it takes an average of 1.7 s to recognize an image, whereas YOLOv3 requires only 0.05 s, a significant difference in recognition speed. Therefore, considering both accuracy and recognition speed, this study ultimately selects the YOLOv3 pedestrian detection model.

Table 3
Recognition accuracy of YOLOv3 and Faster R-CNN.

Level of crowding    Accuracy of YOLOv3    Accuracy of Faster R-CNN
Very Low             98.2%                 98.2%
Low                  97.9%                 96.8%
Moderate             97%                   97.8%
High                 94.8%                 96.1%
Very High            94.6%                 95.4%
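As a small worked example of formula (6), the snippet below aggregates per-level recognition accuracy from (level, TP, P) triples, which is the shape of the computation behind Table 3. The two sample triples correspond to the Image (a) and Image (d) detection counts quoted below; the rest of the raw test data is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def per_level_accuracy(results: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Aggregate accuracy = TP / P (formula (6)) for each level of crowding.
    Each item is (level, correctly recognized pedestrians, actual pedestrians)."""
    tp: Dict[str, int] = defaultdict(int)
    p: Dict[str, int] = defaultdict(int)
    for level, true_pos, actual in results:
        tp[level] += true_pos
        p[level] += actual
    return {level: tp[level] / p[level] for level in p}

# Two of the 'Very High' test images discussed in the text (36/37 and 57/62 detections).
print(per_level_accuracy([("Very High", 36, 37), ("Very High", 57, 62)]))
# -> {'Very High': 0.9393939393939394}
```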
Finally, two images with a very high level of crowding were selected, and their corresponding crowd density heatmaps were generated using formula (5). In the heatmaps, the red areas represent high-risk zones in the current image, the green areas represent moderate-risk zones, and the blue areas represent low-risk zones. To avoid misleading interpretations in scenes with low average crowd density, the heatmaps also include annotations indicating the current scene's average crowd density and level of crowding (see Fig. 6).

We have found that in these images, despite the severe overlap of pedestrians, the model accurately identifies the pedestrians in the scene and assigns them corresponding IDs. For example, in Image (a), there are a total of 37 people, and the YOLOv3 model detects 36 people, achieving an accuracy rate of 97.2%. In Image (d), there are a total of 62 people, and the model detects 57 people, achieving an accuracy rate of 91.9%. Furthermore, the crowd density heatmap accurately identifies the high-risk areas. This validates the effectiveness of the crowd density warning and its ability to initiate the evacuation simulation component.

4. Evacuation simulation

Based on the previous component of crowd density estimation, we have observed that when a scene reaches a high level of crowding, the gaps between individuals become smaller, increasing the likelihood of collisions. Therefore, the method developed in this paper issues an alert and initiates the evacuation simulation when the scene reaches a high level of crowding. In the evacuation simulation component, the first step is to obtain the position of pedestrians in the three-dimensional world using the pedestrian positioning algorithm developed in this study. Then, the position information is input into a CAEM for evacuation simulation.
To obtain the positional information of pedestrians, it is necessary to convert their positions in the image into positions in the real world. A common method for this is the camera pinhole model, a mathematical model that can achieve bidirectional conversion between coordinates in the image and coordinates in the real world. It represents the camera's imaging process using pixel coordinates, image coordinates, camera coordinates, and world coordinates, as shown in Fig. 7.

The position of the pedestrian in the image is represented by the pixel coordinate system, a two-dimensional coordinate system measured in pixels, in which the origin is set at the upper left corner of the image. The position of the pedestrian in the real world is represented by the world coordinate system, usually measured from a point in the real world. The image coordinate system is similar to the pixel coordinate system, except that its origin is located at the center of the image. The camera coordinate system is a three-dimensional coordinate system with the camera as the origin. Assuming that the coordinate of the point at the foot of the pedestrian in the world coordinate system is $(x_w, y_w, z_w)$, the corresponding point in the pixel coordinate system is $(u', v')$. Using the pinhole model, these two coordinates can be converted to each other, and the specific conversion formula is as follows:

$$z_c\begin{bmatrix}u'\\ v'\\ 1\end{bmatrix}=\begin{bmatrix}\frac{1}{d_x} & 0 & u_0\\ 0 & \frac{1}{d_y} & v_0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t\\ 0^{T} & 1\end{bmatrix}\begin{bmatrix}x_w\\ y_w\\ z_w\\ 1\end{bmatrix}=K[R|t]\begin{bmatrix}x_w\\ y_w\\ z_w\\ 1\end{bmatrix} \tag{7}$$

Here, $d_x$ and $d_y$ represent the width and length of a pixel, respectively; $(u_0, v_0)$ represents the coordinates of the origin of the image coordinate system in the pixel coordinate system; $f$ represents the camera focal length; $R$ represents the camera rotation matrix; $t$ represents the camera translation matrix; $[R|t]$ represents the camera extrinsic matrix; $K$ represents the camera intrinsic matrix; and $z_c$ represents the depth of the pixel point from the camera.

According to formula (7), bidirectional conversion between pixel coordinates and world coordinates requires only the three parameters $K$, $[R|t]$, and $z_c$. $K$ and $[R|t]$ can be obtained through camera calibration using the most commonly used camera calibration method, the Zhang calibration method (Zhang, 2000), which requires a calibration board and at least six images taken from different positions; the camera's intrinsic and extrinsic matrices are then obtained by processing the images. However, $z_c$ cannot be obtained with a single camera. To solve this problem, this article proposes a pedestrian positioning algorithm.

If Zhang's method is used for calibration, the world coordinate system is located in the plane of the calibration board, so the z-axis coordinate of all points on the calibration board is 0. When the calibration board is placed on the ground for calibration, the z-axis coordinates of the points at the bottom of a pedestrian's feet are also 0. This means that if the determination of pedestrian coordinates is transformed into the determination of the coordinates of the points at the bottom of the pedestrian's feet, only $x_w$ and $y_w$ need to be determined. Also, considering that the CAEM is a two-dimensional model and does not require the height or z-axis coordinate of the object, in this study we decided to place the calibration board flat on the ground for calibration and determine the position of the pedestrian using the world coordinates of the bottom points at their feet.

The specific process is as follows. We use the midpoint of the bottom edge of the pedestrian bounding box, as determined by YOLOv3, to represent the pixel coordinate $(u', v')$ at the pedestrian's feet, and write the combined projection matrix as

$$K[R|t] = \begin{bmatrix} l_1 & l_2 & l_3 & l_4 \\ l_5 & l_6 & l_7 & l_8 \\ l_9 & l_{10} & l_{11} & l_{12} \end{bmatrix}$$

Then, substituting $z_w = 0$ into formula (7), formula (8) is obtained:

$$\begin{cases} l_1 x_w + l_2 y_w + l_4 = z_c u' \\ l_5 x_w + l_6 y_w + l_8 = z_c v' \\ l_9 x_w + l_{10} y_w + l_{12} = z_c \end{cases} \tag{8}$$

Next, substituting $z_c = l_9 x_w + l_{10} y_w + l_{12}$ into formula (8), formula (9) is obtained:

$$\begin{cases} (l_1 - l_9 u')\,x_w + (l_2 - l_{10} u')\,y_w = l_{12} u' - l_4 \\ (l_5 - l_9 v')\,x_w + (l_6 - l_{10} v')\,y_w = l_{12} v' - l_8 \end{cases} \tag{9}$$

Finally, converting formula (9) into matrix form yields formula (10):

$$\begin{bmatrix} l_1 - l_9 u' & l_2 - l_{10} u' \\ l_5 - l_9 v' & l_6 - l_{10} v' \end{bmatrix}\begin{bmatrix} x_w \\ y_w \end{bmatrix} = \begin{bmatrix} l_{12} u' - l_4 \\ l_{12} v' - l_8 \end{bmatrix} \tag{10}$$

According to formula (10), after obtaining the pixel coordinates of the bottom of the pedestrian's feet, the camera intrinsic matrix $K$, and the camera extrinsic matrix $[R|t]$, the world coordinates of the pedestrian can be obtained. As Fig. 8 shows, the black checkerboard is the calibration board; the world coordinates of the pedestrian are $(x_w, y_w, 0)$; and the corresponding pixel coordinates are $(u', v')$.
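Since formula (10) is just a 2×2 linear system, the positioning step reduces to a few lines of linear algebra. The sketch below assumes K is the 3×3 intrinsic matrix and Rt the extrinsic matrix from Zhang's calibration (only its top three rows are used); it illustrates formula (10) rather than reproducing the authors' code.

```python
import numpy as np

def pixel_to_world(u: float, v: float, K: np.ndarray, Rt: np.ndarray) -> tuple:
    """Recover the world coordinates (xw, yw) of a ground point (zw = 0)
    from its pixel coordinates (u, v) by solving formula (10).

    K  : 3x3 camera intrinsic matrix
    Rt : camera extrinsic matrix; only the top three rows [R|t] are used
    """
    L = K @ Rt[:3, :]   # combined projection matrix K[R|t], entries l1..l12
    l = L.ravel()       # row-major: l[0] = l1, ..., l[11] = l12
    A = np.array([[l[0] - l[8] * u, l[1] - l[9] * u],
                  [l[4] - l[8] * v, l[5] - l[9] * v]])
    b = np.array([l[11] * u - l[3],
                  l[11] * v - l[7]])
    xw, yw = np.linalg.solve(A, b)
    return float(xw), float(yw)
```

Note that the entries $l_1 \ldots l_{12}$ are those of the combined projection matrix $K[R|t]$, which is why both calibration matrices are needed.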
Fig. 10. Distribution of local moving potential energy for different exits.
Fig. 13. Algorithm for the determination of the shortest evacuation path.
For the experimental scene, camera calibration yielded the following intrinsic matrix $K$ and extrinsic matrix $[R|t]$:

$$K = \begin{bmatrix} 1596.98 & 0 & 901.61 \\ 0 & 1681.60 & 709.43 \\ 0 & 0 & 1 \end{bmatrix}, \quad [R|t] = \begin{bmatrix} 0.8803 & 0.0397 & 0.0533 & -954.62 \\ 0.1405 & 0.3974 & -0.9412 & 307.92 \\ -0.0637 & 0.0514 & 0.3012 & 3909.37 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Comparison of the actual measured coordinates with the calculated coordinates showed that the maximum error of the pedestrian positioning algorithm was controlled within 0.3 m, with an average error of 0.12 m in the x-axis direction and 0.18 m in the y-axis direction (see Table 4). Since the unit used by the CAEM is the meter, the world-coordinate accuracy obtained using our pedestrian positioning algorithm meets the requirements of the CAEM.

Table 4
Errors between actual coordinates and calculated coordinates.

ID    Pixel coordinates    Actual world coordinates    Computed coordinates    Error (m)
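As a usage example, the calibration results above can be fed straight into the pixel_to_world sketch given earlier; the foot-point pixel below is hypothetical.

```python
import numpy as np

K = np.array([[1596.98,    0.0, 901.61],
              [   0.0, 1681.60, 709.43],
              [   0.0,    0.0,    1.0]])

Rt = np.array([[ 0.8803, 0.0397,  0.0533, -954.62],
               [ 0.1405, 0.3974, -0.9412,  307.92],
               [-0.0637, 0.0514,  0.3012, 3909.37],
               [ 0.0,    0.0,     0.0,       1.0]])

# Hypothetical foot point: midpoint of a detected box's bottom edge.
u, v = 960.0, 820.0
print(pixel_to_world(u, v, K, Rt))  # (xw, yw) in the calibration-board frame
```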
Then, the RTSP video stream from the camera was input into the trained pedestrian detection model for real-time detection of pedestrians. During the forward movement of four pedestrians, a snapshot was taken for experimentation. The recognition results are shown in Fig. 16, where the left image is the original photo and the right image is the model's output. The model accurately identifies each person and assigns them corresponding IDs.

Taking the midpoint of the bottom edge of each pedestrian bounding box as the pixel coordinate of the pedestrian, we calculated the corresponding world coordinate of the pedestrian using formula (10).

Finally, an evacuation simulation was performed by inputting the world coordinates calculated above into the CAEM and Pathfinder, respectively. The initial distribution of the "pedestrians" is shown in Fig. 17. In the CAEM, the blue dashed line represents the wall; the green line represents the exit; and each purple circle with a diameter of 0.4 m represents a pedestrian. In Pathfinder, the green line represents the exit, and each pedestrian is represented by a purple circle with a shoulder width of 40 cm and a moving speed of 1.3 m/s (see Fig. 18).

Fig. 18. Initial distribution of pedestrians in Pathfinder.

The experimental results showed that the total evacuation time for the CAEM was 4.1 s, while that of Pathfinder was 4.3 s.
It can be seen by comparison that there is not much difference in evacuation time. We infer that the slightly shorter evacuation time of the CAEM can be attributed to Xie's speed formula (Xie et al., 2016), which yields a slightly larger speed than Pathfinder when the crowd density is relatively small. To further verify that the pedestrians' choice of exit satisfies the "shortest path" principle, we drew each pedestrian's evacuation path; it can be seen that each simulated "pedestrian" chose the nearest exit, indicating agreement with the "shortest path" principle (see Fig. 19).

In summary, the pedestrian positioning algorithm proposed in this article meets the accuracy requirements of the CAEM, and the performance of the CAEM in the evacuation simulation is similar to that of Pathfinder. In addition, the calculated evacuation paths satisfy the "shortest path" principle. These results indicate that the evacuation simulation component is truly feasible.
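The captions of Figs. 10 and 13 indicate that the CAEM steers each pedestrian down a movement-potential field toward the nearest exit. A standard way to build such a global potential on a cell grid is a breadth-first flood fill from the exit cells; the sketch below shows that general technique under the assumption that the CAEM's floor field is distance-based, and is not the model's exact formulation.

```python
from collections import deque
import numpy as np

def global_potential(grid: np.ndarray, exits: list) -> np.ndarray:
    """Distance-to-exit floor field on a cell grid.

    grid  : 2D array, 1 = obstacle/wall cell, 0 = walkable cell
    exits : list of (row, col) exit cells
    Returns an array where each walkable cell holds its step distance to
    the nearest exit; moving to the lowest-valued neighbour at each step
    traces a shortest evacuation path.
    """
    potential = np.full(grid.shape, np.inf)
    queue = deque()
    for cell in exits:
        potential[cell] = 0.0
        queue.append(cell)
    while queue:  # breadth-first flood fill from the exits
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
                    and grid[nr, nc] == 0 and potential[nr, nc] == np.inf):
                potential[nr, nc] = potential[r, c] + 1.0
                queue.append((nr, nc))
    return potential
```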
5. Experimental verification

Considering that the experiments above have demonstrated that the method proposed in this study is suitable for scenarios of low crowd density, we conducted a reduced-scale experiment to verify the applicability of the method for scenarios with high crowd density. The reasons for such a reduced-scale experiment are twofold: more detailed measurements and analysis can be made in a reduced-scale experiment, and a reduced-scale experiment is safer and more efficient. The experimental setting was made by scaling down an authentic setting by a ratio of 17:1. The experimental instruments included a 1080p camera, a 10 cm × 10 cm calibration board, and 20 miniature pedestrians 1/17 the size of an authentic human figure. The process of the experiment was as follows:

1. Construction of the simulated evacuation site

First, the size of the reduced-scale evacuation site was measured to be 23.5 cm × 23.5 cm, containing a 5.8 cm × 5.8 cm obstacle and a 5.8 cm wide exit. Then, we modeled in the CAEM an evacuation site 17 times the size of the reduced-scale site. The maps of local movement potential and global movement potential are shown in Figs. 20 and 21, where the green line represents the exit; the green rectangle represents the obstacle; and the blue line represents the wall.

Fig. 20. Evacuation site.

2. Camera calibration

Nineteen images of the calibration board were taken from different angles and then processed using the calibration toolbox in Matlab to obtain the intrinsic and extrinsic matrices of the camera. The camera calibration errors are shown in Fig. 22; the maximum error of 0.08 pixels and the average error of 0.07 pixels indicate that the calibration was acceptable.

We maintained the final position of the camera and established a world coordinate system in which the upper left corner of the calibration board was the origin and the x-axis and y-axis were both parallel to the plane of the calibration board.
The calibration of the reduced-scale scene yielded:

$$K = \begin{bmatrix} 1908.45 & 0 & 998.18 \\ 0 & 2475 & 497.07 \\ 0 & 0 & 1 \end{bmatrix}, \quad [R|t] = \begin{bmatrix} 0.9985 & 0.0035 & 0.0551 & -91.37 \\ 0.0424 & 0.5770 & -0.8055 & 41.63 \\ -0.0354 & 0.1066 & 0.5900 & 422.96 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
In conjunction with Fig. 27, it can be noted that there are no instances of pedestrian overlap during the evacuation process. These findings demonstrate the effectiveness of the collision detection algorithm employed in this study.

By conducting a comparative experiment with Pathfinder, the study validates the rationality and effectiveness of the evacuation simulation presented in this paper.

Lastly, it is worth noting that during the first 2 s the exit flow rate is zero, and the exit is too narrow, resulting in intense collisions among pedestrians during the evacuation process. From this, it can be inferred that the exit setup is not reasonable. To address this issue, the following solutions are recommended: (1) adding an exit on the left or right of the site; (2) designating emergency management staff to help with evacuation at the current exit; (3) adding guardrails at the exit to help pedestrians evacuate along the specified route.

Based on this series of experiments, it is found that the crowd density estimation component of our method enables effective real-time pedestrian detection and production of crowd density heatmaps. In addition, the evacuation simulation performance of our evacuation model is similar to that of Pathfinder, a piece of commonly used evacuation simulation software. The CAEM demonstrates higher crowd density and greater diversity in movement directions during the evacuation process, which means that our evacuation model can better simulate evacuation in real scenarios. Based on these results, it is reasonable to say that the method in this study is feasible.

6. Conclusion

This study applies computer vision technology to the field of emergency management and proposes a method for crowd density estimation and evacuation simulation using a monocular camera. The effectiveness of this method has been validated through experiments, providing valuable insights and contributions to existing research.

In terms of crowd density estimation, we trained a YOLOv3 pedestrian detection model on a self-made dataset and evaluated its detection accuracy under different levels of crowding. The experimental results show that the model achieves recognition accuracies of 98.2%, 97.9%, 97%, 94.8% and 94.6% for the different levels of crowding, with an average accuracy of 96.5%. Moreover, the model can process a single image in just 0.05 s. By combining a two-dimensional adaptive Gaussian kernel, high-quality crowd density heatmaps are generated to calculate the current scene's level of crowding in real time. The heatmap also highlights congested areas within the scene, effectively estimating crowd density.

For pedestrian localization, we designed a pedestrian positioning algorithm based on the pinhole model and Zhang's camera calibration method. The algorithm's accuracy was verified through experiments, with average errors of 0.177 m and 0.176 m in the x and y directions, respectively, meeting the precision requirements of the CAEM.

To simulate real-time evacuation based on the positions of pedestrians, a visually intuitive CAEM was developed using Python and PyQt5.
By integrating the pedestrian positioning algorithm, the CAEM enables real-time simulation based on pedestrian positions. When a disaster occurs, it quickly calculates the optimal evacuation paths for pedestrians, assisting management personnel in guiding the evacuation process. To validate the rationality of the CAEM, a comparative experiment was conducted with the widely used evacuation simulation software Pathfinder. The evacuation times calculated by the two methods differ by less than 1 s, demonstrating a high level of consistency.

This research addresses the limitations of traditional evacuation models, which cannot perform real-time evacuation simulations based on the current distribution of individuals during emergencies. By monitoring and locating crowds in real time, building management personnel can identify high-risk areas prone to stampedes or congestion and take timely measures to reduce crowd density. In the event of emergencies such as fires or earthquakes, our proposed method can quickly obtain the initial positions of pedestrians and calculate the optimal evacuation paths for each individual based on the specific layout and features of the scene. This can assist management personnel and emergency response teams in making informed decisions, enhancing pedestrian safety, and minimizing the potential for casualties or stampedes.

Furthermore, this method contributes to the objective evaluation of building capacity and the suitability of evacuation exit distribution, providing feedback for optimizing site design and strengthening emergency management procedures.

In conclusion, this study combines machine vision technology with evacuation simulation, addressing the limitations of traditional evacuation models and making contributions to the field of emergency management.

CRediT authorship contribution statement

Shijie Huang: Conceptualization, Writing – original draft, Writing – review & editing, Visualization, Validation, Methodology. Jingwei Ji: Supervision, Methodology, Conceptualization. Yu Wang: Writing – review & editing, Investigation, Data curation. Wenju Li: Writing – review & editing, Data curation. Yuechuan Zheng: Writing – review & editing, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The study was supported by the Fundamental Research Funds for the Central Universities (No. 2022ZZCX05K02).

References

Chan, A.B., Vasconcelos, N., 2012. Counting people with low-level features and Bayesian regression. IEEE Trans. Image Process. 21, 2160–2177. https://doi.org/10.1109/TIP.2011.2172800.
Chen, D., Wang, L., Zomaya, A.Y., Dou, M., Chen, J., Deng, Z., Hariri, S., 2015. Parallel simulation of complex evacuation scenarios with adaptive agent models. IEEE Trans. Parallel Distrib. Syst. 26, 847–857. https://doi.org/10.1109/TPDS.2014.2311805.
Clancy, C., Hecker, J., Stuntebeck, E., O'Shea, T., 2007. Applications of machine learning to cognitive radio networks. IEEE Wirel. Commun. 14, 47–52. https://doi.org/10.1109/MWC.2007.4300983.
Duives, D.C., Daamen, W., Hoogendoorn, S.P., 2013. State-of-the-art crowd motion simulation models. Transp. Res. Part C Emerg. Technol. 37, 193–209. https://doi.org/10.1016/j.trc.2013.02.005.
Fu, Z., Zhou, X., Zhu, K., Chen, Y., Zhuang, Y., Hu, Y., Yang, L., Chen, C., Li, J., 2015. A floor field cellular automaton for crowd evacuation considering different walking abilities. Phys. Stat. Mech. Its Appl. 420, 294–303. https://doi.org/10.1016/j.physa.2014.11.006.
Girshick, R., 2015. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448.
Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B., 1998. Support vector machines. IEEE Intell. Syst. Their Appl. 13, 18–28. https://doi.org/10.1109/5254.708428.
Helbing, D., 1998. A fluid dynamic model for the movement of pedestrians. https://doi.org/10.48550/arXiv.cond-mat/9805213.
Helbing, D., Molnár, P., 1995. Social force model for pedestrian dynamics. Phys. Rev. E 51, 4282–4286. https://doi.org/10.1103/PhysRevE.51.4282.
Helbing, D., Farkas, I., Vicsek, T., 2000. Simulating dynamical features of escape panic. Nature 407, 487–490. https://doi.org/10.1038/35035023.
Henderson, L.F., 1971. The statistics of crowd fluids. Nature 229, 381–383. https://doi.org/10.1038/229381a0.
Hou, L., Liu, J.-G., Pan, X., Wang, B.-H., 2014. A social force evacuation model with the leadership effect. Phys. Stat. Mech. Its Appl. 400, 93–99. https://doi.org/10.1016/j.physa.2013.12.049.
Jiang, N., Zhou, H., Yu, F., 2021. Review of computer vision based object counting methods. Laser Optoelectron. Prog. 58, 43–59.
Jordan, M.I., Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and prospects. Science 349, 255–260. https://doi.org/10.1126/science.aaa8415.
Kirchner, A., Schadschneider, A., 2002. Simulation of evacuation processes using a bionics-inspired cellular automaton model for pedestrian dynamics. Phys. Stat. Mech. Its Appl. 312, 260–276. https://doi.org/10.1016/S0378-4371(02)00857-9.
Lempitsky, V., Zisserman, A., 2010. Learning to count objects in images. In: Advances in Neural Information Processing Systems. Curran Associates, Inc.
Li, Y., Chen, M., Dou, Z., Zheng, X., Cheng, Y., Mebarki, A., 2019. A review of cellular automata models for crowd evacuation. Phys. Stat. Mech. Its Appl. 526, 120752. https://doi.org/10.1016/j.physa.2019.03.117.
Liakos, K.G., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine learning in agriculture: A review. Sensors 18, 2674. https://doi.org/10.3390/s18082674.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft COCO: Common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (Eds.), Computer Vision – ECCV 2014, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 740–755. https://doi.org/10.1007/978-3-319-10602-1_48.
Marana, A.N., Costa, L.F., Lotufo, R.A., Velastin, S.A., 1998. On the efficacy of texture analysis for crowd monitoring. In: Proceedings SIBGRAPI'98, International Symposium on Computer Graphics, Image Processing, and Vision (Cat. No. 98EX237), pp. 354–361. https://doi.org/10.1109/SIBGRA.1998.722773.
Muramatsu, M., Nagatani, T., 2000. Jamming transition in two-dimensional pedestrian traffic. Phys. Stat. Mech. Its Appl. 275 (1–2), 281–291.
Pedoe, D., 2013. Geometry: A Comprehensive Course. Courier Corporation.
Pulli, K., Baksheev, A., Kornyakov, K., Eruhimov, V., 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 61–69. https://doi.org/10.1145/2184319.2184337.
Rabaud, V., Belongie, S., 2006. Counting crowded moving objects. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), pp. 705–711. https://doi.org/10.1109/CVPR.2006.92.
Rahmalan, H., Nixon, M.S., Carter, J.N., 2006. On crowd density estimation for surveillance, pp. 540–545. https://doi.org/10.1049/ic:20060360.
Redmon, J., Farhadi, A., 2018. YOLOv3: An incremental improvement. https://doi.org/10.48550/arXiv.1804.02767.
Sabzmeydani, P., Mori, G., 2007. Detecting pedestrians by learning shapelet features. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. https://doi.org/10.1109/CVPR.2007.383134.
Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L., 2016. Edge computing: Vision and challenges. IEEE Internet Things J. 3, 637–646. https://doi.org/10.1109/JIOT.2016.2579198.
Sindagi, V.A., Patel, V.M., 2017. Generating high-quality crowd density maps using contextual pyramid CNNs. https://doi.org/10.48550/arXiv.1708.00953.
Sindagi, V.A., Patel, V.M., 2018. A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognit. Lett. 107, 3–16. https://doi.org/10.1016/j.patrec.2017.07.007.
Vainstein, M.H., Brito, C., Arenzon, J.J., 2014. Percolation and cooperation with mobile agents: Geometric and strategy clusters. Phys. Rev. E 90, 022132. https://doi.org/10.1103/PhysRevE.90.022132.
Varas, A., Cornejo, M.D., Mainemer, D., Toledo, B., Rogan, J., Muñoz, V., Valdivia, J.A., 2007. Cellular automaton model for evacuation process with obstacles. Phys. Stat. Mech. Its Appl. 382, 631–642. https://doi.org/10.1016/j.physa.2007.04.006.
Viola, P., Jones, M.J., Snow, D., 2005. Detecting pedestrians using patterns of motion and appearance. Int. J. Comput. Vis. 63, 153–161. https://doi.org/10.1007/s11263-005-6644-8.
Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E., 2018. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 1–13. https://doi.org/10.1155/2018/7068349.
Wang, L., 2020. Image Crowd Counting Based on Convolutional Neural Network (Ph.D. thesis). University of Science and Technology of China. https://doi.org/10.27517/d.cnki.gzkju.2020.000373.
Xie, X., Ji, J., Wang, Z., Lu, L., Yang, S., 2016. Experimental study on the influence of the crowd density on walking speed and stride length. J. Saf. Environ. 16, 232–235. https://doi.org/10.13637/j.issn.1009-6094.2016.04.047.
Xu, M., Ge, Z., Jiang, X., Cui, G., Lv, P., Zhou, B., Xu, C., 2019. Depth information guided crowd counting for complex crowd scenes. Pattern Recogn. Lett. 125, 563–569. https://doi.org/10.1016/j.patrec.2019.02.026.
Yu, Y., Zhu, H., Qian, J., Pan, C., Miao, D., 2021. Survey on deep learning based crowd counting. J. Comput. Res. Dev. 58, 2724–2747.
Zhang, Z., 2000. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1330–1334. https://doi.org/10.1109/34.888718.
Zhang, C., Li, H., Wang, X., Yang, X., 2015. Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 833–841.
Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y., 2016. Single-image crowd counting via multi-column convolutional neural network. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 589–597. https://doi.org/10.1109/CVPR.2016.70.
Zheng, X., Zhong, T., Liu, M., 2009. Modeling crowd evacuation of a building based on seven methodological approaches. Build. Environ. 44, 437–445. https://doi.org/10.1016/j.buildenv.2008.04.002.