
2024 Third International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS)

Flood Survivor Detection through Image Fusion and Yolo Model
979-8-3503-6118-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/INCOS59338.2024.10527628

Vinothini.R
M.Tech-AI&ML
School of Computer Science and
Engineering
Vellore Institute of Technology
Vellore – 632014
Tamil Nadu
India
[email protected]

Abstract— Natural disasters like floods happen when water overtakes normally dry places. During rescue missions aimed at saving lives, rescue teams responding to floods encounter numerous obstacles, and locating survivors is a vital component of rescue operations. For quick flood survivor recognition, the proposed approach combines image fusion with the YOLO (You Only Look Once) model, offering a novel method. The model utilizes image fusion to merge data from various sources, including RGB and Near-Infrared (NIR) images captured by Unmanned Aerial Vehicles (UAVs). The resulting fused image is then processed with the YOLO model to efficiently identify flood survivors and aid rescue crews in their effort to preserve lives.

Keywords— RGB, NIR, Image Fusion, DWT, CV2, Unet, YOLOv8

I. INTRODUCTION

Flooding is the most destructive type of natural disaster, resulting in significant harm to property, lives, and infrastructure. It has severe repercussions for impacted communities and poses risks to ecosystems, human safety, and animal welfare, so proactive rescue planning becomes essential to ensure timely relocation to safer areas. Unmanned Aerial Vehicles (UAVs) are the best option for gathering data during emergency situations because of their affordability, ease of use, and widespread availability. The proposed model initially employs image fusion, followed by the application of YOLOv8 to the fused image for the detection of animals and humans.

Image fusion is widely used in many fields, such as medical imaging, environmental monitoring, military and defense, and remote sensing, where it plays a critical role in integrating information. Visible images capture colors and details that are visible to the human eye, closely mimicking human visual perception. However, under complex environmental conditions they face limitations such as noise and reduced scene information. Infrared images, on the other hand, are less impacted by outside influences. The essential component of the fusion process is the smooth integration of both visible and infrared information to produce an informative image.

With its ability to analyze images at multiple resolutions and to decompose them into distinct scales, the Discrete Wavelet Transform (DWT) is essential to image fusion. This separation of spatial and frequency information improves performance and noise robustness in a variety of applications by making it easier to extract significant details and by providing a sparse representation.

CV2 is the Python module for image processing and computer vision applications within the OpenCV package. An essential component of the OpenCV framework, the cv2 module has functions designed especially for working with images and videos, and it also plays an important role in image fusion, making it possible to combine data from several images.

The Unet model is particularly good at capturing fine spatial features and uses skip connections to maintain context. Since the goal of image fusion is to combine data from different sources in a usable way, the Unet model helps by preserving important aspects of each input image, producing a composite image that retains important information across several modalities. This makes the Unet model a useful tool for applications like medical imaging, remote sensing, and other situations where combining complementary data sources is essential to improving the overall quality and interpretability of the information.

You Only Look Once version 8, or YOLOv8, is notable for its remarkable accuracy and speed in object detection, especially when it comes to identifying people and animals. YOLOv8's streamlined architecture allows it to process full images in a single forward pass, enabling rapid real-time detection, and its capacity to handle small objects and varying scales improves its ability to identify people and animals in complicated scenarios. The model's versatility and state-of-the-art performance make it an indispensable tool in a wide range of applications, such as surveillance, wildlife monitoring, and any situation where quick and precise detection of people and animals is necessary for analysis and decision-making.


II. DATA ACQUISITION

A. Input Data for Image Fusion


The dataset is collected from Kaggle and other sources.
As shown in table I, the dataset consists of 1024 images, of
which half are RGB and the other half are NIR.

TABLE I. INPUT DATA

Sno    Image    Count
1      RGB      512
2      NIR      512

B. Fused Data

The combined dataset comprises 1536 images: 512 produced with the DWT method, 512 with the CV2 method, and the remaining 512 with the Unet model (as shown in Table II).

TABLE II. FUSED DATA

Sno    Fused Image    Count
1      DWT            512
2      CV2            512
3      Unet           512

C. Input Data for Yolo model

From the pool of 1536 fused images, 461 images are selectively chosen for the YOLOv8 model. Next, 80% of this
selected data is allocated for training, with 13% for
validation, and 7% for testing.
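
A minimal sketch of this 80% / 13% / 7% split, assuming the 461 selected fused images sit in a local folder; the folder name "fused_selected" and the random seed are illustrative assumptions, not the paper's setup:

    # Sketch of the 80/13/7 train/validation/test split of the selected fused
    # images; the folder name and random seed are assumptions for illustration.
    import random
    from pathlib import Path

    files = sorted(Path("fused_selected").glob("*.png"))
    random.Random(42).shuffle(files)

    n = len(files)                        # 461 images in the paper's setup
    n_train = int(0.80 * n)
    n_val = int(0.13 * n)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]        # remaining ~7%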
III. EXPERIMENTAL PROCEDURE

Methodology

The step-by-step process is visually represented in Figure 1. The proposed model initially takes RGB and NIR images as input, fusing them using the Discrete Wavelet Transform (DWT), CV2, and a Unet model. A number of metrics are used to assess the image fusion process, including peak signal-to-noise ratio, mean squared error, and entropy. To identify flood survivors, the resultant images are subsequently fed into the YOLOv8 model. The confusion matrix, mean average precision, recall, and precision are used to evaluate the YOLO model's performance.

Figure 1. Flow chart of proposed model
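
As a reference, a small sketch of how the fusion-quality metrics above (MSE, PSNR and entropy) could be computed with NumPy and OpenCV; the grayscale inputs and the bit-level Shannon entropy definition are assumptions, and the entropy values reported later in Table III appear to use a different, unnormalized scale:

    # Sketch of fusion-quality metrics on 8-bit grayscale arrays (an assumption
    # for illustration; the paper does not give its exact formulas).
    import cv2
    import numpy as np

    def mse(a, b):
        return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

    def psnr(a, b, peak=255.0):
        m = mse(a, b)
        return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

    def entropy(img):
        # Shannon entropy of the 8-bit intensity histogram.
        hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    reference = cv2.imread("rgb.png", cv2.IMREAD_GRAYSCALE)
    fused = cv2.imread("fused.png", cv2.IMREAD_GRAYSCALE)
    print(mse(reference, fused), psnr(reference, fused), entropy(fused))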
A. DWT

A popular method for combining images is the Discrete Wavelet Transform (DWT), which captures frequency and spatial information at various scales. In this fusion process, RGB and near-infrared (NIR) images are combined. The RGB image is first split into Red (R), Green (G), and Blue (B) color channels, and each channel is subjected to a 2D DWT using the Symlet 8 (sym8) wavelet filter with symmetric boundary handling. The resulting low- and high-frequency wavelet coefficients are fused with an average fusion rule, and the fused coefficients are passed through the inverse wavelet transform to reconstruct the fused channel. The fused results from the three channels are averaged to create the final fused image, whose pixel values are normalized to the range 0 to 255 so it can be presented and stored as an 8-bit image. The Symlet 8 filter and symmetric mode maintain signal continuity at image edges. Overall, by utilizing complementary features from both the visual and NIR domains, this process seeks to improve the information content of the original images.
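
A minimal sketch of this per-channel wavelet fusion, assuming the PyWavelets (pywt), OpenCV and NumPy packages, same-sized inputs, and a single-level decomposition; the file names are illustrative and this is not the authors' exact code:

    # Sketch of DWT-based RGB/NIR fusion with sym8 and symmetric boundary
    # handling; single-level decomposition and file names are assumptions.
    import cv2
    import numpy as np
    import pywt

    def fuse_channel(channel, nir, wavelet="sym8", mode="symmetric"):
        cA1, (cH1, cV1, cD1) = pywt.dwt2(channel.astype(np.float32), wavelet, mode=mode)
        cA2, (cH2, cV2, cD2) = pywt.dwt2(nir.astype(np.float32), wavelet, mode=mode)
        # Average fusion of the low- and high-frequency coefficients.
        fused = ((cA1 + cA2) / 2,
                 ((cH1 + cH2) / 2, (cV1 + cV2) / 2, (cD1 + cD2) / 2))
        # Inverse transform reconstructs the fused channel.
        return pywt.idwt2(fused, wavelet, mode=mode)

    rgb = cv2.imread("rgb.png")                        # H x W x 3 (BGR order in OpenCV)
    nir = cv2.imread("nir.png", cv2.IMREAD_GRAYSCALE)  # H x W, same size as rgb

    # Fuse each color channel with NIR, then average the three fused channels.
    fused = np.mean([fuse_channel(rgb[:, :, c], nir) for c in range(3)], axis=0)

    # Normalize to 0-255 and store as an 8-bit image.
    fused = cv2.normalize(fused, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cv2.imwrite("fused_dwt.png", fused)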
B. CV2

Part of the OpenCV (Open Source Computer Vision) library, the Python cv2 module is designed for image processing and computer vision applications and works well with both images and videos. For fusion, the weighted-sum technique is used to combine the two input images: near-infrared (NIR) and visual (RGB). Weights, represented by alpha and beta, control the blending by setting the relative contribution of each source image to the final output without adding extra brightness adjustments. The algorithm works pixel by pixel over both images: the corresponding pixel values from the RGB and NIR images are scaled by the assigned alpha and beta weights, the scaled values are added together, and the result is assigned to the corresponding pixel position in the final image. Repeating this pixel-by-pixel blending over the entire image creates a fused output image that combines the contributions of the two source images according to the predetermined weights.
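
This per-pixel weighted sum corresponds directly to OpenCV's addWeighted call; a minimal sketch, where the equal 0.5/0.5 weights and file names are assumptions rather than the paper's chosen values:

    # Sketch of the alpha/beta weighted-sum blend; gamma = 0 means no extra
    # brightness offset is added. Weights and file names are illustrative.
    import cv2

    rgb = cv2.imread("rgb.png")   # read as 3-channel BGR
    nir = cv2.imread("nir.png")   # also read as 3-channel for blending

    # Both inputs must have the same size and channel count.
    nir = cv2.resize(nir, (rgb.shape[1], rgb.shape[0]))

    alpha, beta, gamma = 0.5, 0.5, 0.0
    fused = cv2.addWeighted(rgb, alpha, nir, beta, gamma)
    cv2.imwrite("fused_cv2.png", fused)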


C. Unet

Image segmentation tasks are a common application of Convolutional Neural Networks (CNNs) such as UNet, whose architecture is named for its U-shaped design. UNet is a popular CNN for image segmentation, as shown in Figure 2. Its architecture consists of an input layer that can handle numerous channels of data, which frequently represent different modalities. Within the network, max pooling and convolutional layers extract features and downsample spatial dimensions along a contracting path. Dropout layers aid regularization, and batch normalization stabilizes training. Conv2D transpose layers upsample the feature maps, and concatenation operations combine high-level and low-level features along the expansive path. The output layer produces segmentation predictions, typically using softmax for multi-class segmentation and sigmoid for binary segmentation. Owing to its ability to capture fine spatial details and handle multi-modal inputs, UNet is widely used in various applications.

The Keras library makes it easier to implement the U-Net paradigm. The U-Net model is optimized for segmentation tasks and is configured with an input shape of (512, 512, 6), where the visual (RGB) and near-infrared (NIR) images are concatenated along the channel axis. The visual and NIR components are loaded, normalized, and resized to conform to the model input specification. The concatenated input is fed into the U-Net model, which iterates over a range of images to predict segmentation masks. Segmented visual and NIR images are obtained by applying these masks to the original images, and the segmented visual and NIR images are combined with predetermined weights to produce the final fused image. After post-processing, the fused images are saved in a designated directory. In this way the method leverages the U-Net model for semantic segmentation to achieve image fusion, combining complementary data from the visual and near-infrared domains for improved feature analysis and representation.

Figure 2. Architecture of Unet model
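
A minimal sketch of this U-Net-guided fusion step, assuming a trained Keras model saved as "unet_fusion.h5" that takes a (512, 512, 6) input and returns a single-channel mask; the 0.6/0.4 blending weights and file names are illustrative assumptions, not the paper's exact setup:

    # Sketch of U-Net-based fusion: concatenate RGB and NIR, predict a mask,
    # apply it to both modalities, and blend with predetermined weights.
    import cv2
    import numpy as np
    from tensorflow import keras

    unet = keras.models.load_model("unet_fusion.h5")   # expects input shape (512, 512, 6)

    def load_resized(path, size=(512, 512)):
        img = cv2.imread(path)
        return cv2.resize(img, size).astype(np.float32) / 255.0   # normalize to [0, 1]

    rgb = load_resized("rgb.png")
    nir = load_resized("nir.png")

    # Concatenate along the channel axis -> batch of shape (1, 512, 512, 6).
    x = np.concatenate([rgb, nir], axis=-1)[np.newaxis, ...]

    mask = unet.predict(x)[0]          # (512, 512, 1) segmentation mask in [0, 1]
    seg_rgb = rgb * mask
    seg_nir = nir * mask

    # Weighted combination of the segmented images (illustrative weights).
    fused = 0.6 * seg_rgb + 0.4 * seg_nir
    cv2.imwrite("fused_unet.png", (fused * 255).astype(np.uint8))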
D. YOLOv8

The YOLOv8 model is trained to recognize specific object classes relevant to flood scenarios, namely cow, dog, goat, horse and person, given their frequent occurrence in flood datasets. A subset of fused images obtained through the proposed image fusion techniques is utilized for YOLOv8 model training. For robust model performance, the dataset is divided into training, validation, and testing sets. To improve accuracy and generalization, the model's parameters are optimized during the training phase. Standard metrics such as precision, recall and mean Average Precision (mAP) are used to assess YOLOv8's performance. These metrics provide information about how well the model can identify and categorize objects, especially those that are relevant to flood response operations.
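
Training and evaluation of this kind typically reduce to a few calls with the Ultralytics YOLOv8 API; a sketch, where the dataset YAML name, epoch count, image size and confidence threshold are assumptions rather than the paper's settings:

    # Sketch of YOLOv8 training/evaluation on the fused dataset; the YAML file,
    # epochs, image size and confidence threshold are illustrative assumptions.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")   # start from a pretrained checkpoint

    # flood_fused.yaml would point to the train/val/test splits of the fused
    # images and list the five classes: cow, dog, goat, horse, person.
    model.train(data="flood_fused.yaml", epochs=100, imgsz=640)

    metrics = model.val()                                   # precision, recall, mAP50
    results = model.predict("fused_sample.png", conf=0.25)  # detection on one image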
IV. RESULTS AND DISCUSSIONS

A. Graphs / Tables

A detailed summary of the evaluation metrics used to rate the image fusion performance is given in Table III, which reports visual entropy, near-infrared entropy, fused entropy, mean squared error (MSE), and peak signal-to-noise ratio (PSNR). Among the three methods, the Unet model produces the best results, as shown by its lowest MSE and slightly higher PSNR. These results collectively indicate enhanced image quality and reduced distortion in the combined images when utilizing the Unet model.

TABLE III. EVALUATION METRICS FOR IMAGE FUSION

Evaluation metrics    DWT             CV2             Unet
Visual Entropy        196825333.45    196825333.45    196825333.45
NIR Entropy           188942137.53    188942137.53    188942137.53
Fused Entropy         188263183.65    282641735.37    192066857.04
MSE                   106.45          94.32           93.34
PSNR                  27.86           28.43           28.44

The YOLO model's performance metrics are listed in Table IV, which presents precision, recall, and mean average precision at 50% Intersection over Union (mAP50). These results show how accurately the classes are predicted: precision measures the accuracy of positive predictions, recall measures the percentage of actual positive instances that the model correctly detects, and mean Average Precision summarizes the overall accuracy of an object detector.

TABLE IV. PERFORMANCE METRICS FOR YOLOV8

Class     Precision    Recall    mAP50
All       0.845        0.729     0.818
Cow       0.655        0.929     0.901
Dog       0.898        0.533     0.615
Goat      0.822        0.9       0.922
Horse     1            0.472     0.1792
Person    0.847        0.814     0.858

The overall results are shown in Figure 3 and the confusion matrix is displayed in Figure 4.


Figure 3. Overall results

Figure 4. Confusion matrix

B. Output Screens

Figures 5, 6 and 7 illustrate the resultant fused images obtained using DWT, CV2 and Unet.

Figure 5. Resultant fused image using DWT

Figure 6. Resultant fused image using CV2

Figure 7. Resultant fused image using Unet

Figure 8 illustrates the prediction on a test sample, and Figures 9, 10, 11 and 12 depict the predictions made on unseen data.

Figure 8. Prediction on test data


C. Comparative analysis
Compared to current flood survivor detection methods,
the proposed model presents a novel combination of image
fusion methods, namely Discrete Wavelet Transform (DWT),
CV2, and a Unet model, along with the effective YOLOv8
for object detection. This distinguishes it from current state-
of-the-art methods, which frequently depend on fusion or
singular detection techniques. It demonstrates the
effectiveness of the model with a thorough evaluation that
includes metrics like mean squared error (MSE), entropy, and
peak signal-to-noise ratio (PSNR) for image fusion, along
with precision, recall, and mean average precision at 50%
Intersection over Union (mAP50) for YOLOv8. Unet's
semantic segmentation adds a layer of complexity to the
image fusion process by capturing minute spatial details that improve feature analysis. The model emphasizes flexibility to various flood scenarios and presents itself as a reliable and fast way to identify flood survivors.

V. CONCLUSION
The proposed model, integrating image fusion with
YOLOv8, proves effective in challenging flood scenarios
where conventional detection methods fall short. YOLOv8
excels in accuracy, speed, and adaptability, striking a balance
between real-time processing and precision. The model
demonstrates proficiency in identifying common flood-
related classes such as cow, dog, goat, horse, and person.
Notably, it shows exceptional precision, recall, and mean
Average Precision for the cow, goat, and person classes
compared to the other classes, whose lower scores are attributed to missing instances. Future work aims at enhancing prediction efficiency and expanding the capability to detect additional animal classes.

Figure 9. Prediction on unseen data (1)

Figure 10. Prediction on unseen data (2)

Figure 11. Prediction on unseen data (3)

Figure 12. Prediction on unseen data (4)

ACKNOWLEDGMENT
The original version of this document was developed with
encouragement from Mr. S. Rajkumar, an Associate Professor
at the School of Computer Science and Engineering, VIT
University, for which the author is extremely grateful.

