Deep Learning-Based Underwater Fish Recognition Using YOLOv8

Abstract— Underwater robotic systems play a crucial role in marine research, ecological monitoring, and autonomous exploration. However, accurately identifying fish species poses significant challenges due to factors such as low visibility, changing lighting conditions, and the swift movements of fish. Traditional object detection models, including earlier iterations of YOLO, have struggled in these underwater environments. This paper introduces YOLO8-FASG, a novel deep learning model that enhances the YOLOv8 framework by integrating three key innovations: Alterable Kernel Convolution (AKConv) for adaptive feature extraction, the Global Attention Mechanism (GAM) for improved focus on essential spatial and channel features, and Simplified Spatial Pyramid Pooling - Fast (SimSPPF) for efficient multi-scale feature learning. Experiments conducted using the Fish4Knowledge dataset demonstrate that YOLO8-FASG surpasses baseline models, achieving a precision of 92.7%, a recall of 94.3%, and a mean Average Precision (mAP) of 88.1% at IoU 0.50:0.95. The model is optimized for real-time execution, making it an excellent candidate for deployment in autonomous underwater vehicles (AUVs). This research significantly advances the field of intelligent underwater exploration by providing a robust, efficient, and scalable detection solution.

I. INTRODUCTION

The oceans harbor a vast array of biological diversity, much of which remains unexplored. Understanding marine ecosystems requires extensive monitoring, which has become increasingly feasible through the use of autonomous underwater robots. A key functionality of these robotic systems is the real-time detection and classification of marine species, particularly fish, to support research, conservation, and resource management efforts. However, the underwater environment presents unique challenges for computer vision systems. Water distorts light, reduces visibility, and introduces dynamic artifacts such as bubbles, turbidity, and color shifts. Additionally, fish are often small, agile, and blend into their surroundings, making them difficult targets for conventional object detectors. While popular object detection frameworks like YOLOv3, YOLOv5, and YOLOv8 have achieved success in terrestrial settings, their effectiveness underwater is limited. Challenges such as blurred contours, non-uniform fish movements, and environmental noise necessitate a model capable of dynamically adapting to varying conditions. This project proposes YOLO8-FASG, an enhanced object detection framework specifically designed for underwater fish identification. The model builds on the robust foundation of YOLOv8 and incorporates three innovative improvements, AKConv, GAM, and SimSPPF, each addressing critical gaps in conventional architectures. Extensive experimentation on a real-world dataset validates that YOLO8-FASG outperforms previous models in both accuracy and speed, setting a new standard for underwater object detection.

II. LITERATURE SURVEY

The field of underwater object detection has evolved significantly over the past two decades, driven by increasing demand for autonomous marine monitoring, biodiversity studies, and fisheries management. Traditional image processing techniques initially dominated the field but were later supplanted by the rise of deep learning, which dramatically enhanced detection accuracy and robustness in complex environments. This section reviews key developments in underwater object detection, the evolution of YOLO-based frameworks, and the incorporation of adaptive convolutions and attention mechanisms, leading to the design motivations behind YOLO8-FASG.

A. Challenges in Underwater Object Detection

Underwater imaging poses unique challenges that differentiate it from terrestrial environments. Light scattering, absorption, color distortion, and turbidity significantly degrade image quality [1]. Additionally, the non-uniform movement of marine organisms, such as fish, along with dynamic backgrounds (e.g., water currents, bubbles, vegetation), complicates the task of reliable object detection. Early techniques, including threshold-based segmentation and template matching, struggled to maintain
consistency across varying conditions and failed to adapt to unpredictable underwater dynamics.

B. Evolution from Handcrafted Features to Deep Learning

Initially, underwater object detection relied on handcrafted feature extractors such as edge detection (e.g., Sobel and Canny operators), histogram-based methods, and shape descriptors [2]. Although computationally efficient, these methods lacked semantic understanding, often producing brittle models sensitive to environmental noise.
The introduction of convolutional neural networks (CNNs) revolutionized object detection tasks. CNN-based models demonstrated an ability to learn hierarchical feature representations automatically, enabling them to outperform traditional methods significantly. In underwater domains, CNNs like AlexNet and VGGNet were adapted for fish recognition and coral reef monitoring. However, these early CNNs were primarily classification networks and were not designed for real-time object detection or localization.

C. YOLO Series: Real-Time Detection Frameworks

The YOLO (You Only Look Once) family of detectors represented a paradigm shift in object detection. Introduced by Redmon et al. in 2016, YOLOv1 formulated object detection as a regression problem, enabling fast, real-time predictions [3]. Subsequent versions, such as YOLOv2 and YOLOv3, improved upon accuracy and robustness by introducing anchor boxes, batch normalization, and feature pyramid structures.
In underwater settings, YOLOv3 was commonly adopted due to its balance between speed and accuracy. For instance, Mahmud et al. [4] demonstrated the feasibility of YOLOv3 for underwater fish species detection. However, YOLOv3 still faced challenges in detecting small, fast-moving fish due to its reliance on fixed anchor sizes and lack of adaptive focus mechanisms.
YOLOv5 introduced further optimizations such as auto-learning anchor boxes, advanced data augmentation (Mosaic augmentation), and a more lightweight backbone (CSPDarknet53). Although YOLOv5 models improved generalization, they struggled under extreme underwater conditions where object deformations and blurred textures prevailed.
Recently, YOLOv8, developed by Ultralytics, introduced anchor-free detection, a decoupled head design, and additional performance gains through an optimized training process [5]. Nevertheless, YOLOv8 still lacked specialized modules to adapt dynamically to deformable or partially occluded underwater objects.

D. Attention Mechanisms for Feature Enhancement

Attention mechanisms have emerged as a powerful tool in deep learning to selectively emphasize informative parts of input data while suppressing irrelevant information. Works such as Squeeze-and-Excitation Networks (SENet) [6] introduced channel attention, allowing networks to recalibrate channel-wise feature responses adaptively. Meanwhile, Convolutional Block Attention Modules (CBAM) [7] combined both spatial and channel attention, improving the model's ability to localize and classify objects.
In underwater contexts, attention mechanisms are particularly valuable due to the noisy and cluttered backgrounds. By applying attention, models can dynamically prioritize fish regions even when visibility is low or when a fish is partially camouflaged. Research by Liu et al. [8] integrated attention into underwater object detection pipelines, achieving noticeable improvements in robustness.
The Global Attention Mechanism (GAM) extends these ideas by applying sequential attention across both the channel and spatial dimensions, enhancing the receptive field and capturing dependencies at a global scale. GAM forms an essential component of YOLO8-FASG, addressing one of the key weaknesses of conventional CNNs in underwater imaging.

E. Adaptive Convolutions for Deformable Objects

Traditional convolutional layers employ fixed kernel shapes, which are often suboptimal when modeling deformable objects like fish. Deformable Convolutional Networks (DCNs) [9] introduced the concept of dynamic sampling points within convolution kernels, allowing the network to adjust to object shapes during training.
In underwater detection tasks, where fish outlines are irregular and movements unpredictable, adaptive convolutions enable better feature extraction. By learning optimal receptive fields dynamically, models can more accurately track and localize moving fish even under challenging conditions.
Alterable Kernel Convolution (AKConv), a core innovation in YOLO8-FASG, is inspired by this principle. AKConv allows the model to adjust convolution kernel shapes during training, effectively capturing the shape variability of different fish species and minimizing false detections.

F. Lightweight Multi-Scale Feature Extraction

Multi-scale feature extraction is critical for detecting objects at various sizes, particularly in scenarios where fish may appear at different depths and distances. Traditional Spatial Pyramid Pooling (SPP) modules were effective but computationally heavy. Spatial Pyramid Pooling-Fast (SPPF) attempted to speed up this process but introduced extra computational complexity through activation functions like SiLU.
To address the need for speed without sacrificing accuracy, YOLO8-FASG incorporates a Simplified Spatial Pyramid Pooling-Fast (SimSPPF) module. SimSPPF maintains the benefits of multi-scale feature capture while using faster ReLU activations, improving inference speed, an essential requirement for real-time deployment on underwater robots.
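As a concrete illustration, the cascaded pooling at the core of SimSPPF can be sketched in a few lines of NumPy. This is a minimal sketch for intuition only: the module's convolution layers are omitted, the kernel size k=5 and the feature-map shape are illustrative assumptions, and ReLU is used in place of SiLU as described above. Cascading three stride-1 5x5 max pools yields effective receptive fields of 5, 9, and 13, matching what a heavier SPP block computes with three separate pools.

```python
import numpy as np

def maxpool2d_same(x, k=5):
    """Stride-1 max pooling with 'same' padding; x has shape (C, H, W)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    C, H, W = x.shape
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def simsppf(x, k=5):
    """SimSPPF sketch: three cascaded k x k max pools, channel-concatenated,
    followed by the cheaper ReLU activation instead of SiLU."""
    y1 = maxpool2d_same(x, k)
    y2 = maxpool2d_same(y1, k)   # effective 9 x 9 receptive field
    y3 = maxpool2d_same(y2, k)   # effective 13 x 13 receptive field
    return np.maximum(np.concatenate([x, y1, y2, y3], axis=0), 0.0)  # ReLU

feat = np.random.randn(8, 16, 16).astype(np.float32)
out = simsppf(feat)
print(out.shape)  # (32, 16, 16)
```

The cascade is why SPPF-style blocks are cheaper than classic SPP: two chained 5x5 max filters produce exactly the same result as one 9x9 filter, so the larger windows come almost for free.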
G. Summary and Research Gaps
The proposed YOLO8-FASG system builds upon YOLOv8 while incorporating three specialized modules to better handle the complexities of underwater environments, with benefits that include:

• Preserving multi-scale feature extraction.

• Reducing computational load, boosting real-time performance.

To validate the efficiency and reliability of the proposed fish detection framework, a comprehensive evaluation was conducted using standard object detection metrics. Precision, recall, F1-score, and mean Average Precision (mAP) were used to assess classification and localization accuracy. In particular, mAP@0.5 and mAP@0.5:0.95 were computed to measure detection performance across varying Intersection over Union (IoU) thresholds. The system achieved high precision and recall values, demonstrating its robustness in identifying fish under challenging underwater conditions. Real-time performance was also evaluated by calculating frames per second (FPS) on edge devices, confirming the model's suitability for deployment in underwater robotic systems.

When compared against the YOLOv3, YOLOv5, and YOLOv8 models, YOLO8-FASG demonstrated superior performance in both detection accuracy and inference speed. The incorporation of AKConv and GAM proved particularly effective in improving small object detection and scene understanding.

F. Real-Time Deployment Feasibility

Inference times were measured to validate real-time deployment capabilities. YOLO8-FASG maintained an average inference speed of 25 frames per second (FPS) on high-resolution underwater footage, making it suitable for integration into AUVs and edge devices for live fish monitoring missions.
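The evaluation above relies on standard detection metrics. As a concrete reference for how they are defined, the following is a minimal sketch of IoU and precision/recall/F1 computation; the box format and counts are illustrative, not the paper's actual evaluation code.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall_f1(tp, fp, fn):
    """Turn true/false positive and false negative counts into the metrics."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A prediction counts as a true positive when its IoU with a matched
# ground-truth box meets the threshold (0.5 for mAP@0.5; mAP@0.5:0.95
# averages over thresholds from 0.5 to 0.95 in steps of 0.05).
print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # 0.143
```

mAP then averages the area under the precision-recall curve over classes and, for mAP@0.5:0.95, over the ten IoU thresholds.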
VI. RESULTS

A. Dataset and Experimental Setup

B. Evaluation Metrics

C. Quantitative Performance

VII. SOFTWARE OUTPUT SCREENS
reliable system that can both visually and physically respond
to the environment. The combined functionality of machine
learning inference and embedded control demonstrates the
feasibility of using YOLO8-FASG for real-time underwater
object detection and alert generation, effectively bridging the
gap between intelligent sensing and mechanical actuation.
The results highlight the robustness and practicality of the
system for deployment in autonomous underwater
monitoring platforms.
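The sensing-to-actuation bridge described above can be illustrated with a small, hypothetical glue routine that turns detector output into an alert signal. The detection record layout, the "fish" label, and the 0.5 confidence threshold are assumptions for illustration, not the deployed system's actual interface.

```python
# Hypothetical bridge from detector output to an embedded alert channel.
# The dict layout, "fish" label, and 0.5 threshold are illustrative
# assumptions, not the system's real interface.
def should_alert(detections, target_class="fish", conf_threshold=0.5):
    """Return True if any detection of the target class is confident enough."""
    return any(d["label"] == target_class and d["confidence"] >= conf_threshold
               for d in detections)

frame_detections = [
    {"label": "fish", "confidence": 0.91, "box": (34, 50, 120, 140)},
    {"label": "debris", "confidence": 0.40, "box": (200, 10, 230, 42)},
]
print(should_alert(frame_detections))  # True
```

In a real deployment, a True result would be forwarded to the embedded controller (e.g., over a serial link) to trigger the physical response, while the per-frame loop continues at the detector's native frame rate.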
VIII. DISCUSSION
Informed Consent:
Not Applicable.
X. REFERENCES