
Deep Learning-Driven Fish Recognition in Complex Underwater Scenes Using Modified YOLOv8 Architecture


Sikkandhar Batcha J, Assistant Professor, Department of AI & DS, KGiSL Institute of Technology, Coimbatore, Tamilnadu, India ([email protected])
Ciril Japeth V*, UG Scholar, Department of AI & DS, KGiSL Institute of Technology, Coimbatore, Tamilnadu, India (*Corresponding author: [email protected])
Hari Darsan C, UG Scholar, Department of AI & DS, KGiSL Institute of Technology, Coimbatore, Tamilnadu, India ([email protected])
Mathew Fedrick I, UG Scholar, Department of AI & DS, KGiSL Institute of Technology, Coimbatore, Tamilnadu, India ([email protected])
Haris Nihaal S A, UG Scholar, Department of AI & DS, KGiSL Institute of Technology, Coimbatore, Tamilnadu, India ([email protected])

Abstract— Underwater robotic systems play a crucial role in marine research, ecological monitoring, and autonomous exploration. However, accurately identifying fish species poses significant challenges due to factors such as low visibility, changing lighting conditions, and the swift movements of fish. Traditional object detection models, including earlier iterations of YOLO, have struggled in these underwater environments. This paper introduces YOLO8-FASG, a novel deep learning model that enhances the YOLOv8 framework by integrating three key innovations: Alterable Kernel Convolution (AKConv) for adaptive feature extraction, Global Attention Mechanism (GAM) for improved focus on essential spatial and channel features, and Simplified Spatial Pyramid Pooling - Fast (SimSPPF) for efficient multi-scale feature learning. Experiments conducted using the Fish4Knowledge dataset demonstrate that YOLO8-FASG surpasses baseline models, achieving a precision of 92.7%, a recall of 94.3%, and a mean Average Precision (mAP) of 88.1% at IoU 50–95. The model is optimized for real-time execution, making it an excellent candidate for deployment in autonomous underwater vehicles (AUVs). This research significantly advances the field of intelligent underwater exploration by providing a robust, efficient, and scalable detection solution.

I. INTRODUCTION

The oceans harbor a vast array of biological diversity, much of which remains unexplored. Understanding marine ecosystems requires extensive monitoring, which has become increasingly feasible through the use of autonomous underwater robots. A key functionality of these robotic systems is the real-time detection and classification of marine species, particularly fish, to support research, conservation, and resource management efforts. However, the underwater environment presents unique challenges for computer vision systems. Water distorts light, reduces visibility, and introduces dynamic artifacts such as bubbles, turbidity, and color shifts. Additionally, fish are often small, agile, and blend into their surroundings, making them difficult targets for conventional object detectors. While popular object detection frameworks like YOLOv3, YOLOv5, and YOLOv8 have achieved success in terrestrial settings, their effectiveness underwater is limited. Challenges such as blurred contours, non-uniform fish movements, and environmental noise necessitate a model capable of dynamically adapting to varying conditions. This project proposes YOLO8-FASG, an enhanced object detection framework specifically designed for underwater fish identification. The model builds on the robust foundation of YOLOv8 and incorporates three innovative improvements—AKConv, GAM, and SimSPPF—each addressing critical gaps in conventional architectures. Extensive experimentation on a real-world dataset validates that YOLO8-FASG outperforms previous models in both accuracy and speed, setting a new standard for underwater object detection.

II. LITERATURE SURVEY

The field of underwater object detection has evolved significantly over the past two decades, driven by increasing demand for autonomous marine monitoring, biodiversity studies, and fisheries management. Traditional image processing techniques initially dominated the field but were later supplanted by the rise of deep learning, which dramatically enhanced detection accuracy and robustness in complex environments. This section reviews key developments in underwater object detection, the evolution of YOLO-based frameworks, and the incorporation of adaptive convolutions and attention mechanisms, leading to the design motivations behind YOLO8-FASG.

A. Challenges in Underwater Object Detection

Underwater imaging poses unique challenges that differentiate it from terrestrial environments. The presence of light scattering, absorption, color distortion, and turbidity significantly degrades image quality [1]. Additionally, the non-uniform movement of marine organisms, such as fish, along with dynamic backgrounds (e.g., water currents, bubbles, vegetation), complicates the task of reliable object detection. Early techniques, including threshold-based segmentation and template matching, struggled to maintain consistency across varying conditions and failed to adapt to unpredictable underwater dynamics.
B. Evolution from Handcrafted Features to Deep Learning

Initially, underwater object detection relied on handcrafted feature extractors such as edge detection (e.g., Sobel and Canny operators), histogram-based methods, and shape descriptors [2]. Although computationally efficient, these methods lacked semantic understanding, often producing brittle models sensitive to environmental noise.

The introduction of convolutional neural networks (CNNs) revolutionized object detection tasks. CNN-based models demonstrated an ability to learn hierarchical feature representations automatically, enabling them to outperform traditional methods significantly. In underwater domains, CNNs like AlexNet and VGGNet were adapted for fish recognition and coral reef monitoring. However, these early CNNs were primarily classification networks and were not designed for real-time object detection or localization.

C. YOLO Series: Real-Time Detection Frameworks

The YOLO (You Only Look Once) family of detectors represented a paradigm shift in object detection. Introduced by Redmon et al. in 2016, YOLOv1 formulated object detection as a regression problem, enabling fast, real-time predictions [3]. Subsequent versions, such as YOLOv2 and YOLOv3, improved accuracy and robustness by introducing anchor boxes, batch normalization, and feature pyramid structures.

In underwater settings, YOLOv3 was commonly adopted due to its balance between speed and accuracy. For instance, Mahmud et al. [4] demonstrated the feasibility of YOLOv3 for underwater fish species detection. However, YOLOv3 still faced challenges in detecting small, fast-moving fish due to its reliance on fixed anchor sizes and lack of adaptive focus mechanisms.

YOLOv5 introduced further optimizations such as auto-learning bounding boxes, advanced data augmentation (Mosaic augmentation), and a more lightweight backbone (CSPDarknet53). Although YOLOv5 models improved generalization, they struggled under extreme underwater conditions where object deformations and blurred textures prevailed.

Recently, YOLOv8, developed by Ultralytics, proposed anchor-free detection, a decoupled head design, and additional performance gains through an optimized training process [5]. Nevertheless, YOLOv8 still lacked specialized modules to adapt dynamically to deformable or partially occluded underwater objects.

D. Attention Mechanisms for Feature Enhancement

Attention mechanisms have emerged as a powerful tool in deep learning to selectively emphasize informative parts of input data while suppressing irrelevant information. Works such as Squeeze-and-Excitation Networks (SENet) [6] introduced channel attention, allowing networks to recalibrate channel-wise feature responses adaptively. Meanwhile, Convolutional Block Attention Modules (CBAM) [7] combined both spatial and channel attention, improving the model's ability to localize and classify objects. In underwater contexts, attention mechanisms are particularly valuable due to the noisy and cluttered backgrounds. By applying attention, models can dynamically prioritize fish regions even when visibility is low or when the fish is partially camouflaged. Research by Liu et al. [8] integrated attention into underwater object detection pipelines, achieving noticeable improvements in robustness.

The Global Attention Mechanism (GAM) extends these ideas by applying sequential attention across both dimensions, enhancing the receptive field and capturing dependencies at a global scale. GAM forms an essential component in YOLO8-FASG, addressing one of the key weaknesses of conventional CNNs in underwater imaging.

E. Adaptive Convolutions for Deformable Objects

Traditional convolutional layers employ fixed kernel shapes, which are often suboptimal when modeling deformable objects like fish. Deformable Convolutional Networks (DCNs) [9] introduced the concept of dynamic sampling points within convolution kernels, allowing the network to adjust to object shapes during training.

In underwater detection tasks, where fish outlines are irregular and movements unpredictable, adaptive convolutions enable better feature extraction. By learning the optimal receptive fields dynamically, models can more accurately track and localize moving fish even under challenging conditions.

Alterable Kernel Convolution (AKConv), a core innovation in YOLO8-FASG, is inspired by this principle. AKConv allows the model to adjust convolution kernel shapes during training, effectively capturing the shape variability of different fish species and minimizing false detections.

F. Lightweight Multi-Scale Feature Extraction

Multi-scale feature extraction is critical for detecting objects at various sizes, particularly in scenarios where fish may appear at different depths and distances. Traditional Spatial Pyramid Pooling (SPP) modules were effective but computationally heavy. Spatial Pyramid Pooling-Fast (SPPF) attempted to speed up this process but introduced extra computational complexity through activation functions like SiLU. To address the need for speed without sacrificing accuracy, YOLO8-FASG incorporates a Simplified Spatial Pyramid Pooling-Fast (SimSPPF) module. SimSPPF maintains the benefits of multi-scale feature capture while using faster ReLU activations, improving inference speed — an essential requirement for real-time deployment on underwater robots.
G. Summary and Research Gaps

Despite major advancements, existing underwater object detection frameworks still exhibit shortcomings. Key gaps include poor handling of highly dynamic scenes, difficulty in detecting small and blurred objects, and limited focus mechanisms for noisy backgrounds. Moreover, most prior models are computationally intensive and unsuitable for deployment on lightweight AUVs. The proposed YOLO8-FASG aims to bridge these gaps by integrating adaptive convolutions, advanced attention mechanisms, and efficient feature pooling into a unified, scalable, and real-time underwater detection framework.

III. EXISTING SYSTEM

Historically, underwater object detection has relied on various traditional methods and early deep learning frameworks, each with notable strengths and significant limitations. The objective of existing systems was to provide accurate identification of marine species such as fish; however, the dynamic and visually complex nature of underwater environments posed persistent challenges.

A. Traditional Approaches

In the early stages, underwater object detection primarily used traditional image processing techniques based on handcrafted features. Methods such as edge detection, histogram of oriented gradients (HOG), background subtraction, and motion-based segmentation were widely applied. These techniques were computationally efficient and relatively simple to implement, making them suitable for limited-resource underwater monitoring devices. However, these approaches exhibited critical shortcomings. Handcrafted feature methods were highly sensitive to variations in lighting, turbidity, and camera motion. Slight changes in the background or fish orientation often resulted in detection failures. Furthermore, these systems lacked semantic understanding; they could detect moving objects but were unable to distinguish between different fish species or even differentiate fish from background noise like plants or debris.

B. Early Deep Learning Models

With the advancement of deep learning, Convolutional Neural Networks (CNNs) became the dominant paradigm for object recognition. Early CNN architectures, such as AlexNet and VGGNet, were adapted for underwater fish classification tasks. These models achieved better feature extraction compared to handcrafted methods and demonstrated improved robustness against noise. Nonetheless, CNN-based classifiers treated the problem as an image classification task—requiring cropped and pre-segmented fish images. This requirement made real-time autonomous fish detection impractical, as an external segmentation step was needed before classification, adding significant latency.

C. YOLO-based Object Detection

The emergence of YOLO frameworks marked a major breakthrough in real-time object detection. YOLOv1 to YOLOv3 offered end-to-end object localization and classification, enabling faster and more accurate fish detection compared to previous methods. YOLOv3, in particular, gained popularity in underwater applications due to its speed and simplicity. It utilized anchor boxes and multi-scale feature maps, allowing detection across varying fish sizes. However, YOLOv3 struggled in underwater scenarios for several reasons:

• Difficulty in detecting small, blurred, or partially occluded fish.

• Inflexibility due to fixed anchor box dimensions.

• No mechanism to dynamically adjust attention to noisy backgrounds.

YOLOv5 introduced innovations such as auto-learning anchors and more efficient CSP backbone networks, further improving accuracy. Nevertheless, it still lacked the flexibility required to model irregular object contours in highly dynamic underwater environments. YOLOv8 made a leap forward by offering an anchor-free detection mechanism and decoupled heads for classification and localization tasks. This architecture improved detection generalization and reduced manual hyperparameter tuning. Yet, despite these advancements, YOLOv8's reliance on fixed kernel convolutions and absence of advanced attention mechanisms limited its effectiveness in underwater imagery where object boundaries are often vague and motion is unpredictable.

D. Limitations of Existing Systems

The primary limitations of existing underwater fish detection systems are summarized as follows:

• Poor Adaptability: Fixed convolution kernels cannot model deformable fish shapes.

• Background Clutter: Conventional CNNs struggle to filter out noisy underwater backgrounds.

• Small Object Detection: Small, fast-moving fish are frequently missed or poorly localized.

• Computational Complexity: Many models are too heavy for real-time deployment on embedded underwater vehicles.

These limitations motivate the development of YOLO8-FASG, which incorporates adaptive convolutional kernels, a dual-dimension attention mechanism, and efficient pooling strategies to address the shortcomings of prior systems while ensuring real-time applicability.

IV. PROPOSED SOLUTION

The proposed YOLO8-FASG system builds upon YOLOv8 while incorporating three specialized modules to better handle the complexities of underwater environments.

A. YOLOv8 Foundation

YOLOv8 serves as the backbone of YOLO8-FASG. It features:

• Anchor-Free Design: Simplifies bounding box predictions.

• Decoupled Head: Separates classification and localization tasks.

• Efficient Backbone: Based on Cross-Stage Partial (CSP) architectures for better gradient flow.

B. Alterable Kernel Convolution (AKConv)

AKConv enhances the model's flexibility by enabling convolutional kernels to change shape during training. This helps the model adapt to fish outlines that may vary dramatically between frames.

Equation (1): New kernel sampling points = Default points + Learned offsets

AKConv thus allows the model to capture irregular and dynamic shapes more effectively.
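The paper does not publish the AKConv implementation, but the idea behind Equation (1) can be sketched with PyTorch's deformable-convolution operator: a small convolution predicts per-pixel offsets that are added to the default kernel sampling points. The module below is a minimal illustrative sketch under that assumption, not the authors' exact code; the names and sizes are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class AKConvSketch(nn.Module):
    """Minimal sketch of an alterable-kernel convolution (Equation (1)):
    sampling points = default grid points + learned offsets."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.k = k
        # Predicts 2 offsets (dy, dx) for each of the k*k sampling points.
        # Zero-initialized, so training starts from the default rigid grid.
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset_pred.weight)
        nn.init.zeros_(self.offset_pred.bias)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_pred(x)      # learned, content-dependent offsets
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)

feat = torch.randn(1, 64, 60, 80)          # an example feature map
print(AKConvSketch(64, 128)(feat).shape)   # torch.Size([1, 128, 60, 80])
```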
C. Global Attention Mechanism (GAM)

The GAM module applies sequential channel attention and spatial attention layers, empowering the model to:

• Suppress irrelevant background regions.

• Highlight significant fish features.

• Improve feature reuse across layers.

GAM is critical for handling cluttered underwater scenes where distinguishing the object from the background is difficult.
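A compact sketch of this sequential channel-then-spatial attention, in the spirit of the published GAM design, is shown below. The reduction ratio and layer sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GAMSketch(nn.Module):
    """Sketch of GAM-style sequential channel + spatial attention."""
    def __init__(self, ch: int, r: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(ch, ch // r),
            nn.ReLU(inplace=True),
            nn.Linear(ch // r, ch),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(ch, ch // r, 7, padding=3),
            nn.BatchNorm2d(ch // r),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 7, padding=3),
            nn.BatchNorm2d(ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: an MLP over channels, applied at every position.
        w = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(w)
        # Spatial attention: 7x7 convolutions weight informative regions.
        return x * torch.sigmoid(self.spatial(x))
```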
D. Simplified Spatial Pyramid Pooling-Fast (SimSPPF)

SimSPPF replaces the heavy SPPF module by:

• Using ReLU instead of SiLU activation.

• Preserving multi-scale feature extraction.

• Reducing computational load, boosting real-time performance.

This makes YOLO8-FASG ideal for deployment on embedded systems in underwater robots.
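Because SimSPPF is well documented in prior YOLO work, it can be sketched faithfully: a 1×1 reduction, three chained 5×5 max-pools (equivalent to 5/9/13 pooling windows), concatenation, and a 1×1 projection, with ReLU in place of SiLU. Channel sizes below are illustrative.

```python
import torch
import torch.nn as nn

class SimSPPF(nn.Module):
    """SPPF with ReLU instead of SiLU: three chained 5x5 max-pools
    approximate 5/9/13 pooling windows at low cost."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        hid = in_ch // 2
        self.cv1 = nn.Sequential(nn.Conv2d(in_ch, hid, 1, bias=False),
                                 nn.BatchNorm2d(hid), nn.ReLU(inplace=True))
        self.cv2 = nn.Sequential(nn.Conv2d(hid * 4, out_ch, 1, bias=False),
                                 nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(5, stride=1, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        # Concatenate the original and pooled maps, then project.
        return self.cv2(torch.cat([x, y1, y2, self.pool(y2)], dim=1))
```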
SYSTEM ARCHITECTURE DIAGRAM

Fig. 1. Block Diagram of YOLO8-FASG System

Architecture

The system architecture of the YOLO8-FASG-based underwater fish detection and monitoring system is depicted in Fig. 1. The process begins with capturing an input image, which undergoes an enhancement stage to improve visibility and correct distortions typical of underwater conditions. Following enhancement, a thresholding operation is applied to emphasize key features and suppress background noise. Enhanced images and pre-labeled datasets are then fed into the YOLOv8 detection model, where fish species are identified and localized within the scene. The detection output is processed by an Arduino microcontroller, which acts as the central control unit for external device operations. The microcontroller drives several output actions: it activates a buzzer to provide a sound alert upon successful fish detection, displays real-time results on an LCD screen for human monitoring, and controls a gear motor through a motor driver to simulate robotic actuation if needed. Power for the entire system is managed through a dedicated power supply unit. This modular setup ensures seamless integration between the deep learning detection model and the physical robotic components, enabling efficient, real-time underwater fish identification and response.
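The enhancement and thresholding stages are not detailed in the paper; one common realization would be CLAHE contrast enhancement followed by Otsu thresholding, sketched here with OpenCV purely as an assumed example of such a stage.

```python
import cv2

def enhance_and_threshold(path: str):
    """Illustrative pre-detection stage (an assumed realization):
    CLAHE on the luminance channel, then Otsu thresholding to
    suppress background before detection."""
    bgr = cv2.imread(path)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    enhanced = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    gray = cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return enhanced, mask
```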
HARDWARE IMPLEMENTATION

The hardware implementation of the YOLO8-FASG-based underwater fish detection system integrates several electronic components to bridge the software output with real-world actuation. The core of the system is an Arduino UNO microcontroller, responsible for receiving detection signals from the YOLOv8 output layer. Upon the successful detection of fish, the Arduino triggers a buzzer to produce an immediate sound alert and updates an LCD display to visually indicate the system's detection status. A motor driver is interfaced with the Arduino to control the gear motor, simulating a robotic response mechanism based on detection outcomes.
Fig. 2. Hardware Setup for YOLO8-FASG Implementation
A dedicated power supply unit ensures a stable and consistent voltage supply to all modules, including the microcontroller, motor driver, and peripherals. The LCD screen provides clear textual feedback, showing messages such as “Wait for the Fish Detection” or “Found a Place More Fish” based on real-time system status. The buzzer serves as an audible confirmation to nearby users or operators. Careful integration of the components ensures that the entire setup remains modular, reliable, and capable of functioning in dynamic underwater-inspired environments. This hardware setup validates the practical applicability of the YOLO8-FASG system beyond simulation, confirming its readiness for deployment in autonomous robotic marine systems.
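The paper does not specify how detection results reach the Arduino; a plausible host-side bridge is a one-byte serial message per frame, as sketched below with pyserial. The port name, baud rate, and protocol are assumptions, not details from the paper.

```python
import serial  # pyserial

# Hypothetical one-byte protocol: '1' = fish detected, '0' = no fish.
# Port name and baud rate are assumptions; adapt to the actual wiring.
link = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

def report_detection(fish_found: bool) -> None:
    """Forward a detection result to the Arduino, which then drives the
    buzzer, LCD message, and gear motor as described above."""
    link.write(b"1" if fish_found else b"0")
```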

V. METHODOLOGY

A. Data Acquisition and Preprocessing

The Fish4Knowledge dataset was selected for its comprehensive coverage of marine species and varied underwater imaging conditions. To standardize the input and maintain a balance between computational efficiency and feature clarity, all images were resized to 80×60 pixels. To improve the robustness and generalization ability of the model, several data augmentation strategies—such as horizontal flipping, random rotation, and brightness modulation—were implemented. These augmentations simulate real-world underwater variability and help the model adapt to diverse visual scenarios.
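These preprocessing steps can be expressed with torchvision transforms, as in the sketch below; the flip probability, rotation range, and brightness factor are illustrative assumptions, since the paper gives only the augmentation types.

```python
from torchvision import transforms

# Resize to the 80x60 input described above. torchvision's Resize takes
# (height, width), so 80x60 (width x height) is interpreted as (60, 80).
# Augmentation parameters are assumptions, not the paper's exact settings.
train_tf = transforms.Compose([
    transforms.Resize((60, 80)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.3),
    transforms.ToTensor(),
])
```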
B. Feature Extraction with YOLOv8 Backbone

Feature representation was achieved using the Cross Stage Partial (CSP) backbone embedded in YOLOv8. This backbone architecture utilizes a hierarchy of convolutional layers that effectively capture spatial details at multiple levels. The CSP structure enhances the reuse of learned features while optimizing computation, which is vital for real-time object detection. The extracted features provide semantically rich information essential for high-accuracy localization and classification.

C. Temporal Adaptation via AKConv

To better handle dynamic underwater motion and non-rigid fish shapes, the traditional convolution layers were replaced with Alterable Kernel Convolution (AKConv) modules. These modules introduce flexibility by learning to reshape kernel patterns based on motion and deformation. During training, AKConv learns adaptive offset parameters that allow the convolutional grid to align dynamically with shifting object contours, enhancing detection accuracy for agile and irregularly shaped fish.
D. Enhanced Feature Focus through Attention Mechanisms

To refine the extracted features further, the Global Attention Mechanism (GAM) was employed within the network. GAM sequentially applies both channel and spatial attention, enabling the model to focus on salient fish features while diminishing background interference. This targeted enhancement improves the model's discriminative capability, particularly in environments with high visual complexity and background noise.

E. Multi-Scale Representation via SimSPPF

To capture features at different object scales, the Simplified Spatial Pyramid Pooling-Fast (SimSPPF) module was integrated. SimSPPF performs efficient pooling across various spatial scales using ReLU activation to maintain low latency. This module ensures that fish of different sizes, appearing at varying depths or distances, are effectively detected, contributing to comprehensive and scale-invariant recognition.
F. Real-Time Output Processing and Visualization

Detection outputs are processed through a decoupled head that separates classification from bounding box regression, enhancing modularity and performance. Post-processing involves Non-Maximum Suppression (NMS) to eliminate redundant detections. The final output comprises bounding box coordinates, class probabilities, and species labels—structured for real-time visualization and further analysis or recording.
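The NMS step can be reproduced directly with torchvision; the toy boxes and threshold below are only for illustration.

```python
import torch
from torchvision.ops import nms

# boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,) class confidences.
boxes = torch.tensor([[10., 10., 50., 40.],
                      [12., 11., 52., 42.],   # heavy overlap with box 0
                      [60., 5., 90., 30.]])
scores = torch.tensor([0.92, 0.85, 0.70])
keep = nms(boxes, scores, iou_threshold=0.5)  # suppresses the duplicate
print(keep)  # tensor([0, 2])
```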

G. Model Evaluation and Performance Metrics

To validate the efficiency and reliability of the proposed fish detection framework, a comprehensive evaluation was conducted using standard object detection metrics. Precision, recall, F1-score, and mean Average Precision (mAP) were used to assess classification and localization accuracy. In particular, mAP@0.5 and mAP@0.5:0.95 were computed to measure detection performance across varying Intersection over Union (IoU) thresholds. The system achieved high precision and recall values, demonstrating its robustness in identifying fish under challenging underwater conditions. Real-time performance was also evaluated by calculating frames per second (FPS) on edge devices, confirming the model's suitability for deployment in underwater robotic systems.
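These threshold-based metrics follow their standard definitions; the helper below computes precision, recall, and F1 from matched detection counts. The counts in the example are purely illustrative, chosen to reproduce the reported 92.7%/94.3%.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from detections matched to ground truth
    at a fixed IoU threshold. mAP additionally averages precision over
    recall levels (and over IoU 0.5:0.95 for mAP@0.5:0.95)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only (not taken from the paper):
print(detection_metrics(tp=927, fp=73, fn=56))  # ~ (0.927, 0.943, 0.935)
```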
VI. RESULTS

A. Dataset and Experimental Setup

Experiments were conducted on the Fish4Knowledge dataset, which contains over 27,000 annotated images of marine life under various environmental conditions. The dataset was split into 70% training, 20% validation, and 10% testing subsets. All training and evaluation processes were performed on an NVIDIA RTX 4090Ti GPU using PyTorch.
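A baseline YOLOv8 run of this kind can be reproduced with the public Ultralytics API, as sketched below. The custom AKConv/GAM/SimSPPF modules are not part of that package, and the dataset YAML path and hyperparameters here are assumptions.

```python
from ultralytics import YOLO

# Baseline YOLOv8 training on a Fish4Knowledge-style dataset; the YAML
# file name and training settings are illustrative assumptions.
model = YOLO("yolov8n.pt")
model.train(data="fish4knowledge.yaml", epochs=100, imgsz=640, device=0)
metrics = model.val()  # reports precision, recall, mAP@50, mAP@50-95
```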

B. Evaluation Metrics

Performance was measured using standard object detection metrics: Precision, Recall, F1-Score, mean Average Precision at IoU 50% (mAP@50), and mAP averaged from IoU 50% to 95% (mAP@50–95). These metrics provide a comprehensive evaluation of detection accuracy, robustness, and consistency across varying overlap thresholds.

C. Quantitative Performance
YOLO8-FASG achieved a precision of 92.7%, recall of 94.3%, mAP@50 of 91.5%, and mAP@50–95 of 88.1%, outperforming baseline YOLOv8 results across all metrics. Improvements were most significant in scenarios involving small, fast-moving, and partially occluded fish, demonstrating the model's robustness in challenging underwater conditions.

D. Visual Results

Detection results showed that YOLO8-FASG could accurately identify fish species even under poor lighting, turbidity, and cluttered backgrounds. Bounding box predictions were tighter and more consistent compared to previous models, reducing the number of false positives and missed detections significantly.

E. Comparison with Existing Systems

When compared against YOLOv3, YOLOv5, and YOLOv8 models, YOLO8-FASG demonstrated superior performance in both detection accuracy and inference speed. The incorporation of AKConv and GAM proved particularly effective in improving small object detection and scene understanding.

F. Real-Time Deployment Feasibility

Inference times were measured to validate real-time deployment capabilities. YOLO8-FASG maintained an average inference speed of 25 frames per second (FPS) on high-resolution underwater footage, making it suitable for integration into AUVs and edge devices for live fish monitoring missions.
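FPS of this sort is typically measured by timing warmed-up inference over a batch of frames; the helper below is a generic sketch, not the authors' benchmarking code.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, frames, warmup: int = 10) -> float:
    """Average inference FPS over preprocessed frames (generic sketch)."""
    for f in frames[:warmup]:                 # warm-up stabilizes the timing
        model(f)
    start = time.perf_counter()
    for f in frames[warmup:]:
        model(f)
    return (len(frames) - warmup) / (time.perf_counter() - start)
```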
VII. SOFTWARE OUTPUT SCREENS

Fig. 3. Fish Detection Output Using YOLO8-FASG Interface

Fig. 4. LCD Output Display Showing Fish Detection Status

Fig. 5. LCD Output Display Showing Fish Detection Status

The output of the proposed YOLO8-FASG system is demonstrated through both software-based detection results and hardware-level responses. On the software side, the graphical user interface processes underwater images by applying a series of enhancements and thresholding operations, feeding them into the YOLOv8 detection model. The output interface displays the loaded image, the preprocessed and segmented image, and finally provides real-time feedback through a detection confirmation message indicating "Fish Detected."

Alongside software validation, the hardware setup reacts to the YOLO8 detection output by triggering an audible alert through a buzzer and updating the LCD display. When no fish are detected, the LCD continuously shows a waiting message. Upon successful fish detection, the display updates dynamically to indicate "Found a Place More Fish," signaling real-time confirmation. This dual-mode output ensures a reliable system that can both visually and physically respond to the environment. The combined functionality of machine learning inference and embedded control demonstrates the feasibility of using YOLO8-FASG for real-time underwater object detection and alert generation, effectively bridging the gap between intelligent sensing and mechanical actuation. The results highlight the robustness and practicality of the system for deployment in autonomous underwater monitoring platforms.
VIII. DISCUSSION

The experimental evaluation of the YOLO8-FASG system demonstrates substantial improvements in underwater fish detection accuracy and operational reliability. By integrating the Alterable Kernel Convolution, Global Attention Mechanism, and Simplified Spatial Pyramid Pooling-Fast modules, the system effectively addresses challenges like background clutter, poor lighting, and irregular fish movements that typically degrade underwater detection performance. The hardware-software integration also confirms the practical viability of the proposed solution, enabling real-time detection alerts and mechanical actuation. However, certain limitations were observed, particularly under extreme conditions such as very high turbidity or sudden fish swarm movements, where detection accuracy showed slight degradation. Additionally, while the Arduino microcontroller proved sufficient for basic output control, future versions of the system could benefit from more powerful embedded processors to handle complex real-time analytics. Despite these challenges, the YOLO8-FASG model achieves a highly favorable balance between accuracy, speed, and resource efficiency, making it a strong candidate for deployment in autonomous underwater monitoring missions. Further improvements, such as video-based temporal smoothing and deployment on energy-efficient edge devices, could further enhance system performance and extend its operational capabilities in dynamic aquatic environments.

IX. CONCLUSION

This study presented YOLO8-FASG, a high-accuracy fish detection model tailored for underwater robotic systems. Building upon the YOLOv8 foundation, the integration of Alterable Kernel Convolution, Global Attention Mechanism, and Simplified Spatial Pyramid Pooling-Fast significantly improved detection accuracy, robustness, and processing speed. Experimental results on the Fish4Knowledge dataset demonstrated substantial gains in precision, recall, and mAP metrics compared to conventional YOLO models. The proposed system proved highly effective in challenging underwater environments, handling issues such as turbidity, background noise, and object deformation with greater reliability. Moreover, YOLO8-FASG maintained real-time processing capability, making it suitable for deployment on autonomous underwater vehicles and embedded platforms. By addressing the key limitations of existing detection frameworks, YOLO8-FASG represents a step forward in intelligent marine robotics and automated aquatic ecosystem monitoring. Future work will focus on expanding the model to video-based temporal consistency and optimizing deployment on low-power edge devices for extended field operations.
Competing Interests:
The authors declare no competing interests.

Funding Information:
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author Contribution:
Mathew Fedrick I conceptualized the study and led the experimental design. Sikkandhar Batcha J supervised the project. Hari Darsan C and Haris Nihaal S A contributed to software development and data analysis. Ciril Japeth V supported the literature review and documentation. All authors reviewed and approved the final manuscript.

Data Availability Statement:
The dataset used in this study (Fish4Knowledge) is publicly available at http://groups.inf.ed.ac.uk/f4k/
Research Involving Human and/or Animals:
Not Applicable.

Informed Consent:
Not Applicable.

X. REFERENCES

[1] E. Zereik, M. Bibuli, N. Mišković, P. Ridao, and A. Pascoal, "Challenges and future trends in marine robotics," Annu. Rev. Control, vol. 46, pp. 350–368, 2018.

[2] Z. Li, D. Cai, J. Wang, Y. Li, G. Gui, X. Sun, N. Wang, J. Zhang, H. Liu, and G. Wang, "Machine learning based dynamic correlation on marine environmental data using cross-recurrence strategy," IEEE Access, vol. 7, pp. 185121–185130, 2019.

[3] J. Glenn. (2023). Ultralytics YOLOv8. [Online]. Available: https://github.com/ultralytics/ultralytics

[4] A. Rova, G. Mori, and L. M. Dill, "One fish, two fish, butterfish, trumpeter: Recognizing fish in underwater video," in Proc. MVA, 2007, pp. 404–407.

[5] J. Matai, R. Kastner, G. R. Cutter, and D. A. Demer, "Automated techniques for detection and recognition of fishes using computer vision algorithms," in Proc. Nat. Mar. Fisheries Service Automated Image Process. Workshop, K. Williams, C. Rooper, and J. Harms, Eds., Sep. 2010, Paper no. NMFS-F/SPO-121.

[6] Y.-H. Hsiao, C.-C. Chen, S.-I. Lin, and F.-P. Lin, "Real-world underwater fish recognition and identification using sparse representation," Ecological Informat., vol. 23, pp. 13–21, Sep. 2014.

[7] S. Palazzo and F. Murabito, "Fish species identification in real-life underwater images," in Proc. 3rd ACM Int. Workshop Multimedia Anal. Ecol. Data, France, Nov. 2014, pp. 13–18.

[8] A. Salman, A. Jalal, F. Shafait, A. Mian, M. Shortis, J. Seager, and E. Harvey, "Fish species classification in unconstrained underwater environments based on deep learning," Limnol. Oceanogr. Methods, vol. 14, no. 9, pp. 570–585, Sep. 2016.

[9] P. Zhuang, L. Xing, Y. Liu, S. Guo, and Y. Qiao, "Marine animal detection and recognition with advanced deep learning models," in Proc. CLEF Working Notes, 2017, pp. 166–177.

[10] J. Zhao, W. Bao, F. Zhang, S. Zhu, Y. Liu, H. Lu, M. Shen, and Z. Ye, "Modified motion influence map and recurrent neural network-based monitoring of the local unusual behaviors for fish school in intensive aquaculture," Aquaculture, vol. 493, pp. 165–175, Aug. 2018.