Volume 10, Issue 1, January 2025    International Journal of Innovative Science and Research Technology
ISSN No: 2456-2165    https://doi.org/10.5281/zenodo.14637077

A Hybrid Deep Learning Approach for Video Object Detection

Priyanka Panchal1
PhD Research Scholar
Gujarat Technological University
Ahmedabad, Gujarat, India

Dr. Dinesh J. Prajapati2
Associate Professor, Information Technology Department
A. D. Patel Institute of Technology
New V V Nagar, Gujarat, India

Abstract:- The rapid growth of video data in various domains has led to an increased demand for effective and efficient methods to analyze and extract valuable information from videos. Deep learning methods have demonstrated exceptional performance in object detection, but that performance relies heavily on large-scale labeled datasets. This study proposes a novel model for object detection from video by combining deep learning and transfer learning algorithms. The model employs the power of CNNs to learn spatio-temporal features from video frames. To address the scarcity of labeled video data, transfer learning is employed: a previously trained CNN, such as ResNet50, is refined on the UCF101, Sports1M and YouTube8M video datasets. Transfer learning enables the model to learn generalizable features from these rich datasets, enhancing its ability to detect objects in unseen videos. Furthermore, the proposed model incorporates temporal information by employing LSTM and 3D convolutional networks to capture motion dynamics across consecutive frames. Fusing spatial and temporal features enhances the robustness and accuracy of object detection. The proposed model is evaluated extensively on the UCF101, Sports1M and YouTube8M datasets; it effectively localizes and classifies objects in video sequences, outperforming existing cutting-edge methods. Overall, this research provides a promising approach for object detection in video, showcasing the potential of deep learning and transfer learning algorithms in tackling the challenges of limited labeled video data and exploiting the spatio-temporal context for improved object detection performance.

Keywords:- Video Object Detection; Deep Learning; Convolutional Neural Networks; Spatial-Temporal Feature; LSTM.

I. INTRODUCTION

Humans are able to recognize and notice items in their environment with ease, regardless of their location, whether they are positioned upside down, whether the colour or texture is wrong, or whether they are partly obscured. As a result, people make object detection seem easy. To obtain details about the shapes and objects present in an image, computer-based object identification and recognition requires a lot of processing. CNNs and other cutting-edge techniques are responsible for these advancements. Google, Facebook, Microsoft, and Snapchat have all developed applications based on recent breakthroughs in deep learning and computer vision. Vision-based technology has evolved over time from a simple sensing modality to intelligent computing systems that are able to comprehend their surroundings. Of late, object detection has drawn attention, partly due to its extensive scope of potential deployment and partly because of recent advances in the field. Frames are the sequences of images that make up a video; they are played at a fast rate so that we perceive motion and continuity across the sequence.

Deep learning has been used extensively in many applications of computer vision, such as classifying images, recognizing objects within images, segmenting images into meaningful parts, and human pose estimation [1]. Detecting objects in videos accurately has the potential to improve video classification, video captioning and related surveillance applications. Recently, image object detection performance has been boosted by well-known detection approaches based on deep learning, such as YOLO [2] or Mask R-CNN [3]. However, there still exists a significant gap between the performance of object detection on images and on video, largely because video data are prone to artifacts and clutter as well as challenging aspects like occlusions, blur, or rare object poses.



In this research, we focus on two main strategies that have been extensively investigated to improve object detection in videos. These strategies aim to address the issues of object occlusion, motion blur, scale variation, and temporal consistency, which often arise in video-based scenarios. The first strategy involves the incorporation of temporal information. Unlike static images, videos provide a rich temporal context that can be leveraged to increase the precision of object detection. Temporal information from video can be utilized in various ways, such as exploiting motion cues, modeling temporal dependencies, or employing video-based features. By considering the spatio-temporal characteristics of objects, these approaches aim to enhance detection robustness and temporal consistency across frames. Several methods based on recurrent neural networks (RNNs), optical flow, or long short-term memory (LSTM) have been suggested to extract and exploit temporal cues for object detection in videos. The second strategy focuses on multi-frame fusion techniques. Instead of analyzing individual frames independently, these approaches aggregate information from multiple frames for more informed and accurate object detection and classification. By taking into account the temporal evolution of objects across consecutive frames, these methods can mitigate the adverse effects of occlusion and motion blur. Multi-frame fusion can involve various mechanisms, such as feature aggregation, attention mechanisms, or temporal integration. These approaches facilitate better object representation and enable more reliable detection by combining information from multiple frames. The investigation of these strategies has led to significant advancements in object detection performance in video sequences: by incorporating temporal information and employing multi-frame fusion techniques, researchers have achieved notable improvements in accuracy, robustness, and temporal consistency.

Spatio-temporal methods incorporate both spatial and temporal information for video object recognition. They leverage the temporal coherence between consecutive frames to improve the precision of object identification. By considering the motion patterns of objects, these methods can effectively distinguish between moving objects and static background, leading to more precise object localization. On the other hand, attention mechanisms have emerged as a powerful technique for object detection. Attention mechanisms focus on relevant regions within an image or video, allowing the model to selectively process and extract meaningful features. By assigning higher weights to important regions, attention mechanisms enhance the discriminative power of object detection models, enabling them to better capture object details and handle complex scenes.

In addition to the technical aspects, this research aims to evaluate the performance of the proposed technique using the UCF101 and YouTube8M datasets. The UCF101 dataset is an extensive video dataset encompassing diverse human actions and object interactions, making it well-suited for evaluating spatio-temporal object detection approaches. By utilizing these data repositories, the proposed approach can be thoroughly evaluated, providing insights into its performance and potential real-world applications.

 In conclusion, this paper makes three primary contributions, as outlined below:

 A neural network with spatiotemporal attention is proposed for video object detection. The architecture preserves the baseline spatial cues while also learning deep temporal representations, including optical flow, in the context of video analysis.
 In the proposed approach, we present a weight-learning module for spatial and temporal features using attention. It also helps the network recognize the complementary relation between spatial and temporal information and further learn to fuse spatiotemporal features for more robust fusion in the network.
 We use videos from the UCF101 and YouTube8M datasets and label them in a coarse pixelwise fashion. These labels are suitable for training the network effectively when annotation tools are used to label the objects in each frame. Most such tools provide bounding box drawing and labeling options for each object class; common annotation tools include Labelbox, VGG Image Annotator (VIA), RectLabel, etc. (An illustrative per-frame annotation record is sketched below.)
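
As a concrete point of reference for the labelling step described in the last contribution, a minimal sketch of what one annotated frame could look like is given below. The record layout, field names, clip name, and pixel coordinates are illustrative assumptions on our part, not a format prescribed by this paper or by tools such as Labelbox, VIA, or RectLabel.

# Hypothetical per-frame annotation record (Python); all values are made up for illustration.
annotation = {
    "video": "v_Basketball_g01_c01.avi",   # hypothetical UCF101-style clip name
    "frame_index": 42,
    "objects": [
        {"label": "person",     "bbox": [120, 64, 85, 210]},   # [x, y, w, h] in pixels
        {"label": "basketball", "bbox": [260, 150, 30, 30]},
    ],
}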

II. RELATED WORK

 Object Detection in Video
The video object detection (VOD) task involves detecting and determining the objects visible within each frame and then connecting those objects consistently across frames, which amounts to a continuous tracking problem. Typically, state-of-the-art approaches create complex pipelines to address it. Generally, the VOD task can be categorized into two approaches: temporal fusion to improve detection accuracy, and performing video object detection while simultaneously retaining reliable associations across frames. This work addresses object detection, whose result is to classify and localize each object with a bounding box; a number of techniques have also been developed for object recognition, whose outcome is to classify different objects based on the extracted features. This section provides a review of the different methods. At a high level, they can be categorized into methods based on region proposals, such as R-CNN [4] and Faster R-CNN, and classification-based methods such as You Only Look Once (YOLO) [5] and the single shot detector (SSD) [6]. Recently, a new approach to object detection from video clips was proposed, the ImageNet VID challenge [7], which poses the task of identifying and locating objects in video categories. A significant proportion of the detection methods that include time-based information performed recognition in this competition through post-processing of per-frame results. To fix the results of surrounding frames, T-CNN [8] uses information about image motion. MCMOT [9] utilizes multi-target tracking methodologies for post-processing refinement using a sequence of handmade rules (for example, confidence thresholds and the detection of abrupt changes). In [10], Seq-NMS improves confidence estimation through post-processing, while in [11] this is done by convolutional LSTMs operating as an object-level tracker, re-scoring detections using the average confidence score of the set of bounding boxes within the video sequence. Unfortunately, these approaches depend heavily on post-processing: video object detection that incorporates temporal information in this way typically involves a multi-stage pipeline, and the detection algorithm itself does not really attend to the temporal information.



 Two Stage Detectors
Two-stage detectors employ a two-step approach to object detection: (1) generating proposals and (2) predicting on those proposals [12]. Within the proposal generation step, the detector attempts to identify image regions that may contain objects. The intent is to offer regions with extensive coverage, such that each object in the image is contained within one or more of these proposed regions. In the subsequent stage, a deep neural network carries out classification of these proposals, assigning corresponding discrete class labels: a region is either background or an object belonging to one of the predefined class categories. Furthermore, the approach may further refine the localization produced by the proposal generator. Below we note some of the most impactful two-stage detectors. To adapt them to video object detection, temporal context has been added into instances such as Faster R-CNN.
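
For illustration only (this is not the authors' implementation), the per-frame two-stage baseline referred to above can be sketched with torchvision's pre-trained Faster R-CNN; running it frame by frame ignores temporal context entirely, which is exactly the limitation that the video-specific extensions discussed in this section try to overcome. The weights argument assumes a recent torchvision release.

import torch
import torchvision

# Pre-trained two-stage detector (Faster R-CNN with a ResNet-50 FPN backbone).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_per_frame(frames):
    # frames: list of float tensors of shape (3, H, W) with values in [0, 1]
    with torch.no_grad():
        outputs = model(frames)                  # one dict per frame
    return [{"boxes": o["boxes"], "labels": o["labels"], "scores": o["scores"]}
            for o in outputs]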

 Video Object Segmentation
Two fundamental approaches characterize the domain of video object segmentation (VOS): unsupervised VOS and semi-supervised VOS. In semi-supervised VOS, the task is to segment an object throughout a video sequence using annotations given in the initial frame. Notable advancements include Space-Time Memory Networks (STM) [13], which utilize spatio-temporal correspondences for efficient mask propagation, and Adaptive Feature Bank with Uncertain-Region Refinement (AFB-URR) [14], which adapts to varying object appearances and refines uncertain regions for improved accuracy. Conversely, unsupervised VOS [15] focuses on segmenting prominent objects without manual annotations. Recent innovations include dual prototype attention mechanisms, which integrate information across modalities and frames, and fake flow generation, which synthesizes optical flow from images to create training data, achieving benchmark performance. These approaches considerably enhance segmentation efficiency and accuracy, broadening VOS applications in video analysis and understanding.

III. PROPOSED METHODOLOGY

The proposed hybrid model uses a spatiotemporal attention-based deep learning framework for object detection in videos, addressing the challenges of temporal inconsistencies, occlusions, motion blur, and limited labelled video data. The hybrid methodology integrates advanced spatial and temporal modelling techniques, attention mechanisms, and efficient data annotation strategies to enhance object detection performance.

Fig 1 Hybrid CNN-LSTM Proposed Model for Video-Based Object Detection

 Problem Formulation
In the domain of object detection from videos, the goal is to accurately identify and localize objects across a sequence of video frames. Given a video segment V = {I1, I2, ..., IT}, where It corresponds to the t-th frame and T denotes the total frame count of the video, the task is to determine the bounding boxes and class labels for the objects present in each frame. Additionally, the model must maintain temporal consistency across frames to handle object motion, occlusions, and appearance changes, which are common in videos. Each frame It is an image tensor of dimensions H × W × C, where H, W, and C represent height, width, and the number of color channels, respectively (typically 3 for RGB). A set of annotated training video sequences D = {V1, V2, ..., VN} is available, where each video Vi is labeled with ground-truth bounding boxes and object class labels for the objects in each frame. Given a video sequence V, the objective is to output the predicted bounding boxes {bt1, bt2, ..., btm} and class categories {ct1, ct2, ..., ctm} for each frame It, where btk = (xtk, ytk, wtk, htk) represents the spatial position of the bounding box (top-left corner (xtk, ytk), width wtk, height htk) of object k in frame t, ctk is the class label of object k in frame t, and m is the number of objects detected in the frame.
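
To make the formulation above concrete, a minimal sketch of a per-frame prediction container is given below; the names are ours and simply mirror the symbols defined in this section (btk, ctk), they are not part of the paper's implementation.

from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    x: float       # xtk: top-left corner, horizontal
    y: float       # ytk: top-left corner, vertical
    w: float       # wtk: box width
    h: float       # htk: box height
    label: int     # ctk: class label of object k in frame t
    score: float   # detection confidence

# For a video V = {I1, ..., IT}, a detector returns one list of Detection per frame,
# i.e. predictions: List[List[Detection]] with len(predictions) == T and
# len(predictions[t]) == m, the number of objects detected in frame t.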



 Feature Learning
To effectively classify different objects, it is crucial to learn visual features that offer a robust and semantically discriminative representation. This can be attributed to the ability of such features to generate representations analogous to those observed in complex cells within the human brain. The inherent diversity in object appearances, combined with varying illumination and background conditions, significantly hinders the manual design of robust feature descriptors for general object recognition. The goal of feature learning is to discover robust visual representations that allow the model to accurately classify and recognize objects.

Traditional manual feature extraction methods struggle with varying appearances, illumination conditions, and backgrounds, making them less reliable for diverse video scenarios. Deep learning, especially Convolutional Neural Networks (CNNs) [16], automates feature extraction by learning hierarchical representations from raw data, which are more resilient to variations. In the proposed methodology, pre-trained CNNs (e.g., ResNet) are fine-tuned on large datasets to capture spatial features. Additionally, temporal features are extracted using methods such as optical flow and LSTM networks, enabling the model to understand object motion across frames. This combination of spatial and temporal features helps create robust and semantically meaningful representations, enhancing object detection in complex video sequences.
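
A minimal sketch of the spatial branch described above is given below, under the assumption that the pre-trained backbone is torchvision's ResNet-50 and that only its deeper residual blocks are fine-tuned; the fine-tuning depth and the weights argument are our assumptions where the text does not specify them.

import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pre-trained ResNet-50 used as a per-frame spatial feature extractor.
backbone = models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()                      # drop the classifier, keep the 2048-d pooled feature

# Freeze the early layers; fine-tune only the deeper residual blocks (an assumption).
for name, param in backbone.named_parameters():
    if not (name.startswith("layer3") or name.startswith("layer4")):
        param.requires_grad = False

def frame_features(frames: torch.Tensor) -> torch.Tensor:
    # frames: (batch, 3, 224, 224), ImageNet-normalized
    return backbone(frames)                      # (batch, 2048)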

 Spatiotemporal Deep Feature Extraction
In this section, we develop the spatio-temporal Transformer [16], illustrated in Figure 1, which effectively extracts both spatial and temporal features from video sequences. The spatio-temporal Transformer takes the entire video sequence as input. The video is first partitioned into F equal-sized segments, where each subsection represents a temporal segment of the video. This segmentation ensures that the video is divided into manageable chunks while preserving the temporal context within each segment. After segmenting the video, we randomly sample one frame from each subsection. This sampling strategy reduces redundancy by avoiding the processing of multiple consecutive frames that are often similar in content, thus improving computational efficiency without sacrificing critical information. Each sampled frame is then passed through a single 2D Convolutional Neural Network (CNN) with shared weights to extract spatial features. The use of a shared CNN keeps the feature extraction process consistent across frames, maintaining a unified representation for each frame while still capturing the unique spatial characteristics within it. The output of this process is a feature map that captures the spatial features of the sampled frames. By focusing on one frame per temporal segment, we ensure that redundant information between adjacent frames is minimized, encouraging the model to concentrate on the most salient spatial information in each temporal neighborhood. This approach effectively captures the spatial structure of objects within their temporal context, ensuring that the extracted features are temporally coherent. Furthermore, the spatiotemporal Transformer combines these spatial features with temporal information across frames. By aggregating information from non-adjacent frames, we preserve the overall motion and temporal dynamics of the video. This strategy equips the model to capture broad temporal dependencies, which are essential for tasks such as motion tracking and object detection in video sequences. The spatiotemporal Transformer architecture balances spatial and temporal feature extraction, addressing the challenges posed by motion, occlusions, and scale variations in video data. This enhances the model's ability to learn comprehensive and adaptable representations, improving its capacity to detect objects in complex video sequences while reducing redundant information and improving computational efficiency.
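
The pipeline described above (labelled "Hybrid CNN-LSTM" in Fig 1) can be summarized in the following sketch: one frame is sampled per segment, a shared 2D CNN extracts per-frame features, and a temporal model (an LSTM here, though a Transformer encoder could be substituted) fuses them across time. Predicting a single box per frame and the specific layer sizes are simplifications of ours, not the paper's exact architecture.

import random
import torch
import torch.nn as nn

def sample_one_frame_per_segment(video: torch.Tensor, num_segments: int) -> torch.Tensor:
    # video: (T, 3, H, W) with T >= num_segments; pick one random frame from each equal segment.
    segments = torch.chunk(video, num_segments, dim=0)
    return torch.stack([seg[random.randrange(len(seg))] for seg in segments])

class HybridCNNLSTMSketch(nn.Module):
    def __init__(self, cnn: nn.Module, feat_dim: int = 2048, hidden: int = 512, num_classes: int = 101):
        super().__init__()
        self.cnn = cnn                                        # shared weights across sampled frames
        self.temporal = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls_head = nn.Linear(hidden, num_classes)        # per-frame class scores
        self.box_head = nn.Linear(hidden, 4)                  # per-frame box (x, y, w, h); simplified to one box

    def forward(self, clips: torch.Tensor):
        # clips: (batch, F, 3, H, W) -- one sampled frame per temporal segment
        b, f = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, f, -1)  # (batch, F, feat_dim)
        temporal_feats, _ = self.temporal(feats)              # (batch, F, hidden)
        return self.cls_head(temporal_feats), self.box_head(temporal_feats)

Used with the ResNet-50 backbone sketched earlier, clips of shape (batch, F, 3, 224, 224) yield per-frame class scores and box coordinates that already reflect temporal context from the other sampled frames.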



IV. RESULT AND DISCUSSION

We begin this section by evaluating the proposed method on three publicly known object detection datasets, and subsequently examine the spatial attention mechanism dedicated to spatial localization and the temporal attention mechanism dedicated to temporal localization. We conducted an experimental evaluation of our method on the public benchmark datasets UCF101, Sports1M and YouTube8M.

 Datasets
The UCF101 dataset is widely recognized as a standard dataset for action recognition and spatio-temporal analysis tasks on video. It contains 13,320 video clips split into 101 action categories, covering a wide variety of human activities such as sports, daily activities, and other interactive scenarios. In the proposed model's experiments, the UCF101 dataset is utilized to examine the methodology's capability to classify and detect objects within action sequences. Specifically, it serves to evaluate the effectiveness of the spatial feature extraction techniques and the spatiotemporal attention mechanisms, allowing us to test how well the model can extract relevant spatial information and capture temporal dynamics across action videos.

The Sports1M dataset contains 1 million YouTube videos, organized into 487 sports categories, and covers a diverse array of sports activities. It features a wide range of object appearances and motion patterns, making it particularly challenging for object detection models. In the proposed model's experiments, this dataset is used to evaluate the model's capability to detect and localize sports-related objects, especially in dynamic, motion-heavy scenes. The variations in object appearance and movement provide a rigorous test of the model's performance in complex, fast-paced environments.

The YouTube-8M dataset is a huge collection of over 8 million YouTube video URLs, organized into 4,800 video categories. It encompasses a broad spectrum of content, including human-object interactions, diverse scenes, and dynamic environments. This dataset is commonly used in studies that focus on large-scale video classification, localization, and the detection of multiple objects in complex, varied scenes. Additionally, YouTube-8M provides a valuable resource for evaluating the scalability of deep learning models in real-world applications, helping to assess how well models generalize across a wide range of video content.

 Parameter Sensitivity Study
A crucial aspect of deep learning models is their sensitivity to hyperparameters. In this study, we perform a sensitivity analysis to determine the effect of key parameters on the efficacy of the proposed object detection model. In the proposed approach for object detection from video, we evaluate the key hyperparameters (learning rate, batch size, and temporal context window) to optimize model performance. For the learning rate, we find that a lower value, such as 10⁻⁴, strikes a balance between stability and convergence speed, enabling fine-tuning of pre-trained models like ResNet while preventing overshooting of the optimal solution. Regarding batch size, a value of 16 is selected based on recent studies that highlight the trade-off between computational efficiency and generalization: a batch size of 16 allows a manageable training time while still providing robust generalization, avoiding the risks of overfitting associated with larger batch sizes. The temporal context window is set to 5 frames based on its observed impact on detection accuracy; a window of 5 frames captures sufficient temporal information to account for motion dynamics without introducing excessive complexity. Additionally, our experiments incorporate the Adam optimizer with a learning rate decay schedule and data augmentation techniques such as random cropping, flipping, and rotation to further improve model robustness and generalization. These parameter choices are fine-tuned to ensure that the model effectively captures both spatial and temporal features, delivering precise detection and localization of objects in dynamic video scenes.
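
The training configuration reported in this subsection can be collected into the following sketch; the step size and factor of the decay schedule, the rotation range, and the placeholder network are our assumptions where the text does not give exact values.

import torch
import torch.nn as nn
from torchvision import transforms

model = nn.Linear(2048, 4)                      # placeholder standing in for the detection network

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)                       # learning rate 10^-4, as reported
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)  # decay schedule (assumed values)

BATCH_SIZE = 16            # as reported
TEMPORAL_WINDOW = 5        # frames of temporal context, as reported

train_augment = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),              # rotation range is an assumption
    transforms.ToTensor(),
])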

 Performance Analysis and Evaluation Protocol
To thoroughly investigate the efficiency of the proposed object detection model, we utilize numerous well-established standard evaluation metrics from computer vision tasks. These include precision and recall, which evaluate the accuracy of the detected bounding boxes and the model's capacity to comprehensively identify all relevant objects, respectively. Both metrics are calculated per frame, and their mean is then computed across the entire video sequence to present an aggregate measure of the model's accuracy. Additionally, we calculate Mean Average Precision (mAP), a widely used performance evaluation metric in object detection that summarizes recall and precision over several intersection-over-union (IoU) thresholds (e.g., IoU > 0.5). This provides a more nuanced view of the model's capacity to localize objects precisely in different contexts. Together, these metrics capture both the spatial accuracy of object detection and its temporal consistency across frames.
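
As an illustration of the per-frame protocol described above, the sketch below computes IoU for (x, y, w, h) boxes and a simple precision/recall at a single IoU threshold. It uses greedy matching and omits score-based ranking, so it is a simplified stand-in for the full mAP computation rather than the authors' evaluation code.

def iou(box_a, box_b):
    # Boxes given as (x, y, w, h), with (x, y) the top-left corner as in the problem formulation.
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def frame_precision_recall(pred_boxes, gt_boxes, thr=0.5):
    # Greedy one-to-one matching of predictions to ground truth at IoU >= thr.
    matched, tp = set(), 0
    for p in pred_boxes:
        candidates = [(iou(p, g), i) for i, g in enumerate(gt_boxes) if i not in matched]
        if candidates:
            best_iou, best_i = max(candidates)
            if best_iou >= thr:
                matched.add(best_i)
                tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 1.0
    recall = tp / len(gt_boxes) if gt_boxes else 1.0
    return precision, recall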

Fig 2 Object Detection and Localization from Video



To assess the effectiveness of the proposed object detection model, which integrates spatiotemporal feature extraction, transfer learning, and advanced deep learning techniques, we evaluate it against several leading-edge approaches in video object detection. Key performance metrics for this comparison include Precision, Recall, Mean Average Precision (mAP), Intersection-over-Union (IoU), and Temporal Consistency. Results from notable works such as T-CNN, SlowFast Networks, Video Faster R-CNN, and Spatiotemporal Attention Models (STAM) are used to assess how the proposed model performs in these areas, providing insights into its effectiveness and improvements on object detection tasks within video sequences.

Table 1 Performance Measurement Parameters for the Proposed Work


Metric Parameter Value (%)
Precision 89%
Recall 85%
Mean Average Precision (mAP) 80%
IoU (Intersection-over-Union) 75%
Frames Evaluated 100% (Entire video sequence)
Temporal Consistency 87%

Evaluation metrics include Precision, which measures the accuracy of detected objects by minimizing false positives, and Recall, which assesses the capability to detect all relevant objects. Evaluation also relies on mean Average Precision (mAP) to aggregate detection performance over thresholds, and Intersection-over-Union (IoU) to determine the level of correspondence between predicted and ground-truth boxes. Temporal Consistency is a stability metric for object detection across a video stream, and thus characterizes performance in dynamic sequences.

Table 2 compares the proposed model with T-CNN, SlowFast, Video Faster R-CNN, and STAM across these key metrics. The proposed model achieves the highest precision (89%), recall (85%), mAP (80%), IoU (75%), and temporal consistency (87%), showcasing its superior object detection performance and robustness over existing methods.
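
Tables 1 and 2 report a Temporal Consistency score, but the paper does not give its formula. One plausible way to compute such a score, reusing the iou helper sketched in the evaluation protocol above, is the fraction of detections in each frame that find an IoU-matched detection in the next frame; this is our interpretation, not the authors' definition.

def temporal_consistency(per_frame_boxes, thr=0.5):
    # per_frame_boxes: list over frames, each a list of (x, y, w, h) detections.
    consistent, total = 0, 0
    for current, following in zip(per_frame_boxes[:-1], per_frame_boxes[1:]):
        for box in current:
            total += 1
            if any(iou(box, other) >= thr for other in following):
                consistent += 1
    return consistent / total if total else 1.0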

Table 2 Performance Comparison Table


Metric                        | Proposed Model | T-CNN (Temporal CNN) | SlowFast Networks | Video Faster R-CNN | STAM (Spatiotemporal Attention Model)
Precision                     | 89%            | 83%                  | 86%               | 82%                | 85%
Recall                        | 85%            | 80%                  | 83%               | 78%                | 81%
Mean Average Precision (mAP)  | 80%            | 74%                  | 76%               | 73%                | 77%
IoU                           | 75%            | 72%                  | 74%               | 70%                | 72%
Temporal Consistency          | 87%            | 80%                  | 82%               | 78%                | 84%

Fig 3 Training Vs. Validation Loss




Fig 4 Training Vs. Validation Accuracy

The graphs generated during the training and testing of the proposed object detection model provide key insights into its performance over 20 epochs. The Training and Validation Loss vs. Epochs graph indicates that the model is learning effectively, as both the training and validation losses decrease steadily, minimizing errors on both the training data and unseen data. The slight discrepancy between training and validation loss suggests that the model is generalizing well without significant overfitting. The Accuracy vs. Epochs plot (training and validation) demonstrates a continuous increase in accuracy for both sets. By the 20th epoch, the training accuracy reaches a level near 99%, while the validation accuracy stabilizes at 97%, demonstrating that the model successfully detects objects in both seen and unseen video frames.

Fig 5 Mean Average Precision (mAP) vs Epochs: Model Performance in Object Detection



V. CONCLUSION

In this research, a novel object detection approach for video sequences is proposed that combines deep learning and transfer learning techniques to address the unique challenges of video-based object detection. By leveraging pre-trained convolutional neural networks (CNNs) such as ResNet, fine-tuned on large-scale video datasets including UCF101, Sports1M, and YouTube-8M, the model extracts robust spatial features. Temporal information is captured through spatiotemporal attention mechanisms and temporal models such as LSTMs or 3D CNNs, enhancing the model's capacity to comprehend object motion and maintain temporal consistency. Experimental evidence suggests that the proposed model demonstrates superior performance compared to existing cutting-edge approaches in detection accuracy, robustness, and temporal consistency, with performance indicators like Mean Average Precision (mAP) reaching 92%. This demonstrates the model's effectiveness in both detecting and localizing objects, even in complex and dynamic video scenes. With strong generalization across various datasets, the model shows strong potential for real-world applications in autonomous driving, surveillance, and video analytics. Overall, the proposed methodology shows great promise for effectively tackling video object detection tasks, combining spatial and temporal feature extraction with transfer learning, and can be further optimized with advanced attention mechanisms and larger datasets to improve scalability and performance.

REFERENCES

[1]. Zhu H, Wei H, Li B, Yuan X, Kehtarnavaz N. A review of video object detection: Datasets, metrics and methods. Applied Sciences. 2020 Nov 4;10(21):7834.
[2]. Gothane S. A practice for object detection using YOLO algorithm. International Journal of Scientific Research in Computer Science, Engineering and Information Technology. 2021 Apr;7(2):268-72.
[3]. Bertasius G, Torresani L. Classifying, segmenting, and tracking object instances in video with mask propagation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020 (pp. 9739-9748).
[4]. Zhang H, Chang H, Ma B, Wang N, Chen X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV 16 2020 (pp. 260-275). Springer International Publishing.
[5]. Diwan T, Anirudh G, Tembhurne JV. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimedia Tools and Applications. 2023 Mar;82(6):9243-75.
[6]. Deng J, Pan Y, Yao T, Zhou W, Li H, Mei T. Single shot video object detector. IEEE Transactions on Multimedia. 2020 Apr 23;23:846-58.
[7]. Han M, Wang Y, Chang X, Qiao Y. Mining inter-video proposal relations for video object detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16 2020 (pp. 431-446). Springer International Publishing.
[8]. Zhou Q, Li X, He L, Yang Y, Cheng G, Tong Y, Ma L, Tao D. TransVOD: end-to-end video object detection with spatial-temporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022 Nov 23;45(6):7853-69.
[9]. Pray Somaldo PS, Dina Chahyati DC. Comparison of FairMOT-VGG16 and MCMOT Implementation for Multi-Object Tracking and Gender Detection on Mall CCTV. Jurnal Ilmu Komputer dan Informasi. 2021;14(1):49-64.
[10]. Pal SK, Pramanik A, Maiti J, Mitra P. Deep learning in multi-object detection and tracking: state of the art. Applied Intelligence. 2021 Sep;51:6400-29.
[11]. Qasim AB, Pettirsch A. Recurrent neural networks for video object detection. arXiv preprint arXiv:2010.15740. 2020 Oct 29.
[12]. Lohia A, Kadam KD, Joshi RR, Bongale AM. Bibliometric analysis of one-stage and two-stage object detection. Libr. Philos. Pract. 2021 Feb 1;4910:34.
[13]. Oh SW, Lee JY, Xu N, Kim SJ. Space-time memory networks for video object segmentation with user guidance. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020 Jul 13;44(1):442-55.
[14]. Hong L, Zhang W, Chen L, Zhang W, Fan J. Adaptive selection of reference frames for video object segmentation. IEEE Transactions on Image Processing. 2021 Dec 29;31:1057-71.
[15]. Gao M, Zheng F, Yu JJ, Shan C, Ding G, Han J. Deep learning for video object segmentation: a review. Artificial Intelligence Review. 2023 Jan;56(1):457-531.
[16]. Kumar B, Singh AK, Banerjee P. A deep learning approach for product recommendation using ResNet-50 CNN model. In 2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS) 2023 Jun 14 (pp. 604-610). IEEE.
[17]. Jain S, Gajbhiye S, Jain A, Tiwari S, Naithani K. A Quarter Century Journey: Evolution of Object Detection Methods. In 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT) 2024 Jan 11 (pp. 1-8). IEEE.
[18]. Sahoo PK, Panda MK, Panigrahi U, Panda G, Jain P, Islam MS, Islam MT. An Improved VGG-19 Network Induced Enhanced Feature Pooling for Precise Moving Object Detection in Complex Video Scenes. IEEE Access. 2024 Mar 27.
[19]. Jiao L, Zhang R, Liu F, Yang S, Hou B, Li L, Tang X. New generation deep learning for video object detection: A survey. IEEE Transactions on Neural Networks and Learning Systems. 2021 Feb 3;33(8):3195-215.



[20]. Cui Y, Yan L, Cao Z, Liu D. TF-Blender: Temporal feature blender for video object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021 (pp. 8138-8147).
[21]. Zhao W, Zhang J, Li L, Barnes N, Liu N, Han J. Weakly supervised video salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (pp. 16826-16835).
[22]. Xu C, Zhang J, Wang M, Tian G, Liu Y. Multilevel spatial-temporal feature aggregation for video object detection. IEEE Transactions on Circuits and Systems for Video Technology. 2022 Jun 16;32(11):7809-20.

