Performance comparison of optical flow and background subtraction and discrete wavelet transform methods for moving objects
Corresponding Author:
Monika Sharma
Department of Computer Science and Engineering, Galgotia’s University
Greater Noida, NCR, India
Email: [email protected]
1. INTRODUCTION
Object detection is a technique used in computer vision to identify and locate items in both video
and still images. Object detection algorithms typically rely on machine learning or deep learning techniques
to obtain meaningful findings. Humans can quickly identify and find objects of interest when they view
visual material [1]. Object detection seeks to replicate this level of cognitive ability in a computational
framework. Various disciplines are currently allocating resources to the investigation of automated video
surveillance. Advancements in modern technology have reached a stage where it is economically
advantageous to install cameras in a certain location and capture video footage, rather than employing
individuals to constantly examine the recorded footage [2]. Numerous enterprises have already installed
security cameras, capable of capturing footage that can be stored on tape, subject to being overwritten or
stored in a video archive. Detectives can later scrutinize the recorded footage to reconstruct the sequence of events after a criminal act [3], such as a robbery in a store or the theft of a valuable automobile. By then, however, it is too late for prevention or intervention. Such situations can be mitigated by continuously monitoring and analyzing video surveillance feeds: if security agents identify an ongoing robbery or someone exhibiting suspicious behavior in a parking lot, they can promptly intervene to avert the criminal activity.
Video-based surveillance systems [4] allow for the monitoring of many scenes. Video streams can
be utilized to extract information that captures our attention in various applications, such as security,
entertainment, safety, and efficiency enhancement. Video surveillance is also used for event recognition. Recognizing events in an area of interest has numerous possible applications, including but not limited to traffic analysis [5], tracking restricted vehicle movements, and analyzing multi-object interactions. It also alleviates several problems associated with continuous human supervision. The first crucial
step in this approach is to determine whether video samples contain motion. The approach must not only be robust to noise, but it must also segment the video stream to isolate the moving objects. The
presence of rapid variations in light intensity, such as those caused by a light switch, poses a substantial
challenge for detecting moving objects. If the algorithm fails to cope with variations in lighting and camera
movement, background noise will appear in the final output [6]. Dynamic backgrounds, in which scene elements themselves move, worsen the problem. Weather variations and swaying trees may produce inaccurate results during the detection step. Alterations in scenery introduce an additional level of difficulty: a moving object, whether animate or inanimate, may momentarily halt and gradually blend into the surrounding environment. A motion detection system should be able to cope with all of these hurdles [7].
The video surveillance system commences [8] with the detection of motion and objects. Motion
detection involves the process of separating the areas of an image that contain moving objects from the rest
of the image. Background modeling and motion segmentation are commonly employed in the task of
detecting motion and identifying objects. In an image sequence, the objective of motion segmentation is to
identify the sections or areas that correspond to moving objects, such as automobiles, birds, humans, animals,
and so on [9]. When motion is identified in a specific area or region, it is necessary to study these detected
regions for further procedures such as object tracking and behavior analysis. Following the process of motion
and object identification, the video surveillance system typically traces the movement of objects from one
frame to the next in a sequence of images. Behavior analysis entails the examination and identification of motion patterns, the description of actions, and the interactions between objects.
2. RELATED WORK
Automated cars must be able to access accurate, real-time data on the state of objects in their
immediate surroundings if we are to guarantee safe driving. Object occlusion, clutter interference, and a
limited sensor-detecting capability produce false alarms and missed object detection [10]. Thus, it is difficult
to guarantee tracking stability and state prediction in complex traffic conditions. Background subtraction [11]
requires a training sequence devoid of objects to construct a background model, in contrast to object
detectors, which require instances that have been explicitly tagged to train a binary classifier. An important
step toward analytical automation is object recognition without a distinct training phase. Attempts to solve
this problem by analyzing motion data have been made. A popular method for detecting moving objects is
discriminative modeling (DM), which seeks to improve performance in foreground-background separation
using discriminative features and well-designed classifiers [12]. Because class separability is typically poor
in camouflaged regions, DM may fail when confronted with the camouflage problem. To detect camouflaged foreground pixels, a camouflage modeling (CM) approach has been presented [12]. Because of the two-sided nature of camouflage, both the foreground and the background must be modeled.
An innovative framework that incorporates both color and texture information has been developed for background modeling [13]. The foreground decision equation in this framework is composed of three parts: one integrates the two information sources, one carries the texture information, and one carries the color information. This structure exploits the strengths of color and texture cues while avoiding their individual drawbacks. To further accelerate background modeling, a block-based technique is employed. Specifically, unlike the traditional multi-histogram model for block-based background modeling, the texture information is modeled with a single histogram per block, whose bins indicate the occurrence probabilities of various patterns. Based on this process, the dominant
background patterns are selected to determine the background likelihood of upcoming blocks. A novel
method based on fuzzy color difference histogram (FCDH) has been suggested to incorporate fuzzy c-means
(FCM) clustering [14]. The utilization of the FCM clustering technique in CDH mitigates the impact of
intensity variation resulting from fake motion or changes in background illumination, while also reducing the
substantial complexity of the computation's histogram bins. The suggested approach was tested using various
publicly accessible video sequences featuring complex scenarios. Another method extracts moving objects from a frame sequence directly, so neither human interaction in the form of empirical threshold tuning nor the background modeling on which other systems are built is necessary [15]. A full-resolution saliency map of the current frame is created using the constant symmetric difference between the frames adjacent to the current frame. Saliency values on this map highlight moving objects while suppressing the background.
An image descriptor based on optical flow orientation histograms, together with a nonlinear classification technique, has been used to characterize the motion information in each video frame [16]. A nonlinear one-class support vector machine first learns normal behavior from training frames and then identifies unusual events in the current frame. Another optical flow approach begins by applying a Gaussian filter to remove noise from each frame [17]. It then calculates the optical flow between the current frame and the previous frame, and between the current frame and the forthcoming frame. Merging the two optical flow components yields the gross optical flow. An adaptive thresholding post-processing phase removes distracting foreground components, and morphological techniques are then applied to the equalized output to locate moving objects. The
methodology was implemented, deployed, and evaluated on numerous authentic video datasets [18]. The 2D
discrete wavelet transform (DWT) and variance approach were used for object detection and tracking [19].
A comparison of the variance-based method for object detection and localization with the widely used mean-shift method reveals that the latter is slower, leading to slower object detection overall. Notably, this analysis detects and tracks moving objects using only the bandpass components of the 2D-DWT outputs. The Daubechies complex wavelet transform is well-suited for tracking
because of its approximate shift-invariance property. The recommended method can perform object
segmentation from scenes [20]. Following the initial segmentation of the first frame, achieved through the
computation of multiscale correlation of the imaginary component of complex wavelet coefficients, the
subsequent frames track the object by calculating the energy of the complex wavelet coefficients assigned to
the object's region and comparing it to the energy of the surrounding region. The research gap is in the
identification of suitable methods for specific object detection problems. Optical flow provides the most
accurate and detailed motion data, but it is also the most computationally expensive. Background subtraction usually works well in real-time scenarios with well-maintained background models. The DWT provides a distinctive type of information, but additional processing may be necessary after applying it to ensure effective motion detection.
3. METHODS
Identifying and tracking moving objects in images or videos is a fundamental task in the field of computer vision, with a wide range of applications including surveillance, autonomous driving, and human-computer interaction. Various methodologies and strategies are employed for the detection of moving objects; several frequently employed techniques are described below. The common steps for object detection are given in Figure 1.
Computer vision systems can detect objects in video or still images. The image may be preprocessed before being fed to an object detection model; scaling or normalizing pixel values may be needed to meet the model's input requirements. Mathematical models then extract features: such networks learn hierarchical characteristics from images that distinguish objects. Localizing objects in an image is as crucial as categorizing them, so object detection commonly predicts bounding boxes that tightly contain the items of interest. The model then classifies all detected elements, and post-processing after classification refines the results. A minimal preprocessing sketch is given below.
For background subtraction, the background model is updated as a running average of the incoming frames:

$$I_b(x, y, t) = k \cdot I_c(x, y, t) + (1 - k) \cdot I_b(x, y, t-1) \tag{1}$$

where $I_b(x, y, t)$ denotes the background model, $I_c(x, y, t)$ denotes the color intensity of the image at $(x, y)$ in the current frame $t$, and $k$ is the learning rate ($0 < k < 1$).
The current frame's absolute difference (or other metrics like squared difference) from the previous
frame can be used to identify items in the foreground (𝐼𝑓 (𝑥, 𝑦, 𝑡).
The results image after threshold comparison is given as. It is used to classify that image belongs to the
background region or foreground region.
$$I_R(x, y, t) = \begin{cases} 1 & \text{if } I_f(x, y, t) > T \\ 0 & \text{if } I_f(x, y, t) \le T \end{cases} \tag{3}$$
To eliminate noise, morphological operations such as erosion and dilation can be applied to obtain the masked image. In adaptive approaches, the learning rate k is modified according to the size of the pixel differences to accommodate different levels of scene dynamics. A minimal sketch of this background subtraction pipeline is given below.
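The sketch below, assuming Python with OpenCV and NumPy, strings equations (1)-(3) together; the learning rate, threshold, kernel size, and input file name are illustrative assumptions rather than values from this work.

```python
import cv2
import numpy as np

K = 0.05   # learning rate k (0 < k < 1); illustrative value
T = 25     # intensity threshold T of equation (3); illustrative value

cap = cv2.VideoCapture("scene.mp4")  # hypothetical input video
ok, frame = cap.read()
bg = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)  # initial I_b

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)        # I_c(x, y, t)
    diff = cv2.absdiff(gray.astype(np.float32), bg)       # I_f(x, y, t), eq. (2)
    _, mask = cv2.threshold(diff.astype(np.uint8), T, 255,
                            cv2.THRESH_BINARY)            # I_R(x, y, t), eq. (3)
    # Erosion followed by dilation (opening) removes isolated noise pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Running-average update of the background model, eq. (1)
    cv2.accumulateWeighted(gray, bg, K)
```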
For the optical flow method, under the brightness constancy assumption, a first-order Taylor expansion of the image intensity gives:

$$I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t + \text{higher-order terms} \tag{5}$$

$$\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0 \tag{6}$$

$$\frac{\partial I}{\partial x}\left(\frac{\Delta x}{\Delta t}\right) + \frac{\partial I}{\partial y}\left(\frac{\Delta y}{\Delta t}\right) + \frac{\partial I}{\partial t}\left(\frac{\Delta t}{\Delta t}\right) = 0 \tag{7}$$
Here $V_{px} = \Delta x/\Delta t$, $V_{py} = \Delta y/\Delta t$, and $V_{pt} = \Delta t/\Delta t$ denote the velocity or optical flow components, and $I_{px}$, $I_{py}$, and $I_{pt}$ are the partial derivatives of the image intensity $I(x, y, t)$ at a coordinate. Thresholding is then employed on the resulting motion vector for object detection, and the magnitude of the motion vector is given as (10):

$$|V| = \sqrt{V_{px}^2 + V_{py}^2} \tag{10}$$
Optical flow vectors, in their most fundamental form, provide input to a large variety of higher-level operations that require scene awareness of video sequences. The optical flow method estimates object velocity across consecutive frames from the apparent motion of brightness patterns in the image. A minimal dense optical flow sketch is given below.
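The sketch below, assuming Python with OpenCV and NumPy, uses Farneback's algorithm as a stand-in dense flow estimator; the motion threshold and input file name are illustrative assumptions.

```python
import cv2
import numpy as np

T_MAG = 1.0  # motion-magnitude threshold; illustrative value

cap = cv2.VideoCapture("scene.mp4")  # hypothetical input video
ok, frame = cap.read()
# Gaussian filtering removes per-frame noise before flow estimation
prev = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (5, 5), 0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (5, 5), 0)
    # Dense flow field (V_px, V_py) between the previous and current frames
    flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)  # |V|, eq. (10)
    moving = (mag > T_MAG).astype(np.uint8) * 255          # binary motion mask
    prev = gray
```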
For the DWT method, Figure 2 presents the image decomposition and level processing. In a 2-level DWT decomposition, applying filters along both the horizontal and vertical axes separates the image into different frequency components [28]. The decomposition produces detail coefficients that capture high-frequency information in the horizontal, vertical, and diagonal directions, as well as approximation coefficients at various resolutions (levels) [29]. Object detection and tracking, compression, and denoising are just a few of the many image-processing applications that benefit from this multi-resolution representation [30]. A minimal sketch of such a decomposition is given below.
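The sketch below assumes Python with the PyWavelets package and a Daubechies "db2" wavelet as an illustrative choice; detection could then proceed, for example, by comparing the bandpass (detail) energy of consecutive frames, in the spirit of the variance/energy approaches discussed above.

```python
import numpy as np
import pywt

def detail_energy(gray_frame):
    """2-level 2D DWT of a grayscale frame; returns the total energy of the
    bandpass (detail) subbands used for motion analysis."""
    coeffs = pywt.wavedec2(gray_frame.astype(np.float32), "db2", level=2)
    # coeffs = [cA2, (cH2, cV2, cD2), (cH1, cV1, cD1)]:
    # level-2 approximation, then horizontal/vertical/diagonal details per level
    return sum(float(np.sum(band ** 2))
               for level in coeffs[1:] for band in level)

# A large change in detail energy between consecutive frames suggests motion:
# if abs(detail_energy(frame_t) - detail_energy(frame_t_prev)) > T: ...
```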
4. RESULTS AND DISCUSSION
The proportion of pixels correctly classified as belonging to either the moving object or the background is reflected in the accuracy metric. The sensitivity, recall, or true positive rate measures how well the system detects real positives, i.e., moving objects. A system with high sensitivity picks up most moving items in the scene, reducing the likelihood that anything crucial goes unnoticed; this is paramount in fields such as automated driving and surveillance. The specificity of a system is defined as the percentage of real negatives (i.e., non-moving background) that are correctly identified as negatives. A minimal sketch computing these three metrics is given below.
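The sketch below, assuming Python with NumPy, computes the three metrics from binary masks; `pred` is the detected motion mask and `truth` is a hypothetical ground-truth mask.

```python
import numpy as np

def detection_metrics(pred, truth):
    """Accuracy, sensitivity, and specificity from binary masks (1 = moving)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)       # moving pixels correctly detected
    tn = np.sum(~pred & ~truth)     # background pixels correctly rejected
    fp = np.sum(pred & ~truth)      # background flagged as motion
    fn = np.sum(~pred & truth)      # motion that was missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)    # true positive rate (recall)
    specificity = tn / (tn + fp)    # true negative rate
    return accuracy, sensitivity, specificity
```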
Figure: Performance estimation, showing the accuracy (%), sensitivity (%), and specificity (%) achieved by the DWT, optical flow, and background subtraction methods on image object/video-1 through image object/video-4 (values in the 87-97% range).
5. CONCLUSION
It is possible to compare several methods, tune parameters, and verify that the system satisfies
operational requirements with the help of MATLAB's performance evaluation and validation tools. The
evaluation was done for the optical flow, background subtraction, and DWT methods for moving object detection. The simulations were carried out in several environments, including rainy and hazy conditions. The primary advantages of the DWT for object detection are its capacity to capture essential
information, its robustness against noise, its compact representation, and its multi-resolution analysis.
Consequently, it serves as a very effective instrument, particularly when conventional procedures may pose
challenges or when obtaining specific attributes is essential. In the simulations, the DWT method showed a minimum latency of 0.23 seconds, compared with 0.29 seconds for optical flow and 0.35 seconds for the background subtraction method. The same behavior was observed for the other cases as well. The accuracy of
the DWT, optical flow, and background subtraction methods is 95.34%, 94.15%, and 91.40%, respectively; the sensitivity is 95.96%, 93.88%, and 90.00%; and the specificity is 94.68%, 94.44%, and 93.03%. When it comes to detecting moving objects in images and videos, the DWT method has consistently proven to be the best choice of those compared, in terms of both hardware and software.
REFERENCES
[1] A. Cavallaro, O. Steiger, and T. Ebrahimi, “Tracking video objects in cluttered background,” IEEE Transactions on Circuits and
Systems for Video Technology, vol. 15, no. 4, pp. 575–584, 2005, doi: 10.1109/TCSVT.2005.844447.
[2] A. Mukhtar, L. Xia, and T. B. Tang, “Vehicle detection techniques for collision avoidance systems: a review,” IEEE Transactions
on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2318–2338, 2015, doi: 10.1109/TITS.2015.2409109.
[3] S. Hassan, G. Mujtaba, A. Rajput, and N. Fatima, “Multi-object tracking: a systematic literature review,” Multimedia Tools and
Applications, vol. 83, no. 14, pp. 43439–43492, Oct. 2023, doi: 10.1007/s11042-023-17297-3.
[4] Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: a survey,” Proceedings of the IEEE, vol. 111,
no. 3, pp. 257–276, Mar. 2023, doi: 10.1109/JPROC.2023.3238524.
[5] D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabally, and C. Quek, “Video processing from electro-optical sensors for object
detection and tracking in a maritime environment: a survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 18,
no. 8, pp. 1993–2016, Aug. 2017, doi: 10.1109/TITS.2016.2634580.
[6] H. Zhu, H. Wei, B. Li, X. Yuan, and N. Kehtarnavaz, “A comprehensive survey of video datasets for background subtraction,”
Applied Sciences, vol. 10, no. 21, Nov. 2020, doi: 10.3390/app10217834.
[7] A. Kumar, “Text extraction and recognition from an image using image processing in MATLAB,” in Conference on Advances in
Communication and Control Systems 2013 (CAC2S 2013), 2013, vol. 2013, pp. 429–435.
[8] M. Sharma, K. S. Kaswan, and D. K. Yadav, “Moving objects detection based on histogram of oriented gradient algorithm chip
for hazy environment,” International Journal of Reconfigurable and Embedded Systems, vol. 13, no. 3, pp. 604–615, 2024, doi:
10.11591/ijres.v13.i3.pp604-615.
[9] B. Mirzaei, H. Nezamabadi-pour, A. Raoof, and R. Derakhshani, “Small object detection and tracking: a comprehensive review,”
Sensors, vol. 23, no. 15, 2023, doi: 10.3390/s23156887.
[10] J. Bai, S. Li, L. Huang, and H. Chen, “Robust detection and tracking method for moving object based on radar and camera data
fusion,” IEEE Sensors Journal, vol. 21, no. 9, pp. 10761–10774, 2021, doi: 10.1109/JSEN.2021.3049449.
[11] X. Zhou, C. Yang, and W. Yu, “Moving object detection by detecting contiguous outliers in the low-rank representation,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 597–610, 2013.
[12] X. Zhang, C. Zhu, S. Wang, Y. Liu, and M. Ye, “A Bayesian approach to camouflaged moving object detection,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 27, no. 9, pp. 2001–2013, 2017, doi:
10.1109/TCSVT.2016.2555719.
[13] H. Han, J. Zhu, S. Liao, Z. Lei, and S. Z. Li, “Moving object detection revisited: Speed and robustness,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 25, no. 6, pp. 910–921, 2015, doi: 10.1109/TCSVT.2014.2367371.
[14] D. K. Panda and S. Meher, “Detection of moving objects using fuzzy color difference histogram based background subtraction,”
IEEE Signal Processing Letters, vol. 23, no. 1, pp. 45–49, 2016, doi: 10.1109/LSP.2015.2498839.
[15] Z. Wang, K. Liao, J. Xiong, and Q. Zhang, “Moving object detection based on temporal information,” IEEE Signal Processing
Letters, vol. 21, no. 11, pp. 1403–1407, 2014, doi: 10.1109/LSP.2014.2338056.
[16] T. Wang and H. Snoussi, “Detection of abnormal visual events via global optical flow orientation histogram,” IEEE Transactions
on Information Forensics and Security, vol. 9, no. 6, pp. 988–998, 2014, doi: 10.1109/TIFS.2014.2315971.
[17] S. S. Sengar and S. Mukhopadhyay, “Detection of moving objects based on enhancement of optical flow,” Optik, vol. 145, pp.
130–141, 2017, doi: 10.1016/j.ijleo.2017.07.040.
[18] J. Hariyono, V. D. Hoang, and K. H. Jo, “Moving object localization using optical flow for pedestrian detection from a moving
vehicle,” Scientific World Journal, vol. 2014, 2014, doi: 10.1155/2014/196415.
[19] P. P. Gangal, V. R. Satpute, K. D. Kulat, and A. G. Keskar, “Object detection and tracking using 2D—DWT and variance
method,” in 2014 Students Conference on Engineering and Systems, May 2014, pp. 1–6, doi: 10.1109/SCES.2014.6880123.
[20] Y. Wu, X. He, and T. Q. Nguyen, “Moving object detection with a freely moving camera via background motion subtraction,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 2, pp. 236–248, Feb. 2017, doi:
10.1109/TCSVT.2015.2493499.
[21] R. Kalsotra and S. Arora, “A comprehensive survey of video datasets for background subtraction,” IEEE Access, vol. 7, pp.
59143–59171, 2019, doi: 10.1109/ACCESS.2019.2914961.
[22] A. Talukder and L. Matthies, “Real-time detection of moving objects from moving vehicles using dense stereo and optical flow,”
in 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2004, vol. 4, pp. 3718–3725, doi:
10.1109/iros.2004.1389993.
[23] K. Kale, S. Pawar, and P. Dhulekar, “Moving object tracking using optical flow and motion vector estimation,” in 2015 4th
International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), Sep.
2015, pp. 1–6, doi: 10.1109/ICRITO.2015.7359323.
[24] A. Kumar, “Study and analysis of different segmentation methods for brain tumor MRI application,” Multimedia Tools and
Applications, vol. 82, no. 5, pp. 7117–7139, Feb. 2023, doi: 10.1007/s11042-022-13636-y.
[25] A. Goel, A. K. Goel, and A. Kumar, “Performance analysis of multiple input single layer neural network hardware chip,”
Multimedia Tools and Applications, vol. 82, no. 18, pp. 28213–28234, Jul. 2023, doi: 10.1007/s11042-023-14627-3.
[26] A. Kumar, P. Rastogi, and P. Srivastava, “Design and FPGA implementation of DWT, image text extraction technique,” Procedia
Computer Science, vol. 57, pp. 1015–1025, 2015, doi: 10.1016/j.procs.2015.07.512.
[27] A. S. Rawat, A. Rana, A. Kumar, and A. Bagwari, “Application of multi layer artificial neural network in the diagnosis system: A
systematic review,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 7, no. 3, p. 138, Aug. 2018, doi:
10.11591/ijai.v7.i3.pp138-142.
[28] A. Goel, A. K. Goel, and A. Kumar, “The role of artificial neural network and machine learning in utilizing spatial information,”
Spatial Information Research, vol. 31, no. 3, pp. 275–285, Jun. 2023, doi: 10.1007/s41324-022-00494-x.
[29] A. Devrari and A. Kumar, “Turbo encoder and decoder chip design and FPGA device analysis for communication system,”
International Journal of Reconfigurable and Embedded Systems, vol. 12, no. 2, pp. 174–185, 2023, doi:
10.11591/ijres.v12.i2.pp174-185.
[30] S. Dhyani, A. Kumar, and S. Choudhury, “Analysis of ECG-based arrhythmia detection system using machine learning,”
MethodsX, vol. 10, 2023, doi: 10.1016/j.mex.2023.102195.