Integrated Image and Speech Analysis for Content-Based Video Indexing

Yuh-Lin Chang   Wenjun Zeng   Ibrahim Kamel   Rafael Alonso†
Matsushita Information Technology Laboratory
Panasonic Technologies, Inc.
2 Research Way
Princeton, NJ 08540-6628, USA
e-mail: {yuhlin,kevin,ibrahim,[email protected]

* A Ph.D. candidate at the Department of Electrical Engineering, Princeton University.
† Currently with David Sarnoff Research Center, SRI.

Abstract

In this paper we study an important problem in multimedia databases, namely, the automatic extraction of indexing information from raw data based on video content. The goal of our research project is to develop a prototype system for automatic indexing of sports videos. The novelty of our work is that we propose to integrate speech understanding and image analysis algorithms for extracting information. The main thrust of this work comes from the observation that in news or sports video indexing, speech analysis is usually more efficient at detecting events than image analysis. Therefore, in our system, the audio processing modules are first applied to locate candidates in the whole data set. This information is passed to the video processing modules, which further analyze the video. The final products of video analysis are pointers to the locations of interesting events in a video. Our algorithms have been tested extensively with real TV programs, and the results are presented and discussed in the paper.

1. Introduction

The content-based video indexing problem has attracted much attention recently [3, 9, 12]. The applications of such research work include digital libraries, non-linear video editing, video-on-demand services, etc. In this paper we propose a novel approach to the video information extraction problem that is based on the integration of speech understanding and image analysis algorithms. The goal of our research project is to develop a prototype system for automatic indexing of sports (in particular, football) videos. The sports video analysis problem has been studied by other researchers before [4, 6]. However, to our knowledge, our work is the first that uses knowledge from both the audio and video domains.

The main thrust of this work comes from two observations: in news or sports video indexing, a very important aspect is the detection of the occurrence of important events, and for such videos, speech analysis is usually more efficient at detecting events than image analysis. Therefore, we propose to use speech analysis to detect important events first, and then apply image analysis algorithms for further processing.

Figure 1 gives a global view of our work. There are three major components in our system: audio processing, video processing, and the demo video database. We first digitize the video and audio data from regular video tapes. We then apply the audio processing modules to locate candidates in the data. This information is passed to the video processing modules, which further analyze the video. The results of video analysis are pointers to the locations of interesting events. We put the indexed video on a LAN-based video-on-demand (VOD) server, the StarWorks VOD server, and we also developed a demo video database client that can retrieve the indexed video from a PC running MS-Windows.
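The resulting control flow is simple. Below is a minimal sketch of the two-stage pipeline; the stub functions stand in for the audio and video modules described in Sections 2 and 3, and all names and signatures here are illustrative, not the system's actual interfaces.

    # Two-stage indexing: cheap audio analysis proposes candidates,
    # expensive video analysis verifies them. All functions are
    # illustrative stubs for the modules described in Sections 2-3.
    def detect_keyword_times(audio, sr):
        return [35.0]                      # stub: candidate times (seconds)

    def cheering_near(audio, sr, t):
        return True                        # stub: cheering detector

    def touchdown_verified(frames, fps, t):
        return True                        # stub: shot model matching

    def index_video(audio, sr, frames, fps=15):
        events = []
        for t in detect_keyword_times(audio, sr):
            if cheering_near(audio, sr, t):     # fuse audio cues (Sec. 5.1)
                # Video analysis is limited to [t - 1 min, t + 2 min] (Sec. 3).
                if touchdown_verified(frames, fps, t):
                    events.append(t)            # pointer to the event
        return events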
[Figure 1. Overview of the video and audio processing modules.]

While our current work focuses on the analysis of football video, extension to other domains should not be a problem, since we adopt a "toolbox" approach. Our audio and video analysis algorithms are implemented using the tools we developed for the Khoros system [1]. Khoros provides a nice environment for both application integration and fast prototyping. Therefore, it should be easy to incorporate both new domain knowledge and new data analysis algorithms into our framework.

The rest of the paper is organized as follows. In the next section, we explain the audio processing modules, and in Section 3, we present the video processing algorithms. Section 4 then describes the implementation of a demo video database, and Section 5 discusses the results of applying our algorithms to real data. Finally, Section 6 concludes the paper.

2. Audio Signal Analysis

We first extract information from the data using audio processing, since its computation is less expensive than image processing. Two types of audio signal processing are experimented with in this paper: word spotting and cheering detection. They are explained in the following subsections.

2.1. Word Spotting

One important observation from watching TV sports programs over many years is that in such programs, the information content in the audio is highly correlated with the information content in the video. After all, a sports reporter's job is to inform viewers of what is happening on the field. Therefore, if we can detect important keywords such as "touchdown" or "fumble" in the audio stream, we can use them as a coarse filter to locate candidates for important events.

Keyword spotting is an important application of speech recognition, and it has attracted growing research interest lately [7, 8, 11]. Currently we use a simple, template-matching based approach to spotting keywords. We are aware of more sophisticated and robust algorithms for the problem [7, 8, 11], but for the preliminary implementation we chose a simpler approach, mainly because our current application differs from traditional keyword spotting in the following aspects.

- In our system, audio processing is used as "pre-processing" for video analysis; consequently, false alarms are not a major concern.
- Speaker independence is also not a major concern, since we can assume we know who the reporters are a priori.

The algorithm and part of the implementation are based on a public domain package, Lotec [10]. Figure 2 illustrates the algorithm. For a preliminary implementation, the template-based spotting algorithm works surprisingly well.

[Figure 2. The wordspotting algorithm.]

To detect keywords in an audio stream, we first extract features and then match the feature vectors against a set of pre-computed templates.
- Feature extraction. Filter banks are used to extract features from the audio data. The following procedures are involved.
  1. Noise reduction. To reduce the effect of background noise, we first collect noise statistics from the training data. This information is then used to filter out noise in the test data.
  2. Segmentation. The audio stream is split into segments of fixed size, 10 ms each.
  3. Filter banks. We first transform the data to the frequency domain by FFT. A set of eight overlapping filters is then applied to the Fourier magnitude, and the log of the total energy in each bank is computed and used as "features" to represent the audio sample. The filters we use cover frequencies from 150 to 4000 Hz.
- Feature matching. Feature vectors are matched against templates, which are obtained from the training data. Currently the normalized distance is used to measure similarity. The distance between a template and the test data is defined as the Euclidean distance between the two 8-dimensional feature vectors. The distance is then normalized by the sum of energy in each template. After matching, the best matches from all templates are sorted according to distance. We use the inverse of the distance to represent the confidence of a match. If the confidence is greater than a pre-set threshold, we declare the detection of a candidate.
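A minimal sketch of these two steps follows, assuming 22 kHz audio (Section 5). The triangular bank shape, the even bank spacing, and the frame-by-frame alignment of test features against template features are our assumptions; the paper specifies only 10 ms frames, eight overlapping banks over 150-4000 Hz, log energies, and a template-normalized Euclidean distance.

    import numpy as np

    def filterbank_features(frame, sr=22050, n_banks=8):
        """Log energy in eight overlapping triangular banks over 150-4000 Hz."""
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        edges = np.linspace(150.0, 4000.0, n_banks + 2)    # assumed even spacing
        feats = []
        for k in range(n_banks):
            lo, center, hi = edges[k], edges[k + 1], edges[k + 2]
            w = np.interp(freqs, [lo, center, hi], [0.0, 1.0, 0.0])  # overlap
            feats.append(np.log(np.sum(w * mag ** 2) + 1e-10))
        return np.array(feats)

    def feature_sequence(signal, sr=22050, frame_ms=10):
        """Split the signal into 10 ms frames and compute features per frame."""
        n = int(sr * frame_ms / 1000)
        return np.array([filterbank_features(signal[i:i + n], sr)
                         for i in range(0, len(signal) - n + 1, n)])

    def match_confidence(test_feats, template_feats):
        """Inverse of the template-normalized Euclidean distance."""
        m = min(len(test_feats), len(template_feats))
        dist = np.linalg.norm(test_feats[:m] - template_feats[:m])
        dist /= np.sum(template_feats[:m] ** 2) + 1e-10    # normalize by energy
        return 1.0 / (dist + 1e-10)                        # confidence score

A keyword candidate is declared when the best confidence over all templates exceeds the pre-set threshold (25 in Section 5).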
2.2. Cheering Detection

Crowd cheering can be a powerful tool for indexing sports videos in general because it indicates "interesting" events during the game, for example, a touchdown (or scoring), a fumble, clever passing, an exciting run, etc. Unlike word-spotting, cheering detection is general and game/speaker independent. In this subsection we describe our algorithm for detecting crowd cheering in a football game video using the audio stream only. Our main goal is to build a simple and fast cheering detection module that distinguishes crowd cheering from reporter chat. We process the audio signals in the time domain to quantify the frequency of silence spots. The basic assumption is that there are little or no silence spots in cheering segments, while there are quite a few silence spots in reporter chat.

The outline of the cheering detection scheme is as follows. The audio signal is divided into small units, 300 msec each, and processed sequentially from beginning to end. Each unit is processed by a classifier, shown in Figure 3, which marks each audio unit as either a cheering unit or a chatting unit. A cheering segment is identified only when m cheering units are detected in sequence, where m is an integer constant that is determined experimentally.

In Figure 3, the Envelope Extraction module calculates the covering envelope for the audio unit by computing the local maxima of the absolute values of the signal. The result is then smoothed using a low-pass filter. The last step in the cheering classifier calculates the peak-to-peak value of the smoothed envelope. If the peak-to-peak value is greater than a threshold θ, the audio unit is marked as regular chat; otherwise it is marked as a cheering unit. Of course, the threshold θ is a function of the microphone volume. In our experiments we determined the value of θ experimentally. Currently, we are considering a formula that calculates θ as a function of the average signal value.

[Figure 3. Block diagram for the cheering classifier: Envelope Extraction, Low-pass Filter, Peak-to-peak Estimation.]
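A minimal sketch of the classifier and the run-length rule, assuming samples normalized to [-1, 1]; the envelope window, the moving-average smoother, θ = 0.2, and m = 3 are illustrative assumptions (the paper determines θ and m experimentally).

    import numpy as np

    def is_cheering_unit(unit, env_win=64, smooth_win=8, theta=0.2):
        """Classify one 300 ms audio unit as cheering (True) or chat (False)."""
        usable = len(unit) - (len(unit) % env_win)
        # Envelope extraction: local maximum of absolute sample values.
        env = np.abs(np.asarray(unit[:usable])).reshape(-1, env_win).max(axis=1)
        # Low-pass filtering: a moving average as the smoother.
        smooth = np.convolve(env, np.ones(smooth_win) / smooth_win, mode="valid")
        # Cheering has few silence gaps, so its envelope stays flat: a small
        # peak-to-peak value means cheering, a large one means chat.
        return (smooth.max() - smooth.min()) < theta

    def cheering_segments(signal, sr=22050, m=3):
        """Start times (seconds) of runs of at least m cheering units."""
        n = int(0.3 * sr)                                  # 300 ms units
        flags = [is_cheering_unit(signal[i:i + n])
                 for i in range(0, len(signal) - n + 1, n)]
        starts, run = [], 0
        for i, is_cheer in enumerate(flags):
            run = run + 1 if is_cheer else 0
            if run == m:                                   # m consecutive units
                starts.append((i - m + 1) * 0.3)
        return starts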
3. Video Information Analysis

The candidates detected by the audio analysis modules are further examined by the video analysis modules. Assuming that a touchdown candidate is located at time t, we apply video analysis only to the region [t - 1 min, t + 2 min]. The assumption we employ here is that a touchdown event should begin and end within that time range. In video processing, the original video sequence is broken down into discrete shots. Key frames are extracted from each shot, and shot identification is then applied to them to verify the existence of a touchdown.

3.1. Shot Segmentation

We use a well-known video shot segmentation algorithm based on histogram differences [5]. Figure 4 illustrates the flowchart of operations, and Figure 5 shows the implementation in Khoros. Basically, a cut is detected if a frame's histogram is considered "substantially different" from that of its previous frame, as defined by the χ² comparison:

    \sum_{i=1}^{G} \frac{(H_t(i) - H_{t-1}(i))^2}{H_t(i)},        (1)

where H_t is the histogram for time t, and G is the total number of colors in an image.

[Figure 5. Khoros workspace for video shot segmentation.]
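A minimal sketch of Eq. (1) on grayscale frames; the bin count, the peak threshold, and the guard against empty bins are our assumptions (the Khoros module list that follows implements the same pipeline, locating peaks in the difference sequence).

    import numpy as np

    def chi2_difference(frame_t, frame_prev, bins=64):
        """Eq. (1): chi-squared difference between consecutive histograms."""
        h_t, _ = np.histogram(frame_t, bins=bins, range=(0, 256))
        h_p, _ = np.histogram(frame_prev, bins=bins, range=(0, 256))
        return np.sum((h_t - h_p) ** 2 / np.maximum(h_t, 1))  # guard empty bins

    def detect_cuts(frames, threshold=5000.0):
        """Indices where the difference sequence peaks above the threshold."""
        return [t for t in range(1, len(frames))
                if chi2_difference(frames[t], frames[t - 1]) > threshold]

    def key_frames(frames, cuts):
        """Represent each shot by its first frame (cf. Store Kframe below)."""
        return [frames[s] for s in [0] + cuts]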
To implement this algorithm, we have utilized both native Khoros modules and modules developed by us.

- Import AVI converts an AVI-encoded data stream into VIFF, and Video Histogram computes the histogram of a VIFF video.
- Translate is a Khoros function for shifting a VIFF object in time, and Subtract subtracts two VIFF objects.
- Square is a Khoros function for applying the squaring operation to a VIFF object, and Divide divides two VIFF objects.
- Statistics is a Khoros function for computing the statistics of a VIFF object.
- Shot Segment detects the shot transition boundaries by locating the peaks in the histogram difference sequence.
- Store Kframe extracts the representative frames from each shot and stores them as a new VIFF video. Currently, we use the first and/or the last frame to represent the whole shot.

[Figure 4. Flowchart of the video shot segmentation algorithm.]

3.2. Shot Identification

We propose a model-based approach to identifying the contents of key frames. In particular, for a touchdown sequence, we define an ideal model for shot transitions, as shown in Figure 6. Basically, a touchdown sequence should start with the two teams lining up on the field. The word "touchdown" is usually announced in the middle or at the end of the action shot, which is followed by some kind of commentary and replay. To conclude a touchdown sequence, the scoring team usually kicks an extra point. We note that our model may cover most but not all possible touchdown sequences. However, for a preliminary implementation, our simple model provides very satisfactory results.

[Figure 6. The ideal shot transition model for a touchdown sequence.]

Starting with the candidate location supplied by audio analysis, our system looks backward and forward a few shots to fit the model to the video data. If there is high confidence in the match, then a touchdown event is declared detected.
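A minimal sketch of this matching step, assuming shot identification yields one of a few discrete labels per shot; the label set, the ±4-shot window, and the score threshold are illustrative, since the paper does not specify its confidence measure.

    # Ideal shot-transition model of Figure 6, in temporal order.
    # The labels and scoring scheme are assumptions for illustration.
    MODEL = ["lineup", "action", "replay", "kick"]

    def fits_touchdown_model(shot_labels, candidate_shot, span=4,
                             min_score=0.75):
        """Match labeled shots around the audio candidate against MODEL."""
        lo = max(0, candidate_shot - span)
        hi = min(len(shot_labels), candidate_shot + span + 1)
        window = shot_labels[lo:hi]
        # Score: fraction of model states found, in order, in the window.
        found, i = 0, 0
        for label in window:
            if i < len(MODEL) and label == MODEL[i]:
                found += 1
                i += 1
        return found / len(MODEL) >= min_score

    # Example: labels as produced by shot identification (Section 3.2).
    labels = ["replay", "lineup", "action", "replay", "kick", "lineup"]
    print(fits_touchdown_model(labels, candidate_shot=2))   # True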
To identify shots, the features of interest on which the model is based need to be extracted. In football videos, possible features of interest are line marks, players' numbers, the end zone, goal posts, etc. In particular, for the detection of a touchdown sequence as modeled in Figure 6, our preliminary work focuses on the detection of line marks and goal posts in the lining-up and kicking shots, respectively. For the lining-up shot, the line marks usually appear as parallel lines oriented in roughly diagonal directions. On the other hand, goal posts almost always show up as strong vertical lines. Both line marks and goal posts should be relatively long with respect to the image size.

Our line extraction work is based on the Object Recognition Toolkit [2], which we modified and incorporated into the Khoros system. For each shot, we have one or two representative frames. The gradient operation is first applied to these representative frames to detect edges. The edge pixels are then converted into lists of connected pixels by Pixel Chaining. The chain lists are segmented into straight-line segments, which are further grouped into parallel lines. The parallel line pairs are then filtered by length and orientation. For example, to qualify as a goal post, the detected parallel lines should be long and vertically oriented. Similarly, the detected parallel lines should be long and diagonally oriented to be potential candidates for line marks.

Currently, we use only the intensity values of an image for line extraction. In the future, other attributes, such as color and texture, should be incorporated to improve accuracy and robustness.
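A minimal sketch of the final length/orientation filter, applied to straight-line segments produced by the chaining and grouping stages; the angle tolerances and the relative-length threshold are our assumptions.

    import numpy as np

    def angle_deg(seg):
        """Orientation of a segment (x1, y1, x2, y2) in [0, 180) degrees."""
        x1, y1, x2, y2 = seg
        return abs(np.degrees(np.arctan2(y2 - y1, x2 - x1))) % 180

    def length(seg):
        x1, y1, x2, y2 = seg
        return np.hypot(x2 - x1, y2 - y1)

    def classify_segment(seg, img_h, min_rel_len=0.3):
        """Label a long segment as a goal-post or line-mark candidate."""
        if length(seg) < min_rel_len * img_h:       # must be long
            return None
        a = angle_deg(seg)
        if abs(a - 90) < 10:                        # near vertical
            return "goal_post"
        if 25 < a < 65 or 115 < a < 155:            # near diagonal
            return "line_mark"
        return None

    # Example: a tall, nearly vertical segment in a 192-pixel-high frame.
    print(classify_segment((100, 20, 102, 150), img_h=192))  # goal_post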

4. Demo Video Database

To demonstrate the results of automatic video indexing, we also built a simple demo video database system running under MS/VfW (Microsoft Video for Windows). The demo video database system has two parts, the server and the client.

- Server. We use the StarWorks VOD system (from Starlight Networks Inc.) as the server. The server runs on an EISA-bus PC-486/66 with the Lynx realtime OS and 4 GB of storage space. A PC/Windows client can connect to the server through regular 10-BaseT Ethernet. The server can guarantee realtime delivery of data (video and audio) streams of up to 12 Mbps (megabits per second) via two Ethernet segments.
- Client. We developed a video player for MS/VfW that utilizes the indexing information when retrieving AVI video data. Using our video player, a user can move directly to the next or previous shot/play/event. Such search capabilities are complementary to the traditional linear fast-forward/backward movements.

5. Examples

Our algorithms have been tested extensively with real TV programs. Table 1 summarizes the data used in the experiments. We captured in total about 45 minutes of video and audio data from two football games. Most of the data are from Super Bowl XXIX, played in Jan. 1995, while one sequence is from the Chicago vs. Minnesota game played in Oct. 1995. Both games were broadcast by ABC. The data are separated into two groups: the 1st group (from the 1st half) is used for training, and the 2nd group (from the 2nd half) is used for testing. Only data from the 1st group are used to train system parameters, and we report the results of applying our algorithms to the test group. The video resolution is 256 by 192 at 15 frames per second. The audio data rate is 22 kHz with 8 bits per sample.

    Group  Name      # of frames  Game                   TD
    1st    td1       1,297        SB 1-H                 Yes
           td2       2,262        SB 1-H                 Yes
           td3       1,694        SB 1-H                 Yes
    2nd    2ndhalf1  7,307        SB 2-H                 No
           2ndhalf2  6,919        SB 2-H                 No
           2ndhalf3  6,800        SB 2-H                 Yes
           2ndhalf4  5,592        SB 2-H                 No
           2ndhalf5  2,661        SB 2-H                 Yes
           2ndhalf6  2,774        SB 2-H                 Yes
           2ndhalf7  2,984        SB 2-H                 Yes
           newgame1  2,396        Chicago vs. Minnesota  Yes

    Table 1. Summary information for the audio/video data.
5.1. Audio Processing Results

We first discuss the results of audio processing on the eight test sets. Figure 7 shows the results of wordspotting. The graphs are arranged from left to right, and from top to bottom. In each graph, the X-axis is time, and the Y-axis indicates confidence. The higher the confidence, the more likely the existence of a touchdown. From the training data, the wordspotting threshold is set to 25. Figure 8 shows the results of cheering detection. The regions of 1's indicate the presence of cheering, while those of 0's indicate its absence. From the training data, the double thresholds used in cheering detection are set to 20 and 60. Table 2 summarizes the audio processing results. In general, our simplistic wordspotting algorithm gives quite reliable results. Of the five touchdowns in the test data, only the one in 2ndhalf7 is not detected. The miss is mainly due to the fact that in 2ndhalf7, the touchdown is announced in a way different from the three templates we use. One possible remedy is to reduce the threshold to 10, but this generates a lot of false alarms (45, to be exact). A better way is to collect more samples for the templates. An even better approach is to use more robust matching algorithms, such as dynamic time warping or HMMs (hidden Markov models). To combine the results from wordspotting and cheering detection, we apply a simple logical AND. However, we note that it is also possible to use other methods, such as a weighted sum. We shall further investigate this information fusion aspect in the future.
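A minimal sketch of that logical-AND fusion, assuming both detectors report event times in seconds; the coincidence window is our assumption.

    def fuse_and(keyword_hits, cheering_starts, window=10.0):
        """Keep keyword hits that coincide with a detected cheering segment."""
        return [t for t in keyword_hits
                if any(abs(t - c) <= window for c in cheering_starts)]

    # Example with hypothetical detection times (seconds):
    print(fuse_and([35.0, 410.5], [32.1, 600.0]))   # [35.0]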

[Figure 7. Wordspotting results for the first and the second four test sets.]

[Figure 8. Cheering detection results for the first and the second four test sets.]

    Algorithm                Correct detect  Miss detect   False alarms
    Wordspot                 4 (out of 5)    1 (out of 5)  3
    Wordspot + Cheer detect  4 (out of 5)    1 (out of 5)  1

    Table 2. Audio detection results.
5.2. Video Processing Results

We now present the results of shot segmentation. The test data 2ndhalf2 is used as an example. Only 1,471 frames are processed, because we are only interested in the region around the candidate detected by the audio processing modules. Figure 9 shows the beginning of this video and the segmentation results. The key frames extracted by the segmentation process are shown in Figure 10. These 12 frames are arranged from left to right, and from top to bottom, according to their temporal order.

[Figure 9. The first frame of 2ndhalf2 and its cut detection results.]

[Figure 10. Collection of the first frame in each shot for 2ndhalf2.]

Finally, the results of shot identification are demonstrated. Basically, if a touchdown event fits our model and the kicking shot is correctly detected by the segmentation algorithm, then the line extraction algorithm should have no problem detecting the goal post. Line mark detection is more difficult, but our line extractor works quite well nonetheless. We expect better results for extracting line marks when we incorporate color information into the edge detector. On the other hand, currently the purpose of identifying the lining-up shot is mainly to determine the start of the touchdown act. As a result, we may use only the kicking shot detection to extract a touchdown sequence.

    Algorithm      Correct detect  Miss detect   False alarms
    Shot Identify  4 (out of 5)    1 (out of 5)  0

    Table 3. Video analysis results.

Table 3 presents the video analysis results. Of the five test sets with touchdowns, 2ndhalf6 does not fit our model because its touchdown starts with a kick-off (instead of lining up) and ends with a 2-point conversion (instead of kicking an extra point). Finally, Figure 11 illustrates the lining-up and kicking shots identified by our algorithms.

[Figure 11. Lining-up and kicking shots located for 2ndhalf3, 2ndhalf5, and 2ndhalf7.]

6. Conclusion

In this paper we have presented a novel approach to automatically extracting important information from football videos. Our system integrates speech understanding and image analysis algorithms, so that we can maximize detection accuracy and minimize computation cost at the same time. Our algorithms have been tested extensively with real data captured from TV programs. The preliminary results demonstrate the feasibility of our approach. In the future, we may work on the following topics to improve the system.

- More test data and more robust wordspotting algorithms.
- More sophisticated shot segmentation algorithms with good shot transition models.
- Other shot representation schemes, such as the mosaic used in the QBIC system [3].
- Detecting other events, such as fumbles.
References

[1] Y.-L. Chang and R. Alonso. Developing a multimedia toolbox for the Khoros system. In SPIE Proceedings, Multimedia: Full-Service Impact on Business, Education, and Home, October 1995.
[2] A. Etemadi. Robust segmentation of edge data. Technical report, University of Surrey, U.K., 1992.
[3] M. Flickner et al. Query by image and video content: the QBIC system. IEEE Computer, 28(9):23-32, 1995.
[4] Y. Gong et al. Automatic parsing of TV soccer programs. In The 2nd IEEE International Conference on Multimedia Computing, pages 167-174, May 1995.
[5] A. Hampapur and T. Weymouth. Digital video segmentation. In The 2nd ACM Int'l Conf. on Multimedia, pages 357-364, Oct. 1994.
[6] S. S. Intille and A. F. Bobick. Tracking using a local closed-world assumption: tracking in the football domain. Technical Report TR-296, M.I.T., Aug. 1994.
[7] K. M. Knill and S. J. Young. Speaker dependent keyword spotting for accessing stored speech. Technical Report TR-193, Cambridge University Engineering Department, Oct. 1994.
[8] R. C. Rose and E. M. Hofstetter. Techniques for robust word spotting in continuous speech messages. In Proc. Eurospeech, pages 1183-1186, Sep. 1991.
[9] S. W. Smoliar and H. Zhang. Content-based video indexing and retrieval. IEEE Multimedia, 1(2):62-75, 1994.
[10] N. Ward. The Lotec Speech Recognition Package. ftp.sanpo.t.u-tokyo.ac.jp:/pub/nigel/lotec, 1994.
[11] L. D. Wilcox and M. A. Bush. Training and search algorithms for an interactive wordspotting system. In Proc. ICASSP, 1992.
[12] A. Yoshitaka et al. Knowledge-assisted content-based retrieval for multimedia databases. IEEE Multimedia, 1(4):12-20, 1994.

