Second Progress Presentation on
“Video Summarization”
Presented By:
Neeraj Baghel
M. Tech.(P) (CSE) II Yr
178150005
Supervised By:
Prof. Charul Bhatnagar
Professor, Deptt. of CEA
GLA University, Mathura
Dept. of Computer Engineering & Applications,
GLA University, Mathura.
October 24, 2018
Outline
• Introduction
• Literature Survey
• Research Gap
• Challenges
• Problem Statement
• Dataset
• Tools
• Conclusion
• References
Introduction To Video Summarization
Video
• Video data is a great asset for
information extraction and
knowledge discovery.
• Due to its size and variability, it is
extremely hard for users to
monitor.[5]
Video Summarization
• Intelligent video summarization
algorithms allow us to quickly
browse a lengthy video by
capturing the essence and
removing redundant
information.[5]
Fig 1: Video Summarization Work Flow [1]
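The idea of "removing redundant information" can be shown with a minimal sketch. It keeps a frame only when that frame differs enough from the last frame kept. Several things here are illustrative assumptions and are not taken from Fig 1: frames are represented as flat lists of grayscale values, the difference is the mean absolute pixel difference, and `threshold` is a hand-set parameter.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flat list of grayscale pixel values (0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: List[Frame], threshold: float) -> List[int]:
    """Keep frame 0, then keep any frame that differs from the last kept frame
    by more than `threshold`; near-duplicate frames are treated as redundant."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    video = [[10, 10, 10, 10], [11, 10, 10, 10], [200, 200, 200, 200],
             [201, 199, 200, 200], [10, 10, 10, 10]]
    print(select_keyframes(video, threshold=20.0))  # [0, 2, 4]
```

Comparing each frame against the last *kept* frame, rather than the previous frame, keeps slow gradual drift from being selected frame after frame.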
Types of Video Summarization
Video can be summarized in two different ways, as classified in Fig 2.
Fig 2: Video Summarization Technique Classification [7]
Literature Survey
On:
“Video Summarization”
Paper 1: TVSum: Summarizing Web Videos Using Titles [2].
Video summarization is a
challenging problem in part
because knowing which
part of a video is important
requires prior knowledge
about its main topic. We
present TVSum, an
unsupervised video
summarization framework
that uses title-based image
search results to find
visually important shots.[2]
Authors- Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes
Yahoo Labs, New York
IEEE Conference on Computer Vision and Pattern Recognition. 2015
Fig. 1: An illustration of title-based video summarization.[2]
• Objective: To find which parts of a video are important, and thus “summary worthy”; this requires prior knowledge about the video's main topic.
• Proposed Method: (1) TVSum, an unsupervised video summarization framework that uses the video title to find visually important shots. (2) A co-archetypal analysis technique, developed by the authors, that learns canonical visual concepts shared between video and images.
• Dataset: (1) TVSum50 dataset; (2) SumMe dataset.
• Strength: TVSum is an unsupervised video summarization framework that uses the video title to find visually important shots.
• Limitation: (1) Titles are free-form, unconstrained, and often written ambiguously. (2) It is unclear how to learn from all title text.
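The intuition behind title-based scoring can be sketched roughly as follows. Shots that look like the images returned by a title-based image search are treated as more important. This sketch is deliberately simpler than TVSum. It replaces co-archetypal analysis with a plain mean cosine similarity, and it assumes that shot and image feature vectors are already available. `score_shots_by_title_images` and `top_k_shots` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score_shots_by_title_images(shots: List[Vector], images: List[Vector]) -> List[float]:
    """Importance of each shot = mean cosine similarity to the (hypothetical)
    feature vectors of images retrieved by searching for the video title."""
    if not images:
        return [0.0] * len(shots)
    return [sum(cosine(s, img) for img in images) / len(images) for s in shots]


def top_k_shots(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring shots, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    shots = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    images = [[1.0, 0.0], [0.9, 0.1]]
    print(top_k_shots(score_shots_by_title_images(shots, images), 2))  # [0, 2]
```

This sketch also makes the listed limitation concrete. If the title is ambiguous, the image search returns off-topic images, and the scores inherit that noise.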
Paper 2: Query-Focused Video Summarization: Dataset,
Evaluation, and A Memory Network Based Approach [5].
One of the main obstacles to the research
on video summarization is the user
subjectivity — users have various
preferences over the summaries. The
subjectiveness causes at least two
problems. First, no single video
summarizer fits all users unless it interacts
with and adapts to the individual users.
Second, it is very challenging to evaluate
the performance of a video summarizer.[5]
Authors- Aidean Sharghi (A), Jacob S. Laurel (B), and Boqing Gong (A)
(A) University of Central Florida, Orlando
(B) University of Alabama at Birmingham
IEEE Conference on Computer Vision and Pattern Recognition. 2017
Fig. 2: Comparing the semantic information captured by captions and by the concept tags collected by the authors.[8]
• Objective: The main obstacle to research on video summarization is user subjectivity: users have various preferences over the summaries.
• Proposed Method: The authors propose a memory-network-parameterized sequential determinantal point process in order to attend the user query onto different video frames and shots.
• Dataset: (1) UT Egocentric (UTE) dataset.
• Strength: (1) Introduces user preferences in the form of text queries. (2) The authors collect dense per-video-shot concept annotations.
• Limitation: (1) Requires collecting dense per-video-shot concept annotations.
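A determinantal point process (DPP) favours subsets of shots that are both relevant and mutually diverse. The sketch below illustrates this with a plain, non-sequential DPP and greedy MAP inference. It is a simplification of the memory-network-parameterized *sequential* DPP described above. It assumes that per-shot query relevance scores `q` and a positive semi-definite shot similarity matrix `S` are already given. The kernel `L_ij = q_i * S_ij * q_j` is the standard quality-diversity decomposition.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_kernel(similarity: Matrix, relevance: Sequence[float]) -> Matrix:
    """Quality-diversity DPP kernel L_ij = q_i * S_ij * q_j."""
    n = len(relevance)
    return [[relevance[i] * similarity[i][j] * relevance[j] for j in range(n)]
            for i in range(n)]


def greedy_dpp_select(kernel: Matrix, k: int) -> List[int]:
    """Greedy MAP approximation: repeatedly add the shot that maximises
    det(L_Y) of the selected submatrix; stop at k shots or when det <= 0."""
    selected: List[int] = []
    candidates = set(range(len(kernel)))
    while candidates and len(selected) < k:
        def gain(i: int) -> float:
            idx = selected + [i]
            return det([[kernel[r][c] for c in idx] for r in idx])
        best = max(sorted(candidates), key=gain)
        if gain(best) <= 0.0:
            break
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)


if __name__ == "__main__":
    # Shots 0 and 1 are near-duplicates; shot 2 is different but slightly less relevant.
    S = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
    q = [1.0, 0.9, 0.8]
    print(greedy_dpp_select(dpp_kernel(S, q), 2))  # [0, 2]
```

Ranking shots by relevance alone would pick the near-duplicates 0 and 1. The determinant penalises that redundancy, so shot 2 is chosen instead.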
Paper 3: Query-Conditioned Three-Player Adversarial
Network for Video Summarization [9].
Video summarization plays an important role in video understanding by selecting key
frames/shots. Traditionally, it aims to find the most representative and diverse contents in a video as
short summaries. In this paper, the authors propose a query-conditioned three-player generative
adversarial network to tackle this challenge. The generator learns the joint representation of the user
query and the video content, and the discriminator takes three pairs of query-conditioned
summaries as the input to discriminate the real summary from a generated and a random one.[9]
Authors- Yujia Zhang (1, 2), Michael Kampffmeyer (3), Xiaodan Liang (4), Min Tan (1, 2), Eric P. Xing (4)
(1) Institute of Automation, Chinese Academy of Sciences
(2) University of Chinese Academy of Sciences
(3) UiT The Arctic University of Norway
(4) Carnegie Mellon University
IEEE Conference on Computer Vision and Pattern Recognition. 2018
Fig. 3: Different video summarization
• Objective: The main aim is to find the most representative and diverse content in a video as short summaries.
• Proposed Method: The authors propose a query-conditioned three-player generative adversarial network to tackle this challenge. The generator learns the joint representation of the user query and the video content, and the discriminator distinguishes the real summary from a generated and a random one.
• Dataset: (1) UT Egocentric (UTE) dataset.
• Strength: Results are more accurate because they are conditioned on the user query.
• Limitation: (1) Do not randomly generated summary.
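The three-player setup can be sketched at the level of a single training example. The discriminator should score the real query-conditioned summary as real, and both the generated and the random summary as fake. The generator tries to make its own summary look real. The sketch assumes the standard GAN cross-entropy form applied to the three discriminator outputs. The paper's exact objective may contain further terms. `three_player_losses` is a hypothetical name.

```python
import math
from typing import Tuple

EPS = 1e-12  # guards against log(0)


def three_player_losses(d_real: float, d_generated: float, d_random: float) -> Tuple[float, float]:
    """Return (discriminator_loss, generator_loss) given the discriminator's
    probabilities that each query-conditioned summary is the real one."""
    for p in (d_real, d_generated, d_random):
        if not 0.0 <= p <= 1.0:
            raise ValueError("discriminator outputs must be probabilities in [0, 1]")
    # Discriminator: label the real summary 1, generated and random summaries 0.
    d_loss = -(math.log(d_real + EPS)
               + math.log(1.0 - d_generated + EPS)
               + math.log(1.0 - d_random + EPS))
    # Generator: push the discriminator to call the generated summary real.
    g_loss = -math.log(d_generated + EPS)
    return d_loss, g_loss


if __name__ == "__main__":
    print(three_player_losses(0.9, 0.3, 0.1))
```

The random summary acts as an extra negative example. The discriminator therefore cannot succeed just by separating "any summary" from the real one. It has to learn what makes a summary fit the query.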
Paper 4: Hierarchical Structure-Adaptive RNN for
Video Summarization [10].
Video data follow a hierarchical
structure: a video is composed of
shots, and a shot is composed of
several frames. However, few existing
summarization approaches pay
attention to the shot segmentation
procedure. They generate shots by
trivial strategies, such as fixed-length
segmentation, which may
destroy the underlying hierarchical
structure of video data and further
reduce the quality of generated
summaries.[10]
Authors- Bin Zhao (1), Xuelong Li (2), Xiaoqiang Lu (2)
(1) Northwestern Polytechnical University, Shaanxi, P. R. China
(2) Chinese Academy of Sciences, Shaanxi, P. R. China
IEEE Conference on Computer Vision and Pattern Recognition. 2018
Fig. 4: The diagram of the proposed HSA-RNN, where Layer 1 and Layer 2 are designed to exploit the video structure and generate the video summary.[10]
• Objective: To exploit the underlying hierarchical structure of video data and further improve the quality of generated summaries.
• Proposed Method: The authors propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN (HSA-RNN).
• Dataset: (1) SumMe dataset; (2) TVSum dataset.
• Strength: (1) Uses the hierarchical structure of video data to improve the quality of generated summaries.
• Limitation: (1) Results are not based on user subjectivity.
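The sketch below contrasts fixed-length shot segmentation with content-adaptive segmentation, to show why fixed-length cuts can break the video → shot → frame hierarchy. It is not HSA-RNN, which learns shot boundaries with an RNN layer. Here a boundary is placed wherever consecutive frame features jump by more than a hand-set `threshold`. The frame features and the L1 distance are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adaptive_shots(frames: List[Vector], threshold: float) -> List[List[int]]:
    """Start a new shot whenever consecutive frame features jump by more than `threshold`."""
    if not frames:
        return []
    shots = [[0]]
    for i in range(1, len(frames)):
        if l1(frames[i], frames[i - 1]) > threshold:
            shots.append([i])
        else:
            shots[-1].append(i)
    return shots


def fixed_length_shots(n_frames: int, length: int) -> List[List[int]]:
    """The 'trivial' strategy: cut every `length` frames regardless of content."""
    return [list(range(s, min(s + length, n_frames))) for s in range(0, n_frames, length)]


def shot_features(frames: List[Vector], shots: List[List[int]]) -> List[List[float]]:
    """Second level of the hierarchy: average the frame features within each shot."""
    if not shots:
        return []
    dim = len(frames[0])
    return [[sum(frames[i][d] for i in shot) / len(shot) for d in range(dim)] for shot in shots]


if __name__ == "__main__":
    frames = [[0.0], [0.1], [0.2], [5.0], [5.1]]  # a content change at frame 3
    print(adaptive_shots(frames, 1.0))  # [[0, 1, 2], [3, 4]]
    print(fixed_length_shots(5, 2))     # [[0, 1], [2, 3], [4]]
```

In the fixed-length split, frames 2 and 3 end up in the same "shot" even though they straddle a content change. This is the kind of structural damage the paper attributes to trivial segmentation.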
Paper 5: Unsupervised object-level video summarization
with online motion auto-encoder [11].
Unsupervised video summarization
plays an important role in
digesting, browsing, and searching
the ever-growing videos every day,
and the underlying fine-grained
semantic and motion information
(i.e., objects of interest and their
key motions) in online videos has
been barely touched.[11]
Authors- Yujia Zhang (A), Xiaodan Liang (B), Dingwen Zhang (C), Min Tan (A), Eric P. Xing (B)
(A) University of Chinese Academy of Sciences, Beijing, China
(B) Carnegie Mellon University, Pittsburgh, PA, USA
(C) Xidian University, Xi’an, China
IEEE Conference on Computer Vision and Pattern Recognition. 2018
Fig. 5: Different types of video summarization techniques.[11]
• Objective: To extract the key motions of participating objects and learn to summarize in an unsupervised and online manner.
• Proposed Method: The authors propose a novel online motion auto-encoder (online motion-AE) framework that functions on the super-segmented object motion clips.
• Dataset: (1) OrangeVille; (2) Base jumping dataset from the public CoSum dataset.
• Strength: (1) The video is summarized based on moving object instances. (2) Each moving object is tracked.
• Limitation: (1) Tracking too many moving objects at high speed becomes very complex. (2) Results are not based on user subjectivity.
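The core online idea can be illustrated with a simplified stand-in. Motion clips are processed in stream order. Each clip is scored by how poorly the current model reconstructs it, and then the model is updated. Poorly reconstructed clips are treated as key motions. In this sketch the auto-encoder is swapped for the simplest possible online reconstructor, a running mean. Each object motion clip is assumed to be summarised as a fixed-length feature vector. `OnlineMeanReconstructor` and `key_motion_clips` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class OnlineMeanReconstructor:
    """Stand-in for an online auto-encoder: 'reconstructs' every clip as the
    running mean of the clips seen so far, and updates after each clip."""

    def __init__(self, dim: int) -> None:
        self.mean = [0.0] * dim
        self.count = 0

    def score_and_update(self, clip: Vector) -> float:
        """Return the reconstruction error for `clip`, then update online."""
        if self.count == 0:
            error = 0.0  # nothing learned yet: treat the first clip as typical
        else:
            error = math.sqrt(sum((x - m) ** 2 for x, m in zip(clip, self.mean)))
        self.count += 1
        self.mean = [m + (x - m) / self.count for x, m in zip(clip, self.mean)]
        return error


def key_motion_clips(clips: List[Vector], k: int) -> List[int]:
    """Score clips in stream order; return the k clips with the highest errors."""
    if not clips:
        return []
    model = OnlineMeanReconstructor(len(clips[0]))
    errors = [model.score_and_update(c) for c in clips]
    ranked = sorted(range(len(clips)), key=lambda i: errors[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    clips = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0], [0.1, 0.1]]
    print(key_motion_clips(clips, 1))  # [3]
```

A real auto-encoder replaces the running mean with a learned encoder and decoder, but the score-then-update loop stays the same.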
Research Gap
• Finding a title-based video summarization approach for titles that are free-form and often written ambiguously, using unsupervised learning of title text.
• Collecting dense per-video-shot annotations using learning algorithms.
• Finding an HSA-RNN approach to video summarization that accounts for user subjectivity.
• Finding an unsupervised object-level video summarization approach with an online motion auto-encoder that accounts for user subjectivity.
• Finding key frames based on extracted text and assigning a weight to each frame.
Challenges
Some challenges related to video summarization:
• Learning from all title text.
• Accuracy of object learning algorithms.
• Assigning weights to extracted text.
• Recovering lost information.
• High computational cost.
• Evaluating the performance of a video summarizer.
• No single video summarizer fits all users.
Problem Statement
“Finding key frames based on
extracted text and assigning a
weight to each frame”
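One possible way to weight frames by their extracted text is TF-IDF. This is a minimal sketch under stated assumptions. It assumes each frame already has some text attached (for example from OCR or subtitles), and text extraction is outside the sketch. Words that occur in every frame carry no weight. The choice of TF-IDF, and the names `tfidf_frame_weights` and `top_weighted_frames`, are hypothetical illustrations, not the method proposed in this work.

```python
import math
from collections import Counter
from typing import List


def tfidf_frame_weights(frame_texts: List[str]) -> List[float]:
    """Weight each frame by the summed TF-IDF of the words extracted from it.
    A word that appears in every frame gets IDF = log(1) = 0 and adds nothing."""
    docs = [text.lower().split() for text in frame_texts]
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))  # document frequency
    weights = []
    for doc in docs:
        if not doc:
            weights.append(0.0)  # no text extracted from this frame
            continue
        tf = Counter(doc)
        weights.append(sum((count / len(doc)) * math.log(n / df[w]) for w, count in tf.items()))
    return weights


def top_weighted_frames(weights: List[float], k: int) -> List[int]:
    """Up to k frames with the highest positive text weight, in temporal order."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(i for i in ranked[:k] if weights[i] > 0.0)


if __name__ == "__main__":
    texts = ["breaking news", "breaking news", "goal scored", ""]
    print(top_weighted_frames(tfidf_frame_weights(texts), 1))  # [2]
```

Text that repeats across many frames, such as a persistent banner, is down-weighted automatically. Rare text, such as a one-off caption, raises a frame's weight.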
Datasets
• UT Egocentric (UTE) [5]
The dataset contains 4 videos from head-mounted cameras, each about
3-5 hours long. (Size: 1.4 GB)
• SumMe [12]
The dataset consists of 25 single-shot videos ranging in length from
1 to 6 minutes. It contains summaries created by 15 to 18 users, with the
length constraint that each summary should be 5% to 15% of the original
video. (Size: 2.2 GB)
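The SumMe length rule is simple arithmetic, and summaries are commonly compared with user summaries using an F-measure. The sketch below computes both. Several choices here are assumptions rather than the official SumMe evaluation code: the frame-level precision/recall formulation, and taking the best score over users as the aggregation. Integer percentages are used so that the budget is computed without floating-point rounding surprises.

```python
from typing import List, Set, Tuple


def length_budget(n_frames: int, low_pct: int = 5, high_pct: int = 15) -> Tuple[int, int]:
    """Allowed summary length in frames under the 5%-15% rule (min rounded up, max down)."""
    lo = -(-n_frames * low_pct // 100)  # ceiling division
    hi = n_frames * high_pct // 100
    return lo, hi


def f_measure(predicted: Set[int], reference: Set[int]) -> float:
    """Frame-level F-measure between a generated and a user summary."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def best_f_over_users(predicted: Set[int], user_summaries: List[Set[int]]) -> float:
    """One possible aggregation (assumed here): the best match over all users."""
    return max((f_measure(predicted, u) for u in user_summaries), default=0.0)


if __name__ == "__main__":
    print(length_budget(1000))                            # (50, 150)
    print(f_measure({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
```

For a 1000-frame video, a valid summary therefore has between 50 and 150 frames.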
Datasets Cont…
• YouTube-8M [2]
YouTube-8M is a large-scale labeled video dataset that consists of
millions of YouTube video IDs and associated labels from a diverse
vocabulary of 4700+ visual entities. Videos are selected as follows:
• Each video must be public and have at least 1000 views.
• Each video must be between 120 and 500 seconds long.
• Each video must be associated with at least one entity from the target
vocabulary.
• Adult and sensitive content is removed (as determined by automated
classifiers).
May 2018 version (current at the time of this presentation): 6.1M videos, 3862 classes, 3.0 labels/video,
2.6B audio-visual features.
Tools
• MATLAB
MATLAB is a commercial product that is widely used in the image and
video processing community. It has an image processing toolbox, as well
as toolboxes for tasks such as Kalman filtering, neural networks, and
genetic algorithms. It runs on Linux, macOS, and Windows. For vision
researchers, the lack of access to source code is a significant drawback.
• OpenCV
OpenCV is a library of programming functions mainly aimed at real-time
computer vision. It was originally developed by Intel. The library is
cross-platform and free for use under the open-source BSD license.
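Most of the methods above start by reading frames from a video. The helper below keeps every `step`-th decoded frame. It is written against the `read()`/`release()` interface of OpenCV's `cv2.VideoCapture`, where `read()` returns a `(retval, image)` pair. OpenCV is not imported inside the function, so any object with the same two methods also works. `sample_frames` is a hypothetical helper name.

```python
from typing import Any, List


def sample_frames(capture: Any, step: int) -> List[Any]:
    """Read frames until read() reports failure, keeping every `step`-th frame.
    `capture` is expected to behave like cv2.VideoCapture (read() -> (ok, frame),
    plus release()); the capture is always released, even on error."""
    if step < 1:
        raise ValueError("step must be >= 1")
    kept = []
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of stream or decoding failure
            if index % step == 0:
                kept.append(frame)
            index += 1
    finally:
        capture.release()
    return kept
```

With OpenCV installed, `sample_frames(cv2.VideoCapture("input.mp4"), 25)` would keep one frame out of every 25 decoded frames. The file name is a placeholder. Sampling like this before feature extraction is a common way to reduce computational cost.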
Tools Cont…
 Python
Python is an interpreted, high-level, general-purpose programming
language. Created by Guido van Rossum and first released in 1991,
Python has a design philosophy that emphasizes code readability,
notably through its use of significant whitespace.
Conclusion:
• Text retrieval can be used to assign a weight to each frame, and that
weight can serve as an additional feature for generating the video
summary.
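A text-based frame weight could serve as "one more feature" in a simple way, sketched below: fuse it linearly with a visual importance score. Both score lists are min-max normalised and mixed with a weight `alpha`. The top fraction `ratio` of frames is then kept. The linear fusion, the normalisation, and the parameter names are assumptions made for illustration.

```python
from typing import List, Sequence


def min_max(xs: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def fused_scores(visual: Sequence[float], text: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Per-frame score = alpha * visual + (1 - alpha) * text, after normalisation."""
    if len(visual) != len(text):
        raise ValueError("expected one visual and one text score per frame")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    v, t = min_max(visual), min_max(text)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(v, t)]


def summary_frames(scores: Sequence[float], ratio: float) -> List[int]:
    """Keep the top `ratio` fraction of frames (at least one), in temporal order."""
    k = max(1, round(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    visual = [0.9, 0.2, 0.5, 0.1]
    text = [0.0, 0.0, 1.4, 0.0]
    print(summary_frames(fused_scores(visual, text, alpha=1.0), 0.25))  # [0]
    print(summary_frames(fused_scores(visual, text, alpha=0.5), 0.25))  # [2]
```

Setting `alpha=1.0` ignores the text entirely. Lowering it lets a frame with distinctive text (frame 2) displace the most visually salient frame.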
References:
1. https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization (D.L.V 01/09/18)
2. Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, “TVSum: Summarizing web videos using titles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5179–5187, 2015.
3. Y. Zhuang, R. Xiao, and F. Wu, “Key issues in video summarization and its application,” in Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and Fourth Pacific Rim Conference on Multimedia, vol. 1, pp. 448–452, IEEE, 2003.
4. R. Kansagara, D. Thakore, and M. Joshi, “A study on video summarization techniques,” International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, 2014.
5. A. Sharghi, J. S. Laurel, and B. Gong, “Query-focused video summarization: Dataset, evaluation, and a memory network based approach,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2127–2136, 2017.
References Cont…
6. P. Mundur, Y. Rao, and Y. Yesha, “Keyframe-based video summarization using Delaunay clustering,” International Journal on Digital Libraries, vol. 6, no. 2, pp. 219–232, 2006.
7. P. Mundur, Y. Rao, and Y. Yesha, “Keyframe-based video summarization using Delaunay clustering,” International Journal on Digital Libraries, vol. 6, no. 2, pp. 219–232, 2006.
8. S. Yeung, A. Fathi, and L. Fei-Fei, “VideoSET: Video summary evaluation through text,” arXiv preprint arXiv:1406.5824, 2014.
9. Y. Zhang, M. Kampffmeyer, X. Liang, M. Tan, and E. P. Xing, “Query-conditioned three-player adversarial network for video summarization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
10. B. Zhao, X. Li, and X. Lu, “HSA-RNN: Hierarchical structure-adaptive RNN for video summarization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7405–7414, 2018.
11. Y. Zhang, X. Liang, D. Zhang, M. Tan, and E. P. Xing, “Unsupervised object-level video summarization with online motion auto-encoder,” arXiv preprint arXiv:1801.00543, 2018.
12. M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, “Creating summaries from user videos,” in European Conference on Computer Vision, pp. 505–520, Springer, Cham, 2014.
Thank You
26
Ad

Recommended

Unsupervised Video Summarization via Attention-Driven Adversarial Learning
Unsupervised Video Summarization via Attention-Driven Adversarial Learning
VasileiosMezaris
 
Mtech First progress PRESENTATION ON VIDEO SUMMARIZATION
Mtech First progress PRESENTATION ON VIDEO SUMMARIZATION
NEERAJ BAGHEL
 
PGL SUM Video Summarization
PGL SUM Video Summarization
VasileiosMezaris
 
Keyframe-based Video Summarization Designer
Keyframe-based Video Summarization Designer
Universitat Politècnica de Catalunya
 
Results on video summarization
Results on video summarization
Mikolaj Leszczuk
 
Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attention
VasileiosMezaris
 
07 regularization
07 regularization
Ronald Teo
 
DTrace Topics: Introduction
DTrace Topics: Introduction
Brendan Gregg
 
GAN-based video summarization
GAN-based video summarization
VasileiosMezaris
 
Human Activity Recognition
Human Activity Recognition
AshwinGill1
 
Video Transformers.pptx
Video Transformers.pptx
Sangmin Woo
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecs
Qualcomm Research
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
Precise LSTM Algorithm
Precise LSTM Algorithm
YasutoTamura1
 
VVC tutorial at ICME 2020 together with Benjamin Bross
VVC tutorial at ICME 2020 together with Benjamin Bross
Mathias Wien
 
Hardware Probing in the Linux Kernel
Hardware Probing in the Linux Kernel
Kernel TLV
 
Multimodal deep learning
Multimodal deep learning
hoai_ln
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative Models
Ryohei Suzuki
 
intel Sync. & Edge Solution udpate xEng-v1.0.pptx
intel Sync. & Edge Solution udpate xEng-v1.0.pptx
Alex Wooram Kim
 
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
Vignesh V Menon
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
Dr. Mohieddin Moradi
 
Learning occam razor
Learning occam razor
Minakshi Atre
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
pratik pratyay
 
Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)
준식 최
 
Sequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
It's Time to ROCm!
It's Time to ROCm!
inside-BigData.com
 
HEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam Mariappan
Vinayagam Mariappan
 
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Melanie Swan
 
M.tech Third progress Presentation
M.tech Third progress Presentation
NEERAJ BAGHEL
 
Mtech Fourth progress presentation
Mtech Fourth progress presentation
NEERAJ BAGHEL
 

More Related Content

What's hot (20)

GAN-based video summarization
GAN-based video summarization
VasileiosMezaris
 
Human Activity Recognition
Human Activity Recognition
AshwinGill1
 
Video Transformers.pptx
Video Transformers.pptx
Sangmin Woo
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecs
Qualcomm Research
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
Precise LSTM Algorithm
Precise LSTM Algorithm
YasutoTamura1
 
VVC tutorial at ICME 2020 together with Benjamin Bross
VVC tutorial at ICME 2020 together with Benjamin Bross
Mathias Wien
 
Hardware Probing in the Linux Kernel
Hardware Probing in the Linux Kernel
Kernel TLV
 
Multimodal deep learning
Multimodal deep learning
hoai_ln
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative Models
Ryohei Suzuki
 
intel Sync. & Edge Solution udpate xEng-v1.0.pptx
intel Sync. & Edge Solution udpate xEng-v1.0.pptx
Alex Wooram Kim
 
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
Vignesh V Menon
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
Dr. Mohieddin Moradi
 
Learning occam razor
Learning occam razor
Minakshi Atre
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
pratik pratyay
 
Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)
준식 최
 
Sequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
It's Time to ROCm!
It's Time to ROCm!
inside-BigData.com
 
HEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam Mariappan
Vinayagam Mariappan
 
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Melanie Swan
 
GAN-based video summarization
GAN-based video summarization
VasileiosMezaris
 
Human Activity Recognition
Human Activity Recognition
AshwinGill1
 
Video Transformers.pptx
Video Transformers.pptx
Sangmin Woo
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecs
Qualcomm Research
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
Precise LSTM Algorithm
Precise LSTM Algorithm
YasutoTamura1
 
VVC tutorial at ICME 2020 together with Benjamin Bross
VVC tutorial at ICME 2020 together with Benjamin Bross
Mathias Wien
 
Hardware Probing in the Linux Kernel
Hardware Probing in the Linux Kernel
Kernel TLV
 
Multimodal deep learning
Multimodal deep learning
hoai_ln
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative Models
Ryohei Suzuki
 
intel Sync. & Edge Solution udpate xEng-v1.0.pptx
intel Sync. & Edge Solution udpate xEng-v1.0.pptx
Alex Wooram Kim
 
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
Vignesh V Menon
 
Learning occam razor
Learning occam razor
Minakshi Atre
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
pratik pratyay
 
Paper Summary of Disentangling by Factorising (Factor-VAE)
Paper Summary of Disentangling by Factorising (Factor-VAE)
준식 최
 
Sequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
HEVC VIDEO CODEC By Vinayagam Mariappan
HEVC VIDEO CODEC By Vinayagam Mariappan
Vinayagam Mariappan
 
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Melanie Swan
 

Similar to Mtech Second progresspresentation ON VIDEO SUMMARIZATION (20)

M.tech Third progress Presentation
M.tech Third progress Presentation
NEERAJ BAGHEL
 
Mtech Fourth progress presentation
Mtech Fourth progress presentation
NEERAJ BAGHEL
 
Hierarchical structure adaptive
Hierarchical structure adaptive
NEERAJ BAGHEL
 
Unsupervised object-level video summarization with online motion auto-encoder
Unsupervised object-level video summarization with online motion auto-encoder
NEERAJ BAGHEL
 
TVSum: Summarizing Web Videos Using Titles
TVSum: Summarizing Web Videos Using Titles
NEERAJ BAGHEL
 
Parking Surveillance Footage Summarization
Parking Surveillance Footage Summarization
IRJET Journal
 
Query focused video summarization
Query focused video summarization
NEERAJ BAGHEL
 
Unsupervised video summarization framework using keyframe extraction and vide...
Unsupervised video summarization framework using keyframe extraction and vide...
Shruti Jadon
 
Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1
VasileiosMezaris
 
CA-SUM Video Summarization
CA-SUM Video Summarization
VasileiosMezaris
 
Summarizing videos with Attention
Summarizing videos with Attention
Arithmer Inc.
 
Enhancing Video Summarization via Vision-Language Embedding
Enhancing Video Summarization via Vision-Language Embedding
ivaderivader
 
Bhuvan T...................... D[1].pptx
Bhuvan T...................... D[1].pptx
chethanrohit0045
 
Multimodal video abstraction into a static document using deep learning
Multimodal video abstraction into a static document using deep learning
IJECEIAES
 
SUMMARY GENERATION FOR LECTURING VIDEOS
SUMMARY GENERATION FOR LECTURING VIDEOS
IRJET Journal
 
Semantic Summarization of videos, Semantic Summarization of videos
Semantic Summarization of videos, Semantic Summarization of videos
darsh228313
 
Video Summarization
Video Summarization
IRJET Journal
 
Defense_20140625
Defense_20140625
Shun-Hsing Ou
 
Dataset and methods for 360-degree video summarization
Dataset and methods for 360-degree video summarization
VasileiosMezaris
 
Key frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptors
eSAT Journals
 
M.tech Third progress Presentation
M.tech Third progress Presentation
NEERAJ BAGHEL
 
Mtech Fourth progress presentation
Mtech Fourth progress presentation
NEERAJ BAGHEL
 
Hierarchical structure adaptive
Hierarchical structure adaptive
NEERAJ BAGHEL
 
Unsupervised object-level video summarization with online motion auto-encoder
Unsupervised object-level video summarization with online motion auto-encoder
NEERAJ BAGHEL
 
TVSum: Summarizing Web Videos Using Titles
TVSum: Summarizing Web Videos Using Titles
NEERAJ BAGHEL
 
Parking Surveillance Footage Summarization
Parking Surveillance Footage Summarization
IRJET Journal
 
Query focused video summarization
Query focused video summarization
NEERAJ BAGHEL
 
Unsupervised video summarization framework using keyframe extraction and vide...
Unsupervised video summarization framework using keyframe extraction and vide...
Shruti Jadon
 
Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1
VasileiosMezaris
 
CA-SUM Video Summarization
CA-SUM Video Summarization
VasileiosMezaris
 
Summarizing videos with Attention
Summarizing videos with Attention
Arithmer Inc.
 
Enhancing Video Summarization via Vision-Language Embedding
Enhancing Video Summarization via Vision-Language Embedding
ivaderivader
 
Bhuvan T...................... D[1].pptx
Bhuvan T...................... D[1].pptx
chethanrohit0045
 
Multimodal video abstraction into a static document using deep learning
Multimodal video abstraction into a static document using deep learning
IJECEIAES
 
SUMMARY GENERATION FOR LECTURING VIDEOS
SUMMARY GENERATION FOR LECTURING VIDEOS
IRJET Journal
 
Semantic Summarization of videos, Semantic Summarization of videos
Semantic Summarization of videos, Semantic Summarization of videos
darsh228313
 
Dataset and methods for 360-degree video summarization
Dataset and methods for 360-degree video summarization
VasileiosMezaris
 
Key frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptors
eSAT Journals
 
Ad

More from NEERAJ BAGHEL (9)

Generating super resolution images using transformers
Generating super resolution images using transformers
NEERAJ BAGHEL
 
Latex intro
Latex intro
NEERAJ BAGHEL
 
Host rank:Exploiting the Hierarchical Structure for Link Analysis
Host rank:Exploiting the Hierarchical Structure for Link Analysis
NEERAJ BAGHEL
 
Traffic behavior of local area network based on
Traffic behavior of local area network based on
NEERAJ BAGHEL
 
A Framework For Dynamic Hand Gesture Recognition Using Key Frames Extraction
A Framework For Dynamic Hand Gesture Recognition Using Key Frames Extraction
NEERAJ BAGHEL
 
Fingerprint recognition
Fingerprint recognition
NEERAJ BAGHEL
 
Disk scheduling
Disk scheduling
NEERAJ BAGHEL
 
SMOWSER (A VOICE BASED BROWSER)
SMOWSER (A VOICE BASED BROWSER)
NEERAJ BAGHEL
 
Itvv project ppt
Itvv project ppt
NEERAJ BAGHEL
 
Generating super resolution images using transformers
Generating super resolution images using transformers
NEERAJ BAGHEL
 
Host rank:Exploiting the Hierarchical Structure for Link Analysis
Host rank:Exploiting the Hierarchical Structure for Link Analysis
NEERAJ BAGHEL
 
Traffic behavior of local area network based on
Traffic behavior of local area network based on
NEERAJ BAGHEL
 
A Framework For Dynamic Hand Gesture Recognition Using Key Frames Extraction
A Framework For Dynamic Hand Gesture Recognition Using Key Frames Extraction
NEERAJ BAGHEL
 
Fingerprint recognition
Fingerprint recognition
NEERAJ BAGHEL
 
SMOWSER (A VOICE BASED BROWSER)
SMOWSER (A VOICE BASED BROWSER)
NEERAJ BAGHEL
 
Ad

Recently uploaded (20)

Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
resming1
 
Decoding Kotlin - Your Guide to Solving the Mysterious in Kotlin - Devoxx PL ...
Decoding Kotlin - Your Guide to Solving the Mysterious in Kotlin - Devoxx PL ...
João Esperancinha
 
362 Alec Data Center Solutions-Slysium Data Center-AUH-ABB Furse.pdf
362 Alec Data Center Solutions-Slysium Data Center-AUH-ABB Furse.pdf
djiceramil
 
02 - Ethics & Professionalism - BEM, IEM, MySET.PPT
02 - Ethics & Professionalism - BEM, IEM, MySET.PPT
SharinAbGhani1
 
362 Alec Data Center Solutions-Slysium Data Center-AUH-Adaptaflex.pdf
362 Alec Data Center Solutions-Slysium Data Center-AUH-Adaptaflex.pdf
djiceramil
 
Pavement and its types, Application of rigid and Flexible Pavements
Pavement and its types, Application of rigid and Flexible Pavements
Sakthivel M
 
60 Years and Beyond eBook 1234567891.pdf
60 Years and Beyond eBook 1234567891.pdf
waseemalazzeh
 
Week 6- PC HARDWARE AND MAINTENANCE-THEORY.pptx
Week 6- PC HARDWARE AND MAINTENANCE-THEORY.pptx
dayananda54
 
grade 9 science q1 quiz.pptx science quiz
grade 9 science q1 quiz.pptx science quiz
norfapangolima
 
Low Power SI Class E Power Amplifier and Rf Switch for Health Care
Low Power SI Class E Power Amplifier and Rf Switch for Health Care
ieijjournal
 
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
resming1
 
Understanding Amplitude Modulation : A Guide
Understanding Amplitude Modulation : A Guide
CircuitDigest
 
Modern multi-proposer consensus implementations
Modern multi-proposer consensus implementations
François Garillot
 
Engineering Mechanics Introduction and its Application
Engineering Mechanics Introduction and its Application
Sakthivel M
 
IntroSlides-June-GDG-Cloud-Munich community [email protected]
IntroSlides-June-GDG-Cloud-Munich community [email protected]
Luiz Carneiro
 
Industry 4.o the fourth revolutionWeek-2.pptx
Industry 4.o the fourth revolutionWeek-2.pptx
KNaveenKumarECE
 
OCS Group SG - HPHT Well Design and Operation - SN.pdf
OCS Group SG - HPHT Well Design and Operation - SN.pdf
Muanisa Waras
 
machine learning is a advance technology
machine learning is a advance technology
ynancy893
 
Center Enamel can Provide Aluminum Dome Roofs for diesel tank.docx
Center Enamel can Provide Aluminum Dome Roofs for diesel tank.docx
CenterEnamel
 
362 Alec Data Center Solutions-Slysium Data Center-AUH-Glands & Lugs, Simplex...
362 Alec Data Center Solutions-Slysium Data Center-AUH-Glands & Lugs, Simplex...
djiceramil
 
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
resming1
 
Decoding Kotlin - Your Guide to Solving the Mysterious in Kotlin - Devoxx PL ...
Decoding Kotlin - Your Guide to Solving the Mysterious in Kotlin - Devoxx PL ...
João Esperancinha
 
362 Alec Data Center Solutions-Slysium Data Center-AUH-ABB Furse.pdf
362 Alec Data Center Solutions-Slysium Data Center-AUH-ABB Furse.pdf
djiceramil
 
02 - Ethics & Professionalism - BEM, IEM, MySET.PPT
02 - Ethics & Professionalism - BEM, IEM, MySET.PPT
SharinAbGhani1
 
362 Alec Data Center Solutions-Slysium Data Center-AUH-Adaptaflex.pdf
362 Alec Data Center Solutions-Slysium Data Center-AUH-Adaptaflex.pdf
djiceramil
 
Pavement and its types, Application of rigid and Flexible Pavements
Pavement and its types, Application of rigid and Flexible Pavements
Sakthivel M
 
60 Years and Beyond eBook 1234567891.pdf
60 Years and Beyond eBook 1234567891.pdf
waseemalazzeh
 
Week 6- PC HARDWARE AND MAINTENANCE-THEORY.pptx
Week 6- PC HARDWARE AND MAINTENANCE-THEORY.pptx
dayananda54
 
grade 9 science q1 quiz.pptx science quiz
grade 9 science q1 quiz.pptx science quiz
norfapangolima
 
Low Power SI Class E Power Amplifier and Rf Switch for Health Care
Low Power SI Class E Power Amplifier and Rf Switch for Health Care
ieijjournal
 
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
resming1
 
Understanding Amplitude Modulation : A Guide
Understanding Amplitude Modulation : A Guide
CircuitDigest
 
Modern multi-proposer consensus implementations
Modern multi-proposer consensus implementations
François Garillot
 
Engineering Mechanics Introduction and its Application
Engineering Mechanics Introduction and its Application
Sakthivel M
 
Industry 4.o the fourth revolutionWeek-2.pptx
Industry 4.o the fourth revolutionWeek-2.pptx
KNaveenKumarECE
 
OCS Group SG - HPHT Well Design and Operation - SN.pdf
OCS Group SG - HPHT Well Design and Operation - SN.pdf
Muanisa Waras
 
machine learning is a advance technology
machine learning is a advance technology
ynancy893
 
Center Enamel can Provide Aluminum Dome Roofs for diesel tank.docx
Center Enamel can Provide Aluminum Dome Roofs for diesel tank.docx
CenterEnamel
 
362 Alec Data Center Solutions-Slysium Data Center-AUH-Glands & Lugs, Simplex...
362 Alec Data Center Solutions-Slysium Data Center-AUH-Glands & Lugs, Simplex...
djiceramil
 

Mtech Second progresspresentation ON VIDEO SUMMARIZATION

  • 1. Second Progress Presentation on “Video Summarization” Presented By: Neeraj Baghel M. Tech.(P) (CSE) II Yr 178150005 Supervised By: Prof. Charul Bhatnagar Professor, Deptt. of CEA GLA University, Mathura Dept. of Computer Engineering & Applications, GLA University, Mathura. October 24, 2018 1
  • 2. Outline • Introduction • Literature Survey • Research Gap • Challenges • Problem Statement • Dataset • Tools • Conclusion • References 2
  • 3. Introduction To Video Summarization Video • Video data is a great asset for information extraction and knowledge discovery. • Due to its size an variability, it is extremely hard for users to monitor.[5] Video Summarization • Intelligent video summarization algorithms allow us to quickly browse a lengthy video by capturing the essence and removing redundant information.[5] Fig 1: Video Summarization Work Flow [1] 3
  • 4. Types of Video Summarization. Video can be summarized in two different ways, as classified in Fig 2. Fig 2: Video Summarization Technique Classification [7].
  • 6. Paper 1: TVSum: Summarizing Web Videos Using Titles [2]. Video summarization is a challenging problem in part because knowing which part of a video is important requires prior knowledge about its main topic. We present TVSum, an unsupervised video summarization framework that uses title-based image search results to find visually important shots [2]. Authors: Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes, Yahoo Labs, New York. IEEE Conference on Computer Vision and Pattern Recognition, 2015. Fig. 1: An illustration of title-based video summarization [2].
  • 7. Objective: To find which part of a video is important, and thus “summary worthy,” which requires prior knowledge about its main topic. Proposed Method: 1) TVSum, an unsupervised video summarization framework that uses the video title to find visually important shots; 2) a co-archetypal analysis technique, developed by the authors, that learns canonical visual concepts shared between video and images. Dataset: 1) TVSum50 dataset; 2) SumMe dataset. Strength: TVSum is an unsupervised video summarization framework that uses the video title to find visually important shots. Limitation: 1) Titles are free-form, unconstrained, and often written ambiguously; 2) it is unclear how to learn from all title text.
  • 8. Paper 2: Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach [5]. One of the main obstacles to research on video summarization is user subjectivity: users have various preferences over the summaries. The subjectiveness causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to the individual users. Second, it is very challenging to evaluate the performance of a video summarizer [5]. Authors: Aidean Sharghi (A), Jacob S. Laurel (B), and Boqing Gong (A); A: University of Central Florida, Orlando; B: University of Alabama at Birmingham. IEEE Conference on Computer Vision and Pattern Recognition, 2017. Fig. 2: Comparing the semantic information captured by captions and by the concept tags we collected [8].
  • 9. Objective: The main obstacle to research on video summarization is user subjectivity: users have various preferences over the summaries. Proposed Method: The authors propose a memory-network-parameterized sequential determinantal point process to attend the user query onto different video frames and shots. Dataset: 1) UT Egocentric (UTE) dataset. Strength: 1) Introduces user preferences in the form of text queries; 2) the authors collect dense per-video-shot concept annotations. Limitation: 1) Collecting dense per-video-shot concept annotations.
  • 10. Paper 3: Query-Conditioned Three-Player Adversarial Network for Video Summarization [9]. Video summarization plays an important role in video understanding by selecting key frames/shots. Traditionally, it aims to find the most representative and diverse contents in a video as short summaries. In this paper, the authors propose a query-conditioned three-player generative adversarial network to tackle this challenge. The generator learns the joint representation of the user query and the video content, and the discriminator takes three pairs of query-conditioned summaries as the input to discriminate the real summary from a generated and a random one [9]. Authors: Yujia Zhang (1, 2), Michael Kampffmeyer (3), Xiaodan Liang (4), Min Tan (1, 2), Eric P. Xing (4); 1: Institute of Automation, Chinese Academy of Sciences; 2: University of Chinese Academy of Sciences; 3: UiT The Arctic University of Norway; 4: Carnegie Mellon University. IEEE Conference on Computer Vision and Pattern Recognition, 2018. Fig. 3: Different video summarization
  • 11. Objective: To find the most representative and diverse content in a video as a short summary. Proposed Method: The authors propose a query-conditioned three-player generative adversarial network to tackle this challenge. The generator learns the joint representation of the user query and the video content. Dataset: 1) UT Egocentric (UTE) dataset. Strength: Results are more accurate, being based on the user query. Limitation: 1) Do not randomly generated summary.
  • 12. Paper 4: Hierarchical Structure-Adaptive RNN for Video Summarization [10]. Video data follow a hierarchical structure: a video is composed of shots, and a shot is composed of several frames. However, few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by trivial strategies, such as fixed-length segmentation, which may destroy the underlying hierarchical structure of video data and further reduce the quality of generated summaries [10]. Authors: Bin Zhao (1), Xuelong Li (2), Xiaoqiang Lu (2); 1: Northwestern Polytechnical University, Shaanxi, P. R. China; 2: Chinese Academy of Sciences, Shaanxi, P. R. China. IEEE Conference on Computer Vision and Pattern Recognition, 2018. Fig. 4: The diagram of the proposed HSA-RNN, where Layer 1 and Layer 2 are designed to exploit the video structure and generate the video summary [10].
  • 13. Objective: To preserve the underlying hierarchical structure of video data and further improve the quality of generated summaries. Proposed Method: The authors propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN. Dataset: 1) SumMe dataset; 2) TVSum dataset. Strength: 1) Uses the hierarchical structure of video data to improve the quality of generated summaries. Limitation: 1) Results are not based on user subjectivity.
  • 14. Paper 5: Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder [11]. Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing volume of videos every day, yet the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has barely been touched [11]. Authors: Yujia Zhang (A), Xiaodan Liang (B), Dingwen Zhang (C), Min Tan (A), Eric P. Xing (B); A: University of Chinese Academy of Sciences, Beijing, China; B: Carnegie Mellon University, Pittsburgh, PA, USA; C: Xidian University, Xi’an, China. IEEE Conference on Computer Vision and Pattern Recognition, 2018. Fig. 5: Different types of video summarization techniques [11].
  • 15. Objective: To extract the key motions of participating objects and learn to summarize in an unsupervised and online manner. Proposed Method: The authors propose a novel online motion auto-encoder (online motion-AE) framework that functions on super-segmented object motion clips. Dataset: 1) OrangeVille; 2) Base jumping dataset from the public CoSum dataset. Strength: 1) The video is summarized based on moving object instances; 2) each moving object is tracked. Limitation: 1) Tracking too many fast-moving objects is very complex; 2) results are not based on user subjectivity.
  • 16. Research Gap • Title-based video summarization in which titles are free-form and often written ambiguously, using unsupervised learning of title text. • Collecting dense per-video-shot annotations using learning algorithms. • HSA-RNN video summarization that accounts for user subjectivity. • Unsupervised object-level video summarization with an online motion auto-encoder that accounts for user subjectivity. • Finding key frames based on extracted text and assigning a weight to each frame.
  • 17. Challenges. Some challenges related to video summarization: • Learning from all title text. • Accuracy of object learning algorithms. • Assigning weights to extracted text. • Recovering lost information. • High computational cost. • Evaluating the performance of a video summarizer. • No single video summarizer fits all users.
  • 18. Problem Statement: “Finding key frames based on extracted text and assigning a weight to each frame”
  • 19. Datasets  UT Egocnetric (UTE) [5] The dataset contains 4 videos from head-mounted cameras, each about 3-5 hours long. (Size: 1.4Gb)  SumMe [12] The dataset consists of 25 videos which are single-shot and range in length from 1-6 minutes. The dataset contains summaries created by 15 to 18 users with the constraint in length being that the summaries should be 5% to 15% of the original video. (Size: 2.2 GB) 19
  • 20. Datasets (cont.) • YouTube-8M [2]: YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs and associated labels from a diverse vocabulary of 4700+ visual entities. Selection criteria: each video must be public and have at least 1000 views; each video must be between 120 and 500 seconds long; each video must be associated with at least one entity from the target vocabulary; adult and sensitive content is removed (as determined by automated classifiers). May 2018 version (current as of this presentation): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B audio-visual features.
  • 21. Tools  Matlab Matlab is a commercial product that is pretty widely-used in the image video processing community. It also has an adequate image processing `toolbox,' and toolboxes for things like Kalman filters, neural networks, genetic algorithms, and so on. It runs on most Unices, including Linux, and on Windows 95/NT. For people who are researching into vision algorithms, the lack of source code is a killer.  OpenCV OpenCV is a library of programming functions mainly aimed at real- time computer vision. Originally developed by Intel. The library is cross-platform and free for use under the open-source BSD license. 21
  • 22. Tools (cont.) • Python: Python is an interpreted, high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.
  • 23. Conclusion: Text retrieval can be used to assign a weight to a frame, and that weight can be used as an additional feature for generating a video summary.
  • 24. References: 1. https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization (last visited 01/09/18). 2. Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, “TVSum: Summarizing web videos using titles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5179–5187, 2015. 3. Y. Zhuang, R. Xiao, and F. Wu, “Key issues in video summarization and its application,” in Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and Fourth Pacific Rim Conference on Multimedia, vol. 1, pp. 448–452, IEEE, 2003. 4. R. Kansagara, D. Thakore, and M. Joshi, “A study on video summarization techniques,” International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, 2014. 5. A. Sharghi, J. S. Laurel, and B. Gong, “Query-focused video summarization: Dataset, evaluation, and a memory network based approach,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2127–2136, 2017.
  • 25. References (cont.) 6. P. Mundur, Y. Rao, and Y. Yesha, “Keyframe-based video summarization using Delaunay clustering,” International Journal on Digital Libraries, vol. 6, no. 2, pp. 219–232, 2006. 7. P. Mundur, Y. Rao, and Y. Yesha, “Keyframe-based video summarization using Delaunay clustering,” International Journal on Digital Libraries, vol. 6, no. 2, pp. 219–232, 2006. 8. S. Yeung, A. Fathi, and L. Fei-Fei, “VideoSET: Video summary evaluation through text,” arXiv preprint arXiv:1406.5824, 2014. 9. Y. Zhang, M. Kampffmeyer, X. Liang, M. Tan, and E. P. Xing, “Query-conditioned three-player adversarial network for video summarization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 10. B. Zhao, X. Li, and X. Lu, “HSA-RNN: Hierarchical structure-adaptive RNN for video summarization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7405–7414, 2018. 11. Y. Zhang, X. Liang, D. Zhang, M. Tan, and E. P. Xing, “Unsupervised object-level video summarization with online motion auto-encoder,” arXiv preprint arXiv:1801.00543, 2018. 12. M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, “Creating summaries from user videos,” in European Conference on Computer Vision, pp. 505–520, Springer, Cham, 2014.
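Slide 3 describes summarization as capturing the essence of a video while removing redundant information. A minimal sketch of the simplest form of that idea keeps a frame only when it differs enough from the last kept keyframe. It assumes frames are already decoded and flattened to lists of grayscale values, and the fixed `threshold` is a hypothetical setting; this is a generic baseline, not a method from any of the surveyed papers.

```python
from typing import List, Sequence

# Assumption: each frame is already decoded and flattened to grayscale
# intensities; decoding (e.g. with OpenCV) is outside this sketch.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: List[Frame], threshold: float) -> List[int]:
    """Keep the first frame, then every frame that differs from the most
    recently kept keyframe by more than `threshold` (hypothetical value)."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keyframes[-1]]) > threshold:
            keyframes.append(i)
    return keyframes


if __name__ == "__main__":
    # Toy video: two near-identical dark frames, then two near-identical bright frames.
    video = [[10, 10, 10], [11, 10, 10], [200, 200, 200], [201, 200, 199]]
    print(select_keyframes(video, threshold=30.0))  # prints [0, 2]
```

Comparing against the last kept keyframe rather than the previous frame stops slow, gradual changes from being dropped forever, at the cost of being sensitive to the chosen threshold.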
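Paper 1 (TVSum) scores shots by how well they match images retrieved with the video title. TVSum itself learns shared visual concepts with co-archetypal analysis; the sketch below swaps that for a much simpler stand-in, cosine similarity between each shot and the mean feature of the title-search images. It assumes feature vectors (for example from a CNN) have already been extracted for shots and images, and all names and values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def score_shots(shot_features: List[Vector], image_features: List[Vector]) -> List[float]:
    """Score each shot by similarity to the mean feature of the images returned
    by a search on the video title (simplified stand-in, not co-archetypal analysis)."""
    title_concept = centroid(image_features)
    return [cosine(shot, title_concept) for shot in shot_features]


def top_k_shots(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring shots, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    shots = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # hypothetical shot features
    images = [[1.0, 0.0], [0.8, 0.2]]             # hypothetical title-search image features
    print(top_k_shots(score_shots(shots, images), k=2))  # prints [0, 2]
```

Because the title is only used through the retrieved images, ambiguous or unconstrained titles (the limitation listed on slide 7) directly degrade the concept vector that every shot is compared against.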
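Paper 2 builds on determinantal point processes (DPPs), which favour subsets of shots that are individually relevant and mutually diverse. The sketch below is plain greedy MAP selection for a DPP with kernel L[i][j] = q_i · S[i][j] · q_j (q: a per-shot relevance to the query, S: shot similarity). It is not the memory-network-parameterized sequential DPP of [5], and the quality and similarity values are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (small matrices only)."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_kernel(quality: List[float], similarity: Matrix) -> Matrix:
    """L[i][j] = q_i * S[i][j] * q_j: high quality and low mutual similarity raise det(L_Y)."""
    n = len(quality)
    return [[quality[i] * similarity[i][j] * quality[j] for j in range(n)] for i in range(n)]


def greedy_dpp(L: Matrix, k: int) -> List[int]:
    """Greedily add the shot that maximizes the determinant of the selected submatrix."""
    selected: List[int] = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[L[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # no candidate keeps the determinant above zero
            break
        selected.append(best)
    return sorted(selected)


if __name__ == "__main__":
    quality = [1.0, 0.9, 0.8]        # hypothetical query relevance per shot
    similarity = [[1.0, 0.95, 0.1],  # shots 0 and 1 are near-duplicates
                  [0.95, 1.0, 0.1],
                  [0.1, 0.1, 1.0]]
    print(greedy_dpp(dpp_kernel(quality, similarity), k=2))  # prints [0, 2]
```

Shot 1 has higher relevance than shot 2, but it nearly duplicates shot 0, so the determinant (0.079 versus 0.634) steers the selection towards the more diverse pair.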
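Paper 3's discriminator sees three query-conditioned (query, summary) pairs: the real summary, a generated one, and a random one. A minimal sketch of how such a three-player objective can be written with standard binary cross-entropy follows. It assumes the discriminator outputs a probability that a pair is real; the networks are omitted, and the exact loss terms and weights used in [9] may differ.

```python
import math

EPS = 1e-12  # avoids log(0) when a probability is exactly 0 or 1


def discriminator_loss(d_real: float, d_generated: float, d_random: float) -> float:
    """Binary cross-entropy over the three query-conditioned pairs:
    the real summary should score 1, the generated and random summaries 0."""
    return -(math.log(d_real + EPS)
             + math.log(1.0 - d_generated + EPS)
             + math.log(1.0 - d_random + EPS))


def generator_loss(d_generated: float) -> float:
    """Non-saturating generator loss: the generator wants its summary scored as real."""
    return -math.log(d_generated + EPS)


if __name__ == "__main__":
    # Discriminator outputs are probabilities that a (query, summary) pair is real.
    print(round(discriminator_loss(0.5, 0.5, 0.5), 4))  # 3 * ln 2, prints 2.0794
```

The random-summary term is what makes this a third "player": the discriminator must also reject summaries that ignore both the video content and the query, not just the generator's output.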
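Slide 12 notes that fixed-length segmentation can cut across real shot boundaries. HSA-RNN learns the boundaries inside an RNN; the sketch below instead uses a classic histogram-difference heuristic to contrast detected shots with fixed-length chunks. Frames are assumed to be flat lists of 0 to 255 grayscale values, and `threshold` and `bins` are hypothetical settings.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 4) -> List[float]:
    """Normalized intensity histogram of a grayscale frame (values 0..255)."""
    counts = [0] * bins
    for v in frame:
        counts[min(v * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def shot_starts(frames: List[Sequence[int]], threshold: float = 0.5, bins: int = 4) -> List[int]:
    """Frame indices that start a new shot: L1 histogram distance to the previous frame > threshold."""
    if not frames:
        return []
    hists = [histogram(f, bins) for f in frames]
    starts = [0]
    for i in range(1, len(hists)):
        if sum(abs(a - b) for a, b in zip(hists[i], hists[i - 1])) > threshold:
            starts.append(i)
    return starts


def to_shots(starts: List[int], n_frames: int) -> List[Tuple[int, int]]:
    """Convert shot start indices into (start, end) ranges with exclusive ends."""
    return list(zip(starts, starts[1:] + [n_frames]))


def fixed_length_shots(n_frames: int, length: int) -> List[Tuple[int, int]]:
    """The trivial strategy: cut every `length` frames regardless of content."""
    return [(s, min(s + length, n_frames)) for s in range(0, n_frames, length)]


if __name__ == "__main__":
    frames = [[10, 20, 30, 40]] * 3 + [[220, 230, 240, 250]] * 2  # a cut before frame 3
    print(to_shots(shot_starts(frames), len(frames)))  # prints [(0, 3), (3, 5)]
    print(fixed_length_shots(len(frames), 2))          # prints [(0, 2), (2, 4), (4, 5)]
```

The fixed-length chunk (2, 4) straddles the cut at frame 3, mixing two different shots: the kind of damage to the video's hierarchical structure that slide 12 describes.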
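The problem statement (slide 18) and conclusion (slide 23) propose weighting frames by extracted text. The sketch below is only one hypothetical way to do that, not the author's method. It assumes text has already been extracted per frame (for example by OCR or from subtitles), scores each frame by the fraction of query keywords its text contains, and blends that with an existing visual score through a hypothetical mixing parameter `alpha`.

```python
import re
from typing import List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_weight(frame_text: str, keywords: Sequence[str]) -> float:
    """Fraction of keywords found in a frame's extracted text (0.0 when there are no keywords)."""
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return 0.0
    return len(wanted & tokenize(frame_text)) / len(wanted)


def combined_scores(visual: Sequence[float], texts: Sequence[str],
                    keywords: Sequence[str], alpha: float = 0.5) -> List[float]:
    """alpha * visual score + (1 - alpha) * text weight, frame by frame (alpha is hypothetical)."""
    return [alpha * v + (1 - alpha) * text_weight(t, keywords) for v, t in zip(visual, texts)]


def pick_keyframes(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k best frames, in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    visual = [0.6, 0.6, 0.4, 0.5]  # hypothetical visual importance per frame
    texts = ["", "Goal! Team A 1-0", "Team A lineup", "crowd"]  # hypothetical extracted text
    scores = combined_scores(visual, texts, ["team", "goal"])
    print(pick_keyframes(scores, k=2))  # prints [1, 2]
```

Frame 2 has the lowest visual score but is still selected because its text matches half of the keywords, which is the effect of using text weight as an additional feature.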
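Slide 17 lists evaluating a summarizer as a challenge, and slide 19 says SumMe summaries are limited to 5% to 15% of the video. A common way to score a predicted summary against user summaries is a frame-level F-measure. The sketch below assumes summaries are given as per-frame 0/1 masks and averages over users; benchmark protocols differ in details (for example averaging versus taking the best-matching user), so treat it as illustrative.

```python
from typing import List, Sequence


def f_measure(pred: Sequence[int], user: Sequence[int]) -> float:
    """Harmonic mean of precision and recall between two per-frame 0/1 masks."""
    if len(pred) != len(user):
        raise ValueError("masks must cover the same frames")
    overlap = sum(1 for p, u in zip(pred, user) if p and u)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(user)
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(pred: Sequence[int], users: List[Sequence[int]]) -> float:
    """Average F-measure of one predicted summary against several user summaries."""
    return sum(f_measure(pred, u) for u in users) / len(users)


def within_budget(mask: Sequence[int], low: float = 0.05, high: float = 0.15) -> bool:
    """True if the summary keeps between `low` and `high` of the frames (SumMe-style 5% to 15%)."""
    ratio = sum(mask) / len(mask)
    return low <= ratio <= high


if __name__ == "__main__":
    pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    users = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]]
    print(mean_f_measure(pred, users), within_budget(pred))  # prints 0.5 True
```

The two users disagree completely, so the same prediction scores 1.0 against one and 0.0 against the other; this is the user-subjectivity problem that makes evaluation hard.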

Editor's Notes

  • #6: Recently, due to progress in Complementary Metal Oxide Semiconductor (CMOS) technology, Wireless Multimedia Sensor Networks (WMSNs) have become a focus of research in a broad range of applications.
  • #7: Data aggregation eliminates redundancy and improves bandwidth utilization and energy-efficiency of sensor nodes.