
Semantic Segmentation in Compressed Videos

Ang Li*    Yiwei Lu*    Yang Wang
University of Manitoba, Winnipeg, Canada
[email protected]  [email protected]  [email protected]
* Equal Contribution

Abstract—Existing approaches for semantic segmentation in videos usually extract each frame as an RGB image, then apply standard image-based semantic segmentation models on each frame. This is time-consuming. In this paper, we tackle this problem by exploiting the nature of video compression techniques. A compressed video contains three types of frames: I-frames, P-frames, and B-frames. I-frames are stored as regular images, P-frames are stored as motion vectors and residual errors, and B-frames are bidirectional frames that can be regarded as a special case of P-frames. We propose a method that directly operates on I-frames (as RGB images) and P-frames (motion vectors and residual errors) in a video. Our proposed model uses a ConvLSTM module to capture the temporal information in the video required for producing the semantic segmentation on P-frames. Our experimental results show that our method runs much faster than the alternatives while achieving similar accuracy.

Fig. 1. Current solutions for semantic segmentation in videos require extracting all frames as regular RGB images, then processing each image separately to produce its semantic segmentation. This leads to heavy computation and low speed. In this paper, we propose a semantic segmentation method that directly operates on compressed videos without extracting all frames.

I. INTRODUCTION

Semantic segmentation in videos is of crucial importance for real-time applications such as autonomous driving. Existing approaches usually operate on a frame-by-frame basis: they first extract each frame as a regular RGB image, then apply a standard image-based semantic segmentation model to that frame. These methods suffer from very high computational cost and low speed. Videos are typically encoded at 15 to 30 frames per second (fps), yet a frame-by-frame model needs about 0.17 s to segment a single frame. For example, a 2-minute video played at 30 fps contains 3,600 frames, so a frame-by-frame model takes roughly 3,600 × 0.17 s ≈ 612 s, i.e., about 10 minutes, to segment the whole video. As a result, such methods are not applicable to real-time semantic segmentation scenarios such as self-driving.

Existing frame-by-frame approaches also ignore the fact that videos usually come in a compressed format for transmission and storage. In this paper, we propose a semantic segmentation method that directly operates on compressed videos. Working directly with compressed videos provides several advantages. First, since we do not need to extract frames from the video, our method can be much faster. Second, the compressed video directly provides motion information that RGB images do not have, so our method can exploit this information and take the temporal structure of a video clip into account.

Existing work has already explored the use of compressed videos in computer vision tasks such as action recognition [18] and object detection [17]. However, to the best of our knowledge, this is the first work to use compressed videos for semantic segmentation. We propose a ConvLSTM model that propagates temporal information from an I-frame to the succeeding P/B-frames for semantic segmentation. Our experimental results show that the proposed method performs better than or on par with standard frame-based methods, while running at a much faster speed.

II. RELATED WORK

A. Semantic Segmentation

The goal of semantic segmentation is to assign a label to each pixel in an image (see Fig. 1). For semantic segmentation in images, many models apply deep convolutional neural networks [6], [7], [16]; for example, FCN [10], dilated convolutions [19], and SegNet [1] are widely used. To apply semantic segmentation to videos, the most popular approach is to extract each frame of the video as an image, then run a standard image-based semantic segmentation algorithm on each frame.

For semantic segmentation in videos, there is always a trade-off between accuracy and efficiency. To obtain higher accuracy, a method that handles the spatial and temporal features of video semantic segmentation was proposed in [5], and a pyramid scene parsing network was applied in [20] to produce more accurate segmentation, but these methods require a lot of computation time. A model that focuses on a single annotated object was proposed in [3]. To reduce the computation time, methods based on clockwork modules driven by fixed or adaptive clock signals were proposed in [9], [15].

Fig. 2. We divide the video into groups. Each group contains one RGB image for the I-frame and 11 P-frames represented by motion vectors and residual errors. I-frames and P-frames are processed differently: we first obtain a semantic segmentation of the I-frame based on ResNet. The I-frame features are then used as the initial state of a ConvLSTM module, which takes the information of each P-frame to update its hidden state. At each time step, the module produces a semantic segmentation prediction.
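
As a rough illustration of the pipeline described in the caption of Fig. 2, the sketch below traces one group through the model. All module names (iframe_net, pframe_encoder, convlstm_cell, upsample_head) are placeholders of ours, not names from the paper, and the zero initialization of the ConvLSTM cell state is an assumption; Sections III-B and III-C below define the I-frame and P-frame branches such modules would implement.

```python
import torch

def segment_group(i_frame, p_frames, iframe_net, pframe_encoder, convlstm_cell, upsample_head):
    """Trace one group {I, P_1, ..., P_T} through the pipeline of Fig. 2 (module names are ours).

    i_frame:  (B, 3, H, W) RGB tensor of the I-frame.
    p_frames: list of T (motion_vector, residual_error) tensor pairs for P_1..P_T.
    """
    # I-frame branch: encoder-decoder segmentation; its feature map initialises the ConvLSTM.
    z_i, seg_i = iframe_net(i_frame)              # z_i: (B, c, H/32, W/32), seg_i: (B, c, H, W)
    h, cell = z_i, torch.zeros_like(z_i)          # h(0) = z(I); zero cell state is our assumption

    predictions = [seg_i]
    for motion, residual in p_frames:
        # P-frame branch: encode motion vector and residual error, update the ConvLSTM, decode.
        x = pframe_encoder(motion, residual)      # concatenated features cat(z1(t), z2(t))
        h, cell = convlstm_cell(x, (h, cell))
        predictions.append(upsample_head(h))      # full-resolution class scores for this P-frame
    return predictions
```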

B. Computer Vision with Compressed Videos

Videos usually come in a compressed format for transmission and storage. Several popular compression formats are widely used, including AVI [12], MPEG4 [8], and FLV [13]. Recently, there has been some work on solving computer vision problems directly on compressed videos. For example, [18] uses MPEG4 videos for action recognition and shows that operating on motion vectors and residual errors in compressed videos is more efficient than traditional methods that operate on RGB frames. [17] combines compressed video representations with an LSTM to obtain spatial and temporal information for object detection. However, as far as we know, there is no existing work on semantic segmentation in compressed videos.

III. APPROACH

A. Overview

Videos are usually stored and transmitted in a compressed format, such as MPEG-4 or H.264. Most video compression techniques exploit the fact that adjacent frames in a video are often similar. As a result, only a small number of frames (called I-frames) need to be stored as regular images, while the remaining frames (called P-frames) can be represented efficiently by storing only the difference between frames.

Following prior work [18], we divide the frames of a video into several groups, where each group contains one I-frame and several P-frames, represented by the collection {I, P_1, P_2, ..., P_T}. The I-frame I is represented as a regular RGB image, while each P-frame P_t only stores the difference with respect to the previous frame. Our model takes {I, P_1, P_2, ..., P_T} as input. The desired output is the semantic segmentation of every frame, regardless of the frame type. The semantic segmentation network is denoted by f_s(x), where x can be either an I-frame or a P-frame. Given the ground-truth semantic segmentation masks, our learning objective function is

L = L_ce(GT_I − f_s(I)) + Σ_{t=1}^{T} L_ce(GT_{P_t} − f_s(P_t))    (1)

where L_ce is the cross-entropy loss, GT_I is the ground-truth semantic segmentation mask of the I-frame, and GT_{P_t} is the ground-truth mask of the P-frame P_t. Our goal is to learn a network that minimizes the loss function defined in Eq. 1.
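
Eq. 1 writes the loss in terms of the ground truth and the prediction; in practice this is the standard per-pixel cross-entropy between class scores and ground-truth labels. The following is a minimal sketch of the per-group objective under that reading, assuming the network outputs per-pixel class scores and the ground truth is stored as class-index maps; we use PyTorch's cross-entropy purely for illustration, since the paper does not specify an implementation.

```python
import torch.nn.functional as F

def group_loss(seg_i, segs_p, gt_i, gts_p):
    """Eq. 1: cross-entropy on the I-frame plus the sum over the T P-frames of the group.

    seg_i:  (B, c, H, W) class scores for the I-frame.
    segs_p: list of T tensors of class scores for P_1..P_T.
    gt_i:   (B, H, W) long tensor of ground-truth class indices for the I-frame.
    gts_p:  list of T (B, H, W) long tensors for the P-frames.
    """
    loss = F.cross_entropy(seg_i, gt_i)
    for seg_t, gt_t in zip(segs_p, gts_p):
        loss = loss + F.cross_entropy(seg_t, gt_t)
    return loss
```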
B. Semantic Segmentation for I-frame

In order to obtain the semantic segmentation of an I-frame, we use a standard encoder-decoder architecture for semantic segmentation (see Fig. 2). An I-frame is represented as a regular RGB image tensor with three channels. Let I ∈ R^{H×W×3} be the image of the I-frame, where H × W is the spatial size of the image. We use ResNet as the backbone network to extract a feature map of the image, denoted z(I) ∈ R^{(H/32)×(W/32)×c}, where c is the number of channels of the last convolutional layer of the feature extractor. We set c to the number of classes in the semantic segmentation. The spatial size of z(I) is smaller than that of the original image I due to max-pooling. In order to obtain pixel-wise predictions at the original image size, we apply an upsampling layer that enlarges z(I) to the same spatial size as the input image. We use f_s(I) ∈ R^{H×W×c} to denote the output of this upsampling layer. The c-dimensional vector at each pixel location of f_s(I) can be interpreted as the scores for classifying that pixel into each of the c classes.
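
A minimal sketch of the I-frame branch described above, using torchvision's ResNet-18 as a stand-in backbone (the paper only says "ResNet", without specifying the depth) and bilinear upsampling as the upsampling layer; this is our illustration under those assumptions, not the authors' implementation.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class IFrameSegNet(nn.Module):
    """Encoder-decoder for I-frames: ResNet features -> c-channel map z(I) -> upsampled scores f_s(I)."""

    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet18(weights=None)  # torchvision >= 0.13; older versions use pretrained=False
        # Keep everything up to the last residual block; overall output stride is 32.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # 1x1 convolution maps the 512-channel ResNet-18 features to c = num_classes channels.
        self.classifier = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, x):                                  # x: (B, 3, H, W)
        z = self.classifier(self.encoder(x))               # z(I): (B, c, H/32, W/32)
        seg = F.interpolate(z, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return z, seg                                      # feature map for the ConvLSTM, and f_s(I)
```

Returning both the low-resolution feature map z(I) and the full-resolution scores f_s(I) matches how the feature map is reused as the initial ConvLSTM state in Section III-C.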

C. Semantic Segmentation for P-frame


Fig. 3. The processing of our network for a P-frame at time step t.

Since a P-frame is represented as the difference from the previous frame, a P-frame by itself does not contain enough information for semantic segmentation. To segment a P-frame, we should intuitively capture the temporal information between this P-frame and the preceding I-frame. In this work, we apply a ConvLSTM module to accumulate the information of previous frames (see Fig. 3) that is needed for segmenting a particular P-frame at time t.

Let P_t denote the P-frame at time t. A P-frame is represented by a motion vector and a residual error (see Fig. 2), both of which can be interpreted as images. We apply two different CNNs to extract features from these two images, denoted z_1(t) and z_2(t), where z_1, z_2 ∈ R^{(H/32)×(W/32)×c}. We then concatenate z_1 and z_2 as the input to the ConvLSTM at time t (see Fig. 3).

The ConvLSTM module processes information starting from the I-frame in the group. We set the initial hidden state h(0) of the ConvLSTM to the feature map of the corresponding I-frame, i.e., h(0) = z(I). For the P-frame P(t) at time t, we simply take the concatenated features cat(z_1(t), z_2(t)) (where cat denotes the concatenation operation) as the input to the ConvLSTM at time t.

We regard the hidden state h(t) as the feature representation of the P-frame P(t). Since h(t) has accumulated the information of all frames from the I-frame up to P(t), it carries enough information for the semantic segmentation of P(t). We therefore take h(t) as input to an upsampling layer to obtain the semantic segmentation of P(t).
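
The paper does not spell out the ConvLSTM update equations. The sketch below shows a standard ConvLSTM cell (the usual LSTM gates computed with convolutions) of the kind the P-frame branch could use, with the concatenated features cat(z_1(t), z_2(t)) as input; the gate layout is the conventional one and is our assumption, not taken from the paper.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Generic ConvLSTM cell: LSTM gates computed with convolutions (not the authors' code)."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        # One convolution produces all four gates: input, forget, output, candidate.
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state                                          # hidden and cell state, (B, hidden, H', W')
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

# Per P-frame: concatenate the motion-vector and residual features and update the state, e.g.
#   cell = ConvLSTMCell(in_channels=2 * c, hidden_channels=c)
#   h, c_state = cell(torch.cat([z1_t, z2_t], dim=1), (h, c_state))
```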
IV. EXPERIMENTS

In this section, we first describe our experimental setup and datasets in Section IV-A, then present the experimental results in Section IV-B.

A. Experimental Setup

a) Datasets: We evaluate the performance of our approach on the Cambridge-driving Labeled Video Database (CamVid) [2] and the Cityscapes dataset for semantic understanding of urban street scenes [4]. CamVid provides object-class semantic labels that assign each pixel to one of 32 semantic classes. Most videos are captured with a fixed-position, CCTV-style camera from the perspective of a driver in a car; the driving scenes increase the number and heterogeneity of the observed object classes. We use three videos from CamVid: seq06R0, seq01TP, and seq05VD. This gives 1,436 groups containing 17,239 frames in total, where each group contains 12 frames (1 I-frame and 11 P-frames), and we use 19 semantic classes in the selected images. We split the groups into 70% training data (1,005 groups) and 30% test data (431 groups). Cityscapes provides an image segmentation dataset in a self-driving environment and is used to evaluate the performance of visual algorithms for the semantic understanding of urban scenes. It contains 50 different scenes with different backgrounds and different seasons of streetscapes. It gives 960 groups containing 11,520 frames in total at 15 fps, with 19 classes. We split the groups into 70% training data (672 groups) and 30% test data (288 groups).

b) Ground-truth labels: The videos in our evaluation datasets do not contain ground-truth labels for all frames. To obtain the ground truth, we first decompress each video and extract all frames as regular RGB images. We then run ResNet [14] to obtain semantic segmentation maps for all frames from their RGB images and use the predicted segmentation maps as the ground truth.

c) Evaluation metrics: We use mean Intersection over Union (MeanIoU) and pixel accuracy to measure segmentation performance (a reference sketch of both metrics is given after this section). We also measure the speed of the proposed approach at inference time.

d) Baselines: We consider the following baseline methods for comparison. First, we consider standard semantic segmentation models that operate on regular images, including FCN-32s, FCN-8s, and ResNet [5], [11]. Note that these baselines cannot directly handle the compressed video format; they have to extract each frame as a regular image in order to predict its semantic segmentation. Since there is no existing work that directly produces semantic segmentation for compressed videos, we also define our own baseline as follows. This baseline first produces the semantic segmentation map of the I-frame; for the remaining P-frames in the group, it simply reuses the I-frame's segmentation map as the prediction for each P-frame.
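
For reference, the two reported metrics can be computed from a per-class confusion matrix as in the sketch below (our implementation, assuming integer label maps; labels outside [0, num_classes) are masked out).

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix from integer label maps."""
    mask = (gt >= 0) & (gt < num_classes)                 # ignore labels outside the class range
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy_and_miou(conf):
    """Pixel accuracy = correct / total; IoU_k = TP_k / (TP_k + FP_k + FN_k), averaged over classes."""
    acc = np.diag(conf).sum() / conf.sum()
    union = conf.sum(axis=1) + conf.sum(axis=0) - np.diag(conf)
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = np.diag(conf) / union                       # NaN for classes absent from both maps
    return acc, np.nanmean(iou)
```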
Fig. 4. Speed and accuracy on CamVid, compared to FCN-32s, FCN-8s, and ResNet.

TABLE I
EVALUATING PERFORMANCE OF FCN, RESNET, AND OUR APPROACH FOR VIDEO SEMANTIC SEGMENTATION ON CAMVID

Network       Pixel Accuracy   MeanIoU
FCN-32s [5]   91%              46.1%
FCN-8s [5]    92.6%            49.7%
ResNet [5]    95%              53%
Ours          94%              51%

TABLE II
EVALUATING INFERENCE TIME OF FCN, RESNET, AND OUR APPROACH FOR VIDEO SEMANTIC SEGMENTATION ON CAMVID

Network   Inference time (ms per frame)
FCN-32s   42.5
FCN-8s    56
ResNet    168
Ours      17

TABLE III
EVALUATING PERFORMANCE OF THE BASELINE AND OUR APPROACH FOR VIDEO SEMANTIC SEGMENTATION ON CAMVID AND CITYSCAPES

CamVid
Network    Pixel Accuracy   MeanIoU
Baseline   89%              25%
Ours       94%              51%

Cityscapes
Network    Pixel Accuracy   MeanIoU
Baseline   80%              22%
Ours       87%              34%
B. Results

We first compare the different methods in terms of both accuracy and inference speed on the CamVid dataset. The comparisons are shown in Table I, Table II, and Fig. 4. Our method achieves better performance than FCN-32s and FCN-8s [5] in terms of both accuracy and speed. Our method performs comparably to ResNet in terms of MeanIoU and pixel accuracy, but is much faster.

We also compare the performance of our method against the baseline defined earlier in Table III. Our method achieves higher pixel accuracy and MeanIoU.

V. CONCLUSION

We have proposed a new method for semantic segmentation in compressed videos. Our method does not require extracting each frame as an RGB image. Instead, it directly operates on the compressed video format consisting of I-frames and P-frames. Our model uses a ConvLSTM module to capture the temporal information required for segmenting the P-frames. Our experimental results show that the proposed method performs on par with frame-based methods in terms of accuracy, but runs at a much higher speed during inference. We believe our method can potentially be used in real-time applications where efficiency is crucial.

VI. ACKNOWLEDGEMENT

This work was supported by NSERC. We thank NVIDIA for donating some of the GPUs used in this work.

REFERENCES

[1] V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[2] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla. Segmentation and recognition using structure from motion point clouds. In European Conference on Computer Vision, 2008.
[3] S. Caelles, K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool. One-shot video object segmentation. CoRR, 2016.
[4] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[5] M. Fayyaz, M. H. Saffar, M. Sabokrou, M. Fathy, F. Huang, and R. Klette. Stfcn: spatio-temporal fully convolutional neural network for semantic segmentation of street scenes. In Asian Conference on Computer Vision, 2016.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for
image recognition. In IEEE Conference on Computer Vision and Pattern
Recognition, 2016.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification
with deep convolutional neural networks. In Advances in Neural
Information Processing Systems, 2012.
[8] D. J. LeGall. Mpeg (moving pictures expert group) video compression
algorithm: a review. In Image Processing Algorithms and Techniques II,
1991.
[9] Y. Li, J. Shi, and D. Lin. Low-latency video semantic segmentation.
CoRR, 2018.
[10] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for
semantic segmentation. In IEEE Conference on Computer Vision and
Pattern Recognition, 2015.
[11] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for
semantic segmentation. In IEEE Conference on Computer Vision and
Pattern Recognition, 2015.
[12] G. Maertens and K. Soroushian. Accurate and error resilient time
stamping method and/or apparatus for the audio-video interleaved (avi)
format, 2007.
[13] A. Mozo, M. Obien, C. Rigor, D. Rayel, K. Chua, and G. Tangonan.
Video steganography using flash video (flv). In IEEE Instrumentation
and Measurement Technology Conference, 2009.
[14] D. Pakhomov, V. Premachandran, M. Allan, M. Azizian, and N. Navab.
Deep residual learning for instrument segmentation in robotic surgery.
arXiv preprint arXiv:1703.08580, 2017.
[15] E. Shelhamer, K. Rakelly, J. Hoffman, and T. Darrell. Clockwork
convnets for video semantic segmentation. CoRR, 2016.
[16] K. Simonyan and A. Zisserman. Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[17] S. Wang, H. Lu, P. Dmitriev, and Z. Deng. Fast object detection in
compressed video. CoRR, 2018.
[18] C.-Y. Wu, M. Zaheer, H. Hu, R. Manmatha, A. J. Smola, and
P. Krähenbühl. Compressed video action recognition. In IEEE Con-
ference on Computer Vision and Pattern Recognition, 2018.
[19] F. Yu and V. Koltun. Multi-scale context aggregation by dilated
convolutions. arXiv preprint arXiv:1511.07122, 2015.
[20] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing net-
work. In IEEE conference on Computer Vision and Pattern Recognition,
2017.
