Deep learning-based techniques for video enhancement, compression and restoration
Corresponding Author:
Redouane Lhiadi
National School of Business and Management, Mohammed First University
Oujda, Morocco
Email: [email protected]
1. INTRODUCTION
The advent of deep learning has revolutionized video restoration by enabling the development of
sophisticated models capable of understanding complex data relationships and achieving superior results.
Convolutional neural networks (CNNs) and attention mechanisms are at the forefront of these advancements,
addressing various aspects of video quality, including resolution enhancement, sharpness improvement, and
noise reduction [1], [2]. In contrast, traditional video restoration techniques, which rely on heuristic-based
methods and manually crafted features, often struggle to effectively manage intricate degradation patterns
and compression artifacts [3]. Deep learning models, leveraging CNNs, excel at capturing hierarchical
representations and enhancing video quality by providing translation invariance and robust pattern
recognition [4], [5]. Figure 1 illustrates the traditional video compression process, outlining its key
components and workflow. This visual representation highlights the limitations and challenges of
conventional techniques, particularly in managing compression artifacts and degradation patterns.
Despite significant advancements, notable gaps remain in previous research. For example, while
some studies have explored the impact of compression artifacts on video quality [4], there has been limited
focus on how advanced restoration techniques influence the effectiveness of compression models. Previous
work has often concentrated on either restoration or compression, with a comprehensive framework
integrating both aspects being notably absent. Furthermore, the growing demand for high-quality digital
video content has heightened the need for real-time application of advanced restoration models in fields such
as video streaming, surveillance, and digital cinema [5], [6].
This paper aims to fill these gaps by introducing a framework that combines cutting-edge restoration and compression techniques. The proposed approach enhances video quality and reduces compression artifacts by applying advanced models for super-resolution, deblurring, denoising, and frame interpolation in combination with the libx265 compression codec. Beyond improving quality and accuracy, the method supports real-time processing, making it well suited to a broad range of applications.
2. MOTIVATION
Conventional video restoration techniques face significant challenges in managing compression
artifacts and enhancing visual quality. Traditional methods, which often rely on heuristic approaches and
manually crafted features, struggle to address the complex degradation patterns introduced during video
compression. Recognizing these limitations, this research introduces an innovative video restoration pipeline
that leverages the strengths of deep learning models and cutting-edge compression algorithms.
Our proposed pipeline integrates advanced deep learning techniques, including super-resolution,
deblurring, and denoising, with a high-performance compression algorithm, specifically the libx265 codec
[5]. This integration begins with compressing the input video frames using libx265, which effectively reduces
bitrate and storage requirements. Subsequently, the compressed frames are processed through our video
restoration module, where pretrained deep learning models address artifacts and enhance video quality.
Figure 2 provides a visual representation of the traditional video restoration workflow, outlining its processes
and inherent limitations. This illustration serves as a foundation for understanding how our approach
improves upon conventional methods. By combining advanced restoration models with cutting-edge
compression techniques, our pipeline aims to significantly enhance visual fidelity and perceptual quality.
Moreover, our framework is designed to be adaptable and scalable, making it suitable for diverse video
processing applications, including video streaming, surveillance, and digital entertainment [4].
The collaboration between deep learning-based restoration models and efficient compression algorithms
offers promising advancements in video quality enhancement, addressing both current limitations and future
needs in the field.
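As a concrete illustration of the compression stage, the following minimal Python sketch invokes FFmpeg's libx265 encoder; the CRF value, preset, and file names shown here are illustrative placeholders, not the exact settings used in our experiments:

import subprocess

def compress_with_libx265(input_path: str, output_path: str, crf: int = 28) -> None:
    """Compress a video with the libx265 (HEVC) codec via FFmpeg.

    The CRF and preset are illustrative defaults; a lower CRF yields
    higher quality at a higher bitrate.
    """
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", input_path,
            "-c:v", "libx265",    # HEVC encoder used in our pipeline
            "-crf", str(crf),     # constant rate factor (quality target)
            "-preset", "medium",  # speed/efficiency trade-off
            output_path,
        ],
        check=True,
    )

# Example: compress the input before the restoration stage.
compress_with_libx265("input.mp4", "compressed.mp4")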
3. RELATED WORK
Notable progress has recently been made in image compression. Convolutional autoencoders [5] show potential for effective compression while preserving image quality. Furthermore, end-to-end optimized compression techniques based on frequency-domain transforms have reduced bitrate without compromising perceptual quality. Evaluations of compression algorithms frequently include subjective quality assessments [6], which provide important insight into the perceived quality of compressed videos. Deep learning techniques [7] are now used effectively for image compression, exploiting end-to-end learning to improve compression performance. Super-resolution techniques have gained popularity for increasing the resolution of video sequences in real-time applications [8]. Transformer-based methods such as SwinIR [18] have achieved impressive results in image restoration tasks such as super-resolution and denoising. Recent work on video super-resolution has concentrated on improving feature propagation and alignment, yielding stronger performance on video super-resolution tasks [8].
Foundational research on the components essential to video super-resolution [8] has provided important insight into the factors that determine model effectiveness. Substantial advances have also been made in video deblurring, particularly through cascaded deep learning methods that exploit temporal information to improve deblurring performance [8]. Deep learning approaches to video deblurring have focused on reducing motion-blur artifacts, yielding clearly improved visual quality in handheld video recordings.
Methods such as enhanced deformable video restoration (EDVR) have effectively used enhanced deformable convolutional networks to produce remarkable results across video restoration tasks such as super-resolution and deblurring. Moreover, existing video deblurring techniques [8] have incorporated blur-invariant motion estimation to improve deblurring effectiveness. To illustrate the processes involved in deblurring, Figure 3 presents a visual depiction of the workflow and its key stages.
Deblurring is commonly formulated with the degradation model

$$f = g * p + n$$

where $f$ is the observed blurred frame, $g$ the latent sharp frame, $p$ the blur kernel (point spread function), $n$ additive noise, and $*$ denotes convolution.
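For illustration, this degradation model can be simulated in a few lines of Python; the Gaussian point spread function and noise level below are assumptions chosen for the example, not parameters of any particular deblurring method:

import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size: int = 9, sigma: float = 2.0) -> np.ndarray:
    """Normalized 2-D Gaussian blur kernel p (point spread function)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def degrade(g: np.ndarray, sigma_n: float = 0.01) -> np.ndarray:
    """Apply f = g * p + n to a grayscale frame g with values in [0, 1]."""
    p = gaussian_kernel()
    f = convolve2d(g, p, mode="same", boundary="symm")  # g * p
    n = np.random.normal(0.0, sigma_n, size=g.shape)    # additive noise n
    return f + n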
4. METHOD
4.1. Data acquisition and preprocessing
To collect the video data for our experiments, we employed a Python script built on the FFmpeg library. The script is designed to work with dynamic video datasets, including user-provided videos, and extracts individual frames at a fixed rate of 15 frames per second. This frame rate guarantees broad coverage of content and resolutions, which in turn enables thorough testing of our hybrid compression and restoration approach [9].
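A minimal sketch of this extraction step is shown below; the file names and output pattern are illustrative, while the fps filter and frame rate match the procedure described above:

import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, fps: int = 15) -> None:
    """Extract frames at a fixed rate using FFmpeg's fps filter."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", video_path,
            "-vf", f"fps={fps}",  # sample frames at 15 fps
            str(Path(out_dir) / "frame_%05d.png"),
        ],
        check=True,
    )

extract_frames("input.mp4", "frames", fps=15)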
4.3.3. Reconstruction
The high-quality (HQ) frames are reconstructed by combining shallow and deep features. Global residual learning simplifies feature learning by predicting only the difference between the bilinearly upsampled low-quality (LQ) sequence and the ground-truth HQ sequence. The reconstruction module differs by restoration task: sub-pixel convolution layers are employed for video super-resolution, whereas a single convolution layer suffices for video deblurring.
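To make this distinction concrete, the following PyTorch sketch shows the two reconstruction heads; the channel count and upscaling factor are assumptions for illustration, not our exact configuration:

import torch
import torch.nn as nn

class SRHead(nn.Module):
    """Sub-pixel convolution head for video super-resolution."""
    def __init__(self, channels: int = 64, scale: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, 3 * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into space

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.conv(feats))  # (B, 3, H*scale, W*scale)

class DeblurHead(nn.Module):
    """Single convolution layer, adequate for video deblurring."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.conv(feats)

In both cases the head predicts a residual that is added to the bilinearly upsampled LQ input, consistent with the global residual learning described above.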
The reconstruction error is measured with the loss

$$\mathcal{L} = \sqrt{(I_{RHQ} - I_{HQ})^{2} + \epsilon^{2}}$$

where $I_{RHQ}$ is the reconstructed HQ sequence, $I_{HQ}$ the ground-truth HQ sequence, and $\epsilon$ a small constant, typically set to $10^{-3}$, which prevents division by zero in the gradient.
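This objective is commonly known as the Charbonnier loss; a per-pixel PyTorch sketch, averaged over all elements, is:

import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier loss between reconstructed (pred) and ground-truth HQ frames."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()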
The query, key, and value features are obtained by linear projection:

$$Q_R = X_R P_Q, \qquad K_S = X_S P_K, \qquad V_S = X_S P_V$$

where $P_Q$, $P_K$, and $P_V$ are projection matrices. The attention map $A$ is computed as

$$A = \mathrm{SoftMax}\!\left(\frac{Q_R K_S^{T}}{\sqrt{D}}\right)$$

and the mutual-attention output is

$$\mathrm{MA}(Q_R, K_S, V_S) = \mathrm{SoftMax}\!\left(\frac{Q_R K_S^{T}}{\sqrt{D}}\right) V_S$$
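The following NumPy sketch implements these equations for a single reference/supporting frame pair; the token count and feature dimension are arbitrary toy values, not our model's configuration:

import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mutual_attention(x_r, x_s, p_q, p_k, p_v):
    """Scaled dot-product attention between a reference frame's queries
    (Q_R) and a supporting frame's keys/values (K_S, V_S)."""
    q = x_r @ p_q                       # Q_R = X_R P_Q
    k = x_s @ p_k                       # K_S = X_S P_K
    v = x_s @ p_v                       # V_S = X_S P_V
    d = q.shape[-1]
    a = softmax(q @ k.T / np.sqrt(d))   # attention map A
    return a @ v                        # MA(Q_R, K_S, V_S)

# Toy usage with random features (N tokens, D channels).
rng = np.random.default_rng(0)
n, d = 16, 32
x_r, x_s = rng.normal(size=(n, d)), rng.normal(size=(n, d))
p_q, p_k, p_v = (rng.normal(size=(d, d)) for _ in range(3))
out = mutual_attention(x_r, x_s, p_q, p_k, p_v)  # shape (16, 32)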
The PSNR and SSIM metrics quantify the visual quality of the compressed frame relative to the original. These measurements show that the compression process maintains a high level of visual fidelity despite the reduction in file size. Table 1 shows that our approach yields substantial improvements across key metrics, with a notable increase in PSNR (+1.4 dB) and average gains of +0.12 in SSIM and MS-SSIM. Although our bitrate reduction is slightly smaller than that of previous methods, the overall gains in visual quality are significant.
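Both metrics can be computed with scikit-image; a minimal sketch, assuming 8-bit RGB frames stored as NumPy arrays, is:

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(original: np.ndarray, processed: np.ndarray):
    """Return (PSNR in dB, SSIM) between an original and a processed frame."""
    psnr = peak_signal_noise_ratio(original, processed, data_range=255)
    ssim = structural_similarity(original, processed,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim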
The graph in Figure 6 provides a clear visual comparison of the performance of the video compression methods considered:
− The libx265-based model achieves the best results in terms of PSNR, SSIM, and MS-SSIM while maintaining a relatively low bitrate.
− The gain of +1.4 dB in PSNR over the previous method is clearly visible, as are the improvements in SSIM and MS-SSIM.
− This highlights the effectiveness of our approach in enhancing visual quality, despite a slight increase in bitrate relative to other methods.
6. CONCLUSION
In summary, our research presents a comprehensive framework for enhancing video quality by
integrating advanced deep learning techniques to address compression artifacts. The proposed system
incorporates models for super-resolution, deblurring, denoising, and frame interpolation, demonstrating
significant improvements in visual appearance and perceived quality. Our approach combines the libx265 compression codec with the video restoration transformer (VRT), effectively enhancing video quality across metrics including PSNR and SSIM. By applying HEVC-based compression with a constant rate factor (CRF) setting and downscaling the video resolution, we reduce the bitrate while preserving perceptually relevant information. This
framework not only advances existing video restoration methods but also shows considerable promise for
real-world applications in fields such as entertainment, surveillance, and digital cinema. Future work will
focus on integrating more sophisticated compression models to further enhance video quality and exploring
novel compression techniques that reduce file size without compromising visual integrity. Incorporating
hardware acceleration such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs) could significantly speed up the restoration process, enabling real-time applications and broadening the framework's relevance across domains.
REFERENCES
[1] A. Kappeler, S. Yoo, Q. Dai, and A. K. Katsaggelos, “Video super-resolution with convolutional neural networks,” IEEE
Transactions on Computational Imaging, vol. 2, no. 2, pp. 109–122, Jun. 2016, doi: 10.1109/TCI.2016.2532323.
[2] Y. Jo, S. W. Oh, J. Kang, and S. J. Kim, “Deep video super-resolution network using dynamic upsampling filters without explicit
motion compensation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 3224–3232,
doi: 10.1109/CVPR.2018.00340.
[3] S. Islam, A. Dash, A. Seum, A. H. Raj, T. Hossain, and F. M. Shah, “Exploring video captioning techniques: A comprehensive
survey on deep learning methods,” SN Computer Science, vol. 2, no. 2, Apr. 2021, doi: 10.1007/s42979-021-00487-x.
[4] O. Wiles, J. Carreira, I. Barr, A. Zisserman, and M. Malinowski, “Compressed vision for efficient video understanding,” in
Computer Vision – ACCV 2022, 2023, pp. 679–695, doi: 10.1007/978-3-031-26293-7_40.
[5] D. Alexandre and H.-M. Hang, “Learned video codec with enriched reconstruction for CLIC P-frame coding,” Computer Vision
and Pattern Recognition, Dec. 2020.
[6] Y. Tian et al., “Self-conditioned probabilistic learning of video rescaling,” in 2021 IEEE/CVF International Conference on
Computer Vision (ICCV), Oct. 2021, pp. 4470–4479, doi: 10.1109/ICCV48922.2021.00445.
[7] M. Gorji, E. Hafezieh, and A. Tavakoli, “Advancing image deblurring performance with combined autoencoder and customized
hidden layers,” Tuijin Jishu/Journal of Propulsion Technology, vol. 44, no. 4, pp. 6462–6467, Oct. 2023, doi:
10.52783/tjjpt.v44.i4.2283.
[8] S. Yadav, C. Jain, and A. Chugh, “Evaluation of image deblurring techniques,” International Journal of Computer Applications,
vol. 139, no. 12, pp. 32–36, Apr. 2016, doi: 10.5120/ijca2016909492.
[9] K. Purohit, A. Shah, and A. N. Rajagopalan, “Bringing alive blurred moments,” Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, vol. 2019, pp. 6823–6832, Apr. 2019, doi: 10.1109/CVPR.2019.00699.
[10] G. A. Farulla, M. Indaco, D. Rolfo, L. O. Russo, and P. Trotta, “Evaluation of image deblurring algorithms for real-time
applications,” in 2014 9th IEEE International Conference on Design & Technology of Integrated Systems in Nanoscale Era
(DTIS), May 2014, pp. 1–6, doi: 10.1109/DTIS.2014.6850668.
[11] O. N. Gerek and Y. Altunbasak, “Key frame selection from MPEG video data,” in Visual Communications and Image Processing
’97, Jan. 1997, vol. 3024, pp. 920–925, doi: 10.1117/12.263304.
[12] M. Uhrina, J. Bienik, and M. Vaculik, “Subjective video quality assessment of H.265 compression standard for full HD
resolution,” Advances in Electrical and Electronic Engineering, vol. 13, no. 5, pp. 545–551, Dec. 2015, doi:
10.15598/aeee.v13i5.1503.
[13] M. M. Awad and N. N. Khamiss, “Low latency UHD adaptive video bitrate streaming based on HEVC encoder configurations
and Http2 protocol,” Iraqi Journal of Science, pp. 1836–1847, Apr. 2022, doi: 10.24996/ijs.2022.63.4.40.
[14] D. Watni and S. Chawla, “Enhancing embedding capacity of JPEG images in smartphones by selection of suitable cover image,”
in ICDSMLA 2019, vol. 601, 2020, pp. 211–220, doi: 10.1007/978-981-15-1420-3_22.
[15] Q. Meng, S. Zhao, Z. Huang, and F. Zhou, “MagFace: A universal representation for face recognition and quality assessment,” in
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp. 14220–
14229, doi: 10.1109/CVPR46437.2021.01400.
[16] F. Kong and R. Henao, “Efficient classification of very large images with tiny objects,” in 2022 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 2374–2384, doi: 10.1109/CVPR52688.2022.00242.
[17] D. Smirnov and J. Solomon, “HodgeNet: learning spectral geometry on triangle meshes,” ACM Transactions on Graphics,
vol. 40, no. 4, pp. 1–11, Aug. 2021, doi: 10.1145/3450626.3459797.
[18] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: image restoration using swin transformer,” in 2021
IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Oct. 2021, pp. 1833–1844, doi:
10.1109/ICCVW54120.2021.00210.
[19] L. Tran, F. Liu, and X. Liu, “Towards high-fidelity nonlinear 3D face morphable model,” in 2019 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 1126–1135, doi: 10.1109/CVPR.2019.00122.
[20] S. Niklaus, L. Mai, and F. Liu, “Video frame interpolation via adaptive separable convolution,” in 2017 IEEE International
Conference on Computer Vision (ICCV), Oct. 2017, pp. 261–270, doi: 10.1109/ICCV.2017.37.
[21] K. C. K. Chan, X. Wang, K. Yu, C. Dong, and C. C. Loy, “BasicVSR: The search for essential components in video super-
resolution and beyond,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2021,
pp. 4945–4954, doi: 10.1109/CVPR46437.2021.00491.
[22] S. Nah, S. Son, and K. M. Lee, “Recurrent neural networks with intra-frame iterations for video deblurring,” in 2019 IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 8094–8103, doi: 10.1109/CVPR.2019.00829.
[23] V. Sharma, M. Gupta, A. Kumar, and D. Mishra, “Video processing using deep learning techniques: a systematic literature
review,” IEEE Access, vol. 9, pp. 139489–139507, 2021, doi: 10.1109/ACCESS.2021.3118541.
[24] H. Jiang, D. Sun, V. Jampani, M.-H. Yang, E. Learned-Miller, and J. Kautz, “Super SloMo: high quality estimation of multiple
intermediate frames for video interpolation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun.
2018, pp. 9000–9008, doi: 10.1109/CVPR.2018.00938.
[25] J. Dong, K. Ota, and M. Dong, “Video frame interpolation: a comprehensive survey,” ACM Transactions on Multimedia
Computing, Communications, and Applications, vol. 19, no. 2s, pp. 1–31, Apr. 2023, doi: 10.1145/3556544.
[26] F. Reda et al., “Unsupervised video interpolation using cycle consistency,” in 2019 IEEE/CVF International Conference on
Computer Vision (ICCV), Oct. 2019, pp. 892–900, doi: 10.1109/ICCV.2019.00098.
[27] H. Chen, M. Teng, B. Shi, Y. Wang, and T. Huang, “A residual learning approach to deblur and generate high frame rate video
with an event camera,” IEEE Transactions on Multimedia, vol. 25, pp. 5826–5839, 2023, doi: 10.1109/TMM.2022.3199556.
[28] Z. Huang, T. Zhang, W. Heng, B. Shi, and S. Zhou, “Real-time intermediate flow estimation for video frame interpolation,” in
Computer Vision – ECCV 2022, pp. 624–642, doi: 10.1007/978-3-031-19781-9_36.
BIOGRAPHIES OF AUTHORS