TY - CONF
T1 - Video Colorization Based on a Diffusion Model Implementation
AU - Stival, Leandro
AU - da Silva Torres, Ricardo
AU - Pedrini, Helio
PY - 2024
AB - Researchers are employing cutting-edge techniques to develop algorithms capable of automatically adding color to black-and-white videos. This advancement has the potential to transform how we experience historical films and to provide filmmakers and video producers with a powerful new tool. These algorithms use deep neural networks to analyze images and identify patterns, offering a promising avenue for extracting meaning and insight from visual data in computer vision. Although current studies focus primarily on image colorization, deep learning techniques for videos and movies remain comparatively underexplored. Our investigation aims to bridge this gap and demonstrate that today's image colorization techniques can also be applied effectively to videos, matching the state of the art presented at the NTIRE 2023 Video Colorization Challenge. We explore the application of diffusion models, which have gained popularity for their ability to generate images and text. Our implementation uses a diffusion model to introduce noise into the frames, while a U-Net with self-attention layers predicts the denoised frames and thereby the color of the video frames. For training, we used the DAVIS and LDV datasets. When comparing the colorized frames with the ground truth in the test set, we observed promising results under several quality metrics, including PSNR, SSIM, FID, and CDC.
KW - Deep learning
KW - Diffusion models
KW - Evaluation metrics
KW - Video colorization
DO - 10.1007/978-3-031-66329-1_10
M3 - Conference paper
AN - SCOPUS:85200950700
SN - 978-3-031-66328-4
VL - 1
T3 - Lecture Notes in Networks and Systems
SP - 117
EP - 131
BT - Intelligent Systems and Applications
A2 - Arai, Kohei
PB - Springer
T2 - Intelligent Systems Conference, IntelliSys 2024
Y2 - 5 September 2024 through 6 September 2024
ER -