review date: 2019/07/26 (by Meyong-Gyu.LEE @Soongsil Univ.)
English and Korean review of 'Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder' (SIGGRAPH 2017)
(Paper Review) Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder
1. Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder
(Real-time reconstruction of Monte Carlo-rendered image sequences via a recurrent denoising AE)
2019/07/26, Myeong-Gyu Lee, CGLAB
2. INDEX
01 Introduction
02 Recurrent AE
03 Proposed Method
04 Experiments
05 Conclusion
3. Part 01: Introduction
1. Paper overview
2. Related work summary
3. Monte Carlo Rendering
4. 1-1 Paper Overview (venue information)
• Venue: SIGGRAPH 2017
• Authors: Chakravarty R. Alla Chaitanya et al. (NVIDIA, University of Montreal, and McGill University)
• Citations: 63 (at the time of this review)
• A study that uses a recurrent autoencoder to denoise the noise caused by low sample counts (spp) in Monte Carlo rendering.
5. 1-1 Paper Overview (video)
https://www.youtube.com/watch?v=YjjTPV2pXY0
7. 1-2 Related Work Summary: Image Denoising
• Offline denoising for MC rendering
  • Non-linear image-space filters applied to indirect diffuse illumination (Jensen et al.)
  • Frequency analysis of light transport (Egan et al.)
  • Training the parameters of a non-local means filter with machine learning
  → Good quality, but slow.
• Interactive denoising for MC rendering
  • Separate direct/indirect illumination and filter the latter with edge-avoiding filters
  • Edge-avoiding À-trous wavelets, adaptive manifolds, guided image filters
  → Local detail may be lost.
8. 1-2 Related Work Summary: Reconstruction of Images
• Image restoration using deep learning
  • Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections [Mao et al.]
  • Denoising images corrupted by Gaussian noise is an active research topic.
  • In the MC rendering setting targeted here, however, some samples carry very high energy while most areas remain black.
• Video super-resolution
  • Using RNNs (Huang et al. 2015) or an LSTM block in the bottleneck of the AE (Pătrăucean et al.)
9. 1-3 Monte Carlo Rendering: the Monte Carlo integral
10. 1-3 Monte Carlo Rendering: the Monte Carlo integral (figures from https://www.cs.rpi.edu/~cutler/classes/advancedgraphics/S08/lectures/17_monte_carlo.pdf)
11. 1-3 Monte Carlo Rendering: the rendering equation and the BRDF (Bidirectional Reflectance Distribution Function)
12. 1-3 Monte Carlo Rendering (figure slide)
13. 1-3 Monte Carlo Rendering: our problem (figure slide)
14. 1-3 Monte Carlo Rendering (figure slide)
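To make the Monte Carlo integral concrete, here is a minimal Python sketch (my own illustration, not the paper's renderer) of the basic estimator: the integral of $f$ is approximated by $\frac{1}{N}\sum_i f(x_i)/p(x_i)$ for samples $x_i \sim p$. With only a handful of samples per pixel the estimate is exactly the kind of noisy signal this paper tries to denoise.

```python
import numpy as np

# Minimal Monte Carlo estimator sketch (illustration only):
# integral of f ≈ (1/N) * sum f(x_i) / p(x_i) for x_i ~ p.
def mc_estimate(f, sampler, pdf, n):
    x = sampler(n)
    return np.mean(f(x) / pdf(x))

# Example: integrate cos(theta) over [0, pi/2] with uniform sampling
# (p = 2/pi on that interval); the exact value is 1.
f = np.cos
sampler = lambda n: np.random.uniform(0.0, np.pi / 2, n)
pdf = lambda x: np.full_like(x, 2.0 / np.pi)

for n in (1, 16, 4096):   # 1 "spp" is very noisy; more samples converge
    print(n, mc_estimate(f, sampler, pdf, n))
```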
15. Part 02: Recurrent AE
1. AutoEncoder
2. RCNN (Recurrent CNN)
3. Recurrent AE
4. Additional slides
16. 2-1 AutoEncoder: linear vs. non-linear dimension reduction
(figure from https://www.jeremyjordan.me/autoencoders/)
"Why use an AE in this paper?"
17. 2-1 AutoEncoder: concept
• An AE learns a vector field that maps the input data onto a lower-dimensional manifold*.
• In other words, the goal is to automatically discover the low-dimensional manifold, per the manifold hypothesis, along which the data is distributed in the high-dimensional space.
*Manifold: the surface of the space on which the data is distributed (locally homeomorphic to Euclidean space).
• Self-supervised learning (the input itself is used as the target)
• Bottleneck (holds the important features)
(figure labels: Target, Predicted)
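As a reading aid, a minimal convolutional autoencoder in PyTorch. This only illustrates the encoder-bottleneck-decoder idea; the paper's network is far larger and adds skip connections and recurrent blocks.

```python
import torch
import torch.nn as nn

# Minimal convolutional autoencoder sketch (illustration only).
class TinyAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # compress toward the bottleneck
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(   # reconstruct from the bottleneck
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Self-supervised: the input serves as its own target.
model = TinyAE()
x = torch.rand(1, 3, 64, 64)
loss = nn.functional.l1_loss(model(x), x)
```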
18. 2-2 RCNN (Recurrent CNN): concept of RNNs
• Introduced for modeling sequence data.
• The difference from conventional networks is that an RNN keeps a hidden state (≈ memory).
• The hidden state is a summary of all input data seen up to the current step.
• The hidden state is revised every time a new input arrives.
(figure labels: new hidden state, input data)
19. 2-2 RCNN (Recurrent CNN): the RNN recurrence
(https://pythonkim.tistory.com/57)
$h_t = f_W(h_{t-1}, x_t)$
where $h_t$ is the new state, $f_W$ is some function with parameters $W$, $h_{t-1}$ is the old state, and $x_t$ is the input vector at time step $t$.
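The recurrence above in a few lines of numpy (a textbook vanilla RNN step; the sizes and the tanh nonlinearity are illustrative choices, not anything specified by the paper):

```python
import numpy as np

# The recurrence h_t = f_W(h_{t-1}, x_t) as a vanilla RNN step.
H, D = 8, 4                           # hidden size, input size
Whh = 0.1 * np.random.randn(H, H)     # old state -> new state
Wxh = 0.1 * np.random.randn(H, D)     # input -> new state
b = np.zeros(H)

def rnn_step(h_prev, x_t):
    # The hidden state summarizes all inputs seen so far.
    return np.tanh(Whh @ h_prev + Wxh @ x_t + b)

h = np.zeros(H)                       # initial state
for x_t in np.random.randn(5, D):     # a length-5 input sequence
    h = rnn_step(h, x_t)              # state is revised at every input
```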
21. (video) https://www.youtube.com/watch?v=UNmqTiOnRfg
22. (figure: one-hot vector example from the video above; the maximum entry is set to 1 and the rest to 0)
https://www.youtube.com/watch?v=UNmqTiOnRfg
23. 2-2 RCNN (Recurrent CNN): limitations of RNNs
• Vanishing gradients arise as the distance between the input and output steps grows,
  because an RNN remembers the most recent inputs most strongly.
• RCNN = RNN + CNN
• Variants such as LSTM have been proposed to address these weaknesses, but this paper uses vanilla RNN blocks,
  judging that gated units could adversely affect the large, variously sized regions within an image.
24. 2-3 Recurrent AE: why AE + RCNN?
• Grafting an RCNN onto the AE structure increases temporal stability.
• With an AE, end-to-end learning also lets the network learn, without supervision, to make good use of auxiliary pixel features automatically.
• Auxiliary pixel features: depth, normals, etc.
25. 2-4 Additional Slides: what are auxiliary pixel features?
• Information stored in the G-buffer, covering the scene geometry.
• Data sent from the rasterization path to the reconstruction algorithm:
  HDR RGB image, depth, roughness, view-space shading normal
• 3 (FP16) + 4 (1 FP16 + 3 FP8) = 7 scalar values per pixel
26. 2-4 Additional Slides: what is the geometry buffer (G-buffer)?
• A buffer that stores all the information needed for per-pixel lighting:
  normal, position, diffuse/specular albedo, …
27. Part 03: Proposed Method
1. Interactive Path Tracer
2. Network Architecture
3. Training Data
4. Loss Functions
5. Analysis
28. 3-1 Interactive Path Tracer: overview
Pipeline: visible-surface rasterization → path tracing with the NVIDIA OptiX GPU ray tracer
• Uses a 1-sample unidirectional path tracer (one indirect bounce):
  one direct-lighting path (camera→surface→light) and one indirect path (camera→surface→surface→light).
• DoF and motion blur would introduce noise into the G-buffer, so they are handled in a post-process.
• Samples the light source and the scattering directions (low-discrepancy Halton sequences).
• Applies path-space regularization to glossy and specular materials.
29. 3-2 Network Architecture: problems with prior image restoration approaches
Problems with the CNN with hierarchical skip connections [Mao et al.]:
• Very slow at full resolution (1080p).
• Weak on the spatially very sparse samples that are common in MC rendering.
• Frames are processed independently, so the results are temporally unstable.
(Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections: https://arxiv.org/pdf/1606.08921.pdf)
30. 3-2 Network Architecture: overview (AE + RCNN)
• An RNN structure is added to the denoising AE to bring in the notion of time.
• Recurrent connections accumulate lighting information over time.
• Since the input is sparser at the encoder, the recurrent structure is applied to the encoder only.
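A rough PyTorch sketch of what a recurrent connection on the encoder side can look like: the block's previous hidden state is concatenated with the incoming features and convolved, so lighting information can accumulate across frames. The layer shapes and exact wiring here are my assumptions, not the paper's published block design.

```python
import torch
import torch.nn as nn

# Sketch of a recurrent convolutional block for the encoder side.
class RecurrentConvBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv_in = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv_rec = nn.Conv2d(2 * ch, ch, 3, padding=1)  # mixes features + state

    def forward(self, x, hidden=None):
        feat = torch.relu(self.conv_in(x))
        if hidden is None:                    # reset at the start of a sequence
            hidden = torch.zeros_like(feat)
        hidden = torch.relu(self.conv_rec(torch.cat([feat, hidden], dim=1)))
        return hidden, hidden                 # output features, carried state
```

The carried state is reset to zeros at the start of each sequence; the decoder stays purely feed-forward, matching the slide's point that recurrence is only needed where the input is sparse.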
31. 3-2 Network Architecture: inputs (1024×1024, 7 channels per pixel)
• Noisy HDR RGB (FP16, 3ch)
• View-space shading normals (FP8, 2ch)
• Depth map (FP16, 1ch)
• Material roughness map (FP8, 1ch)
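How the 7-channel input might be assembled per frame (a sketch with dummy arrays; numpy has no FP8 dtype, so the FP8 channels are approximated with float16 here):

```python
import numpy as np

# Assembling the 7-channel per-pixel network input (dummy values).
H, W = 1024, 1024
rgb = np.zeros((3, H, W), np.float16)        # noisy HDR RGB (FP16, 3ch)
normals = np.zeros((2, H, W), np.float16)    # view-space normals (FP8 in the paper, 2ch)
depth = np.zeros((1, H, W), np.float16)      # depth map (FP16, 1ch)
roughness = np.zeros((1, H, W), np.float16)  # roughness (FP8 in the paper, 1ch)

net_input = np.concatenate([rgb, normals, depth, roughness])  # shape (7, H, W)
```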
32. 3-3 Training Data: overview
• Uses 7 fly-through video sequences.
• Random time ranges within each sequence are picked and used as training sub-sequences.
• Sequences are randomly played forward/backward with varied camera motion.
• Data augmentation:
  • Random 90/180/270-degree rotations applied to randomly chosen sequences.
  • Random per-color-channel modulation in the range 0-2, applied to all sequences
    (encourages the network to learn channel independence and the linear input-target relationship).
33. 3-4 Loss Functions: overview
$\mathcal{L} = w_s \mathcal{L}_s + w_g \mathcal{L}_g + w_t \mathcal{L}_t$
where $\mathcal{L}_s$ is a spatial $L_1$ loss, $\mathcal{L}_g$ a gradient-domain $L_1$ loss, and $\mathcal{L}_t$ a temporal $L_1$ loss; $w_s$, $w_g$, $w_t$ are weights.
34. 3-4 Loss Functions: losses on isolated images (spatial)
$\mathcal{L}_s = \frac{1}{N}\sum_{i}^{N} \lvert P_i - T_i \rvert$  ($P_i$: predicted, $T_i$: target)
• Using an $L_1$ loss instead of $L_2$ reduces splotchy artifacts in the reconstructed image.
35. 3-4 Loss Functions: losses on isolated images (gradient domain)
$\mathcal{L}_g = \frac{1}{N}\sum_{i}^{N} \lvert \nabla P_i - \nabla T_i \rvert$
• $\nabla$ is computed with HFEN (High Frequency Error Norm), which uses a Laplacian for edge detection.
• Because the Laplacian is sensitive to noise, images are pre-smoothed with a Gaussian filter ($\sigma = 1.5$).
36. 3-4 Loss Functions: a loss that penalizes temporal incoherence
$\mathcal{L}_t = \frac{1}{N}\sum_{i}^{N} \left\lvert \frac{\partial P_i}{\partial t} - \frac{\partial T_i}{\partial t} \right\rvert$  ($P_i$: predicted, $T_i$: target)
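Putting the three terms together, a hedged PyTorch sketch of the combined loss. The $\sigma = 1.5$, the $L_1$ forms, and the default weights follow the slides; the 9×9 kernel size and the finite-difference time derivative are my assumptions.

```python
import torch
import torch.nn.functional as F

# Sketch of the combined loss L = w_s*L_s + w_g*L_g + w_t*L_t.
def log_kernel(sigma=1.5, size=9):
    # Laplacian-of-Gaussian: Gaussian pre-smoothing folded into the Laplacian.
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    r2 = xx**2 + yy**2
    g = torch.exp(-r2 / (2 * sigma**2))
    log = (r2 / sigma**4 - 2 / sigma**2) * g
    return (log - log.mean()).view(1, 1, size, size)  # zero-sum filter

KERNEL = log_kernel()

def hfen(img):
    # Apply the LoG filter channel-wise; img has shape (B, C, H, W).
    c = img.shape[1]
    w = KERNEL.repeat(c, 1, 1, 1)
    return F.conv2d(img, w, padding=KERNEL.shape[-1] // 2, groups=c)

def combined_loss(pred, target, ws=0.8, wg=0.1, wt=0.1):
    # pred/target: (B, T, C, H, W) sequences of predicted/reference frames.
    b, t, c, h, w = pred.shape
    l_s = (pred - target).abs().mean()                          # spatial L1
    l_g = (hfen(pred.reshape(b * t, c, h, w)) -
           hfen(target.reshape(b * t, c, h, w))).abs().mean()   # gradient L1
    dp = pred[:, 1:] - pred[:, :-1]                             # dP/dt
    dt = target[:, 1:] - target[:, :-1]                         # dT/dt
    l_t = (dp - dt).abs().mean()                                # temporal L1
    return ws * l_s + wg * l_g + wt * l_t
```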
37. 3-4 Loss Functions: spatial $\mathcal{L}_s$ alone vs. the combined loss
• The scales of $w_s$, $w_g$, $w_t$ are set roughly to 0.8, 0.1, 0.1.
• Weighting the loss more heavily toward the end of a video sequence amplifies the temporal gradient;
  the per-frame weights follow a Gaussian curve (0.011, 0.044, 0.135, 0.325, 0.607, 0.882, 1).
• Result: spatial $\mathcal{L}_s$ only: 0.9335; combined loss: 0.9417.
38. 3-5 Analysis: auxiliary features
• Using untextured lighting improves the convergence speed.
• Normals help the network detect object silhouettes.
• Adding depth and roughness brings further improvements.
39. 3-5 Analysis: network properties
(figure: ablation plots; the best-performing configurations are marked)
40. Part 04: Experiments
1. Reconstruction quality with low sample counts
2. Performance
41. 4-1 Reconstruction quality with low sample counts: overview of the network
(figure slide)
42. 4-1 Reconstruction quality with low sample counts (figure slide)
43. 4-2 Performance: environment
• Training
  • Trained on an NVIDIA DGX-1 ($149,000; https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-1/dgx-1-print-infographic-738238-nvidia-web.pdf)
  • 16 hours to train 500 epochs (plus 1 hour of data preprocessing)
  • Optimizer: ADAM (lr 0.001, decay rates $\beta_1 = 0.9$, $\beta_2 = 0.99$)
  • Parameters initialized with He et al.'s method
  • LeakyReLU (slope $\alpha = 0.1$, except the last layer) + max pooling
• Reconstruction performance
  • CUDA kernels + cuDNN 5.1
  • 720p reconstruction takes 54.9 ms (Titan X)
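The optimizer and initialization settings above, written out as a PyTorch sketch (the tiny two-layer `model` is a placeholder, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Training setup sketch: He init, LeakyReLU (slope 0.1) except the last
# layer, max pooling, and Adam with lr=0.001, betas=(0.9, 0.99).
model = nn.Sequential(
    nn.Conv2d(7, 32, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2),
    nn.Conv2d(32, 3, 3, padding=1),               # last layer: no activation
)

for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, a=0.1)  # He init (LeakyReLU slope)
        nn.init.zeros_(m.bias)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.99))
```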
44. Part 05: Conclusion
1. Conclusion
2. Limitations
45. 5-1 Conclusion: summary
• Conclusion
  • First application of a recurrent denoising AE.
  • Produces noise-free, temporally coherent animation sequences with global illumination (GI).
• Future work
  • Feed lens and time coordinates to the network so that effects such as motion blur and DoF are handled as well.
46. 5-2 Limitations of the paper
• For fine geometry such as hair, the network cannot recover image structure that is already destroyed at low spp.
• Flickering occurs when the training data is scarce.
(video comparison: right: noisy 1 spp input RGB sequence; middle: reconstructed sequence; left: reference 4096 spp sequence)
https://github.com/yuyingyeh/rdae