ConvMixer is a simple CNN-based model that achieves competitive results on ImageNet classification. Like ViT, it divides the input image into patches and embeds them into high-dimensional vectors. Unlike ViT, however, it uses no attention: only simple convolutional layers sit between the patch embedding and the classification layer. Experiments show that, despite its simplicity, ConvMixer outperforms models of comparable size such as ResNet, ViT, and MLP-Mixer on ImageNet, suggesting that the patch embedding itself may be as important as the attention mechanism for vision tasks.
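To make the architecture concrete, here is a minimal PyTorch sketch in the spirit of the compact reference implementation in the paper; the hyperparameter values (dim, depth, kernel_size, patch_size) are illustrative defaults, not the exact ImageNet configuration.

```python
import torch.nn as nn

class Residual(nn.Module):
    """Adds a skip connection around an arbitrary sub-module."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer(dim=256, depth=8, kernel_size=9, patch_size=7, n_classes=1000):
    return nn.Sequential(
        # Patch embedding: one strided conv splits the image into patches
        # and projects each patch to a dim-dimensional vector.
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),
        nn.GELU(),
        nn.BatchNorm2d(dim),
        # Mixing blocks: a depthwise conv mixes spatial locations (with a
        # residual connection), then a 1x1 conv mixes channels.
        *[nn.Sequential(
            Residual(nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(),
                nn.BatchNorm2d(dim),
            )),
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        ) for _ in range(depth)],
        # Classification head: global average pooling then a linear layer.
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(dim, n_classes),
    )
```

Note that the whole model is convolutional; the only ViT-like ingredient is the patch embedding itself.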
Presentation slides for the cvpaper.challenge Meta Study Group.
cvpaper.challenge is an initiative that mirrors the current state of the computer vision field and aims to set its trends. It works on paper summaries, idea generation, discussion, implementation, and paper submission, and shares all the resulting knowledge. Goals for 2019: "submit 30+ papers to top conferences" and "run comprehensive surveys of top conferences at least twice."
http://xpaperchallenge.org/cv/
This document summarizes recent advances in single image super-resolution (SISR) using deep learning methods. It discusses early SISR networks like SRCNN, VDSR and ESPCN. SRResNet is presented as a baseline method, incorporating residual blocks and pixel shuffle upsampling. SRGAN and EDSR are also introduced, with EDSR achieving state-of-the-art PSNR results. The relationship between reconstruction loss, perceptual quality and distortion is examined. While PSNR improves yearly, a perception-distortion tradeoff remains. Developments are ongoing to produce outputs that are both accurately restored and naturally perceived.
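As a concrete illustration of the two SRResNet ingredients mentioned above, here is a minimal PyTorch sketch of a residual block and a pixel-shuffle (sub-pixel convolution) upsampler. Channel counts and the scale factor are illustrative, and details such as batch normalization (kept in SRResNet, dropped in EDSR) are omitted.

```python
import math
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with a skip connection, as in SRResNet-style bodies."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

def pixel_shuffle_upsampler(channels=64, scale=4):
    # Each stage expands channels 4x, then PixelShuffle rearranges those
    # channels into a 2x larger spatial grid; log2(scale) stages in total.
    layers = []
    for _ in range(int(math.log2(scale))):
        layers += [
            nn.Conv2d(channels, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),
            nn.PReLU(),
        ]
    return nn.Sequential(*layers)
```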
[DL Paper Reading] Neural Radiance Flow for 4D View Synthesis and Video Processing (NeRF... — Deep Learning JP
Neural Radiance Flow (NeRFlow) is a method that extends Neural Radiance Fields (NeRF) to model dynamic scenes from video data. NeRFlow simultaneously learns two fields - a radiance field to reconstruct images like NeRF, and a flow field to model how points in space move over time using optical flow. This allows it to generate novel views from a new time point. The model is trained end-to-end by minimizing losses for color reconstruction from volume rendering and optical flow reconstruction. However, the method requires training separate models for each scene and does not generalize to unknown scenes.
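A heavily simplified sketch of the two-part training objective described above, assuming the radiance field has already been rendered to ray colors and the flow field projected to 2D flow. The function and weight names are placeholders, not the authors' code.

```python
import torch.nn.functional as F

def nerflow_loss(rendered_rgb, target_rgb, predicted_flow, target_flow,
                 flow_weight=0.1):
    # Color reconstruction: volume-rendered ray colors vs. ground-truth
    # pixels, as in the NeRF photometric loss.
    color_loss = F.mse_loss(rendered_rgb, target_rgb)
    # Flow reconstruction: the flow field's predicted motion vs. the
    # optical flow estimated from the video frames.
    flow_loss = F.mse_loss(predicted_flow, target_flow)
    return color_loss + flow_weight * flow_loss
```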
This document contains contact information for several researchers from the Machine Perception and Robotics Group at Chubu University in Japan, including professors, lecturers, and research assistants. It lists their names, titles, contact details such as phone numbers and email addresses, and web links for the group's website. The group is part of the Department of Robotics Science and Technology or Department of Computer Science within the College of Engineering at Chubu University.
Paper introduction: Grad-CAM: Visual explanations from deep networks via gradient-based loca... — Kazuki Adachi
Selvaraju, Ramprasaath R., et al. "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization." The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626.
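The procedure itself is short: take the gradient of the target class score with respect to a convolutional feature map, global-average-pool it into per-channel weights, and form a ReLU-ed weighted sum of the feature maps. A minimal PyTorch sketch follows; the choice of ResNet-50 and its layer4 as the target layer is illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1").eval()

# Hooks capture the target layer's activations and the gradient flowing back.
activations, gradients = {}, {}
model.layer4.register_forward_hook(
    lambda m, i, o: activations.update(feat=o))
model.layer4.register_full_backward_hook(
    lambda m, gi, go: gradients.update(feat=go[0]))

def grad_cam(image, class_idx):
    # image: preprocessed tensor of shape (1, 3, H, W)
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()
    feats = activations["feat"]                                 # (1, C, h, w)
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)  # alpha_k
    cam = F.relu((weights * feats).sum(dim=1, keepdim=True))    # (1, 1, h, w)
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return cam / (cam.max() + 1e-8)  # normalize to [0, 1] for overlay
```

The resulting map is upsampled to the input resolution and overlaid on the image as a heatmap.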
The document summarizes a presentation on machine learning methods for graph data and recent trends. It introduces graph data and common graph neural network (GNN) approaches, including Recurrent GNNs, Convolutional GNNs, Graph Autoencoders, Graph Adversarial Methods, and Spatial-Temporal GNNs. It then discusses the GNNExplainer method for explaining GNN predictions and concludes with an overview and outlook for future developments in the field.
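As a rough illustration of the convolutional GNN family listed above, a single message-passing layer can be written in a few lines; the mean-over-neighbors normalization here is one simple choice among many (GCN, for instance, uses a symmetric normalization instead).

```python
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim) node features
        # adj: (num_nodes, num_nodes) binary adjacency matrix
        adj_hat = adj + torch.eye(adj.size(0), device=adj.device)  # self-loops
        deg = adj_hat.sum(dim=1, keepdim=True)                     # degrees
        messages = (adj_hat / deg) @ x   # average over each node's neighbors
        return torch.relu(self.linear(messages))
```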
Miscellaneous talk about object detection, disguised as an explanation of You Only Look One-level Feature — Yusuke Uchida
Presentation materials for the 7th All-Japan Computer Vision Study Group, "CVPR2021 Reading Session" (Part 1).
https://kantocv.connpass.com/event/216701/
Explains You Only Look One-level Feature, alongside broader chat about the YOLO family and related object detection methods.
Invited keynote at Software Symposium 2010 in Japan. A talk on the history of software engineering and the role of agile, collecting recent remarks from Tom DeMarco, Ed Yourdon, Mary Poppendieck, Tom Gilb, Ivar Jacobson, and others, together with my own thoughts.
Presentation materials from the April 2013 regular meeting of the open community "Requirement Development Alliance" (http://www.openthology.org).
This document summarizes face image quality assessment (FIQA) and introduces several FIQA algorithms. It defines FIQA and outlines the common FIQA pipeline: input a face image, detect the face region, and apply a FIQA algorithm to output a quality score. It categorizes FIQA algorithms by level, from learning-free methods to those integrated with face recognition. Example algorithms include FaceQnet, SER-FIQ, and MagFace. FaceQnet generates quality-score ground truth from a face recognition system and trains a model to predict those scores. SER-FIQ and MagFace instead leverage the embeddings of recognition models to assess quality without separate quality training.
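To make the SER-FIQ idea concrete, here is a hedged sketch: quality is read off from how stable the face embedding is across stochastic forward passes with dropout kept active. `embedding_net` is a placeholder for any recognition model containing dropout layers, and the final calibration here differs slightly from the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def ser_fiq_score(embedding_net, face_image, n_passes=10):
    # Keep dropout active so that each forward pass is stochastic.
    embedding_net.train()
    with torch.no_grad():
        embs = torch.stack([embedding_net(face_image).squeeze(0)
                            for _ in range(n_passes)])
    embs = F.normalize(embs, dim=-1)
    # Mean pairwise distance between the stochastic embeddings: the more
    # stable the embedding, the smaller the distance, the higher the quality.
    mean_dist = torch.cdist(embs, embs).mean()
    return torch.sigmoid(-mean_dist)  # map stability to a (0, 1) score
```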
Paper introduction: Long-Tailed Classification by Keeping the Good and Removing the Bad Mom... — Plot Hong
1) The paper proposes a method called De-confound-TDE to address long-tailed classification by removing the bad causal effect that the momentum of head classes exerts on tail classes during training.
2) It decouples representation and classifier learning via multi-head normalization, and removes the effect of feature drift toward head classes via counterfactual TDE inference (see the sketch after this list).
3) Experiments show state-of-the-art performance on long-tailed classification benchmarks such as CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT, as well as on object detection and segmentation benchmarks like LVIS.
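A rough sketch of the counterfactual TDE inference step from point 2, under simplifying assumptions: logits come from a cosine classifier, and the "bad" direct effect along the moving-average feature direction accumulated by momentum during training is subtracted at test time. Normalization and scaling details of the paper are deliberately simplified here.

```python
import torch
import torch.nn.functional as F

def tde_logits(features, classifier_weight, mean_feature, alpha=1.0):
    # features: (B, D), classifier_weight: (C, D),
    # mean_feature: (D,) running mean of training features (momentum direction)
    x = F.normalize(features, dim=-1)
    w = F.normalize(classifier_weight, dim=-1)
    d = F.normalize(mean_feature, dim=0)        # head-biased direction
    logits = x @ w.t()                          # cosine-classifier logits
    cos_xd = (x * d).sum(dim=-1, keepdim=True)  # alignment of x with d
    counterfactual = cos_xd * (d @ w.t())       # effect explained by d alone
    return logits - alpha * counterfactual      # remove the bad momentum effect
```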
This document discusses deepfakes, including their creation and detection. It begins with an introduction to face swapping, face reenactment, and face synthesis techniques used to generate deepfakes. It then describes several methods for creating deepfakes, such as faceswap algorithms, 3D modeling approaches, and GAN-based methods. The document also reviews several datasets used to detect deepfakes. Finally, it analyzes current research on detecting deepfakes using techniques like two-stream neural networks, analyzing inconsistencies in audio-video, and detecting warping artifacts.
19. 3D Object Detection
Dataset: Waymo Open Dataset [6]
Models: PointPillars [7], Range Sparse Net (RSN) [8]
20. References
[1] Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Jay Shi, Shuyang Cheng, and Dragomir Anguelov. PolyLoss: A polynomial expansion perspective of classification loss functions. In ICLR, 2022.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.
[3] Mingxing Tan and Quoc V. Le. EfficientNetV2: Smaller models and faster training. In International Conference on Machine Learning, 2021.
[4] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pp. 740–755. Springer, 2014.
[5] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, 2017.
21. References (continued)
[6] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454, 2020.
[7] Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705, 2019.
[8] Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Christian Sminchisescu, and Dragomir Anguelov. RSN: Range Sparse Net for efficient, accurate LiDAR 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.