Lecture 13

These slides cover computer vision techniques for embedded systems, including convolutional neural networks (CNNs) and vision transformers. They describe large image datasets such as JFT, the effects of training-data volume and model capacity, the limitations of CNNs, and how transformers can be applied to image recognition by treating images as sequences of patches.


Computer Vision for

Embedded Systems

Yung-Hsiang Lu
Purdue University
[email protected]



Revisiting Unreasonable Effectiveness of Data in
Deep Learning Era
Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta, ICCV 2017

JFT dataset: 300M images, 18,291 categories



JFT Dataset
● 300M images
● 375M labels
● 18,291 categories
○ 1,165 types of animals
○ 5,720 types of vehicles
○ maximum depth of hierarchy is 12
○ maximum number of children is 2,876
● heavy-tailed distribution: 3K categories have fewer than 100 images each
● image sizes: 340 x 340, cropped to 299 x 299, pixel values normalized to [-1, 1] (see the sketch below)
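A minimal sketch of that preprocessing step in Python, assuming the image is already decoded into a NumPy uint8 array (the function name and the center-crop choice are illustrative; the paper's exact cropping policy may differ):

import numpy as np

def preprocess(image, crop=299):
    # Center-crop a 340 x 340 uint8 image to 299 x 299.
    h, w, _ = image.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = image[top:top + crop, left:left + crop].astype(np.float32)
    # Map pixel values from [0, 255] onto [-1, 1].
    return patch / 127.5 - 1.0

x = preprocess(np.zeros((340, 340, 3), dtype=np.uint8))
print(x.shape)   # (299, 299, 3)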



Effects of Training Examples
[figures: object detection results on COCO and PASCAL VOC; performance grows roughly logarithmically with the number of training examples]



Effect of Model Capacity
[figure: COCO results for ResNet backbones of different capacities; higher-capacity models benefit more from the larger training set]



Limitations of Convolutional Neural Networks
● Convolution considers neighboring pixels, but only at fixed distances
● The same parameters are applied to all pixels, even though objects may appear at different sizes
● Hyperparameters (stride, filter size, number of layers, ...) are fixed in advance (though they may be chosen by neural architecture search); the sketch after this list illustrates the fixed receptive field
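To see why the "fixed distances" limitation matters: the receptive field of a CNN is determined entirely by its architecture and cannot adapt to the object in the image. A minimal sketch (the layer configuration is a made-up example, not from the slides):

def receptive_field(layers):
    # layers: list of (kernel_size, stride) pairs, from input to output.
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= s              # strides compound the sampling step
    return rf

# Three 3x3 stride-1 convolutions always see a 7x7 neighborhood,
# whether the object spans 10 pixels or 300.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))   # -> 7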



An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn,
Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg
Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, 2020



Create Image Patches

An input image of height H and width W with C channels is split into square patches of size P x P; each patch is flattened into a vector of length P^2 * C, giving a sequence of N = HW / P^2 patches.
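A minimal NumPy sketch of this patching step (the function name is illustrative):

import numpy as np

def image_to_patches(image, P):
    # Reshape an (H, W, C) image into (N, P*P*C) flattened patches, N = HW / P^2.
    H, W, C = image.shape
    assert H % P == 0 and W % P == 0, "H and W must be multiples of P"
    x = image.reshape(H // P, P, W // P, P, C)   # split both spatial axes
    x = x.transpose(0, 2, 1, 3, 4)               # bring the patch grid to the front
    return x.reshape(-1, P * P * C)              # (N, P^2 * C)

patches = image_to_patches(np.zeros((224, 224, 3)), P=16)
print(patches.shape)   # (196, 768): a 14 x 14 grid of 16x16x3 patches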



Options of Position Embedding
● no position information (a bag of patches)
● 1D position embedding (a sequence of patches; sketched below)
● 2D position embedding
● relative position embedding
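A minimal sketch of the 1D option, using NumPy arrays as stand-ins for trainable parameters (shapes follow ViT-Base/16 at 224 x 224 input; all names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
N, D = 196, 768   # patches per image, embedding dimension

# One learnable vector per sequence position: the [class] token plus N patches.
pos_embedding = rng.normal(scale=0.02, size=(N + 1, D))

patch_embeddings = np.zeros((N + 1, D))     # stand-in for the projected patches
tokens = patch_embeddings + pos_embedding   # position information is simply added
print(tokens.shape)                         # (197, 768)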



Datasets
● ImageNet 1K: 1K classes, 1.3M images
● ImageNet 21K: 21K classes, 14M images
● JFT-300M: 18K classes, 300M images
● CIFAR-10 and CIFAR-100
● Oxford-IIIT Pets
● Oxford Flowers-102



Model Variants
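For reference, the ViT paper (Table 1) defines three variants:

ViT-Base: 12 layers, hidden size 768, MLP size 3072, 12 heads, 86M parameters
ViT-Large: 24 layers, hidden size 1024, MLP size 4096, 16 heads, 307M parameters
ViT-Huge: 32 layers, hidden size 1280, MLP size 5120, 16 heads, 632M parameters

A name such as ViT-L/16 denotes the Large variant operating on 16 x 16 patches.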



Comparison



How to train your ViT? Data, Augmentation,
and Regularization in Vision Transformers
Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob
Uszkoreit, Lucas Beyer, 2021



ViViT: A Video Vision Transformer
Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, ICCV 2021



The Kinetics Human Action Video Dataset, 2017



Kinetics Dataset
● 400 human action classes
● at least 400 clips per action
● each clip lasts about 10 seconds and comes from a YouTube video
● single-person activities: drawing, laughing, drinking
● person-person activities: shaking hands, hugging
● person-object activities: washing dishes, mowing lawn



Crowdsourcing to Label Data



[figures: optical flow illustrations; source: https://nanonets.com/blog/optical-flow/]

Lucas–Kanade method
The method assumes that the optical flow is essentially constant within a small neighborhood of each pixel.

https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method
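A minimal sketch of sparse optical flow using OpenCV's pyramidal Lucas–Kanade implementation (the frame file names and parameter values are illustrative):

import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # two consecutive frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Corners make good points to track: both image gradients are strong there.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Each window is assumed to move with a single (u, v) displacement, solved by
# least squares; the image pyramid handles motions larger than the window.
next_pts, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts, None, winSize=(21, 21), maxLevel=3)

for p, q, ok in zip(pts.reshape(-1, 2), next_pts.reshape(-1, 2), status.ravel()):
    if ok:
        print("flow:", q - p)   # per-point displacement between the two frames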
[figures: action recognition results on UCF-101 and HMDB-51]
ViViT: A Video Vision Transformer
Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, ICCV 2021



