CS 444: Deep Learning for Computer Vision

D. Hockney, Pool with two figures, 1972

https://slazebni.cs.illinois.edu/spring23/
Lecture overview
• About the class
• Milestones of deep learning
• Recent successes and origins
• Visual recognition
• Natural language understanding
• Generative modeling
• Games
• Robotics
• Topics to be covered in class
A few historical milestones
• 1958: Rosenblatt’s perceptron

Frank Rosenblatt (1928-1971)


A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• Fascinating reading: M. Olazaran, A Sociological Study
of the Official History of the Perceptrons Controversy,
Social Studies of Science, 1996
A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• 1980: Fukushima’s Neocognitron
• Video (short version)
• Inspired by the findings of Hubel & Wiesel
about the hierarchical organization
of the visual cortex in cats and monkeys (1959-1977)
Kunihiko Fukushima

Image source
A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• 1980: Fukushima’s Neocognitron
• 1986: Back-propagation
• Origins in control theory and optimization: Kelley (1960), Dreyfus (1962),
Bryson & Ho (1969), Linnainmaa (1970)
• Application to neural networks: Werbos (1974)
• Popularized by Rumelhart, Hinton & Williams (1986)
A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• 1980: Fukushima’s Neocognitron
• 1986: Back-propagation
• 1989 – 1998: Convolutional neural networks
• LeNet to LeNet-5

Yann LeCun
2018 ACM Turing Award winner
(with Hinton and Bengio)
A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• 1980: Fukushima’s Neocognitron
• 1986: Back-propagation
• 1989 – 1998: Convolutional neural networks
• 2012: AlexNet

Photo source
A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• 1980: Fukushima’s Neocognitron
• 1986: Back-propagation
• 1989 – 1998: Convolutional neural networks
• 2012: AlexNet
• Fascinating reading: The secret auction that set off the race for AI supremacy,
Wired, 3/16/2021
A few historical milestones
• 1958: Rosenblatt’s perceptron
• 1969: Minsky and Papert Perceptrons book
• 1980: Fukushima’s Neocognitron
• 1986: Back-propagation
• 1989 – 1998: Convolutional neural networks
• 2012: AlexNet
• 2012 – present: deep learning explosion

Source, via J. Johnson


Lecture overview
• About the class
• Milestones of deep learning
• Progress in the last decade
• Visual recognition
• Natural language understanding
• Generative modeling
• Games
• Robotics
Recognition: ImageNet Challenge

[Figure: ILSVRC results over time: before deep learning, convolutional architectures, human baseline]

Figure source
ImageNet is obsolete?

“Programmer”

K. Yang, K. Qinami, L. Fei-Fei, J. Deng, O. Russakovsky, Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy, Conference on Fairness, Accountability, and Transparency (FAccT), 2020
L. Beyer et al. Are we done with ImageNet? arXiv:2006.07159, 2020
Object instance segmentation

K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN, ICCV 2017 (Best Paper Award)
Recognition on my iPhone
Recognition: Concerns

How China Uses High-Tech Surveillance to Subdue Minorities – New York Times, 5/22/2019
The Secretive Company That Might End Privacy As We Know It – New York Times, 1/18/2020
Wrongfully Accused by an Algorithm – New York Times, 6/24/2020
Lecture overview
• About the class
• Milestones of deep learning
• Progress in the last decade
• Visual recognition
• Natural language understanding
Neural machine translation

Google Neural Machine Translation (GNMT)

Y. Wu et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv 2016
https://mobile.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html

Transformers (BLEU score)

Figure source
Previous system (before deep learning): PBMT (2014): 37 BLEU
A. Vaswani et al. Attention is all you need. NeurIPS 2017
Large language models: Google BERT
• Self-supervised pre-training task: masked token prediction
Bidirectional Encoder Representations from Transformers (BERT)

Figure source

J. Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019
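A minimal sketch of how masked token prediction builds its training examples: some tokens are hidden, and the model is asked to recover them from the surrounding context on both sides. The function name `mask_tokens` and the `mask_prob` and `seed` parameters are hypothetical, chosen for illustration; the sketch also simplifies BERT's actual corruption scheme, which sometimes substitutes a random token or leaves the chosen token unchanged instead of always inserting `[MASK]`.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for masked token prediction.

    inputs: the token list with some positions replaced by [MASK].
    labels: the original token at each masked position, None elsewhere
            (unmasked positions contribute nothing to the loss).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model must predict it
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

tokens = "the cat sat on the mat because it was tired".split()
inputs, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

No manual annotation is involved: the labels come from the text itself, which is what makes the task self-supervised.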
Large language models: OpenAI GPT
• Self-supervised pre-training task: next token prediction

Figure source

GPT: A. Radford et al. Improving language understanding with unsupervised learning. 2018
GPT-2 (1.5B parameters): A. Radford et al. Language models are unsupervised multitask learners. 2019
GPT-3 (175B parameters): T. Brown et al. Language models are few-shot learners. NeurIPS 2020 (Best Paper Award)
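A minimal sketch of the next token prediction objective: every prefix of a token sequence serves as a context, and the token that follows it is the prediction target. The function name `next_token_pairs` is hypothetical, and real models process all positions of a sequence in parallel rather than building explicit pairs like this.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs for next token prediction.

    The model only ever sees tokens to the left of the target, which is
    the key difference from bidirectional masked token prediction.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "deep learning for computer vision".split()
pairs = next_token_pairs(tokens)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

A sequence of n tokens yields n - 1 training examples, again with no manual labels required.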
Stochastic parrots or sentient entities?*
*Asking either question will get you fired from Google

https://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-paper-forced-out-timnit-gebru/
https://www.cnn.com/2022/07/23/business/google-ai-engineer-fired-sentient/index.html

E. Bender et al., On the dangers of stochastic parrots: Can language models be too big? FAccT 2021
InstructGPT and ChatGPT
Reinforcement Learning from Human Feedback (RLHF)

L. Ouyang et al. Training language models to follow instructions with human feedback. NeurIPS 2022
https://openai.com/blog/chatgpt/
ChatGPT

Generated on 1/10/2023
ChatGPT: Concerns

https://www.nytimes.com/2023/01/16/technology/chatgpt-artificial-intelligence-universities.html
ChatGPT: Concerns – and opportunities

Some Google search results as of 1/10/2023


Lecture overview
• About the class
• Milestones of deep learning
• Progress in the last decade
• Vision
• Language
• Generative modeling
Progress in face generation
Progress in general category generation

GAN-generated dogs in 2017 (Source: EBGAN)
GAN-generated dogs in 2018 (Source: BigGAN)


Text-to-image generation: OpenAI DALL-E

A. Ramesh et al., Zero-Shot Text-to-Image Generation, ICML 2021


https://openai.com/blog/dall-e/
Text-to-image generation: OpenAI DALL-E
• Underlying technology: autoregressive generation using a
transformer decoder

Text prompt encoding (256 tokens) → Image encoding (1024 = 32x32 tokens) → Decode to 256x256 image

A. Ramesh et al., Zero-Shot Text-to-Image Generation, ICML 2021


https://openai.com/blog/dall-e/
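A minimal sketch of the sequence layout described on this slide: 256 text tokens followed by 1024 image tokens, generated one at a time and then arranged into a 32x32 grid. The token counts come from the slide; everything else is an assumption for illustration. In particular, `toy_next_token_probs` is a hypothetical stand-in for the transformer decoder that ignores its context, and `IMAGE_VOCAB` is a toy codebook size. The final step of decoding the token grid into a 256x256 image is not shown.

```python
import random

TEXT_LEN = 256            # text prompt tokens (from the slide)
GRID = 32                 # image tokens form a 32x32 grid (from the slide)
IMAGE_LEN = GRID * GRID   # 1024 image tokens
IMAGE_VOCAB = 16          # toy codebook size, chosen only for this sketch

def toy_next_token_probs(sequence):
    """Hypothetical stand-in for the transformer decoder.

    A real model would compute this distribution from the whole sequence
    so far; this placeholder returns a uniform distribution.
    """
    return [1.0 / IMAGE_VOCAB] * IMAGE_VOCAB

def generate_image_tokens(text_tokens, seed=0):
    """Autoregressively append image tokens to the text tokens."""
    rng = random.Random(seed)
    sequence = list(text_tokens)
    for _ in range(IMAGE_LEN):
        probs = toy_next_token_probs(sequence)
        token = rng.choices(range(IMAGE_VOCAB), weights=probs, k=1)[0]
        sequence.append(token)   # each new token conditions the next step
    image_tokens = sequence[TEXT_LEN:]
    # Arrange the flat list of 1024 tokens into 32 rows of 32.
    return [image_tokens[r * GRID:(r + 1) * GRID] for r in range(GRID)]

text_tokens = [0] * TEXT_LEN   # placeholder prompt encoding
grid = generate_image_tokens(text_tokens)
```

The loop makes the cost of this approach visible: producing one image takes 1024 sequential passes through the model.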
Text-to-image generation: OpenAI DALL-E 2

A. Ramesh et al. Hierarchical text-conditional image generation with CLIP latents. 2022
Diffusion models
• Idea: convert noise to an image in multiple passes

J. Ho et al. Denoising diffusion probabilistic models. NeurIPS 2020


Blog introduction: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
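A minimal sketch of the "noise to image in multiple passes" idea, using a reverse update of the form found in denoising diffusion models. The "image" is a short list of pixel values, and the names `make_schedule`, `oracle_noise`, and `sample`, along with the schedule values, are assumptions chosen for illustration. Most importantly, `oracle_noise` cheats: it is told the clean target and so recovers the noise exactly, whereas a real system would use a trained network that predicts the noise from the noisy input alone.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: per-step betas and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def oracle_noise(x_t, target, alpha_bar):
    """Stand-in for a trained noise predictor (it is given the clean target)."""
    return [(x - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
            for x, x0 in zip(x_t, target)]

def sample(target, num_steps=50, seed=0):
    """Start from Gaussian noise and denoise it step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]          # pure noise
    for t in reversed(range(num_steps)):
        eps = oracle_noise(x, target, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])                # re-inject a little noise
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                                   # final pass adds no noise
    return x

pixels = [0.1, 0.5, 0.9, 0.3]
result = sample(pixels)
```

Each pass removes only part of the noise, which is why generation takes many steps rather than one.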
Diffusion models
• Idea: convert noise to an image in multiple passes
• Proliferation of models: Imagen, Stable Diffusion, Midjourney, …
• Text-to-video, text-to-3D, …
Diffusion models: The next gold rush?

https://www.foley.com/en/insights/publications/2022/12/venture-capital-investors-betting-generative-ai
Generative modeling: Concerns
• Deepfakes
https://www.wired.com/story/zelensky-deepfake-facebook-twitter-playbook/
• Biases, toxic content
DALL-E 2 images of lawyers, flight attendants (source)
• AI replacing artists?
AI-generated work wins first prize at art fair
Lecture overview
• About the class
• Milestones of deep learning
• Progress in the last decade
• Vision
• Language
• Generative modeling
• Games
Games

• 2013: DeepMind uses deep reinforcement learning to beat humans at some Atari games
• 2016: DeepMind’s AlphaGo system beats Go grandmaster Lee Sedol 4-1
• 2017: AlphaZero learns to play Go and chess from scratch
• 2019: DeepMind’s StarCraft 2 AI is better than 99.8 p
Lecture overview
• About the class
• Milestones of deep learning
• Progress in the last decade
• Vision
• Language
• Generative modeling
• Games
• Robotics
Sensorimotor learning

Overview video,
training video

S. Levine, C. Finn, T. Darrell, P. Abbeel, End-to-end training of deep visuomotor policies, JMLR 2016
Sensorimotor learning

A. Agarwal, A. Kumar, J. Malik, and D. Pathak. Legged Locomotion in Challenging Terrains using Egocentric Vision. CoRL 2022
Lecture overview
• About the class
• Milestones of deep learning
• Progress in the last decade
• Vision
• Language
• Generative modeling
• Games
• Robotics
• Topics to be covered in class
Topics to be covered in class
• ML basics, linear classifiers
• Multilayer neural networks, backpropagation
• Convolutional networks for classification
• Networks for detection, dense prediction
• Self-supervised learning
• Generative models (GANs, image-to-image translation, diffusion models)
• Recurrent models
• Transformers, large language models, transformers for vision
• Deep reinforcement learning
Fascinating historical reading
• 1943: McCulloch and Pitts neurons
• The Man Who Tried to Redeem the World with Logic, Nautilus, 2/5/2015

Walter Pitts (1923-1969)


Fascinating historical reading
• 1959: First pattern recognition benchmark, training-test split

1500 characters (26 letters, 10 digits from 50 writers), 12x12 resolution, stored on IBM 704 punch cards
Bill Highleyman and Louis Kamentsky, Bell Labs
