
International Institute of Information Technology, Hyderabad

Spring 2024 CS7.505: Computer Vision


Assignment 4: Segmentation and CLIP
8 April 2024

Instructions:

• The goal of this assignment is to get familiar with dense prediction models and CLIP.

• You should upload the assignment as a Jupyter notebook with appropriate cells (markdown and code) containing:
(1) the code that you wrote, (2) the relevant outputs, and (3) your report and observations (in markdown cells). The
file should be uploaded on the courses portal.
• We recommend Python.
• You may want to use Google Colab for both questions in the assignment.

• Include the assignment number, your name, and roll number in the first cell of the notebook submission.
• Make sure that the assignment you submit is your own work. Any breach of this rule could
result in serious action, including an F grade in the course.
• The experiments, and writing everything up, can take time. Start your work early and do not wait until the deadline.

Submission: Any time before 22nd Apr 2024, 23:59 IST

1 Assignment
This assignment provides hands-on experience with the Unet architecture for segmentation. In the second question,
we compare ImageNet vs. CLIP pretraining. Please use relevant libraries; do not implement everything from scratch.
You are expected to solve both questions.

Q1: Segmentation [4 points]


1. [1 point] Preparation. We will use an existing illustrative notebook for segmentation with a Unet in PyTorch.
Please make a copy of the following notebook as a Google Colab project and ensure you are able to run it:
https://www.kaggle.com/code/dhvananrangrej/image-segmentation-with-unet-pytorch/notebook.
The first cell should contain the necessary code to download the segmentation dataset (a subset of Cityscapes).
Carefully study the code, focusing especially on the data and the model.
Do not change the model definition, dataset class, or evaluation metric (currently computed over 10 images, or 1
batch, of the validation set). You may need to fix one bug, iter(data_loader).next(), but this does not
affect the other parts; a sketch of the fix is shown below.
Hint: You may train the model for 1 epoch only, as completing all 10 epochs takes quite some time.
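For reference, a minimal sketch of the dataloader fix: on recent PyTorch versions the iterator returned by iter() no longer exposes a .next() method, so use the built-in next() instead. The dummy dataset below is only there to make the snippet self-contained; in the notebook, data_loader is whatever DataLoader it already defines.

    # Self-contained illustration with a dummy dataset; in the notebook,
    # simply replace iter(data_loader).next() with next(iter(data_loader)).
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    data_loader = DataLoader(TensorDataset(torch.randn(8, 3, 64, 64)), batch_size=4)

    # Old (raises AttributeError on recent PyTorch): iter(data_loader).next()
    batch = next(iter(data_loader))  # Python 3 iterator protocol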
2. [2 points] Importance of skip connections. The above model is a standard Unet architecture. The key
novelty in the Unet is the skip connections between the encoder and decoder. But how important are they?
In this experiment, modify the model to remove the skip connections; one way to structure the change is sketched
below. Does the new model perform as well as the original? Report the results, both qualitatively and
quantitatively (IoU over the same 10 images as before).
Hint 1: When removing the skip connections, you will also need to halve the number of input channels of the
decoder convolutions, since there is no longer a concatenated encoder feature map.
Hint 2: For a fair comparison with the above model, please train the new model without skip connections for the
same number of epochs (1 if you followed the above point!).
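As a minimal sketch of the idea only: the class and argument names below are illustrative and will not match the notebook's definitions exactly. The point is that without the concatenated encoder feature, the decoder convolutions see half as many input channels.

    import torch
    import torch.nn as nn

    # One decoder step of a Unet, with an optional skip connection.
    class UpBlock(nn.Module):
        def __init__(self, in_ch, out_ch, use_skip=True):
            super().__init__()
            self.use_skip = use_skip
            self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
            # With a skip, the conv sees out_ch (upsampled) + out_ch (encoder)
            # channels; without it, only out_ch.
            conv_in = 2 * out_ch if use_skip else out_ch
            self.conv = nn.Sequential(
                nn.Conv2d(conv_in, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            )

        def forward(self, x, skip=None):
            x = self.up(x)
            if self.use_skip and skip is not None:
                x = torch.cat([x, skip], dim=1)  # channel-wise concatenation
            return self.conv(x)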
3. [1 point] Metric. The metric implemented in the notebook calculates IoU directly (using logical_and and
logical_or) and does not seem correct.
First, explain the issue with the metric in its current form.
Next, rewrite the metric function to instead compute an IoU for each class; a sketch of a per-class version is given
below. Report the mean over all classes as the score for each image, and the mean over all images as the score for
the validation set (or the 10 images therein).
Is the mIoU score different between the models with and without skip connections? Explain why.
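For concreteness, a minimal per-class mIoU sketch, assuming pred and target are integer label maps of the same shape; adapt types and shapes to the notebook's tensors. Averaging this per-image score over the 10 validation images then gives the set-level score.

    import numpy as np

    def mean_iou(pred, target, num_classes):
        # pred, target: integer label maps, e.g. shape (H, W)
        ious = []
        for c in range(num_classes):
            p, t = (pred == c), (target == c)
            union = np.logical_or(p, t).sum()
            if union == 0:       # class absent from both maps: skip it
                continue
            inter = np.logical_and(p, t).sum()
            ious.append(inter / union)
        return float(np.mean(ious))  # mean over classes, for one image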

Q2: Contrastive Language-Image Pretraining [6 points]


1. [1 point] Setup models. Load the ResNet-50 (RN50) model, initialized in two different ways:
(a) ImageNet pretraining (torchvision.models can be used; specifically, look at IMAGENET1K_V1); and
(b) OpenAI’s CLIP (see https://github.com/openai/CLIP).
Do the visual encoders have the same architecture? If not, please describe and explain the differences. A loading
sketch is given below.
Hint: When you load the CLIP model, you will get both the vision and text encoders; be sure to differentiate
between them as necessary.
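A minimal loading sketch; the variable names are ours, and clip refers to OpenAI's package, installable from the repository above.

    import torch
    import torchvision.models as tvm
    import clip  # pip install git+https://github.com/openai/CLIP.git

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # (a) ImageNet-pretrained ResNet-50
    rn50_imagenet = tvm.resnet50(weights=tvm.ResNet50_Weights.IMAGENET1K_V1)
    rn50_imagenet = rn50_imagenet.to(device).eval()

    # (b) CLIP's RN50: clip.load returns the joint model (vision + text)
    # and its matching image preprocessing transform
    clip_model, preprocess = clip.load("RN50", device=device)
    visual_encoder = clip_model.visual  # the image tower only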
2. [1 point] Setup data. Understand the ImageNet challenge dataset (1000 labels of ILSVRC).
(i) What label hierarchy is used in ImageNet? (ii) What does a synset mean? (iii) Could grouping objects based
on synsets lead to problems for visual recognition? (iv) State 3 types of visual differences we can expect to see
in images with objects corresponding to the same synset.
3. [1 point] Setup zero-shot CLIP. Similarly to the ImageNet-pretrained RN50, set up CLIP to generate probability
scores for the 1000 ImageNet categories; a sketch of the standard recipe is given below.
Test it with a few example images to check that it identifies the correct object category.
Hint: You may treat the (scaled) cosine similarities as “logits”.
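A sketch of the standard zero-shot recipe, assuming imagenet_classes is your list of the 1000 class names and example.jpg is a test image; both are placeholders you must supply.

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("RN50", device=device)

    prompts = [f"a photo of a {c}" for c in imagenet_classes]  # 1000 prompts
    text_tokens = clip.tokenize(prompts).to(device)

    with torch.no_grad():
        text_feats = model.encode_text(text_tokens)
        text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

        image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
        img_feat = model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

        # scaled cosine similarities act as logits over the 1000 categories
        logits = 100.0 * img_feat @ text_feats.T
        probs = logits.softmax(dim=-1)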
4. [1.5 points] CLIP vs ImageNet pretraining. Pick 10 classes from ImageNet (not all from the same branch,
e.g., not all dogs). For each class:
(i) Find 2 images that work well with CLIP, but not with the ImageNet-pretrained RN50. Reason about why this
may be the case. Where did you get these images?
(ii) Find 1 image that works well with ImageNet pretraining but not with CLIP. Reason about why this may be
the case. Where did you get these images?
Note: For the purpose of this question, we say that a model “works well” if it places the correct category
label within its top-5 highest-scoring labels; a small helper for this check is sketched below.
Hint: Reading the CLIP paper (https://arxiv.org/abs/2103.00020) can help you solve this question quickly,
as it shows examples where ImageNet pretraining fails but CLIP works.
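For the top-5 criterion, a tiny helper, assuming probs is a (1, 1000) tensor of class scores from either model and true_idx is the ground-truth class index:

    def works_well(probs, true_idx, k=5):
        # True if the correct label is among the k highest-scoring labels
        return true_idx in probs.topk(k, dim=-1).indices[0].tolist()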
5. [1.5 points] FP16. While deep learning has primarily relied on single-precision floating point (32 bits, or 4 bytes,
per parameter), 16-bit floating-point numbers are useful for saving memory. In this question, we will only
consider CLIP’s image encoder.
(i) Convert the RN50 CLIP image encoder to fp16. Measure the (wall-clock) time required to encode an
image, and estimate the difference between the fp32 (original) model and the fp16 model (see the timing sketch
after this list). Note: it is common to time 100 runs and report both the mean and the standard deviation.
(ii) For 5 images (1 per class), recalculate the probabilities using the fp16 model. Are there significant differences
between the fp32 and fp16 outputs? Why?
(iii) Using nvidia-smi or a profiler, note and explain the differences in memory usage for a forward pass between
the fp32 and fp16 models. If you do not use the profiler, please include some screenshots in your assignment.
Hint: The “Categorized Memory Usage” section of https://pytorch.org/blog/understanding-gpu-memory-1/ may help.
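A timing sketch for part (i), under two stated assumptions: clip.load on CUDA may already return fp16 weights, so both precisions are forced explicitly on copies; and CUDA kernels run asynchronously, so torch.cuda.synchronize() is called before reading the clock.

    import copy, time
    import torch
    import clip

    device = "cuda"
    model, _ = clip.load("RN50", device=device)

    # force each precision explicitly so the comparison is well defined
    enc_fp32 = copy.deepcopy(model.visual).float().eval()
    enc_fp16 = copy.deepcopy(model.visual).half().eval()

    def time_encode(encoder, x, runs=100):
        with torch.no_grad():
            for _ in range(10):           # warm-up passes
                encoder(x)
            torch.cuda.synchronize()      # wait for queued kernels
            times = []
            for _ in range(runs):
                t0 = time.perf_counter()
                encoder(x)
                torch.cuda.synchronize()
                times.append(time.perf_counter() - t0)
        t = torch.tensor(times)
        return t.mean().item(), t.std().item()

    x = torch.randn(1, 3, 224, 224, device=device)
    print("fp32: mean=%.4fs std=%.4fs" % time_encode(enc_fp32, x.float()))
    print("fp16: mean=%.4fs std=%.4fs" % time_encode(enc_fp16, x.half()))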
2 Submission
We recommend that you submit your report as a single Jupyter notebook with the relevant cell outputs, as mentioned
at the top. In case you are not using Python, please share your code (with instructions on how to execute it) and a
separate PDF report.
Submit the file on the courses / Moodle portal before the deadline: 22nd Apr 2024, 23:59 IST. The Moodle portal
may show a different date due to the grace period; do not get confused.
The report/notebook should contain:

• A description of the problem, algorithms, results and comparison of methods based on the experiments you
performed.
• Challenges you faced and learnings from the experiments.

Remember, you are expected to write the complete code for the assignment yourself. DO NOT COPY ANY
PART FROM ANY SOURCE, including but not limited to your friends, seniors, or the internet.
