
Bike Price Prediction

Matthias Fast, Amos Dinh


Semester Project
Agenda
1. Problem formulation
2. Crawling for data
3. Filtering and cleaning the data
4. Loss function
5. Model Training
6. Visualization of activations (Conclusion)
1. Problem formulation and motivation

• Suppose you can identify bicycles which are underpriced based on an image of the bicycle
• You can then resell the bicycle to make a profit

-> Problem formulation:
"Predict the offer price in euros, given an image of a bike"
2. Crawling
"Predict the offer price in euros, given an image of a bike"
• (Image, price) pairs can be obtained consistently with web scraping
• Scraped bicycle-image sources:
  • High quality:
    • www.fahrrad.de (1500)
    • www.fahrrad-xxl.de (5700)
  • Low quality:
    • www.google.com/... &tbm=shop (52000)
• ~60000 images in total
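A minimal sketch of how such (image, price) pairs could be collected; the request/parse pattern is generic, and the listing URL, CSS selectors and price format are hypothetical placeholders, not the crawler actually used:

import requests
from bs4 import BeautifulSoup

def scrape_listing_page(url):
    # Collect (image_url, price_eur) pairs from one listing page.
    # ".product-card", "img" and ".price" are hypothetical selectors.
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    pairs = []
    for card in soup.select(".product-card"):
        image_url = card.select_one("img")["src"]
        price_text = card.select_one(".price").get_text(strip=True)   # e.g. "1.299,00 €"
        price = float(price_text.replace("€", "").replace(".", "").replace(",", ".").strip())
        pairs.append((image_url, price))
    return pairs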
3. Filtering and Cleaning
• Constraints for an image:
  • The picture has to contain one bicycle

Reduce the amount of data needed (narrowed-down problem formulation):
  • The bicycle has to face sideways
  • The background of the image needs to be monotonous

-> Aim to remove deviating images

3. Filtering and Cleaning
3.1. Filter out non-bicycle images
3.2. Remove non-monotonous and other images
3.3. Remove duplicates
3.4. Is there bias in the images?
(3.5.) Preprocessing and mean image
3.1. Filter out non-bicycle images
• Compared: classification-head embeddings vs. base-model embeddings
• Chosen method: base-model embeddings

1. Compute image embeddings using a ResNet50 CNN (ImageNet)
2. Filter images: label "bad samples" by hand and use their embeddings to localize similar "bad" images
3. Filter images with visual verification
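A minimal sketch of step 1, assuming a tf.keras setup: a headless, ImageNet-pretrained ResNet50 with global average pooling yields one 2048-dimensional embedding per image.

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

base = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed_images(paths, size=(224, 224)):
    # Load, resize and preprocess the images, then return (n, 2048) embeddings.
    imgs = [tf.keras.utils.load_img(p, target_size=size) for p in paths]
    x = preprocess_input(np.stack([np.asarray(im, dtype=np.float32) for im in imgs]))
    return base.predict(x, verbose=0)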
3.1. Filter out non-bicycle images
t-SNE of the ResNet base embeddings:
• Embeddings of classes to be removed are marked red
• xnobike (green) is not well separated
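A sketch of such a t-SNE plot with scikit-learn; `embeddings`, `labels` and `bad_classes` are assumed to come from the embedding step and the hand-labelling.

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

xy = TSNE(n_components=2, init="pca", random_state=0).fit_transform(embeddings)
colors = ["red" if lbl in bad_classes else "grey" for lbl in labels]
plt.scatter(xy[:, 0], xy[:, 1], c=colors, s=2)   # red = classes to be removed
plt.show()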
3.1. Filter out non-bicycle images
t-SNE of the ResNet with classification head:
• xnobike (green) is well separated
• Other classes (e.g. blue, cyan) are not well separated
3.1. Filter out non-bicycle images
• Examining the classification head's top-1 predictions shows that many "top-1 classes" can be removed entirely
3.1. Filter out non-bicycle images
• Embeddings after removal: 5200 images are removed, 54000 are kept

-> Still need to remove non-monotonous and xnobike


3.2. Remove non-monotonous and xnobike
1. Perform LDA on embeddings of hand-labeled images vs.
all other images
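A sketch of this step with scikit-learn, assuming `embeddings` and a binary `is_bad` array for the hand-labelled images; `threshold` stands for the manually verified cut-off discussed on the next slide.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=1)       # 1-D projection for 2 classes
scores = lda.fit_transform(embeddings, is_bad).ravel()
# Keep images whose projection falls below the verified threshold
# (the sign/direction of the LDA axis depends on the fit):
keep_mask = scores < threshold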
3.2. Remove non-monotonous and xnobike
2. Manually verify images kept for different thresholds

-> xnobike: Choose threshold 1, remove 1987 images, keep 52021


-> non-monotonous: threshold 0.97, remove 3194 images, keep 48825
3.2. Remove non-monotonous and xnobike
• Removal appears to be mostly successful
3.3. Remove duplicate images
• Many duplicate images (from Google)
• Reuse the embeddings, since they also represent "similarity" to a certain degree
• Group images whose embeddings lie within a Euclidean distance x of each other

-> Choose threshold 1, remove 3989 images, keep 44836
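A sketch of the grouping idea: greedily keep one representative per group of embeddings that lie within the chosen Euclidean distance of each other. The full pairwise-distance matrix is only feasible for moderate subset sizes; the actual pipeline may have used a batched or approximate variant.

import numpy as np
from scipy.spatial.distance import pdist, squareform

dist = squareform(pdist(embeddings))        # (n, n) pairwise Euclidean distances
threshold = 1.0
keep = []
dropped = np.zeros(len(embeddings), dtype=bool)
for i in range(len(embeddings)):
    if dropped[i]:
        continue
    keep.append(i)                           # keep one representative ...
    dropped |= dist[i] < threshold           # ... and drop its near-duplicates
    dropped[i] = False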


3.4. Is there bias in the images?
3.4. Is there bias in the images?
• Price distribution per source
• The dedicated bicycle-seller sites tend to list higher-priced bicycles
• Generally, the larger number of images from Google outweighs this effect
• Bicycles priced < 100 € and > 10000 € are removed
3.4. Is there bias in the images?
• The darkness score is the average greyscale pixel value of an image
• Fahrrad.de shows a deviating distribution, but this is not pursued further
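A minimal sketch of the darkness score as described (mean greyscale pixel value), using Pillow:

import numpy as np
from PIL import Image

def darkness_score(path):
    # Mean greyscale pixel value; 0 = black, 255 = white.
    grey = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    return float(grey.mean())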
3.4. Is there bias in the images?
• Linear correlation between price and simple image statistics is calculated
• As the price distribution looks similar to a log-normal distribution, the log-price is explored as well
• Price is most linearly correlated with the square root of the resolution and with darkness (0.022 and 0.069)
• No large correlation can be determined
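A sketch of the correlation check, assuming 1-D arrays `prices`, `resolutions` (width × height) and `darkness` collected for all images:

import numpy as np

log_price = np.log(prices)            # price distribution looks roughly log-normal
features = {"sqrt(resolution)": np.sqrt(resolutions), "darkness": darkness}
for name, feat in features.items():
    r_lin = np.corrcoef(prices, feat)[0, 1]
    r_log = np.corrcoef(log_price, feat)[0, 1]
    print(f"{name}: r(price)={r_lin:.3f}, r(log price)={r_log:.3f}")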
3.5. Preprocessing
• Images are rescaled and ResNet preprocessing is applied
• Cropping of images (not part of the NB)
• Visualization immediately before training
3.5. Mean Image
• Compute the mean image for different price ranges
• Bicycles are generally right-facing
• The most frequent shape seems to be "mountain bike" (non-horizontal mid-section)
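A sketch of the mean-image computation per price range, assuming the images are already resized to a common shape; `dataset` (pairs of image array and price) and the price bins are hypothetical placeholders.

import numpy as np

def mean_image(images):
    # images: list of (H, W, 3) arrays of equal shape
    return np.mean(np.stack(images).astype(np.float32), axis=0).astype(np.uint8)

price_bins = [(100, 500), (500, 1500), (1500, 10000)]        # hypothetical ranges
mean_per_bin = {b: mean_image([img for img, p in dataset if b[0] <= p < b[1]])
                for b in price_bins}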
3.5. Datasplit
• Random shuffle over all 3 sources:
  • Google
  • Fahrrad.de
  • Fahrrad-xxl.de
• Train = 45000
• Dev = 1000
• Test = 1000 (originally, hyperparameter optimization was intended)
• The split is permanently saved and loaded via a .json file
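A sketch of fixing and persisting such a split; the per-source path lists are assumed variables:

import json
import random

random.seed(0)
paths = google_paths + fahrrad_de_paths + fahrrad_xxl_paths    # assumed path lists
random.shuffle(paths)
split = {"dev": paths[:1000], "test": paths[1000:2000], "train": paths[2000:]}

with open("split.json", "w") as f:
    json.dump(split, f)

with open("split.json") as f:                                  # reload for every run
    split = json.load(f)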
4. Loss function
• The algorithm will minimize the loss function
• The choice of loss function has to match the problem to be solved
• Otherwise unsatisfactory results will be obtained
4. Loss function
• RMSE: sqrt( (1/n) · Σᵢ (yᵢ − ŷᵢ)² )
• MAE: (1/n) · Σᵢ |yᵢ − ŷᵢ|
• MAPE (Mean Absolute Percentage Error): (100/n) · Σᵢ |yᵢ − ŷᵢ| / yᵢ

4. Loss function
• RMSE: the tail (high-priced bicycles) would be over-emphasized
• MAPE ("scaled MAE") is more sensitive to small prices:
  y = 200, ŷ = 300, MAPE = 50%
  y = 2000, ŷ = 2100, MAPE = 5%
• The original problem, reselling for profit:
  - Allows reselling cheap bicycles as well
  - Allows for limited capital investment
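For reference, a minimal MAPE implementation as a tf.keras-style loss (Keras also ships a built-in MeanAbsolutePercentageError loss):

import tensorflow as tf

def mape_loss(y_true, y_pred):
    # Mean absolute percentage error in percent; the epsilon guards against division by zero.
    return 100.0 * tf.reduce_mean(tf.abs(y_true - y_pred) / tf.maximum(tf.abs(y_true), 1e-7))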
5. Model training: Baseline
• The model used is a ResNet-50

Regression head:
• Hidden layer: 1024 fully connected neurons
• Output: 1 ReLU/linear neuron
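A sketch of the described baseline in tf.keras (frozen ResNet50 trunk, 1024-unit hidden layer, single ReLU output); the exact preprocessing and compile settings of the project may differ.

import tensorflow as tf
from tensorflow.keras.applications import ResNet50

trunk = ResNet50(weights="imagenet", include_top=False, pooling="avg")
trunk.trainable = False                      # baseline: only the regression head is trained

model = tf.keras.Sequential([
    trunk,
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(1, activation="relu"),     # predicted price >= 0
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="mean_absolute_percentage_error")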
5. Model training: Baseline
bs: 32
lr: [0.0001, 0.0003, 0.001]
epochs: 15

The baseline model cannot achieve good performance: MAPE = 65%,
i.e. on average a prediction of 400 € or 1600 € for a 1000 € bicycle
5. Model training: Conv5-Model
• The model used is a ResNet-50

Regression head:
• Hidden layer: 1024 neurons
• Output: 1 ReLU/linear neuron
5. Model training: Conv5-Model
bs: 32
lr: 0.0001
epochs: 15

The performance is better: MAPE = 21%,
i.e. on average a prediction of 800 € or 1200 € for a 1000 € bicycle
Possible improvements: hyperparameter tuning, simply using ResNet V2
5. Model training: Conv5-Model
• The model used is a ResNet-50

Retraining:
• The last 6 of the 53 convolutional layers are retrained
Regression head:
• Output: 1 ReLU/linear neuron
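A sketch of unfreezing exactly the last 6 of the 53 convolutional layers of a tf.keras ResNet50 trunk, continuing the baseline sketch above:

import tensorflow as tf

trunk.trainable = True
for layer in trunk.layers:
    layer.trainable = False                                    # freeze everything first
conv_layers = [l for l in trunk.layers
               if isinstance(l, tf.keras.layers.Conv2D)]       # 53 conv layers in ResNet50
for layer in conv_layers[-6:]:
    layer.trainable = True                                     # retrain only the last 6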
5. Model training: Robust Conv5-Model
• The data is augmented by:
• Flip
• Rotation: 2.5 degrees
• MAPE = 22%
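A sketch of the described augmentation as Keras preprocessing layers; the horizontal flip direction and the constant white fill are assumptions.

import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(2.5 / 360.0,    # 2.5 degrees as a fraction of a full turn
                                   fill_mode="constant", fill_value=255.0),
])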

• Comparing MAPE:

             Baseline   Conv5    Conv5 aug.
  Dev          65%       21%       22%
  Aug. Test     -       37.9%     22.5%
5. Recap: “Is there bias in the data?”

• The error is approximately normally distributed
• No significant error difference between the sources
6. Visualization of activations
• Grad-CAM: the gradient of
  1. the predicted output with respect to
  2. the output feature map of
  3. each convolutional layer
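A sketch of this Grad-CAM computation for the single regression output, following the common tf.keras pattern; `model` is assumed to be a functional model that exposes the chosen convolutional layer by name.

import tensorflow as tf

def grad_cam(model, img_array, conv_layer_name):
    # img_array: preprocessed batch of shape (1, H, W, 3)
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        fmap, pred = grad_model(img_array)
        price = pred[:, 0]                        # 1. the predicted output ...
    grads = tape.gradient(price, fmap)            # 2. gradient w.r.t. the output feature map
    weights = tf.reduce_mean(grads, axis=(1, 2))  # 3. channel weights for this conv layer
    cam = tf.reduce_sum(fmap * weights[:, None, None, :], axis=-1)
    return tf.nn.relu(cam)[0].numpy()             # heatmap at feature-map resolution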
6. Visualization of activations
• The algorithm generalizes to unclean images and "non-sideview" images as well
• Grad-CAM is not 100% exact in locality (its resolution is the feature-map size of the respective convolutional layer)
6. Visualization of activations
• The algorithm picks up on some random white background pixelation
-> train-dev-test data leakage
• Only some images are affected
• Only some layers are affected
• The overall impact is not investigated
Thank you for your attention
Are there any questions, feedback or suggestions?
