Transfer Learning

The document discusses techniques for extracting features from deep convolutional neural networks to use for image and object recognition tasks. It compares using features from earlier versus later layers, and evaluates applying features from networks trained on large datasets to new domains and tasks without further training.


Computer Vision; Image Classification; Transfer Learning

YouTube Playlist

Maziar Raissi

Assistant Professor

Department of Applied Mathematics

University of Colorado Boulder

[email protected]
How transferable are features in deep neural networks?
YouTube Playlist

Figure annotations: "selffer" vs. "transfer" networks; random split vs. man-made/natural split; similar tasks A & B vs. different tasks A & B.
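The transfer setup behind these annotations can be sketched in a few lines: copy the first n layers of a network trained on task A into a new network for task B, then either freeze them or fine-tune. Below is an illustrative numpy mock (plain linear layers stand in for the paper's convolutional layers; all names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(sizes):
    """Random weight matrices for a small linear 'network' (bias-free sketch)."""
    return [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def transfer(base_net, target_net, n_frozen):
    """Copy the first n_frozen layers of base_net (task A) into target_net
    (task B); return the new net plus a mask marking the frozen layers."""
    new_net = [w.copy() for w in target_net]
    for i in range(n_frozen):
        new_net[i] = base_net[i].copy()
    frozen = [i < n_frozen for i in range(len(new_net))]
    return new_net, frozen

def sgd_step(net, frozen, grads, lr=0.1):
    """One gradient step that skips frozen layers ('transfer' without
    fine-tuning); unfreezing everything gives the fine-tuned variant."""
    return [w if f else w - lr * g for w, f, g in zip(net, frozen, grads)]

base = init_net([8, 16, 16, 4])    # network trained on task A
target = init_net([8, 16, 16, 4])  # network to be trained on task B
net, frozen = transfer(base, target, n_frozen=2)
grads = [np.ones_like(w) for w in net]
net2 = sgd_step(net, frozen, grads)
```

After the step, the two transferred layers still hold task-A weights while the remaining layers have moved: that is the frozen-transfer condition the figure compares against fine-tuning.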

Yosinski, Jason, et al. "How transferable are features in deep neural networks?" Advances in Neural Information Processing Systems. 2014.
DeCAF: A Deep Convolutional Activation
Feature for Generic Visual Recognition YouTube Playlist

t-SNE feature visualizations: first pooling layer vs. second-to-last hidden layer, webcam (green) & dslr (blue). First layers learn "low-level" features, whereas the later layers learn semantic or "high-level" features.

Computation time

Object recognition (Caltech-101)

Subcategory recognition (Caltech-UCSD bird dataset)

Scene recognition (SUN-397): features trained on ILSVRC-2012 generalized to SUN-397.

Domain adaptation (Office dataset)


The dataset contains three domains: Amazon, which consists of product images taken from
amazon.com; and Webcam and Dslr, which consist of images taken in an office environment
using a webcam or digital SLR camera, respectively.
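The DeCAF recipe is to keep the pretrained network frozen and train only a simple classifier on its late-layer activations. A minimal sketch, with a fixed random projection plus ReLU standing in for the frozen CNN and a nearest-class-mean rule standing in for the linear classifier (all illustrative choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen pretrained CNN: a fixed random projection + ReLU.
# In DeCAF the features would be late-layer activations of an
# ImageNet-trained network; no feature weights are updated on the new task.
W_frozen = rng.normal(size=(2, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)  # "off-the-shelf" activations

def fit_class_means(X, y):
    """Train only a trivial classifier on top: one mean feature per class."""
    return {c: extract_features(X[y == c]).mean(axis=0) for c in np.unique(y)}

def predict(means, x):
    feats = extract_features(x)
    classes = sorted(means)
    d = np.stack([np.linalg.norm(feats - means[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

# Toy training data plus a slightly shifted test copy, loosely mimicking a
# small domain shift such as webcam vs. dslr images of the same objects.
X_train = np.concatenate([rng.normal(-2, 0.3, (50, 2)),
                          rng.normal(2, 0.3, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
X_test = X_train + 0.2
means = fit_class_means(X_train, y_train)
acc = (predict(means, X_test) == y_train).mean()
```

Because the feature extractor is shared and fixed, the classifier trained on one domain still works on the shifted one, which is the qualitative point of the Office-dataset experiment.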

Donahue, Jeff, et al. "DeCAF: A deep convolutional activation feature for generic visual recognition." International Conference on Machine Learning. 2014.
CNN Features off-the-shelf: an
Astounding Baseline for Recognition YouTube Playlist

Pascal VOC 2007 Image Classification

MIT-67 indoor scenes dataset

Object retrieval (L2 distance)
mean accuracy: mean of the confusion matrix diagonal
CNNaug: augment the training set by adding cropped and rotated samples
CUB 200-2011 Bird dataset
H3D Human Attributes dataset
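The CNNaug-style augmentation mentioned above can be sketched as follows; the crop size, the five crop positions, and the use of 90° rotations are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def augment(image, crop=24):
    """Generate cropped and rotated variants of one grayscale image:
    four corner crops plus a center crop, each at four rotations."""
    h, w = image.shape
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               ((h - crop) // 2, (w - crop) // 2)]
    samples = []
    for r, c in offsets:
        patch = image[r:r + crop, c:c + crop]
        for k in range(4):  # rotations by 0, 90, 180, 270 degrees
            samples.append(np.rot90(patch, k))
    return samples

image = np.arange(32 * 32, dtype=float).reshape(32, 32)
batch = augment(image)  # 5 crops x 4 rotations = 20 training samples
```

At test time the same variants can be scored and their predictions averaged, which is how augmentation is typically used with off-the-shelf features.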

Sharif Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
Return of the Devil in the Details:
Delving Deep into Convolutional Nets YouTube Video

I → image
φ → encoding function
φ(I) ∈ R^d → vector image representation

Scenarios compared: shallow representation, IFV (with intra-normalisation), vs. deep representations CNN-F (Fast), CNN-M (Medium), CNN-S (Slow).

Improved Fisher Vector (IFV)
– Extract a dense collection of patches and corresponding local descriptors x_i ∈ R^D (e.g., SIFT) from the image at multiple scales.
– Each descriptor x_i is then soft-quantized (i.e., clustered) using a Gaussian Mixture Model with K components.

u_k := Σ_{x_i : NN(x_i) = μ_k} (x_i − μ_k) ∈ R^D → accumulated first-order differences (NN = nearest neighbor)

v_k := Σ_{x_i : NN(x_i) = μ_k} (x_i − μ_k)² ∈ R^D → accumulated second-order differences

FV(I) = [u_1; v_1; …; u_K; v_K] ∈ R^{2KD}

IFV(I) ← FV(I) followed by signed square-rooting, sign(·)√|·|, and ℓ2 normalization

Pascal VOC (multi-label dataset): I_pos, I_neg → positive, negative images for each class c

One-vs-rest classification hinge loss: w_c^T φ(I_pos) > 1 − ξ, w_c^T φ(I_neg) < −1 + ξ, ξ → slack

Ranking hinge loss: w_c^T φ(I_pos) > w_c^T φ(I_neg) + 1 − ξ

Performance of shallow representations can be significantly improved by adopting data augmentation, typically used in deep learning. In spite of this improvement, deep architectures still outperform the shallow methods by a large margin.

Deep representation (CNN) with pre-training: φ_CNN(I) → vector activities of the penultimate layer

Deep representation (CNN) with pre-training and fine-tuning
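The hard-assignment version of the FV/IFV formulas above can be written directly in numpy. Note this is a simplified sketch: a full IFV weights descriptors by GMM posterior probabilities and normalizes by component covariances and priors, all of which are omitted here.

```python
import numpy as np

def improved_fisher_vector(X, means):
    """Encode descriptors X (N x D) against K component means (K x D):
    per-component first- and second-order difference sums, concatenated,
    then signed square-rooting and l2 normalization."""
    K, D = means.shape
    # hard "soft-quantization": assign each descriptor to its nearest mean
    nn = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2).argmin(axis=1)
    blocks = []
    for k in range(K):
        diff = X[nn == k] - means[k]
        u_k = diff.sum(axis=0) if diff.size else np.zeros(D)        # 1st order
        v_k = (diff ** 2).sum(axis=0) if diff.size else np.zeros(D) # 2nd order
        blocks += [u_k, v_k]
    fv = np.concatenate(blocks)              # FV(I) in R^{2KD}
    fv = np.sign(fv) * np.sqrt(np.abs(fv))   # signed square-rooting
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv     # l2 normalization

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))    # 100 local descriptors, D = 8
means = rng.normal(size=(4, 8))  # K = 4 mixture components
fv = improved_fisher_vector(X, means)
```

The output dimension is 2KD (here 64), which is why FV encodings are much higher-dimensional than the underlying descriptors.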
Chatfield, Ken, et al. "Return of the devil in the details: Delving deep into convolutional nets." arXiv preprint arXiv:1405.3531 (2014).
Learning and Transferring Mid-Level Image
Representations using Convolutional Neural Networks YouTube Video

Transfer: ImageNet → Pascal VOC (limited training data)

"dataset capture bias" and "negative data bias"

Network Architecture: AlexNet
– Input: 224 × 224 RGB image
– Output: distribution over the ImageNet object classes

Label bias: ImageNet → husky dog, australian terrier, etc.; Pascal VOC → dog
Pascal VOC → reading, running, etc.

Sliding-window patches:
– 500 square patches per image, at 8 different scales (the scale sets the width of the square patches)
– at least 50% overlap between neighboring patches
– rescale each patch to 224 × 224

Labeling the patches:
P → bounding box of a patch
B_o → ground-truth bounding box for class o
1. |P ∩ B_o| ≥ 0.2 |P| → B_o overlaps sufficiently with the patch
2. |P ∩ B_o| ≥ 0.6 |B_o| → the patch contains a large portion of the object
3. the patch overlaps with no more than one object
⟹ the patch is labeled as a positive example for class o

Inference:
– output of the network for class C_n on image patch P_i
– M → number of patches in the image
– k ≥ 1 (k = 5) → higher values of k focus on the highest-scoring patches and attenuate the contributions of low- and mid-scoring patches
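The three labeling criteria and the patch-score aggregation can be sketched as follows. The box format, helper names, and the aggregation helper (one plausible reading of the M and k notation: average the per-patch class scores raised to the power k) are illustrative assumptions, not taken verbatim from the paper:

```python
import numpy as np

def area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def intersection(a, b):
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return area((x0, y0, x1, y1))

def is_positive(P, B_o, other_boxes=()):
    """Apply the three labeling criteria to patch P and ground-truth box B_o
    (boxes as (x0, y0, x1, y1) corner coordinates)."""
    inter = intersection(P, B_o)
    overlaps_patch = inter >= 0.2 * area(P)    # criterion 1
    covers_object = inter >= 0.6 * area(B_o)   # criterion 2
    single_object = all(intersection(P, b) == 0 for b in other_boxes)  # criterion 3
    return overlaps_patch and covers_object and single_object

def aggregate_scores(patch_scores, k=5):
    """Combine M per-patch scores for one class: mean of scores^k, so
    larger k emphasizes the highest-scoring patches."""
    y = np.asarray(patch_scores, dtype=float)
    return (y ** k).sum(axis=0) / len(y)

P = (10, 10, 40, 40)  # a 30x30 patch
B = (15, 15, 35, 35)  # a 20x20 ground-truth box fully inside the patch
```

Here the intersection covers 400/900 of the patch and all of the object, so criteria 1 and 2 hold; adding any second overlapping object box violates criterion 3.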

Oquab, Maxime, et al. "Learning and transferring mid-level image representations using convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Questions?
YouTube Playlist
