
Class Segmentation and Object Localization with Superpixel Neighborhoods

Brian Fulkerson¹    Andrea Vedaldi²    Stefano Soatto¹

¹Department of Computer Science, University of California, Los Angeles, CA 90095
²Department of Engineering Science, University of Oxford, UK

{bfulkers,soatto}@cs.ucla.edu    [email protected]
Abstract
We propose a method to identify and localize object
classes in images. Instead of operating at the pixel level,
we advocate the use of superpixels as the basic unit of a
class segmentation or pixel localization scheme. To this
end, we construct a classifier on the histogram of local fea-
tures found in each superpixel. We regularize this clas-
sifier by aggregating histograms in the neighborhood of
each superpixel and then refine our results further by us-
ing the classifier in a conditional random field operating
on the superpixel graph. Our proposed method exceeds
the previously published state-of-the-art on two challeng-
ing datasets: Graz-02 and the PASCAL VOC 2007 Segmen-
tation Challenge.
1. Introduction
Recent success in image-level object categorization has
led to significant interest in the related fronts of localization
and pixel-level categorization. Both areas have seen consid-
erable progress, driven by object detection challenges such
as PASCAL VOC [9]. So far, the most promising techniques
seem to be those that consider each pixel of an image.
For localization, sliding window classifiers [8, 3, 21, 35]
consider a window (or all possible windows) around each
pixel of an image and attempt to find the classification
which best fits the model. Lately, this model often includes
some form of spatial consistency (e.g. [22]). In this way, we
can view sliding window classification as a “top-down” lo-
calization technique which tries to fit a coarse global object
model to each possible location.
In object class segmentation, the goal is to produce a
pixel-level segmentation of the input image. Most ap-
proaches are built from the bottom up on learned local rep-
resentations (e.g. TextonBoost [32]) and can be seen as an
evolution of texture detectors. Because of their rather lo-
cal nature, a conditional random field [20] or some other
model is often introduced to enforce spatial consistency.
For computational reasons, this usually operates on a re-
duced grid of the image, abandoning pixel accuracy in favor
of speed. The current state-of-the-art for the PASCAL VOC
2007 Segmentation Challenge [31] is a scheme which falls
into this category.
Rather than using the pixel grid, we advocate a repre-
sentation adapted to the local structure of the image. We
consider small regions obtained from a conservative over-
segmentation, or “superpixels,” [29, 10, 25] to be the ele-
mentary unit of any detection, categorization or localization
scheme.
On the surface, using superpixels as the elementary units
seems counter-productive: aggregating pixels into groups
entails a decision unrelated to the final task. However,
superpixels capture the local redundancy in the data, and
the grouping can be made conservatively to minimize the
risk of merging unrelated pixels [33]. At the same time, moving
to superpixels allows us to measure feature statistics (in this
case: histograms of visual words) on a naturally adaptive
domain rather than on a fixed window. Since superpixels
tend to preserve boundaries, we also have the opportunity
to create a very accurate segmentation by simply finding
the superpixels which are part of the object.
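The neighborhood-aggregation idea above can be sketched in a few lines. The function and variable names here are illustrative, not the authors' implementation: each superpixel accumulates a histogram over quantized local features ("visual words"), and histograms of adjacent superpixels are then summed to regularize the per-region descriptor.

```python
def superpixel_histograms(words, labels, n_words):
    """Visual-word histogram for each superpixel.

    words  -- quantized feature index per local feature
    labels -- superpixel label per local feature
    """
    hists = {}
    for w, l in zip(words, labels):
        hists.setdefault(l, [0] * n_words)
        hists[l][w] += 1
    return hists

def aggregate_neighborhood(hists, adjacency):
    """Sum each superpixel's histogram with its neighbors'
    (a one-ring neighborhood on the superpixel graph)."""
    agg = {}
    for l, h in hists.items():
        total = list(h)
        for n in adjacency.get(l, ()):
            total = [a + b for a, b in zip(total, hists[n])]
        agg[l] = total
    return agg
```

For instance, with two adjacent superpixels, `aggregate_neighborhood` gives both regions the pooled histogram of the pair, which is what lends the region classifier its robustness.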
We show that by aggregating neighborhoods of superpix-
els we can create a robust region classifier which exceeds
the state-of-the-art on Graz-02 pixel-localization and on the
PASCAL VOC 2007 Segmentation Challenge. Our results
can be further refined by a simple conditional random field
(CRF), proposed in Section 3.4, that operates on superpix-
els.
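To make the role of such a CRF concrete, a typical energy for a superpixel-level model takes the generic form below; this form is illustrative only, and the specific unary and pairwise potentials used are defined in Section 3.4:

```latex
E(c) = \sum_{i \in \mathcal{S}} \psi_i(c_i)
     + \lambda \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(c_i, c_j)
```

Here $\mathcal{S}$ is the set of superpixels, $\mathcal{E}$ the edges of the superpixel adjacency graph, $\psi_i$ a unary term derived from the region classifier, and $\psi_{ij}$ a pairwise smoothness term weighted by $\lambda$.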
2. Related Work
Sliding window classifiers have been well explored for
the task of detecting the location of an object in an image [3,
21, 8, 9]. Most recently, Blaschko et al. [3] have shown
that it is feasible to search all possible sub-windows of an
image for an object using branch and bound and a structured
classifier whose output is a bounding box. However, for our
purposes a bounding box is not an acceptable final output,
even for the task of localization.