3.7 Global optimization
gradient descent (Metropolis, Rosenbluth, Rosenbluth et al. 1953; Geman and Geman 1984).
When the amount of noise is decreased over time, this technique is known as simulated annealing (Kirkpatrick, Gelatt, and Vecchi 1983; Carnevali, Coletti, and Patarnello 1985; Wolberg and Pavlidis 1985; Swendsen and Wang 1987) and was first popularized in computer vision by Geman and Geman (1984) and later applied to stereo matching by Barnard (1989), among others.
Even this technique, however, does not perform that well (Boykov, Veksler, and Zabih 2001). For binary images, a much better technique, introduced to the computer vision community by Boykov, Veksler, and Zabih (2001), is to re-formulate the energy minimization as a max-flow/min-cut graph optimization problem (Greig, Porteous, and Seheult 1989). This technique has informally come to be known as graph cuts in the computer vision community (Boykov and Kolmogorov 2010). For simple energy functions, e.g., those where the penalty for non-identical neighboring pixels is a constant, this algorithm is guaranteed to produce the global minimum. Kolmogorov and Zabih (2004) formally characterize the class of binary energy potentials (regularity conditions) for which these results hold, while newer work by Komodakis, Tziritas, and Paragios (2008) and Rother, Kolmogorov, Lempitsky et al. (2007) provides good algorithms for the cases when they do not.
In addition to the above-mentioned techniques, a number of other optimization approaches have been developed for MRF energy minimization, such as (loopy) belief propagation and dynamic programming (for one-dimensional problems). These are discussed in more detail in Appendix B.5 as well as the comparative survey paper by Szeliski, Zabih, Scharstein et al. (2008).
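The max-flow/min-cut formulation can be made concrete with a small sketch (ours, not code from the book). It builds the Greig–Porteous–Seheult graph for a binary denoising MRF with a constant (Potts) smoothness penalty, for which a minimum cut yields the exact global minimum. A plain Edmonds–Karp max-flow keeps the sketch self-contained; real systems use specialized solvers such as the Boykov–Kolmogorov algorithm.

```python
# Binary MRF denoising solved exactly as a min-cut (Greig-Porteous-Seheult
# construction): source side of the cut = label 0, sink side = label 1.
from collections import defaultdict, deque

def source_side(cap, s, t):
    """Edmonds-Karp max-flow; return the nodes on the source side of a min cut."""
    flow = defaultdict(int)
    while True:
        # BFS for an augmenting path in the residual graph.
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] - flow[(u, v)] for u, v in path)
        for u, v in path:
            flow[(u, v)] += aug
            flow[(v, u)] -= aug
    # Nodes still reachable from s in the residual graph form the 0-label side.
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in seen and cap[u][v] - flow[(u, v)] > 0:
                seen.add(v)
                q.append(v)
    return seen

def denoise_binary(noisy, lam):
    """MAP estimate of a binary image under a Potts MRF prior."""
    h, w = len(noisy), len(noisy[0])
    cap = defaultdict(lambda: defaultdict(int))
    def add_edge(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0          # register the reverse residual edge
    for i in range(h):
        for j in range(w):
            p = (i, j)
            add_edge('s', p, 1 if noisy[i][j] != 1 else 0)  # cut => label 1
            add_edge(p, 't', 1 if noisy[i][j] != 0 else 0)  # cut => label 0
            for q in ((i + 1, j), (i, j + 1)):              # Potts smoothness
                if q[0] < h and q[1] < w:
                    add_edge(p, q, lam)
                    add_edge(q, p, lam)
    zeros = source_side(cap, 's', 't')
    return [[0 if (i, j) in zeros else 1 for j in range(w)] for i in range(h)]

noisy = [[1, 1, 1, 1],
         [1, 0, 1, 1],   # one flipped pixel
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
print(denoise_binary(noisy, lam=1))
```

With lam = 1, relabeling the isolated 0 pixel costs its unit data penalty but saves four smoothness penalties, so the global minimum is the all-ones image; with lam = 0 the data terms dominate and the observation is returned unchanged.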
Ordinal-valued MRFs
In addition to binary images, Markov random fields can be applied to ordinal-valued labels such as grayscale images or depth maps. The term "ordinal" indicates that the labels have an implied ordering, e.g., that higher values are lighter pixels. In the next section, we look at unordered labels, such as source image labels for image compositing.
In many cases, it is common to extend the binary data and smoothness prior terms as

E_d(i, j) = w(i, j) ρ_d(f(i, j) − d(i, j))    (3.112)

and

E_p(i, j) = s_x(i, j) ρ_p(f(i, j) − f(i + 1, j)) + s_y(i, j) ρ_p(f(i, j) − f(i, j + 1)),    (3.113)

which are robust generalizations of the quadratic penalty terms (3.101) and (3.100), first introduced in (3.105). As before, the w(i, j), s_x(i, j), and s_y(i, j) weights can be used to locally control the data weighting and the horizontal and vertical smoothness. Instead of
Computer Vision: Algorithms and Applications (September 3, 2010 draft)
Figure 3.57 Grayscale image denoising and inpainting: (a) original image; (b) image corrupted by noise and with missing data (black bar); (c) image restored using loopy belief propagation; (d) image restored using expansion move graph cuts. Images are from http://vision.middlebury.edu/MRF/results/ (Szeliski, Zabih, Scharstein et al. 2008).
using a quadratic penalty, however, a general monotonically increasing penalty function ρ() is used. (Different functions can be used for the data and smoothness terms.) For example, ρ_p can be a hyper-Laplacian penalty

ρ_p(d) = |d|^p,  p < 1,    (3.114)

which better encodes the distribution of gradients (mainly edges) in an image than either a quadratic or linear (total variation) penalty.^24 Levin and Weiss (2007) use such a penalty to separate a transmitted and reflected image (Figure 8.17) by encouraging gradients to lie in one or the other image, but not both. More recently, Levin, Fergus, Durand et al. (2007) use the hyper-Laplacian as a prior for image deconvolution (deblurring) and Krishnan and Fergus (2009) develop a faster algorithm for solving such problems. For the data penalty, ρ_d can be quadratic (to model Gaussian noise) or the log of a contaminated Gaussian (Appendix B.3).
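As a small illustration (ours, not the book's), the following sketch evaluates the robust energy of (3.112)–(3.113) with a quadratic ρ_d and a hyper-Laplacian ρ_p, using uniform weights and an illustrative exponent p = 0.8:

```python
def mrf_energy(f, d, w=1.0, s=1.0, p=0.8):
    """Robust MRF energy: quadratic data term plus hyper-Laplacian smoothness."""
    h, wid = len(f), len(f[0])
    E = 0.0
    for i in range(h):
        for j in range(wid):
            E += w * (f[i][j] - d[i][j]) ** 2             # rho_d: quadratic
            if i + 1 < h:
                E += s * abs(f[i][j] - f[i + 1][j]) ** p  # rho_p: |d|^p
            if j + 1 < wid:
                E += s * abs(f[i][j] - f[i][j + 1]) ** p
    return E

# Two 1-pixel-high "images" with the same total intensity change:
step = [[0, 0, 2, 2]]   # one sharp edge
ramp = [[0, 1, 1, 2]]   # the same change spread over two unit steps
print(mrf_energy(step, step), mrf_energy(ramp, ramp))            # p = 0.8
print(mrf_energy(step, step, p=2), mrf_energy(ramp, ramp, p=2))  # quadratic
```

For the same total intensity change, the hyper-Laplacian (p < 1) assigns a sharp step less smoothness cost than a gradual ramp, while a quadratic penalty does the opposite; this is the sense in which it better encodes edge-dominated gradient statistics.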
When ρ_p is a quadratic function, the resulting Markov random field is called a Gaussian Markov random field (GMRF) and its minimum can be found by sparse linear system solving (3.103). When the weighting functions are uniform, the GMRF becomes a special case of Wiener filtering (Section 3.4.3). Allowing the weighting functions to depend on the input image (a special kind of conditional random field, which we describe below) enables quite sophisticated image processing algorithms to be performed, including colorization (Levin, Lischinski, and Weiss 2004), interactive tone mapping (Lischinski, Farbman, Uyttendaele et al. 2006a), natural image matting (Levin, Lischinski, and Weiss 2008), and image restoration (Tappen, Liu, Freeman et al. 2007).
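In the GMRF case the normal equations are sparse and, in one dimension, tridiagonal. The sketch below (our own illustration, not code from the book) minimizes the quadratic energy Σ (f_i − d_i)² + λ Σ (f_{i+1} − f_i)² by solving the resulting system (I + λL) f = d with the Thomas algorithm, where L is the 1-D graph Laplacian:

```python
def gmrf_denoise_1d(d, lam):
    """Minimize sum (f-d)^2 + lam * sum (f[i+1]-f[i])^2 exactly."""
    n = len(d)
    # Tridiagonal system: a = sub-diagonal, b = diagonal, c = super-diagonal.
    a = [-lam] * n
    c = [-lam] * n
    b = [1 + 2 * lam] * n
    b[0] = b[-1] = 1 + lam          # boundary pixels have one neighbor
    rhs = list(d)
    # Forward elimination (Thomas algorithm).
    for i in range(1, n):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    f = [0.0] * n
    f[-1] = rhs[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        f[i] = (rhs[i] - c[i] * f[i + 1]) / b[i]
    return f

d = [0.0, 0.0, 5.0, 0.0, 0.0]        # an impulse of "noise"
f = gmrf_denoise_1d(d, lam=1.0)
print([round(x, 3) for x in f])      # symmetric, attenuated response
```

Because the rows of λL sum to zero, the solution preserves the mean of the input while spreading the impulse out, which is exactly the smoothing behavior of a quadratic prior.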
^24 Note that, unlike a quadratic penalty, the sum of the horizontal and vertical derivative p-norms is not rotationally invariant. A better approach may be to locally estimate the gradient direction and to impose different norms on the perpendicular and parallel components, which Roth and Black (2007b) call a steerable random field.
Figure 3.58 Multi-level graph optimization from (Boykov, Veksler, and Zabih 2001) © 2001 IEEE: (a) initial problem configuration; (b) the standard move only changes one pixel; (c) the α-β-swap optimally exchanges all α- and β-labeled pixels; (d) the α-expansion move optimally selects among current pixel values and the α label.
When ρ_d or ρ_p are non-quadratic functions, gradient descent techniques such as non-linear least squares or iteratively re-weighted least squares can sometimes be used (Appendix A.3). However, if the search space has lots of local minima, as is the case for stereo matching (Barnard 1989; Boykov, Veksler, and Zabih 2001), more sophisticated techniques are required.
The extension of graph cut techniques to multi-valued problems was first proposed by Boykov, Veksler, and Zabih (2001). In their paper, they develop two different algorithms, called the swap move and the expansion move, which iterate among a series of binary labeling sub-problems to find a good solution (Figure 3.58). Note that a global solution is generally not achievable, as the problem is provably NP-hard for general energy functions. Because both these algorithms use a binary MRF optimization inside their inner loop, they are subject to the kind of constraints on the energy functions that occur in the binary labeling case (Kolmogorov and Zabih 2004). Appendix B.5.4 discusses these algorithms in more detail, along with some more recently developed approaches to this problem.
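To make the expansion move concrete, here is a toy sketch (not the authors' implementation): the outer loop visits each label α and solves the binary "keep the current label vs. switch to α" sub-problem. Where the real algorithm solves each sub-problem with a graph cut, this sketch brute-forces it, which is feasible only at toy sizes.

```python
from itertools import product

def energy(labels, data_cost, lam):
    """Potts MRF energy: unary data costs plus lam per unequal neighbor pair."""
    h, w = len(labels), len(labels[0])
    E = sum(data_cost[i][j][labels[i][j]] for i in range(h) for j in range(w))
    for i in range(h):
        for j in range(w):
            if i + 1 < h and labels[i][j] != labels[i + 1][j]:
                E += lam
            if j + 1 < w and labels[i][j] != labels[i][j + 1]:
                E += lam
    return E

def alpha_expansion(data_cost, num_labels, lam, sweeps=2):
    h, w = len(data_cost), len(data_cost[0])
    # Initialize each pixel with its cheapest data label.
    labels = [[min(range(num_labels), key=lambda l: data_cost[i][j][l])
               for j in range(w)] for i in range(h)]
    pixels = [(i, j) for i in range(h) for j in range(w)]
    for _ in range(sweeps):
        for alpha in range(num_labels):
            # Binary sub-problem: each pixel keeps its label or switches to alpha.
            best, best_E = labels, energy(labels, data_cost, lam)
            for mask in product([False, True], repeat=len(pixels)):
                cand = [row[:] for row in labels]
                for (i, j), expand in zip(pixels, mask):
                    if expand:
                        cand[i][j] = alpha
                E = energy(cand, data_cost, lam)
                if E < best_E:
                    best, best_E = cand, E
            labels = best
    return labels

obs = [[0, 0, 0],
       [0, 1, 0]]   # one "noisy" pixel amid label 0
data_cost = [[[0 if l == obs[i][j] else 1 for l in range(3)]
              for j in range(3)] for i in range(2)]
print(alpha_expansion(data_cost, num_labels=3, lam=1.0))
```

During the α = 0 expansion, switching the isolated label-1 pixel costs one unit of data penalty but removes three smoothness penalties, so the move is accepted and the energy never increases from sweep to sweep.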
Another MRF inference technique is belief propagation (BP). While belief propagation was originally developed for inference over trees, where it is exact (Pearl 1988), it has more recently been applied to graphs with loops such as Markov random fields (Freeman, Pasztor, and Carmichael 2000; Yedidia, Freeman, and Weiss 2001). In fact, some of the better performing stereo-matching algorithms use loopy belief propagation (LBP) to perform their inference (Sun, Zheng, and Shum 2003). LBP is discussed in more detail in Appendix B.5.3 as well as the comparative survey paper on MRF optimization (Szeliski, Zabih, Scharstein et al. 2008).
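A minimal min-sum (energy-based) LBP sketch for a grid MRF with a Potts smoothness term is given below. This is our own illustration with synchronous message updates; on loopy graphs convergence is not guaranteed, but a few iterations suffice on toy inputs.

```python
def lbp_minsum(data_cost, num_labels, lam, iters=10):
    """Min-sum loopy BP on a 4-connected grid with a Potts pairwise term."""
    h, w = len(data_cost), len(data_cost[0])
    def neighbors(i, j):
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= i + di < h and 0 <= j + dj < w:
                yield (i + di, j + dj)
    # msgs[(p, q)][l]: message from pixel p to neighbor q about q taking label l.
    msgs = {((i, j), q): [0.0] * num_labels
            for i in range(h) for j in range(w) for q in neighbors(i, j)}
    for _ in range(iters):
        new = {}
        for (p, q) in msgs:
            i, j = p
            # Cost of p taking label lp, excluding the message coming from q.
            cost = [data_cost[i][j][lp]
                    + sum(msgs[(r, p)][lp] for r in neighbors(i, j) if r != q)
                    for lp in range(num_labels)]
            base = min(cost) + lam               # best case when labels differ
            m = [min(cost[l], base) for l in range(num_labels)]
            lo = min(m)
            new[(p, q)] = [v - lo for v in m]    # normalize to avoid drift
        msgs = new
    # Belief = data cost plus all incoming messages; take the per-pixel minimum.
    return [[min(range(num_labels),
                 key=lambda l: data_cost[i][j][l]
                 + sum(msgs[(r, (i, j))][l] for r in neighbors(i, j)))
             for j in range(w)] for i in range(h)]

obs = [[0, 0, 0],
       [0, 1, 0]]
data_cost = [[[0 if l == obs[i][j] else 1 for l in range(2)]
              for j in range(3)] for i in range(2)]
print(lbp_minsum(data_cost, num_labels=2, lam=1.0))
```

The Potts form allows each outgoing message to be computed in O(L) time via the `min(cost[l], base)` trick rather than the naive O(L²) minimization over label pairs.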
Figure 3.57 shows an example of image denoising and inpainting (hole filling) using a non-quadratic energy function (non-Gaussian MRF). The original image has been corrupted by noise and a portion of the data has been removed (the black bar). In this case, the loopy belief propagation algorithm computes a slightly lower energy and also a smoother image than the alpha-expansion graph cut algorithm.

Figure 3.59 Graphical model for a Markov random field with a more complex measurement model. The additional colored edges show how combinations of unknown values (say, in a sharp image) produce the measured values (a noisy blurred image). The resulting graphical model is still a classic MRF and is just as easy to sample from, but some inference algorithms (e.g., those based on graph cuts) may not be applicable because of the increased network complexity, since state changes during the inference become more entangled and the posterior MRF has much larger cliques.
Of course, the above formula (3.113) for the smoothness term E_p(i, j) just shows the simplest case. In more recent work, Roth and Black (2009) propose a Field of Experts (FoE) model, which sums up a large number of exponentiated local filter outputs to arrive at the smoothness penalty. Weiss and Freeman (2007) analyze this approach and compare it to the simpler hyper-Laplacian model of natural image statistics. Lyu and Simoncelli (2009) use Gaussian Scale Mixtures (GSMs) to construct an inhomogeneous multi-scale MRF, with one (positive exponential) GMRF modulating the variance (amplitude) of another Gaussian MRF.
It is also possible to extend the measurement model to make the sampled (noise-corrupted) input pixels correspond to blends of unknown (latent) image pixels, as in Figure 3.59. This is the commonly occurring case when trying to de-blur an image. While this kind of a model is still a traditional generative Markov random field, finding an optimal solution can be difficult because the clique sizes get larger. In such situations, gradient descent techniques, such as iteratively reweighted least squares, can be used (Joshi, Zitnick, Szeliski et al. 2009). Exercise 3.31 has you explore some of these issues.
Figure 3.60 An unordered label MRF (Agarwala, Dontcheva, Agrawala et al. 2004) © 2004 ACM: Strokes in each of the source images on the left are used as constraints on an MRF optimization, which is solved using graph cuts. The resulting multi-valued label field is shown as a color overlay in the middle image, and the final composite is shown on the right.
Unordered labels
Another case with multi-valued labels where Markov random fields are often applied is unordered labels, i.e., labels where there is no semantic meaning to the numerical difference between the values of two labels. For example, if we are classifying terrain from aerial imagery, it makes no sense to take the numeric difference between the labels assigned to forest, field, water, and pavement. In fact, the adjacencies of these various kinds of terrain each have different likelihoods, so it makes more sense to use a prior of the form
E_p(i, j) = s_x(i, j) V(l(i, j), l(i + 1, j)) + s_y(i, j) V(l(i, j), l(i, j + 1)),    (3.115)

where V(l_0, l_1) is a general compatibility or potential function. (Note that we have also replaced f(i, j) with l(i, j) to make it clearer that these are labels rather than discrete function samples.) An alternative way to write this prior energy (Boykov, Veksler, and Zabih 2001; Szeliski, Zabih, Scharstein et al. 2008) is
E_p = Σ_{(p,q) ∈ N} V_{p,q}(l_p, l_q),    (3.116)

where the (p, q) are neighboring pixels and a spatially varying potential function V_{p,q} is evaluated for each neighboring pair.
An important application of unordered MRF labeling is seam finding in image compositing (Davis 1998; Agarwala, Dontcheva, Agrawala et al. 2004) (see Figure 3.60, which is explained in more detail in Section 9.3.2). Here, the compatibility V_{p,q}(l_p, l_q) measures the quality of the visual appearance that would result from placing a pixel p from image l_p next to a pixel q from image l_q. As with most MRFs, we assume that V_{p,q}(l, l) = 0, i.e., it is perfectly fine to choose contiguous pixels from the same image. For different labels, however,
Figure 3.61 Image segmentation (Boykov and Funka-Lea 2006) © 2006 Springer: The user draws a few red strokes in the foreground object and a few blue ones in the background. The system computes color distributions for the foreground and background and solves a binary MRF. The smoothness weights are modulated by the intensity gradients (edges), which makes this a conditional random field (CRF).
the compatibility V_{p,q}(l_p, l_q) may depend on the values of the underlying pixels I_{l_p}(p) and I_{l_q}(q).

Consider, for example, where one image I_0 is all sky blue, i.e., I_0(p) = I_0(q) = B, while the other image I_1 has a transition from sky blue, I_1(p) = B, to forest green, I_1(q) = G. In this case, V_{p,q}(1, 0) = 0 (the colors agree), while V_{p,q}(0, 1) > 0 (the colors disagree).
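This worked example can be checked numerically. In the sketch below (ours, not the book's), V is given directly by the table of values just stated, applied, for illustration, to every horizontally adjacent pair in a one-dimensional strip of labels; the penalty of 110 for the (0, 1) pairing is an arbitrary positive number standing in for the blue-green color difference.

```python
# V values mirroring the example in the text: zero for same-label pairs
# and for the (1,0) pairing (seam colors agree); a positive toy penalty
# for the (0,1) pairing (seam colors disagree).
V = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 0.0, (0, 1): 110.0}

def prior_energy(labels, V):
    """E_p of (3.116) summed over neighboring pairs in a 1-D label strip."""
    return sum(V[(labels[k], labels[k + 1])] for k in range(len(labels) - 1))

good = [1, 1, 1, 0, 0, 0]   # switch from image 1 to image 0: invisible seam
bad  = [0, 0, 0, 1, 1, 1]   # switch from image 0 to image 1: visible seam
print(prior_energy(good, V), prior_energy(bad, V))
```

A graph cut minimizing (3.116) with this V would therefore place the seam at the (1, 0) transition, where the composite shows no visible color discontinuity.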
Conditional random fields
In a classic Bayesian model (3.106–3.108),

p(x|y) ∝ p(y|x) p(x),    (3.117)

the prior distribution p(x) is independent of the observations y. Sometimes, however, it is useful to modify our prior assumptions, say about the smoothness of the field we are trying to estimate, in response to the sensed data. Whether this makes sense from a probability viewpoint is something we discuss once we have explained the new model.

Consider the interactive image segmentation problem shown in Figure 3.61 (Boykov and Funka-Lea 2006). In this application, the user draws foreground (red) and background (blue) strokes, and the system then solves a binary MRF labeling problem to estimate the extent of the foreground object. In addition to minimizing a data term, which measures the pointwise similarity between pixel colors and the inferred region distributions (Section 5.5), the MRF