13.3 Light fields and Lumigraphs
While a light field can be used to render a complex 3D scene from novel viewpoints, a much better rendering (with less ghosting) can be obtained if something is known about its 3D geometry. The Lumigraph system of Gortler, Grzeszczuk, Szeliski et al. (1996) extends the basic light field rendering approach by taking into account the 3D location of surface points corresponding to each 3D ray.
Consider the ray (s, u) corresponding to the dashed line in Figure 13.8, which intersects the object’s surface at a distance z from the uv plane. When we look up the pixel’s color in camera s_i (assuming that the light field is discretely sampled on a regular 4D (s, t, u, v) grid), the actual pixel coordinate is u′, instead of the original u value specified by the (s, u) ray. Similarly, for camera s_{i+1} (where s_i ≤ s ≤ s_{i+1}), pixel address u″ is used. Thus, instead of using quadri-linear interpolation of the nearest sampled (s, t, u, v) values around a given ray to determine its color, the (u, v) values are modified for each discrete (s_i, t_i) camera.
Figure 13.8 also shows the same reasoning in ray space. Here, the original continuous-valued (s, u) ray is represented by a triangle and the nearby sampled discrete values are shown as circles. Instead of just blending the four nearest samples, as would be indicated by the vertical and horizontal dashed lines, the modified (s_i, u′) and (s_{i+1}, u″) values are sampled instead and their values are then blended.
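To make the depth correction concrete, here is a minimal Python sketch of a depth-corrected Lumigraph lookup in the two-plane parameterization. The plane separation d, the unit camera spacing implied by the bilinear weights, and all of the names are illustrative assumptions rather than the notation of Gortler, Grzeszczuk, Szeliski et al. (1996).

```python
import numpy as np

def depth_corrected_uv(s, t, u, v, s_i, t_i, z, d=1.0):
    # Shift the (u, v) coordinate of the continuous ray (s, t, u, v) so that
    # discrete camera (s_i, t_i) samples the same surface point, which lies at
    # distance z beyond the uv plane; d is the st-to-uv plane separation.
    scale = z / (d + z)
    return u + (s_i - s) * scale, v + (t_i - t) * scale

def lumigraph_lookup(s, t, u, v, z, cameras):
    # cameras: list of ((s_i, t_i), image) pairs for the nearest discrete
    # (s_i, t_i) samples, assumed to lie on a unit-spaced grid.
    color, total_w = np.zeros(3), 0.0
    for (s_i, t_i), image in cameras:
        u_p, v_p = depth_corrected_uv(s, t, u, v, s_i, t_i, z)
        w = max(0.0, 1.0 - abs(s - s_i)) * max(0.0, 1.0 - abs(t - t_i))  # bilinear camera weight
        row = int(np.clip(round(v_p), 0, image.shape[0] - 1))
        col = int(np.clip(round(u_p), 0, image.shape[1] - 1))
        color += w * image[row, col]
        total_w += w
    return color / max(total_w, 1e-8)
```

A full implementation would also interpolate bilinearly within each image rather than rounding to the nearest pixel.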
The resulting rendering system produces images of much better quality than a proxy-free light field and is the method of choice whenever 3D geometry can be inferred. In subsequent work, Isaksen, McMillan, and Gortler (2000) show how a planar proxy for the scene, which is a simpler 3D model, can be used to simplify the resampling equations. They also describe how to create synthetic aperture photos, which mimic what might be seen by a wide-aperture lens, by blending more nearby samples (Levoy and Hanrahan 1996). A similar approach can be used to re-focus images taken with a plenoptic (microlens array) camera (Ng, Levoy, Brédif et al. 2005; Ng 2005) or a light field microscope (Levoy, Ng, Adams et al. 2006). It can also be used to see through obstacles, using extremely large synthetic apertures focused on a background that can blur out foreground objects and make them appear translucent (Wilburn, Joshi, Vaish et al. 2005; Vaish, Szeliski, Zitnick et al. 2006).
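The synthetic aperture idea reduces to a shift-and-add average: each input view is translated so that points on the chosen focal plane coincide, and everything off that plane is blurred away. The sketch below assumes a planar camera array, a fronto-parallel focal plane, and integer pixel shifts; the function and parameter names are illustrative.

```python
import numpy as np

def synthetic_aperture(images, camera_offsets, disparity):
    # images: list of H x W x 3 arrays from a planar camera array.
    # camera_offsets: (dx, dy) position of each camera relative to a reference view.
    # disparity: pixel shift per unit baseline for the desired focal plane.
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (dx, dy) in zip(images, camera_offsets):
        # shift so that scene points at the focal depth coincide across views;
        # integer shifts only -- a real system would resample with sub-pixel accuracy
        shift = (int(round(dy * disparity)), int(round(dx * disparity)))
        acc += np.roll(img, shift, axis=(0, 1))
    return acc / len(images)   # off-plane content is averaged away (blurred)
```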
Now that we understand how to render new images from a light field, how do we go about capturing such data sets? One answer is to move a calibrated camera with a motion control rig or gantry.6 Another approach is to take handheld photographs and to determine the pose and intrinsic calibration of each image using either a calibrated stage or structure from motion. In this case, the images need to be rebinned into a regular 4D (s, t, u, v) space before they can be used for rendering (Gortler, Grzeszczuk, Szeliski et al. 1996). Alternatively, the original images can be used directly using a process called the unstructured Lumigraph, which we describe below.

6 See http://lightfield.stanford.edu/acq.html for a description of some of the gantries and camera arrays built at the Stanford Computer Graphics Laboratory. This Web site also provides a number of light field data sets that are a great source of research and project material.
Because of the large number of images involved, light fields and Lumigraphs can be quite voluminous to store and transmit. Fortunately, as you can tell from Figure 13.7b, there is a tremendous amount of redundancy (coherence) in a light field, which can be made even more explicit by first computing a 3D model, as in the Lumigraph. A number of techniques have been developed to compress and progressively transmit such representations (Gortler, Grzeszczuk, Szeliski et al. 1996; Levoy and Hanrahan 1996; Rademacher and Bishop 1998; Magnor and Girod 2000; Wood, Azuma, Aldinger et al. 2000; Shum, Kang, and Chan 2003; Magnor, Ramanathan, and Girod 2003; Shum, Chan, and Kang 2007).
13.3.1 Unstructured Lumigraph
When the images in a Lumigraph are acquired in an unstructured (irregular) manner, it can be counterproductive to resample the resulting light rays into a regularly binned (s, t, u, v) data structure. This is both because resampling always introduces a certain amount of aliasing and because the resulting gridded light field can be populated very sparsely or irregularly.
The alternative is to render directly from the acquired images, by finding for each light ray in a virtual camera the closest pixels in the original images. The unstructured Lumigraph rendering (ULR) system of Buehler, Bosse, McMillan et al. (2001) describes how to select such pixels by combining a number of fidelity criteria, including epipole consistency (distance of rays to a source camera’s center), angular deviation (similar incidence direction on the surface), resolution (similar sampling density along the surface), continuity (to nearby pixels), and consistency (along the ray). These criteria can all be combined to determine a weighting function between each virtual camera’s pixel and a number of candidate input cameras from which it can draw colors. To make the algorithm more efficient, the computations are performed by discretizing the virtual camera’s image plane using a regular grid overlaid with the polyhedral object mesh model and the input camera centers of projection and interpolating the weighting functions between vertices.
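A heavily simplified sketch of this per-ray weighting appears below. It keeps only the angular-deviation penalty and the idea that weights fall to zero at the k-th best camera, so cameras fade in and out smoothly; the resolution, field-of-view, and consistency terms of the actual ULR system, as well as the vertex-based interpolation, are omitted, and all names are assumptions.

```python
import numpy as np

def ulr_weights(surface_point, virtual_center, input_centers, k=4, eps=1e-8):
    # Blending weights for one virtual-camera ray that hits `surface_point`,
    # seen from `virtual_center`, given a list of input camera centers.
    d_virtual = virtual_center - surface_point
    d_virtual = d_virtual / (np.linalg.norm(d_virtual) + eps)

    penalties = []
    for c in input_centers:
        d_input = c - surface_point
        d_input = d_input / (np.linalg.norm(d_input) + eps)
        # angular deviation between the virtual ray and this camera's ray
        penalties.append(np.arccos(np.clip(np.dot(d_virtual, d_input), -1.0, 1.0)))
    penalties = np.array(penalties)

    order = np.argsort(penalties)
    thresh = penalties[order[k]] if len(penalties) > k else penalties.max() + eps
    w = np.maximum(0.0, 1.0 - penalties / (thresh + eps))  # zero at the k-th best penalty
    w = w / (w.sum() + eps)
    return w   # one blending weight per input camera
```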
The unstructured Lumigraph generalizes previous work in both image-based rendering and light field rendering. When the input cameras are gridded, the ULR behaves the same way as regular Lumigraph rendering. When fewer cameras are available but the geometry is accurate, the algorithm behaves similarly to view-dependent texture mapping (Section 13.1.1).
13.3.2 Surface light fields
Of course, using a two-plane parameterization for a light field is not the only possible choice. (It is the one usually presented first since the projection equations and visualizations are the easiest to draw and understand.) As we mentioned on the topic of light field compression, if we know the 3D shape of the object or scene whose light field is being modeled, we can effectively compress the field because nearby rays emanating from nearby surface elements have similar color values.

Figure 13.9 Surface light fields (Wood, Azuma, Aldinger et al. 2000) © 2000 ACM: (a) example of a highly specular object with strong inter-reflections; (b) the surface light field stores the light emanating from each surface point in all visible directions as a “Lumisphere”.
In fact, if the object is totally diffuse, ignoring occlusions, which can be handled using
3D graphics algorithms or z-buffering, all rays passing through a given surface point will
have the same color value. Hence, the light field “collapses” to the usual 2D texture-map
defined over an object’s surface. Conversely, if the surface is totally specular (e.g., mirrored), each surface point reflects a miniature copy of the environment surrounding that point. In the absence of inter-reflections (e.g., a convex object in a large open space), each surface point simply reflects the far-field environment map (Section 2.2.1), which again is two-dimensional. Therefore, it seems that re-parameterizing the 4D light field to lie on the object’s surface can be extremely beneficial.
These observations underlie the surface light field representation introduced by Wood, Azuma, Aldinger et al. (2000). In their system, an accurate 3D model is built of the object being represented. Then the Lumisphere of all rays emanating from each surface point is estimated or captured (Figure 13.9). Nearby Lumispheres will be highly correlated and hence amenable to both compression and manipulation.
To estimate the diffuse component of each Lumisphere, a median filtering over all visible exiting directions is first performed for each channel. Once this has been subtracted from the Lumisphere, the remaining values, which should consist mostly of the specular components, are reflected around the local surface normal (2.89), which turns each Lumisphere into a copy of the local environment around that point. Nearby Lumispheres can then be compressed using predictive coding, vector quantization, or principal component analysis.
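A minimal sketch of this diffuse/specular split for a single Lumisphere might look as follows; the data layout (matched lists of exiting directions and colors) and the plain per-channel median are assumptions, not the exact pipeline of Wood, Azuma, Aldinger et al. (2000).

```python
import numpy as np

def split_lumisphere(directions, colors, normal):
    # directions: N x 3 unit exiting directions visible from one surface point
    # colors:     N x 3 radiance observed along those directions
    # normal:     3-vector surface normal at that point
    diffuse = np.median(colors, axis=0)       # per-channel median over directions
    specular = colors - diffuse               # residual, mostly specular

    # reflect each exiting direction about the local surface normal, cf. (2.89),
    # so that nearby points' specular Lumispheres align with the shared environment
    reflected = 2.0 * np.outer(directions @ normal, normal) - directions
    return diffuse, reflected, specular
```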
The decomposition into a diffuse and specular component can also be used to perform editing or manipulation operations, such as re-painting the surface, changing the specular component of the reflection (e.g., by blurring or sharpening the specular Lumispheres), or even geometrically deforming the object while preserving detailed surface appearance.
13.3.3 Application: Concentric mosaics
A useful and simple version of light field rendering is a panoramic image with parallax, i.e., a video or series of photographs taken from a camera swinging in front of some rotation point. Such panoramas can be captured by placing a camera on a boom on a tripod, or even more simply, by holding a camera at arm’s length while rotating your body around a fixed axis.
The resulting set of images can be thought of as a concentric mosaic (Shum and He 1999; Shum, Wang, Chai et al. 2002) or a layered depth panorama (Zheng, Kang, Cohen et al. 2007). The term “concentric mosaic” comes from a particular structure that can be used to re-bin all of the sampled rays, essentially associating each column of pixels with the “radius” of the concentric circle to which it is tangent (Shum and He 1999; Peleg, Ben-Ezra, and Pritch 2001).
Rendering from such data structures is fast and straightforward. If we assume that the scene is far enough away, for any virtual camera location, we can associate each column of pixels in the virtual camera with the nearest column of pixels in the input image set. (For a regularly captured set of images, this computation can be performed analytically.) If we have some rough knowledge of the depth of such pixels, columns can be stretched vertically to compensate for the change in depth between the two cameras. If we have an even more detailed depth map (Peleg, Ben-Ezra, and Pritch 2001; Li, Shum, Tang et al. 2004; Zheng, Kang, Cohen et al. 2007), we can perform pixel-by-pixel depth corrections.
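A bare-bones column-copying renderer can be sketched as follows, assuming an outward-looking camera swung on a circle of radius R and a distant scene (so no vertical scaling or per-pixel depth correction is applied); the rig geometry and every name here are illustrative assumptions.

```python
import numpy as np

def render_concentric_mosaic(images, cam_angles, fov, out_width,
                             virtual_pos, virtual_dir, R=1.0):
    # images[i] was taken from angle cam_angles[i] on a circle of radius R,
    # looking radially outward with horizontal field of view `fov` (radians).
    height, cols_in = images[0].shape[:2]
    out = np.zeros((height, out_width, 3), dtype=images[0].dtype)
    p = np.asarray(virtual_pos, dtype=float)
    cam = np.asarray(cam_angles, dtype=float)

    for x in range(out_width):
        ray_ang = virtual_dir + fov * (x / (out_width - 1) - 0.5)
        d = np.array([np.cos(ray_ang), np.sin(ray_ang)])

        # forward intersection of the virtual ray with the capture circle |p + t d| = R
        b, c = p @ d, p @ p - R * R
        t = -b + np.sqrt(max(b * b - c, 0.0))
        q = p + t * d

        # nearest captured camera at that crossing, then the column within it
        # whose ray direction matches this virtual ray (angles wrapped to [-pi, pi])
        hit_ang = np.arctan2(q[1], q[0])
        i = int(np.argmin(np.abs(np.angle(np.exp(1j * (hit_ang - cam))))))
        delta = float(np.angle(np.exp(1j * (ray_ang - cam[i]))))
        col = int(np.clip((delta / fov + 0.5) * (cols_in - 1), 0, cols_in - 1))
        out[:, x] = images[i][:, col]
    return out
```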
While the virtual camera’s motion is constrained to lie in the plane of the original cameras and within the radius of the original capture ring, the resulting experience can exhibit complex rendering phenomena, such as reflections and translucencies, which cannot be captured using a texture-mapped 3D model of the world. Exercise 13.10 has you construct a concentric mosaic rendering system from a series of hand-held photos or video.
13.4 Environment mattes
So far in this chapter, we have dealt with view interpolation and light fields, which are tech-
niques for modeling and rendering complex static scenes seen from different viewpoints.
What if instead of moving around a virtual camera, we take a complex, refractive object,
such as the water goblet shown in Figure 13.10, and place it in front of a new background?
Figure 13.10 Environment mattes: (a–b) a refractive object can be placed in front of a series of backgrounds and their light patterns will be correctly refracted (Zongker, Werner, Curless et al. 1999); (c) multiple refractions can be handled using a mixture of Gaussians model; and (d) real-time mattes can be pulled using a single graded colored background (Chuang, Zongker, Hindorff et al. 2000) © 2000 ACM.
Instead of modeling the 4D space of rays emanating from a scene, we now need to model how each pixel in our view of this object refracts incident light coming from its environment.

What is the intrinsic dimensionality of such a representation and how do we go about capturing it? Let us assume that if we trace a light ray from the camera at pixel (x, y) toward the object, it is reflected or refracted back out toward its environment at an angle (θ, φ). If we assume that other objects and illuminants are sufficiently distant (the same assumption we made for surface light fields in Section 13.3.2), this 4D mapping (x, y) → (θ, φ) captures all the information between a refractive object and its environment. Zongker, Werner, Curless et al. (1999) call such a representation an environment matte, since it generalizes the process of object matting (Section 10.4) to not only cut and paste an object from one image into another but also take into account the subtle refractive or reflective interplay between the object and its environment.
Recall from Equations (3.8) and (10.30) that a foreground object can be represented by its premultiplied colors and opacities (αF, α). Such a matte can then be composited onto a new background B using

    C_i = \alpha_i F_i + (1 - \alpha_i) B_i,                              (13.1)
where i is the pixel under consideration. In environment matting, we augment this equation with a reflective or refractive term to model indirect light paths between the environment and the camera. In the original work of Zongker, Werner, Curless et al. (1999), this indirect component I_i is modeled as

    I_i = R_i \int A_i(x) B(x) \, dx,                                     (13.2)

where A_i is the rectangular area of support for that pixel, R_i is the colored reflectance or
transmittance (for colored glossy surfaces or glass), and B(x) is the background (environment) image, which is integrated over the area A_i(x). In follow-on work, Chuang, Zongker, Hindorff et al. (2000) use a superposition of oriented Gaussians,

    I_i = \sum_j R_{ij} \int G_{ij}(x) B(x) \, dx,                        (13.3)
where each 2D Gaussian

    G_{ij}(x) = G_{2D}(x; c_{ij}, \sigma_{ij}, \theta_{ij})               (13.4)

is modeled by its center c_ij, unrotated widths σ_ij = (σ^x_ij, σ^y_ij), and orientation θ_ij.
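As a concrete illustration, the sketch below composites a box-based environment matte, i.e., (13.1) plus (13.2), onto a new background; replacing the box integral with the sum of Gaussian-weighted integrals in (13.3) is a direct extension. The per-pixel data layout and the area-normalized box average are assumptions rather than the exact conventions of Zongker, Werner, Curless et al. (1999).

```python
import numpy as np

def composite_environment_matte(F, alpha, R, boxes, B):
    # F:     H x W x 3 foreground colors,   alpha: H x W opacities
    # R:     H x W x 3 colored reflectance/transmittance per pixel
    # boxes: H x W x 4 integer (x0, x1, y0, y1) area of support A_i in B
    # B:     new background image, H x W x 3 floats in [0, 1]
    H, W = alpha.shape
    out = np.zeros((H, W, 3), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            x0, x1, y0, y1 = boxes[y, x]
            patch = B[y0:y1, x0:x1].reshape(-1, 3)
            # box-averaged background seen through this pixel, cf. (13.2)
            indirect = R[y, x] * patch.mean(axis=0) if patch.size else 0.0
            out[y, x] = alpha[y, x] * F[y, x] + (1 - alpha[y, x]) * B[y, x] + indirect
    return out
```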
Given a representation for an environment matte, how can we go about estimating it for a particular object? The trick is to place the object in front of a monitor (or surrounded by a set of monitors), where we can change the illumination patterns B(x) and observe the value of each composite pixel C_i.7

7 If we relax the assumption that the environment is distant, the monitor can be placed at several depths to estimate a depth-dependent mapping function (Zongker, Werner, Curless et al. 1999).
As with traditional two-screen matting (Section 10.4.1), we can use a variety of solid colored backgrounds to estimate each pixel’s foreground color α_i F_i and partial coverage (opacity) α_i. To estimate the area of support A_i in (13.2), Zongker, Werner, Curless et al. (1999) use a series of periodic horizontal and vertical solid stripes at different frequencies and phases, which is reminiscent of the structured light patterns used in active rangefinding (Section 12.2). For the more sophisticated mixture of Gaussians model (13.3), Chuang, Zongker, Hindorff et al. (2000) sweep a series of narrow Gaussian stripes at four different orientations (horizontal, vertical, and two diagonals), which enables them to estimate multiple oriented Gaussian responses at each pixel.
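For intuition, backdrop patterns of this general kind can be generated in a few lines; the sinusoidal profile, frequency-doubling schedule, and phase choices below are illustrative and do not reproduce the exact patterns used in either paper.

```python
import numpy as np

def stripe_backgrounds(width, height, num_freqs=5, phases=(0.0, np.pi / 2)):
    # Horizontal and vertical sinusoidal stripes at doubling frequencies and
    # several phases, displayed one after another behind the object being matted.
    xs = np.arange(width)[None, :]
    ys = np.arange(height)[:, None]
    patterns = []
    for k in range(num_freqs):
        fx = 2.0 ** k * 2.0 * np.pi / width
        fy = 2.0 ** k * 2.0 * np.pi / height
        for phase in phases:
            patterns.append(np.broadcast_to(0.5 + 0.5 * np.sin(fx * xs + phase), (height, width)))  # vertical stripes
            patterns.append(np.broadcast_to(0.5 + 0.5 * np.sin(fy * ys + phase), (height, width)))  # horizontal stripes
    return patterns
```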
Once an environment matte has been “pulled”, it is then a simple matter to replace the background with a new image B(x) to obtain a novel composite of the object placed in a different environment (Figure 13.10a–c). The use of multiple backgrounds during the matting process, however, precludes the use of this technique with dynamic scenes, e.g., water pouring into a glass (Figure 13.10d). In this case, a single graded color background can be used to estimate a single 2D monochromatic displacement for each pixel (Chuang, Zongker, Hindorff et al. 2000).
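One way to picture the single-background variant: if the backdrop encodes position as a smooth color ramp, the color observed through the object at each pixel directly reveals which background point is being refracted. The sketch below assumes a red-encodes-x, green-encodes-y ramp and ignores attenuation and partial coverage; it illustrates the idea only and is not the algorithm of Chuang, Zongker, Hindorff et al. (2000).

```python
import numpy as np

def displacement_from_ramp(observed, bg_width, bg_height):
    # observed: H x W x 3 image of the refractive object in front of a backdrop
    # whose red channel ramps 0..1 left-to-right and green channel 0..1 top-to-bottom.
    bx = observed[..., 0] * (bg_width - 1)    # background x seen through each pixel
    by = observed[..., 1] * (bg_height - 1)   # background y seen through each pixel
    ys, xs = np.mgrid[0:observed.shape[0], 0:observed.shape[1]]
    return np.stack([bx - xs, by - ys], axis=-1)   # per-pixel 2D displacement map
```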
13.4.1 Higher-dimensional light fields
As you can tell from the preceding discussion, an environment matte in principle maps every pixel (x, y) into a 4D distribution over light rays and is, hence, a six-dimensional representation. (In practice, each 2D pixel’s response is parameterized using a dozen or so parameters, e.g., {F, α, B, R, A}, instead of a full mapping.) What if we want to model an object’s refractive properties from every potential point of view? In this case, we need a mapping from every incoming 4D light ray to every potential exiting 4D light ray, which is an 8D representation. If we use the same trick as with surface light fields, we can parameterize each surface point by its 4D BRDF to reduce this mapping back down to 6D, but this loses the ability to handle multiple refractive paths.

Figure 13.11 The geometry-image continuum in image-based rendering (Kang, Szeliski, and Anandan 2000) © 2000 IEEE. Representations at the left of the spectrum use more detailed geometry and simpler image representations, while representations and algorithms on the right use more images and less geometry.
If we want to handle dynamic light fields, we need to add another temporal dimension. (Wenger, Gardner, Tchou et al. (2005) give a nice example of a dynamic appearance and illumination acquisition system.) Similarly, if we want a continuous distribution over wavelengths, this becomes another dimension.
These examples illustrate how modeling the full complexity of a visual scene through
sampling can be extremely expensive. Fortunately, constructing specialized models, which
exploit knowledge about the physics of light transport along with the natural coherence of
real-world objects, can make these problems more tractable.
13.4.2 The modeling to rendering continuum
The image-based rendering representations and algorithms we have studied in this chapter span a continuum ranging from classic 3D texture-mapped models all the way to pure sampled ray-based representations such as light fields (Figure 13.11). Representations such as view-dependent texture maps and Lumigraphs still use a single global geometric model, but select the colors to map onto these surfaces from nearby images. View-dependent geometry, e.g.,
multiple depth maps, sidesteps the need for coherent 3D geometry and can sometimes better model local non-rigid effects such as specular motion (Swaminathan, Kang, Szeliski et al. 2002; Criminisi, Kang, Swaminathan et al. 2005). Sprites with depth and layered depth images use image-based representations of both color and geometry and can be efficiently rendered using warping operations rather than 3D geometric rasterization.
The best choice of representation and rendering algorithm depends on both the quantity
and quality of the input imagery as well as the intended application. When nearby views are
being rendered, image-based representations capture more of the visual fidelity of the real
world because they directly sample its appearance. On the other hand, if only a few input
images are available or the image-based models need to be manipulated, e.g., to change their shape or appearance, more abstract 3D representations such as geometric and local reflection models are a better fit. As we continue to capture and manipulate increasingly larger quantities of visual data, research into these aspects of image-based modeling and rendering will
continue to evolve.
13.5 Video-based rendering
Since multiple images can be used to render new images or interactive experiences, can something similar be done with video? In fact, a fair amount of work has been done in the area of video-based rendering and video-based animation, two terms first introduced by Schödl, Szeliski, Salesin et al. (2000) to denote the process of generating new video sequences from captured video footage. An early example of such work is Video Rewrite (Bregler, Covell, and Slaney 1997), in which archival video footage is “re-animated” by having actors say new utterances (Figure 13.12). More recently, the term video-based rendering has been used by some researchers to denote the creation of virtual camera moves from a set of synchronized video cameras placed in a studio (Magnor 2005). (The terms free-viewpoint video and 3D video are also sometimes used; see Section 13.5.4.)
In this section, we present a number of video-based rendering systems and applications. We start with video-based animation (Section 13.5.1), in which video footage is re-arranged or modified, e.g., in the capture and re-rendering of facial expressions. A special case of this is video textures (Section 13.5.2), in which source video is automatically cut into segments and re-looped to create infinitely long video animations. It is also possible to create such animations from still pictures or paintings, by segmenting the image into separately moving regions and animating them using stochastic motion fields (Section 13.5.3).
Next, we turn our attention to 3D video (Section 13.5.4), in which multiple synchronized video cameras are used to film a scene from different directions. The source video frames can then be re-combined using image-based rendering techniques, such as view interpolation, to