CHAPTER
SEVEN

DENOISING AUTOENCODERS (dA)
Note: This section assumes the reader has already read through Classifying MNIST digits using Logistic
Regression and Multilayer Perceptron. Additionally it uses the following Theano functions and concepts:
T.tanh, shared variables, basic arithmetic ops, T.grad, Random numbers, floatX. If you intend to run the
code on GPU also read GPU.

Note: The code for this section is available for download here.
The Denoising Autoencoder (dA) is an extension of a classical autoencoder and it was introduced as a
building block for deep networks in [Vincent08]. We will start the tutorial with a short discussion on
autoencoders.
7.1 Autoencoders
See section 4.6 of [Bengio09] for an overview of auto-encoders. An autoencoder takes an input x ∈ [0, 1]^d
and first maps it (with an encoder) to a hidden representation y ∈ [0, 1]^{d'} through a deterministic mapping,
e.g.:

    y = s(Wx + b)

where s is a non-linearity such as the sigmoid. The latent representation y, or code, is then mapped back
(with a decoder) into a reconstruction z of the same shape as x. The mapping happens through a similar
transformation, e.g.:

    z = s(W'y + b')
(Here, the prime symbol does not indicate matrix transposition.) z should be seen as a prediction of x,
given the code y. Optionally, the weight matrix W' of the reverse mapping may be constrained to be the
transpose of the forward mapping: W' = W^T. This is referred to as tied weights. The parameters of this
model (namely W, b, b' and, if one doesn't use tied weights, also W') are optimized such that the average
reconstruction error is minimized.
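To make the two mappings concrete, here is a minimal NumPy sketch of a single encode/decode pass with
tied weights; the dimensions, initialization and input are arbitrary and chosen only for illustration:

import numpy

def sigmoid(a):
    return 1. / (1. + numpy.exp(-a))

# Toy dimensions: d visible units, d' hidden units (illustrative only).
d, d_prime = 8, 3
rng = numpy.random.RandomState(0)
W = rng.uniform(-0.1, 0.1, size=(d, d_prime))   # encoder weights W
b = numpy.zeros(d_prime)                        # hidden bias b
b_prime = numpy.zeros(d)                        # visible bias b'

x = rng.uniform(0., 1., size=d)                 # an input x in [0, 1]^d
y = sigmoid(numpy.dot(x, W) + b)                # code: y = s(Wx + b)
z = sigmoid(numpy.dot(y, W.T) + b_prime)        # reconstruction with tied weights W' = W^T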
The reconstruction error can be measured in many ways, depending on the appropriate distributional
assumptions on the input given the code. The traditional squared error L(x, z) = ||x − z||^2 can be used. If
the input is interpreted as either bit vectors or vectors of bit probabilities, cross-entropy of the reconstruction
can be used:

    L_H(x, z) = - \sum_{k=1}^{d} [x_k \log z_k + (1 - x_k) \log(1 - z_k)]
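Continuing the NumPy sketch above, both criteria are one line each; note that the cross-entropy form
assumes each z_k lies strictly inside (0, 1), which the sigmoid guarantees:

# Reconstruction losses for the x and z computed in the sketch above.
squared_error = numpy.sum((x - z) ** 2)                 # ||x - z||^2
cross_entropy = -numpy.sum(x * numpy.log(z)
                           + (1. - x) * numpy.log(1. - z))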
The hope is that the code y is a distributed representation that captures the coordinates along the main
factors of variation in the data. This is similar to the way the projection on principal components would
capture the main factors of variation in the data. Indeed, if there is one linear hidden layer (the code) and
the mean squared error criterion is used to train the network, then the k hidden units learn to project the
input in the span of the first k principal components of the data. If the hidden layer is non-linear, the
auto-encoder behaves differently from PCA, with the ability to capture multi-modal aspects of the input
distribution. The departure from PCA becomes even more important when we consider stacking multiple
encoders (and their corresponding decoders) when building a deep auto-encoder [Hinton06].

Because y is viewed as a lossy compression of x, it cannot be a good (small-loss) compression for all x.
Optimization makes it a good compression for training examples, and hopefully for other inputs as well,
but not for arbitrary inputs. That is the sense in which an auto-encoder generalizes: it gives low
reconstruction error on test examples from the same distribution as the training examples, but generally
high reconstruction error on samples randomly chosen from the input space.
We want to implement an auto-encoder using Theano, in the form of a class, that could afterwards be used
in constructing a stacked autoencoder. The first step is to create shared variables for the parameters of the
autoencoder W, b and b'. (Since we are using tied weights in this tutorial, W^T will be used for W'):
def __init__(
    self,
    numpy_rng,
    theano_rng=None,
    input=None,
    n_visible=784,
    n_hidden=500,
    W=None,
    bhid=None,
    bvis=None
):
    """
    Initialize the dA class by specifying the number of visible units (the
    dimension d of the input), the number of hidden units (the dimension
    d' of the latent or hidden space) and the corruption level. The
    constructor also receives symbolic variables for the input, weights and
    bias. Such symbolic variables are useful when, for example, the input
    is the result of some computations, or when weights are shared between
    the dA and an MLP layer. When dealing with SdAs this always happens:
    the dA on layer 2 gets as input the output of the dA on layer 1,
    and the weights of the dA are used in the second stage of training
    to construct an MLP.

    :type numpy_rng: numpy.random.RandomState
    :param numpy_rng: number random generator used to generate weights

    :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
    :param theano_rng: Theano random generator; if None is given one is
                       generated based on a seed drawn from `rng`

    :type input: theano.tensor.TensorType
    :param input: a symbolic description of the input or None for
                  standalone dA

    :type n_visible: int
    :param n_visible: number of visible units

    :type n_hidden: int
    :param n_hidden: number of hidden units

    :type W: theano.tensor.TensorType
    :param W: Theano variable pointing to a set of weights that should be
              shared between the dA and another architecture; if dA should
              be standalone set this to None

    :type bhid: theano.tensor.TensorType
    :param bhid: Theano variable pointing to a set of bias values (for
                 hidden units) that should be shared between the dA and
                 another architecture; if dA should be standalone set
                 this to None

    :type bvis: theano.tensor.TensorType
    :param bvis: Theano variable pointing to a set of bias values (for
                 visible units) that should be shared between the dA and
                 another architecture; if dA should be standalone set
                 this to None

    """
    self.n_visible = n_visible
    self.n_hidden = n_hidden

    # create a Theano random generator that gives symbolic random values
    if not theano_rng:
        theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

    # note : W' was written as `W_prime` and b' as `b_prime`
    if not W:
        # W is initialized with `initial_W`, which is uniformly sampled
        # from -4*sqrt(6./(n_visible+n_hidden)) and
        # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
        # converted using asarray to dtype theano.config.floatX so
        # that the code is runnable on GPU
        initial_W = numpy.asarray(
            numpy_rng.uniform(
                low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                size=(n_visible, n_hidden)
            ),
            dtype=theano.config.floatX
        )
        W = theano.shared(value=initial_W, name='W', borrow=True)
    if not bvis:
        bvis = theano.shared(
            value=numpy.zeros(
                n_visible,
                dtype=theano.config.floatX
            ),
            borrow=True
        )

    if not bhid:
        bhid = theano.shared(
            value=numpy.zeros(
                n_hidden,
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

    self.W = W
    # b corresponds to the bias of the hidden
    self.b = bhid
    # b_prime corresponds to the bias of the visible
    self.b_prime = bvis
    # tied weights, therefore W_prime is W transpose
    self.W_prime = self.W.T
    self.theano_rng = theano_rng
    # if no input is given, generate a variable representing the input
    if input is None:
        # we use a matrix because we expect a minibatch of several
        # examples, each example being a row
        self.x = T.dmatrix(name='input')
    else:
        self.x = input

    self.params = [self.W, self.b, self.b_prime]
Note that we pass the symbolic input to the autoencoder as a parameter. This is so that we can concatenate
layers of autoencoders to form a deep network: the symbolic output (the y above) of layer k will be the
symbolic input of layer k+1, as sketched below.
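As a hypothetical sketch of that chaining (the layer sizes and the names layer1/layer2 are ours, not part of
the tutorial code), the second dA simply receives the symbolic hidden values of the first:

import numpy
import theano.tensor as T

# assumes the dA class defined in this section is already in scope
rng = numpy.random.RandomState(123)
x = T.matrix('x')                # symbolic minibatch, one example per row
layer1 = dA(numpy_rng=rng, input=x, n_visible=784, n_hidden=500)
layer2 = dA(numpy_rng=rng,
            input=layer1.get_hidden_values(x),   # symbolic output of layer 1
            n_visible=500, n_hidden=250)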
Now we can express the computation of the latent representation and of the reconstructed signal:
def get_hidden_values(self, input):
    """ Computes the values of the hidden layer """
    return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

def get_reconstructed_input(self, hidden):
    """Computes the reconstructed input given the values of the
    hidden layer
    """
    return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
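These two methods compose into a full uncorrupted round trip. As a quick sanity check (a sketch, assuming
a standalone dA instance named da, so that its stored input da.x is the default T.dmatrix), one can compile
the composition into a callable and feed it a random minibatch:

import numpy
import theano

# da.x is the symbolic input stored by the constructor
z = da.get_reconstructed_input(da.get_hidden_values(da.x))
reconstruct = theano.function([da.x], z)

batch = numpy.random.uniform(0., 1., size=(20, 784))
print reconstruct(batch).shape   # (20, 784): one reconstruction per row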
And using these functions we can compute the cost and the updates of one stochastic gradient descent step:
def get_cost_updates(self, corruption_level, learning_rate):
    """ This function computes the cost and the updates for one training
    step of the dA """

    tilde_x = self.get_corrupted_input(self.x, corruption_level)
    y = self.get_hidden_values(tilde_x)
    z = self.get_reconstructed_input(y)
    # note : we sum over the size of a datapoint; if we are using
    #        minibatches, L will be a vector, with one entry per
    #        example in minibatch
    L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
    # note : L is now a vector, where each element is the
    #        cross-entropy cost of the reconstruction of the
    #        corresponding example of the minibatch. We need to
    #        compute the average of all these to get the cost of
    #        the minibatch
    cost = T.mean(L)

    # compute the gradients of the cost of the `dA` with respect
    # to its parameters
    gparams = T.grad(cost, self.params)
    # generate the list of updates
    updates = [
        (param, param - learning_rate * gparam)
        for param, gparam in zip(self.params, gparams)
    ]

    return (cost, updates)
We can now define a function that, applied iteratively, will update the parameters W, b and b_prime such
that the reconstruction cost is approximately minimized:
da = dA(
    numpy_rng=rng,
    theano_rng=theano_rng,
    input=x,
    n_visible=28 * 28,
    n_hidden=500
)

cost, updates = da.get_cost_updates(
    corruption_level=0.,
    learning_rate=learning_rate
)

train_da = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size]
    }
)
start_time = timeit.default_timer()

############
# TRAINING #
############

# go through training epochs
for epoch in xrange(training_epochs):
    # go through training set
    c = []
    for batch_index in xrange(n_train_batches):
        c.append(train_da(batch_index))

    print 'Training epoch %d, cost ' % epoch, numpy.mean(c)

end_time = timeit.default_timer()

training_time = (end_time - start_time)

print >> sys.stderr, ('The no corruption code for file ' +
                      os.path.split(__file__)[1] +
                      ' ran for %.2fm' % ((training_time) / 60.))
image = Image.fromarray(
    tile_raster_images(X=da.W.get_value(borrow=True).T,
                       img_shape=(28, 28), tile_shape=(10, 10),
                       tile_spacing=(1, 1)))
image.save('filters_corruption_0.png')
# start-snippet-3
#####################################
# BUILDING THE MODEL CORRUPTION 30% #
#####################################
rng = numpy.random.RandomState(123)
theano_rng = RandomStreams(rng.randint(2 ** 30))

da = dA(
    numpy_rng=rng,
    theano_rng=theano_rng,
    input=x,
    n_visible=28 * 28,
    n_hidden=500
)

cost, updates = da.get_cost_updates(
    corruption_level=0.3,
    learning_rate=learning_rate
)

train_da = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size]
    }
)

start_time = timeit.default_timer()

############
# TRAINING #
############

# go through training epochs
for epoch in xrange(training_epochs):
    # go through training set
    c = []
    for batch_index in xrange(n_train_batches):
        c.append(train_da(batch_index))

    print 'Training epoch %d, cost ' % epoch, numpy.mean(c)

end_time = timeit.default_timer()

training_time = (end_time - start_time)

print >> sys.stderr, ('The 30% corruption code for file ' +
                      os.path.split(__file__)[1] +
                      ' ran for %.2fm' % (training_time / 60.))
# end-snippet-3

# start-snippet-4
image = Image.fromarray(tile_raster_images(
    X=da.W.get_value(borrow=True).T,
    img_shape=(28, 28), tile_shape=(10, 10),
    tile_spacing=(1, 1)))
image.save('filters_corruption_30.png')
# end-snippet-4

os.chdir('../')


if __name__ == '__main__':
    test_dA()
If there is no constraint besides minimizing the reconstruction error, one might expect an auto-encoder with
n inputs and an encoding of dimension n (or greater) to learn the identity function, merely mapping an input
to its copy. Such an autoencoder would not differentiate test examples (from the training distribution) from
other input configurations.
Surprisingly, experiments reported in [Bengio07] suggest that, in practice, when trained with stochastic
gradient descent, non-linear auto-encoders with more hidden units than inputs (called overcomplete) yield
useful representations. (Here, "useful" means that a network taking the encoding as input has low
classification error.)
A simple explanation is that stochastic gradient descent with early stopping is similar to an L2 regularization
of the parameters. To achieve perfect reconstruction of continuous inputs, a one-hidden-layer auto-encoder
with non-linear hidden units (exactly like in the above code) needs very small weights in the first (encoding)
layer, to bring the non-linearity of the hidden units into their linear regime, and very large weights in the
second (decoding) layer. With binary inputs, very large weights are also needed to completely minimize
the reconstruction error. Since the implicit or explicit regularization makes it difficult to reach large-weight
solutions, the optimization algorithm finds encodings which only work well for examples similar to those
in the training set, which is what we want. It means that the representation is exploiting statistical
regularities present in the training set, rather than merely learning to replicate the input.

There are other ways by which an auto-encoder with more hidden units than inputs could be prevented from
learning the identity function, capturing something useful about the input in its hidden representation. One
is the addition of sparsity (forcing many of the hidden units to be zero or near-zero). Sparsity has been
exploited very successfully by many [Ranzato07] [Lee08]. Another is to add randomness in the
transformation from input to reconstruction. This technique is used in Restricted Boltzmann Machines
(discussed later in Restricted Boltzmann Machines (RBM)), as well as in Denoising Auto-Encoders,
discussed below.
7.2 Denoising Autoencoders
The idea behind denoising autoencoders is simple. In order to force the hidden layer to discover more
robust features and prevent it from simply learning the identity, we train the autoencoder to reconstruct the
input from a corrupted version of it.

The denoising auto-encoder is a stochastic version of the auto-encoder. Intuitively, a denoising
auto-encoder does two things: try to encode the input (preserve the information about the input), and try to
undo the effect of a corruption process stochastically applied to the input of the auto-encoder. The latter
can only be done by capturing the statistical dependencies between the inputs. The denoising auto-encoder
can be understood from different perspectives (the manifold learning perspective, stochastic operator
perspective, bottom-up information theoretic perspective, top-down generative model perspective), all of
which are explained in [Vincent08]. See also section 7.2 of [Bengio09] for an overview of auto-encoders.

In [Vincent08], the stochastic corruption process randomly sets some of the inputs (as many as half of
them) to zero. Hence the denoising auto-encoder is trying to predict the corrupted (i.e. missing) values
from the uncorrupted (i.e., non-missing) values, for randomly selected subsets of missing patterns. Note
how being able to predict any subset of variables from the rest is a sufficient condition for completely
capturing the joint distribution between a set of variables (this is how Gibbs sampling works).

To convert the autoencoder class into a denoising autoencoder class, all we need to do is to add a stochastic
corruption step operating on the input. The input can be corrupted in many ways, but in this tutorial we
will stick to the original corruption mechanism of randomly masking entries of the input by making them
zero. The code below does just that:
def get_corrupted_input(self, input, corruption_level):
    """This function keeps ``1-corruption_level`` entries of the inputs the
    same and zeroes out a randomly selected subset of size
    ``corruption_level``
    Note : first argument of theano.rng.binomial is the shape(size) of
           random numbers that it should produce
           second argument is the number of trials
           third argument is the probability of success of any trial

            this will produce an array of 0s and 1s where 1 has a
            probability of 1 - ``corruption_level`` and 0 with
            ``corruption_level``

            The binomial function returns int64 data type by
            default. int64 multiplied by the input type (floatX)
            always returns float64. To keep all data in floatX
            when floatX is float32, we set the dtype of the
            binomial to floatX. As in our case the value of the
            binomial is always 0 or 1, this doesn't change the
            result. This is needed to allow the GPU to work
            correctly, as it only supports float32 for now.

    """
    return self.theano_rng.binomial(size=input.shape, n=1,
                                    p=1 - corruption_level,
                                    dtype=theano.config.floatX) * input
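As a quick sanity check of the masking (again a sketch, assuming a standalone dA instance named da, so
that da.x is the default T.dmatrix), roughly a corruption_level fraction of the entries should come back as
exact zeros:

import numpy
import theano

corrupt = theano.function([da.x], da.get_corrupted_input(da.x, 0.3))
batch = numpy.random.uniform(0.1, 1., size=(20, 784))   # no zeros to start
print numpy.mean(corrupt(batch) == 0.)   # close to 0.3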
In the stacked autoencoder class (Stacked Autoencoders), the weights of the dA class have to be shared with
those of a corresponding sigmoid layer. For this reason, the constructor of the dA also gets Theano variables
pointing to the shared parameters. If those parameters are left to None, new ones will be constructed.

The final denoising autoencoder class becomes:
class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoder tries to reconstruct the input from a corrupted
    version of it by projecting it first in a latent space and reprojecting
    it afterwards back in the input space. Please refer to Vincent et al., 2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        z = s(W' y + b')                                                 (3)

        L(x, z) = -\sum_{k=1}^d [x_k \log z_k + (1-x_k) \log(1-z_k)]     (4)

    """
    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input), the number of hidden units (the dimension
        d' of the latent or hidden space) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        bias. Such symbolic variables are useful when, for example, the input
        is the result of some computations, or when weights are shared between
        the dA and an MLP layer. When dealing with SdAs this always happens:
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: number random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of bias values (for
                     hidden units) that should be shared between the dA and
                     another architecture; if dA should be standalone set
                     this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of bias values (for
                     visible units) that should be shared between the dA and
                     another architecture; if dA should be standalone set
                     this to None