Chapter 29. Reliability and the Write-Ahead Log
be increased. Bulk operations such as large COPY transfers might cause a number of such warnings to appear if you have not set checkpoint_segments high enough.
To avoid flooding the I/O system with a burst of page writes, writing dirty buffers during a checkpoint is spread over a period of time. That period is controlled by checkpoint_completion_target, which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the given fraction of checkpoint_segments WAL segments have been consumed since checkpoint start, or the given fraction of checkpoint_timeout seconds have elapsed, whichever is sooner. With the default value of 0.5, PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that’s very close to maximum I/O throughput during normal operation, you might want to increase checkpoint_completion_target to reduce the I/O load from checkpoints. The disadvantage of this is that prolonging checkpoints affects recovery time, because more WAL segments will need to be kept around for possible use in recovery. Although checkpoint_completion_target can be set as high as 1.0, it is best to keep it less than that (perhaps 0.9 at most) since checkpoints include some other activities besides writing dirty buffers. A setting of 1.0 is quite likely to result in checkpoints not being completed on time, which would result in performance loss due to unexpected variation in the number of WAL segments needed.
There will always be at least one WAL segment file, and there will normally not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 files. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Ordinarily, when old log segment files are no longer needed, they are recycled (that is, renamed to become future segments in the numbered sequence). If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit.
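The space estimate described above can be worked through with a short calculation. The following is a minimal sketch, not part of PostgreSQL: the function names are hypothetical, the two bounds are combined by taking the larger one (an assumption about how "or" should be read), and fractional results are rounded up.

```python
import math

SEGMENT_MB = 16  # normal segment size; can be altered when building the server


def max_wal_segments(checkpoint_segments, checkpoint_completion_target=0.5,
                     wal_keep_segments=0):
    """Normal upper bound on the number of WAL segment files (hypothetical helper)."""
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    # Assumption: whichever bound is larger applies; round fractions up.
    return int(math.ceil(max(by_checkpoints, by_keep)))


def recycle_limit(checkpoint_segments):
    """Above this many files, unneeded segments are deleted rather than recycled."""
    return 3 * checkpoint_segments + 1


def wal_space_mb(segments, segment_mb=SEGMENT_MB):
    """Disk space in MB occupied by the given number of segment files."""
    return segments * segment_mb


if __name__ == "__main__":
    n = max_wal_segments(checkpoint_segments=10)
    print(n, "segment files, about", wal_space_mb(n), "MB")
```

With checkpoint_segments set to 10 and the default completion target, the sketch gives 26 files, or 416 MB at the normal segment size.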
In archive recovery or standby mode, the server periodically performs restartpoints, which are similar to checkpoints in normal operation: the server forces all its state to disk, updates the pg_control file to indicate that the already-processed WAL data need not be scanned again, and then recycles any old log segment files in the pg_xlog directory. Restartpoints can’t be performed more frequently than checkpoints in the master because restartpoints can only be performed at checkpoint records. A restartpoint is triggered when a checkpoint record is reached if at least checkpoint_timeout seconds have passed since the last restartpoint. In standby mode, a restartpoint is also triggered if at least checkpoint_segments log segments have been replayed since the last restartpoint.
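The trigger rules for restartpoints can be summarized as a small decision function. This is only an illustrative sketch of the rules as stated in the text, not the server's code: the function and argument names are hypothetical, and it assumes the segment-count rule in standby mode is also evaluated only when a checkpoint record is reached.

```python
def restartpoint_due(at_checkpoint_record, seconds_since_last,
                     segments_replayed_since_last, checkpoint_timeout,
                     checkpoint_segments, standby_mode):
    """Return True if a restartpoint would be triggered (hypothetical model)."""
    # Restartpoints can only be performed at checkpoint records.
    if not at_checkpoint_record:
        return False
    # Time-based rule: applies in archive recovery and in standby mode.
    if seconds_since_last >= checkpoint_timeout:
        return True
    # Segment-based rule: applies in standby mode only.
    return standby_mode and segments_replayed_since_last >= checkpoint_segments


if __name__ == "__main__":
    # Standby that has replayed 5 segments in 60 s, with
    # checkpoint_timeout = 300 and checkpoint_segments = 3.
    print(restartpoint_due(True, 60, 5, 300, 3, standby_mode=True))
```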
There are two commonly used internal WAL functions: XLogInsert and XLogFlush. XLogInsert is used to place a new record into the WAL buffers in shared memory. If there is no space for the new record, XLogInsert will have to write (move to kernel cache) a few filled WAL buffers. This is undesirable because XLogInsert is used on every database low-level modification (for example, row insertion) at a time when an exclusive lock is held on affected data pages, so the operation needs to be as fast as possible. What is worse, writing WAL buffers might also force the creation of a new log segment, which takes even more time. Normally, WAL buffers should be written and flushed by an XLogFlush request, which is made, for the most part, at transaction commit time to ensure that transaction records are flushed to permanent storage. On systems with high log output, XLogFlush requests might not occur often enough to prevent XLogInsert from having to do writes. On such systems one should increase the number of WAL buffers by modifying the wal_buffers parameter. When full_page_writes is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.
The commit_delay parameter defines for how many microseconds a group commit leader process will sleep after acquiring a lock within XLogFlush, while group commit followers queue up behind the leader. This delay allows other server processes to add their commit records to the WAL buffers so that all of them will be flushed by the leader’s eventual sync operation. No sleep will occur if fsync
is not enabled, or if fewer than commit_siblings other sessions are currently in active transactions; this avoids sleeping when it’s unlikely that any other session will commit soon. Note that on some platforms, the resolution of a sleep request is ten milliseconds, so that any nonzero commit_delay setting between 1 and 10000 microseconds would have the same effect. Note also that on some platforms, sleep operations may take slightly longer than requested by the parameter.
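The conditions under which the group commit leader sleeps, and the effect of a coarse sleep resolution, can be modeled as follows. This is a sketch under stated assumptions rather than the server's logic: the function and its arguments are hypothetical, and the platform is modeled as rounding a sleep request up to the next multiple of its resolution.

```python
def leader_sleep_us(commit_delay, fsync_enabled, other_active_sessions,
                    commit_siblings, sleep_resolution_us=1):
    """Effective sleep in microseconds for the group commit leader (hypothetical model)."""
    # No sleep when the delay is zero or fsync is not enabled.
    if commit_delay <= 0 or not fsync_enabled:
        return 0
    # No sleep when fewer than commit_siblings other sessions are in active transactions.
    if other_active_sessions < commit_siblings:
        return 0
    # Assumption: the request is rounded up to the platform's sleep resolution.
    ticks = -(-commit_delay // sleep_resolution_us)  # ceiling division
    return ticks * sleep_resolution_us


if __name__ == "__main__":
    # On a platform with ten-millisecond resolution, 1 and 10000 behave the same.
    print(leader_sleep_us(1, True, 5, 5, sleep_resolution_us=10000))
    print(leader_sleep_us(10000, True, 5, 5, sleep_resolution_us=10000))
```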
Since the purpose of commit_delay is to allow the cost of each flush operation to be amortized across concurrently committing transactions (potentially at the expense of transaction latency), it is necessary to quantify that cost before the setting can be chosen intelligently. The higher that cost is, the more effective commit_delay is expected to be in increasing transaction throughput, up to a point. The pg_test_fsync program can be used to measure the average time in microseconds that a single WAL flush operation takes. A value of half of the average time the program reports it takes to flush after a single 8 kB write operation is often the most effective setting for commit_delay, so this value is recommended as the starting point to use when optimizing for a particular workload. While tuning commit_delay is particularly useful when the WAL log is stored on high-latency rotating disks, benefits can be significant even on storage media with very fast sync times, such as solid-state drives or RAID arrays with a battery-backed write cache; but this should definitely be tested against a representative workload. Higher values of commit_siblings should be used in such cases, whereas smaller commit_siblings values are often helpful on higher latency media. Note that it is quite possible that a setting of commit_delay that is too high can increase transaction latency by so much that total transaction throughput suffers.
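The recommended starting point is simple arithmetic on the measured flush time. The sketch below assumes the average flush time after a single 8 kB write has already been read off the pg_test_fsync report by hand; the function name and the sample figure of 8000 microseconds are hypothetical, not measured values.

```python
def starting_commit_delay(avg_flush_us):
    """Half of the average flush time, in whole microseconds (hypothetical helper)."""
    if avg_flush_us < 0:
        raise ValueError("flush time must not be negative")
    return int(avg_flush_us // 2)


if __name__ == "__main__":
    # Hypothetical measurement: 8000 microseconds per flush on a rotating disk.
    print("commit_delay =", starting_commit_delay(8000))
```

The result is only a first value to try; the text above recommends testing it against a representative workload.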
When commit_delay is set to zero (the default), it is still possible for a form of group commit to occur, but each group will consist only of sessions that reach the point where they need to flush their commit records during the window in which the previous flush operation (if any) is occurring. At higher client counts a “gangway effect” tends to occur, so that the effects of group commit become significant even when commit_delay is zero, and thus explicitly setting commit_delay tends to help less. Setting commit_delay can only help when (1) there are some concurrently committing transactions, and (2) throughput is limited to some degree by commit rate; but with high rotational latency this setting can be effective in increasing transaction throughput with as few as two clients (that is, a single committing client with one sibling transaction).
The wal_sync_method parameter determines how PostgreSQL will ask the kernel to force WAL updates out to disk. All the options should be the same in terms of reliability, with the exception of fsync_writethrough, which can sometimes force a flush of the disk cache even when other options do not do so. However, it’s quite platform-specific which one will be the fastest. You can test the speeds of different options using the pg_test_fsync program. Note that this parameter is irrelevant if fsync has been turned off.
Enabling the wal_debug configuration parameter (provided that PostgreSQL has been compiled with support for it) will result in each XLogInsert and XLogFlush WAL call being logged to the server log. This option might be replaced by a more general mechanism in the future.
29.5. WAL Internals
WAL is automatically enabled; no action is required from the administrator except ensuring that the disk-space requirements for the WAL logs are met, and that any necessary tuning is done (see Section 29.4).
WAL logs are stored in the directory pg_xlog under the data directory, as a set of segment files, normally each 16 MB in size (but the size can be changed by altering the --with-wal-segsize configure option when building the server). Each segment is divided into pages, normally 8 kB each (this size can be changed via the --with-wal-blocksize configure option). The log
record headers are described in access/xlog.h; the record content is dependent on the type of event that is being logged. Segment files are given ever-increasing numbers as names, starting at 000000010000000000000000. The numbers do not wrap, but it will take a very, very long time to exhaust the available stock of numbers.
It is advantageous if the log is located on a different disk from the main database files. This can be achieved by moving the pg_xlog directory to another location (while the server is shut down, of course) and creating a symbolic link from the original location in the main data directory to the new location.
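The move-and-link procedure can be sketched with standard library calls. This is a generic illustration, not a PostgreSQL tool: the function name and the example paths are hypothetical, and the caller is assumed to have shut the server down first, since the script does not check for a running server.

```python
import os
import shutil


def relocate_with_symlink(original_dir, new_dir):
    """Move original_dir to new_dir and leave a symbolic link at the old location."""
    if os.path.islink(original_dir):
        raise ValueError("%s is already a symbolic link" % original_dir)
    if not os.path.isdir(original_dir):
        raise ValueError("%s is not a directory" % original_dir)
    if os.path.exists(new_dir):
        raise ValueError("%s already exists" % new_dir)
    # Move the directory (copies across file systems when a rename is not possible).
    shutil.move(original_dir, new_dir)
    # Link the original location to the new one using an absolute target path.
    os.symlink(os.path.abspath(new_dir), original_dir, target_is_directory=True)


if __name__ == "__main__":
    # Hypothetical paths; run only while the server is shut down.
    relocate_with_symlink("/srv/pgdata/pg_xlog", "/mnt/waldisk/pg_xlog")
```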
The aim of WAL is to ensure that the log is written before database records are altered, but this can be subverted by disk drives that falsely report a successful write to the kernel, when in fact they have only cached the data and not yet stored it on the disk. A power failure in such a situation might lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL’s WAL log files do not make such false reports. (See Section 29.1.)
After a checkpoint has been made and the log flushed, the checkpoint’s position is saved in the file pg_control. Therefore, at the start of recovery, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the log position indicated in the checkpoint record. Because the entire content of data pages is saved in the log on the first page modification after a checkpoint (assuming full_page_writes is not disabled), all pages changed since the checkpoint will be restored to a consistent state.
To deal with the case where pg_control is corrupt, we should support the possibility of scanning existing log segments in reverse order (newest to oldest) in order to find the latest checkpoint. This has not been implemented yet. pg_control is small enough (less than one disk page) that it is not subject to partial-write problems, and as of this writing there have been no reports of database failures due solely to the inability to read pg_control itself. So while it is theoretically a weak spot, pg_control does not seem to be a problem in practice.
Chapter 30. Regression Tests
The regression tests are a comprehensive set of tests for the SQL implementation in PostgreSQL. They test standard SQL operations as well as the extended capabilities of PostgreSQL.
30.1. Running the Tests
The regression tests can be run against an already installed and running server, or using a temporary installation within the build tree. Furthermore, there is a “parallel” and a “sequential” mode for running the tests. The sequential method runs each test script alone, while the parallel method starts up multiple server processes to run groups of tests in parallel. Parallel testing adds confidence that interprocess communication and locking are working correctly.
30.1.1. Running the Tests Against a Temporary Installation
To run the parallel regression tests after building but before installation, type:
make check
in the top-level directory. (Or you can change to src/test/regress and run the command there.) At the end you should see something like:
=======================
All 115 tests passed.
=======================
or otherwise a note about which tests failed. See Section 30.2 below before assuming that a “failure” represents a serious problem.
Because this test method runs a temporary server, it will not work if you did the build as the root user, since the server will not start as root. The recommended procedure is not to do the build as root, or else to perform testing after completing the installation.
If you have configured PostgreSQL to install into a location where an older PostgreSQL installation already exists, and you perform make check before installing the new version, you might find that the tests fail because the new programs try to use the already-installed shared libraries. (Typical symptoms are complaints about undefined symbols.) If you wish to run the tests before overwriting the old installation, you’ll need to build with configure --disable-rpath. It is not recommended that you use this option for the final installation, however.
The parallel regression test starts quite a few processes under your user ID. Presently, the maximum concurrency is twenty parallel test scripts, which means forty processes: there’s a server process and a psql process for each test script. So if your system enforces a per-user limit on the number of processes, make sure this limit is at least fifty or so, else you might get random-seeming failures in the parallel test. If you are not in a position to raise the limit, you can cut down the degree of parallelism by setting the MAX_CONNECTIONS parameter. For example:
make MAX_CONNECTIONS=10 check
runs no more than ten tests concurrently.
30.1.2. Running the Tests Against an Existing Installation
To run the tests after installation (see Chapter 15), initialize a data area and start the server as explained in Chapter 17, then type:
make installcheck
or for a parallel test:
make installcheck-parallel
The tests will expect to contact the server at the local host and the default port number, unless directed otherwise by PGHOST and PGPORT environment variables. The tests will be run in a database named regression; any existing database by this name will be dropped. The tests will also transiently create some cluster-wide objects, such as user identities named regressuserN.
30.1.3. Additional Test Suites
The make check and make installcheck commands run only the “core” regression tests, which test built-in functionality of the PostgreSQL server. The source distribution also contains additional test suites, most of them having to do with add-on functionality such as optional procedural languages.
To run all test suites applicable to the modules that have been selected to be built, including the core tests, type one of these commands at the top of the build tree:
make check-world
make installcheck-world
These commands run the tests using temporary servers or an already-installed server, respectively, just as previously explained for make check and make installcheck. Other considerations are the same as previously explained for each method. Note that make check-world builds a separate temporary installation tree for each tested module, so it requires a great deal more time and disk space than make installcheck-world.
Alternatively, you can run individual test suites by typing make check or make installcheck in the appropriate subdirectory of the build tree. Keep in mind that make installcheck assumes you’ve installed the relevant module(s), not only the core server.
The additional tests that can be invoked this way include:
• Regression tests for optional procedural languages (other than PL/pgSQL, which is tested by the core tests). These are located under src/pl.
• Regression tests for contrib modules, located under contrib. Not all contrib modules have tests.
• Regression tests for the ECPG interface library, located in src/interfaces/ecpg/test.
• Tests stressing behavior of concurrent sessions, located in src/test/isolation.
• Tests of client programs under src/bin. See also Section 30.4.
When using installcheck mode, these tests will destroy any existing databases named pl_regression, contrib_regression, isolationtest, regress1, or connectdb, as well as regression.