107
Chapter 18. Server Configuration
work_mem
(
integer
)
Specifies the amount of memory to be used by internal sort operations and hash tables before
writing to temporary disk files. The value defaults to four megabytes (
4MB
). Note that for a
complex query, several sort or hash operations might be running in parallel; each operation will
be allowed to use as much memory as this value specifies before it starts to write data into
temporary files. Also, several running sessions could be doing such operations concurrently.
Therefore, the total memory used could be many times the value of
work_mem
;it is necessary
to keep this fact in mind when choosing the value. Sort operations are used for
ORDER BY
,
DISTINCT
,and merge joins. Hash tables are used in hash joins, hash-based aggregation, and
hash-based processing of
IN
subqueries.
maintenance_work_mem
(
integer
)
Specifies the maximum amount of memory to be used by maintenance operations, such as
VACUUM
,
CREATE INDEX
,and
ALTER TABLE ADD FOREIGN KEY
.It defaults to 64 megabytes
(
64MB
). Since only one of these operations can be executed at a time by a database session,
and an installation normally doesn’t have many of them running concurrently, it’s safe to set
this value significantly larger than
work_mem
.Larger settings might improve performance for
vacuuming andfor restoring database dumps.
Note that when autovacuum runs, up to autovacuum_max_workers times this memory may be
allocated, so be careful not to set the default value too high. It may be useful to control for this
by separately settingautovacuum_work_mem.
autovacuum_work_mem
(
integer
)
Specifies the maximum amount of memory to be used by each autovacuum worker process. It
defaults to -1, indicating that the value of maintenance_work_mem should be used instead. The
setting has no effect on the behavior of
VACUUM
when run in other contexts.
max_stack_depth
(
integer
)
Specifies the maximum safe depth of the server’s execution stack. The ideal setting for this pa-
rameter is the actualstack size limitenforced bythekernel(as set by
ulimit -s
or localequiv-
alent), less a safety margin of a megabyte or so. The safety margin is needed because the stack
depth is not checked in every routine in the server, but only in key potentially-recursive routines
such as expression evaluation. The default setting is two megabytes (
2MB
), which is conserva-
tively small and unlikely to risk crashes. However, it might be too small to allow execution of
complex functions. Only superusers can change this setting.
Setting
max_stack_depth
higher than the actual kernel limit will mean that a runaway re-
cursive function can crash an individual backend process. On platforms where PostgreSQL can
determine the kernel limit, the server will not allow this variable to be set to an unsafe value.
However, not all platforms provide the information, so caution is recommended in selecting a
value.
dynamic_shared_memory_type
(
enum
)
Specifies the dynamicshared memoryimplementationthatthe server shoulduse. Possiblevalues
are
posix
(for POSIX shared memory allocated using
shm_open
),
sysv
(for System V shared
memory allocated via
shmget
),
windows
(for Windows shared memory),
mmap
(to simulate
shared memory using memory-mapped files stored in the data directory), and
none
(to disable
this feature). Not all values aresupportedonallplatforms; the firstsupportedoption is thedefault
for that platform. The use of the
mmap
option, which is not the default on any platform, is gener-
ally discouraged because the operating system maywrite modifiedpages backtodiskrepeatedly,
increasing system I/O load; however, it may be useful for debugging, when the
pg_dynshmem
directory is stored on a RAM disk, or when other sharedmemory facilities are not available.
471
68
Chapter 18. Server Configuration
18.4.2. Disk
temp_file_limit
(
integer
)
Specifies the maximum amount of disk space that a session can use for temporary files, such as
sort and hash temporary files, or the storage file for a held cursor. A transaction attempting to
exceed this limit will be canceled. The value is specifiedin kilobytes, and
-1
(the default) means
no limit. Only superusers can change this setting.
This setting constrains the total space used at any instant by all temporary files used by a given
PostgreSQL session. It should be noted that disk space used for explicit temporary tables, as
opposed to temporary files used behind-the-scenes in query execution, does not count against
this limit.
18.4.3. Kernel Resource Usage
max_files_per_process
(
integer
)
Sets the maximum number of simultaneously open files allowed to each server subprocess. The
default is one thousand files. If the kernel is enforcing a safe per-process limit, you don’t need
to worry about this setting. But on some platforms (notably, most BSD systems), the kernel
will allow individual processes to open many more files than the system can actually support if
manyprocesses all try toopenthatmany files. If you findyourself seeing “Too many open files”
failures, try reducing this setting. This parameter can only be set at server start.
18.4.4. Cost-based Vacuum Delay
During the execution of VACUUM and ANALYZE commands, the system maintains an internal
counter that keeps track of the estimated cost of the various I/O operations that are performed. When
the accumulated cost reaches a limit (specified by
vacuum_cost_limit
), the process performing
the operation will sleep for a short periodof time, as specified by
vacuum_cost_delay
.Then itwill
reset the counter and continue execution.
The intent of this feature is to allow administrators to reduce the I/O impact of these commands on
concurrent database activity. There are many situations where it is not important that maintenance
commands like
VACUUM
and
ANALYZE
finish quickly; however, it is usually very important that these
commands do not significantly interfere with the ability of the system to perform other database
operations. Cost-based vacuum delay provides a way for administrators to achieve this.
This feature is disabled by default for manually issued
VACUUM
commands. To enable it, set the
vacuum_cost_delay
variable to a nonzero value.
vacuum_cost_delay
(
integer
)
The length of time, in milliseconds, that the process will sleep when the cost limit has been
exceeded. The defaultvalueis zero, which disables thecost-basedvacuum delayfeature. Positive
values enable cost-based vacuuming. Note that on many systems, the effective resolutionof sleep
delays is 10 milliseconds; setting
vacuum_cost_delay
to a value that is not a multiple of 10
might have the same results as setting it to the next higher multiple of 10.
When using cost-based vacuuming, appropriate values for
vacuum_cost_delay
are usually
quite small, perhaps 10 or 20 milliseconds. Adjusting vacuum’s resource consumption is best
done by changing the other vacuum cost parameters.
472
79
Chapter 18. Server Configuration
vacuum_cost_page_hit
(
integer
)
The estimatedcost for vacuuming a buffer foundin the sharedbuffer cache. It represents the cost
tolock the buffer pool, lookuptheshared hashtable and scanthe contentof the page. Thedefault
value is one.
vacuum_cost_page_miss
(
integer
)
Theestimatedcostfor vacuuming abuffer that has tobe readfrom disk. This represents the effort
to lock the buffer pool, lookup the shared hash table, read the desired block in from the disk and
scan its content. The default value is 10.
vacuum_cost_page_dirty
(
integer
)
The estimated cost charged when vacuum modifies a block that was previously clean. It repre-
sents the extra I/O required to flush the dirtyblock out to disk again. The default value is 20.
vacuum_cost_limit
(
integer
)
The accumulated cost that will cause the vacuuming process to sleep. The default value is 200.
Note: There are certain operations that hold critical locks and should therefore complete as
quickly as possible. Cost-based vacuum delays do not occur during such operations. Therefore it
ispossible that the cost accumulates farhigherthanthespecifiedlimit. Toavoid uselessly long de-
laysinsuchcases, theactual delayiscalculatedas
vacuum_cost_delay
*
accumulated_balance
/
vacuum_cost_limit
with a maximum of
vacuum_cost_delay
*4.
18.4.5. Background Writer
There is a separate server process called the background writer, whose function is to issue writes of
“dirty” (new or modified) shared buffers. It writes shared buffers so server processes handling user
queries seldom or never need to wait for a write to occur. However, the background writer does cause
anet overall increase in I/Oload, because while a repeatedly-dirtied page might otherwise be written
only once per checkpoint interval, the background writer might write it several times as it is dirtied
in the same interval. The parameters discussed in this subsection can be used to tune the behavior for
local needs.
bgwriter_delay
(
integer
)
Specifies the delay between activity rounds for the background writer. In each round the writer
issues writes for some number of dirty buffers (controllable by the following parameters). It
then sleeps for
bgwriter_delay
milliseconds, and repeats. When there are no dirty buffers in
the buffer pool, though, it goes into a longer sleep regardless of
bgwriter_delay
.The default
value is 200 milliseconds (
200ms
). Note that on many systems, the effective resolution of sleep
delays is 10milliseconds;setting
bgwriter_delay
to a value that is not a multiple of 10 might
have the same results as setting it to the next higher multiple of 10. This parameter can only be
set in the
postgresql.conf
file or on the server command line.
bgwriter_lru_maxpages
(
integer
)
In each round, no more than this many buffers will be written by the background writer. Set-
ting this to zero disables background writing. (Note that checkpoints, which are managed by
aseparate, dedicated auxiliary process, are unaffected.) The default value is 100 buffers. This
parameter can only be set in the
postgresql.conf
file or on the server command line.
473
62
Chapter 18. Server Configuration
bgwriter_lru_multiplier
(
floating point
)
The number of dirty buffers written in each round is based on the number of new buffers that
havebeenneededbyserver processesduring recent rounds. The averagerecentneedismultiplied
by
bgwriter_lru_multiplier
to arrive at an estimate of the number of buffers that will be
needed during the next round. Dirty buffers are written until there are that many clean, reusable
buffers available. (However, no more than
bgwriter_lru_maxpages
buffers will be written
per round.) Thus, a settingof 1.0 represents a “just in time” policyof writing exactly the number
of buffers predicted to be needed. Larger values provide some cushion against spikes in demand,
whilesmaller values intentionallyleave writes to be done byserver processes. The defaultis 2.0.
This parameter can only be set in the
postgresql.conf
file or on the server command line.
Smaller values of
bgwriter_lru_maxpages
and
bgwriter_lru_multiplier
reduce the extra
I/O load caused by the background writer, but make it more likely that server processes will have to
issue writes for themselves, delaying interactive queries.
18.4.6. Asynchronous Behavior
effective_io_concurrency
(
integer
)
Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed
simultaneously. Raising this value willincrease the number of I/O operations thatany individual
PostgreSQL session attempts to initiate in parallel. The allowed range is 1 to 1000, or zero to
disable issuance of asynchronous I/O requests. Currently, this setting only affects bitmap heap
scans.
Agood starting point for this setting is the number of separate drives comprising a RAID 0
stripe or RAID 1 mirror being used for the database. (For RAID 5 the parity drive should not
be counted.) However, if the database is often busy with multiple queries issued in concurrent
sessions, lower values may be sufficient to keep the disk array busy. A value higher than needed
to keep the disks busy will only result in extra CPUoverhead.
For more exotic systems, such as memory-based storage or a RAID array that is limited by bus
bandwidth, the correct value might be the number of I/O paths available. Some experimentation
may be needed to find the best value.
Asynchronous I/O depends on an effective
posix_fadvise
function, which some operating
systems lack. If the function is not present then setting this parameter to anything but zero will
result in an error. On some operating systems (e.g., Solaris), the function is present but does not
actually do anything.
max_worker_processes
(
integer
)
Sets the maximum number of background processes that the system can support. This parameter
can only be set at server start.
When running a standby server, you must set this parameter to the same or higher value than on
the master server. Otherwise, queries will not be allowed in the standby server.
18.5. Write Ahead Log
For additional informationon tuning these settings, see Section 29.4.
474
105
Chapter 18. Server Configuration
18.5.1. Settings
wal_level
(
enum
)
wal_level
determines how much information is written to the WAL. The default value is
minimal
,which writes only the information needed to recover from a crash or immediate shut-
down.
archive
adds logging required for WAL archiving;
hot_standby
further adds informa-
tionrequiredtorunread-onlyqueries onastandbyserver;and, finally
logical
addsinformation
necessary to support logical decoding. Each level includes the information logged at all lower
levels. This parameter can only be set at server start.
In
minimal
level, WAL-loggingof some bulkoperations can be safelyskipped, whichcanmake
those operations much faster (see Section 14.4.7). Operations in which this optimization can be
applied include:
CREATE TABLE AS
CREATE INDEX
CLUSTER
COPY
into tables that were created or truncated in the same transaction
But minimal WAL does not contain enough information to reconstruct the data from a base
backup and the WAL logs, so
archive
or higher must be used to enable WAL archiving
(archive_mode) and streaming replication.
In
hot_standby
level, the same information is logged as with
archive
, plus information
needed to reconstruct the status of running transactions from the WAL. To enable read-only
queries on a standby server,
wal_level
must be set to
hot_standby
or higher on the primary,
andhot_standby mustbeenabledin the standby. It is thought that there islittle measurable differ-
ence in performance between using
hot_standby
and
archive
levels, sofeedbackis welcome
if any production impacts are noticeable.
In
logical
level, the same information is logged as with
hot_standby
, plus information
needed to allow extracting logical changesets from the WAL. Using a level of
logical
will
increase the WAL volume, particularly if many tables are configured for
REPLICA IDENTITY
FULL
andmany
UPDATE
and
DELETE
statements are executed.
fsync
(
boolean
)
If this parameter is on, the PostgreSQL server will try to make sure that updates are phys-
ically written to disk, by issuing
fsync()
system calls or various equivalent methods (see
wal_sync_method). This ensures that the database cluster can recover to a consistent state af-
ter an operating system or hardware crash.
While turning off
fsync
is often a performance benefit, this can result in unrecoverable data
corruption in the event of a power failure or system crash. Thus it is only advisable to turn off
fsync
if you can easily recreate your entire database from external data.
Examples of safe circumstances for turning off
fsync
include the initial loading of a new
database cluster from a backup file, using a database cluster for processing a batch of data af-
ter which the database will be thrown away and recreated, or for a read-only database clone
which gets recreated frequently and is not used for failover. High quality hardware alone is not a
sufficient justification for turning off
fsync
.
For reliable recovery when changing
fsync
off toon, it is necessaryto force all modifiedbuffers
in the kernel to durable storage. This can be done while the cluster is shutdown or while fsync is
on by running
initdb --sync-only
,running
sync
,unmounting the file system, or rebooting
the server.
475
117
Chapter 18. Server Configuration
In many situations, turning off synchronous_commit for noncritical transactions can provide
much of the potential performance benefit of turning off
fsync
,without the attendant risks of
data corruption.
fsync
can onlybe set in the
postgresql.conf
file or on the server command line. If you turn
this parameter off, also consider turning off full_page_writes.
synchronous_commit
(
enum
)
Specifies whether transaction commit will wait for WAL records to be written to disk before
the command returns a “success” indication to the client. Valid values are
on
,
remote_write
,
local
,and
off
.The default, and safe, setting is
on
.When
off
,there can be a delay between
when success is reported to the client and when the transaction is really guaranteed to be safe
against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync,
setting this parameter to
off
does not create any risk of database inconsistency: an operating
system or database crash might result in some recent allegedly-committed transactions being
lost, butthe database state will be justthe same as if thosetransactions hadbeenabortedcleanly.
So, turning
synchronous_commit
off can be a useful alternative when performance is more
important than exact certainty about the durability of a transaction. For more discussion see
Section 29.3.
If synchronous_standby_names is set, this parameter also controls whether or not transaction
commits willwait for thetransaction’s WAL records tobe replicated tothestandby server. When
set to
on
,commits will wait until a reply from the current synchronous standby indicates it has
received the commit record of the transaction and flushed it to disk. This ensures the transaction
willnotbelostunless bothprimaryand standby suffer corruptionof their database storage. When
set to
remote_write
,commits will wait until a reply from the current synchronous standby
indicates it has received the commit record of the transaction and written it out to the standby’s
operating system, but the data has not necessarily reached stable storage on the standby. This
setting is sufficient to ensure data preservation even if the standby instance of PostgreSQL were
to crash, but not if the standby suffers an operating-system-level crash.
When synchronous replication is in use, it will normally be sensible either to wait for both
local flush to disk and replication of WAL records, or to allow the transaction to commit asyn-
chronously. However, the setting
local
is available for transactions that wish to wait for local
flush to disk, but not synchronous replication. If
synchronous_standby_names
is not set, the
settings
on
,
remote_write
and
local
all provide the same synchronization level: transaction
commits only wait for local flush to disk.
This parameter can be changed at any time; the behavior for any one transaction is determined
by the setting in effect when it commits. It is therefore possible, and useful, to have some trans-
actions commit synchronously and others asynchronously. For example, to make a single multi-
statement transaction commit asynchronouslywhenthedefault is theopposite, issue
SET LOCAL
synchronous_commit TO OFF
within the transaction.
wal_sync_method
(
enum
)
Method used for forcing WAL updates out to disk. If
fsync
is off then this setting is irrelevant,
since WAL file updates will not be forced out at all. Possible values are:
•
open_datasync
(write WAL files with
open()
option
O_DSYNC
)
•
fdatasync
(call
fdatasync()
at each commit)
•
fsync
(call
fsync()
at eachcommit)
•
fsync_writethrough
(call
fsync()
at each commit, forcing write-through of any disk
write cache)
•
open_sync
(write WAL files with
open()
option
O_SYNC
)
476
80
Chapter 18. Server Configuration
The
open_
*options also use
O_DIRECT
if available. Not all of these choices are available on
all platforms. The default is the first method in the above list that is supported by the platform,
except that
fdatasync
is the default on Linux. The default is not necessarily ideal; it might be
necessary to change this settingor other aspects of your system configuration in order tocreate a
crash-safe configuration or achieve optimal performance. These aspects are discussed in Section
29.1. This parameter can only be set in the
postgresql.conf
file or on the server command
line.
full_page_writes
(
boolean
)
When this parameter is on, the PostgreSQL server writes the entire content of each disk page
to WAL during the first modification of that page after a checkpoint. This is needed because a
page write thatis in process during an operating system crash might beonly partiallycompleted,
leading to an on-disk page that contains a mix of old and new data. The row-level change data
normally stored in WAL will not be enough to completely restore such a page during post-crash
recovery. Storing the full page image guarantees that the page can be correctly restored, but at
the price of increasing the amount of data that must be written to WAL. (Because WAL replay
always starts from a checkpoint, it is sufficient to do this during the first change of each page
after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the
checkpoint interval parameters.)
Turning this parameter off speeds normal operation, but might lead to either unrecoverable data
corruption, or silent data corruption, after a system failure. The risks are similar to turning off
fsync
,though smaller, and it should be turned off only based on the same circumstances rec-
ommended for that parameter.
Turning off this parameter does not affect use of WAL archiving for point-in-time recovery
(PITR) (see Section 24.3).
This parameter can only be set in the
postgresql.conf
file or on the server command line.
The default is
on
.
wal_log_hints
(
boolean
)
When this parameter is
on
,the PostgreSQL server writes the entire content of each disk page to
WAL during the first modification of that page after a checkpoint, even for non-critical modifi-
cations of so-called hint bits.
If data checksums are enabled, hint bit updates are always WAL-logged and this setting is ig-
nored. Youcanuse thissettingtotesthow much extra WAL-loggingwould occur if your database
haddata checksums enabled.
This parameter can only be set at server start. The default value is
off
.
wal_buffers
(
integer
)
The amount of shared memory used for WAL data that has not yet been written to disk. The
default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less
than
64kB
nor more than the size of one WAL segment, typically
16MB
.This value can be set
manually if the automatic choice is too large or too small, but any positive value less than
32kB
will be treated as
32kB
.This parameter can only be set at server start.
The contents of the WAL buffers are written out to disk at every transaction commit, so ex-
tremely large values are unlikely to provide a significant benefit. However, setting this value to
at least a fewmegabytes canimprove write performanceon abusyserver where manyclients are
committing at once. The auto-tuning selected by the default setting of -1 should give reasonable
results in most cases.
477
Documents you may be interested
Documents you may be interested