Tuning Linux for PostgreSQL¶
A reference conf could be found here and its slides here. Good advices there.
Kernel cache tuning¶
Kernel parameters dirty* manage the cache behavior. A good understanding of how cache works could be found there.
By default :
scaillet@Sylvain ~ $ sudo sysctl -a | grep dirty vm.dirty_background_bytes = 0 # ratio limit applies vm.dirty_background_ratio = 10 # 10% of RAM is available for cache --> huge on modern servers ! vm.dirty_bytes = 0 # ratio limit applies vm.dirty_expire_centisecs = 3000 # cache has to flush at least every 30s vm.dirty_ratio = 20 # hard limit of 20% of RAM vm.dirty_writeback_centisecs = 500 vm.dirtytime_expire_seconds = 43200
They are set for basic OS usage. In case of Database server with high workload, it should be tuned not to reduce the impact of the checkpoint tuning, for example.
One of the main drawback of bad tuning is heavy I/O pikes. They could occured because :
- the cache write asynchroneously most of the time
- lot of I/O overwhelm the cache
- the cache decided to flush everything by turning writings synchroneously
If we force the cache to flush its data more frequently, it might reduce the heavy pike cases.
You can also see statistics on the page cache in /proc/vmstat:
$ cat /proc/vmstat | egrep "dirty|writeback" nr_dirty 878 nr_writeback 0 nr_writeback_temp 0
In my case I have 878 dirty pages waiting to be written to disk.
Typically for our usage (heavy workload), a good way is to reduce the cache size by lowering ratios :
vm.dirty_background_ratio = 5 vm.dirty_ratio = 10
Another approch is given by Ilya Kosmodemiansky in the slide 39.
vm.dirty_background_bytes = 67108864 vm.dirty_bytes = 536870912
These values looks more reasonable for RAID with 512MB cache on board.
In fact, it's good to configure the cache to have max some hundreds of MB to synchronize, no more : less than 1GB.
What on VM ?? In the following slides, the 13th gives a definitive advice : no VM for Postgres !
Kernel other parameters¶
On dedicated servers, we could tune some other kernel parameters.
-
Swap is bad for SGBD. Set :
vm.swappiness = 5 # or 10
-
Overcommit is also bad. Set :
vm.overcommit_memory = 2 # to deactivate vm.overcommit_ratio = (RAM – SWAP) * 100 / RAM
To know how much swap is configured on a server, type swapon -s to display the size in Kbytes.
To learn more about overcommit values, see the linux documentation. Possible values are :
0: heuristic overcommit (this is the default) 1: always overcommit, never check 2: always check, never overcommit
-
NUMA architecture is also bad for PostgreSQL :
vm.zone_reclaim_mode = 0 # to deactivate