wiki:Performance

Context Navigation

Version 12 (modified by damiano, 13 years ago) ( diff )
—

Guidelines
1. OS optimizations
2. PostgreSQL
Benchmarks
1. Ingestion
  1. Regular tiling
  2. Regular tiling + VACUUM
PostgreSQL tips
Tile cache
1. How to use
2. Important limitation

Guidelines

OS optimizations

filesystem - it's recommended to use the xfs filesystem, over ext4 or similar. Mount with noatime, e.g. in /etc/fstab:

# / was on /dev/sda1 during installation
UUID=212a6de3-515b-4e12-aa17-3e466f6966aa /              ext4    noatime,errors=remount-ro 0       1

scheduler - deadline for spinning disk, noop for ssd or high caching raid, e.g.
```
echo deadline > /sys/block/DEVICE/queue/scheduler
```
read ahead - enable with below command, which can be made permanent by putting in /etc/rc.local for example:
```
/sbin/blockdev --setra 4096 /dev/DEVICE
```
write cache - put the below in /etc/sysctl.conf and load with sysctl -p:
```
vm.dirty_background_ratio = 5
vm.dirty_background_ratio = 10
```
shared memory - run attachment:calc_kernel_shm.sh and substituted the outputs into /etc/sysctl.conf:
```
kernel.shmmax=VALUE
kernel.shmall=VALUE
```

PostgreSQL

The default PostgreSQL configuration can be tuned for better performance with rasdaman. The parameters below should be adapted in the postgresql.conf, typically found under /etc/postgresql.

max_connections - as long as a single user is using the database, this can be decreased to about 40
shared_buffers - should be 25%-33% of your total RAM
work_mem - (total RAM / connections) / 4..16, but not lower then 128 MB. When you have 4 GB RAM and you are alone then 256 MB is fine.
maintenance_work_mem - total RAM / 16
synchronous_commit - set to off when doing bulk ingestion
wal_buffers - should be 16 MB; by default 1/32 from the shared buffer value is taken as long as it is lower then 16 MB otherwise it is 16 MB
checkpoint_segments - 256 is a good start value when doing bulk ingestion
checkpoint_completion_target - 0.0 (especially important for bulk ingestion)
random_page_cost - 2.0 for spinning disk and 1.01 for ssd, high caching raid and when DB fits into RAM
effective_cache_size - total RAM - shared_buffers value
logging_collector - set to on if it isn't set already
log_line_prefix - set to '%t ' to get timestamps in the logs

Benchmarks

Ingestion

Below are the results of ingesting one dataset using various tilings and postgres parameters.

Regular tiling

The dataset is 150 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 280 MB in total.

Default postgres parameters:

Tiling scheme	Tile size	Import time	DB size before	DB size after	BLOBs size	BLOBs size2
0:31,0:31,0:149	460800	84 minutes	184M	19416M	267 MB	18 GB

Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, wal_buffers = 1MB, wal_writer_delay = 2000ms, checkpoint_segments = 32

Tiling scheme	Tile size	Import time	DB size before	DB size after	BLOBs size	BLOBs size2
0:31,0:31,0:149	460800	35 minutes	184M	19416M	267 MB	18 GB

Using tile cache

Tiling scheme	Tile size	Import time	DB size before	DB size after	BLOBs size	BLOBs size2
0:31,0:31,0:149	460800	48 seconds	37M	329M	267 MB	267 MB

Regular tiling + VACUUM

The dataset is 50 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 89 MB in total.

Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, wal_buffers = 8MB, wal_writer_delay = 10000ms, checkpoint_segments = 16

Tiling scheme	Tile size	Import time	DB size before	DB size after	BLOBs size	BLOBs size2	BLOBs after VACUUM
0:31,0:31,0:149	460800	7m 34s	47M	2464M	267 MB	1776 MB	96MB

PostgreSQL tips

Total size of BLOBs

BLOBs are stored in the pg_largeobject in postgres. Each BLOB is divided into rows (pages) of 2048 bytes typically. More info here

Query below computes the space that all BLOBs take.

SELECT pg_size_pretty(count(loid) * 2048) FROM pg_largeobject;

Total size of BLOBs 2

SELECT tablename,
       pg_size_pretty(size) AS size_pretty,
       pg_size_pretty(total_size) AS total_size_pretty
FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size,
                pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE TABLES.tablename = 'pg_largeobject'
ORDER BY total_size DESC;

Size of individual BLOBs

SELECT 	loid,
        pg_size_pretty(count(*) * 2048)
FROM pg_catalog.pg_largeobject
GROUP BY loid
ORDER BY count(*) DESC;

Size of RASBASE tables

SELECT tablename,
       pg_size_pretty(size) AS size_pretty,
       pg_size_pretty(total_size) AS total_size_pretty
FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size,
                pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE TABLES.schemaname = 'public'
ORDER BY total_size DESC;

Reset WAL

Write-ahead logging is used to ensure data integrity. In postgres these logs are stored in pg_xlog and it's recommended to put them in a separate disk. To reset the logs:

service stop postgresql
sudo -u postgres pg_resetxlog $PGDATA

Remove orphaned BLOBs

Orphaned BLOBs are BLOBs that are stored but they are not referenced by oid from any table in the database. To clear them up use vacuumlo

vacuumlo RASBASE

Note, vacuumlo is an additional module made available through postgresql-contrib

VACUUM tables

Tuples that have been deleted or obsoleted are normally not physically deleted, and thus the space continues to be still used, until a VACUUM is performed on the database or specific tables. VACUUM alone doesn't free up the disk space for further use by the OS, VACUUM FULL is necessary for this. In rasdaman it's most important to do periodic VACUUM on the BLOBs table:

psql -d RASBASE -c "VACUUM FULL pg_largeobject"

vacuumdb -d RASBASE -t "pg_largeobject" -f

Tile cache

A tile cache can be enabled in rasdaman during bulk ingestion, in order to work around inefficiencies of BLOB handling in postgres and slow performance for certain tiling schemes.

How to use

The cache can be enabled in rasmgr.conf by specifying a --cachelimit parameter after the -xp parameter:

  --cachelimit <limit>	(default: 0)
         specifies upper limit in bytes on using memory for caching

An example rasmgr.conf configuration that allows the cache to use up to 1GB memory:

define dbh rasdaman_host -connect RASBASE
define db RASBASE -dbh rasdaman_host

define srv N1 -host HOST -type n -port 9000 -dbh rasdaman_host
change srv N1 -countdown 20000000 -autorestart on -xp --timeout 30000000 --cachelimit 1073741824

When enabled, INSERT and UPDATE rasql statements can be issued in the normal way. At the end of the ingestion a COMMIT command should be executed, to make sure that the cache is cleared and everything is flushed to disk. E.g.

rasql -q 'insert into coll ...'
rasql -q 'update coll ...'
rasql -q 'update coll ...'
rasql -q 'update coll ...'

# done with ingestion
rasql -q 'commit'

Important limitation

At the moment the tile cache should be used during bulk ingestion _ONLY_, and disabled afterwards. Leaving the tile cache on will cause selection queries to fail randomly. Furthermore, it is best to have only one server defined in rasmgr.conf, and -countdown and --timeout parameters set to very large values.

Attachments (1)

calc_kernel_shm.sh (901 bytes ) - added by Dimitar Misev 13 years ago.

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text