[[PageOutline]] = Guidelines = = Benchmarks = == Ingestion == Below are the results of ingesting one dataset using various tilings and postgres parameters. === Regular tiling === The dataset is 150 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 280 MB in total. * Default postgres parameters: || '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' || || 0:31,0:31,0:149 || 460800 || 84 minutes || 184M || '''19416M''' || 267 MB || 18 GB || * Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, wal_buffers = 1MB, wal_writer_delay = 2000ms, checkpoint_segments = 32 || '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' || || 0:31,0:31,0:149 || 460800 || 35 minutes || 184M || '''19416M''' || 267 MB || 18 GB || * Using [#Tilecache tile cache] || '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' || || 0:31,0:31,0:149 || 460800 || '''48 seconds''' || 37M || 329M || 267 MB || 267 MB || === Regular tiling + VACUUM === The dataset is 50 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 89 MB in total. * Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, ''wal_buffers = 8MB, wal_writer_delay = 10000ms, checkpoint_segments = 16'' || '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' || '''BLOBs after [#VACUUMtables VACUUM]''' || 0:31,0:31,0:149 || 460800 || 7m 34s || 47M || '''2464M''' || 267 MB || 1776 MB || 96MB || = PostgreSQL tips = == Total size of BLOBs == BLOBs are stored in the `pg_largeobject` in postgres. Each BLOB is divided into rows (pages) of 2048 bytes typically. More info [http://www.postgresql.org/docs/8.4/static/catalog-pg-largeobject.html here] Query below computes the space that all BLOBs take. {{{ SELECT pg_size_pretty(count(loid) * 2048) FROM pg_largeobject; }}} == Total size of BLOBs 2 == {{{ SELECT tablename, pg_size_pretty(size) AS size_pretty, pg_size_pretty(total_size) AS total_size_pretty FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size, pg_total_relation_size(schemaname||'.'||tablename) AS total_size FROM pg_tables) AS TABLES WHERE TABLES.tablename = 'pg_largeobject' ORDER BY total_size DESC; }}} == Size of individual BLOBs == {{{ SELECT loid, pg_size_pretty(count(*) * 2048) FROM pg_catalog.pg_largeobject GROUP BY loid ORDER BY count(*) DESC; }}} == Size of RASBASE tables == {{{ SELECT tablename, pg_size_pretty(size) AS size_pretty, pg_size_pretty(total_size) AS total_size_pretty FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size, pg_total_relation_size(schemaname||'.'||tablename) AS total_size FROM pg_tables) AS TABLES WHERE TABLES.schemaname = 'public' ORDER BY total_size DESC; }}} == Reset WAL == [http://www.postgresql.org/docs/9.1/static/wal-intro.html Write-ahead logging] is used to ensure data integrity. In postgres these logs are stored in pg_xlog and it's [http://www.postgresql.org/docs/9.1/static/wal-internals.html recommended] to put them in a separate disk. To reset the logs: 1. `service stop postgresql` 1. `sudo -u postgres pg_resetxlog $PGDATA` == Remove orphaned BLOBs == Orphaned BLOBs are BLOBs that are stored but they are not referenced by oid from any table in the database. To clear them up use [http://www.postgresql.org/docs/8.4/static/vacuumlo.html vacuumlo] * `vacuumlo RASBASE` == VACUUM tables == Tuples that have been deleted or obsoleted are normally not physically deleted, and thus the space continues to be still used, until a `VACUUM` is performed on the database or specific tables. `VACUUM` alone doesn't free up the disk space for further use by the OS, `VACUUM FULL` is necessary for this. In rasdaman it's most important to do periodic VACUUM on the BLOBs table: * psql -d RASBASE -c "VACUUM FULL pg_largeobject" or * vacuumdb -d RASBASE -t "pg_largeobject" -f = Tile cache = A tile cache can be enabled in rasdaman during bulk ingestion, in order to work around inefficiencies of BLOB handling in postgres and slow performance for certain tiling schemes. == How to use == The cache can be enabled in rasmgr.conf by specifying a `--cachelimit` parameter after the `-xp` parameter: {{{ --cachelimit (default: 0) specifies upper limit in bytes on using memory for caching }}} An example `rasmgr.conf` configuration that allows the cache to use up to 1GB memory: {{{ define dbh rasdaman_host -connect RASBASE define db RASBASE -dbh rasdaman_host define srv N1 -host HOST -type n -port 9000 -dbh rasdaman_host change srv N1 -countdown 20000000 -autorestart on -xp --timeout 30000000 --cachelimit 1073741824 }}} When enabled, `INSERT` and `UPDATE` rasql statements can be issued in the normal way. At the end of the ingestion a `COMMIT` command should be executed, to make sure that the cache is cleared and everything is flushed to disk. E.g. {{{ rasql -q 'insert into coll ...' rasql -q 'update coll ...' rasql -q 'update coll ...' rasql -q 'update coll ...' # done with ingestion rasql -q 'commit' }}} == Important limitation == At the moment the tile cache should be used during bulk ingestion '''_ONLY_''', and disabled afterwards. Leaving the tile cache on will cause selection queries to fail randomly. Furthermore, it is best to have only one server defined in rasmgr.conf, and `-countdown` and `--timeout` parameters set to very large values.