Version 10 (modified by 11 years ago) ( diff ) | ,
---|
Guidelines
Benchmarks
Ingestion
Below are the results of ingesting one dataset using various tilings and postgres parameters.
Regular tiling
The dataset is 150 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 280 MB in total.
- Default postgres parameters:
Tiling scheme | Tile size | Import time | DB size before | DB size after | BLOBs size | BLOBs size2 |
0:31,0:31,0:149 | 460800 | 84 minutes | 184M | 19416M | 267 MB | 18 GB |
- Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, wal_buffers = 1MB, wal_writer_delay = 2000ms, checkpoint_segments = 32
Tiling scheme | Tile size | Import time | DB size before | DB size after | BLOBs size | BLOBs size2 |
0:31,0:31,0:149 | 460800 | 35 minutes | 184M | 19416M | 267 MB | 18 GB |
- Using tile cache
Tiling scheme | Tile size | Import time | DB size before | DB size after | BLOBs size | BLOBs size2 |
0:31,0:31,0:149 | 460800 | 48 seconds | 37M | 329M | 267 MB | 267 MB |
Regular tiling + VACUUM
The dataset is 50 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 89 MB in total.
- Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, wal_buffers = 8MB, wal_writer_delay = 10000ms, checkpoint_segments = 16
Tiling scheme | Tile size | Import time | DB size before | DB size after | BLOBs size | BLOBs size2 | BLOBs after VACUUM |
0:31,0:31,0:149 | 460800 | 7m 34s | 47M | 2464M | 267 MB | 1776 MB | 96MB |
PostgreSQL tips
Total size of BLOBs
BLOBs are stored in the pg_largeobject
in postgres. Each BLOB is divided into rows (pages) of 2048 bytes typically. More info here
Query below computes the space that all BLOBs take.
SELECT pg_size_pretty(count(loid) * 2048) FROM pg_largeobject;
Total size of BLOBs 2
SELECT tablename, pg_size_pretty(size) AS size_pretty, pg_size_pretty(total_size) AS total_size_pretty FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size, pg_total_relation_size(schemaname||'.'||tablename) AS total_size FROM pg_tables) AS TABLES WHERE TABLES.tablename = 'pg_largeobject' ORDER BY total_size DESC;
Size of individual BLOBs
SELECT loid, pg_size_pretty(count(*) * 2048) FROM pg_catalog.pg_largeobject GROUP BY loid ORDER BY count(*) DESC;
Size of RASBASE tables
SELECT tablename, pg_size_pretty(size) AS size_pretty, pg_size_pretty(total_size) AS total_size_pretty FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size, pg_total_relation_size(schemaname||'.'||tablename) AS total_size FROM pg_tables) AS TABLES WHERE TABLES.schemaname = 'public' ORDER BY total_size DESC;
Reset WAL
Write-ahead logging is used to ensure data integrity. In postgres these logs are stored in pg_xlog and it's recommended to put them in a separate disk. To reset the logs:
service stop postgresql
sudo -u postgres pg_resetxlog $PGDATA
Remove orphaned BLOBs
Orphaned BLOBs are BLOBs that are stored but they are not referenced by oid from any table in the database. To clear them up use vacuumlo
vacuumlo RASBASE
Note, vacuumlo is an additional module made available through postgresql-contrib
VACUUM tables
Tuples that have been deleted or obsoleted are normally not physically deleted, and thus the space continues to be still used, until a VACUUM
is performed on the database or specific tables. VACUUM
alone doesn't free up the disk space for further use by the OS, VACUUM FULL
is necessary for this. In rasdaman it's most important to do periodic VACUUM on the BLOBs table:
- psql -d RASBASE -c "VACUUM FULL pg_largeobject"
or
- vacuumdb -d RASBASE -t "pg_largeobject" -f
Tile cache
A tile cache can be enabled in rasdaman during bulk ingestion, in order to work around inefficiencies of BLOB handling in postgres and slow performance for certain tiling schemes.
How to use
The cache can be enabled in rasmgr.conf by specifying a --cachelimit
parameter after the -xp
parameter:
--cachelimit <limit> (default: 0) specifies upper limit in bytes on using memory for caching
An example rasmgr.conf
configuration that allows the cache to use up to 1GB memory:
define dbh rasdaman_host -connect RASBASE define db RASBASE -dbh rasdaman_host define srv N1 -host HOST -type n -port 9000 -dbh rasdaman_host change srv N1 -countdown 20000000 -autorestart on -xp --timeout 30000000 --cachelimit 1073741824
When enabled, INSERT
and UPDATE
rasql statements can be issued in the normal way. At the end of the ingestion a COMMIT
command should be executed, to make sure that the cache is cleared and everything is flushed to disk. E.g.
rasql -q 'insert into coll ...' rasql -q 'update coll ...' rasql -q 'update coll ...' rasql -q 'update coll ...' # done with ingestion rasql -q 'commit'
Important limitation
At the moment the tile cache should be used during bulk ingestion _ONLY_, and disabled afterwards. Leaving the tile cache on will cause selection queries to fail randomly. Furthermore, it is best to have only one server defined in rasmgr.conf, and -countdown
and --timeout
parameters set to very large values.
Attachments (1)
- calc_kernel_shm.sh (901 bytes ) - added by 11 years ago.
Download all attachments as: .zip