[[PageOutline]]

= Guidelines =


= Benchmarks =

== Ingestion ==

Below are the results of ingesting one dataset using various tilings and postgres parameters.

=== Regular tiling ===

The dataset is 150 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 280 MB in total.

 * Default postgres parameters:

|| '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' ||
|| 0:31,0:31,0:149 || 460800 || 84 minutes || 184M || '''19416M''' || 267 MB || 18 GB ||

 * Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, wal_buffers = 1MB, wal_writer_delay = 2000ms, checkpoint_segments = 32
   
|| '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' ||
|| 0:31,0:31,0:149 || 460800 || 35 minutes || 184M || '''19416M''' || 267 MB || 18 GB ||

 * Using [#Tilecache tile cache]
   
|| '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' ||
|| 0:31,0:31,0:149 || 460800 || '''48 seconds''' || 37M || 329M || 267 MB || 267 MB ||

=== Regular tiling + VACUUM ===

The dataset is 50 png rgb images of size 1024x608, so that each image is 1867776 bytes uncompressed, or 89 MB in total.

 * Changed parameters in postgresql.conf: shared buffers = 1024 MB, temp_buffers = 8MB, fsync = off, synchronous_commit = off, wal_sync_method = fsync, full_page_writes = off, ''wal_buffers = 8MB, wal_writer_delay = 10000ms, checkpoint_segments = 16''
   
|| '''Tiling scheme''' || '''Tile size''' || '''Import time''' || '''DB size before''' || '''DB size after''' || '''BLOBs size''' || '''BLOBs size2''' || '''BLOBs after [#VACUUMtables VACUUM]'''
|| 0:31,0:31,0:149 || 460800 || 7m 34s || 47M || '''2464M''' || 267 MB || 1776 MB || 96MB || 


= PostgreSQL tips =

== Total size of BLOBs ==
BLOBs are stored in the `pg_largeobject` in postgres. Each BLOB is divided into rows (pages) of 2048 bytes typically. More info [http://www.postgresql.org/docs/8.4/static/catalog-pg-largeobject.html here]

Query below computes the space that all BLOBs take.
{{{
SELECT pg_size_pretty(count(loid) * 2048) FROM pg_largeobject;
}}}

== Total size of BLOBs 2 ==
{{{
SELECT tablename,
       pg_size_pretty(size) AS size_pretty,
       pg_size_pretty(total_size) AS total_size_pretty
FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size,
                pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE TABLES.tablename = 'pg_largeobject'
ORDER BY total_size DESC;
}}}

== Size of individual BLOBs ==
{{{
SELECT 	loid,
        pg_size_pretty(count(*) * 2048)
FROM pg_catalog.pg_largeobject
GROUP BY loid
ORDER BY count(*) DESC;
}}}

== Size of RASBASE tables ==
{{{
SELECT tablename,
       pg_size_pretty(size) AS size_pretty,
       pg_size_pretty(total_size) AS total_size_pretty
FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size,
                pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE TABLES.schemaname = 'public'
ORDER BY total_size DESC;
}}}

== Reset WAL ==
[http://www.postgresql.org/docs/9.1/static/wal-intro.html Write-ahead logging] is used to ensure data integrity. In postgres these logs are stored in pg_xlog and it's [http://www.postgresql.org/docs/9.1/static/wal-internals.html recommended] to put them in a separate disk. To reset the logs:
1. `service stop postgresql`
1. `sudo -u postgres pg_resetxlog $PGDATA`

== Remove orphaned BLOBs ==
Orphaned BLOBs are BLOBs that are stored but they are not referenced by oid from any table in the database. To clear them up use [http://www.postgresql.org/docs/8.4/static/vacuumlo.html vacuumlo]
 * `vacuumlo RASBASE`

Note, vacuumlo is an additional module made available through postgresql-contrib

== VACUUM tables ==
Tuples that have been deleted or obsoleted are normally not physically deleted, and thus the space continues to be still used, until a `VACUUM` is performed on the database or specific tables. `VACUUM` alone doesn't free up the disk space for further use by the OS, `VACUUM FULL` is necessary for this. In rasdaman it's most important to do periodic VACUUM on the BLOBs table:
 * psql -d RASBASE -c "VACUUM FULL pg_largeobject"

or

 * vacuumdb -d RASBASE -t "pg_largeobject" -f


= Tile cache =

A tile cache can be enabled in rasdaman during bulk ingestion, in order to work around inefficiencies of BLOB handling in postgres and slow performance for certain tiling schemes.

== How to use ==

The cache can be enabled in rasmgr.conf by specifying a `--cachelimit` parameter after the `-xp` parameter:
{{{
  --cachelimit <limit>	(default: 0)
         specifies upper limit in bytes on using memory for caching
}}}

An example `rasmgr.conf` configuration that allows the cache to use up to 1GB memory:
{{{
define dbh rasdaman_host -connect RASBASE
define db RASBASE -dbh rasdaman_host

define srv N1 -host HOST -type n -port 9000 -dbh rasdaman_host
change srv N1 -countdown 20000000 -autorestart on -xp --timeout 30000000 --cachelimit 1073741824
}}}

When enabled, `INSERT` and `UPDATE` rasql statements can be issued in the normal way. At the end of the ingestion a `COMMIT` command should be executed, to make sure that the cache is cleared and everything is flushed to disk. E.g.
{{{
rasql -q 'insert into coll ...'
rasql -q 'update coll ...'
rasql -q 'update coll ...'
rasql -q 'update coll ...'

# done with ingestion
rasql -q 'commit'
}}}

== Important limitation ==

At the moment the tile cache should be used during bulk ingestion '''_ONLY_''', and disabled afterwards. Leaving the tile cache on will cause selection queries to fail randomly. Furthermore, it is best to have only one server defined in rasmgr.conf, and `-countdown` and `--timeout` parameters set to very large values.