Version 70 (modified by pbaumann, 21 months ago)

Petascope Developer's Documentation

This page serves as an introduction to the petascope component from a developer's perspective (see also installation guide and user guide). Additional resources include:

Further, unmaintained information:

Petascope relies on a SECORE Coordinate Reference System (CRS) resolver that can provide proper metadata on a coverage's native CRSs. One can either deploy a local SECORE instance or use the official OGC SECORE resolver at http://www.opengis.net/def/.

IDE

Petascope can be opened directly as a web project in NetBeans (version > 7.2 recommended at the moment).
In NetBeans, go to File -> Open Project, and open $RASDAMAN/applications/petascope, where $RASDAMAN is the path to the rasdaman source tree.

NOTE: to avoid misconfiguration when deploying, rename the "petascope" project to "rasdaman"; you can then deploy directly from within NetBeans instead of running the makefile. This helps when you need to debug Petascope directly inside NetBeans rather than attaching a debugger to a Java web application container such as Tomcat or GlassFish. Remember that wcs_client is not included in the petascope project inside NetBeans, so to see the WCS client as well you have to use make, which builds the project and also includes wcs_client in rasdaman.war (petascope/build/dist/rasdaman.war).

Caches

The caches for CRSs are found in the static CrsUtil class and serve different purposes:

Additionally, another cache stores the conversions of incoming WC*S subsets to rasdaman marray indexes. The method convertToPixelIndices() in CrsUtil can be called more than once within a single WC*S request, because the grid indexes are needed in more than one place: when subsets change the domain set metadata of the coverage, the indexes in the GridEnvelope need to be updated, and they are needed again when building the RasQL query that fetches the data from rasdaman. This cache therefore provides a further performance gain.
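
The idea behind such a cache can be sketched in a few lines of Python. This is only an illustration of memoizing a subset-to-index conversion, not the actual CrsUtil code: the class name, the cache key and the simplified floor-based conversion for a regular axis are all assumptions made for the example.

```python
import math


class SubsetIndexCache:
    """Hypothetical cache of subset-to-index conversions (not the CrsUtil API)."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # counts how often the conversion really ran

    def to_pixel_indices(self, coverage, axis, lo, hi, origin, resolution):
        # Assumption: origin and resolution are fixed for a given coverage axis,
        # so the key only needs the coverage, the axis and the subset bounds.
        key = (coverage, axis, lo, hi)
        if key not in self._cache:
            self.computations += 1
            # Simplified conversion for a regular axis (illustrative only).
            first = math.floor((lo - origin) / resolution)
            last = math.floor((hi - origin) / resolution)
            self._cache[key] = (min(first, last), max(first, last))
        return self._cache[key]


cache = SubsetIndexCache()
# First request for the indexes, e.g. while updating the GridEnvelope.
envelope_indexes = cache.to_pixel_indices("eobstest", "Long", 25.0, 30.0, 25.0, 0.5)
# Second request, e.g. while building the RasQL query: served from the cache.
query_indexes = cache.to_pixel_indices("eobstest", "Long", 25.0, 30.0, 25.0, 0.5)
print(envelope_indexes, query_indexes, cache.computations)
```

Both calls return the same index pair, but the conversion itself runs only once.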

Petascope database schema: petascopedb

Coverage metadata is stored in the tables of a relational database called petascopedb. Visual documentation of the database schema can be downloaded from here. This is where database creation and update are handled.

This schema documentation can be regenerated with this script. The script accepts as arguments the location of the SchemaSpy jar file, the location of the PostgreSQL driver jar file (can be found at $RASDAMAN/applications/petascope/lib/postgresql-...jar) and the location of the petascope.properties settings file ($RMANHOME/etc/petascope.properties).

The structure of the database schema changed significantly between major versions 8 and 9 of rasdaman. The upgrade process (schema, migration, triggers, procedures, etc.) is associated with update counter 8: since it is a major upgrade of the schema rather than an incremental update, it is composed of multiple SQL files and is controlled by a bash script, update8.sh.

The schema strongly relies on the GMLCOV coverage model. Starting from the root table ps_coverage the metadata is mainly subdivided into three big branches:

  • geometry/topology: the domainSet in OGC terms, stored in ps_domain_set and its tree of related tables;
  • data/feature: the rangeSet, stored/referenced in ps_range_set and its tree of related tables;
  • description of the feature: the rangeType, stored in ps_range_type_component and its tree of related tables.

Design principles

The design of the database follows some common principles:

  • surrogate (primary) keys are usually preferred over natural keys, and their column name is id;
  • tables names are composed of a prefix (ps for Petascope) and a label, separated by an underscore:
    <table-name> = <prefix>_<table-label> (e.g. ps_coverage);
  • foreign keys names append _id to the label of the tables from which they import the key:
    <fk-name> = <table-label>_id (e.g. coverage_id)
  • singular names are preferred over plural names for tables (e.g. ps_coverage, and not ps_coverages)
  • composite names are separated by underscore _ (e.g. ps_range_type_component)
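
As a quick illustration of the naming rules above, the following Python sketch derives table and foreign key names from a label. The helper functions are hypothetical and exist only for this example; they are not part of Petascope.

```python
PREFIX = "ps"  # prefix for Petascope tables, as stated in the rules above


def table_name(label):
    """<table-name> = <prefix>_<table-label>"""
    return f"{PREFIX}_{label}"


def fk_name(table):
    """<fk-name> = <table-label>_id, where the label is the table name without prefix."""
    marker = PREFIX + "_"
    label = table[len(marker):] if table.startswith(marker) else table
    return f"{label}_id"


print(table_name("coverage"))              # ps_coverage
print(table_name("range_type_component"))  # ps_range_type_component
print(fk_name("ps_coverage"))              # coverage_id
```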

The database is relatively active: several integrity rules are applied via triggers (see '$ grep TRIGGER -A1 <rasdaman-git>/applications/petascope/src/main/db/petascope/update8/triggers.sql'). When consistency cannot be evaluated by the database alone, Petascope itself implements guards when reading a coverage's metadata (see read() method in DbMetadataSource.java). This happens for example when coverage metadata has to be validated against its native CRS, resolved from SECORE.

The schema design focuses on integrity: small, highly normalized tables with fewer anomalies were created to handle the high diversity of data that can be expressed by a coverage, gridded or not. Future denormalization may be applied if the performance gains pay off.

Tables and fields

Before starting with a detailed description of the database schema, here is a classification of different types of tables:

  • [COV] : COVerage scope
    The table represents an entity of a GMLCOV coverage.
  • [SER] : SERvice scope
    The table represents an entity of the service as a whole (related to GetCapabilities).
  • [DIC] : DICtionary tables
    The table represents a catalog of independent entities/concepts.
  • [n:m] : the table is used to model n:m relationships between tables in the relational db.

These tags are used below to assign each table to its scope; colors reinforce the classification and also assign tables to their semantic area, with saturation fading out for dependent tables. The description of the tables is divided into logical packages and refers to the tables for the coverage model (WC*S services): it is suggested to read through it along with the graphical view of the schema or its class diagram. Please note that the package of tables for WMS purposes is completely disconnected from the coverage model and is only listed briefly at the end.

  1. General coverage metadata
    • [COV] ps_coverage : The core table of the database, it is connected directly and indirectly to all other tables (coverage scope). Indeed dropping a coverage from the database is achieved by deleting the record in this table: the DELETE will cascade to all other linked metadata. It stores the name of the coverage (the name you target in the OWS services, which is not necessarily the same as the associated rasdaman collection's name, in case of grid coverages). The GMLCOV type is stored as well: this information is a derived attribute, but it is actively used by Petascope. Finally the native format is stored: application/octet-stream is the default for grid coverages ( = bytes from rasdaman db).
    • [DIC] ps_gml_subtype : The catalog of GMLCOV coverage types: ids are self-referencing within the table, so that Petascope can reconstruct the whole path of coverage types from the concrete to the root AbstractCoverage type in the wcs:Capabilities/wcs:Contents/wcs:CoverageSummary.
    • [COV] ps_extra_metadata : This table is used to provide different kinds of additional "extra" descriptive metadata to a coverage. The type of metadata is stored in ps_extra_metadata_type, which initially defines gmlcov (/wcs:CoverageDescriptions/wcs:CoverageDescription/gmlcov:metadata) and ows (/wcs:Capabilities/wcs:Contents/wcs:CoverageSummary/ows:Metadata) metadata types, plus a special field for the optional attribute table's name of an image (rasgeo component).
    • [DIC] ps_extra_metadata_type : Catalog of extra metadata types, e.g. GMLCOV or OWS, see ps_extra_metadata.
    • [DIC] ps_mime_type : catalog of MIME types, used to define the coverage native format (and as well to support service capabilities description, e.g. supportedFormat).

  2. Geometry (domain set)
    • [COV] ps_domain_set : This table contains domain set metadata shared by any kind of coverage, gridded or not. Since the geometry of a coverage is very much dependent on its type, only the CRS is stored here, as an ordered array of FKs. The order is important since it determines the order of the coordinate components that appear in a gml:domainSet.
    • [COV] ps_bounding_box : This table contains the lower left and upper right corner points of a bounding box for each coverage, as an ordered list of coordinates in the native CRS. Note that this table is currently only actively read for Multi* coverages, while the BBOX of a grid is deduced from its domain set. In the future this table could be used for warped and rotated grids for performance gains.
    • [COV] ps_gridded_domain_set : domainSet metadata shared by any kind of grid coverage (note that GridCoverage types are deprecated here, whereas RectifiedGrid types with Index CRS are the recommended way to store non-geo datasets). The origin is stored directly in this table as an ordered array of coordinates that must follow the order of axis definition inside the native CRS. The geometry of each grid axis is spread over the other related tables, i.e. ps_grid_axis, ps_rectilinear_axis and ps_vector_coefficients.
    • [COV] ps_grid_axis : Principally, this table determines the order of the axes in the grid topology: it must reflect the position of each axis in the rasdaman marray. To allow future extensions of this database to warped grids, no further information is stored here (e.g. an n:m relationship with a table pointing to LUTs, i.e. rasdaman metadata collections).
    • [COV] ps_rectilinear_axis : Table with information specific to a rectilinear axis of a grid coverage, independently of its spacing. The vectorial resolution (offset vector) is stored as an array of coordinates which must match the order of axes in the native CRS definition. A trigger checks that the cardinality of coordinates here is in accordance with the grid origin in ps_gridded_domain_set.
    • [COV] ps_vector_coefficients : Finally, in case a rectilinear axis has irregular spacing between its grid points (like the temporal dimension in an irregular time series, for instance), additional coefficients are stored, one per row, to weight the distance of each point along this axis to the origin, in terms of offset vectors.
    • [COV] ps_multipoint : Directly connected with ps_domain_set, this table merges domain set metadata and range set data (actual values) for non-gridded multipoint coverages (point clouds). The geographic position of each point is stored as a PostGIS geometry column. Because of the high number of points in an average point cloud dataset, a single table holding both coverage domain and range was created for performance reasons. This table, and thus the structural support for point-cloud datasets, is created only if PostGIS (>2.0) is found on the host machine during database creation/update (update_petascopedb.sh).
    • [DIC] ps_crs : Catalog of actionable HTTP URIs of Coordinate Reference Systems. They must point to a running SECORE instance. The keyword %SECORE_URL% can be used to parametrize the URI (protocol + domain name [+ port number] + servlet context) to the configured SECORE host, for example %SECORE_URL%/crs/EPSG/0/4326.

  3. Data storage (range set)
    • [COV] ps_range_set : This table serves as a dispatcher for the possibly different storage methods of a coverage. A pair of fields determines the table and the corresponding PK (id) where the storage information for the coverage is available. Currently only ps_rasdaman_collection is a legal table (it is also set by default).
    • [COV] ps_rasdaman_collection : Fundamental table, by means of which Petascope can fetch the actual grid values. From rasdaman 9.0 on there is a 1:1 association between an OGC coverage and a single marray in some rasdaman collection. This means that both the collection name and the marray OID must be set. Optionally, the rasdaman base type can be set.
    • [COV] ps_multipoint : see table description in the domain set section above.

  4. Feature space (range type)
    • [COV] ps_range_type_component : Each row in this table represents a single band (channel / range type component) of a coverage. Each channel has an order, a label, a data type and a reference to a Sensor Web Enablement (SWE) field for more detailed information. As for ps_range_set, the reference to the SWE field is expressed through the pair [table name, tuple PK]. As only continuous SWE quantities are currently supported, the table name defaults to ps_quantity.
    • [DIC] ps_range_data_type : WCPS data types.
    • [DIC] ps_quantity : Catalog of SWE quantities. Full support of the vast set of SWE attributes is not implemented (see #582): label, description, definition URI, unit of measure, NIL values and multiple allowed interval constraints are currently supported. To ease the interpretation of the schema, the db is populated with a set of primitive dimensionless (10^0) quantities, which are not affected by the triggered drop cascade. Tuples of ps_uom and ps_interval linked to primitive quantities are primitive as well, and are not dropped unless manually deleted. Other quantities are dropped when no coverage uses them any longer (cleanup behavior).
    • [n:m] ps_quantity_interval : n to m association between ps_quantity and ps_interval: an SWE quantity can be constrained to have multiple allowed intervals, and the same interval can be used to constrain multiple quantities.
    • [DIC] ps_interval : Catalog of (min,max) allowed intervals.
    • [DIC] ps_nil_value : Catalog of NIL values for a data record; each value is associated with a reason, which shall be expressed via URI (e.g. see OGC reasons).
    • [DIC] ps_uom : Catalog of UCUM codes of Units of Measure (UoM).

  5. Service capabilities and OWS
    • [SER] ps_service_identification : This table stores the /wcs:Capabilities/ows:ServiceIdentification part of service capabilities. A single tuple is allowed in this table: a single WCS service can be provided.
    • [SER] ps_service_provider : Metadata concerning the /wcs:Capabilities/ows:ServiceProvider part of service capabilities, including an ows:Description element (table ps_description).
    • [DIC] ps_telephone : An ows:TelephoneType element, declared by the service provider (//ows:ServiceProvider/ows:ServiceContact/ows:ContactInfo): 0+ voice telephone numbers and 0+ facsimile numbers can be inserted for a single contact.
    • [DIC] ps_role_code : An ows:Role element to be assigned to a service provider: values are taken from Subclause B.5.5 of ISO 19115:2003.
    • [DIC] ps_description : Catalog of ows:Description elements, currently only for service identification, but potentially for coverage summaries too. It contains title(s), abstract(s) and one or more keyword groups (ps_keyword_group).
    • [DIC] ps_keyword_group : Catalog of keyword groups (ows:Description/ows:Keywords): each keyword group has a specified type, and can contain different actual keywords, for different languages.
    • [DIC] ps_keyword : Catalog of actual keywords (ows:LanguageStringType), to be part of a keyword group. An optional xml:lang attribute can set the language in which the keyword is written.

  6. Data encoding (See this page for more details)
    • [DIC] ps_mime_type : Catalog of MIME types, used for coverage native formats and service capabilities specification. MIME types are constrained to a maximum of 255 chars, as by RFC spec (type(127) + subtype(127) + "/"(1) ).
    • [DIC] ps_format : Catalog of internal encoding format labels: their corresponding GDAL and MIME formats can be determined via FKs.
    • [DIC] ps_gdal_format : Catalog of GDAL identifiers for different data encodings.

  7. Internal tables for software setup
    • [DIC] ps_numeric_constants : internal table with numeric constants used during new schema creation, population and migration.
    • [DIC] ps_string_constants : internal table with alphanumeric constants used during new schema creation, population and migration.

  8. WMS tables
    • [SER] ps_services : This table stores the service-related parameters, like contact information, service identification, capabilities, etc. Note that only one service can be defined.
    • [DIC] ps_layers : Stores information for the layers of a WMS service, like CRS, bounding box, resolution, etc.
    • [n:m] ps_servicelayer : Pairs services and layers, numbering the layers in the sequence in which they should be displayed.
    • [DIC] ps_styles : A layer can have N styles, defined and linked to the layer in this table.
    • [DIC] ps_pyramidLevels : A WMS layer is usually made up of a pyramid of images; this table links the layer with the corresponding collection in rasdaman, for each scale factor.
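
The cascading delete described for ps_coverage in the first package can be tried out with a toy schema. The sketch below is a heavily simplified stand-in: it uses SQLite instead of PostgreSQL, invents the column layout (only the table names and the id / coverage_id naming follow the conventions above), and assumes the cascade is expressed through ON DELETE CASCADE foreign keys, whereas the real schema may also rely on triggers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to

# Hypothetical, minimal versions of three petascopedb tables.
db.executescript("""
    CREATE TABLE ps_coverage (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE ps_domain_set (
        coverage_id INTEGER PRIMARY KEY
            REFERENCES ps_coverage(id) ON DELETE CASCADE
    );
    CREATE TABLE ps_range_set (
        id          INTEGER PRIMARY KEY,
        coverage_id INTEGER NOT NULL
            REFERENCES ps_coverage(id) ON DELETE CASCADE
    );
""")

db.execute("INSERT INTO ps_coverage (id, name) VALUES (1, 'eobstest')")
db.execute("INSERT INTO ps_domain_set (coverage_id) VALUES (1)")
db.execute("INSERT INTO ps_range_set (id, coverage_id) VALUES (1, 1)")

# Dropping the coverage: a single DELETE on the root table.
db.execute("DELETE FROM ps_coverage WHERE name = 'eobstest'")

remaining = {
    table: db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("ps_coverage", "ps_domain_set", "ps_range_set")
}
print(remaining)
```

After the DELETE, the dependent rows in the two child tables are gone as well, which mirrors the behavior described for the root table.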

Further comments on single tables and single columns can be found in this file. The whole set of SQL files used to create (and migrate) the database schema can be found in the same folder.
For a concrete example of metadata for a coverage, please run our systemtests, then check out the data for the inserted coverages, possibly with the help of our stored procedures.

SQL macros

A first set of SQL macros is available to users (and developers) to ease the analysis of coverage metadata in petascopedb (currently for gridded coverages only); they are explained below with examples.

getCrs('<coverage_name>')
Returns the ordered sequence of single CRS URIs which together constitute the (compound) native CRS of the coverage (in WCS responses, the URIs of the single CRSs are shown together in a single compound URI, def/crs-compound?):
petascopedb=# SELECT * FROM getCrs('eobstest');
 id |                                       uri                                        
----+----------------------------------------------------------------------------------
 11 | http://localhost:8080/def/crs/OGC/0/Temporal?epoch="1950-01-01T00:00:00"&uom="d"
 12 | http://localhost:8080/def/crs/EPSG/0/4326
(2 rows)
getDomainSet('<coverage_name>')
Returns the CRS coordinates of the grid origin and of every offset vector of the coverage, ordered by the order of the grid axes inside rasdaman:
petascopedb=# SELECT * FROM getDomainSet('eobstest');
 rasdaman_order | grid_origin | offset_vector 
----------------+-------------+---------------
              0 | {0,75.5,25} | {1,0,0}
              1 | {0,75.5,25} | {0,0,0.5}
              2 | {0,75.5,25} | {0,-0.5,0}
(3 rows)
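
To see how the origin and offset vectors returned by getDomainSet define the grid geometry, the following Python sketch computes the CRS coordinates of a grid point as the origin plus a weighted sum of the offset vectors. The function name is made up for this example. The weight of each axis is the grid index itself for regular axes; for irregular axes it is assumed to be the coefficient stored for that grid point (see ps_vector_coefficients), and the coefficient values used below are invented.

```python
def grid_point_to_crs(origin, offset_vectors, indexes, coefficients=None):
    """Map grid indexes (in rasdaman axis order) to coordinates in the native CRS.

    coefficients optionally maps an axis number to its list of coefficients,
    one per grid point, for axes with irregular spacing (assumption).
    """
    coefficients = coefficients or {}
    coords = list(origin)
    for axis, (vector, index) in enumerate(zip(offset_vectors, indexes)):
        # Regular axis: the weight is the index; irregular axis: the coefficient.
        weight = coefficients[axis][index] if axis in coefficients else index
        coords = [c + weight * v for c, v in zip(coords, vector)]
    return tuple(coords)


# Values taken from the getDomainSet('eobstest') output above.
origin = (0, 75.5, 25)
offset_vectors = [(1, 0, 0), (0, 0, 0.5), (0, -0.5, 0)]

regular_point = grid_point_to_crs(origin, offset_vectors, (2, 4, 1))
print(regular_point)

# Same grid, but pretending that axis 0 is irregular with made-up coefficients.
irregular_point = grid_point_to_crs(origin, offset_vectors, (2, 0, 0), {0: [0, 1, 5]})
print(irregular_point)
```

Note how the second and third offset vectors swap the last two CRS components: the order of the rows follows the rasdaman axis order, while the components of each vector follow the CRS axis order.
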
getRangeSet('<coverage_name>')
Returns the collection and the internal marray (its OID) associated with the coverage; optionally the rasdaman base type can be set:
petascopedb=# SELECT * FROM getRangeSet('eobstest');
 coverage name | collection name | collection OID | base_type 
---------------+-----------------+----------------+-----------
 eobstest      | eobstest        |         176641 | 
(1 row)
getRangeType('<coverage_name>')
Returns a description of principal SWE metadata for each component of the coverage's range (that is, each band or channel):
petascopedb=# SELECT * FROM getRangeType('rgb');
 component order | name  | SWE type |   data type   | UoM  | allowed interval(s) 
-----------------+-------+----------+---------------+------+---------------------
               0 | red   | Quantity | unsigned char | 10^0 | (0,255)
               1 | green | Quantity | unsigned char | 10^0 | (0,255)
               2 | blue  | Quantity | unsigned char | 10^0 | (0,255)
(3 rows)
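
The allowed intervals listed by getRangeType can be read as constraints on the values of a band. The Python sketch below shows one possible check; the function name is hypothetical, and it assumes that interval bounds are inclusive and that a quantity without intervals is unconstrained.

```python
def is_allowed(value, intervals):
    """True if value lies in at least one (min, max) interval.

    Assumptions: bounds are inclusive; an empty list means "no constraint".
    """
    return not intervals or any(lo <= value <= hi for lo, hi in intervals)


red_intervals = [(0, 255)]  # from the getRangeType('rgb') output above
print(is_allowed(128, red_intervals))
print(is_allowed(300, red_intervals))
print(is_allowed(-5, [(-10, 0), (10, 20)]))  # several allowed intervals
```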

Further macros will be developed, but please contact the mailing lists if you want to suggest/request a specific change to existing macros, or a new one.

Limitations

The current schema supports gridded and multipoint coverages. However, it is designed to be extensible to further types in the coverage hierarchy.

Concerning grid coverages, RectifiedGridCoverage types are supported as long as the offset vectors are aligned with CRS axes, i.e. they have a single non-zero component. ReferenceableGridCoverage types are supported as long as the grid axes are, again, aligned with CRS axes (and hence rectilinear). The domainSet of the latter is described via the concrete element ReferenceableGridByVectors: for the supported types of referenceable grids, this is indeed more compact than ReferenceableGridByArray elements. See the user guide for further details.
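
The alignment condition on offset vectors can be expressed as a small check. The Python sketch below is only illustrative and its function names are invented; besides the "single non-zero component" rule stated above, it also assumes that no two grid axes may run along the same CRS axis.

```python
def aligned_axis(offset_vector):
    """Return the position of the only non-zero component, or None if not aligned."""
    non_zero = [i for i, component in enumerate(offset_vector) if component != 0]
    return non_zero[0] if len(non_zero) == 1 else None


def is_supported_grid(offset_vectors):
    """True if every offset vector is aligned with a distinct CRS axis."""
    axes = [aligned_axis(vector) for vector in offset_vectors]
    # Extra assumption of this sketch: each CRS axis is used at most once.
    return None not in axes and len(set(axes)) == len(axes)


eobstest_vectors = [(1, 0, 0), (0, 0, 0.5), (0, -0.5, 0)]
rotated_vectors = [(0.5, 0.5), (-0.5, 0.5)]  # a grid rotated by 45 degrees

print(is_supported_grid(eobstest_vectors))
print(is_supported_grid(rotated_vectors))
```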

Finally, only SWE quantities are currently supported, although support for other kinds of SWE data fields (category, count, etc.) can easily be added. SWE metadata is partially configurable (#582) and is currently only partially supported by Petascope (#573).

WCPS grammar

In certain situations the grammar definition of the WCPS parser needs to be modified. ANTLR (ANother Tool for Language Recognition) has been adopted for this purpose; you can learn more about ANTLR grammars in the official documentation.

Here are the steps you should follow when working on a patch on the WCPS grammar:

  1. Edit the WCPS grammar (applications/petascope/src/main/java/petascope/wcps/grammar/wcps.g) and add a brief note on the applied changes in the header;
  2. Recompile the grammar so that the new parser and lexer are created:
    $ cd applications/petascope/src/main/java/petascope/wcps/grammar/
    $ java -cp ~/rasdaman/applications/petascope/lib/antlrworks-1.3.1.jar org.antlr.Tool wcps.g
    $ git status
    # On branch <devbranch>
    # Changes not staged for commit:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    #       modified:   wcps.g
    #       modified:   wcpsLexer.java
    #       modified:   wcpsParser.java
    
  3. Now you can stage the changes, check the system tests and proceed to the patch.

NOTE: never change wcpsParser.java or wcpsLexer.java directly.

Discussions

Tickets

Summary | Reporter | Owner | Ticket
Petascope_Error when creating the duplicate type for cell/mdd/set for a coverage to be reinserted after removing it | bphamhuu | bphamhuu | #1612
add correct lat/lon to pixel index transformation test | vmerticariu | bphamhuu | #1602
CONCAT should be supported in WCPS? | dmisev | | #1597
Petascope_Create correct base type with range fields when create collection | bphamhuu | bbell | #1342
WCS_CRS_Extension Support reproject on distort coverage. | bphamhuu | | #1316
Petascope_More meaniningful error from Rasql projection() | bphamhuu | | #1308
3D coverages in WMS | dmisev | dmisev | #1304
Rasql_WCPS_Support interpolation argument in project() and crsTransform() functions | bphamhuu | dmisev | #1303
Petascope ODMG - Wrong return from single boolean value in Rasql query | bphamhuu | | #1273
WCST Import should write the result of the ingestion to a log file | mdumitru | bphamhuu | #1143
WCS-T doesn't support CInt16 | dmisev | dmisev | #1095
Inserting slices in middle of existing irregular timeseries | dmisev | vmerticariu | #936
Move coverage's grid origin when domain is extended | vmerticariu | vmerticariu | #913
Customizable handling of sample size of a coverage point | pcampalani | vmerticariu | #680
Support for gml:CompoundCRS | pcampalani | dmisev | #679
WCPS1.5_WCPS Interval expression to actually support mathematical expressions | pcampalani | vmerticariu | #596
Petascope streaming results | dmisev | dmisev | #268
