Version 14 (modified by Dimitar Misev, 9 years ago)

Support for decoding netCDF in rasdaman

netCDF format support (decode/encode) is not complete enough in rasdaman. In particular, decode and encode are not mapped 1:1: the netCDF output exports only a single array variable and ignores dimension variables and any additional metadata.

The metadata and dimension variables can be preserved on netCDF export in conjunction with WCSTImport and petascope.

Example

The netCDF file below is used as an example.

netcdf tos_O1_2001-2002 {
dimensions:
	lon = 180 ;
	lat = 170 ;
	time = UNLIMITED ; // (24 currently)
	bnds = 2 ;
variables:
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
		lon:bounds = "lon_bnds" ;
		lon:original_units = "degrees_east" ;
	double lon_bnds(lon, bnds) ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
		lat:bounds = "lat_bnds" ;
		lat:original_units = "degrees_north" ;
	double lat_bnds(lat, bnds) ;
	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 2001-1-1" ;
		time:axis = "T" ;
		time:calendar = "360_day" ;
		time:bounds = "time_bnds" ;
		time:original_units = "seconds since 2001-1-1" ;
	double time_bnds(time, bnds) ;
	float tos(time, lat, lon) ;
		tos:standard_name = "sea_surface_temperature" ;
		tos:long_name = "Sea Surface Temperature" ;
		tos:units = "K" ;
		tos:cell_methods = "time: mean (interval: 30 minutes)" ;
		tos:_FillValue = 1.e+20f ;
		tos:missing_value = 1.e+20f ;
		tos:original_name = "sosstsst" ;
		tos:original_units = "degC" ;
		tos:history = " At   16:37:23 on 01/11/2005: CMOR altered the data in the following ways: added 2.73150E+02 to yield output units;  Cyclical dimension was output starting at a different lon;" ;

// global attributes:
		:title = "IPSL  model output prepared for IPCC Fourth Assessment SRES A2 experiment" ;
		:institution = "IPSL (Institut Pierre Simon Laplace, Paris, France)" ;
		:source = "IPSL-CM4_v1 (2003) : atmosphere : LMDZ (IPSL-CM4_IPCC, 96x71x19) ; ocean ORCA2 (ipsl_cm4_v1_8, 2x2L31); sea ice LIM (ipsl_cm4_v" ;
		:contact = "Sebastien Denvil, sebastien.denvil@ipsl.jussieu.fr" ;
		:project_id = "IPCC Fourth Assessment" ;
		:table_id = "Table O1 (13 November 2004)" ;
		:experiment_id = "SRES A2 experiment" ;
		:realization = 1 ;
		:cmor_version = 0.96f ;
		:Conventions = "CF-1.0" ;
		:history = "YYYY/MM/JJ: data generated; YYYY/MM/JJ+1 data transformed  At 16:37:23 on 01/11/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements" ;
		:references = "Dufresne et al, Journal of Climate, 2015, vol XX, p 136" ;
		:comment = "Test drive" ;
}

To ingest into rasdaman:

$ rasql -q 'create collection test_nc FloatSet3' --user rasadmin --passwd rasadmin
$ rasql -q 'insert into test_nc values decode($1, "netCDF", "vars=tos")' -f tos_O1_2001-2002.nc --user rasadmin --passwd rasadmin

This creates a 3D float array of size:

$ rasql -q 'select sdom(c) from test_nc as c' --out string --quiet
[0:23,0:169,0:179]
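
The reported domain follows from the dimension sizes of tos(time, lat, lon) in the file: an axis of length n becomes the interval 0:n-1. A minimal Python sketch of that mapping (the helper name sdom_from_sizes is made up for illustration and is not part of rasdaman):

```python
def sdom_from_sizes(sizes):
    """Format zero-based, inclusive intervals for the given axis lengths."""
    return "[" + ",".join(f"0:{n - 1}" for n in sizes) + "]"


# time = 24, lat = 170, lon = 180, in the order tos declares its dimensions
print(sdom_from_sizes([24, 170, 180]))  # [0:23,0:169,0:179]
```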

Exporting the array with

$ rasql -q 'select encode(c, "netCDF") from test_nc as c' --out file

results in a file

netcdf rasql_1 {
dimensions:
	dim_0 = 24 ;
	dim_1 = 170 ;
	dim_2 = 180 ;
variables:
	float data(dim_0, dim_1, dim_2) ;
		data :missing_value = "NaNf" ;

// global attributes:
		:Conventions = "CF-1.4" ;
		:Institution = "rasdaman.org, Jacobs University Bremen" ;
}

rasql output data (first 540 values):

  1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20,
  1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    275.8637, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20,
  1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 275.6657, 275.8596, 276.042, 276.1425, 276.1662, 276.1198, 
    276.0396, 275.9248, 275.787, 275.686, 275.5566, 275.3785, 275.1796, 
    274.9005, 274.5688, 274.108, 273.39, 272.6334, 272.0998, 271.3627, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 272.4681, 271.1732, 271.1732, 
    271.1732, 271.1736, 271.1736, 271.5586, 272.0218, 272.4012, 272.5677, 
    272.4937, 272.2816, 271.9929, 271.6465, 271.3219, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 1e+20, 
    1e+20, 1e+20, 1e+20,
    ....

original data:

  _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _,
  _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 275.8637, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _,
  _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, 275.6657, 275.8596, 276.042, 276.1425, 
    276.1662, 276.1198, 276.0396, 275.9248, 275.787, 275.686, 275.5566, 
    275.3785, 275.1796, 274.9005, 274.5688, 274.108, 273.39, 272.6334, 
    272.0998, 271.3627, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, 272.4681, 271.1732, 271.1732, 271.1732, 271.1736, 
    271.1736, 271.5586, 272.0218, 272.4012, 272.5677, 272.4937, 272.2816, 
    271.9929, 271.6465, 271.3219, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _,

The values are identically ordered, so there is no issue with the data export.
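
Such a comparison can also be automated. The sketch below is only an illustration in Python: it assumes both dumps are available as comma-separated text and that "_" in the ncdump output stands for the fill value 1e+20 of tos; the function name parse_dump and the short sample strings are hypothetical.

```python
FILL_VALUE = 1e20  # tos:_FillValue in the original file


def parse_dump(text, fill=FILL_VALUE):
    """Parse comma-separated values, mapping ncdump's "_" to the fill value."""
    values = []
    for token in text.replace("\n", " ").split(","):
        token = token.strip()
        if not token:
            continue  # skip the empty token after a trailing comma
        values.append(fill if token == "_" else float(token))
    return values


# Short excerpts in the style of the two listings above
rasql_dump = "1e+20, 1e+20, 275.6657, 275.8596, 276.042, 1e+20,"
ncdump_dump = "_, _, 275.6657, 275.8596, 276.042, _,"

same = parse_dump(rasql_dump) == parse_dump(ncdump_dump)
print(same)
```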

Problems

  • X and Y are generally transposed in netCDF (this order is recommended by CF but not enforced), so unless they are transposed during import/export, the data is rotated while in rasdaman; this is apparent when exporting to PNG, for example
  • the nodata value is not properly set: the exported file declares missing_value = "NaNf", although the data uses 1e+20
  • all metadata is lost
  • dimension and data variable names are lost
  • dimension variables are lost
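
To make the first point concrete, the operation in question is a swap of the lat and lon axes of each time slice. The Python sketch below shows only that operation on a tiny stand-in grid; whether rasdaman should apply it on decode, on encode, or on both is left open here, and the helper name transpose is an assumption for illustration.

```python
def transpose(grid):
    """Swap the two axes of a 2D grid given as a list of rows."""
    return [list(column) for column in zip(*grid)]


# A tiny 2 x 3 stand-in for one time slice, indexed [lat][lon]
lat_lon = [
    [1, 2, 3],
    [4, 5, 6],
]
lon_lat = transpose(lat_lon)  # now indexed [lon][lat]
print(lon_lat)  # [[1, 4], [2, 5], [3, 6]]
```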

Metadata to be preserved

The following needs to be preserved in addition to the variable data (of variable tos, for example):

  • global metadata:
    // global attributes:
    		:title = "IPSL  model output prepared for IPCC Fourth Assessment SRES A2 experiment" ;
    		:institution = "IPSL (Institut Pierre Simon Laplace, Paris, France)" ;
    		:source = "IPSL-CM4_v1 (2003) : atmosphere : LMDZ (IPSL-CM4_IPCC, 96x71x19) ; ocean ORCA2 (ipsl_cm4_v1_8, 2x2L31); sea ice LIM (ipsl_cm4_v" ;
    		:contact = "Sebastien Denvil, sebastien.denvil@ipsl.jussieu.fr" ;
    		:project_id = "IPCC Fourth Assessment" ;
    		:table_id = "Table O1 (13 November 2004)" ;
    		:experiment_id = "SRES A2 experiment" ;
    		:realization = 1 ;
    		:cmor_version = 0.96f ;
    		:Conventions = "CF-1.0" ;
    		:history = "YYYY/MM/JJ: data generated; YYYY/MM/JJ+1 data transformed  At 16:37:23 on 01/11/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements" ;
    		:references = "Dufresne et al, Journal of Climate, 2015, vol XX, p 136" ;
    		:comment = "Test drive" ;
    
  • variable metadata
    		tos:standard_name = "sea_surface_temperature" ;
    		tos:long_name = "Sea Surface Temperature" ;
    		tos:units = "K" ;
    		tos:cell_methods = "time: mean (interval: 30 minutes)" ;
    		tos:_FillValue = 1.e+20f ;
    		tos:missing_value = 1.e+20f ;
    		tos:original_name = "sosstsst" ;
    		tos:original_units = "degC" ;
    		tos:history = " At   16:37:23 on 01/11/2005: CMOR altered the data in the following ways: added 2.73150E+02 to yield output units;  Cyclical dimension was output starting at a different lon;" ;
    
  • 1D dimension variables of the dimensions of tos (time, lat, lon in the example), and their associated metadata:
    	double lon(lon) ;
    		lon:standard_name = "longitude" ;
    		lon:long_name = "longitude" ;
    		lon:units = "degrees_east" ;
    		lon:axis = "X" ;
    		lon:bounds = "lon_bnds" ;
    		lon:original_units = "degrees_east" ;
    	double lat(lat) ;
    		lat:standard_name = "latitude" ;
    		lat:long_name = "latitude" ;
    		lat:units = "degrees_north" ;
    		lat:axis = "Y" ;
    		lat:bounds = "lat_bnds" ;
    		lat:original_units = "degrees_north" ;
    	double time(time) ;
    		time:standard_name = "time" ;
    		time:long_name = "time" ;
    		time:units = "days since 2001-1-1" ;
    		time:axis = "T" ;
    		time:calendar = "360_day" ;
    		time:bounds = "time_bnds" ;
    		time:original_units = "seconds since 2001-1-1" ;
    
  • data variable and dimension names; right now generic names (data, dim_0, dim_1, dim_2) are used
  • further variables that look like dimension variables (the bounds variables referenced by the bounds attributes)?
    	double lon_bnds(lon, bnds) ;
    	double lat_bnds(lat, bnds) ;
    	double time_bnds(time, bnds) ;
    

This information should be modeled in JSON format and passed as format parameters to the encode function.
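
As a rough illustration of how the attributes listed above could be collected for such a JSON document, the Python sketch below parses CDL attribute lines (as printed by ncdump) into a dictionary keyed by variable name. It is not an existing rasdaman tool: the function name attributes_to_dict and the use of "" as the key for global attributes are assumptions, and numeric values such as 1.e+20f are kept as raw strings rather than converted.

```python
import json
import re

# Matches CDL attribute lines such as:  tos:units = "K" ;
ATTRIBUTE = re.compile(r'^\s*(\w*):(\w+)\s*=\s*(.+?)\s*;\s*$')


def attributes_to_dict(cdl_text):
    """Group attributes by variable name; "" collects global attributes."""
    result = {}
    for line in cdl_text.splitlines():
        match = ATTRIBUTE.match(line)
        if not match:
            continue  # dimension or variable declarations, comments, blanks
        variable, name, raw = match.groups()
        # Strip the surrounding quotes of string values; leave others as-is
        value = raw[1:-1] if raw.startswith('"') and raw.endswith('"') else raw
        result.setdefault(variable, {})[name] = value
    return result


cdl = '''
    tos:standard_name = "sea_surface_temperature" ;
    tos:units = "K" ;
    lon:axis = "X" ;
    :Conventions = "CF-1.0" ;
'''
metadata = attributes_to_dict(cdl)
print(json.dumps(metadata, indent=2))
```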

encode format parameters

Worth looking at: https://github.com/jllodra/ncdump-json

{
  "dimensions": [ "time", "lat", "lon" ], // dimension names

  "variables": {                          // each variable has metadata and data, except the one for the array in rasdaman, which has only metadata

    "time": {
      "type": "double",

      "metadata": {
         "standard_name": "time",
         "long_name": "time",
         "units": "days since 2001-1-1",
         "axis": "T",
         "calendar": "360_day",
         "bounds": "time_bnds",
         "original_units": "seconds since 2001-1-1"
       },

       "data": [ 1,2,3,4,5,6,... ]
    },

    "lat": {
       ...
    },

    "lon": {
       ...
    },

    "tos": {
      "type": "float",

      "metadata": {
         "standard_name": "sea_surface_temperature",
         "long_name": "Sea Surface Temperature",
         "units": "K",
         "cell_methods": "time: mean (interval: 30 minutes)",
         "_FillValue": 1e+20,
         "missing_value": 1e+20,
         "original_name": "sosstsst",
         "original_units": "degC",
         "history": " At   16:37:23 on 01/11/2005: CMOR altered the data in the following ways: added 2.73150E+02 to yield output units;  Cyclical dimension was output starting at a different lon;"
       }

       // no "data" field, indicating this is the array in rasdaman
    }

  },

  "global": {
    "metadata": {
      ...
    }
  }
}
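
The outline above contains comments and elisions, so it cannot be fed to a JSON parser as is. One way to obtain a strict-JSON counterpart is to build the structure in a script and serialize it. The Python sketch below does that for a shortened version of the example; it is not a confirmed interface. The assumptions are that the JSON is passed as a third string argument to encode, that inner double quotes are escaped with backslashes, and that named variables are held in an object. The time values are placeholders, and tos is given as float because that is how the example file declares it.

```python
import json

format_parameters = {
    "dimensions": ["time", "lat", "lon"],
    "variables": {
        "time": {
            "type": "double",
            "metadata": {"units": "days since 2001-1-1", "axis": "T"},
            "data": [0, 1, 2],  # placeholder values, not taken from the file
        },
        "tos": {
            "type": "float",
            "metadata": {"units": "K", "_FillValue": 1e20},
            # no "data" key: this variable is the array stored in rasdaman
        },
    },
    "global": {"metadata": {"Conventions": "CF-1.0"}},
}

params_json = json.dumps(format_parameters)

# Assumption: encode() takes the JSON as a third, string argument and inner
# double quotes have to be escaped with backslashes.
escaped = params_json.replace("\\", "\\\\").replace('"', '\\"')
query = f'select encode(c, "netCDF", "{escaped}") from test_nc as c'
print(query)
```

Serializing with json.dumps avoids hand-written syntax slips such as a missing comma between variables or a trailing comma after the last one.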