Opened 4 years ago

Closed 3 years ago

#700 closed defect (fixed)

WCS tupleList is in column-major whereas it should be in row-major

Reported by: dmisev
Owned by: vliaukevich
Priority: major
Milestone: 9.0.x
Component: petascope
Version: development
Keywords: range domain function
Cc: pbaumann, adumitru, joosthoek, abeccati
Complexity: Medium

Description

The csv() output of rasdaman is in column-major order and, as far as I know, is copied verbatim into the WCS GML output, which expects row-major order.

Here's a script that converts the csv output of mr to PNG. Reading the csv as column-major gives the expected image; reading it as row-major gives a scrambled one.

#!/usr/bin/python

import os

# Dimensions of the mr array
ncols = 256
nrows = 211

# ASCII grid header read by gdal_translate
header = '''ncols %s
nrows %s
xllcorner %s
yllcorner %s
cellsize %s
''' % (ncols, nrows, 0, 0, 1)

# The query result is written to /tmp/rasql_1.csv
os.system("cd /tmp && rasql -q 'select csv(c) from mr as c' --out file")

# Strip the braces to get a flat list of cell values
with open("/tmp/rasql_1.csv", "r") as f:
    tupleList = f.readline().strip().replace("{", "").replace("}", "").split(",")

with open("/tmp/data_rasql.asc", "w") as output:
    output.write(header)
    for i in range(nrows):
        for j in range(ncols):
            # column-major input: column j starts at offset j * nrows
            output.write(tupleList[i + (j * nrows)] + " ")
        output.write("\n")

os.system("gdal_translate -of PNG -ot Byte /tmp/data_rasql.asc /tmp/out.png")
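The index arithmetic used in the script can be tried on a tiny array without rasdaman or GDAL. This is only a sketch: the helper name column_major_to_rows and the 2x3 sample data are made up for illustration.

```python
def column_major_to_rows(values, nrows, ncols):
    """Regroup a flat column-major list into a list of rows."""
    if len(values) != nrows * ncols:
        raise ValueError("expected %d values, got %d" % (nrows * ncols, len(values)))
    # In column-major order, column j occupies values[j*nrows : (j+1)*nrows],
    # so cell (i, j) sits at offset i + j * nrows (same formula as the script).
    return [[values[i + j * nrows] for j in range(ncols)] for i in range(nrows)]


# Hypothetical 2x3 image:  a b c
#                          d e f
# stored column by column.
flat = ["a", "d", "b", "e", "c", "f"]
rows = column_major_to_rows(flat, nrows=2, ncols=3)
for row in rows:
    print(" ".join(row))
```

Reading the same flat list as if it were row-major would instead yield the rows "a d b" and "e c f", which is the kind of scrambling described above.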

Change History (18)

comment:1 Changed 4 years ago by dmisev

We have two options for fixing this

  1. parse the CSV output and reorder in petascope
  2. or we can add a parameter to csv() so that it can output in row-major order, e.g.
    select csv(c, "order=rowmajor") from mr as c
    

I'm more in favor of 2: it will be faster, and we may need it outside of petascope as well.

comment:2 Changed 4 years ago by dmisev

A third option is to use

rasql -q 'select encode(c, "AAIGrid") from mr as c' --out file

however this only works for 2D.

comment:3 Changed 4 years ago by pcampalani

Option 1 is suicidal; I prefer the second option.

Another alternative is for Petascope to specify a gml:coverageFunction that follows the sequence rule of the CSV output, e.g. for 3D:

<gmlrgrid:sequenceRule axisOrder="+3 +2 +1">Linear</gmlrgrid:sequenceRule>

See "Linear" sequence rule in:
http://rasdaman.org/wiki/PetascopeSubsets

I would stick to the default sequence rule and add the rowmajor option in CSV anyway.

comment:4 Changed 4 years ago by dmisev

  • Owner changed from pcampalani to vliaukevich
  • Status changed from new to assigned

OK, reassigning to Veranika, as she made the most recent changes to the CSV converter.

To recap, csv() should allow a parameter format=rowmajor:

select csv(c, "format=rowmajor") from mr as c

The default output should still be column-major however.

Later on perhaps we could also add format=gml, so that instead of braces a space is printed.

comment:5 Changed 4 years ago by pcampalani

  • Cc abeccati added
  • Keywords range domain function added

I would call the parameter sequencerule or gridfunction instead of format.
As possible values I would avoid row/column terms and rather use inner_outer ('+1 +2 __ +N-1 +N') and outer_inner ('+N +N-1 __ +2 +1', which is what is currently done).

To complete the picture, a startpoint parameter could also be added, but I would leave this until it is really needed.

Regarding default values, I would make them the GML way: starting point is sdom.lo for every dimension (current implementation), and inner_outer as the default listing order.

But this might break backward compatibility, so I guess outer_inner should be kept as default.

Again, to keep backward compatibility, I believe Petascope should also not change the order of coordinates in the tuple list (range values); is that right?
So we could add the gml:coverageFunction in our templates:

<gml:coverageFunction>
  <gml:GridFunction>
    <gml:sequenceRule axisOrder="+N +N-1 __ +2 +1">Linear</gml:sequenceRule>
    <gml:startPoint>0 0 __ 0 0</gml:startPoint>
  </gml:GridFunction>
</gml:coverageFunction>

This can be implemented relatively quickly, but I would like to have the go-ahead before tackling it.
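A rough sketch of how such a template could be filled in for an N-dimensional grid. The function names axis_order and coverage_function are hypothetical and this is not Petascope's actual code; the element names and the all-zero start point are simply copied from the template above.

```python
def axis_order(ndims, outer_inner=True):
    """Build the axisOrder string for an N-dimensional grid."""
    axes = range(ndims, 0, -1) if outer_inner else range(1, ndims + 1)
    return " ".join("+%d" % axis for axis in axes)


def coverage_function(ndims, start_point=None):
    """Render a gml:coverageFunction fragment declaring outer_inner order."""
    if start_point is None:
        start_point = [0] * ndims  # assumption: grid origin, as in the template
    if len(start_point) != ndims:
        raise ValueError("start_point must have one coordinate per dimension")
    return (
        "<gml:coverageFunction>\n"
        "  <gml:GridFunction>\n"
        '    <gml:sequenceRule axisOrder="%s">Linear</gml:sequenceRule>\n'
        "    <gml:startPoint>%s</gml:startPoint>\n"
        "  </gml:GridFunction>\n"
        "</gml:coverageFunction>"
    ) % (axis_order(ndims), " ".join(str(c) for c in start_point))


print(coverage_function(3))
```

For three dimensions this produces the same axisOrder ("+3 +2 +1") as the 3D example given in comment:3.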

comment:6 Changed 4 years ago by pcampalani

Refs: GML 3.2.1 (OGC 07-036), Sections 19.3.11, 19.3.12, 19.3.13, 19.3.14.

comment:7 Changed 4 years ago by vliaukevich

I also prefer the terms "inner_outer" and "outer_inner" over "row_major" and "column_major", as the latter depend on which dimension is taken to describe rows and which to describe columns. I am used to the convention in which the first dimension describes rows, so [0:2, 0:1] is a table with 3 rows and 2 columns; under that convention the current CSV encoder already returns the array in row-major order.
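To make the two terms concrete, here is a small sketch that lists the cells of the [0:2, 0:1] domain in both orders. It assumes, as the thread implies, that outer_inner means the last axis varies fastest and inner_outer means the first axis varies fastest; the variable names are illustrative only.

```python
from itertools import product

# Domain [0:2, 0:1]: the first axis has 3 points, the second has 2.
axis1 = range(0, 3)
axis2 = range(0, 2)

# outer_inner: the last axis varies fastest (first axis is the outer loop).
outer_inner = [(i, j) for i, j in product(axis1, axis2)]

# inner_outer: the first axis varies fastest (last axis is the outer loop).
inner_outer = [(i, j) for j, i in product(axis2, axis1)]

print("outer_inner:", outer_inner)
print("inner_outer:", inner_outer)
```

Neither listing mentions rows or columns, which is the point of the naming: the same sequence is "row-major" or "column-major" depending on which axis one chooses to call the row axis.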

comment:8 Changed 4 years ago by vliaukevich

The patch was committed; you can now add the option "order=inner_outer" to the csv converter. The corresponding changes to the QL Guide from Peter are coming.

comment:9 Changed 4 years ago by pbaumann

Caveat: in 9.0.1, this is only available with csv() and inv_csv(), not yet with encode(c,"csv").

comment:10 follow-up: Changed 4 years ago by pcampalani

While this new feature should not be used by Petascope for backcompatibility (define coverage function instead, and keep outer_inner order), I suggest to exploit it somehow when encoding to some binary format.

$ grep row-major qlparser/qtencode.cc
// for all bands, convert data from column-major form (from Rasdaman) to row-major form (GDAL)

comment:11 in reply to: ↑ 10 Changed 4 years ago by dmisev

Replying to pcampalani:

While this new feature should not be used by Petascope for backcompatibility (define coverage function instead, and keep outer_inner order), I suggest to exploit it somehow when encoding to some binary format.

$ grep row-major qlparser/qtencode.cc
// for all bands, convert data from column-major form (from Rasdaman) to row-major form (GDAL)

That's something else, I don't see how this ticket is related.

comment:12 Changed 4 years ago by pcampalani

I thought this option was rooted in the way data is extracted from rasdaman (not CSV only), so that the column-major/row-major transformations could be avoided, which might mean a good performance gain. Anyway, I may be misunderstanding the internal mechanics here, so spawn another ticket if that makes sense; otherwise just disregard my comments.

comment:13 Changed 4 years ago by pbaumann

IMHO it is good to have control at both levels - rasql clients requesting CSV may need this as well.

comment:14 Changed 4 years ago by pbaumann

Storage is done by rasdaman in such a way that the arrays arrive natively in the C++ engine. They also need to be delivered this way to the client to keep the promise that "you can readily iterate over the array with your C++ client code". So we cannot change that, but we can indeed change how data is provided for various purposes -> use encode() parameters, as we do.

comment:15 Changed 4 years ago by vliaukevich

Yes, the outer_inner (default) order uses the native internal array layout, so it is the preferred choice: it avoids random reads from memory and gives the best performance.
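The access-pattern argument can be sketched by listing which offsets of the native buffer each output order touches. This assumes a dense buffer in which the last axis is contiguous (the outer_inner layout as described in this thread); native_offsets is a hypothetical helper, not part of rasdaman.

```python
from itertools import product


def native_offsets(shape, order):
    """Offsets into a dense last-axis-fastest buffer, in the requested output order."""
    # Stride of each axis in the native (outer_inner) layout.
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]

    if order == "outer_inner":
        points = product(*[range(n) for n in shape])
    elif order == "inner_outer":
        # Iterate with the first axis fastest, then flip each point back.
        points = (p[::-1] for p in product(*[range(n) for n in reversed(shape)]))
    else:
        raise ValueError("unknown order: %r" % order)
    return [sum(c * s for c, s in zip(p, strides)) for p in points]


print(native_offsets((3, 2), "outer_inner"))  # sequential reads
print(native_offsets((3, 2), "inner_outer"))  # strided reads
```

With outer_inner the offsets simply count upwards, while inner_outer jumps through the buffer by the stride of the first axis, which is where the extra cost on large arrays would come from.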

comment:16 Changed 3 years ago by vliaukevich

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:17 Changed 3 years ago by pcampalani

  • Resolution fixed deleted
  • Status changed from closed to reopened

Re-opening: the WCS output is not fixed, see my pending patch. A gml:coverageFunction must be specified to fix the WCS output.
Veranika, I believe you patched the RasQL CSV encoder, right?

PS: always refer to the associated changeset when you resolve a ticket, thanks.

comment:18 Changed 3 years ago by pcampalani

  • Resolution set to fixed
  • Status changed from reopened to closed

Coverage function added for gridded coverages in changeset:82f7b71.
Now the declared mapping from domain points to rangeset values is correct (outer-inner).
