Opened 3 years ago

Closed 18 months ago

#772 closed enhancement (fixed)

document inv_csv function

Reported by: mase
Owned by: pbaumann
Priority: major
Milestone: 9.2
Component: manuals_and_examples
Version: 9.0.0
Keywords:
Cc: jpass, pbaumann
Complexity: Medium

Description (last modified by dmisev)

Implement an inv_csv function for reading nD CSV files.

The function is not strictly limited to CSV: the exact formatting of the input matters little, which makes it very flexible.
Numbers from the input file are read in order of appearance and stored in rasdaman without any reordering;
whitespace and the following characters are ignored: '{', '}', ',', '"', '\'', '(', ')', '[', ']'
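
The reading rule above can be illustrated with a minimal Python sketch. It is not the rasdaman implementation (which, per this ticket, lives in conversion/csv.cc); the name read_values and the int/float handling are assumptions made for illustration only.

```python
import re

# Characters the description lists as ignored, in addition to whitespace.
IGNORED = "{},\"'()[]"

def read_values(text):
    """Return the numbers in text in order of appearance (hypothetical helper)."""
    # Turn every ignored character into a blank, then split on whitespace.
    table = str.maketrans({ch: " " for ch in IGNORED})
    tokens = text.translate(table).split()
    # Assumption: integer-looking tokens become int, everything else float.
    return [int(t) if re.fullmatch(r"[+-]?\d+", t) else float(t) for t in tokens]

print(read_values("{1,2,3},{2,1,3}"))   # [1, 2, 3, 2, 1, 3]
print(read_values("1 2 3\n2 1 3"))      # [1, 2, 3, 2, 1, 3]
```

Under this rule, braces, commas and line breaks all act as plain separators, so differently formatted inputs yield the same sequence of values.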

Mandatory extra parameters:

  • domain - minterval, e.g. [1:5,0:10,2:3]
    • the number of cells in the domain has to match the number of cells read from the input file
  • basetype - array base type, e.g. long, char, etc.
    • struct types have to be specified fully, e.g. struct { char red, char blue, char green }
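
As a rough sketch of how the two parameters relate to the input, the following Python snippet splits a parameter string of the form shown in this ticket and computes how many cells the domain covers. The helper names (parse_params, cell_count, components) are hypothetical, and the assumption that a struct cell consumes one value per component is an illustration, not a statement about the actual implementation.

```python
import re
from math import prod

def parse_params(params):
    """Split 'domain=[...];basetype=...' into (bounds, basetype)."""
    fields = dict(part.split("=", 1) for part in params.split(";"))
    axes = fields["domain"].strip("[] ").split(",")
    # Each axis is 'lo:hi' with inclusive bounds.
    bounds = [tuple(int(b) for b in axis.split(":")) for axis in axes]
    return bounds, fields["basetype"].strip()

def cell_count(bounds):
    """Number of cells covered by an minterval with inclusive bounds."""
    return prod(hi - lo + 1 for lo, hi in bounds)

def components(basetype):
    """Values per cell: number of struct fields, or 1 for a primitive type."""
    m = re.fullmatch(r"struct\s*\{(.*)\}", basetype)
    return len(m.group(1).split(",")) if m else 1

bounds, basetype = parse_params(
    "domain=[0:0,0:1];basetype=struct {char red, char blue, char green}")
# Assumption: the input must hold cells * components numbers in total.
print(cell_count(bounds), components(basetype))   # 2 3
```

A caller could compare cell_count(bounds) * components(basetype) against the number of values read from the file and reject the input on a mismatch.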

Example

A is a 2x3 array of longs:

1,2,3,2,1,3

Inserting A can be done with

insert into A values inv_csv($1, "domain=[0:1,0:2];basetype=long")

B is a 1x2 array of RGB values

{1,2,3},{2,1,3}

Inserting B can be done with

insert into B values inv_csv($1, "domain=[0:0,0:1];basetype=struct {char red, char blue, char green}")

B could just as well be formatted like this with the same effect:

1 2 3
2 1 3

Change History (21)

comment:1 Changed 3 years ago by dmisev

  • Cc pbaumann added
  • Component changed from undecided to conversion

inv_csv is a mistake in the QL guide.

comment:2 Changed 3 years ago by dmisev

  • Component changed from conversion to manuals_and_examples
  • Owner changed from dmisev to pbaumann
  • Status changed from new to assigned

Reassigning to Peter for fixing the manual.

comment:3 Changed 3 years ago by pbaumann

done.

comment:4 Changed 3 years ago by dmisev

  • Owner changed from pbaumann to uadhikari

Proposal

Implement an inv_csv conversion function that will read a plaintext csv-like representation of an array.

Problem

How to encode the domain/type of the array in the plaintext file?

The bounding box can be encoded with parentheses or other markers, as is done in the csv function. There is no option for representing the type, however.

Solution

Encode domain/type with the extra params of inv_csv. This makes it possible to drop the parentheses from the input file and have just comma-separated values (proper csv encoding).

Rules for the csv encoding:

  • single values are separated by comma
  • composite values are wrapped in braces, within which single values are separated by commas
  • white space is ignored

Extra params:

  • domain - minterval, e.g. [1:5,0:10,2:3]
  • basetype - array base type, e.g. RGBPixel, long, char, etc.

Example

A is a 2x3 array of longs:

1,2,3,2,1,3

Inserting A can be done with

insert into A values inv_csv($1, "domain=[0:1,0:2];basetype=long")

B is a 1x2 array of RGB values

{1,2,3},{2,1,3}

Inserting B can be done with

insert into B values inv_csv($1, "domain=[0:0,0:1];basetype=RGBPixel")

Implementation

In source:conversion/csv.cc the convertFrom() function should be implemented. tiff.cc would provide a good example for the implementation.

Appropriate tests should be provided in source:systemtest/testcases_mandatory/test_conversion/test.sh

comment:5 Changed 3 years ago by dmisev

  • Component changed from manuals_and_examples to conversion
  • Milestone set to 9.1

comment:6 Changed 2 years ago by dmisev

  • Owner changed from uadhikari to vzamfir

comment:7 Changed 2 years ago by dmisev

  • Milestone changed from 9.1 to 9.2

comment:8 Changed 2 years ago by dmisev

  • Description modified (diff)

comment:9 Changed 2 years ago by dmisev

  • Description modified (diff)

comment:10 Changed 2 years ago by dmisev

  • Description modified (diff)
  • Type changed from defect to enhancement

comment:11 Changed 2 years ago by pbaumann

looks good, but it is not exactly CSV ;-)

Goal should be that exported CSV can be imported again within a rasdaman ecosystem. Ideally with other tools as well, but that's a nightmare anyway, see https://en.wikipedia.org/wiki/Comma-separated_values.

Therefore, 2 friendly amendments:

  • make mandatory extra parameters optional and assume suitable defaults (e.g. the "widest" data type as cell type)
  • to this end, don't ignore nested {...}, but use them for recognizing domains (and throw an exception if the number of elements in some extent does not match with its neighbours)
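
The second amendment could look roughly like the Python sketch below. It assumes, purely for illustration, that every dimension is wrapped in its own pair of braces, as in {{1,2,3},{2,1,3}}; the function name infer_shape is hypothetical and not part of rasdaman.

```python
import json

def infer_shape(text):
    """Derive the extents from nested braces; raise ValueError if ragged."""
    # Braces map one-to-one onto JSON brackets, so the JSON parser can be reused.
    nested = json.loads(text.translate(str.maketrans("{}", "[]")))

    def shape(node):
        if not isinstance(node, list):
            return ()                      # a single value has no extents
        shapes = {shape(child) for child in node}
        if len(shapes) > 1:
            raise ValueError("extents of neighbouring elements differ")
        return (len(node),) + (shapes.pop() if shapes else ())

    return shape(nested)

print(infer_shape("{{1,2,3},{2,1,3}}"))   # (2, 3)
```

In this sketch a ragged input such as {{1,2,3},{2,1}} raises ValueError. Note that the sketch cannot tell whether an innermost brace pair denotes a dimension or a composite cell value.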

comment:12 Changed 2 years ago by dmisev

I'm pretty sure inv_csv properly supports any CSV format.

What you are proposing is too error-prone and difficult to get right, though.
I'd rather have flexible format support at the expense of having to specify the domain and base type (and a very simple implementation as well).

comment:13 Changed 2 years ago by pbaumann

ok, convinced after giving it a second thought.

comment:14 Changed 2 years ago by pbaumann

ok, it's in - is there anything else I need to know for documenting it in the QL guide?

comment:15 Changed 2 years ago by dmisev

I think no, besides what's in the description of this ticket.

comment:16 Changed 20 months ago by dmisev

  • Owner changed from vzamfir to pbaumann

Reminder to document this, the QL guide still says "Note that inv_csv() is not implemented currently."

comment:17 Changed 19 months ago by dmisev

  • Component changed from conversion to manuals_and_examples
  • Summary changed from inv_csv function not supported to document inv_csv function

comment:18 Changed 19 months ago by pbaumann

the format() functions are deprecated, and I would hate to describe an obsolete inv_csv(). Therefore: I assume the same functionality is available as decode( $1, "csv" )? What is the exact format specifier? thx.

comment:19 Changed 19 months ago by pbaumann

confirmation needed, is this correct?
"The decode() function automatically detects the format used, so there is no format parameter."

comment:20 Changed 18 months ago by dmisev

Yes that's correct.

comment:21 Changed 18 months ago by pbaumann

  • Resolution set to fixed
  • Status changed from assigned to closed

done.
