Opened 11 years ago

Closed 9 years ago

#772 closed enhancement (fixed)

document inv_csv function

Reported by: Marcus Sen
Owned by: Peter Baumann
Priority: major
Milestone: 9.2
Component: manuals_and_examples
Version: 9.0
Keywords:
Cc: James Passmore, Peter Baumann
Complexity: Medium

Description (last modified by Dimitar Misev)

Implement an inv_csv function for reading nD CSV files.

The input does not have to be strict CSV: the formatting matters very little, which makes the function very flexible.
Numbers from the input file are read in order of appearance and stored in rasdaman without any reordering.
Whitespace and the following characters are ignored: '{', '}', ',', '"', '\'', '(', ')', '[', ']'

Mandatory extra parameters:

  • domain - minterval, e.g. [1:5,0:10,2:3]
    • the domain has to match the number of cells read from the input file
  • basetype - array base type, e.g. long, char, etc.
    • struct types have to be specified fully, e.g. struct { char red, char blue, char green }
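
The reading rule and the domain check described above can be sketched in Python as follows. This only illustrates the behaviour stated in this ticket and is not the rasdaman implementation. The names read_values, cell_count and check_input are made up for the example, and the sketch assumes that a struct cell contributes one number per component.

```python
import re

# Characters the ticket lists as ignored, in addition to whitespace.
IGNORED = "{},\"'()[]"


def parse_number(token):
    """Convert one token to int if possible, otherwise to float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def read_values(text):
    """Return the numbers in text, in order of appearance."""
    # Replace every ignored character with a space, then split on whitespace.
    table = str.maketrans({ch: " " for ch in IGNORED})
    return [parse_number(tok) for tok in text.translate(table).split()]


def cell_count(domain):
    """Number of cells in an minterval string such as "[0:1,0:2]"."""
    count = 1
    for lo, hi in re.findall(r"(-?\d+):(-?\d+)", domain):
        count *= int(hi) - int(lo) + 1
    return count


def check_input(text, domain, components=1):
    """Read the values and verify that they fill the domain exactly.

    components is the assumed number of numbers per cell
    (1 for long or char, 3 for a three-band struct).
    """
    values = read_values(text)
    expected = cell_count(domain) * components
    if len(values) != expected:
        raise ValueError(
            f"domain {domain} needs {expected} values, got {len(values)}"
        )
    return values
```

For instance, check_input("1,2,3,2,1,3", "[0:1,0:2]") accepts the six values of a 2x3 array, while passing only three values for the same domain raises ValueError.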

Example

A is a 2x3 array of longs:

1,2,3,2,1,3

Inserting A can be done with

insert into A values inv_csv($1, "domain=[0:1,0:2];basetype=long")

B is a 1x2 array of RGB values

{1,2,3},{2,1,3}

Inserting B can be done with

insert into B values inv_csv($1, "domain=[0:0,0:1];basetype=struct {char red, char blue, char green}")

B could just as well be formatted like this with the same effect:

1 2 3
2 1 3

Change History (21)

comment:1 by Dimitar Misev, 11 years ago

Cc: Peter Baumann added
Component: undecided → conversion

inv_csv is a mistake in the QL guide.

comment:2 by Dimitar Misev, 11 years ago

Component: conversion → manuals_and_examples
Owner: changed from Dimitar Misev to Peter Baumann
Status: new → assigned

Reassigning to Peter for fixing the manual.

comment:3 by Peter Baumann, 11 years ago

done.

comment:4 by Dimitar Misev, 10 years ago

Owner: changed from Peter Baumann to uadhikari

Proposal

Implement an inv_csv conversion function that will read a plaintext csv-like representation of an array.

Problem

How to encode the domain/type of the array in the plaintext file?

The bounding box can be encoded with parentheses or other markers, as is done in the csv function. However, there is no way to represent the type.

Solution

Encode the domain and type in the extra params of inv_csv. This makes it possible to drop the parentheses from the input file and use just comma-separated values (proper csv encoding).

Rules for the csv encoding:

  • single values are separated by comma
  • composite values are wrapped in braces, within which single values are separated by commas
  • white space is ignored

Extra params:

  • domain - minterval, e.g. [1:5,0:10,2:3]
  • basetype - array base type, e.g. RGBPixel, long, char, etc.
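
As an illustration of the extra-params format used in the examples of this ticket ("key=value" pairs separated by ';'), here is a minimal Python sketch that splits such a string. The function name parse_extra_params is hypothetical, and the check for both keys follows the ticket description, which calls domain and basetype mandatory. The actual parsing inside rasdaman may differ.

```python
def parse_extra_params(params):
    """Split "domain=[0:1,0:2];basetype=long" into a dict.

    Pairs are separated by ';' and each key is separated from its
    value by the first '='. Commas are left alone, since they occur
    inside the domain and inside struct type definitions.
    """
    result = {}
    for part in params.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        result[key.strip()] = value.strip()

    # The ticket description lists both parameters as mandatory.
    missing = {"domain", "basetype"} - result.keys()
    if missing:
        raise ValueError(f"missing parameter(s): {', '.join(sorted(missing))}")
    return result
```

Calling parse_extra_params("domain=[0:1,0:2];basetype=long") returns a dict with the keys "domain" and "basetype".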

Example

A is a 2x3 array of longs:

1,2,3,2,1,3

Inserting A can be done with

insert into A values inv_csv($1, "domain=[0:1,0:2];basetype=long")

B is a 1x2 array of RGB values

{1,2,3},{2,1,3}

Inserting B can be done with

insert into B values inv_csv($1, "domain=[0:0,0:1];basetype=RGBPixel")

Implementation

In source:conversion/csv.cc the convertFrom() function should be implemented. tiff.cc would provide a good example for the implementation.

Appropriate tests should be provided in source:systemtest/testcases_mandatory/test_conversion/test.sh

comment:5 by Dimitar Misev, 10 years ago

Component: manuals_and_examples → conversion
Milestone: 9.1

comment:6 by Dimitar Misev, 9 years ago

Owner: changed from uadhikari to Vlad Zamfir

comment:7 by Dimitar Misev, 9 years ago

Milestone: 9.1 → 9.2

comment:8 by Dimitar Misev, 9 years ago

Description: modified (diff)

comment:9 by Dimitar Misev, 9 years ago

Description: modified (diff)

comment:10 by Dimitar Misev, 9 years ago

Description: modified (diff)
Type: defect → enhancement

comment:11 by Peter Baumann, 9 years ago

looks good, but it is not exactly CSV ;-)

Goal should be that exported CSV can be imported again within a rasdaman ecosystem. Ideally with other tools as well, but that's a nightmare anyway, see https://en.wikipedia.org/wiki/Comma-separated_values.

Therefore, 2 friendly amendments:

  • make mandatory extra parameters optional and assume suitable defaults (e.g., "widest" data type as cell type)
  • to this end, don't ignore nested {…}, but use them for recognizing domains (and throw an exception if the number of elements in some extent does not match with its neighbours)
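
To make the second amendment concrete, here is a rough Python sketch of how extents could be recognized from nested braces, with an exception when neighbouring extents disagree. It is an assumption about what such inference might look like, not code from rasdaman. The names infer_shape and _shape are made up, and the sketch cannot tell braces that mark a dimension from braces that wrap a composite (struct) value.

```python
import json


def _shape(node):
    """Return the extents of a nested list, or raise if it is ragged."""
    if not isinstance(node, list):
        return ()  # a single number has no extent
    child_shapes = {_shape(child) for child in node}
    if len(child_shapes) > 1:
        raise ValueError("extents of neighbouring elements do not match")
    inner = child_shapes.pop() if child_shapes else ()
    return (len(node),) + inner


def infer_shape(text):
    """Infer array extents from brace nesting, e.g. "{1,2,3},{2,1,3}"."""
    # Treat braces as JSON list brackets and wrap the whole input in
    # one outer list, because the outermost level carries no braces.
    as_json = "[" + text.replace("{", "[").replace("}", "]") + "]"
    return _shape(json.loads(as_json))
```

With this sketch, "{1,2,3},{2,1,3}" yields the extents (2, 3), and "{1,2},{3}" raises ValueError because the two inner groups differ in length.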

comment:12 by Dimitar Misev, 9 years ago

I'm pretty sure inv_csv handles any CSV format properly.

What you are proposing is too error-prone and difficult to get right, though.
I'd rather have flexible format support at the expense of having to specify the domain and base type (along with a very simple implementation).

comment:13 by Peter Baumann, 9 years ago

ok, convinced after giving it a second thought.

comment:14 by Peter Baumann, 9 years ago

ok, it's in - is there anything else I need to know for documenting it in the QL guide?

comment:15 by Dimitar Misev, 9 years ago

I think no, besides what's in the description of this ticket.

comment:16 by Dimitar Misev, 9 years ago

Owner: changed from Vlad Zamfir to Peter Baumann

Reminder to document this, the QL guide still says "Note that inv_csv() is not implemented currently."

comment:17 by Dimitar Misev, 9 years ago

Component: conversion → manuals_and_examples
Summary: inv_csv function not supported → document inv_csv function

comment:18 by Peter Baumann, 9 years ago

the format() functions are deprecated, I hate to describe an obsoleted inv_csv() - therefore: I assume the same functionality is available as decode( $1, "csv" )? What is the exact format specifier? thx.

comment:19 by Peter Baumann, 9 years ago

confirmation needed, is this correct?
"The decode() function automatically detects the format used, so there is no format parameter."

comment:20 by Dimitar Misev, 9 years ago

Yes that's correct.

comment:21 by Peter Baumann, 9 years ago

Resolution: fixed
Status: assigned → closed

done.
