Opened 8 years ago

Closed 5 years ago

#1049 closed defect (fixed)

subsetting in rasdaman is inconsistent

Reported by: Dimitar Misev Owned by: apercov
Priority: major Milestone: 10.0
Component: qlparser Version: development
Keywords: Cc: Peter Baumann, Vlad Merticariu
Complexity: Medium

Description (last modified by Dimitar Misev)

Subsetting does not behave very consistently, depending on the actual tiles I'd say (discovered in ticket #1036).

Subsetting needs to conform with SQL/MDA, i.e. the subset must be within the array sdom.

Anything else, e.g. a subset that intersects or is completely outside of the array sdom is an exception: "subset S not within array domain D".

Currently we have dedicated systemtest in source:systemtest/testcases_mandatory/test_subsetting the queries should be verified that they produce the correct outputs or errors.

Attachments (1)

0001-ticket-1049-test-subsetting.patch (63.6 KB ) - added by Dimitar Misev 7 years ago.
incomplete patch addressing this ticket

Download all attachments as: .zip

Change History (23)

comment:1 by Dimitar Misev, 8 years ago

The correct behavior should be:

  • if any subset index falls out of the sdom an error is thrown
  • otherwise data is returned, where data is
    • actual cell values at positions where tiles are materialized
    • null where no tiles are materialized

@Peter can you confirm?

comment:2 by Bang Pham Huu, 8 years ago

As my last test with Ricardo (ticket 1036) I can select outside of coverage and it returns for me an array with all is NULL value (I think it is not valid). Subsetting should work like a 'crop image':

+ If you crop outside of an image, you got nothing (a message is nice for user to let they know they have queried outside of coverage).
+ If you crop just a piece of image and the rest is outside you will get only a piece of image (with no NULL pixels of outside image).

This is very natural understanding and I thought Rasdaman has followed from very first time.

in reply to:  1 comment:3 by Peter Baumann, 8 years ago

Replying to dmisev:

The correct behavior should be:

  • if any subset index falls out of the sdom an error is thrown
  • otherwise data is returned, where data is
    • actual cell values at positions where tiles are materialized
    • null where no tiles are materialized

@Peter can you confirm?

confirmed, this reflects exactly the expected behavior.

comment:4 by Dimitar Misev, 8 years ago

Owner: changed from Bang Pham Huu to Dimitar Misev
Status: newassigned

comment:5 by Peter Baumann, 8 years ago

where do we stand on this?

comment:6 by Dimitar Misev, 8 years ago

I've fixed the exception throwing some time ago, but there were still some corner cases segfaulting. In the meantime it seems I managed to lose my patch.. so I'll have to redo that.

comment:7 by Dimitar Misev, 8 years ago

I need to confirm: the only valid subset is the subset that intersects the sdom?

comment:8 by Dimitar Misev, 8 years ago

  • Consider collection test_subsetting_single:
    • 2D RGB array
    • sdom: [0:399,0:343]
    • tiles: [0:399,0:343]
select sdom(c[-400:150,-200:350]) from test_subsetting_single as c

What should the query above print?

  1. [-400:150,-200:350]
  2. [0:150,0:343] - this is what it currently prints
  3. error?
Last edited 8 years ago by Dimitar Misev (previous) (diff)

comment:9 by Bang Pham Huu, 8 years ago

I expect the 2 is correct way as it has data in this region.

comment:10 by Dimitar Misev, 8 years ago

I'd expect either 1 or 3.

  1. is certainly incorrect: the query asks for a certain subset and gets something else.
Last edited 8 years ago by Dimitar Misev (previous) (diff)

comment:11 by Peter Baumann, 8 years ago

I see that the QL Guide p 50 is inexact: "The generalized trim operator allows restriction, extension, and a combination of both operations in a shorthand syntax. This operator does not check for proper subsetting or supersetting of the domain modifier."

As to choices above:

  1. is a possible option, but I do not like it because addressing errors are not captured.
  2. is awkward to me: hard to anticipate, error prone.
  3. is what I would prefer myself, but I know there are people with other opinions.

If you want to achieve (1) I'd suggest

select extend(c, [-400:399,-200:343]) [-400:150,-200:350] from...

Agreed, this is awkward, but it clearly expresses intended user behavior.

PS: What does SQL/MDA specify? We should go along its line.

comment:12 by Alex Dumitru, 8 years ago

I would vote for 3 as well, it's better to throw an error than to take some more or less random decision for the user.
We need to be careful though, I'm assuming there are plenty of tools(perhaps ours as well) that expect the current behavior. Maybe it should be left for 10.0?

comment:13 by Dimitar Misev, 8 years ago

Ok so subsetting should be restricted to the sdom, i.e. in this case it would result in 3. This is the case in SQL/MDA as well.

In the implementation of rasdaman it's restricted up to intersecting with the sdom (although it's buggy so instead of 1. as I would expect, it prints 2.). So if we are to change the ql-guide then this should definitely be left for v10.

comment:14 by Peter Baumann, 8 years ago

yes, makes sense to me. Let's do this.

comment:15 by Dimitar Misev, 7 years ago

Cc: Alex Dumitru removed
Owner: changed from Dimitar Misev to bbell

I'm reassigning this ticket to Brennan. The first step is to specify a comprehensive test that will cover all (or most at least) theoretically possible variations, including erroneous cases.

Attached is a patch I've started working on sometime ago, that also contains a bunch of tests. It could be a good starting point.

by Dimitar Misev, 7 years ago

incomplete patch addressing this ticket

comment:16 by Dimitar Misev, 7 years ago

Milestone: 9.1.x10.0
Priority: criticalmajor

comment:17 by Peter Baumann, 7 years ago

FYI, the original idea was:

  • have a convenient shorthand notation which is safe against domain mismatch; these are the brackets.
  • have a general, different function which permits freedom of partial overlaps etc; we called it extend().

comment:18 by bbell, 7 years ago

I added a handful of tests in a recent patch which illustrate the sdom inconsistencies.

Current known_fails: requests where the sought-after sdom coordinates are not a subset of the existing spatial coordinates SHOULD throw an exception, but at the moment, they simply return the intersection. In case the intersection would be empty, the completely-out-of-bounds sdom is returned.

Last edited 7 years ago by bbell (previous) (diff)

comment:19 by Dimitar Misev, 5 years ago

To summarize: subsetting needs to conform with SQL/MDA, i.e. the subset must be within the array sdom.

Anything else, e.g. intersection or outside of the array sdom is an exception: "subset S not within array domain D".

This is also in line with the *: if subsetting would be allowed to extend out of the sdom, then suddenly the * becomes generally not well-defined.

comment:20 by Dimitar Misev, 5 years ago

Description: modified (diff)
Owner: changed from bbell to apercov

comment:21 by apercov, 5 years ago

Status: assignedaccepted

comment:22 by Dimitar Misev, 5 years ago

Resolution: fixed
Status: acceptedclosed
Note: See TracTickets for help on using tickets.