Opened 6 years ago
Closed 5 years ago
#2068 closed defect (fixed)
induced condenser treats null values different from regular condensers
Reported by: | Dimitar Misev | Owned by: | apercov |
---|---|---|---|
Priority: | major | Milestone: | 9.8 |
Component: | qlparser | Version: | 9.7 |
Keywords: | Cc: | Peter Baumann | |
Complexity: | Medium |
Description (last modified by )
In regular condense and shorthand condensers like avg_cells, null values are ignored (cf. documentation).
In the induced condenser this is not the case: null values are treated the same way as in binary induced operation, i.e. the result is null if any cell in the operands (or slices in induced condenser) is null (cf. documentation).
With an example, these two expressions are equivalent and hence null value treatment is the same:
select condense + over i in [0:1] using coll3d[ i[0], 0:100, 0:100 ] from coll3d
=
select coll3d[0, 0:100, 0:100] + coll3d[1, 0:100, 0:100] from coll3d
The correct equivalence of the first condense expression, however, should be:
select marray i in [0:100, 0:100] values add_cells( coll3d[ 0:1, i[0], i[1] ] ) from coll3d
Attachments (2)
Change History (15)
comment:1 by , 6 years ago
Description: | modified (diff) |
---|
by , 6 years ago
Attachment: | example.png added |
---|
comment:3 by , 6 years ago
When the induced max condenser is replaced with an equivalent coverage constructor it works as expected, e.g.
for $c1 in ( openEO_S2_32632_10m_L2A ) return encode ( coverage maxcov over $pe E(26265:26366), $pn N(-1920:3289) values max($c1.B08[E($pe),N($pn),DATE("2018-06-16T00:00:00Z":"2018-06-23T00:00:00Z")]) , "tiff" )
This is not a good alternative due to significantly worse performance, however.
comment:4 by , 6 years ago
Makes sense, since the operation performed at every step is actually an induced one, so equivalent to max(slice1, slice2).
This would mean supporting different behavior of nulls in the same operation. Max is not the best example, but if we think of +, we would need to support val + null = null (current case, which would produce results only in the overlapping area) and val + null = val, where null is treated like an identity element.
comment:5 by , 6 years ago
So having some sort of a modifier for binary operations, e.g. NULL VALUES IDENTITY
?
comment:6 by , 6 years ago
Yes, that would be a solution. However, after discussion with Peter we shouldn't deviate from the standard SQL behavior (which is the one currently implemented, where val + null = null) in the general case. There remains the question of whether this should be specifically supported in induced condensers.
comment:7 by , 6 years ago
I think the induced condenser is understood as an aggregation operation, so the "val + null = val" rule is the appropriate one to apply.
comment:8 by , 6 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Ok, makes sense to me. In order to support this, the following steps are necessary:
- binary induced operations must be extended with an option which indicates that null values should be treated as identity (i.e. op(val, null) = op(null, val) = val instead of op(val, null) = op(null, val) = null which is the current behavior and should remain the default one). This option is only for internal use, it won't be exposed in the QL.
- the induced condenser performs a binary induced operation at every step. The operation must be called with the option above enabled.
Test case: the last query in comment:2 should return data for the entire slice.
by , 6 years ago
Attachment: | 0001-ticket-2068-Extend-binary-induced-operations-with-op .patch added |
---|
current patch
comment:9 by , 5 years ago
Owner: | changed from | to
---|
comment:10 by , 5 years ago
Status: | assigned → accepted |
---|
comment:11 by , 5 years ago
Here's a simpler rasql query that helps reproduce the problem easier:
rasql -q 'select encode( condense + over i in [0:1] using ( <[0:1,0:2,0:2] 0, 0, 0; 1, 1, 1; 2, 2, 2; 3, 3, 3; 1, 1, 1; 0, 0, 0> [i[0], 0:2, 0:2] ), "csv")' --out string | grep Result
this is essentially like doing + on the two slices
0, 0, 0 3, 3, 3 1, 1, 1 + 1, 1, 1 2, 2, 2 0, 0, 0
As a result we get as expected
{3,3,3},{2,2,2},{2,2,2} = 3,3,3 2,2,2 2,2,2
Now if we set 0 as null value for the array constant:
rasql -q 'select encode( condense + over i in [0:1] using ( <[0:1,0:2,0:2] 0, 0, 0; 1, 1, 1; 2, 2, 2; 3, 3, 3; 1, 1, 1; 0, 0, 0> [i[0], 0:2, 0:2] null values [0] ), "csv")' --out string | grep Result
we get a bad result, because null value + value = null value
0,0,0 2,2,2 0,0,0
We want to get the same result as before though, i.e. the behavior should be like in add_cells
:
null value + value = value
Relevant documentation: http://doc.rasdaman.org/04_ql-guide.html#nulls-in-aggregation-queries
comment:12 by , 5 years ago
I see what has upset me: regular induced ops follow null op value = null
, and the induced condense is defined as using op
for iterative evaluation. However, taking SQL as the role model, there is an explicit "null elimination step", and that is what we need in our condensers, too. That leads us to the behaviour described above, null op value = value
.
comment:13 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
The attached example shows the problem.
Slice 1:
Slice 2:
Induced max: