Opened 5 years ago

Closed 5 years ago

#2201 closed enhancement (fixed)

positionally-independent subsetting in rasql

Reported by: Dimitar Misev
Owned by: apercov
Priority: major
Milestone: 11.0
Component: qlparser
Version: 9.8
Keywords:
Cc: Peter Baumann, Vlad Merticariu, Bang Pham Huu
Complexity: Hard

Description

rasql currently supports positionally-dependent subsetting, i.e. for each axis of the array a trim or slice must be specified in the subset. Example:

select mr2[0:100,50] from mr2

As #1175 was fixed, the axis names in marray type definitions are properly persisted in RASBASE now. Therefore, we can support positionally-independent subsetting like in WCPS and SQL/MDA, where for each trim/slice the axis name is indicated as well, e.g.

select mr2[d0(0:100), d1(50)] from mr2

The axis names give a reference to the addressed axes, so the order doesn't matter anymore. This is equivalent:

select mr2[d1(50), d0(0:100)] from mr2

Furthermore, not all axes have to be specified. Any axes which are not specified default to *:*. For example:

select mr2[d1(50)] from mr2
=
select mr2[d0(*:*), d1(50)] from mr2

Error cases

The two subset formats cannot be mixed, e.g. this is an error:

select mr2[d0(0:100), 50] from mr2
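
A minimal Python sketch of this check, operating on the subset entries as plain strings. The function name and the regular expression for a named entry are assumptions made for illustration; the actual check would live in the rasql grammar, which may treat entries differently.

```python
import re
from typing import List

# Hypothetical pattern for a named entry such as "d0(0:100)".
# The real rasql grammar may define axis names and entries differently.
NAMED_ENTRY = re.compile(r"^[A-Za-z_]\w*\(.*\)$")


def subset_format(entries: List[str]) -> str:
    """Return "named" or "positional"; raise ValueError if both are mixed."""
    named = [bool(NAMED_ENTRY.match(entry.strip())) for entry in entries]
    if all(named):
        return "named"
    if not any(named):
        return "positional"
    raise ValueError("cannot mix named and positional entries in one subset")


print(subset_format(["d0(0:100)", "d1(50)"]))  # named
print(subset_format(["0:100", "50"]))          # positional
```

Calling subset_format(["d0(0:100)", "50"]) raises ValueError, which corresponds to the error case above.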

Implementation

It is best to translate a positionally-independent subset into a positionally-dependent one before the subset is evaluated. This way only one subset implementation is needed.

Translating is straightforward:

  • add any missing axes with *:*
  • order the subset trim/slices to match the array axis order
  • remove the axis names
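
The three steps above can be sketched in Python as follows. This is an illustration only: the function name, the representation of trims/slices as strings, and the dictionary input are assumptions, not the rasdaman implementation.

```python
from typing import Dict, List, Sequence

FULL_EXTENT = "*:*"  # default for axes the query does not mention


def to_positional(axis_names: Sequence[str], named: Dict[str, str]) -> List[str]:
    """Translate a named subset into a positional one.

    axis_names: axis names of the array type, in array axis order.
    named: maps an axis name to its trim/slice text, e.g. {"d1": "50"}.
    """
    unknown = set(named) - set(axis_names)
    if unknown:
        raise ValueError(f"unknown axis name(s): {sorted(unknown)}")
    # Walking the axes in array order covers all three steps at once:
    # missing axes get *:*, entries end up ordered, and names are dropped.
    return [named.get(axis, FULL_EXTENT) for axis in axis_names]


print(to_positional(["d0", "d1"], {"d1": "50", "d0": "0:100"}))  # ['0:100', '50']
print(to_positional(["d0", "d1"], {"d1": "50"}))                 # ['*:*', '50']
```

Because the input is a dictionary, an axis named twice in the same subset cannot be represented here; a real implementation would have to decide whether that is an error.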

Don't forget to add tests in test_subsetting and update documentation.

Change History (9)

comment:1 by Peter Baumann, 5 years ago

good move!!!

comment:2 by Dimitar Misev, 5 years ago

Important: try to keep the subset translation generic. We'll need to use the same for shift, scale, extend, and maybe further operations where axes are referenced.

comment:3 by Peter Baumann, 5 years ago

I'd love to see the concept first, in particular: syntax extensions in the grammar.

comment:4 by Dimitar Misev, 5 years ago

I think that's very straightforward: copy the subset parsing into a variant that allows axis names.

It is actually mostly supported already for the type creation (just missing *).

comment:5 by Dimitar Misev, 5 years ago

Owner: changed from dkamov to apercov
Status: new → assigned

comment:6 by apercov, 5 years ago

Status: assigned → accepted

comment:7 by Dimitar Misev, 5 years ago

Milestone: 10.0 → 11.0

comment:8 by Dimitar Misev, 5 years ago

Complexity: Medium → Hard

It appears very difficult to implement this in the QueryTree in a reasonable way.

The subset query tree looks like this

    +-------------------+
    | QtDomainOperation |
    +-+------------+----+
      |            |
      |            |
+-----v-+      +---v---------+
| QtMDD |      | QtMinterval |
+-------+      +-------------+

There is no way currently to access the sdom of the QtMDD operand before evaluate() is executed. Accessing the sdom is necessary in order to translate the pos-independent axes into a standard pos-dependent subset.

The execution order of QueryTree nodes is as follows:

  1. checkType() - makes sure the query is valid
  2. optimizeLoad() - pushes subsets down to the bottom of the tree
  3. evaluate() - evaluates the query tree nodes

The main problem with resolving a pos-independent subset in the evaluate() method is that the usual pos-dependent subset is already needed beforehand in optimizeLoad().

In order to be able to access the sdom, checkType() of every node would need to be reviewed to make sure that it properly propagates sdom up the tree, which is not done at the moment.
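
To illustrate the idea of propagating domain information during the type check, here is a toy Python model. It does not use rasdaman's QueryTree classes: MDDNode and DomainOperationNode are hypothetical stand-ins for QtMDD and QtDomainOperation, and the domain information is reduced to axis names only.

```python
from typing import Dict, List, Optional


class MDDNode:
    """Stand-in for QtMDD: a leaf that knows its axis names from the type."""

    def __init__(self, axis_names: List[str]):
        self.axis_names = axis_names

    def check_type(self) -> List[str]:
        # Propagate the axis names to the parent node.
        return list(self.axis_names)


class DomainOperationNode:
    """Stand-in for QtDomainOperation carrying a named subset."""

    def __init__(self, operand, named: Dict[str, str]):
        self.operand = operand
        self.named = named
        self.positional: Optional[List[str]] = None

    def check_type(self) -> List[str]:
        axes = self.operand.check_type()
        # Translate here, so the positional subset already exists
        # when optimize_load() runs.
        self.positional = [self.named.get(axis, "*:*") for axis in axes]
        # A slice (no ':') removes its axis from the result, so only
        # trimmed axes are propagated further up the tree.
        return [a for a, s in zip(axes, self.positional) if ":" in s]

    def optimize_load(self) -> List[str]:
        if self.positional is None:
            raise RuntimeError("check_type() must run before optimize_load()")
        return self.positional


tree = DomainOperationNode(MDDNode(["d0", "d1"]), {"d1": "50"})
print(tree.check_type())     # ['d0']
print(tree.optimize_load())  # ['*:*', '50']
```

The sketch only works because every node returns its axes from check_type(); a single node that fails to do so would leave its parents without the information needed for the translation, which is the review effort described above.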

comment:9 by apercov, 5 years ago

Resolution: fixed
Status: accepted → closed