Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#505 closed defect (fixed)

Performance leak for timestamp to numeric coordinate conversion

Reported by: pcampalani Owned by: pcampalani
Priority: blocker Milestone: Future
Component: petascope Version: 8.5
Keywords: time conversion performance Cc: dmisev, pbaumann
Complexity: Medium

Description

When a WCS GetCoverage requests specifies temporal subsettings with ISO8601 notation (which shall be within double-quotes), the values are converted to the proper numerical equivalent a priori, based on the temporal CRS associated with the coverage.

Currently the algorithm is based on a simple increase of time units from the origin of the t-CRS, but this can be very expensive when the temporal resolution is much smaller with respect to the origin-subset range, e.g. with Unix time CRS.

An optimal algorithm should be implemented, which flexibly adjusts to different time scales and steps.

Change History (11)

comment:1 follow-up: Changed 4 years ago by pcampalani

A possible solution is to define a temporal increment which starts from a big value, and gets decreased (log-scale, say) if the single increment goes beyond the subset upper bound.
Deacreasing this increment until reaching the single temporal unit of resolution and proceed until the single resolution increment exceeds the subset.

PROBLEM: how many resolutions T_RES fit in interval [LO,HI]?

  * # Init
    T_HEAD=LO;
    T_UNITS_COUNT=0; 
    K0=<some_value> e.g. 6
    K=K0

  * # Find biggest increment that still stays within the interval
    WHILE (T_HEAD+(10^K*T_RES) > HI); DO
       K--;
    DONE

  * # Now start the algorithm
    WHILE (K >= 0); DO
      WHILE (T_HEAD < HI); DO
        T_LAST = T_HEAD
        T_HEAD        += 10^K*T_RES;
        T_UNITS_COUNT += 10^K
      DONE
      K--
    DONE

    -- At this point (T_HEAD-T_LAST)=T_RES and we are missing (HI-T_LAST)
    T_UNITS_COUNT += (HI-T_LAST)/T_RES

This algorithm is much faster than the current implementation which adds T_RES until the HI upper limit is found.

Issues: scalar*DateTime operation is not implemented in the Date4J library, and needs to be manually done, taking care of all overflows. http://www.date4j.net/javadoc/index.html

A JUnit tests is mandatory for such method.

comment:2 in reply to: ↑ 1 Changed 4 years ago by dmisev

Replying to pcampalani:

Issues: scalar*DateTime operation is not implemented in the Date4J library, and needs to be manually done, taking care of all overflows. http://www.date4j.net/javadoc/index.html

Maybe there are some alternative library that can handle that? Unless we are deeply involved with Date4J already.

comment:3 Changed 4 years ago by dmisev

Isn't this in general a problem for irregular coverages btw, for any type of axis and not just temporal?

comment:4 Changed 4 years ago by pcampalani

For that you can exploit the database: see petascope.core.DbMetadataSource.getIndexesFromIrregularRectilinearAxis().

comment:5 Changed 4 years ago by pcampalani

The issue of this ticket is prior to irregular handling: is in the conversion from input timestamps to correspondent numerical form (number = coordinate in the associated Temporal CRS).

comment:6 Changed 4 years ago by pcampalani

  • Cc dmisev pbaumann added
  • Priority changed from major to blocker

I suggest to make this ticket blocker and delay the release of a few days: using timestamps becomes easily unusable before this issue is not fixed, and probably is one of most highly awaited features.

comment:7 Changed 4 years ago by pcampalani

Instead of manual algorithm, it might be safer and wiser to inspect other Java libraries, mainly Joda, or even the (amazing) date unix utility, although this could be dangerous since the behavior can depend on the system's installed versions and configurations.

Runtime r = Runtime.getSystemRuntime();
r.exec("date -d 'now - 123 weeks'");

comment:8 Changed 4 years ago by pcampalani

  • Status changed from new to accepted

comment:9 Changed 4 years ago by pcampalani

  • Resolution set to fixed
  • Status changed from accepted to closed

Problem fixed in changeset:3d51a00.

Instead of implementing the optimized algorithm described in comment:1, the date4j library has been replaced by Joda-Time.

With Joda-Time, time instants are internally stored as milliseconds from Unix epoch. It then provides methods to translate timestamps ISO (and also custom) formats back and forth to this internal Temporal CRS.

By applying the correct coefficient, the distance in milliseconds between two time instants can be easily (and quickly) converted to the distance in the Temporal CRS that is used.

The supported UCUM codes currently are:

    // Standard codes for supported temporal Unit of Measures (which provide ISO8601 interface to the user)
    // http://unitsofmeasure.org/ucum.html
    public static final String UCUM_MILLIS               = "ms";
    public static final String UCUM_SECOND               = "s";
    public static final String UCUM_MINUTE               = "min";
    public static final String UCUM_HOUR                 = "h";
    public static final String UCUM_DAY                  = "d";
    public static final String UCUM_WEEK                 = "wk";
    public static final String UCUM_MEAN_JULIAN_MONTH    = "mo_j";
    public static final String UCUM_MEAN_GREGORIAN_MONTH = "mo_g";
    public static final String UCUM_SYNODAL_MONTH        = "mo_s";
    public static final String UCUM_MONTH                = "mo";
    public static final String UCUM_MEAN_JULIAN_YEAR     = "a_j";
    public static final String UCUM_MEAN_GREGORIAN_YEAR  = "a_g";
    public static final String UCUM_TROPICAL_YEAR        = "a_t";
    public static final String UCUM_YEAR                 = "a";

This means that the //CoordinateSystemAxis/@uom attribute of the Temporal CRS in use needs to show such UCUM code in order to be able to insert quoted timestamps in a WC*S request. If this is not the case (eg ma uom, megaannum?), this does not mean that the CRS is not supported, but only the conversion to ISO timestamp is.

The ISO formatted I used had default UTC time zone and optional time part (date is mandatory): see this page for the exact allowed syntax and this file for working examples.

JUnit tests have been added too.

comment:10 Changed 3 years ago by pbaumann

good work! If not yet done, please document on the wiki - in particular the design decision to have ms as resolution lower limit.

comment:11 Changed 3 years ago by pcampalani

Yes, I forgot to mention this has all been documented in PetascopeTimeHandling.

Last edited 3 years ago by pcampalani (previous) (diff)
Note: See TracTickets for help on using tickets.