Opened 4 years ago

Closed 3 years ago

#658 closed defect (fixed)

Silent division by zero

Reported by: vliaukevich Owned by: vliaukevich
Priority: minor Milestone: 9.0.x
Component: catalogmgr Version: development
Keywords: Cc: dmisev, pbaumann, mdumitru, vmerticariu
Complexity: Easy

Description

In rasdaman all the division classes (for integer, doubles) in the query
language silently return 0 in case of the division by zero, which contradict SQL standard and common programming languages behavior.

Change History (11)

comment:1 Changed 3 years ago by dmisev

Is this ticket still relevant?

comment:2 Changed 3 years ago by dmisev

  • Milestone set to 9.0.x

Ping

comment:3 Changed 3 years ago by vliaukevich

As far as I remember, there was a discussion about how to handle such cases, but we did not decide anything. For floating-point numbers IEEE-754 specifies the result (infinity or nan), and C++ already implements this, so we can leave the code as it is.

It is not clear what to do in case of integers. 0? Exception? NULL, as null values support was recently added? I personally prefer the latter option.

comment:4 Changed 3 years ago by vliaukevich

  • Cc dmisev pbaumann added

SQL standard defines that both regular division and modulo operation should raise an exception of the divisor is equal to zero (SQL:2011, Part 2: Foundations, pages 273, 280).

I tried to find out how different DBMSs treat division by zero:

DBMS 2/0 2/0.0 mod(2,0)
mysql NULL NULL NULL
sqlite NULL NULL NULL
postgres exception exception exception
MS SQL server exception exception exception
DB2 exception exception 1
oracle exception exception 2

(It is possible to set a flag changing MySQL behavior).

I propose to raise exceptions as well, division by zero is just wrong after all. And let's do the same also for floating-point division operation, different behavior is only going to confuse users.

comment:5 Changed 3 years ago by pbaumann

solid work, thanks for investigating!
I suggest to follow Veranika's line of argument, any objection/opinions?

comment:6 Changed 3 years ago by vliaukevich

  • Cc mdumitru vmerticariu added

Yesterday there was a discussion about consequences of the enabling "throw exceptions on division by zero". The problem is that almost every array has some zero values, so almost every division operation would end with an exception.

Some ideas with their pros and cons:

  1. Use "case when b = 0 then ... else a/b" statement
    • pros: works in the same way for all situations
    • cons:
      1. users need to write this long construction every time
      2. performance problem: the "case" statement need to be executed for every single element of an array
  2. No exceptions, just return 0, NULL, NaN, etc.
    • pros: no exceptions
    • cons:
      1. NaN works only for doubles
      2. NULL values should be defined in order to be used
      3. zero only masks an error of division by zero. Additionally, is there a reason why to return zero, not the maximum/minimum value?
      4. log(0) throws an exception, why division should behave differently in a similar situation?
      5. performance problem: all "if (divisor == 0)" checks make the code inefficiently executable on GPU
  3. Introduce a non-standard extension keyword like "select /*noexc*/ a/b", which tells the engine not to throw exceptions
    • pros: user controls whether to throw exceptions
    • cons: what to return as the result of division? Goto idea 2.
  4. There is a SQL expression NULLIF(arg0, arg1), which returns NULL if both arguments are equal and the first argument otherwise. It could be used to prevent div-by-zero errors: "select a / NULLIF(b, 0)".
    • pros: user controls whether to throw exceptions or return NULL
    • cons: NULL values need to be defined
  5. Introduce a non-standard operator, in which user defines which value to return instead of errors. E.g.: NOEXC(a/b, 0), there the first argument is the expression to evaluate, the second is the expression to return in case of errors.
    • pros:
      1. user controls whether to throw exceptions or what to return otherwise; nulls as return values are still possible
      2. the syntax is much shorter and cleaner than "case when..."
    • cons: possible performance issue: this operator needs to be evaluated for every single cell of an array, but it is a simple "try...catch..."

Discussion of the written above ideas is highly appreciated.

comment:7 Changed 3 years ago by dmisev

python numpy

Behavior on division by zero can be changed using seterr.
..
When both x1 and x2 are of an integer type, divide will return integers and throw away the fractional part. Moreover, division by zero always yields zero in integer arithmetic.

  • so integer division by zero sets 0, floating point corresponds to IEEE 754 which results in +/- inf or nan, for which we don't need to care as C++ implements it already
  • it's similar in Matlab, except Matlab automatically converts to float on integer division and correspondingly returns inf/nan
  • similar in R as far as I could see

From wikipedia on integer division:

Integer division by zero is usually handled differently from floating point since there is no integer representation for the result. Some processors generate an exception when an attempt is made to divide an integer by zero, although others will simply continue and generate an incorrect result for the division. The result depends on how division is implemented, and can either be zero, or sometimes the largest possible integer.

Last edited 3 years ago by dmisev (previous) (diff)

comment:8 Changed 3 years ago by dmisev

I'd vote for this solution:

  1. If there are null values defined, set null value as result
  2. Otherwise
    • float: simply execute a/b in c++, result is +/- inf or nan as defined by IEEE 754
    • integer: set 0, as the default null value in rasdaman
      • this only comes up when using the div operator, as the '/' operator always returns float
  3. Optionally allow to enable exceptions instead in some way
  4. Switch statement can always be used to explicitly change the default behavior defined by 1-2
Last edited 3 years ago by dmisev (previous) (diff)

comment:9 Changed 3 years ago by pbaumann

following discussion, consensus is that:

  1. if:

float: simply execute a/b in c++, result is +/- inf or nan as defined by IEEE 754
integer: throw exception

this only comes up when using the div operator, as the '/' operator always returns float

  1. Switch statement can always be used to explicitly change the default behavior defined by 1-2

comment:10 Changed 3 years ago by vliaukevich

Fixed in changeset:995936 and changeset:d6be96. The documentation still needs to be updated.

comment:11 Changed 3 years ago by pbaumann

  • Resolution set to fixed
  • Status changed from new to closed

documentation done.

Note: See TracTickets for help on using tickets.