Opened 15 years ago
Closed 12 years ago
#38 closed defect (fixed)
RasJ reports wrong query result
Reported by: | (none) | Owned by: | Heinrich Stamerjohanns |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | java | Version: | 8.1 |
Keywords: | Cc: | Peter Baumann, Heinrich Stamerjohanns | |
Complexity: | Medium |
Description
When query
select add_cells(i[5:5]) from NN3_1 AS i
is executed via RasJ library, the following wrong result is returned:
result=-2.292137065441767E273
but the correct result is (obtained via rasql): 7835
The corresponding rasql invocation is:
./rasql -q "select add_cells(i[5:5]) from NN3_1 AS i" —out string
Attachments (2)
Change History (32)
comment:1 by , 15 years ago
Owner: | set to |
---|---|
Priority: | minor → critical |
Status: | new → assigned |
comment:2 by , 14 years ago
Owner: | changed from | to
---|
comment:3 by , 14 years ago
Cc: | added |
---|---|
Status: | assigned → accepted |
I don't have the NN3_1 collection, is there any way to get it in order to test this?
comment:4 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | accepted → assigned |
comment:5 by , 13 years ago
Status: | assigned → accepted |
---|
comment:6 by , 13 years ago
Priority: | critical → major |
---|
comment:7 by , 13 years ago
Reporter: | removed |
---|
Some stuff that may be helpful with this: http://darksleep.com/player/JavaAndUnsignedTypes.html
And I think we need some way to deal with #55, this workaround is not very pretty: wiki:FAQ#Overflowwhenunsignedbytesareretrievedwithrasj
by , 12 years ago
Attachment: | QueryExample.java added |
---|
comment:9 by , 12 years ago
Cc: | added |
---|
I can't reproduce this bug, tried with attachment:QueryExample.java
To run the test, dump the attachment in source:java/src/examples and then
cd java make examples cd bin java examples.QueryExample
The result seems fine on Debian Wheezy:
log> Executing query: select add_cells(<[0:2] 1.042323, 1.12, 1.8>) from mr2 as c *** Result is: 3.962322950363159
Heinrich, can you run this example on Suse/CentOS?
Jelmer, can you please post the WCPS query and the corresponding rasql query (you can find it in the tomcat log).
comment:10 by , 12 years ago
Ok I reproduced it on Jelmer's planetserver, which is 64bit CentOS release 6.3 (Final):
log> Executing query: select add_cells(<[0:2] 1.042323, 1.12, 1.8>) from NIR as c *** Result is: -2.9789158704612464E-217
And there I get the same result for rasql too:
$ rasql -q 'select add_cells(<[0:2] 1.042323, 1.12, 1.8>) from NIR as c' --out string Result element 1: 2.42774e-317
So it seems like a server problem, not particular to Java.
comment:11 by , 12 years ago
Actually the result from java and rasql were not the same, they seem to be rather arbitrary.
Running on CentOS 6.3 on my machine now yields these results:
Java:
log> Executing query: select add_cells(<[0:2] 1.042323, 1.12, 1.8>) from mr2 as c *** Result is: 3.9741829939327397E-140
rasql:
$ rasql -q 'select add_cells(<[0:2] 1.042323, 1.12, 1.8>) from mr as c' --out string Result element 1: 3.96232
So I do not have problems via rasql.
comment:12 by , 12 years ago
Turning around the double in QueryExample.java actually fixes the problem.
log> Executing query: select add_cells(<[0:2] 1.042323, 1.12, 1.8>) from mr2 as c *** Result is: 3.962322950363159
Still this does not explain the weird values in rasql. I guess there are some implicit conversions that take place?
Double res = (Double) iter.next(); long v = Double.doubleToLongBits(res); long v2 = (((v>>56) & 0xffL) << 0) | (((v>>48) & 0xffL) << 8) | (((v>>40) & 0xffL) << 16) | (((v>>32) & 0xffL) << 24) | (((v>>24) & 0xffL) << 32) | (((v>>16) & 0xffL) << 40) | (((v>>8) & 0xffL) << 48) | (((v>>0) & 0xffL) << 56); res = Double.longBitsToDouble(v2); System.out.println("*** Result is: " + res);
follow-up: 14 comment:13 by , 12 years ago
On planetserver I'm getting some pretty random results.. I'll try with the latest rasdaman, instead of the one from the RPMs.
$ rasql -q 'select add_cells(<[0:2] 1, 1, 1>) from NIR as c' --out string rasql: rasdaman query tool v1.0, rasdaman v8 -- generated on 11.07.2012 18:29:26. opening database RASBASE at localhost:7001...ok Executing retrieval query...ok Query result collection has 1 element(s): Result element 1: 50331648 rasql done.
follow-up: 15 comment:14 by , 12 years ago
Query result collection has 1 element(s):
Result element 1: 50331648
That is actually 00000011 00000000 00000000 00000000.
So it as also just switched around. But what I do not understand is that on some machines
the rasql client seems to behave differently.
comment:15 by , 12 years ago
the rasql client seems to behave differently.
…because I get 3 as a result, when I do the same query on CentOS 6.3 on my machine.
comment:16 by , 12 years ago
Seems like the VMs are a special case, if I'm not wrong on all the real machines we've tested it has worked fine?
comment:18 by , 12 years ago
Replying to dmisev:
Or maybe it's just a CentOS case..
I am running CentOS also in a VM (VirtualBox), everything seems to be fine there.
Or have there been any changes within the last two months that could affect this?
follow-up: 20 comment:19 by , 12 years ago
Wow, I restarted rasdaman on planetserver and now it gives 3, so rasql is (somewhat) fine.
The java example still gives the same result though.
comment:20 by , 12 years ago
Replying to dmisev:
Wow, I restarted rasdaman on planetserver and now it gives 3, so rasql is (somewhat) fine.
The java example still gives the same result though.
At least it seems to consistently not work. You can try the fix above. Then we
need to think how this can be solved in a cleaner way.
comment:21 by , 12 years ago
Yes but how can we determine when the bytes are actually in the wrong order?
follow-up: 23 comment:22 by , 12 years ago
I've hopefully narrowed down the problem to the getResponse
method in source:java/src/rasj/rnp/RasRNPImplementation.java , something must be going wrong in there, probably in the case RESPONSE_SKALARS:
comment:23 by , 12 years ago
Replying to dmisev:
I've hopefully narrowed down the problem to the
getResponse
method in source:java/src/rasj/rnp/RasRNPImplementation.java , something must be going wrong in there, probably in thecase RESPONSE_SKALARS:
That was helpful…
but the byte array that arrives is actually swapped on CentOS 6.3 (64-bit) and openSUSE 12.2 (32-bit), so everything seems ok there. I might still be a problem of the server…
comment:24 by , 12 years ago
Owner: | changed from | to
---|
Actually the problem lies in servercomm/servercomm2.c, when the code tries to swap values
for requests with different endianness (typical java queries). I assume that older gcc-versions
choke how buffer is being reassigned.
To solve the problem, buffer is being assigned vie memcpy. Also gcc-lib directly supports hardware optimized byte-swapping, so use those functions instead.
The patch (see attachment) fixes the cases for floats and doubles, other types remain to be investigated.
comment:25 by , 12 years ago
Great, I'm glad you found the problem. I looked a bit around httpserver.cc and servercomm2.cc and it seems pretty messy, with many comments that indicate hacks in the code..
Just one detail about the patch, maybe it's better to fix this in r_Endian
(since this is the class that deals with endianness stuff) and then call an appropriate function?
follow-up: 27 comment:26 by , 12 years ago
Actually I would rather want to remove the functions that byteswap the doubles and floats in r_Endian (or fix their return values) because they will create all subtle problems. Currently these functions try to store a byteswapped double (or float) into the same type again. The problem is that the values might not be doubles any more, because you may not store arbitrary bit patterns in doubles. If the byteswapped value is stored in a FPU register, this might create all kind of possible problems (normalizing, or creating a floating point exception or creating NaN values). So I do not think that using r_Endian as it is currently is the right way to go.
So I think it is ok to du the conversion directly there (it is just two lines). I do agree that
both files need cleanup though.
comment:27 by , 12 years ago
Replying to hstamerjohanns:
Actually I would rather want to remove the functions that byteswap the doubles and floats in r_Endian (or fix their return values) because they will create all subtle problems
yes this was my idea, either fix the current functions or put those two lines in new functions in r_Endian?
I do agree that both files need cleanup though.
I opened ticket #211 so eventually we'll take care about it.
I'll test your fix on planetserver, but I'm not convinced it will help because the wrong value doesn't seem to be connected to endianness there..
follow-up: 29 comment:28 by , 12 years ago
Actually it did fix it!! So we can close this ticket and #195 as well, once you submit the patch Heinrich.
comment:29 by , 12 years ago
Replying to dmisev:
Actually it did fix it!! So we can close this ticket and #195 as well, once you submit the patch Heinrich.
Yes, the java queries ask for big-endian and want big-endian back (see RnpMessageHeader?.java:64), therefore the data needs to be converted. I have checked this in all detail that the codepath is taken and QueryExample?.java now returns the right result.
By the way I changed
Double res = (Double) iter.next();
to
Object res = iter.next();
then you can also test float and integer queries.
comment:30 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
it's a problem with endianess. Still have to inverstigate if that's a rasdaman server problem.