Opened 9 years ago
Closed 6 years ago
#1276 closed defect (fixed)
rasserver dies with segfault when configured with a postgres backend
Reported by: | Alex Dumitru | Owned by: | Dimitar Misev |
---|---|---|---|
Priority: | major | Milestone: | 9.7 |
Component: | rasserver | Version: | development |
Keywords: | Cc: | Peter Baumann | |
Complexity: | Hard |
Description
rasserver segfaults after a couple of queries when rasdaman is configured with the postgresql backend.
Logs can be found here: http://codereview.rasdaman.org/jenkins/job/vagrant-exec/ws/logs/build146.tar.gz and a new job is available on jenkins for testing this setup.
Stacktrace:
[INFO] - 24/03/2016 10:50:05.101563: Segmentation fault caught, stacktrace:
[INFO] - 24/03/2016 10:50:05.108553: [bt]: (1) /usr/lib/libpq.so.5 (??:0) - PQtransactionStatus+0xe [0x7f237a9a8f6e]
[INFO] - 24/03/2016 10:50:05.113760: [bt]: (2) /usr/lib/libecpg.so.6 (??:0) - +0x56ad [0x7f237a68e6ad]
[INFO] - 24/03/2016 10:50:05.117092: [bt]: (3) /usr/lib/libecpg.so.6 (??:0) - ECPGdo+0x18d [0x7f237a68f0ad]
[INFO] - 24/03/2016 10:50:05.117117: [bt]: (4) /bin/rasserver() [0x5a75cd]
[INFO] - 24/03/2016 10:50:05.117120: [bt]: (5) /bin/rasserver() [0x5caef9]
[INFO] - 24/03/2016 10:50:05.117123: [bt]: (6) /bin/rasserver() [0x475322]
[INFO] - 24/03/2016 10:50:05.117125: [bt]: (7) /bin/rasserver() [0x464b9c]
[INFO] - 24/03/2016 10:50:05.117127: [bt]: (8) /bin/rasserver() [0x6380bb]
[INFO] - 24/03/2016 10:50:05.117129: [bt]: (9) /bin/rasserver() [0x652d5e]
[INFO] - 24/03/2016 10:50:05.117131: [bt]: (10) /bin/rasserver() [0x670172]
[INFO] - 24/03/2016 10:50:05.117134: [bt]: (11) /bin/rasserver() [0x6c4422]
[INFO] - 24/03/2016 10:50:05.117136: [bt]: (12) /bin/rasserver() [0x6cc832]
[INFO] - 24/03/2016 10:50:05.117141: [bt]: (13) /bin/rasserver() [0x6cca23]
[INFO] - 24/03/2016 10:50:05.121870: [bt]: (14) /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (??:?) - +0xb1a60 [0x7f2377fa4a60]
[INFO] - 24/03/2016 10:50:05.124316: [bt]: (15) /lib/x86_64-linux-gnu/libpthread.so.0 (??:0) - +0x8182 [0x7f2379503182]
[INFO] - 24/03/2016 10:50:05.127571: [bt]: (16) /lib/x86_64-linux-gnu/libc.so.6 (??:?) - clone+0x6d [0x7f237770c47d]
[INFO] - 24/03/2016 10:50:05.127593: rasserver terminated.
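Frames (4) to (13) in the trace are bare addresses inside /bin/rasserver, so they say little on their own. The following is a minimal sketch, assuming a rasserver binary built with debug symbols and binutils' addr2line being installed, of how those addresses could be pulled out of a log for symbol lookup. The function name rasserver_frames and the log file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the raw addresses of all /bin/rasserver frames
# found in a backtrace log read from stdin, one address per line.
rasserver_frames() {
    # The first grep keeps only the "/bin/rasserver() [0x...]" fragments,
    # the second one strips everything except the address itself.
    grep -o '/bin/rasserver() \[0x[0-9a-f]*\]' | grep -o '0x[0-9a-f]*'
}

# Usage (assumes the log was saved as rasserver.log):
#   rasserver_frames < rasserver.log | xargs addr2line -f -C -e /bin/rasserver
```

Without debug symbols addr2line will only print ?? for these frames, so this is useful mainly on a debug build.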
Change History (17)
comment:1 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 9 years ago
Status: | assigned → accepted |
---|
comment:3 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | accepted → assigned |
comment:4 by , 8 years ago
Milestone: | 9.2 → 9.3 |
---|
comment:5 by , 8 years ago
Milestone: | 9.3 → 9.4 |
---|
comment:6 by , 8 years ago
Owner: | changed from | to
---|
comment:7 by , 8 years ago
Please check how to enable postgres with cmake (and document it if it isn't documented yet), and then look into the segfault problem.
follow-up: 11 comment:8 by , 8 years ago
Complexity: | Medium → Hard |
---|
Enabling postgres with cmake (v3+) (see http://www.rasdaman.org/wiki/InstallFromSource/cmake):
in your build directory…
cmake …<arguments>… -DDEFAULT_DB=postgresql
make
make install
Remark: if your "service" name is postgresql-version#, you will run into some trouble here. On Linux, the service name should be postgresql.
Regarding the segfault: it seems to happen in different places, at different times, and usually for requests that take longer. For example, when gdb is attached to rasserver while some data ingestion is running (e.g. insert into <args>), the backtraces of the segfaults point to the objectbroker files.
An approach towards attacking this ticket in the future: run git bisect with ticket:945's patch set to good. The associated sha1: d302f450a9837388754199cd4a051561e2d08f64
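The bisect proposed above could be started roughly as follows. This is only a sketch: it assumes the current HEAD reproduces the segfault (the ticket does not name a specific bad commit), and the function name start_segfault_bisect is hypothetical.

```shell
# Sketch: begin a bisect between the known-good commit from ticket:945 and
# the current HEAD, which is assumed (not confirmed) to show the segfault.
start_segfault_bisect() {
    good_commit="d302f450a9837388754199cd4a051561e2d08f64"
    git bisect start &&
    git bisect bad HEAD &&
    git bisect good "$good_commit"
}
```

After each checkout, rebuild with the postgresql backend, run a few queries, and mark the result with git bisect good or git bisect bad. git bisect reset returns to the original branch when done.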
comment:9 by , 7 years ago
Milestone: | 9.4 → 9.5 |
---|
comment:10 by , 7 years ago
Owner: | changed from | to
---|
comment:11 by , 7 years ago
Replying to bbell:
Enabling postgres with cmake (v3+) (see http://www.rasdaman.org/wiki/InstallFromSource/cmake):
in your build directory…
cmake …<arguments>… -DDEFAULT_DB=postgresql
it's actually -DDEFAULT_BASEDB=postgresql
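With that correction, the quoted steps could be wrapped as in the sketch below. It assumes it is run from the build directory. The function name build_with_postgres and its arguments are hypothetical, and any other cmake arguments a given setup needs are left to the caller.

```shell
# Hypothetical wrapper around the quoted steps, using the corrected option.
# $1: path to the rasdaman source tree; further arguments go to cmake as-is.
build_with_postgres() {
    src_dir="$1"
    shift
    cmake "$src_dir" "$@" -DDEFAULT_BASEDB=postgresql &&
    make &&
    make install
}
```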
comment:12 by , 7 years ago
Cc: | added |
---|
On further inspection, it seems like the issue only happens with rasnet. Compiling --with-protocol=rnp and postgresql works with no issue. It also works compiled with rasnet but using directql instead of rasql.
The problem is that it segfaults randomly within libpq, in places where you wouldn't expect any segfault. It's very unclear at the moment what the issue is.
comment:13 by , 7 years ago
Just a suspicion: pg dies because its stack gets overwritten, i.e. some pointer on the stack is clobbered by a memory write beyond array boundaries elsewhere → try valgrind?
comment:14 by , 7 years ago
Unfortunately valgrind is not an option, as the issue only happens when the query goes through rasnet.
No issues with directql. But I noticed that it tends to happen starting from the second query, so it's possible that something is not cleaned up after the first query.
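To pin down the "starting from the second query" observation, the same query can be sent repeatedly and the first failing iteration recorded. The helper below is a minimal sketch; the name first_failure is hypothetical, and it only sees the client's exit status, so it assumes the client exits with an error once rasserver has died.

```shell
# Hypothetical helper: run a command up to $1 times and print the number of
# the first iteration whose exit status is non-zero.
first_failure() {
    runs="$1"
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "$i"
            return 1
        fi
        i=$((i + 1))
    done
    return 0   # every run succeeded, nothing is printed
}

# Usage (the query is only an example):
#   first_failure 10 rasql -q 'select version()' --out string
```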
comment:15 by , 7 years ago
Milestone: | 9.5 → 9.6 |
---|
comment:16 by , 7 years ago
Milestone: | 9.6 → Future |
---|
comment:17 by , 6 years ago
Milestone: | Future → 9.7 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
It seems to have been resolved in the meantime; at least on Debian buster I don't get the issue anymore.
Has this been fixed?