#325 closed defect (fixed)
Rasdaman not cleaning up after segmentation fault
Reported by: | Heinrich Stamerjohanns | Owned by: | Dimitar Misev |
---|---|---|---|
Priority: | major | Milestone: | Future |
Component: | rasmgr | Version: | 8.4 |
Keywords: | | Cc: | Peter Baumann |
Complexity: | Medium | | |
Description
When an insert leads to a segmentation fault (a separate issue that will be handled in its own ticket), it is not possible to continue.
The following error message appears: rasdaman error 806: RasManager Error: Write transaction in progress, please retry again later.
rasql should be able to handle the error and at least abort the transaction.
One might want to consider libsigsegv to handle such problems.
Change History (16)
comment:1 by , 11 years ago
Cc: | added |
---|---|
Milestone: | → Future |
Status: | new → accepted |
comment:2 by , 11 years ago
I submitted a patch that uses libsigsegv to catch and handle segfaults in rasql, rasimport and raserase.
libsigsegv-dev is not a mandatory requirement for compiling rasdaman; I made it optional via appropriate #ifdefs. I will still include it in our short installation guide and on the wiki.
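For readers who have not seen the patch, an optional libsigsegv handler could look roughly like the sketch below. This is not the code from the patch: the macro `HAVE_LIBSIGSEGV`, the function `install_segfault_handler` and the `abort_hook` callback are hypothetical names chosen for illustration. Only `sigsegv_install_handler` (libsigsegv) and `sigaction` (POSIX) are real library calls. Without the macro the sketch falls back to plain `sigaction`, which is one way to keep the dependency optional.

```cpp
#include <signal.h>
#include <cstdlib>
#include <unistd.h>

#ifdef HAVE_LIBSIGSEGV   // hypothetical build macro, not taken from the patch
#include <sigsegv.h>
#endif

// Hypothetical hook: a client such as rasql would point this at code that
// aborts its open transaction. It should only do async-signal-safe work.
static void (*g_abort_hook)() = nullptr;

static void report_and_exit()
{
    static const char msg[] = "segmentation fault caught, aborting transaction\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) ignored;
    if (g_abort_hook)
        g_abort_hook();
    _exit(EXIT_FAILURE);  // leave without running atexit handlers
}

#ifdef HAVE_LIBSIGSEGV
// Signature required by libsigsegv's sigsegv_install_handler().
static int on_segfault(void* /*fault_address*/, int /*serious*/)
{
    report_and_exit();
    return 0;  // not reached
}
#else
// Plain POSIX signal handler used when libsigsegv is not available.
static void on_segfault(int /*signo*/)
{
    report_and_exit();
}
#endif

// Installs the SIGSEGV handler; returns true on success.
bool install_segfault_handler(void (*abort_hook)())
{
    g_abort_hook = abort_hook;
#ifdef HAVE_LIBSIGSEGV
    return sigsegv_install_handler(&on_segfault) == 0;
#else
    struct sigaction sa = {};
    sa.sa_handler = on_segfault;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
#endif
}
```

A client would call `install_segfault_handler(...)` once at startup, before opening any transaction.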
comment:3 by , 11 years ago
this sounds like an extremely useful feature - anything speaking against making it a regular requirement?
comment:4 by , 11 years ago
No, I don't think so. From what I could see it's a pretty standard library on all Linux distributions. I can make it a regular requirement in a follow-up patch.
comment:5 by , 11 years ago
I had another idea as well: catch segfaults in rasserver and notify rasmgr that the server has failed, so that it can retry the query evaluation once more with another rasserver. I'm not sure it's easily possible, but sometimes retrying the query after a server segfault is successful.
I already use this retry mechanism in the import scripts, but it would be nicer if it were integrated into the server.
comment:6 by , 11 years ago
nice idea indeed! But we should try it only once IMHO - retrying assumes that the abort is not associated with the query as such, but with server state (such as mem leaks), in which case trying another server makes sense. As servers will have individual states anyway, after the 2nd segfault we know it's meant to be that way.
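The "retry only once, on another server" rule could be sketched as follows. All names here (`Outcome`, `Attempt`, `Server`, `evaluate_with_one_retry`) are hypothetical and do not correspond to actual rasmgr code. A server is modelled as a callable that either returns a result or reports that it crashed.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of what a server reports back for one query.
enum class Outcome { Ok, ServerCrashed };

struct Attempt {
    Outcome outcome;
    std::string result;
};

using Server = std::function<Attempt(const std::string& query)>;

// Tries the query on the first server and, if that server crashed, exactly
// once more on the next one. Returns true and fills `result` on success;
// returns false if both attempts crashed or no server was available.
bool evaluate_with_one_retry(const std::vector<Server>& servers,
                             const std::string& query,
                             std::string& result)
{
    const std::size_t max_attempts = 2;  // first try plus a single retry
    for (std::size_t i = 0; i < servers.size() && i < max_attempts; ++i) {
        Attempt attempt = servers[i](query);
        if (attempt.outcome == Outcome::Ok) {
            result = attempt.result;
            return true;
        }
        // The server crashed: move on to a different server, never reuse it.
    }
    return false;
}
```

After the second crash the function gives up, on the reasoning above that the failure is then tied to the query rather than to the state of one server.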
comment:8 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | accepted → assigned |
Ok reassigning to Nikolce to look at this when he has the time. To summarize, it would be ideal to catch segfaults in rasserver (source:server/rasserver_main.cc, for examples see how I have done it in this patch) and
- print stacktrace of where the segfault happened (as done in gdb for example)
- notify rasmgr of the segfault, so that it can retry the query once more (but not more than one retry per query)
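For the stack trace part, one possible approach on glibc systems is `backtrace` and `backtrace_symbols_fd` from `<execinfo.h>`. The sketch below assumes glibc and is not taken from rasserver. The names `print_stacktrace` and `install_stacktrace_handler` are hypothetical. The output contains raw addresses and, where available, symbol names, so it is less detailed than what gdb prints.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Writes the current call stack to the file descriptor `fd` and returns the
// number of frames captured. backtrace_symbols_fd() writes directly to the
// descriptor instead of allocating a string array.
int print_stacktrace(int fd)
{
    void* frames[64];
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}

static void on_segfault(int /*signo*/)
{
    print_stacktrace(STDERR_FILENO);
    _exit(1);
}

// Installs the handler; returns true on success.
bool install_stacktrace_handler()
{
    // Call backtrace() once up front so that any lazy loading it needs
    // happens here and not inside the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa = {};
    sa.sa_handler = on_segfault;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Function names only show up in the output if the binary exports its symbols, for example when linked with `-rdynamic`.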
comment:11 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:12 by , 9 years ago
I still have this problem with rasdaman version 9.1.
It happens when I rerun the test cases in 'test_wcps' or 'test_wcs' of the rasdaman systemtest.
It first happened yesterday; I had never seen this problem before.
test.sh: starting test at Thu Sep 10 08:36:19 CEST 2015
test.sh:
test.sh: Testing service: wcs
rasdaman error 206: Serialisable exception r_Ebase_dbms: error in base DBMS.
rasdaman error 206: Serialisable exception r_Ebase_dbms: error in base DBMS.
test.sh: deleting coverage rgb from petascope... no such coverage found.
test.sh: done.
test.sh: importing rgb... rasdaman error 206: Serialisable exception r_Ebase_dbms: error in base DBMS.
test.sh: failed, repeating 1... rasdaman error 806: RasManager Error: Write transaction in progress, please retry again later.
test.sh: failed, repeating 2... rasdaman error 806: RasManager Error: Write transaction in progress, please retry again later.
comment:13 by , 9 years ago
Hi Bang, your problem is not related to this ticket. Please look in the rasdaman logs for more information on the rasdaman error 206: Serialisable exception r_Ebase_dbms: error in base DBMS.
comment:14 by , 9 years ago
Hi Dimitar,
I agree that it is not related to the segmentation fault, because it happened again after I restarted the computer. So the error actually comes from RASBASE. I had to remove RASBASE and create it again, and did the same with Petascopedb (I did this yesterday, before your reply). After that, data could be imported normally.
Thanks,
comment:15 by , 9 years ago
But what was the problem in RASBASE? Next time, try to check the logs for more details, so that we can fix it in case it's a bug.
comment:16 by , 9 years ago
Hi Dimitar,
I found the log file (in $RMANHOME/log/); see the excerpt below for analysing the cause of the problem. It looks like rasserver could not query the sqlite_master table in RASBASE because the database was locked (maybe some other process was modifying data).
10/09/2015 09:30:31.565 INFO ok Initializing control connections...informing rasmgr: server available...ok Initializing job control...setting timeout to 300 secs...connecting to base DBMS...
10/09/2015 09:30:31.566 INFO Connecting to /home/rasdaman/install/data/RASBASE
10/09/2015 09:30:31.566 FATAL SQL query failed: SELECT name FROM sqlite_master WHERE type='table' AND name='RAS_COUNTERS'
10/09/2015 09:30:31.566 FATAL Database error, code: 5, message: database is locked
10/09/2015 09:30:31.566 ERROR Error: encountered 206: Error in base DBMS, error number: 5 database is locked
10/09/2015 09:30:31.566 INFO rasserver terminated.
That's a very good idea, I didn't know about libsigsegv.