#397 closed defect (fixed)
rasmgr segfaults when restarting rasserver
Reported by: | Dimitar Misev | Owned by: | mrusu |
---|---|---|---|
Priority: | critical | Milestone: | 8.4.4 |
Component: | rasmgr | Version: | 8.4 |
Keywords: | | Cc: | Peter Baumann, abeccati |
Complexity: | Medium |
Description
This bug was probably introduced by the p2p patch, changeset:90fa2688a862f227e7f52b0423041c185f1eed34
It seems to happen when the countdown/timeout runs out and a rasserver is to be restarted. To reproduce, run source:systemtest/testcases_petascope/test_wcps; it fails around the 48th or 49th test if rasmgr.conf uses the default countdown parameters.
Workaround: set countdown and timeout in rasmgr.conf to some large numbers.
Segfault stack trace:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000423811 in MasterComm::askOutpeer (this=0x6c1b80 <masterCommunicator>, peer=1, outmsg=0x7fff64b70a40 "POST peerrequest HTTP/1.1\r\nAccept: text/plain\r\nUserAgent: RasMGR/1.0\r\nAuthorization: ras ras rasguest:8e70a429be359b6dace8b5b2500dedb0\r\nContent-length: 47\r\n\r\nhifi RASBASE RNP ro 2130706689:1369740188"...) at rasmgr_master_nb.cc:877
877 struct hostent *hostinfo = gethostbyname(config.outpeers[peer]);
(gdb) bt
#0 0x0000000000423811 in MasterComm::askOutpeer (this=0x6c1b80 <masterCommunicator>, peer=1, outmsg=0x7fff64b70a40 "POST peerrequest HTTP/1.1\r\nAccept: text/plain\r\nUserAgent: RasMGR/1.0\r\nAuthorization: ras ras rasguest:8e70a429be359b6dace8b5b2500dedb0\r\nContent-length: 47\r\n\r\nhifi RASBASE RNP ro 2130706689:1369740188"...) at rasmgr_master_nb.cc:877
#1 0x00000000004231fd in MasterComm::getFreeServer (this=0x6c1b80 <masterCommunicator>, fake=false, frompeer=false) at rasmgr_master_nb.cc:802
#2 0x000000000042178f in MasterComm::processRequest (this=0x6c1b80 <masterCommunicator>, currentJob=...) at rasmgr_master_nb.cc:482
#3 0x00000000004206e2 in MasterComm::processJob (this=0x6c1b80 <masterCommunicator>, currentJob=...) at rasmgr_master_nb.cc:214
#4 0x00000000004200ce in MasterComm::Run (this=0x6c1b80 <masterCommunicator>) at rasmgr_master_nb.cc:165
#5 0x000000000040d066 in main (argc=1, argv=0x7fff64b71548, envp=0x7fff64b71558) at rasmgr_main.cc:172
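The trace points at the gethostbyname(config.outpeers[peer]) call with peer=1. The ticket does not state the root cause, so the following is only a sketch under the assumption that the entry at that index is out of range or unset when the lookup happens. The helper name resolveOutpeer and the use of std::vector<std::string> for the peer list are hypothetical and are not taken from the rasmgr sources.

```cpp
#include <netdb.h>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of rasmgr: look up the peer at index `peer`
// only if the index is in range and the stored host name is non-empty.
// Returns nullptr instead of touching an invalid entry.
struct hostent *resolveOutpeer(const std::vector<std::string> &outpeers, std::size_t peer)
{
    if (peer >= outpeers.size() || outpeers[peer].empty())
    {
        return nullptr; // caller is expected to skip this peer
    }
    return gethostbyname(outpeers[peer].c_str());
}
```

A caller along the lines of askOutpeer would then check the result for nullptr before using it, so that a stale or missing peer entry causes the request to be skipped rather than a crash.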
Change History (4)
comment:1 by , 11 years ago
Milestone: | 8.4.3 → 8.4.4 |
---|
comment:2 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:3 by , 11 years ago
FYI, ticket #395 seems to be affected by this issue as well.
comment:4 by , 11 years ago
Yes, hopefully the fix for this ticket solves the other problem as well, as they both seem to be caused by the same bug in rasmgr's new logic.