Opened 9 years ago
Closed 9 years ago
#1002 closed defect (fixed)
Issues with C++-Api and rasnet
Reported by: | Georg Semmler | Owned by: | George Merticariu |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | rasnet | Version: | development |
Keywords: | Cc: | Alex Toader | |
Complexity: | Very Hard |
Description
Currently it is not possible to use certain parts of the C++ API if rasdaman is compiled with --enable-rasnet.
In particular, I noticed that the following parts are broken:
- The installation is missing a few libraries (librasnet.a and libcommon.a) and header files (especially easylogging++.hh).
- The user must initialize the logging framework. It's not a big issue, but it should be mentioned somewhere in the documentation. (Maybe there should be a function for this later?).
- r_Partial_Insert is broken with rasnet. The attached code works fine with the old protocol, but "fails" with rasnet. It seems like the collection is created, but no data is written to it.
Attachments (1)
Change History (35)
comment:1 by , 9 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:2 by , 9 years ago
We are already working on fixing the issues with the new protocol. A patch should come in the following days.
comment:4 by , 9 years ago
The issue is fixed and will come with a patch today or tomorrow. If it's urgent I can give you a quick fix until the patch is applied.
comment:5 by , 9 years ago
Thanks for the patch. I see lots of jars submitted; are they all listed on the license page? If and when they are, I can accept the patch.
comment:6 by , 9 years ago
First, it is an amazing amount of code you have produced.
Looking at the exception handlers I find lots of empty method bodies. Does this mean exceptions remain unhandled?
comment:7 by , 9 years ago
In general, we handled all the exceptions that could be handled at a certain level. When an exception could not be handled, the exception information was logged.
There are two cases where you will see a lot of exception handlers:
- When an exception is thrown in a thread other than the main thread, the best solution is to catch it and let the thread finish. If the exception is not caught, the process will be terminated. You will see statements of the kind catch(…) in all methods that run in a thread other than the main.
- When an exception is thrown as a result of a client operation on the rasmgr or rasserver, the initial exception will be caught, logged, serialized and sent to the client where it is deserialized and processed. The mechanism introduced for doing this will be documented for ticket #1006.
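The first case above (catching everything inside a worker thread so the process is not terminated) can be sketched roughly as follows. This is only an illustration using the standard library: `runGuarded` and its parameters are hypothetical names, not rasmgr code, and the real implementation logs through its own logging framework rather than `std::cerr`.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs `task` on a worker thread and catches every
// exception inside that thread. An exception escaping a std::thread body
// would call std::terminate and kill the whole process.
// Returns true if the task finished without throwing.
bool runGuarded(const std::string& name, const std::function<void()>& task) {
    assert(static_cast<bool>(task));  // an empty std::function cannot be run
    bool ok = false;
    std::thread worker([&] {
        try {
            task();
            ok = true;
        } catch (const std::exception& e) {
            // Known exception type: log the message and let the thread end.
            std::cerr << "[" << name << "] failed: " << e.what() << '\n';
        } catch (...) {
            // Anything else: still log, still let the thread end normally.
            std::cerr << "[" << name << "] failed with an unknown exception\n";
        }
    });
    worker.join();  // join() makes the write to `ok` visible to the caller
    return ok;
}
```

A caller could use the return value to decide whether the task should be retried, for example `if (!runGuarded("cleanup", task)) { /* schedule a retry */ }`.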
comment:8 by , 9 years ago
sounds all reasonable to me, except: "catch it and let the thread finish." Can we guarantee that the client will always get notified of the hiccup, or will it (this is what I seem to read) transport an incomplete data item, which could entail, worst-case, data loss?
comment:9 by , 9 years ago
The threads in question do tasks like managing the list of servers and removing servers that have not responded to pings. They do not touch any data relevant to the client. In theory, the management tasks in question are bulletproof.
Assuming that one of these tasks would fail and the exception handlers would not be in place, an exception could be thrown that could kill the process and maybe interrupt an ongoing data operation leading to data loss.
With the exception handlers in place, only the thread will die and any ongoing data operation taking place in the main thread will not be affected.
I am open to suggestions on how to handle this, but the current solution is the best I could come up with.
comment:10 by , 9 years ago
"In theory, the management tasks in question are bulletproof." hm…not sure what this tells me.
So what does a client see in case of such an exception?
comment:11 by , 9 years ago
These threads are internal to the rasmgr and are an implementation detail. The client is in no way aware of their existence, so he does not see anything if such a thread fails. The database administrator will see the problem reported in the logs.
Is there another behavior expected?
comment:12 by , 9 years ago
I guess Peter's question is - is the thread restarted if it fails? I mean either it fails and all is aborted or it completes properly with some retry. Unless what they do is some really useless work and it doesn't matter what the outcome is.
comment:13 by , 9 years ago
Whenever the work done by the thread is relevant the operation is retried.
I think there was a miscommunication problem. It is really difficult to know which of the catch blocks Peter was referring to. Once we put in place a system for code review where proposed patches can be annotated with comments for any affected file at specific lines, we will be able to avoid misunderstandings of this type.
comment:15 by , 9 years ago
ok, this sounds good to me. I wanted to clarify that clients don't get invalid answers, but either correct ones or an exception report, and this obviously is the case - fine!
comment:16 by , 9 years ago
I've tried my little test program again with the current git version (1a0dadc04cf6279841a5d5b606888a7beecb8a55).
- It's telling me: "RasManager Error: Access denied, incorrect user/password". I didn't change any user/password, so I'm sure they are correct.
Using gdb to look into this, it seems like the server is responding with: "The client with client ID does not exist" (GRPC request at RasnetClientComm::connectClient line 176).
Trying to execute rasql -q "select r from RAS_COLLECTIONNAMES as r" --out string results in a similar response. You will see this only in gdb; normally rasql only says:
opening database RASBASE at localhost:7001...
terminate called after throwing an instance of 'std::runtime_error'
what(): There is no available server for the client.
- If there is no server running, the client tries to reconnect in an endless loop. It's a grpc internal thing, which should be adjusted. Maybe you should only try to reconnect a few times and then throw an error. You can implement this on channel level with WaitForStateChange. See https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
- Think about activating SSL/TLS for grpc Channels.
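The suggestion to reconnect only a few times and then raise an error can be sketched as below. This is a minimal illustration that deliberately avoids the gRPC API: `connectWithRetry` and the `tryConnect` callback are hypothetical names, and in the real client the callback would wrap whatever channel-level check is chosen (for example the WaitForStateChange approach mentioned above).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: calls `tryConnect` at most `maxAttempts` times and
// sleeps `delay` between failed attempts. Returns the number of attempts
// used on success and throws std::runtime_error once all attempts failed,
// instead of looping forever.
int connectWithRetry(const std::function<bool()>& tryConnect,
                     int maxAttempts,
                     std::chrono::milliseconds delay) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (tryConnect()) {
            return attempt;
        }
        if (attempt < maxAttempts) {
            std::this_thread::sleep_for(delay);  // back off before retrying
        }
    }
    throw std::runtime_error("no server reachable after " +
                             std::to_string(maxAttempts) + " attempts");
}
```

With a bounded loop like this, a client started without a running server would fail with a clear error after a predictable time.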
About the code review thing: Maybe you should have a look at https://www.gerritcodereview.com/.
comment:17 by , 9 years ago
Thanks for the feedback Georg! I will look into this immediately.
For code review, I was also looking into Gerrit. It looks like a good fit.
comment:18 by , 9 years ago
I was using this to provide feedback to Bang and the other students: http://git.flanche.com/alex/rasdaman-codereview/commit/6a543dcc7fc63ac3c8da0ac548360b5472647854#43107f46f0a13d8d52e2ecc1f5607d523aa9d962_23_26
It has a nice interface and it takes me about 30s to attach a patch there for review. Dimitar suggested at one point that we install it on kahlua for anybody to use.
I would be fine with Gerrit as well, if it's easy enough to use it with our existing patch system (most of these code review tools require some specific workflow).
comment:19 by , 9 years ago
I did a clean install and tested with the default password and after changing the password. They both work.
If you call
<<rasql -q "select r from RAS_COLLECTIONNAMES as r" --out string>>
The default user (rasguest) and password (rasguest) are sent to rasmgr. This works out of the box.
If you use rascontrol to change the password of the user:
<<change user rasguest -passwd TestPassword>>
You have to specify the password like so:
<<rasql -q "select r from RAS_COLLECTIONNAMES as r" --out string --user rasguest --passwd TestPassword>>
This also works.
Now, assuming you did an upgrade instead of a clean install. It might be that the upgrade of the Google Protobuf package from 2.4 to 3.0 made the rasmgr.auth file that stores the user information unreadable. They say that the protocol is backward compatible, but this might not be 100% true. A solution would be to delete rasmgr.auth from your HOME, restart rasdaman and reconfigure your user information through rascontrol.
I also tested with a different user created through rascontrol and it works.
comment:20 by , 9 years ago
Trying to do a clean build and install results in a build error:
In file included from databaseif.cc:44:0:
databaseif.cc: In static member function ‘static void DatabaseIf::destroyDB(const char*)’:
sqlglobals.h:59:18: error: unable to find string literal operator ‘operator""view_name’ with ‘const char [21]’, ‘long unsigned int’ arguments
UPDATE_QUERY("DROP VIEW IF EXISTS "view_name);
^
sqlglobals.h:53:30: note: in definition of macro ‘UPDATE_QUERY’
sqlite3_exec(sqliteConn, c, 0, 0, 0); \
^
databaseif.cc:371:5: note: in expansion of macro ‘DROP_VIEW’
DROP_VIEW("RAS_MDDTYPES_VIEW");
^
sqlglobals.h:59:18: error: unable to find string literal operator ‘operator""view_name’ with ‘const char [21]’, ‘long unsigned int’ arguments
UPDATE_QUERY("DROP VIEW IF EXISTS "view_name);
Also, as I mentioned in the last post, I do not think the problem is the authentication (or an authentication file), because the server response says something else. Furthermore, there was no protobuf update for me, because I've been using protobuf 3 for a while now.
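For readers hitting the same message: since C++11, a string literal immediately followed by an identifier is lexed as a user-defined literal, so `"DROP VIEW IF EXISTS "view_name` is read as a literal with the suffix `view_name` before the macro parameter is substituted. A likely remedy, assuming the macro only needs adjacent string literals to be concatenated, is a space between the literal and the parameter. The sketch below uses hypothetical stand-in names (`DROP_VIEW_SQL`, `dropMddTypesViewSql`) and is not the actual patch applied to sqlglobals.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the macro from the error output. The space
// after the closing quote keeps the preprocessor from treating
// `view_name` as a user-defined literal suffix; the two adjacent string
// literals are then concatenated by the compiler as before.
#define DROP_VIEW_SQL(view_name) "DROP VIEW IF EXISTS " view_name

// Builds the statement for the view named in the error output.
inline std::string dropMddTypesViewSql() {
    std::string sql = DROP_VIEW_SQL("RAS_MDDTYPES_VIEW");
    assert(sql.rfind("DROP VIEW", 0) == 0);  // prefix survived concatenation
    return sql;
}
```

Calling `dropMddTypesViewSql()` yields the string `DROP VIEW IF EXISTS RAS_MDDTYPES_VIEW`, which could then be handed to a call such as the `sqlite3_exec` shown in the compiler note.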
comment:21 by , 9 years ago
What OS are you using? What version of GCC? I could not reproduce the error and the code is building.
comment:22 by , 9 years ago
$ gcc --version
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
$ lsb_release -s -d
Ubuntu 15.10
I've found and fixed this problem. Should I open a new issue for the patch?
comment:23 by , 9 years ago
Sure! I will create a VM with Ubuntu 15.10 and try to see if I encounter any problems.
As far as I can see, gcc 4.x is supported (see http://rasdaman.org/wiki/Platforms) and we do not have any testing in place for gcc 5.x.
comment:24 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
follow-up: 26 comment:25 by , 9 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I compiled everything with gcc 4.9.3. This did not change anything. This is not fixed for me!
comment:26 by , 9 years ago
Replying to gsemmler:
I compiled everything with gcc 4.9.3. This did not change anything. This is not fixed for me!
Hi Georg,
I created a fresh installation of Ubuntu 15.10 and compiled rasdaman with gcc 5.2.1. The compilation worked and I just sent a patch which fixed the passing of the configuration object in the rasnetserver.
I also tested your client and everything worked correctly.
Can you please give me more details about what exactly is your issue so I can provide you support and fix the problem as soon as possible?
Thanks,
George
comment:28 by , 9 years ago
Replying to dmisev:
George, can you test with --enable-strict?
--enable-strict is not working because we are using generated code which causes the shadow warning. I created a ticket for that (http://rasdaman.org/ticket/1045).
comment:29 by , 9 years ago
Hi Georg,
Given that you opened http://rasdaman.org/ticket/1044, can I assume that installing rasdaman with the new protocol has worked for you? Are you still having issues?
Thanks,
Alex
comment:30 by , 9 years ago
Hi Alex,
it builds fine, but I cannot start the server. Currently I get
Error in `/opt/rasdaman_rasnet/bin/rasmgr': double free or corruption (out)
Backtrace: http://zero.azapps.de/?3f04e358716630e4#xypwtGM39WtrRuhcKKIXo0xvmKCc4PQhxGm//3WSq1M=
Best
Georg
comment:31 by , 9 years ago
Hi Georg,
Could you tell me the following:
- What version of Boost are you using?
- What version of GCC are you using?
- What Linux distro are you using? Is it still 15.10?
- What were the configuration parameters you used to build rasdaman?
It looks like the error is from inside the class that parses rascontrol commands, but I've never seen a double free in rasmgr in the standard installation. I hope I can reproduce it.
Thanks,
Alex
comment:32 by , 9 years ago
Hi Alex,
$ grep "#define BOOST_VERSION" /usr/include/boost/version.hpp
#define BOOST_VERSION_HPP
#define BOOST_VERSION 105800
$ gcc --version
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./configure --with-default-basedb=sqlite --with-protoc=~/build_dir/grpc/bins/opt/protobuf/protoc --disable-java --enable-rasnet --with-grpc-java-plugin=/home/gsemmler1/build_dir/grpc-java/compiler/build/binaries/java_pluginExecutable/protoc-gen-grpc-java --prefix=/opt/rasdaman_rasnet/
GCC and Boost are installed from the package manager. The distro is still Ubuntu 15.10.
Protobuf and grpc are self-built, but I'm sure they are fine (I also use them in another project and there is no such issue).
This was a clean build (I even recloned the git) and a new install dir.
Best
Georg
comment:33 by , 9 years ago
Unfortunately, I was not successful in reproducing the error.
I created a VM with Ubuntu 15.10, installed gcc and boost from the repo together with rasdaman's other dependencies, installed grpc and the grpc-java compiler from source as instructed on the wiki.
I configured rasdaman with the following command:
./configure --with-default-basedb=sqlite --with-protoc=/usr/local/bin/protoc --disable-java --enable-rasnet --with-grpc-java-plugin=/home/rasdaman/Programs/src/grpc-java/compiler/build/binaries/java_pluginExecutable/protoc-gen-grpc-java --prefix /opt/rasdaman_rasnet
The only difference that might be relevant is the placement of protoc. I installed it in the default location. I will try to install it in another place tomorrow and see what happens.
comment:34 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
It seems we can't reproduce the last error; the rest seem to be fixed. Georg, if it still persists on your machine, please open a new ticket.
George / Alex, please check that the attached test.cpp works correctly with the changes to the protocol. It seems a part of the C++ API has not been ported correctly in the case of point 3.