Opened 9 years ago

Closed 9 years ago

#1002 closed defect (fixed)

Issues with C++-Api and rasnet

Reported by: Georg Semmler
Owned by: George Merticariu
Priority: major
Milestone:
Component: rasnet
Version: development
Keywords:
Cc: Alex Toader
Complexity: Very Hard

Description

Currently it is not possible to use certain parts of the C++ API if rasdaman is compiled with --enable-rasnet.
In particular, I noticed that the following parts are broken:

  1. The installation is missing a few libraries (librasnet.a and libcommon.a) and header files (especially easylogging++.hh).
  2. The user must initialize the logging framework. It's not a big issue, but it should be mentioned somewhere in the documentation. (Maybe there should be a function for this later?)
  3. r_Partial_Insert is broken with rasnet. The attached code works fine with the old protocol, but "fails" with rasnet. It seems the collection is created, but no data is written to it.

Attachments (1)

test.cpp (2.6 KB ) - added by Georg Semmler 9 years ago.


Change History (35)

by Georg Semmler, 9 years ago

Attachment: test.cpp added

comment:1 by Alex Dumitru, 9 years ago

Cc: Alex Toader added
Owner: changed from Alex Dumitru to George Merticariu
Status: changed from new to assigned

George / Alex, please check that the attached test.cpp works correctly with the changes to the protocol. It seems that part of the C++ API has not been ported correctly in the case of point 3.

comment:2 by Alex Toader, 9 years ago

We are already working on fixing the issues with the new protocol. A patch should come in the following days.

comment:3 by Georg Semmler, 9 years ago

Any updates/news on this issue?

comment:4 by George Merticariu, 9 years ago

The issue is fixed and will come with a patch today or tomorrow. If it's urgent I can give you a quick fix until the patch is applied.

comment:5 by Peter Baumann, 9 years ago

thanks for the patch. I see lots of jars submitted, are they all listed on the license page? If & when so I can accept the patch.

comment:6 by Peter Baumann, 9 years ago

first, it is an amazing size of code you have produced.
Looking at the exception handlers I find lots of empty method bodies. Does this mean exceptions remain unhandled?

comment:7 by Alex Toader, 9 years ago

In general, we handled all the exceptions that could be handled at a certain level. When an exception could not be handled, the exception information was logged.
There are two cases where you will see a lot of exception handlers:

  1. When an exception is thrown in a thread other than the main thread, the best solution is to catch it and let the thread finish. If the exception is not caught, the process will be terminated. You will see statements of the kind catch(...) in all methods that run in a thread other than the main thread.
  2. When an exception is thrown as a result of a client operation on the rasmgr or rasserver, the initial exception will be caught, logged, serialized and sent to the client where it is deserialized and processed. The mechanism introduced for doing this will be documented for ticket #1006.
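
The first case can be sketched as follows. This is only a minimal illustration, not the rasmgr code: the helper names runGuarded and startGuardedThread are hypothetical, and logging is reduced to std::cerr. The point it shows is that an exception escaping a thread's entry function terminates the whole process, so the thread body is wrapped in a handler that ends with catch(...).

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical helper: runs `task` and reports any exception instead of
// letting it escape. Returns true if the task completed normally.
bool runGuarded(const std::string& name, const std::function<void()>& task) {
    try {
        task();
        return true;
    } catch (const std::exception& ex) {
        std::cerr << "thread '" << name << "' failed: " << ex.what() << '\n';
    } catch (...) {
        // Last resort: anything not derived from std::exception.
        std::cerr << "thread '" << name << "' failed with an unknown exception\n";
    }
    return false;
}

// Hypothetical helper: starts a background thread whose body cannot
// propagate an exception, so a failure ends only this thread.
std::thread startGuardedThread(std::string name, std::function<void()> task) {
    return std::thread([name, task]() { runGuarded(name, task); });
}
```

A caller would start a management task with startGuardedThread("ping-check", someTask) and join it on shutdown; if someTask throws, only a log line is produced and the main thread keeps running.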

comment:8 by Peter Baumann, 9 years ago

sounds all reasonable to me, except: "catch it and let the thread finish." Can we guarantee that the client will always get notified of the hiccup, or will it (this is what I seem to read) transport an incomplete data item, which could entail, worst-case, data loss?

comment:9 by Alex Toader, 9 years ago

The threads in question do tasks like managing the list of servers and removing servers that have not responded to pings. They do not touch any data relevant to the client. In theory, the management tasks in question are bulletproof.

Assuming that one of these tasks would fail and the exception handlers would not be in place, an exception could be thrown that could kill the process and maybe interrupt an ongoing data operation leading to data loss.
With the exception handlers in place, only the thread will die and any ongoing data operation taking place in the main thread will not be affected.

I am open to suggestions on how to handle this, but the current solution is the best I could come up with.

comment:10 by Peter Baumann, 9 years ago

"In theory, the management tasks in question are bulletproof." hm…not sure what this tells me.
So what does a client see in case of such an exception?

comment:11 by Alex Toader, 9 years ago

These threads are internal to the rasmgr and are an implementation detail. The client is in no way aware of their existence, so he does not see anything if such a thread fails. The database administrator will see the problem reported in the logs.
Is there another behavior expected?

comment:12 by Dimitar Misev, 9 years ago

I guess Peter's question is - is the thread restarted if it fails? I mean either it fails and all is aborted or it completes properly with some retry. Unless what they do is some really useless work and it doesn't matter what the outcome is.

comment:13 by Alex Toader, 9 years ago

Whenever the work done by the thread is relevant the operation is retried.

I think there was a miscommunication problem. It is really difficult to know which of the catch blocks Peter was referring to. Once we put in place a system for code review where proposed patches can be annotated with comments for any affected file at specific lines, we will be able to avoid misunderstandings of this type.

comment:14 by Dimitar Misev, 9 years ago

Agreed, we should look into setting up something like that on kahlua.

comment:15 by Peter Baumann, 9 years ago

ok, this sounds good to me. I wanted to clarify that clients don't get invalid answers, but either correct ones or an exception report, and this obviously is the case - fine!

comment:16 by Georg Semmler, 9 years ago

I've tried my little test program again with the current git version (1a0dadc04cf6279841a5d5b606888a7beecb8a55).

  • It's telling me: RasManager Error: Access denied, incorrect user/password. I didn't change any user/password, so I'm sure they are correct.
    Using gdb to look into this it seems like the server is responding with: The client with client ID does not exist. (GRPC request at RasnetClientComm::connectClient line 176).
    Trying to execute rasql -q "select r from RAS_COLLECTIONNAMES as r" --out string results in a similar response. You will see this only in gdb; normally rasql only says:
    opening database RASBASE at localhost:7001...
    terminate called after throwing an instance of 'std::runtime_error'
    what(): There is no available server for the client.
  • Think about activating SSL/TLS for grpc Channels.
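
The rasql output above shows a std::runtime_error escaping to the top level, which is why the process ends with "terminate called". A client program built on the C++ API can guard against this in the way sketched below. This is an assumption-laden illustration: runClient is a hypothetical wrapper, it is not part of rasdaman, and the exit codes are arbitrary.

```cpp
#include <exception>
#include <functional>
#include <iostream>

// Hypothetical wrapper around a client's main logic: turns an escaping
// exception into an error message and a non-zero exit code instead of
// letting the runtime call std::terminate.
int runClient(const std::function<void()>& clientMain) {
    try {
        clientMain();
        return 0;
    } catch (const std::exception& ex) {
        // Covers std::runtime_error and everything else derived from std::exception.
        std::cerr << "error: " << ex.what() << '\n';
        return 1;
    } catch (...) {
        std::cerr << "error: unknown exception\n";
        return 2;
    }
}
```

The body of main would then reduce to a single call such as return runClient(doWork);, where doWork is whatever function opens the database and runs the queries.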

About the code review thing: Maybe you should have a look at https://www.gerritcodereview.com/.

comment:17 by Alex Toader, 9 years ago

Thanks for the feedback Georg! I will look into this immediately.

For code review, I was also looking into Gerrit. It looks like a good fit.


comment:18 by Alex Dumitru, 9 years ago

I was using this to provide feedback to Bang and the other students: http://git.flanche.com/alex/rasdaman-codereview/commit/6a543dcc7fc63ac3c8da0ac548360b5472647854#43107f46f0a13d8d52e2ecc1f5607d523aa9d962_23_26
It has a nice interface and it takes me about 30s to attach a patch there for review. Dimitar suggested at one point that we install it on kahlua for anybody to use.

I would be fine with Gerrit as well, if it's easy enough to use it with our existing patch system (most of these code review tools require some specific workflow).

comment:19 by Alex Toader, 9 years ago

I did a clean install and tested with the default password and after changing the password. They both work.

If you call
rasql -q "select r from RAS_COLLECTIONNAMES as r" --out string
the default user (rasguest) and password (rasguest) are sent to rasmgr. This works out of the box.

If you use rascontrol to change the password of the user:
change user rasguest -passwd TestPassword
you have to specify the password like so:
rasql -q "select r from RAS_COLLECTIONNAMES as r" --out string --user rasguest --passwd TestPassword

This also works.

Now, assuming you did an upgrade instead of a clean install, it might be that the upgrade of the Google Protobuf package from 2.4 to 3.0 made the rasmgr.auth file that stores the user information unreadable. They say that the protocol is backward compatible, but this might not be 100% true. A solution would be to delete rasmgr.auth from your HOME, restart rasdaman and reconfigure your user information through rascontrol.

I also tested with a different user created through rascontrol and it works.

comment:20 by Georg Semmler, 9 years ago

Trying to do a clean build and install results in a build error:

In file included from databaseif.cc:44:0:
databaseif.cc: In static member function ‘static void DatabaseIf::destroyDB(const char*)’:
sqlglobals.h:59:18: error: unable to find string literal operator ‘operator""view_name’ with ‘const char [21]’, ‘long unsigned int’ arguments
     UPDATE_QUERY("DROP VIEW IF EXISTS "view_name);
                  ^
sqlglobals.h:53:30: note: in definition of macro ‘UPDATE_QUERY’
     sqlite3_exec(sqliteConn, c, 0, 0, 0); \
                              ^
databaseif.cc:371:5: note: in expansion of macro ‘DROP_VIEW’
     DROP_VIEW("RAS_MDDTYPES_VIEW");
     ^
sqlglobals.h:59:18: error: unable to find string literal operator ‘operator""view_name’ with ‘const char [21]’, ‘long unsigned int’ arguments
     UPDATE_QUERY("DROP VIEW IF EXISTS "view_name);
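
A plausible reading of this error, not confirmed anywhere in this ticket: when the code is compiled as C++11 or later, a string literal immediately followed by an identifier is lexed as a single user-defined literal, so "DROP VIEW IF EXISTS "view_name is no longer treated as a literal followed by the macro parameter view_name. Putting whitespace between the two restores ordinary string-literal concatenation. The macro below is a hypothetical stand-in for illustration, not the actual definition in sqlglobals.h.

```cpp
#include <string>

// Hypothetical stand-in for the macro in sqlglobals.h. The space before
// `view_name` keeps the literal and the macro parameter as separate tokens,
// so the adjacent string literals are concatenated as usual.
#define DROP_VIEW_SQL(view_name) "DROP VIEW IF EXISTS " view_name

// Returns the SQL text the macro produces for one view name.
inline std::string dropViewSql() {
    return DROP_VIEW_SQL("RAS_MDDTYPES_VIEW");
}
```

Without the space, the same macro body fails in C++11 mode with the "unable to find string literal operator" message shown above.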

Also, as I mentioned in my last post, I do not think the problem is the authentication (or an authentication file), because the server response says something else. Furthermore, there was no protobuf update for me, because I have been using protobuf 3 for a while now.

comment:21 by Alex Toader, 9 years ago

What OS are you using? What version of GCC? I could not reproduce the error and the code is building.

comment:22 by Georg Semmler, 9 years ago

$gcc --version
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
$lsb_release -s -d
Ubuntu 15.10

I've found and fixed this problem. Should I open a new issue for the patch?

comment:23 by Alex Toader, 9 years ago

Sure! I will create a VM with Ubuntu 15.10 and try to see if I encounter any problems.
As far as I can see, gcc 4.x is supported (see http://rasdaman.org/wiki/Platforms) and we do not have any testing in place for gcc 5.x.

comment:24 by Alex Toader, 9 years ago

Resolution: fixed
Status: changed from assigned to closed

comment:25 by Georg Semmler, 9 years ago

Resolution: fixed deleted
Status: changed from closed to reopened

I compiled everything with gcc 4.9.3. This did not change anything. This is not fixed for me!

in reply to:  25 comment:26 by George Merticariu, 9 years ago

Replying to gsemmler:

I compiled everything with gcc 4.9.3. This did not change anything. This is not fixed for me!

Hi Georg,

I created a fresh installation of Ubuntu 15.10 and compiled rasdaman with gcc 5.2.1. The compilation worked and I just sent a patch which fixed the passing of the configuration object in the rasnetserver.

I also tested your client and everything worked correctly.

Can you please give me more details about what exactly is your issue so I can provide you support and fix the problem as soon as possible?

Thanks,
George

comment:27 by Dimitar Misev, 9 years ago

George, can you test with --enable-strict?

in reply to:  27 comment:28 by George Merticariu, 9 years ago

Replying to dmisev:

George, can you test with --enable-strict?

--enable-strict is not working because we are using generated code which causes the shadow warning. I created a ticket on that (http://rasdaman.org/ticket/1045).

comment:29 by Alex Toader, 9 years ago

Hi Georg,

Given that you opened http://rasdaman.org/ticket/1044, can I assume that installing rasdaman with the new protocol has worked for you? Are you still having issues?

Thanks,
Alex

comment:30 by Georg Semmler, 9 years ago

Hi Alex,

it builds fine, but I cannot start the server. Currently I get


Error in `/opt/rasdaman_rasnet/bin/rasmgr': double free or corruption (out)

Backtrace: http://zero.azapps.de/?3f04e358716630e4#xypwtGM39WtrRuhcKKIXo0xvmKCc4PQhxGm//3WSq1M=

Best
Georg

comment:31 by Alex Toader, 9 years ago

Hi Georg,

Could you tell me the following:

  1. What version of Boost are you using?
  2. What version of GCC are you using?
  3. What Linux distro are you using? Is it still 15.10?
  4. What were the configuration parameters you used to build rasdaman?

It looks like the error is from inside the class that parses rascontrol commands, but I've never seen a double free in rasmgr in the standard installation. I hope I can reproduce it.

Thanks,
Alex

comment:32 by Georg Semmler, 9 years ago

Hi Alex,


$ grep "#define BOOST_VERSION" /usr/include/boost/version.hpp
#define BOOST_VERSION_HPP
#define BOOST_VERSION 105800
$ gcc --version
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./configure --with-default-basedb=sqlite --with-protoc=~/build_dir/grpc/bins/opt/protobuf/protoc --disable-java --enable-rasnet --with-grpc-java-plugin=/home/gsemmler1/build_dir/grpc-java/compiler/build/binaries/java_pluginExecutable/protoc-gen-grpc-java --prefix=/opt/rasdaman_rasnet/

GCC and boost are installed from the package manager. The distro is still Ubuntu 15.10.
Protobuf and grpc are self-built, but I'm sure they are fine (I also use them in another project and there is no such issue).
This was a clean build (I even recloned the git) and a new install dir.

Best
Georg

comment:33 by Alex Toader, 9 years ago

Unfortunately, I was not successful in reproducing the error.
I created a VM with Ubuntu 15.10, installed gcc and boost from the repo together with rasdaman's other dependencies, installed grpc and the grpc-java compiler from source as instructed on the wiki.

I configured rasdaman with the following command:
./configure --with-default-basedb=sqlite --with-protoc=/usr/local/bin/protoc --disable-java --enable-rasnet --with-grpc-java-plugin=/home/rasdaman/Programs/src/grpc-java/compiler/build/binaries/java_pluginExecutable/protoc-gen-grpc-java --prefix /opt/rasdaman_rasnet

The only difference that might be relevant is the placement of protoc. I installed it in the default location. I will try to install it in another place tomorrow and see what happens.

comment:34 by Alex Dumitru, 9 years ago

Resolution: fixed
Status: changed from reopened to closed

It seems we can't reproduce the last error; the rest seems to be fixed. Georg, if it still persists on your machine, please open a new ticket.
