
Version 5 (modified by abeccati, 10 years ago)

query cancellation details

Project ideas

This page collects ideas for possible student projects.

Dynamic retiling of arrays

  • Mentors: dmisev
  • Languages: C/C++, a bit of parsing (Flex/Bison)
  • Overview:

Multidimensional arrays in rasdaman are broken down and stored as tiles (more info on tiling). At the moment the tiling is static and can only be specified when the data is initially inserted into rasdaman. Furthermore, the tiling is global to the array, i.e. it is not possible to tile one part of the array with one strategy and another part with a different strategy.
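To make the notion of tiles concrete, the sketch below splits an array domain into regular tiles of a fixed shape. It is only an illustration: the function name `regular_tiles` and the inclusive `(lo, hi)` bounds convention are assumptions made for this example, not rasdaman's storage code.

```python
from itertools import product

def regular_tiles(domain, tile_shape):
    """Split an n-D domain into regular tiles (hypothetical helper).

    domain: list of inclusive (lo, hi) bounds, one pair per axis.
    tile_shape: tile extent in cells, one value per axis.
    Returns a list of tiles, each a list of inclusive (lo, hi) bounds.
    """
    per_axis = []
    for (lo, hi), extent in zip(domain, tile_shape):
        # Tiles at the upper border are clipped to the domain.
        per_axis.append([(start, min(start + extent - 1, hi))
                         for start in range(lo, hi + 1, extent)])
    # The tiles are the cartesian product of the per-axis intervals.
    return [list(tile) for tile in product(*per_axis)]

# A 100 x 50 array cut into tiles of at most 40 x 25 cells.
tiles = regular_tiles([(0, 99), (0, 49)], (40, 25))
```

With a static tiling, a decomposition like this is fixed at insertion time; re-tiling would mean computing a new one and rewriting the affected cells.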

The idea is to make tiling dynamic, so that an array or even parts of the array can be re-tiled at any time with a different tiling strategy. This could be part of the UPDATE statement in rasql:

update collName as c
set c[domainToBeTiled]
tiling ...

Once this is achieved, a further enhancement would be to make the server aware of data access patterns. Rasdaman would keep statistics, learn which tiling works best for the queries that are typically run, and re-tile the data accordingly.
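A minimal sketch of how access statistics might feed a tiling suggestion is shown below. Everything here is hypothetical: the `AccessStats` class and the per-axis median heuristic are assumptions chosen for illustration, and a real solution would have to weigh many more factors (tile size limits, index structure, update cost).

```python
from statistics import median

class AccessStats:
    """Collects the extents of queried regions (hypothetical helper)."""

    def __init__(self):
        self.extents = []  # one tuple of per-axis extents per query

    def record(self, subset):
        # subset: inclusive (lo, hi) bounds per axis of a queried region
        self.extents.append(tuple(hi - lo + 1 for lo, hi in subset))

    def suggest_tile_shape(self):
        """Per-axis median of the observed query extents, or None."""
        if not self.extents:
            return None
        return tuple(int(median(axis)) for axis in zip(*self.extents))

stats = AccessStats()
stats.record([(0, 9), (0, 999)])    # thin slices along the second axis
stats.record([(5, 14), (0, 499)])
stats.record([(0, 19), (0, 999)])
shape = stats.suggest_tile_shape()
```

Here the queries mostly read long, thin slices, so the heuristic suggests tiles that are short along the first axis and long along the second.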

Ticket: #312

Web interface for loading geo-spatial data

  • Mentors: abeccati
  • Languages: Java, possibly some C/C++
  • Overview:

rasdaman is a domain-independent array DBMS, but via its petascope component it becomes a geo-spatial server with support for various OGC standards: WCS, WCPS, WCS-T. Currently data can be loaded into rasdaman in two ways:

  1. By using the rasql language, e.g. via the rasql client tool. This does not provide support for inserting geo-spatial information, which needs to be done manually with SQL scripts.
  2. By using rasimport, a client tool based on GDAL which can extract geo-information from the input files and automatically insert it into petascope, along with the data into rasdaman.

WCS-T (an OGC standard for updating coverages) is on the agenda for implementation in petascope. A prerequisite for WCS-T implementation is enabling petascope (a Java servlet) to insert coverages itself. There are two possibilities for achieving this:

  1. Use native system calls to the rasimport command-line tool, which would be very hackish and potentially dangerous.
  2. Implement it directly in petascope by using the Java bindings for GDAL, and provide a generic interface that can be used by the WCS-T implementation or by other clients.

Further work would then include

  • Implementing WCS-T
  • Implementing a web application (an admin page in petascope) that makes it easy to load data by selecting files and filling in metadata details (in case they are missing from the files themselves and GDAL cannot figure them out).

Ticket: #312, related ticket #292

Overlays of Geometries

  • Mentors: pcampalani
  • Languages: Java, SQL
  • Overview:

At the moment Petascope works with aligned coverages, i.e. coverages whose axes are parallel to the external CRS dimensions. Enabling rotations and curvilinear grid topologies will add a computational burden to the domain2grid conversion. The task is to implement optimized algorithms that identify which cells are to be extracted.
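The sketch below illustrates the kind of conversion involved for the affine (rotated but not curvilinear) case. It assumes a six-coefficient affine geotransform in the GDAL ordering `(x0, a, b, y0, d, e)`; the function names `geo_to_grid` and `grid_window` are made up for this example and are not Petascope code.

```python
import math

def geo_to_grid(gt, x, y):
    """Invert gt = (x0, a, b, y0, d, e), where
    x = x0 + col*a + row*b and y = y0 + col*d + row*e."""
    x0, a, b, y0, d, e = gt
    det = a * e - b * d
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - x0, y - y0
    return (e * dx - b * dy) / det, (a * dy - d * dx) / det

def grid_window(gt, bbox, width, height):
    """Inclusive (col_lo, col_hi, row_lo, row_hi) covering a geo bbox.

    For aligned grids (b == d == 0) the window is exact. For rotated
    grids it is only a conservative bounding rectangle, so each cell
    inside it would still need an individual containment test.
    """
    xmin, ymin, xmax, ymax = bbox
    corners = [geo_to_grid(gt, x, y)
               for x in (xmin, xmax) for y in (ymin, ymax)]
    cols = [c for c, _ in corners]
    rows = [r for _, r in corners]
    return (max(0, math.floor(min(cols))),
            min(width - 1, math.ceil(max(cols)) - 1),
            max(0, math.floor(min(rows))),
            min(height - 1, math.ceil(max(rows)) - 1))

# Aligned 20 x 20 grid: origin (100, 500), 10-unit cells, north-up.
gt = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)
window = grid_window(gt, (120.0, 400.0, 150.0, 450.0), 20, 20)
```

The extra per-cell test needed in the rotated case is where the additional computational cost comes from, and where an optimized algorithm would have to improve on the naive approach.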

(Web) Graphical query builder

  • Mentors: dmisev
  • Languages: HTML, JavaScript, possibly JSP or PHP
  • Overview:

New users typically have difficulties writing their first WCPS or rasql queries. Such users would benefit a lot from a graphical user interface (preferably web-based) that lets them build queries and extract information from the database without knowing much about the query languages. Tools of this kind already exist in the SQL world.

This could possibly be done by taking an existing open-source SQL builder and adapting it to rasql/WCPS.
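At its core, a query builder turns structured user choices into a query string. The sketch below shows that step for a simple rasql subsetting query. It is an assumption-laden illustration: the function `build_rasql`, its parameters and the collection name `mr` are chosen for this example only, and a real builder would cover far more of the language.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_rasql(collection, subset=None, alias="c"):
    """Build a simple rasql select from form-like inputs (hypothetical).

    collection: collection name chosen by the user.
    subset: optional list of inclusive (lo, hi) integer bounds per axis.
    """
    # Reject anything that is not a plain identifier, so that values
    # typed into a form cannot smuggle in extra query text.
    for name in (collection, alias):
        if not IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    expr = alias
    if subset:
        ranges = ",".join(f"{int(lo)}:{int(hi)}" for lo, hi in subset)
        expr = f"{alias}[{ranges}]"
    return f"select {expr} from {collection} as {alias}"

query = build_rasql("mr", [(0, 99), (0, 49)])
```

A graphical front end would fill in `collection` and `subset` from drop-downs or a map selection and show the generated query to the user, which also helps them learn the language.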

JDBC (or ODBC) driver for rasdaman

  • Mentors: dmisev
  • Languages: Java or C/C++
  • Overview:

rasql is closely modeled on SQL, so it would not be too hard to implement either a JDBC (Java) or an ODBC (C) driver for rasdaman. These drivers specify common interfaces for accessing relational databases, and many applications, such as Matlab and R, have plugins for connecting to databases via JDBC/ODBC drivers. Such a driver would enable these applications to access data stored in rasdaman.

GeoServer and Rasdaman Integration

  • Mentors: abeccati, pcampalani + discuss and work with GeoServer community
  • Languages: Java
  • Overview:

Establish a prototype to demonstrate that GeoServer can exploit Rasdaman as a native raster data source.

R and Rasdaman Integration

  • Mentors: pcampalani
  • Languages: Java
  • Overview:

Establish a prototype to demonstrate that R can exploit Rasdaman as a native raster data source.

Refactor and complete regression tests

  • Mentors: abeccati, dmisev, jyu
  • Languages: Bash, possibly Python. Some C/C++ and Java for the unit tests.
  • Overview:

A robust and extensive regression test suite is essential for building and maintaining stable software. Rasdaman has an integration test framework as well as various unit tests. Unfortunately they are rather incomplete and outdated, and should be improved in several aspects:

  • Usability - tests should be easy to run by both developers and users, with zero or minimal configuration required. Furthermore, test results should be easy to read and provide as much information as possible about the failing cases.
  • Portability - different platforms and systems must not affect the results of the tests.
  • Extensibility - adding new tests should be simple and straightforward, so that developers are not turned off from the idea of adding new tests.
  • Coverage - tests should cover as much functionality as possible. E.g. the tiling and index options are not tested at all. Many "unit" tests are legacy code rather than regression tests.
  • Modularity - rasdaman comprises many different components: database server, web servers, clients (command-line, GUI, JavaScript, …), resolvers, etc. The test suite should therefore be modular and adapt to the current environment setup.
  • Documentation - on how to run the tests and add new tests.
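The requirements above can be sketched as a tiny data-driven runner: adding a test means dropping a query file and an expected-output file into a directory. This is an illustration under assumptions, not the existing rasdaman test framework: the `.rasql`/`.oracle` file naming, the `run_suite` function and the `run_query` callable are all hypothetical.

```python
from pathlib import Path
import tempfile

def run_suite(test_dir, run_query):
    """Run every <name>.rasql in test_dir against <name>.oracle.

    run_query: any callable taking the query text and returning the
    result as a string (e.g. a wrapper around a client tool).
    Returns a dict mapping test name to a readable status line.
    """
    results = {}
    for query_file in sorted(Path(test_dir).glob("*.rasql")):
        oracle_file = query_file.with_suffix(".oracle")
        if not oracle_file.exists():
            results[query_file.stem] = "SKIPPED (no oracle)"
            continue
        actual = run_query(query_file.read_text().strip())
        expected = oracle_file.read_text().strip()
        if actual == expected:
            results[query_file.stem] = "PASSED"
        else:
            results[query_file.stem] = (
                f"FAILED: expected {expected!r}, got {actual!r}")
    return results

# Usage with a fake backend, so the sketch runs without a server.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a_ok.rasql").write_text("QUERY A\n")
    Path(tmp, "a_ok.oracle").write_text("42\n")
    Path(tmp, "b_bad.rasql").write_text("QUERY B\n")
    Path(tmp, "b_bad.oracle").write_text("7\n")
    Path(tmp, "c_orphan.rasql").write_text("QUERY C\n")
    report = run_suite(tmp, lambda query: "42")
```

Keeping the backend behind a plain callable is one way to address modularity: the same test files could be run against whichever components are available in the current setup.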

Tickets: #363

Cancel/abort running query execution

  • Mentors: abeccati
  • Languages: C/C++ and Java (servlets)
  • Overview:

This idea stems from an audience request at FOSS4G 2013. Long-running queries can be sent to the system by mistake, and their execution may then need to be cancelled. A mechanism for cancelling running queries should be established along the whole chain, from the web client through the server process to the database. Since tiles are streamed, a "still valid request" check could be added to stop a large query when needed; alternatively, process signalling could be employed. The relevance of this task extends to any interactive system providing access to large datasets, where there is potential for an unexpectedly long synchronous execution time. This project consists of:

  • A preliminary review of existing literature
  • A review of practical cases and approaches of existing systems
  • A code review of the rasdaman system (from OGC interface to array engine) to identify processing chain and candidate points of interruption
  • Devising and prototyping an effective solution that allows running queries to be cancelled
  • Optionally, an investigation of estimation metrics for the expected array processing time
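
The "still valid request" check mentioned above amounts to cooperative cancellation: the engine tests a flag between tiles and stops early when it is set. The sketch below shows the pattern in isolation. All names (`QueryCancelled`, `process_tiles`, `fake_tile_work`) are hypothetical, and the actual interruption points in rasdaman would have to be identified by the code review listed above.

```python
import threading

class QueryCancelled(Exception):
    """Raised when a query is aborted between two tiles."""

def process_tiles(tiles, cancel_event, process_tile):
    """Process tiles one by one, checking the flag before each tile."""
    results = []
    for tile in tiles:
        if cancel_event.is_set():
            raise QueryCancelled(f"cancelled after {len(results)} tiles")
        results.append(process_tile(tile))
    return results

cancel = threading.Event()
processed = []

def fake_tile_work(tile):
    processed.append(tile)
    if tile == 2:
        cancel.set()  # simulate a client abort arriving mid-query
    return tile * tile

try:
    process_tiles(range(10), cancel, fake_tile_work)
    outcome = "completed"
except QueryCancelled as exc:
    outcome = str(exc)
```

The granularity of the check determines the worst-case delay before a query stops: here it is the time needed to process one tile. Process signalling could interrupt sooner, at the cost of more delicate cleanup.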