3 | | This can be implemented in a straightforward way, so that dir_id = tile_id / files_no_per_dir. We should check what could be the best parameters here to fit various common filesystems. |
| 3 | Currently all data is stored in a single directory $RASDATA, i.e. we have |
| 4 | |
| 5 | {{{ |
| 6 | $RASDATA |
| 7 | |_ RASBASE |
| 8 | |_ 1 |
| 9 | |_ 2 |
| 10 | |_ 3 |
| 11 | |_ .. |
| 12 | }}} |
| 13 | |
| 14 | = Proposed restructuring = |
| 15 | |
| 16 | {{{ |
| 17 | $RASDATA |
| 18 | |_ RASBASE |
| 19 | |_ TILES |
| 20 | |_ .. |
| 21 | }}} |
| 22 | |
| 23 | How should TILES be structured? Maximum number of subdirectories across the most common filesystems: |
| 24 | * ext3 : 32,000 |
| 25 | * ext4 : unlimited in theory, but may be set to 64,000 by default |
| 26 | * xfs : tested to millions and performance is not impacted |
| 27 | * btrfs: similar to xfs |
| 28 | * ntfs : 2^32-1 theoretically (same limit as number of files in a directory) |
| 29 | |
| 30 | Between 10,000 and 100,000 files per directory seems to be a range that is well supported across filesystems. With 100,000 files per directory, the ext3 limit of 32,000 subdirectories allows about 3.2 billion tiles, which is the lowest capacity among the filesystems listed above. |
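To make the capacity estimate concrete, here is a minimal sketch of the arithmetic, using the subdirectory limits listed above. The function name `maxTiles` is made up for illustration and is not part of rasdaman.

```cpp
#include <cstdint>

// Hypothetical helper: the most tiles that fit under TILES when the filesystem
// allows maxSubdirs subdirectories and each one holds tilesPerDir tiles.
constexpr std::uint64_t maxTiles(std::uint64_t maxSubdirs, std::uint64_t tilesPerDir)
{
    return maxSubdirs * tilesPerDir;
}

// ext3 (32,000 subdirectories) at both ends of the suggested range.
static_assert(maxTiles(32000, 10000) == 320000000ULL, "ext3, 10,000 tiles per directory");
static_assert(maxTiles(32000, 100000) == 3200000000ULL, "ext3, 100,000 tiles per directory");

// ext4 when the 64,000 subdirectory limit applies.
static_assert(maxTiles(64000, 100000) == 6400000000ULL, "ext4, 100,000 tiles per directory");
```

The checks run at compile time, so the figures are verified as soon as the snippet compiles.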
| 31 | |
| 32 | Based on this, my proposal is to store 100,000 tiles per directory, which gives the following organization: |
| 33 | |
| 34 | {{{ |
| 35 | $RASDATA |
| 36 | |_ RASBASE |
| 37 | |_ TILES |
| 38 |    |_ 0 |
| 39 |    |  |_ 1 |
| 40 |    |  |_ 2 |
| 41 |    |  |_ 3 |
| 42 |    |  |_ ... |
| 43 |    | |
| 44 |    |_ 1 |
| 45 |    |  |_ 100,000 |
| 46 |    |  |_ 100,001 |
| 47 |    |  |_ 100,002 |
| 48 |    |  |_ ... |
| 49 |    | |
| 50 |    |_ ... |
| 51 | }}} |
| 52 | |
| 53 | The subdirectory index in TILES is dir_index = tile_index / 100,000 (integer division). The value 100,000 can be a compile-time constant that is adjusted as necessary. A default of 2^16 (65,536) or 2^17 (131,072) may be better, so that dir_index can be computed with a fast bit shift. |
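The index computation could look roughly like the sketch below. It assumes a power-of-two directory size of 2^16 and a tile file named after its index. All identifiers (`kDirShift`, `kTilesPerDir`, `dirIndex`, `tilePath`) are hypothetical and not taken from the rasdaman sources.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical names for illustration; none are taken from the rasdaman sources.
constexpr unsigned kDirShift = 16;                                     // log2 of tiles per directory
constexpr std::uint64_t kTilesPerDir = std::uint64_t{1} << kDirShift;  // 2^16 = 65,536

// Index of the TILES subdirectory for a tile, computed with a bit shift.
constexpr std::uint64_t dirIndex(std::uint64_t tileIndex)
{
    return tileIndex >> kDirShift;
}

// General form for any positive directory size, e.g. 100,000.
inline std::uint64_t dirIndex(std::uint64_t tileIndex, std::uint64_t tilesPerDir)
{
    assert(tilesPerDir > 0);
    return tileIndex / tilesPerDir;
}

// Tile path relative to $RASDATA, assuming the file is named after its index.
inline std::string tilePath(std::uint64_t tileIndex)
{
    return "TILES/" + std::to_string(dirIndex(tileIndex)) + "/" + std::to_string(tileIndex);
}

static_assert(dirIndex(kTilesPerDir - 1) == 0, "last tile of directory 0");
static_assert(dirIndex(kTilesPerDir) == 1, "first tile of directory 1");
```

Since the divisor is a compile-time constant, a compiler will usually turn a division by a power of two into a shift anyway, so the explicit shift mainly documents the intent.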
| 54 | |
| 55 | I would like to stay away from complicated tree-like schemes that nest multiple levels of subdirectories. It is the job of the filesystem to handle this load; if we ever reach the limits of this scheme on a particular filesystem, it seems very unlikely that we will be able to work around them ourselves without adapting the underlying filesystem. |
| 56 | |
| 57 | Rasdaman could support both the old and the new structure with a simple check at startup; in v10.0 we can enforce the new structure. Existing installations can be migrated to the new directory structure by running update_db.sh. |
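One way the startup check might work is sketched below. It assumes that the new layout can be recognized purely by the presence of a TILES directory under $RASDATA, which is my assumption rather than a settled design. The names `TileLayout`, `isDirectory` and `detectLayout` are hypothetical. The sketch uses POSIX `stat`, so it targets Linux-like systems only.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Hypothetical names; this is a sketch, not the rasdaman implementation.
enum class TileLayout { Flat, Subdirectories };

// True if `path` exists and is a directory.
inline bool isDirectory(const std::string &path)
{
    struct stat info;
    return stat(path.c_str(), &info) == 0 && S_ISDIR(info.st_mode);
}

// Assumption: the new layout is identified by the presence of <rasdataDir>/TILES;
// anything else is treated as the old flat layout.
inline TileLayout detectLayout(const std::string &rasdataDir)
{
    assert(!rasdataDir.empty());
    return isDirectory(rasdataDir + "/TILES") ? TileLayout::Subdirectories
                                              : TileLayout::Flat;
}
```

The server would call `detectLayout` once at startup with the value of $RASDATA and choose the tile path scheme accordingly. A partially migrated directory is not handled here and would need extra care in update_db.sh.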