#1073 closed defect (fixed)

Filestorage - divide tile files into subdirectories

Reported by: dmisev
Owned by: dmisev
Priority: critical
Milestone: 9.1.x
Component: relblobif
Version: development
Keywords:
Cc: pbaumann, vmerticariu, mdumitru
Complexity: Medium

Description (last modified by dmisev)

The flat directory organization of the tile files in $RASDATA does not scale, because we run into filesystem limits on the number of entries in a single directory. Tiles should therefore be distributed into subdirectories.

Currently all data is stored in a single directory $RASDATA, i.e. we have

$RASDATA
 |_ RASBASE
 |_ 1
 |_ 2
 |_ 3
 |_ ..


Proposed restructuring

$RASDATA
 |_ RASBASE
 |_ TILES
      |_ ..

How should TILES be structured? Maximum number of subdirectories across the most common filesystems:

  • ext3 : 32,000
  • ext4 : unlimited in theory, but may be set to 64,000 by default
  • xfs : tested to millions and performance is not impacted
  • btrfs: similar to xfs
  • ntfs : 2^32-1 theoretically (same limit as number of files in a directory)

Between 10,000 and 100,000 files per directory seems like a good range that is well supported across filesystems.

Two-level nesting

$RASDATA
 |_ RASBASE
 |_ TILES
      |  dir1_index
      |_ 0
      |  |  dir2_index
      |  |_ 0
      |  |  |_ 1
      |  |  |_ 2
      |  |  |_ 3
      |  |  |_ ..
      |  |  
      |  |_ 1
      |  |  |_ 16,384
      |  |  |_ 16,385
      |  |  |_ ..
      |  |  
      |  |_ 2
      |  |_ ...
      |  |_ 16,383
      |   
      |_ 1
      |  |_ 16,384
      |  |_ 16,385
      |  |_ ...
      |  
      |_ ...

The subdirectory indexes in TILES are computed with integer division:

  • dir2_index = tile_index / max_files (16,384)
  • dir1_index = dir2_index / max_dirs (16,384)
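
The two formulas above can be sketched as a small path-mapping function. This is only an illustration, not rasdaman's actual implementation: the function name tile_path, the constants MAX_FILES and MAX_DIRS, and the choice to name second-level directories by their global dir2_index (as in the tree above) are assumptions.

```python
import os

MAX_FILES = 16384  # assumed cap on tile files per second-level directory (2^14)
MAX_DIRS = 16384   # assumed cap on second-level directories per first-level directory (2^14)


def tile_path(rasdata, tile_index):
    """Return the nested path of a tile file (hypothetical helper)."""
    dir2_index = tile_index // MAX_FILES   # integer division
    dir1_index = dir2_index // MAX_DIRS    # integer division
    return os.path.join(rasdata, "TILES", str(dir1_index),
                        str(dir2_index), str(tile_index))
```

Under these assumptions tile 16,384 is the first tile placed in TILES/0/1, and tile 16,384 * 16,384 is the first tile placed under TILES/1.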

I suggest we take a limit of 2^14 = 16,384 entries per level: directories on the second level and files on the third level.

This gives us a "lower" limit of about 20 EB (with 4MB tiles).
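
The estimate can be checked with a short calculation. This sketch assumes that the first directory level is also capped at 2^14 entries (the ticket states the cap only for the second and third level) and that a 4MB tile means 4 MiB. Under those assumptions the capacity is 2^64 bytes, i.e. 16 EiB or roughly 18.4 EB.

```python
LEVEL_CAP = 2 ** 14          # entries per level (assumed for all three levels)
TILE_SIZE = 4 * 2 ** 20      # 4 MiB per tile (assumption)

max_tiles = LEVEL_CAP ** 3   # dir1 entries * dir2 entries * files per dir2
capacity_bytes = max_tiles * TILE_SIZE

print(capacity_bytes == 2 ** 64)   # True
print(capacity_bytes / 10 ** 18)   # roughly 18.4 (decimal EB)
```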

Backwards compatibility

Rasdaman could support both structures (old and new) with a simple check at startup; in v10.0 we can enforce the new structure. update_db.sh can be executed to migrate existing data to the new directory structure.
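
A rough sketch of what the startup check and the migration could look like is given below. It is not the actual logic of rasdaman or update_db.sh: the function names, the use of the TILES directory as the marker for the new layout, and the assumption that tile files in the flat layout have purely numeric names are all hypothetical.

```python
import os
import shutil

MAX_FILES = 16384  # assumed files per second-level directory
MAX_DIRS = 16384   # assumed second-level directories per first-level directory


def uses_new_layout(rasdata):
    """Startup check: assume the new layout is marked by a TILES directory."""
    return os.path.isdir(os.path.join(rasdata, "TILES"))


def migrate_flat_layout(rasdata):
    """Move numerically named tile files into TILES/<dir1>/<dir2>/."""
    moved = 0
    for name in os.listdir(rasdata):
        src = os.path.join(rasdata, name)
        # Skip RASBASE, TILES and anything else that is not a plain tile file.
        if not (name.isdecimal() and os.path.isfile(src)):
            continue
        dir2_index = int(name) // MAX_FILES
        dir1_index = dir2_index // MAX_DIRS
        dest_dir = os.path.join(rasdata, "TILES",
                                str(dir1_index), str(dir2_index))
        os.makedirs(dest_dir, exist_ok=True)
        shutil.move(src, os.path.join(dest_dir, name))
        moved += 1
    return moved
```

Running migrate_flat_layout a second time finds no numeric files left in $RASDATA and moves nothing, so in this sketch the migration is safe to repeat.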

Change History (9)

comment:1 Changed 22 months ago by dmisev

  • Description modified (diff)

comment:2 Changed 22 months ago by dmisev

  • Description modified (diff)

comment:3 Changed 22 months ago by vmerticariu

100,000 tiles / directory, with a limit of 32,000 directories and a 4MB tile size means ~12 PB maximum size. If somebody chooses smaller tiles, like 1MB, then we would have a limit of 3 PB.

I agree that we should avoid complexity and keep it simple, but so that we do not have to worry about this, I guess the limit should be on the order of EB.

What about adding 1 extra level of nesting (so in the subdir 0 of TILES, you can have 32,000 directories), which increases the limit to more than 100 EB?

Please correct me if there's anything wrong with my math.
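
For reference, the figures above can be reproduced with a few lines, assuming decimal units (1 MB = 10^6 bytes) and the same 32,000-directory cap on the extra level; the variable names are illustrative only.

```python
TILES_PER_DIR = 100_000
DIRS_PER_LEVEL = 32_000
MB = 10 ** 6
PB = 10 ** 15
EB = 10 ** 18

two_level_4mb = TILES_PER_DIR * DIRS_PER_LEVEL * 4 * MB
two_level_1mb = TILES_PER_DIR * DIRS_PER_LEVEL * 1 * MB
three_level_4mb = two_level_4mb * DIRS_PER_LEVEL  # one extra nesting level

print(two_level_4mb / PB)    # 12.8
print(two_level_1mb / PB)    # 3.2
print(three_level_4mb / EB)  # 409.6
```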

comment:4 follow-up: Changed 22 months ago by dmisev

Yes, true, although 30,000 subdirs is really a lower limit (ext3 is seriously outdated, no one will put PB on ext3). Can you work out a simple bucketing scheme for two levels so that both levels get gradually filled up with subdirs?

I just thought of network filesystems like NFS btw, is anyone familiar with these? Probably they have quite some limitations.

comment:5 in reply to: ↑ 4 Changed 22 months ago by dmisev

Replying to dmisev:

I just thought of network filesystems like NFS btw, is anyone familiar with these? Probably they have quite some limitations.

Seems like this is up to the underlying filesystem, so we can ignore it.

comment:6 Changed 22 months ago by dmisev

  • Description modified (diff)

comment:7 Changed 22 months ago by bphamhuu

http://serverfault.com/questions/129953/maximum-number-of-files-in-one-ext3-directory-while-still-getting-acceptable-per has some ideas, such as fewer nesting levels and a smaller number of files in each directory (around 20,000 files).

comment:8 Changed 22 months ago by dmisev

  • Description modified (diff)

comment:9 Changed 22 months ago by dmisev

  • Resolution set to fixed
  • Status changed from new to closed