Opened 8 years ago

Closed 8 years ago

#1073 closed defect (fixed)

Filestorage - divide tile files into subdirectories

Reported by: Dimitar Misev
Owned by: Dimitar Misev
Priority: critical
Milestone: 9.1.x
Component: relblobif
Version: development
Keywords:
Cc: Peter Baumann, Vlad Merticariu, Alex Dumitru
Complexity: Medium

Description (last modified by Dimitar Misev)

The flat directory organization of the tile files in $RASDATA is not scalable as we reach filesystem limits. Therefore tiles should be distributed into subdirectories.

Currently all data is stored in a single directory $RASDATA, i.e. we have

$RASDATA
 |_ RASBASE
 |_ 1
 |_ 2
 |_ 3
 |_ ..


Proposed restructuring

$RASDATA
 |_ RASBASE
 |_ TILES
      |_ ..

How should TILES be structured? Maximum number of subdirectories across the most common filesystems:

  • ext3 : 32,000
  • ext4 : unlimited in theory, but may be set to 64,000 by default
  • xfs : tested to millions and performance is not impacted
  • btrfs: similar to xfs
  • ntfs : 2^32-1 theoretically (same limit as number of files in a directory)

A limit between 10,000 and 100,000 files per directory seems like a reasonable range that is well supported across filesystems.

Two-level nesting

$RASDATA
 |_ RASBASE
 |_ TILES
      |  dir1_index
      |_ 0
      |  |  dir2_index
      |  |_ 0
      |  |  |_ 1
      |  |  |_ 2
      |  |  |_ 3
      |  |  |_ ..
      |  |  
      |  |_ 1
      |  |  |_ 16,384
      |  |  |_ 16,385
      |  |  |_ ..
      |  |  
      |  |_ 2
      |  |_ ...
      |  |_ 16,383
      |   
      |_ 1
      |  |_ 16,384
      |  |_ 16,385
      |  |_ ...
      |  |_ ...
      |  
      |_ ...

The subdirectory indexes in TILES are computed with integer division:

  • dir2_index = tile_index / max_files (16,384)
  • dir1_index = dir2_index / max_dirs (16,384)

I suggest we take a limit of 2^14 = 16,384 directories / files for the second and third level.
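The index formulas above can be sketched in a few lines of Python. This is only an illustration, not rasdaman code: the function name `tile_path`, the constants, and the assumption that the tile file is named after its index (as in the tree above) are all hypothetical.

```python
from pathlib import PurePosixPath

MAX_FILES = 2**14  # 16,384 tile files per leaf directory (from the ticket)
MAX_DIRS = 2**14   # 16,384 subdirectories per dir1 directory (from the ticket)


def tile_path(tile_index: int, root: str = "TILES") -> PurePosixPath:
    """Return root/dir1_index/dir2_index/tile_index (hypothetical helper)."""
    if tile_index < 0:
        raise ValueError("tile_index must be non-negative")
    # Integer division, as in the formulas above.
    dir2_index = tile_index // MAX_FILES
    dir1_index = dir2_index // MAX_DIRS
    return PurePosixPath(root) / str(dir1_index) / str(dir2_index) / str(tile_index)


print(tile_path(1))
print(tile_path(16384))
print(tile_path(16384 * 16384))
```

Note that dir2_index is numbered globally rather than restarting at 0 inside each dir1 directory, which matches the tree sketched above.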

This gives us a "lower" limit of about 20 EB (16,384^3 tiles of 4 MB each, roughly 2^64 bytes).
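The limit can be checked with a little arithmetic. The sketch below assumes that "4MB" means 4 MiB (4 * 2^20 bytes) and that both directory levels and the leaf directories are each filled to 16,384 entries; the variable names are made up for illustration.

```python
MAX_FILES = 2**14      # tile files per leaf directory (from the ticket)
MAX_DIRS = 2**14       # subdirectories per level (from the ticket)
TILE_SIZE = 4 * 2**20  # assumption: "4MB" read as 4 MiB

# dir1 directories * dir2 directories per dir1 * files per dir2
max_tiles = MAX_DIRS * MAX_DIRS * MAX_FILES
max_bytes = max_tiles * TILE_SIZE

print(max_tiles == 2**42)            # True
print(max_bytes == 2**64)            # True
print(round(max_bytes / 10**18, 1))  # capacity in decimal exabytes: 18.4
```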

Backwards compatibility

Rasdaman could support both structures (old and new) with a simple check at startup; in v10.0 we can enforce the new structure. update_db.sh can be executed to migrate existing data to the new directory structure.
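A rough sketch of what the startup check and the migration could look like, under stated assumptions. It is not the actual logic of rasdaman or update_db.sh: the function names are hypothetical, the new layout is assumed to be recognisable by the presence of a TILES directory, and tile files are assumed to be the numerically named files directly under $RASDATA.

```python
import shutil
from pathlib import Path

MAX_FILES = 2**14  # tile files per leaf directory (from the ticket)
MAX_DIRS = 2**14   # subdirectories per dir1 directory (from the ticket)


def uses_nested_layout(rasdata: Path) -> bool:
    """Assumed startup check: the new layout has a TILES subdirectory."""
    return (rasdata / "TILES").is_dir()


def migrate_flat_layout(rasdata: Path) -> int:
    """Move flat tile files into TILES/dir1/dir2/; return how many were moved."""
    moved = 0
    # sorted() materialises the listing before any directories are created.
    for entry in sorted(rasdata.iterdir()):
        # Skip RASBASE and anything else that is not a numerically named file.
        if not entry.is_file() or not (entry.name.isascii() and entry.name.isdigit()):
            continue
        tile_index = int(entry.name)
        dir2_index = tile_index // MAX_FILES
        dir1_index = dir2_index // MAX_DIRS
        target_dir = rasdata / "TILES" / str(dir1_index) / str(dir2_index)
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(entry), str(target_dir / entry.name))
        moved += 1
    return moved
```

A real migration would also need to handle a running server, partial failures, and any tile paths recorded in RASBASE, none of which this sketch addresses.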

Change History (9)

comment:1 by Dimitar Misev, 8 years ago

Description: modified (diff)

comment:2 by Dimitar Misev, 8 years ago

Description: modified (diff)

comment:3 by Vlad Merticariu, 8 years ago

100,000 tiles / directory, with a limit of 32,000 directories and a 4MB tile size means ~12 PB maximum size. If somebody chooses smaller tiles, like 1MB, then we would have a limit of 3 PB.

I agree that we should avoid complexity and keep it simple, but in order not to worry about this I guess the limit should be in the order of EB.

What about adding 1 extra level of nesting (so in the subdir 0 of TILES, you can have 32,000 directories), which increases the limit to more than 100 EB?

Please correct me if there's anything wrong with my math.
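A quick check of the figures in this comment, assuming decimal units (1 MB = 10^6 bytes, 1 PB = 10^15, 1 EB = 10^18). The `capacity` helper is hypothetical and only multiplies the numbers quoted above.

```python
MB = 10**6   # assumption: decimal megabytes
PB = 10**15
EB = 10**18


def capacity(files_per_dir: int, dirs_per_level: int, levels: int, tile_size: int) -> int:
    """Maximum bytes storable with `levels` directory levels above the tile files."""
    return files_per_dir * dirs_per_level**levels * tile_size


one_level_4mb = capacity(100_000, 32_000, 1, 4 * MB)
one_level_1mb = capacity(100_000, 32_000, 1, 1 * MB)
two_levels_4mb = capacity(100_000, 32_000, 2, 4 * MB)

print(one_level_4mb / PB)   # 12.8 PB
print(one_level_1mb / PB)   # 3.2 PB
print(two_levels_4mb / EB)  # 409.6 EB
```

Under these assumptions the single-level estimates of roughly 12 PB and 3 PB hold, and the extra level gives about 410 EB, which is indeed more than 100 EB.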

comment:4 by Dimitar Misev, 8 years ago

Yes, true, although 30,000 subdirs is really a lower limit (ext3 is seriously outdated, no one will put PB on ext3). Can you work out a simple bucketing scheme for two levels so that both levels get gradually filled up with subdirs?

I just thought of network filesystems like NFS btw, is anyone familiar with these? Probably they have quite some limitations.

in reply to:  4 comment:5 by Dimitar Misev, 8 years ago

Replying to dmisev:

I just thought of network filesystems like NFS btw, is anyone familiar with these? Probably they have quite some limitations.

Seems like this is up to the underlying filesystem, so we can ignore it.

comment:6 by Dimitar Misev, 8 years ago

Description: modified (diff)

comment:7 by Bang Pham Huu, 8 years ago

http://serverfault.com/questions/129953/maximum-number-of-files-in-one-ext3-directory-while-still-getting-acceptable-per has some ideas, such as using fewer levels and limiting the number of files in each directory (around 20,000 files).

comment:8 by Dimitar Misev, 8 years ago

Description: modified (diff)

comment:9 by Dimitar Misev, 8 years ago

Resolution: fixed
Status: new → closed