Opened 9 years ago

Closed 9 years ago

#983 closed defect (fixed)

wcst_import: invalid input files should not cause error maybe

Reported by: Dimitar Misev Owned by: Alex Dumitru
Priority: major Milestone: 9.1.x
Component: applications Version: development
Keywords: Cc: Bang Pham Huu
Complexity: Medium

Description

Maybe this should not terminate the ingestion but rather be a warning?

RUNTIME ERROR: The file at path /data/datasets/NCI/Landsat5/2005/LS5_TM_NBAR_151_-037_2005-06-08T23-32-13.915056.tif is not a valid GDAL decodable file.

Change History (6)

comment:1 by Alex Dumitru, 9 years ago

Cc: Bang Pham Huu added

This cannot really be done as a catch-all for all recipes, unfortunately, as there's no guarantee a recipe will be using GDAL (see for example wcs_extract recipe). Furthermore, the slice metadata might not be able to be created for a variety of other reasons, and as we cannot know the internal recipe mechanism outside of it we can't really catch it outside of it.

There is already a skipping mechanism for rasdaman / petascope errors (see here http://rasdaman.org/browser/applications/wcst_import/master/importer/importer.py#L108 ), so a mechanism that does this should check for the skip config ingredient first.

However we can offer an option for recipe implementors, a class called GDALValidator(methods: init(File[]), get_valid_files(show_warnings=True)) that takes a list of File objects and returns all GDAL valid files (showing warnings for the ones that are skipped). Then in the recipes that we(rasdaman) offer we can call the function in the constructor. I find this to be a consistent solution with what we have so far.

comment:2 by Bang Pham Huu, 9 years ago

Hi Alex,

It is fine, i will implement based on your idea. But before we can insert_slices, your code has a problem with try to get the gdal_dataset from a first file path in the list of slices to import.

If the first file is error and could not read by GDAL then it will have an exception before you can _insert_slices

gdal_dataset = GDALGmlUtil(self.session.get_files()[0].get_filepath())

So I think we can make an option in recipes (it is nice to have, I will do it later) for user to:
+ skip = true to skip the progress when valid and see a file could not be ingested.
+ skip = false just warning but ignore error files (default).

And I will add the validation before you create gdal_dataset to avoid above problem.

comment:3 by Alex Dumitru, 9 years ago

@Bang: There is already a skip parameter in ConfigManager.skip (true means we skip over invalid files, false means we do not attemtp to resolve invalid files and let them throw an error if they are touched)

So ideally in the recipes the following should happen:

class Recipe(BaseRecipe):
  def __init__(...):
    self.files = self.session.get_files()
    if ConfigManager.skip:
      self.files = GdalValidator(files).get_valid_files() #This eliminates invalid files
  

  def ...:
    gdal_dataset = GdalGmlUtil(self.files[0].get_filepath) #self.files[0] is valid as we handled it in init

Let me know if it's clear, sometimes I do not explain things properly.

comment:4 by Bang Pham Huu, 9 years ago

ok, Alex. It is better when you said that there is a configuration in recipe file first ("skip = true", I have to recheck it in http://www.rasdaman.org/wiki/WCSTImportGuide and from that then ConfigManager.skip will make sense. I understand your mechanism but confuse where you set this value is True or False then I ask about add this option in recipe - but actually you added it before).

Anyway, thanks for very detail instruction, I think the main point I need is where you want to check file valids and you've pointed out below

class Recipe(BaseRecipe):
  def __init__(...):
    self.files = self.session.get_files()
    # Check valid files in here

I've done it and also have some tests with files that could not open by GDAL and it work as I expected. Please validate the new patch today.

comment:6 by Bang Pham Huu, 9 years ago

Resolution: fixed
Status: newclosed

A patch has been accepted, so thanks for Alex has commented and I will know which is I need to enhanced.

ticket:983 wcst_import: invalid input files should not cause error BangPH <b.phamhuu@…> master 2015-10-27 18:16:08 Download patch APPLIED SUCCESS

I will close this ticket here.

Note: See TracTickets for help on using tickets.