Opened 9 years ago
Closed 9 years ago
#983 closed defect (fixed)
wcst_import: invalid input files should not cause error maybe
Reported by: | Dimitar Misev | Owned by: | Alex Dumitru |
---|---|---|---|
Priority: | major | Milestone: | 9.1.x |
Component: | applications | Version: | development |
Keywords: | Cc: | Bang Pham Huu | |
Complexity: | Medium |
Description
Maybe this should not terminate the ingestion but rather be a warning?
RUNTIME ERROR: The file at path /data/datasets/NCI/Landsat5/2005/LS5_TM_NBAR_151_-037_2005-06-08T23-32-13.915056.tif is not a valid GDAL decodable file.
Change History (6)
comment:1 by , 9 years ago
Cc: | added |
---|
comment:2 by , 9 years ago
Hi Alex,
It is fine, i will implement based on your idea. But before we can insert_slices, your code has a problem with try to get the gdal_dataset from a first file path in the list of slices to import.
If the first file is error and could not read by GDAL then it will have an exception before you can _insert_slices
gdal_dataset = GDALGmlUtil(self.session.get_files()[0].get_filepath())
So I think we can make an option in recipes (it is nice to have, I will do it later) for user to:
+ skip = true to skip the progress when valid and see a file could not be ingested.
+ skip = false just warning but ignore error files (default).
And I will add the validation before you create gdal_dataset to avoid above problem.
comment:3 by , 9 years ago
@Bang: There is already a skip parameter in ConfigManager.skip (true means we skip over invalid files, false means we do not attemtp to resolve invalid files and let them throw an error if they are touched)
So ideally in the recipes the following should happen:
class Recipe(BaseRecipe): def __init__(...): self.files = self.session.get_files() if ConfigManager.skip: self.files = GdalValidator(files).get_valid_files() #This eliminates invalid files def ...: gdal_dataset = GdalGmlUtil(self.files[0].get_filepath) #self.files[0] is valid as we handled it in init
Let me know if it's clear, sometimes I do not explain things properly.
comment:4 by , 9 years ago
ok, Alex. It is better when you said that there is a configuration in recipe file first ("skip = true", I have to recheck it in http://www.rasdaman.org/wiki/WCSTImportGuide and from that then ConfigManager.skip will make sense. I understand your mechanism but confuse where you set this value is True or False then I ask about add this option in recipe - but actually you added it before).
Anyway, thanks for very detail instruction, I think the main point I need is where you want to check file valids and you've pointed out below
class Recipe(BaseRecipe): def __init__(...): self.files = self.session.get_files() # Check valid files in here
I've done it and also have some tests with files that could not open by GDAL and it work as I expected. Please validate the new patch today.
comment:5 by , 9 years ago
See comments for your latest patch here: http://git.flanche.com/alex/rasdaman-codereview/commit/a315266987a0bdd1c1de8ae4074b36c3c8b98705
comment:6 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
A patch has been accepted, so thanks for Alex has commented and I will know which is I need to enhanced.
ticket:983 wcst_import: invalid input files should not cause error BangPH <b.phamhuu@…> master 2015-10-27 18:16:08 Download patch APPLIED SUCCESS
I will close this ticket here.
This cannot really be done as a catch-all for all recipes, unfortunately, as there's no guarantee a recipe will be using GDAL (see for example wcs_extract recipe). Furthermore, the slice metadata might not be able to be created for a variety of other reasons, and as we cannot know the internal recipe mechanism outside of it we can't really catch it outside of it.
There is already a skipping mechanism for rasdaman / petascope errors (see here http://rasdaman.org/browser/applications/wcst_import/master/importer/importer.py#L108 ), so a mechanism that does this should check for the skip config ingredient first.
However we can offer an option for recipe implementors, a class called GDALValidator(methods: init(File[]), get_valid_files(show_warnings=True)) that takes a list of File objects and returns all GDAL valid files (showing warnings for the ones that are skipped). Then in the recipes that we(rasdaman) offer we can call the function in the constructor. I find this to be a consistent solution with what we have so far.