Opened 2 years ago
Closed 18 months ago
#2680 closed defect (wontfix)
FIX - wcst_import continues when a C library has seg fault
Reported by: | Bang Pham Huu | Owned by: | Mohit Basak |
---|---|---|---|
Priority: | major | Milestone: | 10.1 |
Component: | wcst_import | Version: | 10.0 |
Keywords: | | Cc: | Dimitar Misev |
Complexity: | Medium | | |
Description (last modified by )
When a C library (e.g. libhdf5 while it analyses a netCDF file) causes a segmentation fault, the whole wcst_import.py process is killed, with no error output other than the message "Segmentation fault".
Here is the full log from valgrind:
Analyzing file (7/7): /eodata/CLMS/Global/Vegetation/Soil_Water_Index/BioPar_SWI_V3_Global/SWI_200701091200_GLOBE_ASCAT_V3.1.1/SWI_200701091200_GLOBE_ASCAT_V3.1.1.nc ...
==2683576== Invalid read of size 8
==2683576==    at 0x1BD0DB86: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD22DFD: H5VL_blob_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD01F5C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC6AF60: H5T__conv_vlen (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC583B4: H5T_convert (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9E18CD: H5D_get_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2FD62: H5VL__native_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD09854: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD14D35: H5VL_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9A1C55: H5Dget_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B75BF39: nc4_get_var_meta (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==    by 0x1B759566: nc4_hdf5_find_grp_var_att (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==  Address 0x1c854748 is 40 bytes inside a block of size 68 free'd
==2683576==    at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2683576==    by 0x1BB1B678: H5MM_xfree (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B93066D: H5A__shared_free (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B930B4D: H5A__close (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B932AA0: H5A__attr_release_table (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B92A441: H5A__dense_iterate (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB38EDD: H5O_attr_iterate_real (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB39841: H5O__attr_iterate (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B92E91C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B936698: H5A__iterate (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2E504: H5VL__native_attr_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD08DFD: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==  Block was alloc'd at
==2683576==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2683576==    by 0x1BB1B7F0: H5MM_malloc (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB1BAEB: H5MM_strdup (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB34438: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB34C77: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB6EBA5: H5O_msg_decode (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B927450: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BADA1C7: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BADB64C: H5HF__man_op (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BAB8F0E: H5HF_op (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B92721D: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B953465: H5B2__iterate_node (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==
==2683576== Jump to the invalid address stated on the next line
==2683576==    at 0x4645454244414544: ???
==2683576==    by 0x1BD22DFD: H5VL_blob_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD01F5C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC6AF60: H5T__conv_vlen (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC583B4: H5T_convert (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9E18CD: H5D_get_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2FD62: H5VL__native_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD09854: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD14D35: H5VL_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9A1C55: H5Dget_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B75BF39: nc4_get_var_meta (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==    by 0x1B759566: nc4_hdf5_find_grp_var_att (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==  Address 0x4645454244414544 is not stack'd, malloc'd or (recently) free'd
==2683576==
==2683576==
==2683576== Process terminating with default action of signal 11 (SIGSEGV)
==2683576==  Bad permissions for mapped region at address 0x4645454244414544
==2683576==    at 0x4645454244414544: ???
==2683576==    by 0x1BD22DFD: H5VL_blob_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD01F5C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC6AF60: H5T__conv_vlen (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC583B4: H5T_convert (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9E18CD: H5D_get_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2FD62: H5VL__native_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD09854: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD14D35: H5VL_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9A1C55: H5Dget_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B75BF39: nc4_get_var_meta (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==    by 0x1B759566: nc4_hdf5_find_grp_var_att (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==
==2683576== HEAP SUMMARY:
==2683576==     in use at exit: 18,500,212 bytes in 49,193 blocks
==2683576==   total heap usage: 412,683 allocs, 363,490 frees, 100,997,488 bytes allocated
==2683576==
==2683576== LEAK SUMMARY:
==2683576==    definitely lost: 48 bytes in 1 blocks
==2683576==    indirectly lost: 0 bytes in 0 blocks
==2683576==      possibly lost: 821,730 bytes in 522 blocks
==2683576==    still reachable: 17,678,434 bytes in 48,670 blocks
==2683576==         suppressed: 0 bytes in 0 blocks
==2683576== Rerun with --leak-check=full to see details of leaked memory
==2683576==
==2683576== Use --track-origins=yes to see where uninitialised values come from
==2683576== For lists of detected and suppressed errors, rerun with: -s
==2683576== ERROR SUMMARY: 7613 errors from 257 contexts (suppressed: 0 from 0)
Segmentation fault
To reproduce: the netCDF4 package installed with pip3 is version 1.5.8 on Ubuntu 20.04, and it installed the following bundled libraries:
netCDF4.libs/libaec-e300f322.so.0.0.10
netCDF4.libs/libcurl-8c767087.so.4.7.0
netCDF4.libs/libhdf5-29e32098.so.200.0.0
netCDF4.libs/libhdf5_hl-f3ea9bc7.so.200.0.0
netCDF4.libs/libnetcdf-7952139f.so.18.0.0
netCDF4.libs/libsz-57467d8a.so.2.0.1
netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so
Attached are the data file and an ingredients file. To reproduce the problem:
- download the data and ingredients files into one directory
- install the problematic package version
pip3 install netCDF4==1.5.8
- run the import
/opt/rasdaman/share/rasdaman/wcst_import/wcst_import.py test.json
It may need to be run multiple times before the problem is triggered.
Instead of the whole import process dying, wcst_import should retry importing the file.
Attachments (2)
Change History (7)
comment:1 by , 2 years ago
Description: | modified (diff) |
---|
by , 2 years ago
Attachment: | SWI_200701091200_GLOBE_ASCAT_V3.1.1.nc added |
---|
comment:2 by , 2 years ago
Description: | modified (diff) |
---|
comment:3 by , 2 years ago
Owner: | changed from | to
---|
comment:4 by , 2 years ago
Owner: | changed from | to
---|
comment:5 by , 18 months ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
I think there is no reasonable way to do a retry in wcst_import.py. Maybe something could be added to the wcst_import.sh script, which starts wcst_import.py. But it all sounds like more work than it is worth for this extremely rare case, so I'll close the ticket.
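The wrapper-script idea could look roughly like the following, written in Python for illustration rather than as a change to wcst_import.sh, whose contents are not shown here. `run_import_with_retry` is a hypothetical helper, and the sketch assumes that re-running the whole import after a crash is acceptable; whether already-imported files are skipped on a rerun is not verified here. On POSIX, `subprocess` reports a child killed by signal N as return code -N, so a segfault shows up as `-signal.SIGSEGV` (-11); a shell script would see the same event as exit status 139 (128 + 11).

```python
import signal
import subprocess
import sys


def run_import_with_retry(cmd, max_attempts=3):
    """Hypothetical helper: run cmd, relaunching it only after a SIGSEGV.

    Success or an ordinary error exit is returned immediately; the exit
    status of the last attempt is returned when every attempt segfaults.
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        # On POSIX, a child killed by signal N has returncode -N.
        returncode = subprocess.run(cmd).returncode
        if returncode != -signal.SIGSEGV:
            return returncode
        print(f"attempt {attempt}/{max_attempts}: killed by SIGSEGV",
              file=sys.stderr)
    return returncode
```

For example, `run_import_with_retry(["/opt/rasdaman/share/rasdaman/wcst_import/wcst_import.py", "test.json"])` would relaunch the import from this ticket up to three times while it keeps segfaulting, and pass any other exit status through unchanged.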