Opened 2 years ago

Closed 18 months ago

#2680 closed defect (wontfix)

FIX - wcst_import continues when a C library has seg fault

Reported by: Bang Pham Huu
Owned by: Mohit Basak
Priority: major
Milestone: 10.1
Component: wcst_import
Version: 10.0
Keywords:
Cc: Dimitar Misev
Complexity: Medium

Description (last modified by Dimitar Misev)

When a segmentation fault occurs in a C library (e.g. libhdf5 while it analyses a netCDF file), the whole wcst_import.py process is killed with no error output other than the message "Segmentation fault".

Here is the full log using valgrind:

Analyzing file (7/7): /eodata/CLMS/Global/Vegetation/Soil_Water_Index/BioPar_SWI_V3_Global/SWI_200701091200_GLOBE_ASCAT_V3.1.1/SWI_200701091200_GLOBE_ASCAT_V3.1.1.nc ...
==2683576== Invalid read of size 8
==2683576==    at 0x1BD0DB86: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD22DFD: H5VL_blob_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD01F5C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC6AF60: H5T__conv_vlen (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC583B4: H5T_convert (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9E18CD: H5D_get_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2FD62: H5VL__native_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD09854: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD14D35: H5VL_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9A1C55: H5Dget_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B75BF39: nc4_get_var_meta (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==    by 0x1B759566: nc4_hdf5_find_grp_var_att (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==  Address 0x1c854748 is 40 bytes inside a block of size 68 free'd
==2683576==    at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2683576==    by 0x1BB1B678: H5MM_xfree (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B93066D: H5A__shared_free (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B930B4D: H5A__close (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B932AA0: H5A__attr_release_table (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B92A441: H5A__dense_iterate (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB38EDD: H5O_attr_iterate_real (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB39841: H5O__attr_iterate (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B92E91C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B936698: H5A__iterate (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2E504: H5VL__native_attr_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD08DFD: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==  Block was alloc'd at
==2683576==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2683576==    by 0x1BB1B7F0: H5MM_malloc (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB1BAEB: H5MM_strdup (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB34438: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB34C77: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BB6EBA5: H5O_msg_decode (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B927450: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BADA1C7: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BADB64C: H5HF__man_op (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BAB8F0E: H5HF_op (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B92721D: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B953465: H5B2__iterate_node (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576== 
==2683576== Jump to the invalid address stated on the next line
==2683576==    at 0x4645454244414544: ???
==2683576==    by 0x1BD22DFD: H5VL_blob_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD01F5C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC6AF60: H5T__conv_vlen (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC583B4: H5T_convert (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9E18CD: H5D_get_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2FD62: H5VL__native_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD09854: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD14D35: H5VL_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9A1C55: H5Dget_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B75BF39: nc4_get_var_meta (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==    by 0x1B759566: nc4_hdf5_find_grp_var_att (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==  Address 0x4645454244414544 is not stack'd, malloc'd or (recently) free'd
==2683576== 
==2683576== 
==2683576== Process terminating with default action of signal 11 (SIGSEGV)
==2683576==  Bad permissions for mapped region at address 0x4645454244414544
==2683576==    at 0x4645454244414544: ???
==2683576==    by 0x1BD22DFD: H5VL_blob_specific (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD01F5C: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC6AF60: H5T__conv_vlen (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BC583B4: H5T_convert (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9E18CD: H5D_get_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD2FD62: H5VL__native_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD09854: ??? (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1BD14D35: H5VL_dataset_get (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B9A1C55: H5Dget_create_plist (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libhdf5-29e32098.so.200.0.0)
==2683576==    by 0x1B75BF39: nc4_get_var_meta (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576==    by 0x1B759566: nc4_hdf5_find_grp_var_att (in /usr/local/lib/python3.8/dist-packages/netCDF4.libs/libnetcdf-7952139f.so.18.0.0)
==2683576== 
==2683576== HEAP SUMMARY:
==2683576==     in use at exit: 18,500,212 bytes in 49,193 blocks
==2683576==   total heap usage: 412,683 allocs, 363,490 frees, 100,997,488 bytes allocated
==2683576== 
==2683576== LEAK SUMMARY:
==2683576==    definitely lost: 48 bytes in 1 blocks
==2683576==    indirectly lost: 0 bytes in 0 blocks
==2683576==      possibly lost: 821,730 bytes in 522 blocks
==2683576==    still reachable: 17,678,434 bytes in 48,670 blocks
==2683576==         suppressed: 0 bytes in 0 blocks
==2683576== Rerun with --leak-check=full to see details of leaked memory
==2683576== 
==2683576== Use --track-origins=yes to see where uninitialised values come from
==2683576== For lists of detected and suppressed errors, rerun with: -s
==2683576== ERROR SUMMARY: 7613 errors from 257 contexts (suppressed: 0 from 0)
Segmentation fault
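
One detail in the log above may be worth noting: the invalid jump target 0x4645454244414544 consists entirely of printable ASCII bytes. Read as a little-endian 8-byte value (the byte order on x86_64), it spells "DEADBEEF", which suggests that string data ended up being used as a function pointer. This is only a reading of the log, not a confirmed diagnosis, although it would be consistent with valgrind reporting a read from a freed block that was allocated by H5MM_strdup. The variable names below are illustrative; the decoding can be checked as follows:

```python
# Jump target reported by valgrind ("Jump to the invalid address ...").
jump_address = 0x4645454244414544

# x86_64 is little-endian, so the least significant byte comes first in memory.
raw_bytes = jump_address.to_bytes(8, byteorder="little")

# Every byte is printable ASCII, so the "address" is really text.
decoded = raw_bytes.decode("ascii")
print(decoded)  # prints DEADBEEF
```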

To reproduce: the netCDF4 package installed with pip3 is version 1.5.8 on Ubuntu 20.04, and it ships the following bundled libraries:

  netCDF4.libs/libaec-e300f322.so.0.0.10
  netCDF4.libs/libcurl-8c767087.so.4.7.0
  netCDF4.libs/libhdf5-29e32098.so.200.0.0
  netCDF4.libs/libhdf5_hl-f3ea9bc7.so.200.0.0
  netCDF4.libs/libnetcdf-7952139f.so.18.0.0
  netCDF4.libs/libsz-57467d8a.so.2.0.1
  netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so

The data file and an ingredients file are attached. To reproduce the problem:

  1. download the data and ingredients files into one directory
  2. install the problematic package version
    pip3 install netCDF4==1.5.8
    
  3. run the import
    /opt/rasdaman/share/rasdaman/wcst_import/wcst_import.py test.json
    

The import may need to be run several times before the problem is triggered.

Instead of the whole import process dying, wcst_import should retry importing the file.

Attachments (2)

SWI_200701091200_GLOBE_ASCAT_V3.1.1.nc (6.9 MB) - added by Dimitar Misev 2 years ago.
data
test.json (3.9 KB) - added by Bang Pham Huu 2 years ago.
newer ingredients file

Change History (7)

comment:1 by Dimitar Misev, 2 years ago

Description: modified (diff)

by Dimitar Misev, 2 years ago

Attachment: SWI_200701091200_GLOBE_ASCAT_V3.1.1.nc added

data

by Bang Pham Huu, 2 years ago

Attachment: test.json added

newer ingredients file

comment:2 by Dimitar Misev, 2 years ago

Description: modified (diff)

comment:3 by Bang Pham Huu, 2 years ago

Owner: changed from Bang Pham Huu to mohit

comment:4 by Bang Pham Huu, 2 years ago

Owner: changed from mohit to Mohit Basak

comment:5 by Dimitar Misev, 18 months ago

Resolution: wontfix
Status: assigned → closed

I think there's no reasonable way to do a retry inside wcst_import.py itself. Maybe something could be added to the wcst_import.sh script that starts wcst_import.py. But it all sounds like more work than it is worth for this extremely rare case, so I'll close the ticket.
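
For anyone who still needs a workaround, the wrapper idea can be sketched roughly as below. This is a minimal illustration and not part of rasdaman: the function name `run_with_retry` and the `max_attempts` parameter are hypothetical, and the sketch is written in Python rather than as a change to wcst_import.sh. It relies on documented `subprocess` behaviour on POSIX, where a return code of -N means the child was terminated by signal N. Whether a re-run resumes cleanly depends on how the import tracks its own progress, which the sketch does not address.

```python
import signal
import subprocess
import sys


def run_with_retry(cmd, max_attempts=3):
    """Run cmd as a child process and retry only if it died from SIGSEGV.

    Returns a tuple (return_code, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        # On POSIX, a negative return code -N means "killed by signal N".
        if result.returncode != -signal.SIGSEGV:
            # Normal exit (success or an ordinary error): do not retry.
            return result.returncode, attempt
        print(
            f"attempt {attempt}/{max_attempts}: child was killed by SIGSEGV",
            file=sys.stderr,
        )
    # Every attempt crashed; report the crash to the caller.
    return -int(signal.SIGSEGV), max_attempts
```

With the reproduction steps from the description, the call would be `run_with_retry(["python3", "/opt/rasdaman/share/rasdaman/wcst_import/wcst_import.py", "test.json"])`. Because the crash happens in the child process, the wrapper itself survives and can decide whether to start another attempt.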
