NETCDF compression
Added by Stefan Hagemann about 2 months ago
Hi!
According to the manual, CDO currently compresses NetCDF data via deflate (option -z zip_2). Today I attended a talk by Anna Fuchs (DKRZ), who criticised DEFLATE as slow and outdated. She recommended compression based on lz4 and mentioned that the related filters are already available. Are there plans to introduce lz4-based NetCDF data compression into CDO?
Replies (1)
RE: NETCDF compression - Added by Uwe Schulzweida about 1 month ago
Hi Stefan,
NetCDF4/HDF5 filter support has been available since CDO release 2.4.3. The --filter <filterspec> option can be used to specify filters for compressing the data.
The filter specification consists of a filter id, a comma, and then a sequence of comma-separated constants representing the parameters (see: https://docs.unidata.ucar.edu/netcdf-c/4.9.2/filters.html#filters_syntax).
The number of parameters depends on the selected filter; here is a list of registered filters.
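To make the syntax concrete, here is a minimal sketch that splits a filter specification into its filter id and counts the parameters that follow. The helper name describe_filterspec is hypothetical and not part of CDO or netCDF, and it only handles the plain comma-separated form described above.

```shell
# Hypothetical helper (not part of CDO): split "<id>,<p1>,<p2>,..." into
# the filter id and the number of parameters that follow it.
describe_filterspec() {
    (
        spec=$1
        id=${spec%%,*}          # everything before the first comma
        n=0
        case $spec in
            *,*)
                IFS=,           # split the remainder on commas
                set -f          # no filename globbing while splitting
                set -- ${spec#*,}
                n=$#
                ;;
        esac
        printf 'id=%s nparams=%s\n' "$id" "$n"
    )
}

describe_filterspec 32001,0,0,0,0,5,2,1
```

For the BLOSC specification used later in this reply, the helper reports the id 32001 and 7 parameters.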
Filters are installed as dynamic libraries, called plugins. If a required filter plugin is not available, it can be installed by the user in a separate plugin directory. The environment variable HDF5_PLUGIN_PATH is used to refer to plugin directories.
If a filter plugin is not found, CDO terminates with an error message. If the filter parameters are not specified correctly, the netCDF library aborts CDO unexpectedly. Unfortunately, I have not yet found a list with the necessary parameters for all filters.
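Before running CDO it can help to see which plugin libraries are actually present. The following is a rough sketch under two assumptions: HDF5_PLUGIN_PATH may hold several directories separated by colons (as on Unix-like systems), and plugin files end in .so or .dylib. The function name list_hdf5_plugins is made up for illustration.

```shell
# Hypothetical helper: list shared libraries found in the plugin directories.
# Takes a colon-separated path as argument, else falls back to HDF5_PLUGIN_PATH.
list_hdf5_plugins() {
    (
        path=${1:-${HDF5_PLUGIN_PATH:-}}
        IFS=:
        for dir in $path; do
            if [ ! -d "$dir" ]; then
                echo "not a directory: $dir" >&2
                continue
            fi
            # Assumption: plugins are named *.so (Linux) or *.dylib (macOS).
            for lib in "$dir"/*.so "$dir"/*.dylib; do
                if [ -e "$lib" ]; then
                    printf '%s\n' "$lib"
                fi
            done
        done
    )
}

list_hdf5_plugins
```

An empty result means no plugin library was found in the given directories, in which case CDO would be expected to stop with the error described here.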
The filter LZ4 is mostly used in combination with the BLOSC filter. BLOSC has the filter ID 32001 and requires 7 parameters.
Here is an example:
export HDF5_PLUGIN_PATH=/work/kv0653/spack-flo/netcdf-c-4.9.0-7katv4/plugins
cdo -f nc4 --filter 32001,0,0,0,0,5,2,1 copy infile outfile
The first four parameters are 0 and set by the NetCDF library. Parameter 5 is the compression level. Parameter 6 is for shuffling, 2 means bit-wise shuffle. Parameter 7 is the subcompressor, 1 stands for LZ4.
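As an illustration of how the seven BLOSC parameters fit together, the sketch below assembles the specification from a compression level, a shuffle mode, and a subcompressor name. The function blosc_filterspec is hypothetical. Only "bit" = 2 and "lz4" = 1 come from the explanation above; the other codes are assumed from the constants of the Blosc 1 library and should be checked against the installed plugin.

```shell
# Hypothetical helper: build a BLOSC filter spec (filter id 32001).
# usage: blosc_filterspec LEVEL SHUFFLE COMPRESSOR
blosc_filterspec() {
    _level=$1
    case $2 in
        none) _shuffle=0 ;;   # assumed from Blosc constants
        byte) _shuffle=1 ;;   # assumed from Blosc constants
        bit)  _shuffle=2 ;;   # bit-wise shuffle, as stated in the post
        *) echo "unknown shuffle mode: $2" >&2; return 1 ;;
    esac
    case $3 in
        blosclz) _comp=0 ;;   # assumed from Blosc constants
        lz4)     _comp=1 ;;   # LZ4, as stated in the post
        lz4hc)   _comp=2 ;;   # assumed from Blosc constants
        zlib)    _comp=4 ;;   # assumed from Blosc constants
        zstd)    _comp=5 ;;   # assumed from Blosc constants
        *) echo "unknown subcompressor: $3" >&2; return 1 ;;
    esac
    # The first four parameters stay 0; the NetCDF library sets them.
    printf '32001,0,0,0,0,%s,%s,%s\n' "$_level" "$_shuffle" "$_comp"
}

blosc_filterspec 5 bit lz4
# Possible use: cdo -f nc4 --filter "$(blosc_filterspec 5 bit lz4)" copy infile outfile
```

Called with level 5, bit shuffle and lz4, the helper reproduces the specification 32001,0,0,0,0,5,2,1 from the example.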
Compressed data can only be read if the filter plugins are also available when reading the data.
There are also two new filter operators in CDO:
- setfilter: sets the NetCDF4 filter specification for selected variables
- showfilter: prints the NetCDF4 filter specification of all variables
Cheers,
Uwe