cdo-1.4.7rc2: Setting chunk size

Added by Clement Tisseuil over 13 years ago

Hello,

I am trying the cdo-1.4.7rc2 version that allows reading/writing NC_NETCDF4 model data.
I have successfully tried the following command line to copy a NC_CLASSIC_MODEL file to a NC_NETCDF4 file:

cdo -f nc4 copy file.nc file2.nc

Is there any way to set chunk parameters when creating the NC_NETCDF4 file, in order to speed up some read/write operations?

Thanks in advance

Clem


Replies (6)

RE: cdo-1.4.7rc2: Setting chunk size - Added by Uwe Schulzweida over 13 years ago

Hi Clem,

The CDO I/O is optimized for speed. The chunk size is set automatically to the size of one horizontal slice of each variable.
You can also use the CDO option -z zip to compress all chunks:

cdo -f nc4 -z zip  copy file.nc file2.nc

Regards,
Uwe

RE: cdo-1.4.7rc2: Setting chunk size - Added by Clement Tisseuil over 13 years ago

Hi Uwe,

In my case, I would like to set specifically the chunk size of my NetCDF4 files ("ideally" editable using CDO) so that the targeted data could be accessed efficiently from external software, in my case R via the ncdf4 library.

It seems that the "bm_file" program, provided by the NetCDF4 library (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/bm_005ffile.html) could be an interesting alternative for my problem.

Let me know if this feature could be an interesting issue for CDO...

Regards,

Clem

RE: cdo-1.4.7rc2: Setting chunk size - Added by Etienne Tourigny over 11 years ago

Hi

I would like to implement this into cdo.

Some applications (gdal/qgis, grads) work better if the chunk size is {width x 1} instead of {width x height}.

The default is also not good with very large datasets - individual chunks are very big.

Also note that the "recommended" chunk size is
chunksize[d] = pow((double)DEFAULT_CHUNK_SIZE/type_size,
1/(double)(var->ndims - unlimdim));
taken from
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Default-Chunking.html#Default-Chunking
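
The rule quoted above can be tried out with a minimal sketch like the following. It is not netCDF's code: the function names are made up for illustration, the byte budget is passed in as a parameter rather than taken from the library's `DEFAULT_CHUNK_SIZE`, and an integer root replaces `pow` so the sketch needs no math library.

```c
#include <stddef.h>

/* Largest r with r^k <= n (integer k-th root, k >= 1), by binary search. */
size_t floor_root(size_t n, unsigned k)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;   /* mid >= 1 here */
        size_t p = 1;
        int over = 0;
        for (unsigned i = 0; i < k; i++) {
            if (p > n / mid) { over = 1; break; }  /* p * mid would exceed n */
            p *= mid;
        }
        if (over) hi = mid - 1; else lo = mid;
    }
    return lo;
}

/* Chunk length (in elements) per fixed dimension for a given byte budget.
 * Hypothetical helper: budget_bytes stands in for DEFAULT_CHUNK_SIZE. */
size_t recommended_chunk_len(size_t budget_bytes, size_t type_size,
                             int ndims, int n_unlimited)
{
    int fixed = ndims - n_unlimited;
    if (fixed <= 0 || type_size == 0) return 1;
    size_t r = floor_root(budget_bytes / type_size, (unsigned)fixed);
    return r < 1 ? 1 : r;
}
```

For example, a 4 MiB budget with 4-byte values and two fixed dimensions (plus one unlimited dimension) gives 1024 elements per fixed dimension.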

Any ideas how this could be input by the user? command-line argument / env variable / new operator ?

Also somewhat related, is there a way to specify compression level, and if not which ways could it be set by the user?

Thanks
Etienne

RE: cdo-1.4.7rc2: Setting chunk size - Added by Uwe Schulzweida over 11 years ago

Hi Etienne,

Thanks for the information. I haven't yet looked into the optimal chunk size. If I understand it correctly, the recommended chunk size is about 4096 bytes. I have two recommendations for the implementation:
  • An environment variable to set the chunk size in bytes.
  • And try to find a more optimal default chunk size in CDO, e.g. if ( height > 1 && width > 1024 ) chunk_size = width
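
The two suggestions above could look roughly like this. This is only a sketch under assumptions: all function names are hypothetical, the environment variable name is left to the caller because none is fixed in this thread, and `chunk_size = width` is read here as "one row per chunk".

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a positive byte count; return fallback for NULL, empty or invalid text. */
size_t parse_chunk_bytes(const char *text, size_t fallback)
{
    if (text == NULL || *text < '0' || *text > '9') return fallback; /* no sign, no blanks */
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0' || v == 0 || v > (size_t)-1) return fallback;
    return (size_t)v;
}

/* Read the chunk size in bytes from an environment variable (name chosen by the caller). */
size_t chunk_bytes_from_env(const char *name, size_t fallback)
{
    return parse_chunk_bytes(getenv(name), fallback);
}

/* Default heuristic: for wide grids use one row per chunk,
 * otherwise keep the whole horizontal slice in one chunk. */
void default_xy_chunks(size_t width, size_t height, size_t *cx, size_t *cy)
{
    *cx = width;
    *cy = (height > 1 && width > 1024) ? 1 : height;
}
```

Unset or malformed values fall back to the default passed in, so a broken environment never changes the chunking silently to something unexpected.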

The compression level could be appended to the compression type. Here is an example for a compression level of 6:

cdo -f nc4 -z zip_6 copy ifile ofile
I can implement this in the next CDO release.
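
Splitting an argument such as `zip_6` into type and level could be sketched as below. This is not CDO's parser: the function name, the caller-supplied default level and the accepted range 1 to 9 are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Accepts "zip" (uses default_level) or "zip_<1..9>".
 * Returns 1 and stores the level on success, 0 otherwise. */
int parse_zip_level(const char *arg, int default_level, int *level)
{
    if (arg == NULL || strncmp(arg, "zip", 3) != 0) return 0;
    if (arg[3] == '\0') {                 /* plain "zip" */
        *level = default_level;
        return 1;
    }
    if (arg[3] == '_' && arg[4] >= '1' && arg[4] <= '9' && arg[5] == '\0') {
        *level = arg[4] - '0';            /* single digit after the underscore */
        return 1;
    }
    return 0;                             /* e.g. "zip_0", "zip_10", "gzip" */
}
```

Keeping the plain `zip` form valid means existing command lines such as `-z zip` would continue to work unchanged.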

Cheers,
Uwe

RE: cdo-1.4.7rc2: Setting chunk size - Added by Etienne Tourigny over 11 years ago

Hi Uwe

Yes, please add an (optional) compression level in the next release.

I worked a bit on chunking yesterday, using a command-line parameter instead of an env. var, although an environment variable may be less intrusive, as you suggest. In most cases you don't need to set the chunk size manually, and when you do, an env. var can be set temporarily anyway.

{{{
-k <chunks> Set the chunk sizes of the X and Y axes (netcdf-4 only). Options:
default (1 chunk per time/height), auto (netcdf default), lines (<widthx1>), <widthxheight>
}}}

The auto option lets libnetcdf itself decide, and the lines option is mainly for interacting with gdal.

and the following code in

{{{
#if defined (HAVE_NETCDF4)
  if ( lchunk &&
       (streamptr->filetype == FILETYPE_NC4 || streamptr->filetype == FILETYPE_NC4C) ) {
    chunkAlgo = vlistInqVarChunkAlgo(vlistID, varID);
    /* "default" or unset: one chunk per horizontal slice */
    if ( ( strcmp("default", chunkAlgo) == 0 ) || ( strcmp("", chunkAlgo) == 0 ) ) {
      if ( xd != -1 )
        chunks[xd] = xsize;
      if ( yd != -1 )
        chunks[yd] = ysize;
    }
    else if ( strcmp("auto", chunkAlgo) == 0 ) {
      /* nothing to do: libnetcdf picks the chunk sizes */
    }
    else if ( strcmp("lines", chunkAlgo) == 0 ) {
      /* one row per chunk */
      if ( xd != -1 )
        chunks[xd] = xsize;
      if ( yd != -1 )
        chunks[yd] = 1;
    }
    else {
      // TODO parse widthxheight
    }
    if ( strcmp("auto", chunkAlgo) != 0 ) {
      if ( (retval = nc_def_var_chunking(fileID, ncvarid, 0, chunks)) )
        Error("nc_def_var_chunking failed, status = %d", retval);
    }
  }
#endif
}}}
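
For reference, the selection logic above, including the `widthxheight` case that is still a TODO in the patch, can be written as a standalone sketch without any CDI or netCDF dependency. The function names, the return codes and the clamping of explicit sizes to the grid dimensions are assumptions, not part of the attached patch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<width>x<height>" with positive decimal integers; 1 on success. */
int parse_wxh(const char *s, size_t *w, size_t *h)
{
    char *end = NULL;
    if (s == NULL || *s < '0' || *s > '9') return 0;
    unsigned long a = strtoul(s, &end, 10);
    if (*end != 'x' || end[1] < '0' || end[1] > '9') return 0;
    unsigned long b = strtoul(end + 1, &end, 10);
    if (*end != '\0' || a == 0 || b == 0) return 0;
    *w = a;
    *h = b;
    return 1;
}

/* Returns 1 if explicit X/Y chunk sizes were stored in *cx and *cy,
 * 0 for "auto" (leave the choice to libnetcdf), -1 for an unknown option. */
int select_xy_chunks(const char *algo, size_t xsize, size_t ysize,
                     size_t *cx, size_t *cy)
{
    if (algo == NULL || *algo == '\0' || strcmp(algo, "default") == 0) {
        *cx = xsize; *cy = ysize;          /* one chunk per horizontal slice */
        return 1;
    }
    if (strcmp(algo, "auto") == 0) return 0;
    if (strcmp(algo, "lines") == 0) {
        *cx = xsize; *cy = 1;              /* one row per chunk */
        return 1;
    }
    size_t w, h;
    if (!parse_wxh(algo, &w, &h)) return -1;
    *cx = w < xsize ? w : xsize;           /* precaution: never exceed the grid */
    *cy = h < ysize ? h : ysize;
    return 1;
}
```

A caller would pass the result on to `nc_def_var_chunking` only when the return value is 1, mirroring the final `strcmp("auto", ...) != 0` check in the patch.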

There is some other glue also (in vlist_var.c) and others, but the important stuff is above. Let me know what you think.

I also have a small patch to print chunking information in sinfo, here is sample output:

{{{
tourigny@supernova: /data/research/work/inland/hg/etienne-subgrid/output $ cdo sinfon inland-hrmap-1981.nc
File format: netCDF4 ZIP
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name : Extra
1 : unknown unknown instant 1 1 24206400 1 I16z : ihrtileparent chunks: 1448 X 1448 X 1
2 : unknown unknown instant 1 1 24206400 1 I8 z : vegtype chunks: 2048 X 2048 X 1
3 : unknown unknown instant 1 1 24206400 1 I8 z : landuse chunks: 2048 X 2048 X 1

$ cdo sinfon inland-hrmap-1981.nc3
File format: netCDF
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name : Extra
1 : unknown unknown instant 1 1 24206400 1 I16 : ihrtileparent
2 : unknown unknown instant 1 1 24206400 1 I8 : vegtype
3 : unknown unknown instant 1 1 24206400 1 I8 : landuse
}}}

RE: cdo-1.4.7rc2: Setting chunk size - Added by Etienne Tourigny over 11 years ago

Attaching a patch which implements chunks printing in sinfo* and also chunksize algorithm selection via CDI_CHUNK_ALGO.
