Recommended netcdf format for performance

Added by Oliver Angelil almost 8 years ago

Hi there,

Is there a particular NetCDF file format that optimises performance? For example, which of the following would CDO perform best with: netCDF4, netCDF4-classic, or netCDF3? Or would we expect no difference in performance?

Thanks,
Oliver


Replies (1)

RE: Recommended netcdf format for performance - Added by Ralf Mueller almost 8 years ago

Performance is always hard to judge. I did a test with a 3.9GB nc4 classic file (about 10000 grid points, about 100000 timesteps). With transparent compression (cdo -f nc4c -z zip) the file size is 1.4GB.

 ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.nc4                                                                                                                                           
NetCDF4
cdo showformat: Processed 1 variable ( 0.00s )
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.nc
NetCDF4 classic
cdo showformat: Processed 1 variable ( 0.00s )
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.nc3                                                                                                                                                           
NetCDF
cdo showformat: Processed 1 variable ( 0.00s )
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.ncz                                                                                                                                                            
NetCDF4 classic ZIP
cdo showformat: Processed 1 variable ( 0.01s )

-rw-rw-r-- 1 ram users  1.4G Jul 20 07:46 ncep_temperature.ncz
-rw-rw-r-- 1 ram users  3.9G Jul 20 07:49 ncep_temperature.nc3
-rw-rw-r-- 1 ram users  3.9G Jul 20 07:52 ncep_temperature.nc4
-rw-rw-r-- 1 ram users  3.9G Jun  9  2015 ncep_temperature.nc
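
For anyone wanting to reproduce the four files: the post only shows the compression call (cdo -f nc4c -z zip), so the other two conversions below are an assumption, using cdo's -f format switch with the copy operator. make_variants, src and prefix are names made up for this sketch. Which NetCDF 3 variant -f nc selects may depend on the CDO version, so the function prints what cdo showformat reports for each result.

```shell
#!/bin/sh
# Sketch only: write one input file in several NetCDF flavours with "cdo copy".
# make_variants is a hypothetical helper, not part of CDO.
make_variants() {
    src=$1       # input file, e.g. ncep_temperature.nc
    prefix=$2    # prefix for the output files, e.g. ncep_temperature

    cdo -f nc copy "$src" "$prefix.nc3" || return 1            # NetCDF 3 family
    cdo -f nc4 copy "$src" "$prefix.nc4" || return 1           # NetCDF4
    cdo -f nc4c -z zip copy "$src" "$prefix.ncz" || return 1   # NetCDF4 classic, compressed

    # Confirm what was actually written
    for f in "$prefix.nc3" "$prefix.nc4" "$prefix.ncz"; do
        printf '%s: ' "$f"
        cdo showformat "$f"
    done
}
```

Usage would be something like: make_variants ncep_temperature.nc ncep_temperature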

Now I ran

cdo -infov -timmean
on each of these files:
ram@luthien:~/local/data/cdo for f (ncep_temperature.nc ncep_temperature.nc4 ncep_temperature.nc3 ncep_temperature.ncz) {echo $f;  cdo -infov -timmean $f }                                                                 [7:56:17|16-07-20]
ncep_temperature.nc
cdo infon: Started child process "timmean ncep_temperature.nc (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.16s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.16s )
ncep_temperature.nc4
cdo infon: Started child process "timmean ncep_temperature.nc4 (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.17s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.17s )
ncep_temperature.nc3
cdo infon: Started child process "timmean ncep_temperature.nc3 (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 4.05s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 4.05s )
ncep_temperature.ncz
cdo infon: Started child process "timmean ncep_temperature.ncz (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 24.36s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 24.36s )
And again:
ram@luthien:~/local/data/cdo for f (ncep_temperature.nc ncep_temperature.nc4 ncep_temperature.nc3 ncep_temperature.ncz) {echo $f;  cdo -infov -timmean $f }                                                                 [7:57:26|16-07-20]
ncep_temperature.nc
cdo infon: Started child process "timmean ncep_temperature.nc (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.22s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.22s )
ncep_temperature.nc4
cdo infon: Started child process "timmean ncep_temperature.nc4 (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.05s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.05s )
ncep_temperature.nc3
cdo infon: Started child process "timmean ncep_temperature.nc3 (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 4.05s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 4.05s )
ncep_temperature.ncz
cdo infon: Started child process "timmean ncep_temperature.ncz (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : air           
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 24.06s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 24.06s )
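
The loops above use zsh's short for syntax, which will not run in bash or plain sh. A portable version could look like the sketch below. bench_formats and its runs argument are made up for this example; the cdo call itself is the one from the post. Repeating each file a few times, as done above with "and again", helps to see whether the timings are stable.

```shell
#!/bin/sh
# Sketch only: POSIX sh version of the zsh benchmark loop.
# bench_formats is a hypothetical helper name.
bench_formats() {
    runs=$1    # how often to repeat the command per file
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "skipping $f (not readable)" >&2
            continue
        fi
        i=1
        while [ "$i" -le "$runs" ]; do
            echo "$f (run $i)"
            # cdo prints its own run time at the end of each "Processed" line
            cdo -infov -timmean "$f"
            i=$((i + 1))
        done
    done
}
```

For example: bench_formats 2 ncep_temperature.nc ncep_temperature.nc4 ncep_temperature.nc3 ncep_temperature.ncz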

Finally: NetCDF4 seems to be a bit slower than NetCDF3 (about 5.1s versus 4.05s here). Compression (about 24s here) seems to be useful only if saving disk space has top priority, or if your program runs so long or so parallel that you don't care about the extra read time and take the smaller file size as a benefit.

Of course these are the numbers from my machine and my OS.
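
To put the trade-off into numbers, the timings of the first run and the file sizes quoted above can be turned into ratios. This is just arithmetic on the values in this post; ratio is a hypothetical helper name, and the results say nothing about other machines.

```shell
#!/bin/sh
# Sketch only: ratio of two numbers, rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 5.16 4.05     # NetCDF4 classic vs NetCDF3 run time      -> 1.27
ratio 5.17 4.05     # NetCDF4 vs NetCDF3 run time              -> 1.28
ratio 24.36 4.05    # compressed nc4c vs NetCDF3 run time      -> 6.01
ratio 1.4 3.9       # compressed vs uncompressed file size     -> 0.36
```

So in this test the compressed file needs roughly a third of the disk space but about six times the read time.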

If you can live with lossy compression, GRIB might be an option:

ram@luthien:~/local/data/cdo cdo infov -timmean ncep_temperature.grb                                                                                                                                                        [8:15:12|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : var1          
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 1.58s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 1.58s )
ram@luthien:~/local/data/cdo time cdo infov -timmean ncep_temperature.grb                                                                                                                                                   [8:15:21|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : var1          
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 1.60s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 1.60s )
cdo -C infov -timmean ncep_temperature.grb  1.22s user 0.40s system 99% cpu 1.618 total
ram@luthien:~/local/data/cdo time cdo infov -timmean ncep_temperature.grb2                                                                                                                                                  [8:15:27|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb2 (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : var1          
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 105.04s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 105.04s )
cdo -C infov -timmean ncep_temperature.grb2  104.13s user 0.91s system 99% cpu 1:45.37 total
ram@luthien:~/local/data/cdo time cdo infov -timmean ncep_temperature.grb2                                                                                                                                                  [8:17:18|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb2 (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1981-07-01 21:00:00       0    10512       0 :      222.25      277.51      303.60 : var1          
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 108.13s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 108.13s )
cdo -C infov -timmean ncep_temperature.grb2  107.12s user 1.02s system 100% cpu 1:48.14 total

Or at least GRIB1 is (about 1.6s); GRIB2 seems to be horribly slow (more than 100s).

Test data can be found at ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/surface

hth
ralf
