Recommended netcdf format for performance
Added by Oliver Angelil over 8 years ago
Hi there,
Is there a particular netCDF file format that optimises performance? For example, which of the following would cdo perform best with: netCDF4, netCDF4-classic, or netCDF3? Or would we expect no difference in performance?
Thanks,
Oliver
Replies (1)
RE: Recommended netcdf format for performance - Added by Ralf Mueller over 8 years ago
Performance is always hard to judge. I did a test with a 3.9GB netCDF4 classic file (about 10,000 grid points, about 100,000 timesteps). With transparent compression (cdo -f nc4c -z zip) the file size is 1.4GB.
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.nc4
NetCDF4
cdo showformat: Processed 1 variable ( 0.00s )
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.nc
NetCDF4 classic
cdo showformat: Processed 1 variable ( 0.00s )
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.nc3
NetCDF
cdo showformat: Processed 1 variable ( 0.00s )
ram@luthien:~/local/data/cdo cdo showformat ncep_temperature.ncz
NetCDF4 classic ZIP
cdo showformat: Processed 1 variable ( 0.01s )
-rw-rw-r-- 1 ram users 1.4G Jul 20 07:46 ncep_temperature.ncz
-rw-rw-r-- 1 ram users 3.9G Jul 20 07:49 ncep_temperature.nc3
-rw-rw-r-- 1 ram users 3.9G Jul 20 07:52 ncep_temperature.nc4
-rw-rw-r-- 1 ram users 3.9G Jun 9 2015 ncep_temperature.nc
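The thread does not show how the four files were produced. A minimal sketch, assuming they were converted from one source file with cdo's copy operator and the -f and -z options, could look like this. The function name, the CDO override variable and the .nc4c extension are hypothetical and chosen only for illustration.

```shell
# Sketch: write one input file in the four netCDF variants compared here.
# CDO can be overridden (for example CDO=echo for a dry run); it defaults to cdo.
make_variants() {
    in=$1      # source file, e.g. ncep_temperature.nc
    stem=$2    # output name without extension (hypothetical naming scheme)
    "${CDO:-cdo}" -f nc   copy "$in" "$stem.nc3"          # netCDF classic
    "${CDO:-cdo}" -f nc4  copy "$in" "$stem.nc4"          # netCDF4
    "${CDO:-cdo}" -f nc4c copy "$in" "$stem.nc4c"         # netCDF4 classic
    "${CDO:-cdo}" -f nc4c -z zip copy "$in" "$stem.ncz"   # netCDF4 classic, zip-compressed
}
```

Calling make_variants ncep_temperature.nc out would write out.nc3, out.nc4, out.nc4c and out.ncz; cdo showformat can then confirm the format of each file.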
Now I ran
cdo -infov -timmean on each of these files:
ram@luthien:~/local/data/cdo for f (ncep_temperature.nc ncep_temperature.nc4 ncep_temperature.nc3 ncep_temperature.ncz) {echo $f; cdo -infov -timmean $f } [7:56:17|16-07-20]
ncep_temperature.nc
cdo infon: Started child process "timmean ncep_temperature.nc (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.16s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.16s )
ncep_temperature.nc4
cdo infon: Started child process "timmean ncep_temperature.nc4 (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.17s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.17s )
ncep_temperature.nc3
cdo infon: Started child process "timmean ncep_temperature.nc3 (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 4.05s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 4.05s )
ncep_temperature.ncz
cdo infon: Started child process "timmean ncep_temperature.ncz (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 24.36s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 24.36s )
and again:
ram@luthien:~/local/data/cdo for f (ncep_temperature.nc ncep_temperature.nc4 ncep_temperature.nc3 ncep_temperature.ncz) {echo $f; cdo -infov -timmean $f } [7:57:26|16-07-20]
ncep_temperature.nc
cdo infon: Started child process "timmean ncep_temperature.nc (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.22s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.22s )
ncep_temperature.nc4
cdo infon: Started child process "timmean ncep_temperature.nc4 (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 5.05s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 5.05s )
ncep_temperature.nc3
cdo infon: Started child process "timmean ncep_temperature.nc3 (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 4.05s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 4.05s )
ncep_temperature.ncz
cdo infon: Started child process "timmean ncep_temperature.ncz (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : air
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 24.06s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 24.06s )
Finally: netCDF4 seems to be a bit slower than netCDF3 (about 5.1s versus 4.05s here). Compression (about 24s here) seems to be useful only if saving disk space has top priority, or if your program runs so long or in parallel that you don't care about the extra read time and take the smaller file size as a benefit.
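To put the timings from the first run in proportion, the following small sketch divides each one by the netCDF3 time of 4.05s. The helper name relative_to and the short labels are made up for this example; the numbers are the ones reported above.

```shell
# Print each timing relative to a baseline given in seconds as the first argument.
# Input on stdin: one "label seconds" pair per line.
relative_to() {
    awk -v base="$1" '{ printf "%s %.2f\n", $1, $2 / base }'
}

# First-run timings from this thread, relative to netCDF3 (4.05s).
printf 'nc4c 5.16\nnc4 5.17\nncz 24.36\n' | relative_to 4.05
# nc4c 1.27
# nc4 1.28
# ncz 6.01
```

On this machine the uncompressed netCDF4 variants were roughly a quarter slower than netCDF3, and the compressed file took about six times as long to read.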
Of course these are the numbers from my machine and my OS.
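To collect comparable numbers on another machine, the zsh loop used in this thread can be rewritten for bash or any POSIX shell. This is only a sketch: the function names and the CDO override variable are hypothetical, and the timing is parsed from the "( 5.16s )" summary format shown in this thread, which other cdo versions may print differently.

```shell
# Pull the elapsed seconds out of cdo's summary lines, which here look like
#   cdo(2) timmean: Processed ... over 97888 timesteps ( 5.16s )
# Only the last matching line is kept.
elapsed_seconds() {
    sed -n 's/.*( *\([0-9][0-9.]*\)s *).*/\1/p' | tail -n 1
}

# Run the same operator chain on every file given and print "file seconds".
# CDO defaults to the cdo binary; the variable exists only so the sketch can be tested.
bench_files() {
    for f in "$@"; do
        secs=$("${CDO:-cdo}" -infov -timmean "$f" 2>&1 | elapsed_seconds)
        printf '%s %s\n' "$f" "$secs"
    done
}
```

A call such as bench_files ncep_temperature.nc ncep_temperature.nc3 ncep_temperature.ncz prints one line per file. Running it more than once, as was done here, helps to see how stable the timings are.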
If you can live with lossy compression, GRIB might be an option:
ram@luthien:~/local/data/cdo cdo infov -timmean ncep_temperature.grb [8:15:12|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : var1
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 1.58s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 1.58s )
ram@luthien:~/local/data/cdo time cdo infov -timmean ncep_temperature.grb [8:15:21|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : var1
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 1.60s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 1.60s )
cdo -C infov -timmean ncep_temperature.grb 1.22s user 0.40s system 99% cpu 1.618 total
ram@luthien:~/local/data/cdo time cdo infov -timmean ncep_temperature.grb2 [8:15:27|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb2 (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : var1
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 105.04s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 105.04s )
cdo -C infov -timmean ncep_temperature.grb2 104.13s user 0.91s system 99% cpu 1:45.37 total
ram@luthien:~/local/data/cdo time cdo infov -timmean ncep_temperature.grb2 [8:17:18|16-07-20]
cdo infon: Started child process "timmean ncep_temperature.grb2 (pipe1.1)".
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1981-07-01 21:00:00 0 10512 0 : 222.25 277.51 303.60 : var1
cdo(2) timmean: Processed 1028998656 values from 1 variable over 97888 timesteps ( 108.13s )
cdo infon: Processed 10512 values from 1 variable over 1 timestep ( 108.13s )
cdo -C infov -timmean ncep_temperature.grb2 107.12s user 1.02s system 100% cpu 1:48.14 total
Or at least GRIB1 might be; GRIB2 seems to be horribly slow.
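How the GRIB files were created is not shown in the thread. One plausible way, assuming a cdo build with GRIB1 and GRIB2 support, is again the copy operator with -f grb and -f grb2. The function name and the CDO override variable are hypothetical.

```shell
# Sketch: convert one netCDF file to GRIB1 and GRIB2 for the comparison above.
# CDO can be overridden (for example CDO=echo for a dry run); it defaults to cdo.
to_grib() {
    in=$1      # source file, e.g. ncep_temperature.nc
    stem=$2    # output name without extension
    "${CDO:-cdo}" -f grb  copy "$in" "$stem.grb"    # GRIB edition 1
    "${CDO:-cdo}" -f grb2 copy "$in" "$stem.grb2"   # GRIB edition 2
}
```

As the output above shows, the parameter name appears as var1 instead of air after the conversion, so some metadata does not survive the format change.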
Test data can be found at ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/surface
hth
ralf