CDO splityear vs NCO ncks

Added by Kristy Dahl almost 6 years ago

Hello,

I am relatively new to CDO and NCO and have learned a lot from these forums.

I am currently working with netCDF4 files that contain five years of daily data and am using cdo timsum to calculate the number of days above a specified temperature threshold.

When I tested the timsum command (cdo -f nc -timsum -gec,105 infile.nc outfile.nc) on a netCDF4 file containing just one year of data, it took about 5 minutes. When I tested the same command on the file containing five years of data, it took a full hour. Thinking that maybe I should break the five-year data into one-year chunks, I tried two things:

1. Use cdo splityear to create five one-year files (this took about an hour)
2. Use ncks to hyperslab along the time dimension (this took on the order of one minute)
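
For readers who want to reproduce the two approaches, here is a minimal sketch that only prints the commands rather than running them. The file names (infile.nc, the year_ prefix) and the helper year_range are hypothetical, and the index ranges assume exactly 365 daily time steps per year, which does not hold for calendars with leap days.

```shell
#!/bin/sh
# Hypothetical helper: inclusive, zero-based index range of year N on the
# time axis, ASSUMING exactly 365 daily steps per year (no leap days).
year_range() {
    start=$(( $1 * 365 ))
    end=$(( start + 364 ))
    echo "${start},${end}"
}

# Option 1: let CDO split by calendar year (output names start with "year_").
echo "cdo splityear infile.nc year_"

# Option 2: one ncks hyperslab along the time dimension per year.
for i in 0 1 2 3 4; do
    echo "ncks -d time,$(year_range "$i") infile.nc year_${i}.nc"
done
```

Check the printed index ranges against the actual time axis (for example with cdo showdate infile.nc) before running the ncks lines.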

My best guess is that the difference in processing time between cdo splityear and ncks comes from the different output formats. When I run ncdump -k on the output files, cdo splityear is generating 64-bit offset files while ncks is generating netCDF4.
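
The format check can be scripted along these lines. This is a sketch: the helper name cdo_filetype_for is made up, and the mapping from ncdump -k output to CDO file type names is an assumption based on the Filetypes line of cdo -V (nc1 nc2 nc4 nc4c nc5).

```shell
#!/bin/sh
# Hypothetical helper: map the string printed by `ncdump -k` to the
# file type name that `cdo -f` uses (assumed mapping, see lead-in).
cdo_filetype_for() {
    case "$1" in
        "classic")                echo "nc1" ;;
        "64-bit offset")          echo "nc2" ;;
        "cdf5")                   echo "nc5" ;;
        "netCDF-4")               echo "nc4" ;;
        "netCDF-4 classic model") echo "nc4c" ;;
        *)                        echo "unknown" ;;
    esac
}

# Report the kind of every file passed as an argument, if ncdump is installed.
if command -v ncdump >/dev/null 2>&1; then
    for f in "$@"; do
        [ -f "$f" ] || continue
        kind=$(ncdump -k "$f")
        echo "$f: $kind (cdo -f $(cdo_filetype_for "$kind"))"
    done
fi
```

To test whether the format really is the cause, one experiment would be to make both tools write the same format, for example cdo -f nc4 splityear on the CDO side or ncks -6 (64-bit offset) on the NCO side, and compare timings. Running ncdump -hs on a file also shows chunking and compression settings, which may be worth comparing; nothing in this thread confirms that they are the cause.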

This has major time implications for the next step of using cdo timsum. When running timsum on the cdo splityear output (64-bit offset), the time is trivial (~ 3 seconds for each one year file). When running the same command on the ncks output (netCDF4), it takes 12 minutes.

In other words, if I use the fast option (ncks) for breaking into one year chunks, cdo timsum is very slow in processing the output. If, on the other hand, I use the slow method to split the years (cdo splityear), cdo timsum is fast in processing the output.

Why the time difference in these different processes? And is there any way to avoid having one of these steps be so slow?

For reference, here is the output of cdo -V:
Climate Data Operators version 1.9.2 (http://mpimet.mpg.de/cdo)
Compiled: by kristinadahl on kristinas-imac.lan (x86_64-apple-darwin14.5.0) Nov 19 2018 11:57:11
CXX Compiler: clang++ -std=gnu++11 -g -O2 -D_THREAD_SAFE -pthread
CXX version : Apple LLVM version 7.0.2 (clang-700.1.81)
C Compiler: clang -g -O2 -D_THREAD_SAFE -pthread
C version : Apple LLVM version 7.0.2 (clang-700.1.81)
F77 Compiler: gfortran -g -O2
F77 version : GNU Fortran (Homebrew GCC 8.2.0) 8.2.0
Features: 32GB Fortran DATA PTHREADS HDF5 NC4/HDF5 OPeNDAP SZ AVX2
Libraries: HDF5/1.10.4
Filetypes: srv ext ieg grb1 grb2 nc1 nc2 nc4 nc4c nc5
CDI library version : 1.9.2 of Nov 19 2018 11:56:47
CGRIBEX library version : 1.9.0 of Sep 29 2017 10:16:02
GRIB_API library version : 1.27.0
NetCDF library version : 4.6.1 of Nov 19 2018 11:53:52 $
HDF5 library version : 1.10.4
SERVICE library version : 1.4.0 of Nov 19 2018 11:56:44
EXTRA library version : 1.4.0 of Nov 19 2018 11:56:43
IEG library version : 1.4.0 of Nov 19 2018 11:56:44
FILE library version : 1.8.3 of Nov 19 2018 11:56:43

I am running on Mac OS X.

Thanks in advance for any insights!


Replies (8)

RE: CDO splityear vs NCO ncks - Added by Ralf Mueller almost 6 years ago

hi!

The times (5 min, 1 hour) do not seem reasonable to me. timsum and gec are rather simple operations that do not take much memory. If these 5 minutes are due to I/O, it means 300 sec * 80 MB/sec = 24 GB. Is this your file size for a single year?
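
The back-of-the-envelope estimate above can be written as a tiny helper. The function name implied_gb is made up for illustration, 80 MB/sec is the read rate assumed in the post, and 1 GB is taken as 1000 MB.

```shell
#!/bin/sh
# Hypothetical helper: data volume (in GB, 1 GB = 1000 MB) that would be read
# in $1 seconds at an assumed sustained rate of $2 MB/s.
implied_gb() {
    echo $(( $1 * $2 / 1000 ))
}

# 5 minutes at the assumed 80 MB/s
echo "$(implied_gb 300 80) GB"
```

If the real file is much smaller than this figure, plain disk throughput is unlikely to explain the run time.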

cheers
ralf

RE: CDO splityear vs NCO ncks - Added by Kristy Dahl almost 6 years ago

Thanks for your reply, Ralf.

The file size for a single year is about 1 GB if I produce the file with ncks and about 2 GB if I produce it with cdo splityear. The filesize for the full five year file is about 5 GB.

Kristy

RE: CDO splityear vs NCO ncks - Added by Ralf Mueller almost 6 years ago

Processing 5 GB in 1 hour sounds crazy to me - this is an I/O equivalent of about 1.4 MB/sec. You seem to have 32 GB RAM, so you could move the input to /tmp and rerun it. This should let the I/O cost drop to zero.
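
A sketch of how the effective throughput and the /tmp rerun could be checked. The helpers mb_per_sec and timed are made-up names, the file sizes are the approximate ones reported in this thread (1 GB taken as 1000 MB), and infile.nc is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: effective throughput in MB/s for $1 MB handled in $2 seconds.
mb_per_sec() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# Hypothetical helper: run a command and report wall-clock seconds on stderr.
timed() {
    t0=$(date +%s)
    "$@"
    status=$?
    t1=$(date +%s)
    echo "elapsed: $(( t1 - t0 )) s" >&2
    return "$status"
}

echo "five-year file: $(mb_per_sec 5000 3600) MB/s"   # about 5 GB in 1 hour
echo "one-year file:  $(mb_per_sec 1000 300) MB/s"    # about 1 GB in 5 minutes

# Rerun from /tmp only if cdo and the (placeholder) input file are present.
if command -v cdo >/dev/null 2>&1 && [ -f infile.nc ]; then
    cp infile.nc /tmp/infile.nc
    timed cdo -f nc -timsum -gec,105 /tmp/infile.nc /tmp/outfile.nc
fi
```

If the run from /tmp is just as slow, the time is probably going into something other than reading from disk.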

RE: CDO splityear vs NCO ncks - Added by Kristy Dahl almost 6 years ago

I moved the 1 GB nc file created with ncks to /tmp and reran timsum, but it still takes more than 10 minutes.

As you say, and as evidenced by the short processing time (3-4 seconds) for timsum when I run it on a one year file created with cdo splityear, these long processing times are crazy.

So maybe my question needs to be whether there’s some reason why cdo splityear is so slow compared to ncks and why the output gets processed so differently depending on how the one year files were created.

RE: CDO splityear vs NCO ncks - Added by Ralf Mueller almost 6 years ago

You could try your cdo calls on another machine, or upload the data to the FTP server and I will give it a try.

RE: CDO splityear vs NCO ncks - Added by Kristy Dahl almost 6 years ago

Thanks for the suggestions, Ralf. I'll try to make some progress today, and if I'm still running into problems I'll let you know and upload the data to the ftp server.

And yes, that was me on GitHub asking about bindings. I'm trying to get up to speed quickly for a project I had to take on unexpectedly, so I'm learning on many fronts. I've resolved the need for bindings by just calling .sh scripts from Python scripts, which seems to be working just fine. (The issues I've described in this thread occur even when running on the command line, so I think they are unrelated to the bindings.)

RE: CDO splityear vs NCO ncks - Added by Ralf Mueller almost 6 years ago

OK, thanks Kristy - I didn't mean to put extra pressure on you. I just saw the chance for some progress on that GitHub ticket. If you are OK with it, I will close it then.

best wishes
