CDO on speed
Added by Ralf Mueller over 6 years ago
hi CDO-ninjas!
This is a little bit off topic, because it's not directly about CDO operators but more on how to speed things up in general.
Even with the right algorithm, implemented efficiently, it is very often the IO part that makes things slow. How to cope with that strongly depends on
- how IO-intensive your workflow is
- what the most costly parts are in terms of computation
- the total number of input files
- the layout of your input files:
  - number of variables
  - number of timesteps
  - number of grids
  - gridsize
Here is a list of recommendations that might help on one occasion or another. Depending on how you make use of them, they can put heavy (and I mean heavy) load on any machine. Login nodes or any other resource you share with other users are not the right place to run this in production. If you do it anyway, your soon-to-be-not-so-favourite admin will give you a call - yes, I speak from personal experience. You have been warned!
- Use `-P <thread-count>` for everything related to horizontal interpolation and EOFs. Choose a number that fits the number of threads that can run in parallel on your machine. It does no harm if you choose too many - CDO will take the maximum number then. But do not run such calls in parallel - this will result in a slow-down, because the threads will interfere with each other. It works for `remap*` and `gen*` operators in the same way. This is a list of all operators that support OpenMP parallelization. See the sketch right after this item.
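A minimal sketch of such a call, assuming a hypothetical input file `infile.nc` and a regular 1x1 degree target grid:

```
# bilinear interpolation with 8 OpenMP threads
cdo -P 8 remapbil,r360x180 infile.nc remapped.nc

# the same flag speeds up pre-computing interpolation weights,
# which can then be reused for many files on the same source grid
cdo -P 8 genbil,r360x180 infile.nc weights.nc
cdo remap,r360x180,weights.nc infile.nc remapped.nc
```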
- Use process-based parallelization if possible: in case your workflow has a chunk of CDO calls to be executed on a rather large number of input files, you can parallelize this with a tool like GNU parallel. It reads text input from a file or from stdin and executes each line of it in parallel with a given number of processes (`-j <N>`, like `make`). So the recipe is to put your chunk of calls on a single line (separated with `;`), loop over the input files and pipe this into `parallel`:

```
for file in *.nc; do echo "<long list of CDO commands and all the other tools you might need to run on ${file}>"; done | parallel -v -j 12
```

This technique can be used for all the stuff you can do on the command line, e.g. generating 1000 plots for a movie. A more concrete sketch follows below.
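To make this concrete, a sketch with a made-up operator chain (a yearly mean followed by a field mean; file names are placeholders):

```
# run 8 two-step pipelines at a time, one per input file
for file in *.nc; do
  echo "cdo yearmean ${file} year_${file}; cdo fldmean year_${file} fld_${file}"
done | parallel -v -j 8
```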
- Scripting languages like python or ruby offer very similar functionality as part of their standard libraries: `multiprocessing` for python, `parallel` for ruby. The following is an example extracted from the DYAMOND Hackathon at MPI:

```
import glob
from multiprocessing import Pool
from cdo import Cdo  # python bindings for CDO

cdo = Cdo()
wrk_dir = './'  # adjust: target directory for the final output

def cdozonmean(infile):
    print('processing ' + infile)
    # no output= given, so the result lands in a temporary file
    ofile = cdo.zonmean(input=infile)
    return ofile

ntasks = 4
nicam_path = '/work/ka1081/DYAMOND/NICAM-7km/'
files = sorted([s for s in glob.glob(nicam_path + '*/sa_tppn.nc')])[0:4]
print(files)

pool = Pool(ntasks)
results = dict()
for file in files:
    print(file)
    ofile = pool.apply_async(cdozonmean, (file,))
    results[file] = ofile
pool.close()
pool.join()

# retrieve results, keeping the order of the input files for the output files,
# and cat everything into a single file
for k, v in results.items():
    results[k] = v.get()

cdo.cat(input=' '.join([results[x] for x in files]), output=wrk_dir + 'test.nc')
```
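Note that `apply_async` may finish the jobs in any order; keying the `results` dict by input file and joining the paths in the order of `files` is what keeps the final `cat` in input order. Since the workers pass no `output=` argument, the python bindings return the path of a temporary output file, which is why `cdozonmean` can simply return `ofile`.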
- In order to reduce IO for each CDO call, it might be useful to split input files along different dimensions (variables, timesteps, grids, ...) and loop over them with the tools mentioned above. CDO offers a long list of operators for that purpose, please check it with `cdo -h split`. Always make sure IO is as cheap as possible, but not cheaper. A small sketch follows below.
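A small sketch of that pattern, assuming a hypothetical `input.nc` and combining `splitname` with the `parallel` recipe from above:

```
# split input.nc into one file per variable: var_<name>.nc
cdo splitname input.nc var_

# process the per-variable files in parallel
for file in var_*.nc; do
  echo "cdo timmean ${file} mean_${file}"
done | parallel -j 8
```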
- Write intermediate output to fast IO buffers like `/tmp` or better `/dev/shm`. Both directories are usually mapped into RAM, so IO cost is negligible. But space is very limited there. Keep track of what you do and (re)move things as soon as possible.
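For example (operators and file names are again made up), keeping only the intermediate result in RAM:

```
# RAM-backed scratch directory for intermediates
tmp=/dev/shm/$USER
mkdir -p $tmp

cdo yearmean input.nc $tmp/yearmean.nc   # intermediate stays in RAM
cdo fldmean $tmp/yearmean.nc result.nc   # final result on disk

rm -rf $tmp   # free the RAM buffer as soon as possible
```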
Replies (2)
RE: CDO on speed - Added by Ralf Mueller over 2 years ago
thx - it's a bit off topic, but the potential benefit is worth it, I hope.