most efficient operator ordering

Added by Don Murray about 3 years ago

Hi-

I have a website that uses CDO in the background to process data from user requests. As climate simulations increase in resolution and number of timesteps, processing the data takes longer. In the past I've tuned the operator ordering based only on experience, but I'm wondering whether that ordering is actually efficient. I've had problems running the commands with multiple threads, so I have to use the -L flag. Here's an example request on a 200-year monthly simulation that composites the data over a range of months and years for a given region:

cdo -L -s -O -sellonlatbox,-130.0,-100.0,30.0,50.0 -selname,pr -timselmean,4 -selyear,2001/2020 -selmon,6/9 pr_GFDL-SPEAR-MED_LENS_ens03.nc foo.nc

Any advice on whether the current ordering/chaining of commands is the most efficient would be appreciated.

Don Murray
NOAA/PSL


Replies (2)

RE: most efficient operator ordering - Added by Karin Meier-Fleischer about 3 years ago

Hi Don,

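You have to read the operators from right to left: the rightmost operator, next to the input file, is applied first. That is why I would use a different order (without knowing your data, this is a guess).
Assuming the input file contains multiple variables on a high-resolution grid, I would select the variable and the region first so that the later operators handle less data: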

cdo -L -s -O -timselmean,4 -selmon,6/9 -selyear,2001/2020 -sellonlatbox,-130.0,-100.0,30.0,50.0 -selname,pr pr_GFDL-SPEAR-MED_LENS_ens03.nc foo.nc

-Karin
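To make the right-to-left rule concrete, here is a small Python sketch that lists the operators of a chain in the order CDO applies them. It is a simplified string parser written for this thread, not part of CDO: it assumes each operator is written as `-name,args` and that the only global options are the `-L`, `-s` and `-O` flags used above.

```python
import shlex

# Global options used in this thread; a simplifying assumption, not CDO's full option list.
GLOBAL_OPTIONS = {"-L", "-s", "-O"}


def execution_order(command):
    """Return the operator names of a CDO chain in the order they are applied."""
    tokens = shlex.split(command)[1:]  # drop the leading "cdo"
    operators = [
        tok[1:].split(",")[0]  # "-selyear,2001/2020" -> "selyear"
        for tok in tokens
        if tok.startswith("-") and tok not in GLOBAL_OPTIONS
    ]
    # The rightmost operator (next to the input file) runs first.
    return operators[::-1]


don = ("cdo -L -s -O -sellonlatbox,-130.0,-100.0,30.0,50.0 -selname,pr "
       "-timselmean,4 -selyear,2001/2020 -selmon,6/9 "
       "pr_GFDL-SPEAR-MED_LENS_ens03.nc foo.nc")
karin = ("cdo -L -s -O -timselmean,4 -selmon,6/9 -selyear,2001/2020 "
         "-sellonlatbox,-130.0,-100.0,30.0,50.0 -selname,pr "
         "pr_GFDL-SPEAR-MED_LENS_ens03.nc foo.nc")

print(execution_order(don))    # ['selmon', 'selyear', 'timselmean', 'selname', 'sellonlatbox']
print(execution_order(karin))  # ['selname', 'sellonlatbox', 'selyear', 'selmon', 'timselmean']
```

In the original request the time selections run first and the region is cut last; the suggested order reverses that.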

RE: most efficient operator ordering - Added by Don Murray almost 3 years ago

Hi Karin-

Thanks for the reply. My files are global climate simulations with one variable per file. I only need selname because some files also contain metadata variables that would otherwise cause errors in the region subset. Files contain anywhere from 40 to 200 years of data, so my thinking was that selecting the time range first (selmon and selyear) would leave less data for the later operators, rather than passing all 200 years through and then subsetting by time and region. The use cases range from a composite over a short period (10-20 years) to a time series over the entire record. Grid resolutions generally vary from 0.25 degree to 2 degrees. I do agree that timselmean could probably be the last (leftmost) operator.
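This argument can be checked with a back-of-the-envelope count. The Python sketch below is a rough model written for this thread, not a description of CDO internals: it assumes one variable per monthly file on a regular global grid, that every operator touches every value passed to it, and it uses the box, years and months from the example request. It counts how many grid values each operator receives under the two orderings.

```python
# Rough model of how many grid values each operator in a CDO chain receives.
# Illustrative assumptions, not measurements: one variable per file, monthly
# time steps, a regular global lon/lat grid, and every operator touching every
# value passed to it. Disk I/O, compression and record skipping are ignored.

BOX = (-130.0, -100.0, 30.0, 50.0)  # lon1, lon2, lat1, lat2 from the example request


def cells(res_deg, lon_span, lat_span):
    """Number of grid cells covering a lon/lat span at res_deg resolution."""
    return round(lon_span / res_deg) * round(lat_span / res_deg)


def values_read(order, n_years, res_deg, sel_years=20, sel_months=4, mean_n=4):
    """Values each operator reads; `order` lists operators in execution order."""
    years, months, divisor = n_years, 12, 1
    ncells = cells(res_deg, 360.0, 180.0)
    reads = {}
    for op in order:
        reads[op] = years * months // divisor * ncells
        if op == "selyear":
            years = sel_years
        elif op == "selmon":
            months = sel_months
        elif op == "timselmean":
            divisor = mean_n
        elif op == "sellonlatbox":
            ncells = cells(res_deg, BOX[1] - BOX[0], BOX[3] - BOX[2])
        # "selname" removes nothing here because the file holds one variable.
    return reads


don = ["selmon", "selyear", "timselmean", "selname", "sellonlatbox"]
karin = ["selname", "sellonlatbox", "selyear", "selmon", "timselmean"]

for name, order in (("Don", don), ("Karin", karin)):
    total = sum(values_read(order, n_years=200, res_deg=0.25).values())
    print(name, total)
# Don 3442176000
# Karin 5002752000
```

Under these assumptions the time-first order moves fewer values for a short composite from a single-variable file, mainly because the second operator sees 800 rather than 2400 global fields. The balance shifts when the time selections keep most of the record (the full time-series case) or when a file holds several variables, where selecting the variable and region first shrinks the data earliest. Only timings on the real files can settle it.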

I'll do some timing tests with your ordering.

Don
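For the timing tests, a minimal Python harness could look like the sketch below. `time_command` is a hypothetical helper, not part of CDO; the two chains are copied from the commands in this thread, and the input and output file names are the ones from the example request. It only calls `cdo` if the executable and the input file are present.

```python
import os
import shutil
import subprocess
import sys
import time


def time_command(args, repeats=3):
    """Run a command `repeats` times; return the fastest wall-clock time in seconds.

    The minimum is less sensitive to background load than the mean.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(args, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    infile = "pr_GFDL-SPEAR-MED_LENS_ens03.nc"  # file name from the example request
    box = "-sellonlatbox,-130.0,-100.0,30.0,50.0"
    orders = {
        "time first": ["-L", "-s", "-O", box, "-selname,pr", "-timselmean,4",
                       "-selyear,2001/2020", "-selmon,6/9"],
        "region first": ["-L", "-s", "-O", "-timselmean,4", "-selmon,6/9",
                         "-selyear,2001/2020", box, "-selname,pr"],
    }
    if shutil.which("cdo") and os.path.exists(infile):
        for label, chain in orders.items():
            seconds = time_command(["cdo", *chain, infile, "foo.nc"])
            print(f"{label}: {seconds:.2f} s")
    else:
        print("cdo or the input file was not found; skipping the timing run",
              file=sys.stderr)
```

Runs after the first may read the input from the operating system's file cache, so it is worth repeating the comparison on both a 0.25 degree and a 2 degree file, and for both the short composite and the full time-series requests.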
