timesel*** vs nco
Added by Antonio Rodriges about 8 years ago
Hello,
I am interested in which optimizations CDO applies to run about 9x faster than NCO:
cdo timselavg,1460 uwnd.10m.gauss.1979.nc u1979.nc
// shape is 1460 x 94 x 192
takes 0.5 sec
while
ncwa -D 4 -O -x -v time --thr_nbr=1 -a time uwnd.10m.gauss.1979.nc n1979.nc
takes 4.48 sec
Disclaimer: I am not a developer of either NCO or CDO; I use both of them for my personal projects.
Thanks
Replies (5)
RE: timesel*** vs nco - Added by Antonio Rodriges about 8 years ago
P.S.
I just noticed that CDO produces short values
short uwnd(time,level,lat,lon) ;
uwnd:standard_name = "eastward_wind" ;
uwnd:long_name = "6-Hourly Forecast of U-wind at 10 m" ;
uwnd:units = "m/s" ;
uwnd:grid_type = "gaussian" ;
uwnd:add_offset = 207.65f ;
uwnd:scale_factor = 0.01f ;
while NCO produces float values
float uwnd(level,lat,lon) ;
uwnd:long_name = "6-Hourly Forecast of U-wind at 10 m" ;
uwnd:valid_range = -32765s, -8765s ;
uwnd:unpacked_valid_range = -120.f, 120.f ;
uwnd:actual_range = -38.15f, 46.84f ;
uwnd:units = "m/s" ;
Additional question: does CDO take the average over the packed (short) values? If yes, what about the precision?
Thanks again
RE: timesel*** vs nco - Added by Uwe Schulzweida about 8 years ago
All CDO operators use as little memory as possible, and this is the reason why the time mean in CDO is much faster. It seems that NCO stores all timesteps of the array in memory. CDO reads the array timestep by timestep and accumulates the values immediately.
Memory requirement:
CDO: 2 x 94 x 192 x 8 = 288768 bytes
NCO: 1461 x 94 x 192 x 8 = 210945024 bytes
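The timestep-by-timestep approach described above can be sketched roughly as follows. This is only an illustration, not CDO's actual source: the function names (accumulate, finalize_mean, streaming_bytes) are hypothetical, and the file reading that would fill the per-timestep buffer is left out. Only two field-sized buffers of doubles are needed, regardless of the number of timesteps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add one timestep (len values) to the running sum. */
void accumulate(double *restrict sum, const double *restrict step, size_t len)
{
    for (size_t i = 0; i < len; i++)
        sum[i] += step[i];
}

/* Hypothetical helper: convert the running sum into the mean over nsteps. */
void finalize_mean(double *sum, size_t len, size_t nsteps)
{
    assert(nsteps > 0);
    for (size_t i = 0; i < len; i++)
        sum[i] /= (double)nsteps;
}

/* Memory for the streaming approach: one sum buffer plus one input buffer. */
size_t streaming_bytes(size_t nlat, size_t nlon)
{
    return 2 * nlat * nlon * sizeof(double);
}
```

A caller would zero the sum buffer, read each timestep into the second buffer, call accumulate() once per timestep, and call finalize_mean() at the end. With 8-byte doubles, streaming_bytes(94, 192) gives the 288768 bytes quoted above.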
The format and data type of the CDO output are derived from the input. That means short values on input produce short values on output. Use the option "-b F32" to write 32-bit floats. Internally, all operations use 64-bit floats. You can also replace "timselavg,1460" with "timavg":
cdo -b F32 timavg uwnd.10m.gauss.1979.nc u1979.nc
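To illustrate the precision question, here is a minimal sketch of averaging packed shorts in 64-bit floats, using the usual netCDF packing convention (unpacked = packed * scale_factor + add_offset). The function names are hypothetical and the rounding in pack() is an assumption for illustration, not taken from the CDO source.

```c
#include <assert.h>
#include <stddef.h>

/* Unpack one value following the netCDF scale_factor/add_offset convention. */
double unpack(short packed, double scale_factor, double add_offset)
{
    return (double)packed * scale_factor + add_offset;
}

/* Pack a value back to short, rounding to nearest (assumed rounding rule). */
short pack(double value, double scale_factor, double add_offset)
{
    double x = (value - add_offset) / scale_factor;
    return (short)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Average n packed values; the accumulation is done on unpacked doubles. */
double mean_unpacked(const short *packed, size_t n,
                     double scale_factor, double add_offset)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += unpack(packed[i], scale_factor, add_offset);
    return sum / (double)n;
}
```

With scale_factor = 0.01 and add_offset = 207.65 as in the file above, a mean that is written back as a short can only be represented in steps of 0.01 m/s, which is why writing floats with "-b F32" keeps more of the computed precision.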
Cheers,
Uwe
RE: timesel*** vs nco - Added by Antonio Rodriges about 8 years ago
Thanks!
Really interesting insights!
Performance difference is really impressive!
Does CDO exploit any SSE or OpenMP optimizations in this case (e.g. for averaging)?
Also, is it possible to specify a dimension index like in NCO instead of explicitly "time", "lon", etc.?
Thanks
RE: timesel*** vs nco - Added by Uwe Schulzweida about 8 years ago
No, it's not possible to specify a dimension index. That's another difference from NCO: CDO has very specific operators, each for one task:
timavg: for time average
zonavg: for zonal average
fldavg: for field average
The loops are very simple, so automatic SIMD vectorization shouldn't be a problem. Here is the loop for the accumulation:
double *restrict array1;
const double *restrict array2;
for ( int i = 0; i < len; i++ ) array1[i] += array2[i];
Gaining anything from OpenMP parallelization seems to be very difficult for the time average. I have spent a lot of time trying to improve the performance of this operation with OpenMP, without any success. The main reason is that this task is highly I/O bound and can't be parallelized with serial access to the file.
RE: timesel*** vs nco - Added by Antonio Rodriges about 8 years ago
Thanks for the information! Very interesting!