operator collgrid speed

Added by Paolo Davini over 7 years ago

Hi all,

I am trying to use the operator collgrid to gather LES model data run on several processors (i.e. 128 files resulting from a grid partition).
Since the total size of the files is large (several GBs), the operation is very slow: on a full dataset it can take hours.
On simplified/smaller files it takes about 9 s, while an NCL script of mine performs a similar job in about 1.5 s.
I would like to unify the postprocessing routine and shift all the code to CDO, but this bottleneck is a bit of an issue.

Therefore, I am wondering if there is any known "best practice" to improve the computational speed (file format, order of the files, loops over the variables); anything that can provide a speed-up would help.
I could use parallelization, since this is a batch job with several cores, but as far as I understand collgrid does not exploit it directly, being a serial operation (like a cat or a merge).

Any suggestion is greatly appreciated. I can provide sample files (though there are 128 of them) and the equivalent NCL script for comparison.
Many thanks!
Cheers
Paolo


Replies (7)

RE: operator collgrid speed - Added by Uwe Schulzweida over 7 years ago

Hi Paolo,

Your sample files would help us a lot to analyze the performance.

Cheers,
Uwe

RE: operator collgrid speed - Added by Paolo Davini over 7 years ago

Thanks Uwe,

actually the files are pretty heavy, so I am trying to create a smaller portable reproducible example.
In the meantime I have a side question on collgrid.

How can I specify both nx and varname? I tried

cdo collgrid,8,varname filein*.nc fileout.nc
cdo collgrid (Abort): Variable name 8 not found!

I guess it is a trivial error, but what is the correct syntax? Each argument works when used alone...
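While waiting for the correct syntax, one possible workaround is to pre-select the variable from each tile with selname and then call collgrid with only the nx argument. This is an untested sketch: the variable name zi1_bar and the file names are illustrative, and setting CDO=echo turns it into a dry run.

```shell
# Untested workaround sketch: pre-select one variable from every tile with
# selname, then call collgrid with only the nx argument. The variable name
# and file names below are illustrative.
CDO=${CDO:-cdo}    # set CDO=echo for a dry run

collgrid_var() {
    var=$1; nx=$2; shift 2        # remaining arguments: the input tiles
    for f in "$@"; do
        $CDO selname,"$var" "$f" "sel_$(basename "$f")"
    done
    # assumes no stale sel_*.nc files are left over from a previous run
    $CDO collgrid,"$nx" sel_*.nc "collected_${var}.nc"
}
```

Usage would be e.g. `collgrid_var zi1_bar 8 filein*.nc`.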
Thanks
Paolo

RE: operator collgrid speed - Added by Uwe Schulzweida over 7 years ago

Thanks for the information, this is definitely a bug. The number of regions in the x direction is only needed for non-regular lon/lat grids. The bug will be fixed in the next release.

RE: operator collgrid speed - Added by Paolo Davini over 7 years ago

Good to know, thanks! This is exactly my case: I am using a generic grid (it is an LES grid).

Here is another issue that feels like a minor bug. I am running CDO 1.8.0.
Some of the variables produced by my simulations are 1D, i.e. the output is one value per timestep per processor (i.e. per file).

cdo griddes dycoms2_rf01_ttt2.ts.00000000.nc

#
# gridID 1
#
gridtype  = generic
gridsize  = 1
cdo griddes: Processed 67 variables ( 0.00s )

When I perform the collgrid I get a warning:

cdo collgrid dycoms2_rf01_ttt2.nc cdo_file.ts.nc
Warning (Collgrid) : Allocation of 0 bytes! [ line 487 file Collgrid.c ]

Then, if I analyze one of the variables in the result, something strange has happened when I compare my CDO file to the file rebuilt with my NCL script (which I know is correct!).

cdo info -selname,zi1_bar cdo_file.nc 
cdo info: Started child process "selname,zi1_bar prova.nc (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter ID
     1 : 2000-00-00 00:00:00       0        1       0 :                  836.18             : -3            
     2 : 1999-11-30 00:02:00       0        1       0 :                  836.18             : -3            
     3 : 1999-11-30 00:04:00       0        1       0 :                  836.18             : -3            
     4 : 1999-11-30 00:06:00       0        1       0 :                  836.18             : -3            
     5 : 1999-11-30 00:08:00       0        1       0 :                  836.18             : -3            
     6 : 1999-11-30 00:10:00       0        1       0 :                  836.18             : -3            
     7 : 1999-11-30 00:12:00       0        1       0 :                  836.19             : -3            
     8 : 1999-11-30 00:14:00       0        1       0 :                  837.28             : -3            
     9 : 1999-11-30 00:16:00       0        1       0 :                  839.32             : -3            
    10 : 1999-11-30 00:18:00       0        1       0 :                  840.39             : -3            
    11 : 1999-11-30 00:20:00       0        1       0 :                  841.36             : -3            
    12 : 1999-11-30 00:22:00       0        1       0 :                  842.11             : -3            
    13 : 1999-11-30 00:24:00       0        1       0 :                  842.54             : -3            
    14 : 1999-11-30 00:26:00       0        1       0 :                  842.72             : -3            
    15 : 1999-11-30 00:28:00       0        1       0 :                  843.40             : -3            
    16 : 1999-11-30 00:30:00       0        1       0 :                  843.85             : -3            

whereas the original file reads:

cdo info -selname,zi1_bar dycoms2_rf01_ttt2.ts.nc 
cdo info: Started child process "selname,zi1_bar dycoms2_rf01_ttt2.ts.nc (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter ID
     1 : 2000-00-00 00:00:00       0        1       0 :                  836.18             : -3            
     2 : 1999-11-30 00:02:00       0        1       0 :                  836.18             : -3            
     3 : 1999-11-30 00:04:00       0        1       0 :                  836.18             : -3            
     4 : 1999-11-30 00:06:00       0        1       0 :                  836.18             : -3            
     5 : 1999-11-30 00:08:00       0        1       0 :                  836.18             : -3            
     6 : 1999-11-30 00:10:00       0        1       0 :                  836.18             : -3            
     7 : 1999-11-30 00:12:00       0        1       0 :                  836.19             : -3            
     8 : 1999-11-30 00:14:00       0        1       0 :                  837.23             : -3            
     9 : 1999-11-30 00:16:00       0        1       0 :                  839.07             : -3            
    10 : 1999-11-30 00:18:00       0        1       0 :                  840.22             : -3            
    11 : 1999-11-30 00:20:00       0        1       0 :                  841.15             : -3            
    12 : 1999-11-30 00:22:00       0        1       0 :                  842.02             : -3            
    13 : 1999-11-30 00:24:00       0        1       0 :                  842.86             : -3            
    14 : 1999-11-30 00:26:00       0        1       0 :                  843.27             : -3            
    15 : 1999-11-30 00:28:00       0        1       0 :                  843.74             : -3            
    16 : 1999-11-30 00:30:00       0        1       0 :                  843.99             : -3      

Actually, if I merge the files and then run a vertmean command I get the same result as the NCL script.
I suspect that collgrid applies some kind of area weighting that should not be done.
I attached the test files and the NCL script (although I am not sure the latter will work on other machines).

Also, here is a first overview of the speed problem I mentioned at the beginning of this post:

time cdo collgrid dycoms2_rf01_ttt2.ts.000*.nc prova.nc
Warning (Collgrid) : Allocation of 0 bytes! [ line 487 file Collgrid.c ]
cdo collgrid: Processed 8576 values from 536 variables over 128 timesteps ( 0.08s )

*real    0m2.483s*
user    0m0.057s
sys    0m0.032s

while

time ncl reduce_dycoms2_rf01_ttt2.ncl 
 Copyright (C) 1995-2015 - All Rights Reserved
 University Corporation for Atmospheric Research
 NCAR Command Language Version 6.3.0
 The use of this software is governed by a License Agreement.
 See http://www.ncl.ucar.edu/ for more details.
(0)    cp dycoms2_rf01_ttt2.ts.00000000.nc dycoms2_rf01_ttt2.ts.nc
(0)    processing dycoms2_rf01_ttt2.ts.00000001.nc
(0)    processing dycoms2_rf01_ttt2.ts.00000002.nc
(0)    processing dycoms2_rf01_ttt2.ts.00000003.nc
(0)    processing dycoms2_rf01_ttt2.ts.00010000.nc
(0)    processing dycoms2_rf01_ttt2.ts.00010001.nc
(0)    processing dycoms2_rf01_ttt2.ts.00010002.nc
(0)    processing dycoms2_rf01_ttt2.ts.00010003.nc
(0)    final processing of ts files
(0)    xx 8, 0
(1)    xx 8, 960
(2)    xx 8, 1920
(3)    xx 8, 2880
(4)    xx 8, 3840
(5)    xx 8, 4800
(6)    xx 8, 5760
(7)    xx 8, 6720
(8)    xx 8, 7680
(9)    xx 8, 8640
(10)    xx 8, 9600
(11)    xx 8, 10560
(12)    xx 8, 11520
(13)    xx 8, 12480
(14)    xx 8, 13440
(15)    xx 8, 14400

*real    0m0.450s*
user    0m0.321s
sys    0m0.112s

I hope my explanation was clear enough, please ask me if you need further details!
Many thanks for any help,
Cheers
Paolo

RE: operator collgrid speed - Added by Uwe Schulzweida over 7 years ago

Thanks for the testfiles and the ncl script!
Files with 1 value per timestep and without grid coordinates were not supported by collgrid, so what you see in the result is only a copy of the first input file. The next CDO release, 1.8.1, will support them! The above collgrid bugs are also fixed. You will find the new release in the download area.
Unfortunately I can't reproduce the performance problem: on our systems CDO collgrid is 3x faster than the NCL script.
It seems that your NCL script computes the mean over all input files. The corresponding CDO command is:

time cdo -O fldmean -collgrid 'dycoms2_rf01_ttt2.ts.0*.nc' result
An equivalent command is:
time cdo -O ensmean 'dycoms2_rf01_ttt2.ts.0*.nc' result
Cheers,
Uwe

RE: operator collgrid speed - Added by Paolo Davini over 7 years ago

Hi Uwe,

first of all, thanks a lot!
I installed and tested version 1.8.1, and the collgrid operator is much more efficient.
I still have the feeling that the NCL script is faster, especially with larger, real 3D files.
But now I am able to select which variable to collect, and this allows me to use parallelization, bumping up the performance.
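The per-variable parallelization could be sketched roughly as follows. This is an assumption about the setup, not Paolo's actual script: the variable names and file pattern are illustrative, and setting CDO=echo gives a dry run.

```shell
# Sketch: one background collgrid job per variable, then merge the results.
# Variable names and the file pattern are illustrative.
CDO=${CDO:-cdo}    # set CDO=echo for a dry run

collect_all() {
    pattern=$1; shift              # remaining arguments: the variable names
    for v in "$@"; do
        # one background collgrid job per variable (CDO expands the
        # quoted wildcard pattern itself)
        $CDO collgrid,"$v" "$pattern" "tmp_${v}.nc" &
    done
    wait                           # block until every collgrid job is done
    $CDO merge tmp_*.nc collected.nc
    rm -f tmp_*.nc
}
```

Usage would be e.g. `collect_all 'dycoms2_rf01_ttt2.ts.0*.nc' zi1_bar` (with as many variable names appended as needed).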

I still have one concern: I tried with a different set of files to see whether this speed problem is specific to my setup. Perhaps we are drifting away from the original problem, but I think it is worth exploring.
These are vertical profiles evolving in time, so they can be thought of as 2D variables f(z,t). We still have one value per core, so ensemble averaging - as you suggested - is definitely the best way to tackle the problem.

But here the difference in runtime is even more evident.

time cdo -O ensmean 'dycoms2_rf01_ttt2.ps.0*.nc' result.nc
real    0m10.822s
user    0m10.092s
sys    0m0.434s

and

time ncl reduce_ps.ncl 
real    0m1.082s
user    0m0.344s
sys    0m0.718s

Do you think there is any way to speed up the process? Or am I doing something wrong?
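One direction that might be worth trying is a two-stage ensmean: split the input files into equally sized groups, average each group in a background job, then average the partial results. This is only a sketch under the assumption that every group holds the same number of files (so the mean of the group means equals the full ensemble mean); the group size and file names are illustrative, leftover files when the count is not a multiple of the group size are not handled, and CDO=echo gives a dry run.

```shell
# Sketch: two-stage parallel ensmean. Correct only because every group has
# the same number of files, so the mean of the group means equals the
# overall ensemble mean. Group size and file names are illustrative.
CDO=${CDO:-cdo}   # set CDO=echo for a dry run

two_stage_ensmean() {
    out=$1; shift            # remaining arguments: the input files
    n=4                      # files per group; must divide the file count
    i=0; g=0; group=""
    for f in "$@"; do
        group="$group $f"; i=$((i+1))
        if [ "$i" -eq "$n" ]; then
            $CDO ensmean $group "part_${g}.nc" &   # one job per group
            g=$((g+1)); i=0; group=""
        fi
    done
    wait                     # all partial means done
    $CDO ensmean part_*.nc "$out"
}
```

Usage would be e.g. `two_stage_ensmean result.nc dycoms2_rf01_ttt2.ps.0*.nc`.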

Many thanks for any hints!
Cheers,
Paolo

RE: operator collgrid speed - Added by Ralf Mueller over 7 years ago

for large files, you could use our ftp server: ftp://ftp.zmaw.de/incoming/

username: anonymous
password: email

you might create a subdir for your files

hth
ralf
