operator collgrid speed
Added by Paolo Davini over 7 years ago
Hi all,
I am trying to use the operator collgrid to gather together LES model data run on several processors (i.e. 128 files resulting from a grid partition).
Given that the total size of the files is big (several GBs), the operation is very slow: on a full file it can take hours.
On a simplified, smaller set of files it takes about 9 seconds, while an NCL script that performs a similar job takes about 1.5 seconds.
I would like to unify the postprocessing routine and shift all the code to CDO, but this bottleneck is a bit of an issue.
Therefore, I am wondering if there is any known "best practice" to improve the computational speed (file format, order of the files, loops on the variables): anything that could provide a speed-up.
I can use parallelization, since it is a batch job with several cores, but as far as I understand it is not exploited directly by the collgrid operator, which is a serial operation (like a cat or a merge).
Any suggestion is greatly appreciated. I can provide sample files (although there are 128 of them) and the equivalent NCL script for comparison.
Many thanks!
Cheers
Paolo
Replies (7)
RE: operator collgrid speed - Added by Uwe Schulzweida over 7 years ago
Hi Paolo,
Your sample files would help us a lot to analyze the performance.
Cheers,
Uwe
RE: operator collgrid speed - Added by Paolo Davini over 7 years ago
Thanks Uwe,
actually the files are pretty heavy, so I am trying to create a smaller, portable, reproducible example.
In the meantime I have a side question on collgrid.
How can I specify both nx and varname? I tried
cdo collgrid,8,varname filein*.nc fileout.nc
cdo collgrid (Abort): Variable name 8 not found!
I guess it is a trivial error, but what is the correct syntax? Each of the two options works when used alone...
Thanks
Paolo
RE: operator collgrid speed - Added by Uwe Schulzweida over 7 years ago
Thanks for the information, this is definitely a bug. The number of regions in the x direction is only needed for non-regular lon/lat grids. The bug will be fixed in the next release.
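Once the fix is in, the combined call should presumably take the number of regions first and then a list of variable names, along these lines (file and variable names here are only placeholders):

cdo collgrid,8,myvar 'filein*.nc' fileout.nc           # nx = 8 plus one variable name
cdo collgrid,8,myvar1,myvar2 'filein*.nc' fileout.nc   # nx = 8 plus several variable names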
RE: operator collgrid speed - Added by Paolo Davini over 7 years ago
Good to know, thanks! This is exactly my case: I am using a generic grid (it is an LES grid).
Here is another issue that I feel is a minor bug. I am running CDO 1.8.0.
Some of the variables produced by my simulations are 1D, i.e. the output is one value per timestep per processor (i.e. per file).
cdo griddes dycoms2_rf01_ttt2.ts.00000000.nc
#
# gridID 1
#
gridtype  = generic
gridsize  = 1
cdo griddes: Processed 67 variables ( 0.00s )
When I perform the collgrid I get a warning:
cdo collgrid dycoms2_rf01_ttt2.nc cdo_file.ts.nc
Warning (Collgrid) : Allocation of 0 bytes! [ line 487 file Collgrid.c ]
Then, if I look at one of the variables in the result, something strange has happened: my CDO file differs from the file rebuilt with my NCL script (which I know is fine!).
cdo info -selname,zi1_bar cdo_file.nc
cdo info: Started child process "selname,zi1_bar prova.nc (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :  Minimum     Mean  Maximum : Parameter ID
     1 : 2000-00-00 00:00:00       0        1       0 :           836.18          : -3
     2 : 1999-11-30 00:02:00       0        1       0 :           836.18          : -3
     3 : 1999-11-30 00:04:00       0        1       0 :           836.18          : -3
     4 : 1999-11-30 00:06:00       0        1       0 :           836.18          : -3
     5 : 1999-11-30 00:08:00       0        1       0 :           836.18          : -3
     6 : 1999-11-30 00:10:00       0        1       0 :           836.18          : -3
     7 : 1999-11-30 00:12:00       0        1       0 :           836.19          : -3
     8 : 1999-11-30 00:14:00       0        1       0 :           837.28          : -3
     9 : 1999-11-30 00:16:00       0        1       0 :           839.32          : -3
    10 : 1999-11-30 00:18:00       0        1       0 :           840.39          : -3
    11 : 1999-11-30 00:20:00       0        1       0 :           841.36          : -3
    12 : 1999-11-30 00:22:00       0        1       0 :           842.11          : -3
    13 : 1999-11-30 00:24:00       0        1       0 :           842.54          : -3
    14 : 1999-11-30 00:26:00       0        1       0 :           842.72          : -3
    15 : 1999-11-30 00:28:00       0        1       0 :           843.40          : -3
    16 : 1999-11-30 00:30:00       0        1       0 :           843.85          : -3
whereas the original file reads:
cdo info -selname,zi1_bar dycoms2_rf01_ttt2.ts.nc
cdo info: Started child process "selname,zi1_bar dycoms2_rf01_ttt2.ts.nc (pipe1.1)".
    -1 :       Date     Time   Level Gridsize    Miss :  Minimum     Mean  Maximum : Parameter ID
     1 : 2000-00-00 00:00:00       0        1       0 :           836.18          : -3
     2 : 1999-11-30 00:02:00       0        1       0 :           836.18          : -3
     3 : 1999-11-30 00:04:00       0        1       0 :           836.18          : -3
     4 : 1999-11-30 00:06:00       0        1       0 :           836.18          : -3
     5 : 1999-11-30 00:08:00       0        1       0 :           836.18          : -3
     6 : 1999-11-30 00:10:00       0        1       0 :           836.18          : -3
     7 : 1999-11-30 00:12:00       0        1       0 :           836.19          : -3
     8 : 1999-11-30 00:14:00       0        1       0 :           837.23          : -3
     9 : 1999-11-30 00:16:00       0        1       0 :           839.07          : -3
    10 : 1999-11-30 00:18:00       0        1       0 :           840.22          : -3
    11 : 1999-11-30 00:20:00       0        1       0 :           841.15          : -3
    12 : 1999-11-30 00:22:00       0        1       0 :           842.02          : -3
    13 : 1999-11-30 00:24:00       0        1       0 :           842.86          : -3
    14 : 1999-11-30 00:26:00       0        1       0 :           843.27          : -3
    15 : 1999-11-30 00:28:00       0        1       0 :           843.74          : -3
    16 : 1999-11-30 00:30:00       0        1       0 :           843.99          : -3
Actually, if I merge the files and then run a vertmean command, I get the same result as the NCL script.
I feel that collgrid does some kind of area weighting that should not be done.
I attached the test files and the NCL script (although I am not sure the latter will work on other machines).
Also, this gives a first overview of the speed-related problem I mentioned at the beginning of this post:
time cdo collgrid dycoms2_rf01_ttt2.ts.000*.nc prova.nc
Warning (Collgrid) : Allocation of 0 bytes! [ line 487 file Collgrid.c ]
cdo collgrid: Processed 8576 values from 536 variables over 128 timesteps ( 0.08s )

real    0m2.483s
user    0m0.057s
sys     0m0.032s
while
time ncl reduce_dycoms2_rf01_ttt2.ncl
 Copyright (C) 1995-2015 - All Rights Reserved
 University Corporation for Atmospheric Research
 NCAR Command Language Version 6.3.0
 The use of this software is governed by a License Agreement.
 See http://www.ncl.ucar.edu/ for more details.
(0)     cp dycoms2_rf01_ttt2.ts.00000000.nc dycoms2_rf01_ttt2.ts.nc
(0)     processing dycoms2_rf01_ttt2.ts.00000001.nc
(0)     processing dycoms2_rf01_ttt2.ts.00000002.nc
(0)     processing dycoms2_rf01_ttt2.ts.00000003.nc
(0)     processing dycoms2_rf01_ttt2.ts.00010000.nc
(0)     processing dycoms2_rf01_ttt2.ts.00010001.nc
(0)     processing dycoms2_rf01_ttt2.ts.00010002.nc
(0)     processing dycoms2_rf01_ttt2.ts.00010003.nc
(0)     final processing of ts files
(0)     xx 8, 0
(1)     xx 8, 960
(2)     xx 8, 1920
(3)     xx 8, 2880
(4)     xx 8, 3840
(5)     xx 8, 4800
(6)     xx 8, 5760
(7)     xx 8, 6720
(8)     xx 8, 7680
(9)     xx 8, 8640
(10)    xx 8, 9600
(11)    xx 8, 10560
(12)    xx 8, 11520
(13)    xx 8, 12480
(14)    xx 8, 13440
(15)    xx 8, 14400

real    0m0.450s
user    0m0.321s
sys     0m0.112s
I hope my explanation was clear enough, please ask me if you need further details!
Many thanks for any help,
Cheers
Paolo
testfile_collgrid.tar.gz (10.2 KB) - files to merge
reduce_dycoms2_rf01_ttt2.ncl (3.4 KB) - NCL script
RE: operator collgrid speed - Added by Uwe Schulzweida over 7 years ago
Thanks for the test files and the NCL script!
Files with one value per timestep and without grid coordinates were not supported by collgrid, so what you see in the result is only a copy of the first input file. The next CDO release, 1.8.1, will support them! The collgrid bugs mentioned above are also fixed. You will find the new release in the download area.
Unfortunately I can't reproduce the performance problem. On our systems CDO collgrid is 3x faster than the NCL script.
It seems that your NCL script is computing the mean over all input files. The corresponding CDO command is:
time cdo -O fldmean -collgrid 'dycoms2_rf01_ttt2.ts.0*.nc' result

An equivalent command is:
time cdo -O ensmean 'dycoms2_rf01_ttt2.ts.0*.nc' result

Cheers,
Uwe
RE: operator collgrid speed - Added by Paolo Davini over 7 years ago
Hi Uwe,
first of all, thanks a lot!
I installed and tested version 1.8.1 and the collgrid operator is much more efficient.
I still have the feeling that the NCL script is faster, especially with larger, real 3D files.
But now I am able to select which variable to collect, and this allows me to use parallelization, bumping up the performance (see the sketch below).
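Just to illustrate what I mean, here is a rough sketch of what I am doing, assuming the fixed collgrid syntax of 1.8.1 (the variable names and the region count are only examples from my setup):

# one collgrid job per variable in the background, then merge the pieces
for var in zi1_bar cfl maxdiv; do
    cdo collgrid,8,${var} 'dycoms2_rf01_ttt2.ts.0*.nc' out_${var}.nc &
done
wait
cdo merge out_*.nc collected_all.nc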
I still have some concerns: I tried with a different set of files to see whether this speed problem is only my issue. Perhaps we are getting far from the original problem, but I think it is worth exploring.
These are vertical profiles evolving in time, so they can be thought of as 2D variables f(z,t). We still have one value per core, so ensemble averaging, as you suggested, is definitely the best way to tackle the problem.
But here the difference in time is even more evident.
time cdo -O ensmean 'dycoms2_rf01_ttt2.ps.0*.nc' result.nc

real    0m10.822s
user    0m10.092s
sys     0m0.434s
and
time ncl reduce_ps.ncl

real    0m1.082s
user    0m0.344s
sys     0m0.718s
Do you think there is any way to speed up the process? Or am I doing something wrong?
Many thanks for any hints!
Cheers,
Paolo
reduce_ps.ncl (2.38 KB)
testfile_collgrid.tar.gz (10.2 KB)
RE: operator collgrid speed - Added by Ralf Mueller over 7 years ago
for large files, you could use our ftp server: ftp://ftp.zmaw.de/incoming/
username: anonymous
password: email
you might create a subdir for your files
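a possible session would be something like this (the subdirectory name is just an example):

ftp ftp.zmaw.de          # user: anonymous, password: your email address
cd incoming
mkdir collgrid_files     # your own subdirectory
cd collgrid_files
binary
put testfile_collgrid.tar.gz
bye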
hth
ralf