Optimal operator order?
Added by Brendan DeTracey almost 3 years ago
Hi. Me again...
In the following command line, what is the optimal order of the cdo operators?
cdo -O -a -f nc4 -z zip --cmor \
    -intlevel,50 \
    -select,startdate="$seldate_start_arg",enddate="$seldate_end_arg",levidx=18,19 \
    -mergetime "$file_glob" "$file_out"
Would the following be more optimal?
cdo -O -a -f nc4 -z zip --cmor \
    -intlevel,50 \
    -mergetime "$file_glob" "$file_out" -apply,-select,startdate="$seldate_start_arg",enddate="$seldate_end_arg",levidx=18,19 [ "$file_glob" ] \
    "$file_out"
Replies (9)
RE: Optimal operator order? - Added by Ralf Mueller almost 3 years ago
hey, Brendan! Happy to hear from you ;-)
IMO it's a good rule of thumb to select the data before doing anything else. So your idea with the second version seems good: use apply to select things before the merge.
But I think syntactically it does not work, because mergetime should get the output of the apply chain as input, and file_out should only occur at the very end of the call.
Can you upload two input files? I would love to test the combination of merge, apply and select.
cheers
ralf
RE: Optimal operator order? - Added by Brendan DeTracey almost 3 years ago
Hi ralf!
The files are much too big to upload, but here are some download links for your pleasure! Each file is ~8.6GB:
http://esg1.umr-cnrm.fr/thredds/fileServer/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/historical/r1i1p1f2/Omon/thetao/gn/v20191021/thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_199501-199912.nc
http://esg1.umr-cnrm.fr/thredds/fileServer/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/historical/r1i1p1f2/Omon/thetao/gn/v20191021/thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_200001-200412.nc
http://esg1.umr-cnrm.fr/thredds/fileServer/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/historical/r1i1p1f2/Omon/thetao/gn/v20191021/thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_200501-200912.nc
http://esg1.umr-cnrm.fr/thredds/fileServer/CMIP6_CNRM/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/historical/r1i1p1f2/Omon/thetao/gn/v20191021/thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_201001-201412.nc
The following link shows the status of all ESGF servers. If the above download links do not work, it may be because esg1.umr-cnrm.fr is temporarily down: https://esgf-node.llnl.gov/status/
RE: Optimal operator order? - Added by Ralf Mueller almost 3 years ago
hi again!
thx for the link. I downloaded 2 of the files. they are compressed netcdf4, which is very bad for processing (constant decompression of data/coordinates). so before doing anything else I decompress them.
2nd thing is mergetime: I think you don't need that, because the shell wildcard together with the filename conventions gives the correct temporal order. Extra scanning and re-ordering of timesteps does not seem to be needed IMO. cat should be faster.
My attempt to decompress a file with CDO takes ages (after 10min I have 5%). Have to check this in more detail first
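The point above about wildcard order can be checked before dropping mergetime. This is a minimal sketch, assuming the file names end in _YYYYMM-YYYYMM.nc (optionally with a suffix such as -noZ) as in the download links; start_stamp and in_time_order are hypothetical helper names, not part of CDO.

```shell
# Sketch only: verify that the files, in the order given (e.g. by a shell
# wildcard), are sorted by the start date encoded in their names.
# Assumes names like ..._gn_199501-199912.nc; both helpers are hypothetical.

# Print the start date (YYYYMM) taken from a file name.
start_stamp() {
    ss_base="${1##*/}"          # strip any directory part
    ss_base="${ss_base%.nc}"    # strip the extension
    ss_base="${ss_base##*_}"    # keep the last "_"-separated field
    printf '%s\n' "${ss_base%%-*}"
}

# Return 0 if start dates strictly increase, 1 if not, 2 if a name
# does not follow the assumed convention.
in_time_order() {
    ito_prev=""
    for ito_f in "$@"; do
        ito_cur=$(start_stamp "$ito_f")
        case "$ito_cur" in
            ''|*[!0-9]*) echo "cannot parse date in: $ito_f" >&2; return 2 ;;
        esac
        if [ -n "$ito_prev" ] && [ "$ito_cur" -le "$ito_prev" ]; then
            echo "out of order: $ito_f" >&2
            return 1
        fi
        ito_prev="$ito_cur"
    done
    return 0
}

# Example call (expands the wildcard in the current directory):
#   in_time_order thetao_Omon_*_gn_*.nc && echo "wildcard order is chronological"
```

If the check passes, the files can be handed to cat or select in wildcard order; if it fails, mergetime is the safer choice.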
RE: Optimal operator order? - Added by Ralf Mueller almost 3 years ago
With some help of Uwe I can say more: Not only are the data-variables compressed, but also they are saved with the largest possible chunksize (1 single chunk for everything). Similar to the intlevel ticket (#10617) the data is in the worst shape ever to be analyzed with CDO.
My recommendation is: uncompress the data and set a reasonable chunk-size (= horizontal gridsize). this can be done with
nccopy -d 0 -c lev/1 thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_199501-199912.nc \
    thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_199501-199912-noZ.nc
The individual file size rises from 8G to 26G, so there is an IO penalty for it, but usually the time spent in decompression is far worse. Here are some tests I did:
$ cdo -a -f nc --cmor -intlevel,50 \
    -select,levidx=18,19 \
    -cat 'thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_*-noZ.nc' tttt.nc
cdo(1) select: Process started
cdo(2) cat: Process started
cdo(2) cat: Processed 13626900000 values from 2 variables over 120 timesteps.
cdo(1) select: Processed 363384000 values from 1 variable over 120 timesteps.
cdo intlevel: Processed 363384000 values from 1 variable over 120 timesteps [64.95s 375MB].
This should be close to what you initially posted: cmor output, first the concatenation, then select, then intlevel. No compression.
$ cdo -a -f nc --cmor -intlevel,50 \
    -select,startdate=1999-06-01,enddate=2010-06-01,levidx=18,19 \
    -cat 'thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_*-noZ.nc' tttt-sel.nc
cdo(1) select: Process started
cdo(2) cat: Process started
cdo(2) cat: Processed 13626900000 values from 2 variables over 120 timesteps.
cdo(1) select: Processed 36338400 values from 1 variable over 66 timesteps.
cdo intlevel: Processed 36338400 values from 1 variable over 12 timesteps [47.46s 375MB].
This time I included the date selection to compare the reduction in processing time with the reduction in data volume achieved by select.
$ cdo -a -f nc --cmor -intlevel,50 \
    -select,startdate=1999-06-01,enddate=2010-06-01,levidx=18,19 \
    'thetao_Omon_CNRM-CM6-1-HR_historical_r1i1p1f2_gn_*-noZ.nc' tttt-sel-noCat.nc
cdo(1) select: Process started
cdo(1) select: Processed 36338400 values from 2 variables over 66 timesteps.
cdo intlevel: Processed 36338400 values from 1 variable over 12 timesteps [9.34s 369MB].
Finally, a version without merge or cat, because select already is a collective operation on all inputs. This version seems to be reasonable, because the time coordinate is shared by all input files (a timestep only occurs in a single file).
At the moment select cannot be used with apply, because select accepts an arbitrary number of input files. But maybe the select-only version does the job for you.
One final point about compressed netcdf: IMO the only useful application of this is archiving (as a transparent way of saving space). Any kind of analysis on such data should be done on uncompressed input.
hth
ralf
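The select-only variant above can be wrapped so that the date range and file names are passed in, as in the original question. This is a minimal sketch, not a tested recipe: run_select and the DRY_RUN switch are hypothetical, and the cdo options are copied from the test runs in this reply (uncompressed output), not from the original command line.

```shell
# Sketch only: select-only call with parameters.
# run_select and DRY_RUN are hypothetical names introduced for illustration.
#   $1 start date, $2 end date
#   $3 input pattern (passed through quoted, so cdo expands the wildcard)
#   $4 output file
run_select() {
    # Rebuild the positional parameters as the full cdo command line.
    set -- cdo -a -f nc --cmor \
        -intlevel,50 \
        "-select,startdate=$1,enddate=$2,levidx=18,19" \
        "$3" "$4"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"      # only show what would be run
    else
        "$@"
    fi
}

# Example call, printing the command instead of running it:
#   DRY_RUN=1 run_select 1999-06-01 2010-06-01 'thetao_*-noZ.nc' out.nc
```

Running it once with DRY_RUN=1 is a cheap way to confirm the operator chain before starting a job on files of this size.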
RE: Optimal operator order? - Added by Brendan DeTracey almost 3 years ago
Thanks ralf! It looks like the correct answer to my problem is to uncompress these large files that were sub-optimally chunked by their creators. I wonder if I have enough disk space...
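On the disk space question, a rough check can run before each nccopy call. This is a minimal sketch under loud assumptions: the growth factor is just the single 8G to 26G observation reported above, and noz_name, estimate_kib and decompress_all are hypothetical helper names. The nccopy options are the ones given earlier in the thread.

```shell
# Sketch only: decompress and re-chunk each file if enough space looks free.
# All three function names are hypothetical.

# Output name as used in the thread: foo.nc -> foo-noZ.nc
noz_name() {
    printf '%s\n' "${1%.nc}-noZ.nc"
}

# Estimated uncompressed size in KiB for a compressed size in KiB ($1).
# The 26/8 ratio is one observation from this thread, not a general rule.
estimate_kib() {
    echo $(( $1 * 26 / 8 ))
}

decompress_all() {
    for da_f in "$@"; do
        da_out=$(noz_name "$da_f")
        da_size=$(( $(wc -c < "$da_f") / 1024 ))
        da_need=$(estimate_kib "$da_size")
        # Free space (KiB) on the file system that holds the input file.
        da_free=$(df -Pk "$(dirname "$da_f")" | awk 'NR==2 {print $4}')
        if [ "$da_need" -gt "$da_free" ]; then
            echo "skipping $da_f: need ~${da_need} KiB, ${da_free} KiB free" >&2
            continue
        fi
        nccopy -d 0 -c lev/1 "$da_f" "$da_out"
    done
}

# Example call:
#   decompress_all thetao_Omon_*_gn_*.nc
```

The output is written next to the input file, so the check looks at the file system of the input directory. Change both if the uncompressed copies should go to a scratch disk.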
RE: Optimal operator order? - Added by Brendan DeTracey over 2 years ago
I am discovering that most of the CMIP6 ocean model datasets are chunked in this way. Blech...
RE: Optimal operator order? - Added by Ralf Mueller over 2 years ago
hi Brendan!
Unfortunately data suitable for archiving is in most cases not suitable for data analysis.
Funny: Blech is a German word, too. But I guess that's not what you had in mind, right?
RE: Optimal operator order? - Added by Brendan DeTracey over 2 years ago
"Blech" for me comes from comic strips. "Blech" is the noise a comic book character might make when expressing disgust with a situation, perhaps with tongue stuck out like they ate something that tasted terrible.
RE: Optimal operator order? - Added by Ralf Mueller over 2 years ago
I thought so ;-) - in German it means steel sheet or plate