CDO goes quantum: introduction of the Bra-Ket
Added by Ralf Mueller about 6 years ago
This post is not about a specific operator or module but about a subtle little extension of the chaining of operators on the command line. But some background first ...
The operators built into CDO can be seen as separate functions, where the output of the called function is the input of the calling function. For simple operations this might look like this:
cdo -copy -fldmean ifile ofile |
ofile = copy(fldmean(ifile)) |
cdo -div -mul afile -selname,var bfile maskfile ofile |
ofile = div(mul(afile,selname(var,bfile)),maskfile) |
What we use on the command line is the so-called Polish notation, which can be written without parentheses if and only if the number of inputs and outputs is fixed for each function.
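To illustrate why fixed input counts make parentheses unnecessary, here is a minimal Python sketch that turns a chain into the function form shown above. It is a toy model, not CDO's parser: the `ARITY` table and the helper names `parse` and `to_function_form` are made up for this illustration, and `copy` is treated as a one-input operator as in the first example.

```python
# Toy model of fixed-arity Polish notation.
# ARITY is an illustrative table, not CDO's real operator registry.
ARITY = {"copy": 1, "fldmean": 1, "selname": 1, "mul": 2, "div": 2}


def parse(tokens):
    """Consume one expression from the front of `tokens` and return it as text."""
    tok = tokens.pop(0)
    if not tok.startswith("-"):
        return tok  # a plain file name is a leaf
    name, *params = tok[1:].split(",")
    # Because the input count is known, the parser knows exactly when to stop.
    inputs = [parse(tokens) for _ in range(ARITY[name])]
    return f"{name}({','.join(params + inputs)})"


def to_function_form(command):
    """Translate the arguments of a cdo call into nested function notation."""
    tokens = command.split()
    ofile = tokens.pop()  # the last argument is the output file
    expr = parse(tokens)
    if tokens:
        raise ValueError(f"unused arguments: {tokens}")
    return f"{ofile} = {expr}"


print(to_function_form("-copy -fldmean ifile ofile"))
print(to_function_form("-div -mul afile -selname,var bfile maskfile ofile"))
```

Each operator takes exactly as many expressions as its table entry says, so the nesting is unambiguous without any brackets.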
Consequently, operators with an arbitrary number of inputs or outputs need special handling. For inputs these operators are
after | enspctl | gather | outputext | selall |
afterburner | ensrange | graph | outputf | select |
cat | ensrkhist_space | info | outputfld | sinfo |
collgrid | ensrkhist_time | infoc | outputint | sinfoc |
copy | ensrkhistspace | infon | outputkey | sinfon |
delete | ensrkhisttime | infop | outputsrv | sinfop |
ensavg | ensroc | infos | outputtab | sinfov |
ensbrs | ensskew | infov | outputts | sorttaxis |
enscrps | ensstd | map | outputxyz | sorttimestamp |
enskurt | ensstd1 | merge | seinfo | szip |
ensmax | enssum | mergetime | seinfoc | xinfon |
ensmean | ensvar | output | seinfon |
ensmin | ensvar1 | outputarr | seinfop |
For outputs the list is
distgrid | eofcoeff | eofcoeff3d | intyear |
scatter | splitcode | splitday | splitgrid |
splithour | splitlevel | splitmon | splitname |
splitparam | splitrec | splitseas | splitsel |
splittabnum | splitvar | splityear | splityearmon |
splitzaxis |
Arbitrary Outputs
For operators with arbitrary outputs the rule is simple: since the operator calling such an operation cannot know how many inputs to read from, these operators may only be called at the very end of a chain. Hence the outputs are always files on disk.
Arbitrary Inputs
Here the rule is different: operators with an arbitrary number of inputs behave greedily (like regular expressions), which means they read as many inputs as possible. This leads to a limitation, because such operators cannot be part of complex chains that involve operators with more than one input stream. The consequence is that operators like cat and merge are mostly used in stand-alone calls, because their ability for chaining is somewhat limited.
Let's have a look at an example:
cdo -infov -div -fldmean -cat -for,1,10 -mulc,-1 -for,1,5 -fldmax -topo
Don't get confused by the fact that there is no input data file: -for and -topo create the inputs and -infov writes to stdout. Hence I can illustrate things without going into the details of special input files.
The above call does not work because -cat is greedy (typical for a cat, btw - I love cats ...):
% cdo -infov -div -fldmean -cat -for,1,10 -mulc,-1 -for,1,5 -fldmax -topo
cdo (Abort): Too few streams specified! Operator -div needs 2 input and 1 output streams.
What happened? -div needs two input streams and one output stream, but our -cat has claimed all possible streams on its right-hand side as input and didn't leave anything for the remaining input or output stream of -div.
How to deal with that? A change in the processing rule is unlikely to help here: limiting the number of inputs to 2 would lead to even longer chains like
cdo -cat afile -cat bfile -cat cfile dfile outfile
which is certainly not readable. The solution is to reintroduce parentheses, but only where needed. On the Unix command line there are not many characters left for that purpose. That's why we decided to use the detached square bracket, i.e. a square bracket that has a space on its left and right.
The above call now looks like
cdo -infov -div -fldmean -cat [ -for,1,10 -mulc,-1 -for,1,5 ] -fldmax -topo
and it works perfectly with cdo-1.9.5. It's even a lot more readable:
-div's first input is -fldmean -cat [ -for,1,10 -mulc,-1 -for,1,5 ] and its second is -fldmax -topo.
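The greedy rule and the bracket can be mimicked with a small Python sketch. This is only a toy model of the behaviour described in this post, not CDO's actual parser: the `INPUTS` table, the tuple-based tree and the helper names are assumptions made for illustration, and the error text is simplified compared to the real cdo message.

```python
# Toy model only: INPUTS is an illustrative table, not CDO's operator registry.
# None marks an operator with an arbitrary number of inputs; 0 marks a generator.
INPUTS = {"infov": 1, "div": 2, "fldmean": 1, "fldmax": 1, "mulc": 1,
          "cat": None, "for": 0, "topo": 0}

GREEDY = "-infov -div -fldmean -cat -for,1,10 -mulc,-1 -for,1,5 -fldmax -topo"
BRACKETED = "-infov -div -fldmean -cat [ -for,1,10 -mulc,-1 -for,1,5 ] -fldmax -topo"


def parse(tokens):
    """Consume one expression from the front of `tokens`; return (token, inputs)."""
    tok = tokens.pop(0)
    if not tok.startswith("-"):
        return (tok, [])  # a plain file name is a leaf
    n = INPUTS[tok[1:].split(",")[0]]
    args = []
    if n is None:
        # Arbitrary inputs: read until the closing bracket or, without a
        # bracket, until nothing is left (the greedy behaviour).
        bracketed = bool(tokens) and tokens[0] == "["
        if bracketed:
            tokens.pop(0)
        while tokens and tokens[0] != "]":
            args.append(parse(tokens))
        if bracketed:
            if not tokens:
                raise ValueError(f"missing ']' for {tok}")
            tokens.pop(0)
    else:
        for _ in range(n):
            if not tokens or tokens[0] == "]":
                raise ValueError(f"too few streams: {tok} needs {n} inputs")
            args.append(parse(tokens))
    return (tok, args)


def parse_chain(command):
    """Parse a whole chain and make sure every argument was used."""
    tokens = command.split()
    tree = parse(tokens)
    if tokens:
        raise ValueError(f"unused arguments: {tokens}")
    return tree


def show(node):
    """Render the tree with explicit parentheses for inspection."""
    tok, args = node
    return f"{tok}({'; '.join(show(a) for a in args)})" if args else tok


for chain in (GREEDY, BRACKETED):
    try:
        print(show(parse_chain(chain)))
    except ValueError as err:
        print("abort:", err)
```

In this model the first chain aborts because -cat swallows -fldmax -topo as a third input, while the bracketed chain gives -div its two inputs.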
Please come up with some other use-cases!