Broadcasting climate indices
Added by Gruel Axles 12 months ago
What is the recommended approach to calculate ECASU - Summer days index per time period for multiple temperature thresholds?
Intuitively I want to broadcast the greater than comparison over an array of constants rather than a single constant. Is this possible with CDO Indices?
Replies (6)
RE: Broadcasting climate indices - Added by Karin Meier-Fleischer 12 months ago
Hi Gruel,
see the ECA documentation https://code.mpimet.mpg.de/projects/cdo/embedded/cdo_eca.pdf#subsection.2.0.3.
RE: Broadcasting climate indices - Added by Gruel Axles 12 months ago
Thanks for the quick response. I had seen the documentation, but I’m still not clear on the best approach to computing the ECASU index for multiple threshold values. The documentation says T is an integer, so can it only be done for a single threshold at a time? And do multiple thresholds require running the command multiple times, which duplicates all the I/O? Just looking for a way to avoid all the extra I/O for computing multiple threshold values. I don’t mind bypassing the eca_su operator and manually crafting the index calculation using more primitive operators that can handle broadcasting.
RE: Broadcasting climate indices - Added by Gruel Axles 12 months ago
Gruel Axles wrote in RE: Broadcasting climate indices:
Thanks for the quick response. I had seen the documentation, but I’m still not clear on the best approach to computing the ECASU index for multiple threshold values. The documentation says T is a FLOAT, so can it only be done for a single threshold at a time? And do multiple thresholds require running the command multiple times, which duplicates all the I/O? Just looking for a way to avoid all the extra I/O for computing multiple threshold values. I don’t mind bypassing the eca_su operator and manually crafting the index calculation using more primitive operators that can handle broadcasting.
RE: Broadcasting climate indices - Added by Ralf Mueller 12 months ago
hi!
There is no way to calculate the index for multiple thresholds at once. You have to create additional output (a single 2D field) for each threshold.
BTW: I don't understand the term broadcasting in that context.
cheers
ralf
RE: Broadcasting climate indices - Added by Gruel Axles 12 months ago
Ah sorry. I use broadcasting in the way NumPy describes https://numpy.org/doc/stable/user/basics.broadcasting.html
So if you have a size (M, N) 2D array of temperature data and a size (Q,) 1D array of threshold values, then the operation
T_data[:, :, None] > T_thresh
is broadcast, resulting in a 3D array of size (M, N, Q) where each slice along the Q dimension holds the comparison with one constant in the threshold array. (The extra trailing axis is needed; a plain (M, N) > (Q,) comparison only broadcasts when N equals Q or Q is 1.)
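For illustration, here is a minimal NumPy sketch of that idea applied to a summer-days style count. It is not CDO code: the array names, the synthetic data, the degC units, and the (time, lat, lon) layout are all assumptions made up for the example. Note that in NumPy the data array needs an explicit trailing axis before it can broadcast against the (Q,) threshold array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily maximum temperature in degC, shape (time, lat, lon).
tmax = rng.uniform(10.0, 35.0, size=(90, 4, 5))

# Thresholds to test, shape (Q,).
thresholds = np.array([20.0, 25.0, 30.0])

# Add a trailing axis so (time, lat, lon, 1) broadcasts against (Q,).
exceeds = tmax[..., np.newaxis] > thresholds  # shape (time, lat, lon, Q)

# Reduce over time: number of days above each threshold per grid cell.
summer_days = exceeds.sum(axis=0)  # shape (lat, lon, Q)

print(summer_days.shape)
```

The data is read once and compared against all thresholds in a single pass, which is the I/O saving I am after.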
RE: Broadcasting climate indices - Added by Ralf Mueller 12 months ago
I see. In the case of ecasu, the operation is a reduction of the time dimension (T(time,lon,lat) -> ecasu(lon,lat,T)). Sure, you can save multiple results in a single array. But in the same way you can merge the output files (one for each given T threshold) into a single file. The algorithm itself will not be parallelized in any way with CDO. Instead you could run multiple CDO commands for process-level parallelization.
You can rewrite the whole implementation with dask to add the parallelization within a single process, but writing it will take time and I doubt there will be any performance benefit left in the end.
If you want to avoid I/O overhead, you can use /dev/shm on Linux systems. It's limited in terms of space, but very fast.
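A rough sketch of the process-level approach, driven from Python: one `cdo eca_su,T` call per threshold, run concurrently, with the outputs written to /dev/shm. The helper names `build_commands` and `run_all`, the output file naming, and the input file name in the usage comment are made up for illustration; only the `cdo eca_su,T infile outfile` call itself is CDO syntax.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(infile, thresholds, outdir="/dev/shm"):
    """Return one `cdo eca_su,T infile outfile` command per threshold."""
    commands = []
    for t in thresholds:
        # Hypothetical naming scheme for the per-threshold output files.
        outfile = Path(outdir) / f"eca_su_{t}.nc"
        commands.append(["cdo", f"eca_su,{t}", str(infile), str(outfile)])
    return commands


def run_all(commands, max_workers=4):
    """Run the commands concurrently; each one is a separate CDO process."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for completion and re-raises any CalledProcessError.
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))


# Usage (requires cdo on the PATH and an existing input file):
# run_all(build_commands("tmax.nc", [20, 25, 30]))
```

Each process still reads the input file on its own, so this trades repeated reads for wall-clock time; the operating system's file cache may absorb part of that cost.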
cheers
ralf