Percentile calculation using 'linear'
Added by Lukas Brunner 2 months ago
I would want to calcualte daily running percentiles using different seasonal running windows using `ydrunpctl`. Some of these I calculate with python xarray as they are complex but most of them I calculate with cdo for performance reasons (that is to say I need to use both). I want to make sure that there is no difference in the calculation per se.
The documentation tells me that the cdo default is nrank but that this can be changed to the numpy default of 'linear' using `--percentile linear` as an option. This works as far as not throwing me an error when running it, but as best as I can tell the results do not match numpy. In addition the results with and without `--percentile` flag are identical, which seems to suggest that the flag is not working.
Maybe I'm using it wrong? I'd be happt for any help. Below is a MWE I ran on Levante, I'll also add the sample data I created for this case.
(please note that I'm using a window size of 1 here for simplicity, but in general I need longer windows as well, hence my use of `ydrunpctl` instead of `ydaypctl`)
```bash
module purge
module load cdo/2.4.3-gcc-11.2.0
window=1
percentile=90
infile=/work/uc1275/LukasBrunner/data/RunningWindowSize/data/test_cdo_python/tasmax_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19610101-19901231_single-cell.nc
outfile=${infile%.nc}
cdo ydrunmin,$window,rm=c $infile ${outfile}_w${window}_ydrunmin.nc
cdo ydrunmax,$window,rm=c $infile ${outfile}_w${window}_ydrunmax.nc
cdo --percentile linear -ydrunpctl,${percentile},$window,rm=c,pm=r8 $infile ${outfile}_w${window}_ydrunmin.nc ${outfile}_w${window}_ydrunmax.nc ${outfile}_w${window}_${percentile}p_linear.nc
```
```python
import numpy as np
import xarray as xr
da = xr.open_dataset('/work/uc1275/LukasBrunner/data/RunningWindowSize/data/test_cdo_python/tasmax_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19610101-19901231_single-cell.nc')['tasmax']
p90_python = da.groupby('time.dayofyear').quantile(.9)
p90_cdo = xr.open_dataset('/work/uc1275/LukasBrunner/data/RunningWindowSize/data/test_cdo_python/tasmax_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19610101-19901231_single-cell_w1_90p_linear.nc')['tasmax']
p90_cdo = p90_cdo.assign_coords({'time': np.arange(1, 366)}).rename({'time': 'dayofyear'})
(p90_cdo - p90_python).mean().item()
```
-> 0.23620238421704062
And the 'nrank' - 'linear' comparison:
```bash
cdo --percentile nrank -ydrunpctl,${percentile},$window,rm=c,pm=r8 $infile ${outfile}_w${window}_ydrunmin.nc ${outfile}_w${window}_ydrunmax.nc ${outfile}_w${window}_${percentile}p_nrank.nc
cdo info -sub tasmax_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19610101-19901231_single-cell_w1_90p_linear.nc tasmax_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19610101-19901231_single-cell_w1_90p_nrank.nc
```
Replies (2)
RE: Percentile calculation using 'linear' - Added by Uwe Schulzweida 2 months ago
Hi Lukas,
The parameter pm=r8 of the operator ydrunpctl sets the percentile method to R’s type=8.
You have to remove this parameter to change the percentile method with the CDO option --percentile.
The CDO option -v prints the selected/used percentile method.
Cheers,
Uwe
RE: Percentile calculation using 'linear' - Added by Lukas Brunner 2 months ago
Ahhh... legacy stuff thanks for pointing that out!!
Tiny follow-up:
--percentile linear and ...,pm=7 do not give the same result
The first one is closer to the numpy solution (10-7 difference) while pm=7 is 10-1. I'll use --perccenile linear now and drop the pm= but to satisfy my curriosity in case you know I'd be interested how that happens?
Thanks again anyway!!