Error in ETCCDI calculation

Added by Andrea Vito Vacca about 3 years ago

Good morning, and thanks in advance for your support.

I am trying to calculate the ETCCDI indices with CDO as described by Fabian Wachsmann (https://gitlab.dkrz.de/k204210/cdo_cei), from whom I have taken the dataset (hamburg_timeseries) and the conda environment.
When computing R95p (annual total precipitation when pr > 95p) using the syntax cdo etccdi_r95p,n,startboot,endboot infile1 infile2 infile3 outfile, as follows:

--> cdo etccdi_r95p,5,2071,2100 pr_2071-2100.nc -ydrunmin,5 pr_2071-2100.nc -ydrunmax,5 pr_2071-2100.nc r95ptot.nc

the following error occurs: cdo etccdi_r95p (Abort): The interval start year '5' is before infile start year '2071'. terminate called without an active exception. Aborted (core dumped).

This does not make sense to me, since n=5 is the window size in days (the number of timesteps), not the interval start year indicated by the error. Is there a bug in the cdo command, or am I doing something wrong?

I have attached the file taken from Fabian Wachsmann.


Replies (3)

RE: Error in ETCCDI calculation - Added by Fabian Wachsmann about 3 years ago

Dear Andrea Vito Vacca,

CDO's help text is incorrect and incomplete at that point, I am sorry. I will fix that soon.

According to the ETCCDI definition, the percentile for R95p is not calculated centered on a day of year or within a window; instead, it is calculated over the entire base period. That is, precipitation amounts are not distinguished by day of year.

http://etccdi.pacificclimate.org/list_27_indices.shtml

That means n is set to 1. I am not sure about the bootstrapping, but I will get back to you soon.
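To illustrate the difference from windowed, day-of-year percentiles, here is a minimal sketch of R95p with one single threshold for the whole base period. It assumes a hypothetical plain-text input with one day per line ("<year> <precip_mm>"), applies the wet-day criterion (RR >= 1 mm) from the ETCCDI list linked above, and interpolates linearly between order statistics (type 7), which is not necessarily the method CDO uses. The function name r95p_sketch is made up for this example.

```shell
# Hypothetical helper, not part of CDO: r95p_sketch FILE BASE_START BASE_END
# FILE holds one day per line: "<year> <precip_mm>".
# Prints the base-period threshold, then "<year> <R95p total>" per year.
r95p_sketch() {
  local file=$1 base_start=$2 base_end=$3 threshold
  # One threshold for the entire base period: wet days (>= 1 mm) only,
  # no day-of-year window. Linear interpolation between order statistics
  # (type 7); CDO's etccdi operators may use a different method.
  threshold=$(awk -v s="$base_start" -v e="$base_end" \
      '$1 >= s && $1 <= e && $2 >= 1.0 { print $2 }' "$file" |
    sort -n |
    awk '{ x[NR] = $1 }
      END {
        h = (NR - 1) * 0.95 + 1      # 1-based fractional position
        lo = int(h)
        if (lo >= NR) { print x[NR]; exit }
        print x[lo] + (h - lo) * (x[lo + 1] - x[lo])
      }')
  echo "threshold $threshold"
  # Annual total of precipitation on days above the threshold
  # (every year is listed, with 0 if no day exceeds it).
  awk -v t="$threshold" '
    { if (!($1 in tot)) tot[$1] = 0; if ($2 > t) tot[$1] += $2 }
    END { for (y in tot) print y, tot[y] }' "$file" | sort -n
}
```

The same threshold is then applied to every year, inside or outside the base period, which is exactly why no window size is needed for this index.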

Best,
Fabi

RE: Error in ETCCDI calculation - Added by Andrea Vito Vacca about 3 years ago

Dear Fabian,
thank you for your previous answer and for your work, which is very important for my master's thesis.

I have another question regarding the ETCCDI index calculation. If I compute a percentile-based index, for example Tx90p, over a time period identical to the bootstrapping period and then average the index over time, I do not get 10% as the final result, as I would expect.
Below is what I have done; I have also attached a sample of my data.

tasmax=tasmax_day_ACCESS-CM2_19810101-20101231.nc
cdo ydrunmin,5 $tasmax infile2.nc
cdo ydrunmax,5 $tasmax infile3.nc
cdo etccdi_tx90p,5,1981,2010 $tasmax infile2.nc infile3.nc Tx90p_ACCESS-CM2.nc
cdo output -timmean Tx90p_ACCESS-CM2.nc

Result --> 10.7277

I would expect this result to be exactly 10. I tried the same with other samples of different sizes, and the result is always different from 10.
Could you explain this to me? Thank you very much in advance.
Kind regards,
Andrea Vito Vacca

RE: Error in ETCCDI calculation - Added by Fabian Wachsmann about 3 years ago

Dear Andrea Vito Vacca,

there are many percentile calculation methods, all leading to slightly different results. However, the main reason why the etccdi index is not exactly 10 is the bootstrapping method applied within the operator. A detailed explanation of why it is used is given in Zhang et al. 2005, https://doi.org/10.1175/JCLI3366.1 .

Short explanation: when the base period that is used for the percentile calculation is also compared with another period, bootstrapping is required to keep the in-base and out-of-base results homogeneous. If you have only one period, as in your use case, bootstrapping is not necessary for a homogeneous result.
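The in-base resampling idea of Zhang et al. 2005 can be sketched roughly as follows. This is a deliberately simplified illustration and not CDO's implementation: it uses one threshold per base period instead of a 5-day window per calendar day, linear interpolation (type 7) instead of r8, and a hypothetical input format with one day per line ("<year> <value>"). For each year inside the base period, that year's data is replaced in turn by each of the other base years, a threshold is computed from each modified base period, and the resulting exceedance rates for the year are averaged. Years outside the base period use the threshold of the unmodified base period.

```shell
# Hypothetical helper, not part of CDO:
#   bootstrap_exceedance FILE BASE_START BASE_END PCTL
# FILE holds one day per line: "<year> <value>". Prints "<year> <percent>",
# the percentage of the year's days above the PCTL-th percentile threshold.
# Assumes the base period spans at least two years.
bootstrap_exceedance() {
  awk -v s="$2" -v e="$3" -v p="$4" '
    # Type-7 percentile of v[1..n]; sorts v in place (insertion sort).
    function pctl(v, n, q,    i, j, t, h, lo) {
      for (i = 2; i <= n; i++) {
        t = v[i]
        for (j = i - 1; j >= 1 && v[j] > t; j--) v[j + 1] = v[j]
        v[j + 1] = t
      }
      h = (n - 1) * q / 100 + 1
      lo = int(h)
      if (lo >= n) return v[n]
      return v[lo] + (h - lo) * (v[lo + 1] - v[lo])
    }
    # Percentage of the days of year y that lie strictly above threshold t.
    function rate(y, t,    i, c) {
      c = 0
      for (i = 1; i <= cnt[y]; i++) if (val[y, i] > t) c++
      return 100 * c / cnt[y]
    }
    { cnt[$1]++; val[$1, cnt[$1]] = $2 + 0; years[$1] = 1 }
    END {
      # Threshold from the complete, unmodified base period.
      n = 0
      for (y = s; y <= e; y++)
        for (i = 1; i <= cnt[y]; i++) all[++n] = val[y, i]
      tbase = pctl(all, n, p)
      for (yr in years) {
        y = yr + 0
        if (y < s || y > e) { printf "%d %.4f\n", y, rate(y, tbase); continue }
        # In-base year: replace it by each other base year k in turn,
        # recompute the threshold, and average the exceedance rates.
        sum = 0; reps = 0
        for (k = s; k <= e; k++) {
          if (k == y) continue
          n = 0; split("", sample)
          for (yy = s; yy <= e; yy++) {
            src = (yy == y) ? k : yy
            for (i = 1; i <= cnt[src]; i++) sample[++n] = val[src, i]
          }
          sum += rate(y, pctl(sample, n, p)); reps++
        }
        printf "%d %.4f\n", y, sum / reps
      }
    }' "$1" | sort -n
}
```

Because every in-base year is judged against thresholds computed partly from other years' data, the averaged in-base exceedance rates need not add up to exactly the nominal percentage.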

Here are some hints to improve your code:

1. The etccdi operators loop over the days "circularly": at the start and at the end, the time series is extended by days taken from the other end. That is, the time series starting on 19810101 is extended at the front by the two days 20101230 and 20101231, and at the other end by the two days 19810101 and 19810102. To keep the whole analysis consistent, you can pass the keyword argument rm=c to ydrunmin and ydrunmax, i.e.

cdo ydrunmin,5,rm=c $tasmax infile2.nc
cdo ydrunmax,5,rm=c $tasmax infile3.nc

so that $tasmax is also extended in the same way for this calculation.
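The circular extension described in point 1 can be illustrated with a small sketch. It assumes that a window of n days adds floor(n/2) days at each end (two days for n = 5, matching the example above). The helper circular_extend is hypothetical and only shows the padding, not CDO's internals.

```shell
# Hypothetical helper, not part of CDO: circular_extend N
# Reads one value per line on stdin and prints the series padded
# circularly with floor(N/2) values at each end.
circular_extend() {
  awk -v w="$1" '{ x[NR] = $0 }
    END {
      h = int(w / 2)
      for (i = NR - h + 1; i <= NR; i++) print x[i]   # last days wrapped to the front
      for (i = 1; i <= NR; i++) print x[i]            # original series
      for (i = 1; i <= h; i++) print x[i]             # first days wrapped to the end
    }'
}

# Shortened stand-in for the 1981-2010 dates:
printf '%s\n' 19810101 19810102 19810103 20101229 20101230 20101231 |
  circular_extend 5 | paste -sd ' ' -
# -> 20101230 20101231 19810101 19810102 19810103 20101229 20101230 20101231 19810101 19810102
```

With this padding, the 5-day windows around the first and last days of the series are complete, which is what rm=c is meant to reproduce for ydrunmin and ydrunmax.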

2.1. As I said, there are many percentile calculation methods. The ETCCDI method in CDO, the r8 type (type 8 of the R language's quantile function), requires the entire time series to be kept in memory during the analysis. Therefore, it is only applicable when you are interested in a single grid cell rather than the entire grid, which is true in your case. However, to switch this method on, you need to tell CDO that it may keep all values in memory by setting the environment variable:

export CDO_PCTL_NBINS="$((windowsize*(endboot-startboot+1)*2+2))" 

which in your case is 5*(2010-1981+1)*2+2 = 302.

If you don't set this variable, CDO calculates the histograms needed for the percentile calculation based on bins of size (ydrunmax-ydrunmin)/CDO_PCTL_NBINS, and the result will be different.

In the etccdi_ operators, the r8 percentile calculation method is used by default.
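The two approaches in 2.1 can be contrasted with a minimal sketch. r8_pctl computes the type-8 percentile exactly from all values (which is why everything has to be kept in memory), while hist_pctl only counts values into NBINS equal bins between a lower and an upper bound (standing in for ydrunmin and ydrunmax) and interpolates inside the bin that contains the requested percentile. Both helpers are hypothetical; the histogram variant is a generic approximation and not CDO's actual code, so its numbers will not match CDO exactly.

```shell
# Hypothetical helpers, not part of CDO. Both read one value per line on stdin.

# r8_pctl P: exact P-th percentile, Hyndman & Fan type 8 (R's quantile type 8).
r8_pctl() {
  sort -n | awk -v p="$1" '{ x[NR] = $1 }
    END {
      h = (NR + 1/3) * p / 100 + 1/3      # 1-based fractional position
      if (h <= 1) { print x[1]; exit }
      if (h >= NR) { print x[NR]; exit }
      lo = int(h)
      print x[lo] + (h - lo) * (x[lo + 1] - x[lo])
    }'
}

# hist_pctl P LO HI NBINS: approximate P-th percentile from a histogram with
# NBINS bins of width (HI - LO) / NBINS; only the bin counts are stored.
hist_pctl() {
  awk -v p="$1" -v lo="$2" -v hi="$3" -v nb="$4" '
    {
      b = int(($1 - lo) * nb / (hi - lo)) + 1   # 1-based bin index
      if (b > nb) b = nb                        # a value equal to HI
      if (b < 1) b = 1
      cnt[b]++; n++
    }
    END {
      target = p / 100 * n
      width = (hi - lo) / nb
      cum = 0
      for (b = 1; b <= nb; b++) {
        if (cnt[b] > 0 && cum + cnt[b] >= target) {
          # linear interpolation inside the bin that reaches the target
          print lo + (b - 1) * width + (target - cum) / cnt[b] * width
          exit
        }
        cum += cnt[b]
      }
      print hi
    }'
}

seq 1 10 | r8_pctl 90            # 9.63333
seq 1 10 | hist_pctl 90 1 10 3   # 9.25
```

With only three bins the histogram estimate is far off; more bins bring it closer to the exact value, at the cost of memory.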

2.2. If you do not want to use bootstrapping, I recommend using the eca_tx90p operator, for which you have to calculate the percentiles yourself:

cdo ydrunpctl,90,5,rm=c,pm=r8 $tasmax infile2.nc infile3.nc ydrunpctl90.nc

Here, you have to set `pm=r8` so that the r8 method is used. Afterwards, run:

cdo eca_tx90p $tasmax ydrunpctl90.nc result.nc

By the way, the last result will be almost exactly 10.
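Why the result without bootstrapping comes out close to 10 can be seen in a minimal sketch: when the threshold is the 90th percentile (type 8) of the very same sample, about 10% of the values lie above it by construction. The helper below is hypothetical and ignores the 5-day calendar-day windows used by ydrunpctl and eca_tx90p; for sample sizes where 10% is not a whole number of days, or with tied values, the share is only approximately 10%.

```shell
# Hypothetical helper, not part of CDO: in_sample_exceedance P
# Reads one value per line on stdin, takes their P-th percentile (type 8)
# as threshold, and prints the percentage of values strictly above it.
in_sample_exceedance() {
  sort -n | awk -v p="$1" '{ x[NR] = $1 }
    END {
      h = (NR + 1/3) * p / 100 + 1/3
      if (h <= 1) t = x[1]
      else if (h >= NR) t = x[NR]
      else { lo = int(h); t = x[lo] + (h - lo) * (x[lo + 1] - x[lo]) }
      c = 0
      for (i = 1; i <= NR; i++) if (x[i] > t) c++
      printf "%.4f\n", 100 * c / NR
    }'
}

seq 1 30 | in_sample_exceedance 90   # 10.0000
seq 1 25 | in_sample_exceedance 90   # 8.0000
```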

Best regards,
Fabi