Project

General

Profile

eca_r95ptot wrong results if following documentation

Added by Adriano Fantini almost 6 years ago

I have found out that the documentation attached to the `eca_r95ptot` (and similar) operators seems to be misleading, at least if I understand stuff correctly.

R95ptot is the percentage of precipitation due to wet days with precipitation > 95th percentile (also of wet days).

Here are the steps I used to reproduce the results (bash shell), starting from a file with 5 years of daily precipitation:

#Lets make sure we only have rainy days
f0=daily.nc
f=rainy.nc
cdo setrtomiss,-1000,0.9999 ${f0}.nc ${f}

#1) This is how the documentation suggests to use the operators, with `yday` daily statistics. Notice that a DUPLICATE is necessary, that is not mentioned in the documentation
cdo -L ydaypctl,95 ${f} -ydaymin ${f} -ydaymax ${f} q95_1.nc
cdo -L eca_r95ptot ${f} -duplicate,$(cdo ntime ${f}) q95_1.nc r95_1.nc

#2) like 1, with more operator chaining, same results
cdo -L eca_r95ptot ${f} -duplicate,$(cdo ntime ${f}) -ydaypctl,95 ${f} -ydaymin ${f} -ydaymax ${f} r95_2.nc

#3) Using total percentiles, which makes more sense to me... works, but results are very different to 1 and 2
cdo -L eca_r95ptot ${f} -duplicate,$(cdo ntime ${f}) -timpctl,95 ${f} -timmin ${f} -timmax ${f} r95_3.nc

#4) Doing stuff manually, without using the ECA operator, leads to identical results to 3
cdo timsum ${f} timsum.nc
cdo -L timpctl,95 ${f} -timmin ${f} -timmax ${f} q95_4.nc
cdo -L -timsum -mul -ge ${f} q95_4.nc ${f} total_ge_q95.nc
cdo -L mulc,100 -div total_ge_q95.nc timsum.nc r95_4.nc

#5) Like 4, with more operator chaining, identical results to 3
cdo -L timpctl,95 ${f} -timmin ${f} -timmax ${f} q95_5.nc
cdo -L mulc,100 -div -timsum -mul -ge ${f} q95_5.nc ${f} -timsum ${f} r95_5.nc

#6) Like 5, with more operator chaining, identical results to 3 and 4
cdo -L mulc,100 -div -timsum -mul -ge ${f} -duplicate,$(cdo ntime ${f}) -timpctl,95 ${f} -timmin ${f} -timmax ${f} ${f} -timsum ${f} r95_6.nc

Did I misunderstand the documentation, the meaning of the metric, or something else?

Thanks in advance.


Replies (9)

RE: eca_r95ptot wrong results if following documentation - Added by Adriano Fantini almost 6 years ago

I think this is a pretty serious issue, if I am not mistaken, but maybe I am. Is this a bug or not?

RE: eca_r95ptot wrong results if following documentation - Added by Ralf Mueller almost 6 years ago

hi Adriano!

so your question is how to use these operators correctly?

cheers
ralf

RE: eca_r95ptot wrong results if following documentation - Added by Ralf Mueller almost 6 years ago

some first thoughts by me:

  1. the definition can be found here: https://www.ecad.eu/images/indices/R95pTOT.png
  2. the difference of 2) and 3) is understandable since overall pctl gives a lower fraction that pctl wrt days. of cause the there are less days above the 95pct wrt to total max that wrt ydaymax.

bot 2 and 3 have their use - it depends on what you want to do with it. I have to admit that the given definition is not precise with that respect. Unfortunately the etccdi definition for R95pTOT at http://etccdi.pacificclimate.org/list_27_indices.shtml is similar

hth
ralf

RE: eca_r95ptot wrong results if following documentation - Added by Ralf Mueller almost 6 years ago

might be helpful to have a look into other climate indices packages

RE: eca_r95ptot wrong results if following documentation - Added by Adriano Fantini almost 6 years ago

Hi Ralf,
there are several points to be discussed here.

First, I think that the documentation's suggestion to use "yday" percentiles is wrong. I do not see the reasoning behind it (you would need at least 100 years of data for it to make sense, right? If you have 10 years, what would be the point of calculating the 95th percentile of 10 days?) nor I see how it can be supported by the definition. But I might be wrong, maybe there is something I am missing.

Second, the documentation should be updated in the fact that a "duplicate" is needed to make the command work... at least I cannot make it work otherwise!

I am currently using number 3) to calculate the R95_ptot.

RE: eca_r95ptot wrong results if following documentation - Added by Ralf Mueller almost 6 years ago

Adriano Fantini wrote:

Hi Ralf,
there are several points to be discussed here.

First, I think that the documentation's suggestion to use "yday" percentiles is wrong. I do not see the reasoning behind it (you would need at least 100 years of data for it to make sense, right? If you have 10 years, what would be the point of calculating the 95th percentile of 10 days?) nor I see how it can be supported by the definition. But I might be wrong, maybe there is something I am missing.

From the statistical point of view your argument is valid, but IMO not if seasonality is part of your data. Let's say for southern Europe, does it make sense to mix up summer and winter when it comes to rainfall? For Brasil that might not be important though - both value might converge.

The definition is based on daily data - your implementation can be done for any frequency. Keep in mind, that these indices are climate indices. They should be applied for data of 30 years or more - computing the climate of a five-year-dataset doesn't make any sense to me.

Second, the documentation should be updated in the fact that a "duplicate" is needed to make the command work... at least I cannot make it work otherwise!

Correct - this should be pointed out in the docu.

I am currently using number 3) to calculate the R95_ptot.

Again - it might be worth checking how other tool implement these indices since the definition is not 100% precise about that important detail.

hth
ralf

RE: eca_r95ptot wrong results if following documentation - Added by Adriano Fantini almost 6 years ago

Personally I divide the data into seasons and do the r95 separately on the four seasons (in some cases, I use months instead of seasons). As you say this is climate: as much as it does not make sense to calculate the r95 of 5 years, it does not make sense to calculate the r95 separately for, e.g., the 14 and the 15th of April, which is what the documentation currently seems to suggest. Especially since this is an extreme index.

RE: eca_r95ptot wrong results if following documentation - Added by Ralf Mueller almost 6 years ago

Adriano Fantini wrote:

Personally I divide the data into seasons and do the r95 separately on the four seasons (in some cases, I use months instead of seasons). As you say this is climate: as much as it does not make sense to calculate the r95 of 5 years, it does not make sense to calculate the r95 separately for, e.g., the 14 and the 15th of April, which is what the documentation currently seems to suggest. Especially since this is an extreme index.

I didn't mean to prefer one R95_ptot implementation over the other - I simply don't know which one is more along with the definition. The definition itself seems to be weak in that point. I had similar problems with other indices, too - like the growing-season-length. Its the definition is only reasonable for the northern hemisphere.

your argument regarding extreme conditions sounds pretty reasonable to me.

Here is another document on which this software is referring to. in the docu it's says something about percetiles on week days.

really - i am not the expert here ;-)

RE: eca_r95ptot wrong results if following documentation - Added by Adriano Fantini almost 6 years ago

None of the definitions above are specific as to what period to take as reference. I think the documentation could suggest something sensible, and in my opinion currently it does not.

    (1-9/9)