Erroneous behavior of cdo

Added by Stefan Hagemann 5 months ago

Hi
I generated a file with a test version of the HD model, for which cdo reports:
cdo infov error_in_cdoinfo.nc
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1979-01-16 00:00:00       0   518400       0 :      0.0000         nan      2.7366 : friv_phosphorus

However, when I look at the file with ncview, it looks fine, showing values of 5, 10 and up to ca. 367.
Alberto used ferret and calculated a maximum value of 367.6, but cdo fldmax yields 2.7366.
Older CDO versions on levante seem to show the same. However, on the Hereon cluster strand I get:
cdo infov error_in_cdoinfo.nc
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1979-01-16 00:00:00       0   518400       0 :      0.0000         nan      367.61 : friv_phosphorus

There seems to be a bug, at least in the CDO versions on levante.
The file is attached and located on levante at:
/scratch/g/g260122/error/error_in_cdoinfo.nc


Replies (6)

RE: Erroneous behavior of cdo - Added by Uwe Schulzweida 5 months ago

Hi Stefan,

The file contains missing values and the appropriate _FillValue=NaN attribute is missing. If the _FillValue attribute is missing, CDO assumes that there are no missing values in the data. If there are still NaNs in the data, this can lead to incorrect results.

cdo infon -setmissval,nan error_in_cdoinfo.nc
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1979-01-16 00:00:00       0   518400  131234 :      0.0000     0.66334      367.61 : friv_phosphorus

Cheers,
Uwe
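The effect Uwe describes, NaNs in data that CDO treats as valid leading to wrong results, can be illustrated with a running minimum/maximum that does no missing-value handling. This is a hypothetical Python sketch of one plausible mechanism, not CDO's actual fldmax code, and the data values are made up to mimic the numbers in this thread. Every comparison involving NaN is false, so with the update written as `current if current > v else v`, a NaN replaces the running maximum and the next finite value then replaces the NaN. The result is only the maximum of the values after the last NaN, while the sum, and hence the mean, becomes NaN.

```python
import math


def naive_field_stats(values):
    """Min/mean/max with no missing-value handling (hypothetical sketch)."""
    lo = hi = values[0]
    total = 0.0
    for v in values:
        # Any comparison with NaN is False, so a NaN falls through to the
        # else branch and replaces the running min/max; the next finite
        # value then replaces the NaN in the same way.
        lo = lo if lo < v else v
        hi = hi if hi > v else v
        total += v  # NaN + x is NaN, so a single NaN poisons the mean
    return lo, total / len(values), hi


# Made-up field: the true maximum (367.61) occurs before the NaN.
field = [5.0, 367.61, float("nan"), 1.0, 2.7366, 0.0]
lo, mean, hi = naive_field_stats(field)
print(lo, mean, hi)  # 0.0 nan 2.7366
```

Whether the reported maximum is wrong therefore depends on where the NaNs sit in the field, which fits a maximum that is plausible-looking but far too small.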

RE: Erroneous behavior of cdo - Added by Stefan Hagemann 5 months ago

Hi Uwe,
thanks for the info. I suggest making the fldmin and fldmax functions (and hence cdo info) ignore NaNs. Then one can still see the NaNs in the mean, but does not think that the whole array is garbage, because the maximum is stated correctly.
Best regards
Stefan
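The suggestion above can be sketched as follows (hypothetical Python, not CDO code; the data are made up): the minimum and maximum skip NaNs, while the mean is still taken over all values, so a NaN there continues to flag the problem.

```python
import math


def suggested_field_stats(values):
    """Min/mean/max as suggested: min and max skip NaNs, the mean does not.

    Hypothetical sketch, not CDO's implementation.
    """
    valid = [v for v in values if not math.isnan(v)]
    lo = min(valid) if valid else math.nan
    hi = max(valid) if valid else math.nan
    # The mean is still computed over all values, so any NaN shows up here.
    mean = sum(values) / len(values)
    return lo, mean, hi


# Made-up field with one NaN after the true maximum.
field = [5.0, 367.61, float("nan"), 1.0, 2.7366, 0.0]
lo, mean, hi = suggested_field_stats(field)
print(lo, mean, hi)  # 0.0 nan 367.61
```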

RE: Erroneous behavior of cdo - Added by Ralf Mueller 5 months ago

Prepending -setmissval,nan does exactly this, doesn't it? You can prepend it to any operator you want.

RE: Erroneous behavior of cdo - Added by Stefan Hagemann 5 months ago

Hi Ralf
in principle, yes. However, the NaNs were not intended, and because the maximum reported by 'cdo info' was much too small, I thought that the whole field was erroneous. Hence, I assumed that something in my program was fundamentally wrong and searched for the error for a while. (Once I knew that the data were fine apart from a few NaNs caused by NaNs in the input data, the solution was quick.)
As a user, I expected the fldmax function to deliver the correct maximum, not a wrong one, regardless of whether there are NaNs or not.

RE: Erroneous behavior of cdo - Added by Ralf Mueller 5 months ago

Moin Stefan!
If NaN is not desired in your case, you might declare the _FillValue or missing_value attribute accordingly, and CDO can figure it out without any extra command-line input. As an analysis tool, CDO has to rely on some data format conventions; otherwise NetCDF as a file format is too general to write any reasonable operation for climate/NWP. That's why the CF conventions handle this, as Uwe pointed out before.

I suggest you add this to the HD model and preprocess all relevant files that you already have with cdo -setmissval,nan .... You could also do this with ncatted and change the original files in place instead of creating new ones (if the definition part of your files is big enough).
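The convention-based approach, where a reader decides what counts as missing from the declared attributes, can be sketched like this. It is a simplified, hypothetical Python illustration that models variable attributes as a plain dict; it is not how CDO is implemented. The missing value is taken from `_FillValue` or `missing_value`; preferring `_FillValue` when both exist is an assumption of this sketch. If neither is declared, every value counts as valid, which is why undeclared NaNs go unnoticed. Because NaN never compares equal to anything, including itself, a NaN fill value needs an explicit `isnan` test instead of `==`.

```python
import math


def missing_value(attrs):
    """Return the declared missing value, or None if none is declared.

    Assumption of this sketch: _FillValue takes precedence over missing_value.
    """
    for name in ("_FillValue", "missing_value"):
        if name in attrs:
            return attrs[name]
    return None


def is_missing(v, missval):
    if missval is None:
        return False  # nothing declared: treat every value as valid
    if math.isnan(missval):
        return math.isnan(v)  # NaN == NaN is False, so test explicitly
    return v == missval


def count_missing(values, attrs):
    missval = missing_value(attrs)
    return sum(1 for v in values if is_missing(v, missval))


field = [5.0, float("nan"), 1.0, float("nan")]
print(count_missing(field, {}))  # 0: no attribute, the NaNs go unnoticed
print(count_missing(field, {"_FillValue": float("nan")}))  # 2
```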

RE: Erroneous behavior of cdo - Added by Stefan Hagemann 5 months ago

Hi Ralf
there is a misunderstanding. This has nothing to do with the HD model itself. The NaNs in the output were an error, and I removed them by smoothing the input data over lakes to remove the NaNs there.
I just suggested what, from my point of view, is a minor improvement: cdo should not report a wrong 'fldmax' just because there are NaNs in the field, but rather the correct maximum, or else an error message and no maximum (or NaN as the maximum). This wrong maximum misled me in my search for the error and may mislead others, too. If you do not want to adopt my suggestion, I am ok with that.
