Incorrect number of missing values from infon

Added by Matt Thompson about 2 years ago

All,

I have a bit of an oddity. A colleague was working on a new utility for our model and was using "infon" to check that its output matched that of an older utility. In doing so, we seem to have found a discrepancy in the reported number of missing values.

For example, in his Fortran code he counted them and got 184367 missing values for each of the two variables in question. But when we run cdo:

❯ cdo --version
Climate Data Operators version 2.1.0 (https://mpimet.mpg.de/cdo)
System: x86_64-apple-darwin21.6.0
CXX Compiler: mpic++   -pthread
CXX version : unknown
C Compiler: mpicc -g -O2  -pthread -pthread
C version : gcc (GCC) 12.2.0
F77 Compiler: mpifort -g -O2
F77 version : GNU Fortran (GCC) 12.2.0
Features: 32GB 16threads c++17 Fortran pthreads HDF5 NC4/HDF5 OPeNDAP sz udunits2 sse3
Libraries: yac/2.4.2 HDF5/1.10.9
CDI data types: SizeType=size_t
CDI file types: srv ext ieg nc1 nc2 nc4 nc4c nc5 nczarr
     CDI library version : 2.1.0
 cgribex library version : 2.0.2
  NetCDF library version : 4.9.0 of Oct 31 2022 12:08:26 $
    HDF5 library version : 1.10.9
    exse library version : 1.4.2
    FILE library version : 1.9.1

❯ cdo infon teland.nc4
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 1995-01-01 01:30:00       0   259920  184367 : -1.8845e+09  8.3204e+07  1.0098e+09 : TELAND
     2 : 1995-01-01 01:30:00       0   259920  184393 : -7.5591e+11  5.8734e+13  9.7276e+14 : Var_TELAND
cdo    infon: Processed 519840 values from 2 variables over 1 timestep [0.03s 17MB]

it is reporting 184393 missing values for Var_TELAND. And nothing we could do in Fortran duplicated that. (And since Var_TELAND is the variance of TELAND, they probably should have the same number of missing values.) We tried all sorts of odd "floating point equality" tricks, but could never get 184393.

So, as a check, we turned to NCO:

❯ ncap2 -s "nmiss=TELAND.number_miss();nmiss_var=Var_TELAND.number_miss()" teland.nc4 out.nc4
❯ ncks -v nmiss,nmiss_var out.nc4
netcdf out {
  variables:
    uint64 nmiss ;

    uint64 nmiss_var ;

  data:
    nmiss = 184367 ;

    nmiss_var = 184367 ;

} // group /

and it too says both have 184367.

Now, in our code, the missing value is 1e+15, so all it takes is a little FP oddity and boom, not a "missing value".
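As a small illustration of the kind of FP oddity meant here (a sketch only, assuming the fill value is written as single precision and then compared against a double-precision 1e+15; the names fill64 and fill32 are made up for this example and are not from our model):

```python
import numpy as np

# Hypothetical names: the same nominal missing value in two precisions.
fill64 = 1.0e15               # double-precision constant
fill32 = np.float32(fill64)   # the same constant rounded to single precision

# 1e15 is not exactly representable in single precision, so widening the
# rounded value back to double does not recover the original constant.
same = float(fill32) == fill64
print(same)                     # False
print(float(fill32) - fill64)   # rounding error introduced by float32
```

So an exact equality test against 1e+15 can silently fail if the two sides were not rounded the same way.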

Are there perhaps options/settings we need to pass to CDO? Or perhaps we might be doing something wrong in our Fortran setting it?

I'm attaching the input file used above for your testing.

Thanks for any help,
Matt

Attachment: teland.nc4 (2.01 MB), input file for tests above

Replies (4)

RE: Incorrect number of missing values from infon - Added by Karin Meier-Fleischer about 2 years ago

Hi Matt,

I'm able to reproduce the same missing values count for both variables, too. I used Python3/xarray:

import xarray as xr
from cdo import Cdo
cdo = Cdo()

infile = 'teland.nc4'
ds = xr.open_dataset(infile)
var1 = ds.TELAND
size1 = var1.size
missing1 = size1 - var1.count().values
print(missing1)    # = 184367

var2 = ds.Var_TELAND
size2 = var2.size
missing2 = size2 - var2.count().values
print(missing2)   # = 184367

print(cdo.version())   # '2.0.5'

cdo.infon(input=infile)
['-1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name',
 '1 : 1995-01-01 01:30:00       0   259920  184367 : -1.8845e+09  8.3204e+07  1.0098e+09 : TELAND',
 '2 : 1995-01-01 01:30:00       0   259920  184393 : -7.5591e+11  5.8734e+13  9.7276e+14 : Var_TELAND']

@Uwe: do you know why the number of missing values differs?

RE: Incorrect number of missing values from infon - Added by Matt Thompson about 2 years ago

Thing I just learned: There's a Python interface to CDO!

Thanks for the check, Karin!

RE: Incorrect number of missing values from infon - Added by Uwe Schulzweida about 2 years ago

The valid range of Var_TELAND is -1.e+15f to 1.e+15f. The maximum value of Var_TELAND is 3.6e+15 (result from ncview). So it seems that Var_TELAND contains some values outside the valid range. Since CDO evaluates the valid_range attribute, these values are also counted as missing values.
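The effect can be sketched with NumPy on a small synthetic array. This is only an illustration of the counting rule described above, not CDO's actual implementation; the function count_missing and the sample data are invented for the example. If the explanation holds for teland.nc4, the difference 184393 - 184367 = 26 would be the number of out-of-range values in Var_TELAND.

```python
import numpy as np

def count_missing(values, fill_value, valid_range=None):
    """Count entries equal to fill_value and, if a valid_range is given,
    also entries that fall outside that range."""
    values = np.asarray(values)
    missing = values == fill_value
    if valid_range is not None:
        lo, hi = valid_range
        missing |= (values < lo) | (values > hi)
    return int(missing.sum())

# Synthetic data: two fill values and two values outside the valid range.
fill = np.float32(1e15)
data = np.array([1.0, 2.5, fill, fill, 3.6e15, -2.0e15], dtype=np.float32)

n_fill_only = count_missing(data, fill)                  # fill values only
n_with_range = count_missing(data, fill, (-fill, fill))  # plus out-of-range
print(n_fill_only)    # 2
print(n_with_range)   # 4
```

A counter that only compares against the fill value gives the smaller number, while one that also applies the valid range gives the larger number.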

RE: Incorrect number of missing values from infon - Added by Matt Thompson about 2 years ago

Uwe,

Interesting! That "valid_range" has always been an issue for me in that it's sort of not as enforced in our model as I hoped.

But you just gave us a good reason to start taking it seriously.
