Incorrect number of missing values from infon
Added by Matt Thompson about 2 years ago
All,
I have a bit of an oddity. So a colleague was working on a new utility for our model and was using "infon" to make sure things looked right compared to an older utility. However, in doing so we seem to have found an oddity in the reported number of missing values.
For example, in his Fortran code he counted them and got--for the two variables in question--184367 missing values. But when we run cdo:
❯ cdo --version
Climate Data Operators version 2.1.0 (https://mpimet.mpg.de/cdo)
System: x86_64-apple-darwin21.6.0
CXX Compiler: mpic++ -pthread
CXX version : unknown
C Compiler: mpicc -g -O2 -pthread -pthread
C version : gcc (GCC) 12.2.0
F77 Compiler: mpifort -g -O2
F77 version : GNU Fortran (GCC) 12.2.0
Features: 32GB 16threads c++17 Fortran pthreads HDF5 NC4/HDF5 OPeNDAP sz udunits2 sse3
Libraries: yac/2.4.2 HDF5/1.10.9
CDI data types: SizeType=size_t
CDI file types: srv ext ieg nc1 nc2 nc4 nc4c nc5 nczarr
CDI library version : 2.1.0
cgribex library version : 2.0.2
NetCDF library version : 4.9.0 of Oct 31 2022 12:08:26 $
HDF5 library version : 1.10.9
exse library version : 1.4.2
FILE library version : 1.9.1
❯ cdo infon teland.nc4
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name
1 : 1995-01-01 01:30:00 0 259920 184367 : -1.8845e+09 8.3204e+07 1.0098e+09 : TELAND
2 : 1995-01-01 01:30:00 0 259920 184393 : -7.5591e+11 5.8734e+13 9.7276e+14 : Var_TELAND
cdo infon: Processed 519840 values from 2 variables over 1 timestep [0.03s 17MB]
it is reporting 184393 missing values for Var_TELAND. And nothing we could do in Fortran duplicated that. (And since Var_TELAND is the variance of TELAND, they probably should have the same number of missing values.) We tried all sorts of odd "floating point equality" tricks, but could never get 184393.
So, as a check, we turned to NCO:
❯ ncap2 -s "nmiss=TELAND.number_miss();nmiss_var=Var_TELAND.number_miss()" teland.nc4 out.nc4
❯ ncks -v nmiss,nmiss_var out.nc4
netcdf out {
variables:
uint64 nmiss ;
uint64 nmiss_var ;
data:
nmiss = 184367 ;
nmiss_var = 184367 ;
} // group /
and it too says both have 184367.
Now, in our code, the missing value is 1e+15, so all it takes is a little FP oddity and boom, not a "missing value".
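To make the floating-point concern concrete, here is a minimal sketch using only the Python standard library. It assumes, which the post does not state, that the field is stored as 32-bit floats; `to_float32` is a hypothetical helper written for this illustration, not part of CDO or NCO. It shows that 1e+15 is not exactly representable in single precision, so a stored fill value only compares equal to the constant if both sides are rounded the same way.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

fill64 = 1.0e15              # the constant as a 64-bit double
fill32 = to_float32(fill64)  # the same constant after a trip through 32 bits

print(fill32)                        # 999999986991104.0
print(fill32 == fill64)              # False: mixed-precision comparison fails
print(fill32 == to_float32(1.0e15))  # True: same rounding on both sides
```

A comparison done consistently in one precision is therefore safe; the pitfall is mixing a single-precision field with a double-precision constant.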
Are there perhaps options/settings we need to pass to CDO? Or perhaps we might be doing something wrong in our Fortran setting it?
I'm attaching the input file used above for your testing.
Thanks for any help,
Matt
teland.nc4 (2.01 MB): Input file for tests above
Replies (4)
RE: Incorrect number of missing values from infon - Added by Karin Meier-Fleischer about 2 years ago
Hi Matt,
I'm able to reproduce the same missing values count for both variables, too. I used Python3/xarray:
import xarray as xr
from cdo import Cdo

cdo = Cdo()
infile = 'teland.nc4'
ds = xr.open_dataset(infile)

var1 = ds.TELAND
size1 = var1.size
missing1 = size1 - var1.count().values
print(missing1)  # = 184367

var2 = ds.Var_TELAND
size2 = var2.size
missing2 = size2 - var2.count().values
print(missing2)  # = 184367

print(cdo.version())  # '2.0.5'
cdo.infon(input=infile)
['-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name',
 '1 : 1995-01-01 01:30:00 0 259920 184367 : -1.8845e+09 8.3204e+07 1.0098e+09 : TELAND',
 '2 : 1995-01-01 01:30:00 0 259920 184393 : -7.5591e+11 5.8734e+13 9.7276e+14 : Var_TELAND']
@Uwe: do you know why the number of missing values differ?
RE: Incorrect number of missing values from infon - Added by Matt Thompson about 2 years ago
Thing I just learned: There's a Python interface to CDO!
Thanks for the check, Karin!
RE: Incorrect number of missing values from infon - Added by Uwe Schulzweida about 2 years ago
The valid range of Var_TELAND is -1.e+15f to 1.e+15f. The maximum value of Var_TELAND is 3.6e+15 (result from ncview). So it seems that Var_TELAND contains some values outside the valid range. Since CDO evaluates the valid_range attribute, these values are also counted as missing values.
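If that is the whole story, the difference between the two counts (184393 - 184367 = 26) would be the number of Var_TELAND values that lie outside the valid range. The counting rule can be sketched in plain Python as below. This is an illustration of the idea only, not CDO's actual implementation; `count_missing` is a hypothetical helper and the sample values are made up.

```python
import math

def count_missing(values, fill_value, valid_range=None):
    """Count entries that are NaN, equal to fill_value, or outside valid_range."""
    lo, hi = valid_range if valid_range is not None else (-math.inf, math.inf)
    n = 0
    for v in values:
        if math.isnan(v) or v == fill_value or v < lo or v > hi:
            n += 1
    return n

# Made-up sample: two fill values and one value above the valid range.
fill = 1.0e15
sample = [2.5e13, fill, 3.6e15, -7.5e11, fill]

fill_only = count_missing(sample, fill)  # fill value only
with_range = count_missing(sample, fill, valid_range=(-1.0e15, 1.0e15))
print(fill_only, with_range)  # 2 3
```

To run the same check on the real file, the values would have to be read without automatic masking, and the range taken from the variable's valid_range attribute.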
RE: Incorrect number of missing values from infon - Added by Matt Thompson about 2 years ago
Uwe,
Interesting! That "valid_range" has always been an issue for me in that it's sort of not as enforced in our model as I hoped.
But you just gave us a good reason to start taking it seriously.