CDO computes different results on two different machines from same file

Added by Luca Lelli about 2 years ago

Hello forum,
I have a possibly worrisome problem. I have created a netCDF stack with

cdo -b F32 -z zip4 setgrid,modis.des -copy ./MOD08_D3_netcdf/*.nc modis_gome2a_20070123_20211115.nc

The content is thus a time series of lon-lat grids:

netcdf modis_gome2a_20070123_20211115 {
dimensions:
        time = UNLIMITED ; // (2902 currently)
        lon = 241 ;
        lat = 121 ;
variables:
        double time(time) ;
                time:standard_name = "time" ;
                time:units = "days since 2007-01-01 00:00:00" ;
                time:calendar = "proleptic_gregorian" ;
                time:axis = "T" ;
        double lon(lon) ;
                lon:standard_name = "longitude" ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
        double lat(lat) ;
                lat:standard_name = "latitude" ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
        float Water_Vapor_Near_Infrared_Clear_Mean(time, lat, lon) ;
                Water_Vapor_Near_Infrared_Clear_Mean:long_name = "Water vapor near infrared - clear column (bright land and ocean sunglint only): Mean" ;
                Water_Vapor_Near_Infrared_Clear_Mean:units = "cm" ;
                Water_Vapor_Near_Infrared_Clear_Mean:_FillValue = -9999.f ;
                Water_Vapor_Near_Infrared_Clear_Mean:missing_value = -9999.f ;

When I issue the following command, I get different results on different machines. The same happens with timmean and other operators, but not with infon.
For instance, on machine 1:

cdo outputf,%5.3f,1 -fldmean -seltimestep,1,2,3 modis_gome2a_20070123_20211115.nc

cdo(1) fldmean: Process started
cdo(2) seltimestep: Process started
20.802              
14.097
15.187
cdo(2) seltimestep: Processed 87483 values from 1 variable over 4 timesteps
cdo(1) fldmean: Processed 87483 values from 1 variable over 3 timesteps
cdo    outputf: Processed 3 values from 1 variable over 3 timesteps [0.17s 40MB]

while on machine 2

cdo(1) fldmean: Process started
cdo(2) seltimestep: Process started
  nan               
  nan
  nan
cdo(2) seltimestep: Processed 87483 values from 1 variable over 4 timesteps.
cdo(1) fldmean: Processed 87483 values from 1 variable over 3 timesteps.
cdo    outputf: Processed 3 values from 1 variable over 3 timesteps [0.01s 37MB].

Operators like setrtomiss and setvrange make no difference to the results.
I am puzzled: why should CDO compute different results from exactly the same file, or treat NaNs differently on different machines?

On machine 1 I have the following CDO installation

Climate Data Operators version 2.1.0 (https://mpimet.mpg.de/cdo)
System: x86_64-conda-linux-gnu
CDI data types: SizeType=size_t
CDI file types: srv ext ieg grb1 grb2 nc1 nc2 nc4 nc4c nc5 nczarr 
     CDI library version : 2.1.0
 cgribex library version : 2.0.2
 ecCodes library version : 2.27.0
  NetCDF library version : 4.8.1 of Aug 13 2022 00:35:58 $
    HDF5 library version : 1.12.2 threadsafe
    exse library version : 1.4.2
    FILE library version : 1.9.1

On machine 2 I have the following CDO installation

Climate Data Operators version 2.0.5 (https://mpimet.mpg.de/cdo)
System: x86_64-conda-linux-gnu
CDI data types: SizeType=size_t  DateType=int64_t
CDI file types: srv ext ieg grb1 grb2 nc1 nc2 nc4 nc4c nc5 
     CDI library version : 2.0.5
 cgribex library version : 2.0.1
 ecCodes library version : 2.26.0
  NetCDF library version : 4.8.1 of Apr 25 2022 17:43:42 $
    hdf5 library version : 1.12.1 threadsafe
    exse library version : 1.4.2
    FILE library version : 1.9.1

If any good Samaritan wants to take a look, I attach the first three time steps of the stack. For the time being I run the computations on the machine that gives me real results rather than NaNs, but it is not a good feeling when the numerics behave differently across platforms.

Thanks and cheers
Luca

modis-t123.nc (374 KB): First three timesteps of a bigger stack

Replies (3)

RE: CDO computes different results on two different machines from same file - Added by Uwe Schulzweida about 2 years ago

Hello Luca

The missing value in the data is -9999. In addition, some values in the data are NaN. Such undefined values lead to undefined results in calculations; CDO handles NaNs correctly only if NaN is the declared missing value.
A workaround for this case is to either convert all NaNs to missing values

cdo outputf,%5.3f,1 -fldmean -seltimestep,1,2,3 -setctomiss,nan modis_gome2a_20070123_20211115.nc
or set the missing value to NaN
cdo outputf,%5.3f,1 -fldmean -seltimestep,1,2,3 -setmissval,nan modis_gome2a_20070123_20211115.nc
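
To illustrate why a few stray NaNs turn the whole field mean into nan, and what the two workarounds change, here is a minimal sketch in plain Python. It is not CDO code: the helper `field_mean` and the sample values are made up for illustration, and the mean is unweighted, so it ignores any area weighting.

```python
import math

MISSVAL = -9999.0  # missing value declared in the file (_FillValue)

def field_mean(values, missval):
    """Unweighted mean that skips entries equal to missval (hypothetical helper)."""
    if math.isnan(missval):
        # NaN never compares equal to itself, so it needs an explicit isnan test.
        valid = [v for v in values if not math.isnan(v)]
    else:
        valid = [v for v in values if v != missval]
    return sum(valid) / len(valid) if valid else missval

# Made-up field: two valid values, one declared missing value, one stray NaN.
field = [2.0, MISSVAL, math.nan, 4.0]

# NaN is not the declared missing value, so it enters the sum and poisons it.
naive = field_mean(field, MISSVAL)

# Analogue of -setctomiss,nan: turn every NaN into the missing value first.
converted = [MISSVAL if math.isnan(v) else v for v in field]
fixed_a = field_mean(converted, MISSVAL)

# Analogue of -setmissval,nan: make NaN the missing value everywhere.
renamed = [math.nan if v == MISSVAL else v for v in field]
fixed_b = field_mean(renamed, math.nan)

print(naive, fixed_a, fixed_b)
```

In both workarounds the NaN cells end up excluded from the sum, which is why they give the same mean.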
Cheers, Uwe

RE: CDO computes different results on two different machines from same file - Added by Luca Lelli about 2 years ago

Hello Uwe,

thanks. I confirm that both approaches deliver the correct and expected outcome.

Just out of curiosity, I wonder why nan is interpreted as missing_value on machine 1 and not on machine 2.

Cheers
Luca

RE: CDO computes different results on two different machines from same file - Added by Uwe Schulzweida about 2 years ago

I was just wondering that myself. For performance reasons, we have separate branches for calculations: one for data with missing values and one for data without. Since CDO version 2.0.6 there is an additional branch for data with missing values where the missing value is not NaN. This branch seems to give correct results for the present case, but I would see that more as a coincidence.
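
The idea of separate calculation branches can be sketched roughly as follows. This is only an illustration in Python with hypothetical function names, not CDO's actual implementation. It shows one plausible reason to specialise: a NaN missing value needs an `isnan` test because NaN never compares equal to itself, whereas an ordinary missing value can be checked with a plain comparison.

```python
import math

def mean_no_missing(values):
    # Fast path: the field is known to contain no missing values.
    return sum(values) / len(values)

def mean_missing_plain(values, missval):
    # Missing value is an ordinary number, so a plain comparison is enough.
    valid = [v for v in values if v != missval]
    return sum(valid) / len(valid) if valid else missval

def mean_missing_nan_safe(values, missval):
    # Missing value may be NaN: NaN != NaN is always true, so test with isnan.
    def is_missing(v):
        return v == missval or (math.isnan(v) and math.isnan(missval))
    valid = [v for v in values if not is_missing(v)]
    return sum(valid) / len(valid) if valid else missval

def mean_dispatch(values, missval, has_missing):
    # Pick a branch from the metadata, not from the data values themselves.
    if not has_missing:
        return mean_no_missing(values)
    if not math.isnan(missval):
        return mean_missing_plain(values, missval)
    return mean_missing_nan_safe(values, missval)
```

In this sketch a stray NaN still produces nan in every branch when the missing value is -9999, so it does not reproduce the behaviour seen on machine 1; it only shows how a result can depend on which branch is taken.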
