strange platform-dependent problem with cdo selindexbox

Added by Christian Stepanek almost 12 years ago

Hi,

I discovered a strange problem with the selindexbox operator that obviously is platform-dependent. The problem therefore might be related to issues with the netcdf library or compilation rather than the program code of the cdos. I still would like to give here a short description of the problem. Maybe someone has met a similar problem and/or has an idea on how to fix it.

I have tsurf output from ECHAM5 that surely does not contain any NaNs (see attached figure Matran_new_echam5_main_mm_610001_tsurf.nc.eps).
If I use the version 1.5.4 of the cdos on one of our solaris-servers to select the area around Greenland, suddenly NaNs appear (Matran_new_echam5_main_mm_610001_tsurf_selindexbox_solasrv4.nc.eps). On the frontend of our SX8 (using the same version of cdo - compiled for a different architecture - and the same input file) I end up with the desired NaN-free tsurf selection around Greenland (Matran_new_echam5_main_mm_610001_tsurf_selindexbox_tx7.nc.eps). The appearance of the NaN-bug on solasrv4 does not depend on the cdo version. Strangely though, I have used cdo on the solaris server for years now without major problems, and just now stumbled over this issue.

For several reasons it is problematic for me to do the selection on the SX8 frontend, so I am stuck with the Solaris server, on which the selection is obviously buggy. Is there any way of locating or fixing this issue?

Here is some information on the two cdo installations, retrieved by invoking "cdo -V".
on tx7 (not buggy):
Climate Data Operators version 1.5.4 (http://code.zmaw.de/projects/cdo)
Compiler: ecc -std=c99 -O -Onooverlap,restrict=all -pvctl,fullmsg,noassume,loopcnt=1000000 -pthread
version: NEC C Itanium(R) Compiler, Revision 5.5
with: PTHREADS Z
Compiled: by mwerner on sx8 (ia64-unknown-linux-gnu) Feb 23 2012 09:49:45
CDI library version : 1.5.4 of Feb 23 2012 09:48:16
CGRIBEX library version : 1.5.1 of Aug 29 2011 20:30:27
netCDF library version : "4.0" of Nov 4 2008 15:14:35 $
SERVICE library version : 1.3.0 of Feb 23 2012 09:47:22
EXTRA library version : 1.3.0 of Feb 23 2012 09:47:03
IEG library version : 1.3.0 of Feb 23 2012 09:47:18
FILE library version : 1.7.1 of Feb 23 2012 09:47:04

on solasrv4 (buggy):
Climate Data Operators version 1.5.4 (http://code.zmaw.de/projects/cdo)
Compiler: gcc -std=gnu99 -g -O2 -D_REENTRANT -pthreads
version: gcc (GCC) 4.6.2
with: PTHREADS NC4 SZ Z
Compiled: by wcohrs on filesrv3 (i386-pc-solaris2.10) Mar 14 2012 13:32:19
CDI library version : 1.5.4 of Mar 14 2012 13:32:11
CGRIBEX library version : 1.5.1 of Aug 29 2011 20:30:27
netCDF library version : 4.1.3 of Aug 31 2011 16:45:16 $
HDF5 library version : 1.8.7
SERVICE library version : 1.3.0 of Mar 14 2012 13:31:52
EXTRA library version : 1.3.0 of Mar 14 2012 13:31:47
IEG library version : 1.3.0 of Mar 14 2012 13:31:51
FILE library version : 1.7.1 of Mar 14 2012 13:31:47

Thanks a lot,
Christian


Replies (5)

RE: strange platform-dependent problem with cdo selindexbox - Added by Jaison-Thomas Ambadan almost 12 years ago

Hi Christian,

If you could upload a sample of your data, that would be great; it may help the CDO guys here to identify the problem a bit faster! If the file size is > 50MB, you may select a very small geographical region (1 or 2 time steps and levels) using "cdo -sellonlatbox".

Cheers,
J.

RE: strange platform-dependent problem with cdo selindexbox - Added by Jaison-Thomas Ambadan almost 12 years ago

Hi again,

Just one question: are you able to reproduce the problem with other data sets? For example:

cdo -f nc -selindexbox,10,20,10,20 -random,r360x180 outfile.nc

Cheers,
J.

RE: strange platform-dependent problem with cdo selindexbox - Added by Christian Stepanek almost 12 years ago

Dear Jaison-Thomas,

thanks a lot for taking the time.

I have attached the global tsurf-field (2D, one time step) that has been used for the generation of the "buggy" lon-lat-selection as illustrated above. There is no magic about this file, it is just a common tsurf output of a coupled cosmos-aso simulation that has been retrieved via standard post-processing.

As a reply to your last answer: Yes, I have tried another model output file of a different simulation. The result on the solaris server is the same - the NaN-pattern persists, and this is true for all contained 2D- and 3D-variables (resulting netcdf attached as well). The selindexbox on a random field though does not lead to the NaN problem (netcdf attached as well).

I also experimented with selecting different regions, and I can show that the occurrence of NaN cells depends on the selected region as well:
selindexbox,81,92,2,8 -> NaN pattern
selindexbox,80,92,2,19 -> NaN pattern
selindexbox,80,92,5,34 -> no NaN pattern
selindexbox,1,90,2,8 -> no NaN pattern
selindexbox,1,48,2,34 -> no NaN pattern
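
The region scan above can be scripted so that exactly the same boxes are run on both machines. A minimal Python sketch, assuming a placeholder input file name (tsurf.nc) and made-up helper names (selindexbox_cmd, box_outfile); it only prints the command lines and does not execute cdo:

```python
import shlex

# Index boxes reported in this thread, as (lon1, lon2, lat1, lat2) grid indices.
BOXES = [
    (81, 92, 2, 8),    # NaN pattern on solasrv4
    (80, 92, 2, 19),   # NaN pattern on solasrv4
    (80, 92, 5, 34),   # no NaN pattern
    (1, 90, 2, 8),     # no NaN pattern
    (1, 48, 2, 34),    # no NaN pattern
]


def selindexbox_cmd(box, infile, outfile):
    """Return the argument list for one `cdo selindexbox` call."""
    idx = ",".join(str(i) for i in box)
    return ["cdo", "-f", "nc", f"selindexbox,{idx}", infile, outfile]


def box_outfile(prefix, box):
    """Build an output file name that encodes the box indices."""
    return f"{prefix}_{'_'.join(str(i) for i in box)}.nc"


# Print one command per box; "tsurf.nc" is a placeholder input name.
for box in BOXES:
    cmd = selindexbox_cmd(box, "tsurf.nc", box_outfile("tsurf_box", box))
    print(shlex.join(cmd))
```

Running the printed commands on tx7 and on solasrv4 and comparing the outputs box by box would show which index ranges trigger the NaNs.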

That all looks very strange to me. If I can help with any additional information, please do not hesitate to ask.

Christian

Attachments:
Matran_new_echam5_main_mm_610001_tsurf.nc (20.7 KB): global tsurf field used as input file for the selindexbox operator
Matran_echam5_main_mm_210001_all_codes_selindexbox_solasrv4.nc (134 KB): result of selindexbox,81,92,2,8 on another model output file containing multiple variables (2D/3D)
random_selindexbox.nc (1.21 KB): result of: cdo -f nc -selindexbox,81,92,2,8 -random,r360x180 random_selindexbox.nc

RE: strange platform-dependent problem with cdo selindexbox - Added by Jaison-Thomas Ambadan almost 12 years ago

Hi Christian,

Unfortunately I do not have a definite answer for this problem. Since you get the correct results with random numbers, my gut feeling says the problem has to do with the NetCDF attributes related to the _FillValue/missing_value of the variables (even though the data does not necessarily contain missing values). But the question remains why it gave the correct result on the other platform (probably it has to do with the NetCDF versions).

Nevertheless, I have a few suggestions:

It seems that you have merged two original ECHAM GRIB data files and then converted them to NetCDF.

1. First I would say try the same (selindexbox) on Solaris without converting to NetCDF, and convert the file on the other platform - if you get the correct result, then it is definitely the NetCDF library on Solaris

2. On Solaris, before merging the files, it might be better if you set the missing value attribute, and then convert to netcdf, i.e.

cdo -f nc -merge -setmissval,-9999.9 file01.nc -setmissval,-9999.9 file02.nc outfile.nc

then do the next step "selindexbox"

3. OR you may try the "setmissval" before "selindexbox" on your existing merged input file.
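
Suggestions 1 and 3 can be written out as command lines. A rough Python sketch that only prints the commands and does not run cdo; the file names and helper names are placeholders, the missing value -9999.9 is taken from suggestion 2, and using the `copy` operator with `-f nc` for the GRIB-to-NetCDF step is an assumption about how the conversion would be done:

```python
import shlex

BOX = (81, 92, 2, 8)  # one of the boxes that produced NaNs on solasrv4


def box_arg(box):
    """Format the selindexbox operator with its four index arguments."""
    return "selindexbox," + ",".join(str(i) for i in box)


def grib_first_cmds(box, grib_in, grib_out, nc_out):
    """Suggestion 1: select on GRIB, convert to NetCDF afterwards."""
    select = ["cdo", box_arg(box), grib_in, grib_out]        # run on Solaris
    convert = ["cdo", "-f", "nc", "copy", grib_out, nc_out]  # run on the other platform
    return [select, convert]


def setmissval_first_cmd(box, missval, nc_in, nc_out):
    """Suggestion 3: chain setmissval before selindexbox in one call."""
    # In a cdo operator chain the rightmost operator is applied to the input first.
    return ["cdo", "-f", "nc", "-" + box_arg(box),
            f"-setmissval,{missval}", nc_in, nc_out]


# Placeholder file names; replace with the real merged input.
for cmd in grib_first_cmds(BOX, "merged_input.grb", "box.grb", "box.nc"):
    print(shlex.join(cmd))
print(shlex.join(setmissval_first_cmd(BOX, -9999.9, "merged_input.nc", "box_missval.nc")))
```

If the GRIB-first route yields a NaN-free box while the chained NetCDF route does not, that would point at the NetCDF side on Solaris rather than at selindexbox itself.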

I'm not sure if this helps; maybe Uwe or Ralf can give you more info/help.

Cheers,
J.

RE: strange platform-dependent problem with cdo selindexbox - Added by Christian Stepanek almost 12 years ago

Hi Jaison-Thomas,

I will try those procedures - thank you very very much for your help. In case the procedures don't work, I will get back to you, Uwe or Ralf.

Cheers,
Christian
