Bizarre behaviour when using 'timpctl' with large bin number

Added by Joshua Miller 15 days ago

Hello, I am computing the median of a very large dataset (150 GB) along the time dimension for each lat/lon coordinate. I am using CDO's 'timpctl' to do this.

The commands I ran were:

export CDO_PCTL_NBINS=101
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins101.nc

export CDO_PCTL_NBINS=1001
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins1001.nc

export CDO_PCTL_NBINS=10001
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins10001.nc

export CDO_PCTL_NBINS=100001
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins100001.nc

I have attached a figure showing what happens. To summarize: as the number of bins increases, the median shrinks and eventually becomes negative. The 99.99th percentile, however, remains stable across all bin numbers.

I created min.nc and max.nc using 'cdo timmin input_file.nc min.nc' and 'cdo timmax input_file.nc max.nc'.

I manually verified using Python that the smallest value in min.nc is 0, and the maximum in max.nc is 127.35. Therefore it should be impossible to have a negative median value, so I am reaching out to see what could be the cause of these results.


Replies (1)

RE: Bizarre behaviour when using 'timpctl' with large bin number - Added by Uwe Schulzweida 12 days ago

Dear Joshua,

Thanks for this report! This problem is caused by a short integer overflow. So the maximum value for CDO_PCTL_NBINS is 32768. We will fix this bug in the next CDO release.

Cheers,
Uwe
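
Until the fixed release is available, one possible workaround is to cap the bin count at the limit quoted above before calling 'timpctl'. The following is only a minimal sketch: clamp_nbins is a hypothetical helper and not part of CDO, and the value 32768 is taken directly from the reply rather than verified against the CDO source.

```shell
#!/bin/sh
# Upper limit quoted in the reply above; assumed to hold until the bug is fixed.
MAX_NBINS=32768

# clamp_nbins is a hypothetical helper, not part of CDO.
# It echoes the requested bin count, capped at MAX_NBINS.
clamp_nbins() {
    requested=$1
    if [ "$requested" -gt "$MAX_NBINS" ]; then
        echo "$MAX_NBINS"
    else
        echo "$requested"
    fi
}

# Request 100001 bins, but export at most MAX_NBINS.
CDO_PCTL_NBINS=$(clamp_nbins 100001)
export CDO_PCTL_NBINS
echo "$CDO_PCTL_NBINS"   # prints 32768

# The percentile call itself is unchanged (file names as in the original post):
# cdo timpctl,50 input_file.nc min.nc max.nc median.nc
```

Values at or below the limit pass through unchanged, so the same wrapper can be used for the smaller bin counts in the original post.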
