Bizarre behaviour when using 'timpctl' with large bin number
Added by Joshua Miller 15 days ago
Hello, I am computing the median along the time dimension at each lat/lon point of a very large dataset (150 GB), using CDO's 'timpctl'.
The commands I ran were:
export CDO_PCTL_NBINS=101
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins101.nc
export CDO_PCTL_NBINS=1001
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins1001.nc
export CDO_PCTL_NBINS=10001
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins10001.nc
export CDO_PCTL_NBINS=100001
cdo timpctl,50 input_file.nc min.nc max.nc median_nbins100001.nc
I have attached a figure showing what happens. To summarize: as the number of bins increases, the median shrinks and eventually becomes negative, while the 99.99th percentile stays stable across all bin numbers.
I created min.nc and max.nc with 'cdo timmin input_file.nc min.nc' and 'cdo timmax input_file.nc max.nc'.
I verified in Python that the smallest value in min.nc is 0 and the largest value in max.nc is 127.35. Since every value lies in [0, 127.35], a negative median should be impossible, so I am asking what could cause these results.
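For reference, the idea behind a histogram-based percentile can be sketched for a single grid point. CDO's documentation describes timpctl as building a histogram between the bounds given in the second and third input files (here min.nc and max.nc), with CDO_PCTL_NBINS setting the number of bins. The pure-Python function below is a simplified, hypothetical model of that idea (equal-width bins, bin midpoint returned as the estimate); the name hist_percentile and the midpoint rule are assumptions, not CDO's actual code. Because Python integers cannot overflow, the estimate always stays inside [min, max], and its resolution is one bin width, (max - min) / nbins.

```python
import statistics


def hist_percentile(values, pct, vmin, vmax, nbins=101):
    """Approximate the pct-th percentile of `values` from an nbins histogram on [vmin, vmax].

    Hypothetical simplified model: equal-width bins, and the estimate is the midpoint
    of the first bin where the cumulative count reaches the target rank.
    """
    if vmax <= vmin:
        return vmin
    counts = [0] * nbins  # Python ints are unbounded, so counts cannot overflow
    width = (vmax - vmin) / nbins
    for v in values:
        idx = int((v - vmin) / width)
        idx = min(max(idx, 0), nbins - 1)  # put v == vmax into the last bin
        counts[idx] += 1
    target = pct / 100 * len(values)
    cumulative = 0
    for idx, c in enumerate(counts):
        cumulative += c
        if cumulative >= target:
            return vmin + (idx + 0.5) * width  # midpoint lies strictly inside [vmin, vmax]
    return vmax


if __name__ == "__main__":
    vals = [0.0, 3.2, 7.5, 12.0, 127.35]  # toy time series for one grid point
    for n in (101, 1001, 10001, 100001):
        print(n, statistics.median(vals), hist_percentile(vals, 50, min(vals), max(vals), n))
```

With a correct implementation, raising nbins only refines the estimate; it can never push the median outside the [min, max] bounds, which is why the negative values in the report point to a bug rather than to the method itself.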
Replies (1)
RE: Bizarre behaviour when using 'timpctl' with large bin number - Added by Uwe Schulzweida 12 days ago
Dear Joshua,
Thanks for this report! The problem is caused by a short integer overflow, so the maximum usable value for CDO_PCTL_NBINS is currently 32768. We will fix this bug in the next CDO release.
Cheers,
Uwe
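The reply does not say which variable overflows. Assuming (this is not confirmed in the thread) that it is a bin index held in a signed 16-bit C short, indices 0 to 32767 fit, which gives exactly 32768 bins and matches the stated limit. The sketch below uses ctypes.c_short to mimic that storage; the helper names as_int16 and safe_nbins are hypothetical, and safe_nbins simply caps a requested CDO_PCTL_NBINS at the limit from the reply.

```python
import ctypes

MAX_NBINS = 32768  # limit stated in the reply; indices 0..32767 fit in a signed short


def as_int16(n: int) -> int:
    """Return n as it reads back after being stored in a signed 16-bit C short."""
    return ctypes.c_short(n).value  # ctypes truncates silently instead of raising


def safe_nbins(requested: int) -> int:
    """Hypothetical guard: cap a requested CDO_PCTL_NBINS at the stated maximum."""
    return min(requested, MAX_NBINS)


if __name__ == "__main__":
    for n in (101, 1001, 10001, 100001):
        last_index = n - 1  # highest bin index for n bins
        print(n, "last index read back as", as_int16(last_index), "capped nbins:", safe_nbins(n))
```

Under this assumption, 100001 bins would need an index of 100000, which wraps to a negative number when stored in a short, so values land in the wrong bins and the percentile can fall outside the min/max bounds.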