Area weighted average
Added by Lina Teckentrup almost 5 years ago
Hi,
I'm trying to calculate area weighted averages with CDO. For various reasons I need to calculate the average using the fldsum operator (which is not area weighted, right?), so here is what I did:
cdo -fldsum -mul prec_1990-2000_CRUNCEP.nc -gridarea prec_1990-2000_CRUNCEP.nc prec_1990-2000_CRUNCEP_fldsum.nc
cdo -fldsum -gridarea prec_1990-2000_CRUNCEP.nc gridarea_sum.nc
cdo div prec_1990-2000_CRUNCEP_fldsum.nc gridarea_sum.nc prec_1990-2000_CRUNCEP_fldsum_mean.nc
In my head this should equal the result of
cdo fldmean prec_1990-2000_CRUNCEP.nc prec_1990-2000_CRUNCEP_fldmean.nc
but it doesn't. The two results differ by a factor of exactly 1.9777444762683 for all timesteps, independent of the temporal resolution or the length of the time series:
cdo div prec_1990-2000_CRUNCEP_fldmean.nc prec_1990-2000_CRUNCEP_fldsum_mean.nc div.nc
I attached all the files. I've been racking my brain, but I can't see where I'm going wrong. I'm using Climate Data Operators version 1.8.0.
Thanks a lot!
Lina
Replies (1)
RE: Area weighted average - Added by Ralf Mueller almost 5 years ago
hi Lina!
I think the difference is a result of missing values in your input data:
- fldsum and fldmean leave the missing values out of the computation
- gridarea does not take the data into account, hence the gridarea is computed from all locations
Your manually computed field mean uses more locations for computing the total weight than the data itself has. I wrote a call for testing this (also uploaded):
#!/usr/bin/env bash
set -x
input="$1"
cdo -diffv [ -fldmean ${input} -div -fldsum -mul [ ${input} -gridarea ${input} ] [ -fldsum -gridarea ${input} ] ]
if you call this with something globally defined like '-topo,global_1'
(i.e. a global 1deg grid) you get very small differences:
./fldmeanVSsum.sh '-topo,global_1'
+ input=-topo,global_1
+ cdo -diffv '[' -fldmean -topo,global_1 -div -fldsum -mul '[' -topo,global_1 -gridarea -topo,global_1 ']' '[' -fldsum -gridarea -topo,global_1 ']' ']'
cdo(7) gridarea: 100%
Date Time Level Gridsize Miss Diff : S Z Max_Absdiff Max_Reldiff : Parameter name
1 : 0001-01-01 00:00:00 0 1 0 1 : F F 4.1564e-10 1.7425e-13 : topo
1 of 1 records differ
0 of 1 records differ more than 0.001
cdo diffn: Processed 2 values from 2 variables over 2 timesteps [0.14s 50MB].
You can shrink the diff further if you go to coarser resolutions:
./fldmeanVSsum.sh '-topo,global_10' 1
+ input=-topo,global_10
+ cdo -diffv '[' -fldmean -topo,global_10 -div -fldsum -mul '[' -topo,global_10 -gridarea -topo,global_10 ']' '[' -fldsum -gridarea -topo,global_10 ']' ']'
cdo(7) gridarea: 0%cdo(10) gridarea: 100%
Date Time Level Gridsize Miss Diff : S Z Max_Absdiff Max_Reldiff : Parameter name
1 : 0001-01-01 00:00:00 0 1 0 1 : F F 1.8645e-11 7.8692e-15 : topo
1 of 1 records differ
0 of 1 records differ more than 0.001
cdo diffn: Processed 2 values from 2 variables over 2 timesteps [0.01s 46MB].
Now let's see what happens with your file:
./fldmeanVSsum.sh prec_1990-2000_CRUNCEP.nc
+ input=prec_1990-2000_CRUNCEP.nc
+ cdo -diffv '[' -fldmean prec_1990-2000_CRUNCEP.nc -div -fldsum -mul '[' prec_1990-2000_CRUNCEP.nc -gridarea prec_1990-2000_CRUNCEP.nc ']' '[' -fldsum -gridarea prec_1990-2000_CRUNCEP.nc ']' ']'
cdo(5) gridarea: 8%cdo(7) gridarea: 43%cdo(4) mul: Filling up stream2 >(pipe5.8)< by copying the first timestep. 100%cdo(2) div: Filling up stream2 >(pipe3.10)< by copying the first timestep.
Date Time Level Gridsize Miss Diff : S Z Max_Absdiff Max_Reldiff : Parameter name
1 : 1990-07-01 06:00:00 0 1 0 1 : F F 214.30 0.49437 : prec
2 : 1991-07-01 06:00:00 0 1 0 1 : F F 237.14 0.49437 : prec
3 : 1992-07-01 06:00:00 0 1 0 1 : F F 222.12 0.49437 : prec
4 : 1993-07-01 06:00:00 0 1 0 1 : F F 255.90 0.49437 : prec
5 : 1994-07-01 06:00:00 0 1 0 1 : F F 173.49 0.49437 : prec
6 : 1995-07-01 06:00:00 0 1 0 1 : F F 270.91 0.49437 : prec
7 : 1996-07-01 06:00:00 0 1 0 1 : F F 242.66 0.49437 : prec
8 : 1997-07-01 06:00:00 0 1 0 1 : F F 270.50 0.49437 : prec
9 : 1998-07-01 06:00:00 0 1 0 1 : F F 289.60 0.49437 : prec
10 : 1999-07-01 06:00:00 0 1 0 1 : F F 301.72 0.49437 : prec
11 : 2000-07-01 06:00:00 0 1 0 1 : F F 358.35 0.49437 : prec
11 of 11 records differ
cdo diffn: Processed 22 values from 2 variables over 22 timesteps [0.02s 47MB].
The relative difference of about 0.49 corresponds to the factor of roughly 2 that you reported in your post.
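To spell out why a relative difference of 0.49437 matches the factor of 1.9777444762683 from the original post, here is a short derivation. It is only a sketch under assumptions not stated in the thread: fldmean weights each cell with the same area that gridarea returns, the set of missing cells is the same in every timestep (which the constant factor suggests), and Max_Reldiff is the absolute difference divided by the larger of the two values. The symbols are introduced here for illustration: x_i is the value in cell i, a_i its area, V the set of cells with valid data and G the set of all grid cells.

```latex
\begin{align}
\bar{x}_{\mathrm{fldmean}} &= \frac{\sum_{i \in V} a_i \, x_i}{\sum_{i \in V} a_i}, \\
\bar{x}_{\mathrm{manual}}  &= \frac{\sum_{i \in V} a_i \, x_i}{\sum_{i \in G} a_i}, \\
R = \frac{\bar{x}_{\mathrm{fldmean}}}{\bar{x}_{\mathrm{manual}}}
  &= \frac{\sum_{i \in G} a_i}{\sum_{i \in V} a_i} \approx 1.9777, \\
\frac{\bar{x}_{\mathrm{fldmean}} - \bar{x}_{\mathrm{manual}}}{\bar{x}_{\mathrm{fldmean}}}
  &= 1 - \frac{1}{R} \approx 0.49437 .
\end{align}
```

Under these assumptions the ratio R depends only on the grid and the missing-value mask, not on the data values, which would explain why it is identical for every timestep. It also suggests that the valid cells carry about 1/1.9777 ≈ 0.506 of the total grid-area weight.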
So what next: in order to do a proper manual computation you need to mask the gridarea result with your data. So first you need to create a mask that is '1' on all valid data locations and missing (or 0) elsewhere. This can be done with
cdo -gtc,-10 prec_1990-2000_CRUNCEP.nc prec_mask.nc
Now this mask needs to be multiplied with the computed gridarea everywhere in the CDO call:
input="$1"
applyMask="-mul -gtc,-10 ${input}" # valid for fields that contain missing values
cdo -diffv -fldmean ${input} -div -fldsum -mul ${input} ${applyMask} -gridarea ${input} -fldsum ${applyMask} -gridarea ${input}
I put both versions in the uploaded script. What remains is this:
Date Time Level Gridsize Miss Diff : S Z Max_Absdiff Max_Reldiff : Parameter name
1 : 1990-07-01 06:00:00 0 1 0 1 : F F 8.8676e-12 2.0457e-14 : prec
2 : 1991-07-01 06:00:00 0 1 0 1 : F F 1.0061e-11 2.0975e-14 : prec
3 : 1992-07-01 06:00:00 0 1 0 1 : F F 8.1855e-12 1.8218e-14 : prec
4 : 1993-07-01 06:00:00 0 1 0 1 : F F 1.0118e-11 1.9547e-14 : prec
5 : 1994-07-01 06:00:00 0 1 0 1 : F F 7.9012e-12 2.2515e-14 : prec
6 : 1995-07-01 06:00:00 0 1 0 1 : F F 1.1823e-11 2.1576e-14 : prec
7 : 1996-07-01 06:00:00 0 1 0 1 : F F 1.0289e-11 2.0961e-14 : prec
8 : 1997-07-01 06:00:00 0 1 0 1 : F F 1.0459e-11 1.9116e-14 : prec
9 : 1998-07-01 06:00:00 0 1 0 1 : F F 1.1823e-11 2.0184e-14 : prec
10 : 1999-07-01 06:00:00 0 1 0 1 : F F 1.1823e-11 1.9373e-14 : prec
11 : 2000-07-01 06:00:00 0 1 0 1 : F F 1.4893e-11 2.0546e-14 : prec
11 of 11 records differ
0 of 11 records differ more than 0.001
So we are back to differences that come from numerics only, I think.
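For readers who prefer intermediate files, as in the three commands of the original question, the masked computation could be written step by step roughly as follows. This is only a sketch: the function name masked_fldmean and all file names are made up for illustration, and the threshold -10 is taken from the mask command above on the assumption that every valid value is larger than -10.

```shell
#!/usr/bin/env bash
# Sketch: area-weighted field mean via fldsum, with the cell areas masked by
# the valid-data locations. Function and file names are hypothetical.
#
# Steps:
#   1. mask:         1 where the data is valid (> -10), missing elsewhere
#   2. area:         cell areas of the input grid (defined on all cells)
#   3. area_masked:  cell areas restricted to the valid data locations
#   4. weighted_sum: field sum of value * area over the valid cells
#   5. area_sum:     field sum of the area over the valid cells
#   6. output:       weighted_sum / area_sum
masked_fldmean() {
  local input="$1" output="$2"
  local tmp
  tmp="$(mktemp -d)" || return 1

  cdo -gtc,-10 "$input" "$tmp/mask.nc" &&
    cdo -gridarea "$input" "$tmp/area.nc" &&
    cdo -mul "$tmp/mask.nc" "$tmp/area.nc" "$tmp/area_masked.nc" &&
    cdo -fldsum -mul "$input" "$tmp/area_masked.nc" "$tmp/weighted_sum.nc" &&
    cdo -fldsum "$tmp/area_masked.nc" "$tmp/area_sum.nc" &&
    cdo -div "$tmp/weighted_sum.nc" "$tmp/area_sum.nc" "$output"
  local status=$?

  rm -rf "$tmp"
  return "$status"
}

# Run only when an input and an output file are given on the command line.
if [ "$#" -ge 2 ]; then
  masked_fldmean "$1" "$2"
fi
```

Saved under a name of your choice, it would be called with the input file and the desired output file, and the result could then be compared against the output of cdo fldmean with cdo diffv. The mask is given as the first input of mul because, as the log output above shows, CDO fills up the second stream by copying its first timestep.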
hth
ralf
fldmeanVSsum.sh (406 Bytes)