Area weighted average

Added by Lina Teckentrup about 5 years ago

Hi,

I'm trying to calculate area-weighted averages with CDO. For various reasons, though, I need to compute the average via the fldsum operator (which is not area-weighted, right?), so here is what I did:

cdo -fldsum -mul prec_1990-2000_CRUNCEP.nc -gridarea prec_1990-2000_CRUNCEP.nc prec_1990-2000_CRUNCEP_fldsum.nc
cdo -fldsum -gridarea prec_1990-2000_CRUNCEP.nc gridarea_sum.nc
cdo div prec_1990-2000_CRUNCEP_fldsum.nc gridarea_sum.nc prec_1990-2000_CRUNCEP_fldsum_mean.nc

In my head this should equal the result of

cdo fldmean prec_1990-2000_CRUNCEP.nc prec_1990-2000_CRUNCEP_fldmean.nc

but it doesn't. The two results differ by exactly a factor of 1.9777444762683 at every timestep, regardless of the temporal resolution or the length of the time series:

cdo div prec_1990-2000_CRUNCEP_fldmean.nc prec_1990-2000_CRUNCEP_fldsum_mean.nc div.nc

I attached all the files. I've been racking my brain, but I can't see where I'm going wrong. I'm using Climate Data Operators version 1.8.0.

Thanks a lot!
Lina


gridarea_sum.nc (5.67 KB)
prec_1990-2000_CRUNCEP_fldmean.nc (6.62 KB)
prec_1990-2000_CRUNCEP_fldsum.nc (6.66 KB)
prec_1990-2000_CRUNCEP_fldsum_mean.nc (6.78 KB)
prec_1990-2000_CRUNCEP.nc (498 KB)

Replies (1)

RE: Area weighted average - Added by Ralf Mueller about 5 years ago

hi Lina!

I think the difference is a result of missing values in your input data:

fldsum and fldmean leave the missing values out of the computation
gridarea does not take the data into account, hence the gridarea is computed from all locations
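Written out as formulas, this explains the constant factor. This is a sketch with symbols introduced here for illustration: V is the set of cells with valid data at a given timestep, A is the set of all cells, w_i is the cell area from gridarea, and fldmean is taken to be area-weighted over the valid cells only:

```latex
\begin{aligned}
\bar{x}_{\mathrm{fldmean}} &= \frac{\sum_{i \in V} w_i x_i}{\sum_{i \in V} w_i},
\qquad
\bar{x}_{\mathrm{manual}} = \frac{\sum_{i \in V} w_i x_i}{\sum_{i \in A} w_i} \\[4pt]
f &= \frac{\bar{x}_{\mathrm{fldmean}}}{\bar{x}_{\mathrm{manual}}}
   = \frac{\sum_{i \in A} w_i}{\sum_{i \in V} w_i} \;\ge\; 1 \\[4pt]
\frac{\bar{x}_{\mathrm{fldmean}} - \bar{x}_{\mathrm{manual}}}{\bar{x}_{\mathrm{fldmean}}}
  &= 1 - \frac{1}{f}
   = \frac{\sum_{i \in A \setminus V} w_i}{\sum_{i \in A} w_i}
   \approx 1 - \frac{1}{1.9777} \approx 0.494
\end{aligned}
```

Because f does not depend on the data values x_i, it stays the same at every timestep as long as the missing-value pattern does not change. With f = 1.9777444762683, the valid cells would cover about 1/f ≈ 0.506 of the total grid area. The last line assumes that diffv's Max_Reldiff is |a−b|/max(|a|,|b|); that is an assumption, though it is consistent with the Max_Reldiff of 0.49437 in the test run further down.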

your manually computed field mean uses more locations for the total weight than the data itself has. I wrote a small script to test this (also uploaded):

#!/usr/bin/env bash
set -x
input="$1" 
cdo -diffv [ -fldmean ${input} -div -fldsum -mul [ ${input} -gridarea ${input} ] [ -fldsum -gridarea ${input} ] ]
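The same mechanism can be reproduced without CDO on a made-up field. The sketch below is a hypothetical toy (four cells with arbitrary areas, one of them missing) written in bash and awk; `toy_means` is a name introduced only for this illustration, and the awk lines merely imitate what fldsum, fldmean and gridarea are described to do above. They are not CDO code.

```shell
#!/usr/bin/env bash
# Toy analogue of fldmeanVSsum.sh that runs without CDO.
# Hypothetical 4-cell field, one line per cell: "area value"; NA marks a missing value.
set -euo pipefail

toy_means() {
  printf '%s\n' '1.0 2.0' '2.0 4.0' '3.0 NA' '4.0 1.0' | awk '
  {
    total_area += $1                  # like "fldsum -gridarea": every cell counts
    if ($2 != "NA") {
      valid_area   += $1              # area of the cells that actually carry data
      weighted_sum += $1 * $2         # like "fldsum -mul data -gridarea": missing cells drop out
    }
  }
  END {
    fldmean = weighted_sum / valid_area   # fldmean-style: weights of valid cells only
    manual  = weighted_sum / total_area   # the manual fldsum/fldsum route
    printf "fldmean=%.6f manual=%.6f ratio=%.6f\n", fldmean, manual, fldmean / manual
  }'
}

toy_means
```

Here 3 of the 10 area units carry no data, so the ratio is 10/7 ≈ 1.43 regardless of the values in the valid cells. By the same reasoning, a factor of 1.9777 would mean that roughly half of the grid area is missing.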

if you call this with something globally defined like '-topo,global_1' (i.e. a global 1deg grid), you get very small differences:

./fldmeanVSsum.sh '-topo,global_1'
+ input=-topo,global_1
+ cdo -diffv '[' -fldmean -topo,global_1 -div -fldsum -mul '[' -topo,global_1 -gridarea -topo,global_1 ']' '[' -fldsum -gridarea -topo,global_1 ']' ']'
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 0001-01-01 00:00:00       0        1       0       1 : F F   4.1564e-10  1.7425e-13 : topo
 1 of 1 records differ
 0 of 1 records differ more than 0.001
cdo diffn: Processed 2 values from 2 variables over 2 timesteps [0.14s 50MB].

you can shrink the diff further if you go to coarser resolutions:

./fldmeanVSsum.sh '-topo,global_10'
+ input=-topo,global_10
+ cdo -diffv '[' -fldmean -topo,global_10 -div -fldsum -mul '[' -topo,global_10 -gridarea -topo,global_10 ']' '[' -fldsum -gridarea -topo,global_10 ']' ']'
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 0001-01-01 00:00:00       0        1       0       1 : F F   1.8645e-11  7.8692e-15 : topo
 1 of 1 records differ
 0 of 1 records differ more than 0.001
cdo diffn: Processed 2 values from 2 variables over 2 timesteps [0.01s 46MB].

Now let's see what happens with your file:

./fldmeanVSsum.sh prec_1990-2000_CRUNCEP.nc
+ input=prec_1990-2000_CRUNCEP.nc
+ cdo -diffv '[' -fldmean prec_1990-2000_CRUNCEP.nc -div -fldsum -mul '[' prec_1990-2000_CRUNCEP.nc -gridarea prec_1990-2000_CRUNCEP.nc ']' '[' -fldsum -gridarea prec_1990-2000_CRUNCEP.nc ']' ']'
cdo(2) div: Filling up stream2 >(pipe3.10)< by copying the first timestep.
cdo(4) mul: Filling up stream2 >(pipe5.8)< by copying the first timestep.
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 1990-07-01 06:00:00       0        1       0       1 : F F       214.30     0.49437 : prec
     2 : 1991-07-01 06:00:00       0        1       0       1 : F F       237.14     0.49437 : prec
     3 : 1992-07-01 06:00:00       0        1       0       1 : F F       222.12     0.49437 : prec
     4 : 1993-07-01 06:00:00       0        1       0       1 : F F       255.90     0.49437 : prec
     5 : 1994-07-01 06:00:00       0        1       0       1 : F F       173.49     0.49437 : prec
     6 : 1995-07-01 06:00:00       0        1       0       1 : F F       270.91     0.49437 : prec
     7 : 1996-07-01 06:00:00       0        1       0       1 : F F       242.66     0.49437 : prec
     8 : 1997-07-01 06:00:00       0        1       0       1 : F F       270.50     0.49437 : prec
     9 : 1998-07-01 06:00:00       0        1       0       1 : F F       289.60     0.49437 : prec
    10 : 1999-07-01 06:00:00       0        1       0       1 : F F       301.72     0.49437 : prec
    11 : 2000-07-01 06:00:00       0        1       0       1 : F F       358.35     0.49437 : prec
 11 of 11 records differ
cdo diffn: Processed 22 values from 2 variables over 22 timesteps [0.02s 47MB].

the relative error of about 0.5 (0.49437) corresponds to the factor of about 2 that you wrote in your post.

So what next: to do a proper manual computation, you need to mask the gridarea result with your data. First, create a mask that is '1' at all locations with valid data and missing (or 0) elsewhere. Since precipitation is never negative, every valid value is greater than -10, so this can be done with

cdo -gtc,-10 prec_1990-2000_CRUNCEP.nc prec_mask.nc

Now this mask needs to be multiplied with the computed gridarea everywhere in the CDO call:

input="$1"
applyMask="-mul -gtc,-10 ${input}" # valid for fields that contain missing values
cdo -diffv -fldmean ${input} -div -fldsum -mul ${input} ${applyMask} -gridarea ${input} -fldsum ${applyMask} -gridarea ${input}

I put both versions in the uploaded script. What remains is this:

               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 1990-07-01 06:00:00       0        1       0       1 : F F   8.8676e-12  2.0457e-14 : prec
     2 : 1991-07-01 06:00:00       0        1       0       1 : F F   1.0061e-11  2.0975e-14 : prec
     3 : 1992-07-01 06:00:00       0        1       0       1 : F F   8.1855e-12  1.8218e-14 : prec
     4 : 1993-07-01 06:00:00       0        1       0       1 : F F   1.0118e-11  1.9547e-14 : prec
     5 : 1994-07-01 06:00:00       0        1       0       1 : F F   7.9012e-12  2.2515e-14 : prec
     6 : 1995-07-01 06:00:00       0        1       0       1 : F F   1.1823e-11  2.1576e-14 : prec
     7 : 1996-07-01 06:00:00       0        1       0       1 : F F   1.0289e-11  2.0961e-14 : prec
     8 : 1997-07-01 06:00:00       0        1       0       1 : F F   1.0459e-11  1.9116e-14 : prec
     9 : 1998-07-01 06:00:00       0        1       0       1 : F F   1.1823e-11  2.0184e-14 : prec
    10 : 1999-07-01 06:00:00       0        1       0       1 : F F   1.1823e-11  1.9373e-14 : prec
    11 : 2000-07-01 06:00:00       0        1       0       1 : F F   1.4893e-11  2.0546e-14 : prec
 11 of 11 records differ
 0 of 11 records differ more than 0.001

So we are back to differences coming only from numerics, I think.
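Continuing the same hypothetical toy field, multiplying each cell area by a validity mask before summing (an awk imitation of `${applyMask}`) makes the manual denominator match the one fldmean uses. `masked_means` is again an illustrative name, and the threshold -10 simply mirrors the gtc,-10 call above; the awk code only imitates the described CDO behaviour.

```shell
#!/usr/bin/env bash
# Toy analogue of the masked variant, using the same hypothetical 4-cell field.
# Each cell area is multiplied by a validity mask before summing, mimicking
# "-mul -gtc,-10 data -gridarea data" followed by fldsum.
set -euo pipefail

masked_means() {
  printf '%s\n' '1.0 2.0' '2.0 4.0' '3.0 NA' '4.0 1.0' | awk '
  {
    if ($2 == "NA") next              # the mask is missing here too, so fldsum skips the cell
    mask = ($2 > -10) ? 1 : 0         # like gtc,-10: 1 for any plausible precipitation value
    valid_area   += $1                # reference: area of valid cells (what fldmean uses)
    masked_area  += mask * $1         # fldsum of the masked gridarea
    weighted_sum += $1 * $2           # fldsum of data times gridarea, unchanged
  }
  END {
    printf "fldmean=%.6f masked=%.6f\n", weighted_sum / valid_area, weighted_sum / masked_area
  }'
}

masked_means
```

In the toy both numbers agree exactly because awk performs the identical division twice. In CDO the two routes presumably go through different sequences of floating-point operations, which would account for the remaining relative differences of about 1e-14.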


	hth
ralf

fldmeanVSsum.sh (406 Bytes)
