Area weighted average

Added by Lina Teckentrup almost 4 years ago

Hi,

I'm trying to calculate area weighted averages with CDO. For various reasons I need to calculate the average using the fldsum function though (which is not area weighted, right?), so here is what I did:

cdo -fldsum -mul prec_1990-2000_CRUNCEP.nc -gridarea prec_1990-2000_CRUNCEP.nc prec_1990-2000_CRUNCEP_fldsum.nc
cdo -fldsum -gridarea prec_1990-2000_CRUNCEP.nc gridarea_sum.nc
cdo div prec_1990-2000_CRUNCEP_fldsum.nc gridarea_sum.nc prec_1990-2000_CRUNCEP_fldsum_mean.nc

In my head this should equal the result of

cdo fldmean prec_1990-2000_CRUNCEP.nc prec_1990-2000_CRUNCEP_fldmean.nc

but it doesn't. They differ by a factor of exactly 1.9777444762683 for all timesteps, independent of the temporal resolution or the length of the time series:

cdo div prec_1990-2000_CRUNCEP_fldmean.nc prec_1990-2000_CRUNCEP_fldsum_mean.nc div.nc

I attached all the files. I've been racking my brain, but I can't see where I'm going wrong. I'm using Climate Data Operators version 1.8.0.

Thanks a lot!
Lina


Replies (1)

RE: Area weighted average - Added by Ralf Mueller almost 4 years ago

hi Lina!

I think the difference is a result of missing values in your input data:

  • fldsum and fldmean leave the missing values out of the computation
  • gridarea does not take the data into account, hence the gridarea is computed from all locations
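
To make the two points above concrete, here is a minimal toy sketch. It is not CDO itself: the four cells, their values and their areas are made up for illustration, "NA" stands for a missing value, and awk stands in for the CDO arithmetic. It computes the mean once with the area of the valid cells only and once with the area of all cells in the denominator.

```shell
#!/usr/bin/env bash
# Toy grid (hypothetical numbers): one "value area" pair per line, NA = missing value.
cells='2 1
4 1
3 2
NA 4'

# Weighted mean over valid cells only: numerator and denominator both skip NA cells.
masked_mean=$(printf '%s\n' "$cells" |
  awk '$1 != "NA" { s += $1 * $2; a += $2 } END { printf "%.4f", s / a }')

# Same numerator, but the denominator sums the area of every cell, valid or not.
unmasked_mean=$(printf '%s\n' "$cells" |
  awk '$1 != "NA" { s += $1 * $2 } { a += $2 } END { printf "%.4f", s / a }')

echo "valid-area denominator: $masked_mean"
echo "total-area denominator: $unmasked_mean"
```

In this toy case half of the total area belongs to the missing cell, so the second mean comes out at half of the first. The ratio between the two is simply total area divided by valid area.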

Your manually computed field mean uses more locations for computing the total weight than the data itself has. I wrote a script for testing this (also uploaded):

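#!/usr/bin/env bash
# Compare fldmean with a manual area-weighted mean built from fldsum and gridarea.
# Usage: ./fldmeanVSsum.sh <input file or CDO operator>
set -x
input="$1"
cdo -diffv [ -fldmean ${input} -div -fldsum -mul [ ${input} -gridarea ${input} ] [ -fldsum -gridarea ${input} ] ]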

If you call this with something globally defined like '-topo,global_1' (i.e. a global 1deg grid), you get very small differences:

./fldmeanVSsum.sh '-topo,global_1'         
+ input=-topo,global_1
+ cdo -diffv '[' -fldmean -topo,global_1 -div -fldsum -mul '[' -topo,global_1 -gridarea -topo,global_1 ']' '[' -fldsum -gridarea -topo,global_1 ']' ']'
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 0001-01-01 00:00:00       0        1       0       1 : F F   4.1564e-10  1.7425e-13 : topo       
  1 of 1 records differ
  0 of 1 records differ more than 0.001
cdo    diffn: Processed 2 values from 2 variables over 2 timesteps [0.14s 50MB].

You can shrink the difference further if you go to coarser resolutions:

./fldmeanVSsum.sh '-topo,global_10'
+ input=-topo,global_10
+ cdo -diffv '[' -fldmean -topo,global_10 -div -fldsum -mul '[' -topo,global_10 -gridarea -topo,global_10 ']' '[' -fldsum -gridarea -topo,global_10 ']' ']'
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 0001-01-01 00:00:00       0        1       0       1 : F F   1.8645e-11  7.8692e-15 : topo       
  1 of 1 records differ
  0 of 1 records differ more than 0.001
cdo    diffn: Processed 2 values from 2 variables over 2 timesteps [0.01s 46MB].

Now let's see what happens with your file:
./fldmeanVSsum.sh prec_1990-2000_CRUNCEP.nc
+ input=prec_1990-2000_CRUNCEP.nc
+ cdo -diffv '[' -fldmean prec_1990-2000_CRUNCEP.nc -div -fldsum -mul '[' prec_1990-2000_CRUNCEP.nc -gridarea prec_1990-2000_CRUNCEP.nc ']' '[' -fldsum -gridarea prec_1990-2000_CRUNCEP.nc ']' ']'
cdo(4) mul: Filling up stream2 >(pipe5.8)< by copying the first timestep.
cdo(2) div: Filling up stream2 >(pipe3.10)< by copying the first timestep.
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 1990-07-01 06:00:00       0        1       0       1 : F F       214.30     0.49437 : prec       
     2 : 1991-07-01 06:00:00       0        1       0       1 : F F       237.14     0.49437 : prec       
     3 : 1992-07-01 06:00:00       0        1       0       1 : F F       222.12     0.49437 : prec       
     4 : 1993-07-01 06:00:00       0        1       0       1 : F F       255.90     0.49437 : prec       
     5 : 1994-07-01 06:00:00       0        1       0       1 : F F       173.49     0.49437 : prec       
     6 : 1995-07-01 06:00:00       0        1       0       1 : F F       270.91     0.49437 : prec       
     7 : 1996-07-01 06:00:00       0        1       0       1 : F F       242.66     0.49437 : prec       
     8 : 1997-07-01 06:00:00       0        1       0       1 : F F       270.50     0.49437 : prec       
     9 : 1998-07-01 06:00:00       0        1       0       1 : F F       289.60     0.49437 : prec       
    10 : 1999-07-01 06:00:00       0        1       0       1 : F F       301.72     0.49437 : prec       
    11 : 2000-07-01 06:00:00       0        1       0       1 : F F       358.35     0.49437 : prec       
  11 of 11 records differ
cdo    diffn: Processed 22 values from 2 variables over 22 timesteps [0.02s 47MB].

The relative difference of about 0.5 corresponds to the factor of about 2 that you reported in your post.
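
As a quick sanity check of that correspondence: assuming diffv reports the relative difference as |a - b| / max(|a|, |b|) (that is my reading of the CDO documentation, so treat it as an assumption), and given that the fldmean result is the larger of the two, a factor f between the two means should show up as 1 - 1/f. A small sketch with awk, using the factor from the original post:

```shell
#!/usr/bin/env bash
# Factor between fldmean and the manual mean, as reported in the original post.
factor=1.9777444762683

# Relative difference implied by that factor, assuming |a - b| / max(|a|, |b|)
# with a = fldmean result and b = a / factor (the smaller, manual result).
reldiff=$(awk -v f="$factor" 'BEGIN { printf "%.5f", 1 - 1 / f }')

echo "implied relative difference: $reldiff"
```

Rounded to five digits this gives 0.49437, the same number as in the Max_Reldiff column above.
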
So what next? In order to do a proper manual computation, you need to mask the gridarea result with your data. First you need to create a mask that is '1' at all valid data locations and missing (or 0) elsewhere. This can be done with:
cdo -gtc,-10 prec_1990-2000_CRUNCEP.nc prec_mask.nc
Now this mask needs to be multiplied with the computed gridarea everywhere in the CDO call:
input="$1"
# mask: 1 where the data is valid (assumes all valid values are greater than -10), missing elsewhere
applyMask="-mul -gtc,-10 ${input}"

# valid for fields that contain missing values
cdo -diffv  -fldmean ${input} -div -fldsum -mul  ${input} ${applyMask} -gridarea ${input}  -fldsum ${applyMask} -gridarea ${input}
I put both versions in the uploaded script. What remains is this:
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
     1 : 1990-07-01 06:00:00       0        1       0       1 : F F   8.8676e-12  2.0457e-14 : prec       
     2 : 1991-07-01 06:00:00       0        1       0       1 : F F   1.0061e-11  2.0975e-14 : prec       
     3 : 1992-07-01 06:00:00       0        1       0       1 : F F   8.1855e-12  1.8218e-14 : prec       
     4 : 1993-07-01 06:00:00       0        1       0       1 : F F   1.0118e-11  1.9547e-14 : prec       
     5 : 1994-07-01 06:00:00       0        1       0       1 : F F   7.9012e-12  2.2515e-14 : prec       
     6 : 1995-07-01 06:00:00       0        1       0       1 : F F   1.1823e-11  2.1576e-14 : prec       
     7 : 1996-07-01 06:00:00       0        1       0       1 : F F   1.0289e-11  2.0961e-14 : prec       
     8 : 1997-07-01 06:00:00       0        1       0       1 : F F   1.0459e-11  1.9116e-14 : prec       
     9 : 1998-07-01 06:00:00       0        1       0       1 : F F   1.1823e-11  2.0184e-14 : prec       
    10 : 1999-07-01 06:00:00       0        1       0       1 : F F   1.1823e-11  1.9373e-14 : prec       
    11 : 2000-07-01 06:00:00       0        1       0       1 : F F   1.4893e-11  2.0546e-14 : prec       
  11 of 11 records differ
  0 of 11 records differ more than 0.001
So we are back to differences that come from numerics only, I think.

hth
ralf
