Project

General

Profile

Weighted mean across a dimension, using weights from a second variable

Added by Peter Briggs over 3 years ago

I'm a 'powerless-user' of CDO and haven't been able to work out the sensible way to do this or find it in the forum. It seems like a problem that would be encountered often:
I have a collection of carbon budget variables, e.g. GPP with dimensions time, x, y, and patch. Patch has values 0, 1, 2 and represents up to 3 veg types that co-exist in the cell. The proportions of each veg type that are present in the cell are given by another variable patchfrac, with dimensions x, y, patch.
For each time, x, and y, I would like to calculate a GPPtot, that is, compress the patch dimension to give a total GPP for the cell. That is the weighted mean of the GPPs summed across the patch dimension at time, x, y, where the weights are the patchfrac's at x, y, across the patch dimension.

Can anyone help with the syntax of this? The sum of patchfrac across the patch dimension will be 1, but a complication is that patchfrac can be a missing value where there are less than 3 vegetation types in the cell.

Many thanks!

Peter

My header is:

netcdf bios_out_cable_cbudget_2010_2019 {
dimensions:
        time = UNLIMITED ; // (120 currently)
        x = 169 ;
        y = 137 ;
        patch = 3 ;
variables:
        double time(time) ;
                time:standard_name = "time" ;
                time:units = "seconds since 1990-01-01 00:00:00" ;
                time:calendar = "standard" ;
                time:axis = "T" ;
        float x(x) ;
                x:standard_name = "longitude" ;
                x:long_name = "longitude" ;
                x:units = "degrees_east" ;
                x:axis = "X" ;
        float y(y) ;
                y:standard_name = "latitude" ;
                y:long_name = "latitude" ;
                y:units = "degrees_north" ;
                y:axis = "Y" ;
        float HeteroResp(time, patch, y, x) ;
                HeteroResp:long_name = "Heterotrophic respiration" ;
                HeteroResp:units = "umol/m^2/s" ;
                HeteroResp:_FillValue = -1.e+33f ;
                HeteroResp:missing_value = -1.e+33f ;
        float GPP(time, patch, y, x) ;
                GPP:long_name = "Gross primary production" ;
                GPP:units = "umol/m^2/s" ;
                GPP:_FillValue = -1.e+33f ;
                GPP:missing_value = -1.e+33f ;
        float NPP(time, patch, y, x) ;
                NPP:long_name = "Net primary production" ;
                NPP:units = "umol/m^2/s" ;
                NPP:_FillValue = -1.e+33f ;
                NPP:missing_value = -1.e+33f ;
        float NBP(time, patch, y, x) ;
                NBP:long_name = "Net Biosphere Production (uptake +ve)" ;
                NBP:units = "umol/m^2/s" ;
                NBP:_FillValue = -1.e+33f ;
                NBP:missing_value = -1.e+33f ;
        float TotLivBiomass(time, patch, y, x) ;
                TotLivBiomass:long_name = "Total Biomass" ;
                TotLivBiomass:units = "kg C/m^2" ;
                TotLivBiomass:_FillValue = -1.e+33f ;
                TotLivBiomass:missing_value = -1.e+33f ;
        float patchfrac(patch, y, x) ;
                patchfrac:long_name = "Fractional cover of vegetation patches" ;
                patchfrac:units = "-" ;
                patchfrac:_FillValue = -1.e+33f ;
                patchfrac:missing_value = -1.e+33f ;

// global attributes:
                :CDI = "Climate Data Interface version 1.9.8 (https://mpimet.mpg.de/cdi)" ;
                :Conventions = "CF-1.6" ;
                :history = "Fri Aug 21 13:17:02 2020: cdo seldate,2010-01-01T00:00:00,2020-01-01T00:00:00 bios_out_cable_cbudget_1990_2019.nc bios_out_cable_cbudget_2010_2019.nc\n",
                        "Fri Aug 21 13:16:33 2020: cdo selname,GPP,NPP,NBP,HeteroResp,TotLivBiomass,patchfrac bios_out_cable_1990_2019.nc bios_out_cable_cbudget_1990_2019.nc" ;
                :Production = "2020/03/20 at 05:09:17" ;
                :Output_averaging = "monthly" ;
                :CDO = "Climate Data Operators version 1.9.8 (https://mpimet.mpg.de/cdo)" ;
}


Replies (12)

RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 3 years ago

Yes, well, no idea what happened to the format of the header. The red x should be surrounded by brackets, and no strikeouts.

RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 3 years ago

Hi Peter,

I'm really not an expert on this but I'm interessted to see the original data file (which is unfortunately not following the netCDF CF convention). Can you upload the file with 2 or more timesteps?

-Karin

RE: Weighted mean across a dimension, using weights from a second variable - Added by Ralf Mueller over 3 years ago

hi Peter!

I fixed this formatting thing - I hope

CDO cannot recognize what's the meaning of 'patch' (because there is no CF-convention about such things), but you could turn it into a vertical dimension. Just rename the variable and dimension patch to level. this turns the patchfrac variable into a simple 3d variable, which could be summed up along its vertical axes.

hth
ralf

RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 3 years ago

Thank you Karin and Ralf,

Karin, I've attached the first 5 timesteps. Ralf, I think I noted a forum post about fooling cdo into treating something as levels, and then using vertmean. I can certainly rename patch to level. I'm unclear about the syntax for creating a new 3D var within the file, say GPPtot, which averages across 'levels' using patchfrac as the level 'thickness'.
Once I've renamed patch to level, is it something like:

cdo expr,’GPPtot=(sellevel(patchfrac,0)*sellevel(GPP,0))+(sellevel(patchfrac,1)*sellevel(GPP,1))+(sellevel(patchfrac,2)*sellevel(GPP,2));’ infile outfile

Cheers, Peter

RE: Weighted mean across a dimension, using weights from a second variable - Added by Ralf Mueller over 3 years ago

hi Peter!

the renaming doesn't seem to be necessary:

cdo sinfov bios_out_cable_cbudget_5steps.nc
   File format : NetCDF2
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter name
     1 : unknown  unknown  v instant       3   1     23153   1  F32  : HeteroResp    
     2 : unknown  unknown  v instant       3   1     23153   1  F32  : GPP           
     3 : unknown  unknown  v instant       3   1     23153   1  F32  : NPP           
     4 : unknown  unknown  v instant       3   1     23153   1  F32  : NBP           
     5 : unknown  unknown  v instant       3   1     23153   1  F32  : TotLivBiomass 
     6 : unknown  unknown  c instant       3   1     23153   1  F32  : patchfrac     
   Grid coordinates :
     1 : lonlat                   : points=23153 (169x137)
                                x : 112 to 154 by 0.25 degrees_east
                                y : -10 to -44 by -0.25 degrees_north
   Vertical coordinates :
     1 : generic                  : levels=3
   Time coordinate :  5 steps
     RefTime =  1990-01-01 00:00:00  Units = seconds  Calendar = standard
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  2010-01-16 12:00:00  2010-02-15 00:00:00  2010-03-16 12:00:00  2010-04-16 00:00:00
  2010-05-16 12:00:00

Now you can use the versum operator
cdo -s -infov -vertsum -selname,patchfrac `lf`
     1 : 2010-01-16 12:00:00       0    23153   12146 :      1.0000      1.0000      1.0000 : patchfrac     

IMO this looks good. what do you think?

RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 3 years ago

Shouldn't it be called as follows if I go by the formula above?

cdo -mul -selname,patchfrac bios_out_cable_cbudget_5steps.nc -selname,GPP bios_out_cable_cbudget_5steps.nc outfile.nc

cdo infon outfile.nc


    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 2010-01-16 12:00:00       1    23153   12146 :      0.0000     0.86062      16.294 : GPP           
     2 : 2010-01-16 12:00:00       2    23153   12269 :      0.0000     0.88957      14.120 : GPP           
     3 : 2010-01-16 12:00:00       3    23153   19831 :      0.0000     0.97467      11.528 : GPP           
     4 : 2010-02-15 00:00:00       1    23153   12146 :      0.0000     0.88540      13.600 : GPP           
     5 : 2010-02-15 00:00:00       2    23153   12269 :      0.0000      1.3239      12.460 : GPP           
     6 : 2010-02-15 00:00:00       3    23153   19831 :      0.0000      1.4449      9.6412 : GPP           
     7 : 2010-03-16 12:00:00       1    23153   12146 :      0.0000     0.81787      11.693 : GPP           
     8 : 2010-03-16 12:00:00       2    23153   12269 :      0.0000      1.3249      11.538 : GPP           
     9 : 2010-03-16 12:00:00       3    23153   19831 :      0.0000      1.5186      8.2991 : GPP           
    10 : 2010-04-16 00:00:00       1    23153   12146 :      0.0000     0.72262      11.533 : GPP           
    11 : 2010-04-16 00:00:00       2    23153   12269 :      0.0000      1.0925      10.304 : GPP           
    12 : 2010-04-16 00:00:00       3    23153   19831 :      0.0000      1.3995      7.9285 : GPP           
    13 : 2010-05-16 12:00:00       1    23153   12146 :      0.0000     0.65894      10.604 : GPP           
    14 : 2010-05-16 12:00:00       2    23153   12269 :      0.0000     0.80561      11.126 : GPP           
    15 : 2010-05-16 12:00:00       3    23153   19831 :      0.0000      1.0463      6.8018 : GPP

RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 3 years ago

And finally create the sum and change the variable name to GPPtot, too.

cdo -chname,GPP,GPPtot -vertsum -mul -selname,patchfrac bios_out_cable_cbudget_5steps.nc -selname,GPP bios_out_cable_cbudget_5steps.nc outfile.nc

;)

RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 3 years ago

Ok, and here comes another one

cdo -vertsum -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc outfile.nc

RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 3 years ago

Thank you both for your help.

I've chosen Karin's last one:

cdo -vertsum -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc outfile.nc

which seems to work in one step without renaming patch to level. So that raises a couple of questions:

1) It knew that each variable had 3 levels, i.e. that patch was the 'level' dimension. Is that because it assumed patch as being a vertical dimension because it was the only other choice which was not x, y, or time? If there was a 5th dimension would vertsum attempt to aggregate over both patch and the 5th dimension?
2) vertsum,TRUE (default) and vertsum,FALSE generate the same answer, not surprising because it has no information about level thickness. Presumably it is defaulting to equal weights, which we are circumventing by using patchfrac explicitly with -expr.

Cheers, Peter

RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 3 years ago

1) CDO don't allow a 5th dimension:

NetCDF datasets are only supported for the classic data model and arrays up to 4 dimensions. 
These dimensions should only be used by the horizontal and vertical grid and the time. 
The NetCDF attributes should follow the GDT, COARDS or CF Conventions.

2) The problem is that patch is not a dimension defined by CDO:

Available z-axis types:

Z-axis type         Description               Units
-----------------------------------------------------------
surface            Surface
pressure           Pressure level             pascal
hybrid             Hybrid model level        
height             Height above ground        meter
depth_below_sea    Depth below sea level      meter
depth_below_land   Depth below land surface   centimeter
isentropic         Isentropic (theta) level   kelvin

RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 3 years ago

Compute the GPPtot (3 levels/patches):

cdo -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc GPPtot.nc

Then add each level:

cdo -add -sellevel,3 GPPtot.nc -add -sellevel,2 GPPtot.nc -sellevel,1 GPPtot.nc GPPtot_add_levels.nc

RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 3 years ago

Hi Karin, Thank you, and sorry for the delayed response. The difference between the end product of

1) cdo -vertsum -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc outfile.nc

2) cdo -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc GPPtot.nc
cdo -add -sellevel,3 GPPtot.nc -add -sellevel,2 GPPtot.nc -sellevel,1 GPPtot.nc GPPtot_add_levels.nc

appears to be only that

1) The sum is calculated for all levels that contain data, and missing values are ignored.

2) If a value is missing at any level the sum will be a missing value.

Behaviour 1) is what I'm after. Was there something you were trying to illustrate with 2) or just offering it as an alternative formulation?

Thanks, Peter

    (1-12/12)