Weighted mean across a dimension, using weights from a second variable
Added by Peter Briggs over 4 years ago
I'm a 'powerless-user' of CDO and haven't been able to work out the sensible way to do this or find it in the forum. It seems like a problem that would be encountered often:
I have a collection of carbon budget variables, e.g. GPP with dimensions time, x, y, and patch. Patch has values 0, 1, 2 and represents up to 3 veg types that co-exist in the cell. The proportions of each veg type that are present in the cell are given by another variable patchfrac, with dimensions x, y, patch.
For each time, x, and y, I would like to calculate a GPPtot, that is, compress the patch dimension to give a total GPP for the cell. That is the weighted mean of the GPPs summed across the patch dimension at time, x, y, where the weights are the patchfrac's at x, y, across the patch dimension.
Can anyone help with the syntax of this? The sum of patchfrac across the patch dimension will be 1, but a complication is that patchfrac can be a missing value where there are less than 3 vegetation types in the cell.
Many thanks!
Peter
My header is:
netcdf bios_out_cable_cbudget_2010_2019 { dimensions: time = UNLIMITED ; // (120 currently) x = 169 ; y = 137 ; patch = 3 ; variables: double time(time) ; time:standard_name = "time" ; time:units = "seconds since 1990-01-01 00:00:00" ; time:calendar = "standard" ; time:axis = "T" ; float x(x) ; x:standard_name = "longitude" ; x:long_name = "longitude" ; x:units = "degrees_east" ; x:axis = "X" ; float y(y) ; y:standard_name = "latitude" ; y:long_name = "latitude" ; y:units = "degrees_north" ; y:axis = "Y" ; float HeteroResp(time, patch, y, x) ; HeteroResp:long_name = "Heterotrophic respiration" ; HeteroResp:units = "umol/m^2/s" ; HeteroResp:_FillValue = -1.e+33f ; HeteroResp:missing_value = -1.e+33f ; float GPP(time, patch, y, x) ; GPP:long_name = "Gross primary production" ; GPP:units = "umol/m^2/s" ; GPP:_FillValue = -1.e+33f ; GPP:missing_value = -1.e+33f ; float NPP(time, patch, y, x) ; NPP:long_name = "Net primary production" ; NPP:units = "umol/m^2/s" ; NPP:_FillValue = -1.e+33f ; NPP:missing_value = -1.e+33f ; float NBP(time, patch, y, x) ; NBP:long_name = "Net Biosphere Production (uptake +ve)" ; NBP:units = "umol/m^2/s" ; NBP:_FillValue = -1.e+33f ; NBP:missing_value = -1.e+33f ; float TotLivBiomass(time, patch, y, x) ; TotLivBiomass:long_name = "Total Biomass" ; TotLivBiomass:units = "kg C/m^2" ; TotLivBiomass:_FillValue = -1.e+33f ; TotLivBiomass:missing_value = -1.e+33f ; float patchfrac(patch, y, x) ; patchfrac:long_name = "Fractional cover of vegetation patches" ; patchfrac:units = "-" ; patchfrac:_FillValue = -1.e+33f ; patchfrac:missing_value = -1.e+33f ; // global attributes: :CDI = "Climate Data Interface version 1.9.8 (https://mpimet.mpg.de/cdi)" ; :Conventions = "CF-1.6" ; :history = "Fri Aug 21 13:17:02 2020: cdo seldate,2010-01-01T00:00:00,2020-01-01T00:00:00 bios_out_cable_cbudget_1990_2019.nc bios_out_cable_cbudget_2010_2019.nc\n", "Fri Aug 21 13:16:33 2020: cdo selname,GPP,NPP,NBP,HeteroResp,TotLivBiomass,patchfrac bios_out_cable_1990_2019.nc bios_out_cable_cbudget_1990_2019.nc" ; :Production = "2020/03/20 at 05:09:17" ; :Output_averaging = "monthly" ; :CDO = "Climate Data Operators version 1.9.8 (https://mpimet.mpg.de/cdo)" ; }
Replies (12)
RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 4 years ago
Yes, well, no idea what happened to the format of the header. The red x should be surrounded by brackets, and no strikeouts.
RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 4 years ago
Hi Peter,
I'm really not an expert on this but I'm interessted to see the original data file (which is unfortunately not following the netCDF CF convention). Can you upload the file with 2 or more timesteps?
-Karin
RE: Weighted mean across a dimension, using weights from a second variable - Added by Ralf Mueller over 4 years ago
hi Peter!
I fixed this formatting thing - I hope
CDO cannot recognize what's the meaning of 'patch' (because there is no CF-convention about such things), but you could turn it into a vertical dimension. Just rename the variable and dimension patch
to level
. this turns the patchfrac
variable into a simple 3d variable, which could be summed up along its vertical axes.
hth
ralf
RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 4 years ago
Thank you Karin and Ralf,
Karin, I've attached the first 5 timesteps. Ralf, I think I noted a forum post about fooling cdo into treating something as levels, and then using vertmean. I can certainly rename patch to level. I'm unclear about the syntax for creating a new 3D var within the file, say GPPtot, which averages across 'levels' using patchfrac as the level 'thickness'.
Once I've renamed patch to level, is it something like:
cdo expr,’GPPtot=(sellevel(patchfrac,0)*sellevel(GPP,0))+(sellevel(patchfrac,1)*sellevel(GPP,1))+(sellevel(patchfrac,2)*sellevel(GPP,2));’ infile outfile
Cheers, Peter
RE: Weighted mean across a dimension, using weights from a second variable - Added by Ralf Mueller over 4 years ago
hi Peter!
the renaming doesn't seem to be necessary:
cdo sinfov bios_out_cable_cbudget_5steps.nc File format : NetCDF2 -1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter name 1 : unknown unknown v instant 3 1 23153 1 F32 : HeteroResp 2 : unknown unknown v instant 3 1 23153 1 F32 : GPP 3 : unknown unknown v instant 3 1 23153 1 F32 : NPP 4 : unknown unknown v instant 3 1 23153 1 F32 : NBP 5 : unknown unknown v instant 3 1 23153 1 F32 : TotLivBiomass 6 : unknown unknown c instant 3 1 23153 1 F32 : patchfrac Grid coordinates : 1 : lonlat : points=23153 (169x137) x : 112 to 154 by 0.25 degrees_east y : -10 to -44 by -0.25 degrees_north Vertical coordinates : 1 : generic : levels=3 Time coordinate : 5 steps RefTime = 1990-01-01 00:00:00 Units = seconds Calendar = standard YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss 2010-01-16 12:00:00 2010-02-15 00:00:00 2010-03-16 12:00:00 2010-04-16 00:00:00 2010-05-16 12:00:00
Now you can use the
versum
operator cdo -s -infov -vertsum -selname,patchfrac `lf` 1 : 2010-01-16 12:00:00 0 23153 12146 : 1.0000 1.0000 1.0000 : patchfrac
IMO this looks good. what do you think?
RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 4 years ago
Shouldn't it be called as follows if I go by the formula above?
cdo -mul -selname,patchfrac bios_out_cable_cbudget_5steps.nc -selname,GPP bios_out_cable_cbudget_5steps.nc outfile.nc
cdo infon outfile.nc
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name 1 : 2010-01-16 12:00:00 1 23153 12146 : 0.0000 0.86062 16.294 : GPP 2 : 2010-01-16 12:00:00 2 23153 12269 : 0.0000 0.88957 14.120 : GPP 3 : 2010-01-16 12:00:00 3 23153 19831 : 0.0000 0.97467 11.528 : GPP 4 : 2010-02-15 00:00:00 1 23153 12146 : 0.0000 0.88540 13.600 : GPP 5 : 2010-02-15 00:00:00 2 23153 12269 : 0.0000 1.3239 12.460 : GPP 6 : 2010-02-15 00:00:00 3 23153 19831 : 0.0000 1.4449 9.6412 : GPP 7 : 2010-03-16 12:00:00 1 23153 12146 : 0.0000 0.81787 11.693 : GPP 8 : 2010-03-16 12:00:00 2 23153 12269 : 0.0000 1.3249 11.538 : GPP 9 : 2010-03-16 12:00:00 3 23153 19831 : 0.0000 1.5186 8.2991 : GPP 10 : 2010-04-16 00:00:00 1 23153 12146 : 0.0000 0.72262 11.533 : GPP 11 : 2010-04-16 00:00:00 2 23153 12269 : 0.0000 1.0925 10.304 : GPP 12 : 2010-04-16 00:00:00 3 23153 19831 : 0.0000 1.3995 7.9285 : GPP 13 : 2010-05-16 12:00:00 1 23153 12146 : 0.0000 0.65894 10.604 : GPP 14 : 2010-05-16 12:00:00 2 23153 12269 : 0.0000 0.80561 11.126 : GPP 15 : 2010-05-16 12:00:00 3 23153 19831 : 0.0000 1.0463 6.8018 : GPP
RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 4 years ago
And finally create the sum and change the variable name to GPPtot, too.
cdo -chname,GPP,GPPtot -vertsum -mul -selname,patchfrac bios_out_cable_cbudget_5steps.nc -selname,GPP bios_out_cable_cbudget_5steps.nc outfile.nc
RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 4 years ago
Ok, and here comes another one
cdo -vertsum -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc outfile.nc
RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 4 years ago
Thank you both for your help.
I've chosen Karin's last one:
cdo -vertsum -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc outfile.nc
which seems to work in one step without renaming patch to level. So that raises a couple of questions:
1) It knew that each variable had 3 levels, i.e. that patch was the 'level' dimension. Is that because it assumed patch as being a vertical dimension because it was the only other choice which was not x, y, or time? If there was a 5th dimension would vertsum attempt to aggregate over both patch and the 5th dimension?
2) vertsum,TRUE (default) and vertsum,FALSE generate the same answer, not surprising because it has no information about level thickness. Presumably it is defaulting to equal weights, which we are circumventing by using patchfrac explicitly with -expr.
Cheers, Peter
RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 4 years ago
1) CDO don't allow a 5th dimension:
NetCDF datasets are only supported for the classic data model and arrays up to 4 dimensions. These dimensions should only be used by the horizontal and vertical grid and the time. The NetCDF attributes should follow the GDT, COARDS or CF Conventions.
2) The problem is that patch is not a dimension defined by CDO:
Available z-axis types: Z-axis type Description Units ----------------------------------------------------------- surface Surface pressure Pressure level pascal hybrid Hybrid model level height Height above ground meter depth_below_sea Depth below sea level meter depth_below_land Depth below land surface centimeter isentropic Isentropic (theta) level kelvin
RE: Weighted mean across a dimension, using weights from a second variable - Added by Karin Meier-Fleischer over 4 years ago
Compute the GPPtot (3 levels/patches):
cdo -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc GPPtot.nc
Then add each level:
cdo -add -sellevel,3 GPPtot.nc -add -sellevel,2 GPPtot.nc -sellevel,1 GPPtot.nc GPPtot_add_levels.nc
RE: Weighted mean across a dimension, using weights from a second variable - Added by Peter Briggs over 4 years ago
Hi Karin, Thank you, and sorry for the delayed response. The difference between the end product of
1) cdo -vertsum -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc outfile.nc
2) cdo -expr,'GPPtot=patchfrac*GPP' bios_out_cable_cbudget_5steps.nc GPPtot.nc
cdo -add -sellevel,3 GPPtot.nc -add -sellevel,2 GPPtot.nc -sellevel,1 GPPtot.nc GPPtot_add_levels.nc
appears to be only that
1) The sum is calculated for all levels that contain data, and missing values are ignored.
2) If a value is missing at any level the sum will be a missing value.
Behaviour 1) is what I'm after. Was there something you were trying to illustrate with 2) or just offering it as an alternative formulation?
Thanks, Peter