Selecting data based on 1D variable value

Added by Fabrizio Amoruso over 1 year ago

Hello everybody,

I have been using CDO for a while now (with Cygwin on Windows, to be precise) and have solved many issues using the extremely helpful resources provided by the CDO manual and the forum. However, I have stumbled upon an issue that I am still unsure how to solve, despite its (apparent?) simplicity:

1. I have a netCDF4 catchment temperature file provided by Copernicus EU, which contains one 2D variable (local river temperature seasonality, named "locwtemp_ymonmean") and two 1D variables (catchment id, named "id", and time, named "time").

2. I would like to construct a query that simply extracts all locwtemp_ymonmean values for id=9507523 only, for all time values (the id value is taken from a shapefile provided together with the netCDF file).

3. The way I'd do it would be to use the NCO operator ncks like this: ncks -d id,catchmentid infile.in outfile.out

4. There is an issue, though: "catchmentid" in 3. is not the real catchment id, but the progressive id (starting from 0) assigned to that catchment within the file. That is, id 9507523 has a different progressive number, since the list of catchment ids included in the file does not start from 0.

5. While I understand that finding the progressive id of a catchment is extremely easy through an indirect calculation, I need to deal with multiple catchment ids and multiple files, and I would like to streamline the coding involved as much as possible. I tried multiple solutions with cdo select etc., but to no avail, as their intended purpose differs from what I require.

I am sure there is a simple solution and would be grateful if someone could point me to an entry in this forum or to a way of solving it.
Many thanks!


Replies (4)

RE: Selecting data based on 1D variable value - Added by Ralf Mueller over 1 year ago

hi Fabrizio!

Is this a Copernicus question? CDO works on data files, so if you have something like that, please upload it.

cheers
ralf

RE: Selecting data based on 1D variable value - Added by Fabrizio Amoruso over 1 year ago

Hello Ralf,

Thanks for your always fast feedback.

The question is related to using CDO on netCDF4 files downloaded from Copernicus.

An example data file is attached.

The file contains monthly average seasonality temperature values (in local rivers) for the catchment ids provided in a shapefile (which you do not need, but I'll link it just in case: https://zenodo.org/record/581451#.Y3O9h8fMJlo), using the E-HYPEcatch 00 model, and provides both historical and RCP scenario data.

Extracting catchment id 9507523 (progressive id=23919) should therefore output 48 (12x4) values: 12 monthly entries for each of the historical and scenario datasets, exclusively for that catchment id.

I am currently using ncks -d id,23919 infile.in outfile.out to extract the data, but I would like to avoid using the id array position altogether and instead use the required value (much like ncks -d id,9507523 infile.in outfile.out, which of course does not work and gives an out-of-range error).

Furthermore, I realized that the catchments in the file are not sorted in ascending order, meaning that I first need to find the netCDF array position of the required catchment (9507523 in the shapefile, position 23919 in the netCDF array) and then use it as input for ncks.
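
The two-step workflow described above (look up the array position of a catchment id, then pass that position to ncks) could be scripted roughly as follows. This is only a sketch under assumptions: the helper name build_ncks_commands, the output file pattern and the short id list are made up for illustration, and in practice the id values would first have to be read from the netCDF file.

```python
import shlex


def build_ncks_commands(file_ids, wanted_ids, infile, outfile_pattern):
    """Return one ncks argument list per wanted catchment id (hypothetical helper)."""
    # Map each id value to its 0-based position in the file's id array.
    position = {int(value): index for index, value in enumerate(file_ids)}
    commands = []
    for catchment_id in wanted_ids:
        if catchment_id not in position:
            raise KeyError(f"catchment id {catchment_id} not found in file")
        outfile = outfile_pattern.format(id=catchment_id)
        # Same call shape as in the post: ncks -d id,<position> infile outfile
        commands.append(
            ["ncks", "-d", f"id,{position[catchment_id]}", infile, outfile]
        )
    return commands


# Made-up, unsorted id values standing in for the file's "id" variable.
file_ids = [8802000.0, 8000000.0, 9507523.0, 9606000.0]
cmds = build_ncks_commands(file_ids, [9507523, 8000000], "infile.in", "out_{id}.nc")
for cmd in cmds:
    print(shlex.join(cmd))
```

Building the lookup dictionary once per file keeps the cost low when many catchment ids are requested from the same file, and the argument lists can be handed to subprocess.run unchanged.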

Many thanks indeed!

Attachment: Hypecatch00.nc (16.2 MB), HYPEcatch00

RE: Selecting data based on 1D variable value - Added by Ralf Mueller over 1 year ago

CDO treats the id as ordinary location information because of the attribute id:axis = "X". Hence CDO won't be able to select a certain location, because the id has nothing to do with a normal coordinate system.

But your input seems to be small enough, so I would work with xarray on this, like

>>> import xarray as xr
>>> xr.open_dataset('Hypecatch00.nc')
<xarray.Dataset>
Dimensions:            (time: 120, id: 34810)
Coordinates:
  * time               (time) datetime64[ns] 1971-01-01 ... 2071-12-01
  * id                 (id) float64 8.802e+06 8e+06 ... 9.606e+06 9.602e+06
Data variables:
    locwtemp_ymonmean  (time, id) float32 ...
Attributes: (12/27)
    CDI:                      Climate Data Interface version 2.0.5 (https://m...
    Conventions:              CF-1.6
    source:                   A set of EURO-CORDEX EUR-11 RCM was bias adjust...
    institution:              SMHI, www.smhi.se
    NCO:                      netCDF Operators version 4.7.7 (Homepage = http...
    comment:                  -
    ...                       ...
    invar_experiment_name:    rcp45
    time_coverage_start:      19710101
    time_coverage_end:        20001231
    variable_name:            locwtemp_ymonmean
    contact:                  copernicus-support@ecmwf.int
    CDO:                      Climate Data Operators version 2.0.5 (https://m...

You can use ds.time, ds.id or ds.locwtemp_ymonmean for the data and start searching for the right ids using xarray's selection and indexing methods.

Data access like ds.locwtemp_ymonmean[:,23919] maybe?

I am not fluent in xarray, but I am sure the documentation has examples on this.
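
A minimal sketch of value-based selection with xarray, for illustration only. It does not read Hypecatch00.nc; instead it builds a tiny synthetic dataset that reuses the names from the listing above (locwtemp_ymonmean, time, id) with made-up values, and assumes that id is an indexed float64 coordinate, as the listing suggests. With the real file, ds would come from xr.open_dataset('Hypecatch00.nc').

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real file: unsorted float64 ids, 12 time steps.
ids = np.array([8802000.0, 8000000.0, 9507523.0, 9606000.0])
time = np.arange(12)
data = np.arange(12 * ids.size, dtype="float32").reshape(12, ids.size)
ds = xr.Dataset(
    {"locwtemp_ymonmean": (("time", "id"), data)},
    coords={"time": time, "id": ids},
)

# Select by id VALUE instead of array position.
series = ds["locwtemp_ymonmean"].sel(id=9507523.0)

# Several catchments at once: pass a list of id values.
subset = ds["locwtemp_ymonmean"].sel(id=[8000000.0, 9507523.0])

# If the 0-based position is still needed (e.g. for ncks -d id,<position>):
position = int(np.flatnonzero(ds["id"].values == 9507523.0)[0])

print(series.values)
print(position)
```

Label-based .sel avoids the separate position lookup entirely; the result could then be written out with series.to_netcdf(...) if a file per catchment is required.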

cheers
ralf

RE: Selecting data based on 1D variable value - Added by Fabrizio Amoruso over 1 year ago

Hello Ralf,

Thanks very much for the suggestion. I assume, given the interpretation of the variables as axes, that this is the fastest way to achieve what I need.

I will try to implement it in a conda environment then, and see what comes out.

Many thanks again!
