Transforming ensemble years into single time series

Added by Henrique Goulart over 3 years ago

Hello everybody :),

I have an ensemble (25 members) of monthly data that spans 2035 to 2039. Each year and each ensemble member (or run) is a separate file, making 125 files (5 years * 25 members). I've merged them with mergetime and have done some analysis on statistical properties. However, using mergetime means that in the merged file every date is repeated 25 times before moving on to the next date (01-01-2035, 01-01-2035, ... (25x), 01-02-2035, ...).

For the next step, I have to feed these data into a machine learning model, which requires the data to be streamlined into a single time series with no repeated years and with the years of each run grouped together (which means extending the final year well beyond 2039).
I believe the command to shift the time axis by 5 years would be something like this:

cdo shifttime,5year dtr_m_ECEarth_PD_s01r01_2035_2039.nc dtr_m_ECEarth_PD_s01r01_40_45.nc

Now I am struggling to create the for loop to automate the process. Perhaps I need a counter that, each time the run changes (r00, r01, r02...), increases the shift by 5 years (5*i) and changes the output file name for those years. Or maybe everything can be merged at the end, so that only one file is generated. Honestly, however, I am not used to bash scripts, nor to CDO in general, and am not sure how to proceed. I hope some of you might be able to shed some light here.
Also, if there is a completely different approach that makes things easier, I am happy to try it out.
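
The counter idea described above can be sketched in a few lines of shell before any CDO call is involved. This is only an illustration of the arithmetic: the function names are made up for this example, and it assumes the members are numbered r00 to r24 and that every member covers 2035 to 2039.

```shell
#!/usr/bin/env bash
# Sketch: shift and resulting year range for each ensemble member.
# Assumptions (not confirmed by the post): members are numbered r00..r24,
# and each member covers first_year .. first_year + n_years - 1.
first_year=2035
n_years=5
n_members=25

# Hypothetical helper: member i is moved forward by i * n_years years.
shift_for_member() {
    echo $(( $1 * n_years ))
}

# Hypothetical helper: year range of member i after the shift.
year_range_for_member() {
    offset=$(shift_for_member "$1")
    echo "$(( first_year + offset ))-$(( first_year + offset + n_years - 1 ))"
}

i=0
while [ "$i" -lt "$n_members" ]; do
    printf 'r%02d  shift=%3d years  ->  %s\n' \
        "$i" "$(shift_for_member "$i")" "$(year_range_for_member "$i")"
    i=$(( i + 1 ))
done
```

With these assumptions the last member (r24) is shifted by 120 years and ends in 2159, so the streamlined series would run from 2035 to 2159.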

If anyone has any suggestions, I would be very grateful!

Kind regards


Replies (8)

RE: Transforming ensemble years into single time series - Added by Brendan DeTracey over 3 years ago

Hi,
Not sure I completely understand but cat with globbing would be a better first step. Assuming bash shell:

$ cdo cat dtr_m_ECEarth_PD_s01r00_20??.nc dtr_m_ECEarth_PD_s01r00_2035-2039.nc

One way in bash to cat each ensemble:
for r_number in {0..24} ; do r_tag='r'$(printf '%02d' "$r_number") ; echo "$r_tag" ; cdo cat dtr_m_ECEarth_PD_s01"$r_tag"_20??.nc dtr_m_ECEarth_PD_s01"$r_tag"_2035-2039.nc ; done

I put it as a one-liner, as I'm too lazy to write a script. I encourage learning to script in bash or another shell. Now each ensemble member is in one file with continuous dates.

RE: Transforming ensemble years into single time series - Added by Brendan DeTracey over 3 years ago

On second thought, include shifttime in the pipeline. Maybe:

$ for r_number in {0..24} ; do r_tag='r'$(printf '%02d' "$r_number") ; echo "$r_tag" ; cdo cat -shifttime,$(( $r_number * 5 ))year dtr_m_ECEarth_PD_s01"$r_tag"_20??.nc dtr_m_ECEarth_PD_s01r00-24_2035-2159.nc ; done

The cat operator appends to the output file if it already exists. Make sure you delete it before running the above code more than once.

RE: Transforming ensemble years into single time series - Added by Henrique Goulart over 3 years ago

Brendan DeTracey wrote:

On second thought, include timeshift in the pipeline. Maybe:
[...]
The cat operator, appends to the output file if it already exists. Make sure you delete it before running the above code more than once.

Hi Brendan,

First of all, a massive thank you for the help. This is much more efficient and subtle than what I had in mind. I do indeed need to learn my way around bash: the more I get into climate data, the more I see that xarray and other Python libraries are incapable of solving some issues.

The only thing that seems to be off is that while the first year of each run is updated as expected (2040, 2045, 2050, ...), the following years keep their original dates (2040, 2036, 2037, ..., 2050, 2036, ...). I'm trying to see where the problem is, but as far as I can tell the shifttime that changes with r_number should apply to all years of each run. Would you know why the command is not being applied to the following years?

A noob question: Is ?? equivalent to * for wildcard use, or are there specific circumstances to use each one?

Thank you once again!

Best regards,

Henrique

RE: Transforming ensemble years into single time series - Added by Brendan DeTracey over 3 years ago

Sorry mate. It should have been:

$ for r_number in {0..24} ; do r_tag='r'$(printf '%02d' "$r_number") ; echo "$r_tag" ; cdo cat -apply,shifttime,$(( $r_number * 5 ))year [ dtr_m_ECEarth_PD_s01"$r_tag"_20??.nc ] dtr_m_ECEarth_PD_s01r00-24_2035-2159.nc ; done

The apply operator was needed, with the square brackets, to shifttime all files matching the pattern. In bash, ? matches any single character and * matches a string of any length.
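
To make the difference between ? and * concrete, here is a small sketch using empty dummy files in a temporary directory. The file names and the count_args helper are invented for the demo. It also shows one reason to prefer 20?? in this thread: a merged file such as ..._2035-2039.nc sitting in the same directory would be picked up by 20* but not by 20??.

```shell
#!/usr/bin/env bash
# Demo of ? versus * in shell filename patterns, using empty dummy files.
# All names here (demo_dir, data_*.nc, count_args) are hypothetical.
demo_dir=$(mktemp -d)
touch "$demo_dir/data_2035.nc" \
      "$demo_dir/data_2036.nc" \
      "$demo_dir/data_2035-2039.nc"

# Helper: report how many words the pattern expanded to.
count_args() { echo "$#"; }

# 20?? = "20" followed by exactly two characters: only the yearly files.
n_yearly=$(count_args "$demo_dir"/data_20??.nc)

# 20* = "20" followed by anything: the merged file matches as well.
n_all=$(count_args "$demo_dir"/data_20*.nc)

echo "data_20??.nc matched $n_yearly files"
echo "data_20*.nc  matched $n_all files"

rm -r "$demo_dir"
```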

Since you know Python (and xarray), you might want to consider the Python cdo bindings (https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo%7Brbpy%7D). bash is great for quick and dirty pattern matching, but the Python glob module does the same, although you might have to sort the list of files afterwards. I can't remember. Also, using the cdo Python bindings allows direct xarray access. I have yet to use these bindings, but make sure you read https://github.com/Try2Code/cdo-bindings/blob/master/python/test/test_cdo.py as it shows both the old and new way to assemble a cdo pipeline.

RE: Transforming ensemble years into single time series - Added by Brendan DeTracey over 3 years ago

Edit: by pipeline I meant cdo chaining.

RE: Transforming ensemble years into single time series - Added by Brendan DeTracey over 3 years ago

One last possible bump. With cdo 1.9.9 you must now add a - before the first argument to apply.

$ for r_number in {0..24} ; do r_tag='r'$(printf '%02d' "$r_number") ; echo "$r_tag" ; cdo cat -apply,shifttime,$(( $r_number * 5 ))year [ dtr_m_ECEarth_PD_s01"$r_tag"_20??.nc ] dtr_m_ECEarth_PD_s01r00-24_2035-2159.nc ; done

I was testing under cygwin cdo, which is currently 1.9.8, but just got bitten by this change using cdo 1.9.9 on linux! :)

RE: Transforming ensemble years into single time series - Added by Brendan DeTracey over 3 years ago

And I forgot to add it to my last post:

$ for r_number in {0..24} ; do r_tag='r'$(printf '%02d' "$r_number") ; echo "$r_tag" ; cdo cat -apply,-shifttime,$(( $r_number * 5 ))year [ dtr_m_ECEarth_PD_s01"$r_tag"_20??.nc ] dtr_m_ECEarth_PD_s01r00-24_2035-2159.nc ; done
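
For readers who prefer a script to a one-liner, the command above could be laid out roughly as follows. This is a sketch rather than a tested recipe: the DRY_RUN switch, the run wrapper and the shift_member function are additions for illustration and are not part of cdo. By default it only prints the commands; set DRY_RUN=0 to execute them. It also removes the output file first, since cat appends to an existing file.

```shell
#!/usr/bin/env bash
# Sketch of the one-liner as a script. DRY_RUN, run and shift_member are
# hypothetical helpers for this example, not cdo features.
DRY_RUN=${DRY_RUN:-1}            # 1 = print commands only, 0 = run cdo
prefix=dtr_m_ECEarth_PD_s01      # file name stem used in this thread
outfile=${prefix}r00-24_2035-2159.nc
years_per_member=5

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = member number (0..24): shift that member's files and append them.
shift_member() {
    r_tag=$(printf 'r%02d' "$1")
    offset=$(( $1 * years_per_member ))
    # The unquoted 20?? is expanded by the shell to the member's yearly
    # files; if none exist, the pattern is passed through literally.
    run cdo cat -apply,-shifttime,"${offset}year" \
        [ "${prefix}${r_tag}"_20??.nc ] "$outfile"
}

# cat appends to an existing output file, so remove it before a real run.
[ "$DRY_RUN" = 1 ] || rm -f "$outfile"

i=0
while [ "$i" -le 24 ]; do
    shift_member "$i"
    i=$(( i + 1 ))
done
```

Running it once in dry-run mode is a cheap way to check the shifts and file patterns before letting cdo write anything.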

RE: Transforming ensemble years into single time series - Added by Henrique Goulart over 3 years ago

Hi Brendan,

Sorry for the delay, I wrote this reply 6 days ago but apparently it was not submitted.

Thank you again for the reply. I had no clue about -apply, but now it is working indeed, that's outstanding!

Also, great to know about the bindings for Python; this is the first time I have heard about them. I'll try them out (but will also practice some shell just for the sake of it).

Thank you once more, Brendan!
