Merge datasets sorted by spatial coordinates?

Added by Clement Tisseuil almost 14 years ago

Dear all,

Is it possible to merge several datasets along the spatial dimensions (like mergetime, but applied to the spatial coordinates)? Typically, the datasets are identical in terms of variables and time dimension, but they cover different spatial areas.

Example :
cdo sellonlatbox,-10,10,30,40 file.nc file1.nc
cdo sellonlatbox,-10,10,20,30 file.nc file2.nc
cdo sellonlatbox,-10,10,20,40 file.nc file3.nc

I would like to build file4.nc as the spatial merge of file1.nc and file2.nc, which should be identical to file3.nc.

Thanks in advance.

Regards,

Clement


Replies (48)

RE: Merge datasets sorted by spatial coordinates? - Added by Ralf Mueller almost 4 years ago

hi Audace - and thx for your patience with me,

Here is what your files are about in terms of their grids:

MLLO1.1998.2019.cld.dat.nc:
   Grid coordinates :
     1 : lonlat                   : points=36 (6x6)
                              lon : -76.75 to -74.25 by 0.5 degrees_east
                              lat : 45.25 to 47.75 by 0.5 degrees_north
MLLO2.1998.2019.cld.dat.nc
   Grid coordinates :
     1 : lonlat                   : points=16 (4x4)
                              lon : -78.25 to -76.75 by 0.5 degrees_east
                              lat : 45.75 to 47.25 by 0.5 degrees_north
MLLO3.1998.2019.cld.dat.nc
   Grid coordinates :
     1 : lonlat                   : points=9 (3x3)
                              lon : -74.25 to -73.25 by 0.5 degrees_east
                              lat : 45.25 to 46.25 by 0.5 degrees_north 

Obviously they have locations in common: (1) and (2) share the complete column at lon = -76.75, and the same holds for (1) and (3) at lon = -74.25. So a simple merge cannot guarantee that all values will be kept. Here is my workflow to get close to it:

  1. create a target grid, in which all files fit:
    cdo -griddes -sellonlatbox,-80,-70,45,48 -topo,global_0.5 > targetGrid.txt
  2. remap all inputs onto this grid with minimal (or no) value change:
    for f (MLLO1.1998.2019.cld.dat.nc MLLO2.1998.2019.cld.dat.nc MLLO3.1998.2019.cld.dat.nc ) {echo $f; cdo remapcon,targetGrid.txt $f enlargeCON_$f}
    or in plain bash
    for f in MLLO1.1998.2019.cld.dat.nc MLLO2.1998.2019.cld.dat.nc MLLO3.1998.2019.cld.dat.nc; do echo $f; cdo remapcon,targetGrid.txt $f enlargeCON_$f; done
    you can experiment with other interpolation methods of course.
  3. merge the results step by step:
    1. cdo mergegrid enlargeCON_MLLO2.1998.2019.cld.dat.nc enlargeCON_MLLO1.1998.2019.cld.dat.nc 2_1.nc
    2. cdo mergegrid 2_1.nc enlargeCON_MLLO3.1998.2019.cld.dat.nc 2_1_3.nc

Here are some plots of the inputs on the enlarged grid, to give an idea of how the locations are distributed, and of the final merge results:

[Plots: input 2, input 1, input 3, intermediate result, final result]

Please note, that the colorbars are all slightly different.

hope i could help you a bit
ralf

RE: Merge datasets sorted by spatial coordinates? - Added by Ralf Mueller almost 4 years ago

Now with identical colorbars

[Plots: input 2, input 1, input 3, intermediate result, final result]

RE: Merge datasets sorted by spatial coordinates? - Added by Noam Chomsky almost 4 years ago

Thank you. So the only way to merge is step by step, two files at a time? If you have dozens of files, that is not very practical.
Also, do merge and mergegrid lead to the same results?

RE: Merge datasets sorted by spatial coordinates? - Added by Ralf Mueller almost 4 years ago

Noam Chomsky wrote:

thank you,so the only way to merge is to do it step-by-step two files at the time? If you have dozens of files it's not very practical.
Also, using merge and mergegrid leads the same results?

No, with merge the 'sfc' variable becomes a dimension of 'cld', which I don't really understand. But this seems specific to these input files.

The thing with many files and different grids is that CDO handles all of them separately: they could all have completely different grids. Even the ones uploaded here do not naturally form a lonlat grid when you just glue them together. Everything would be unstructured, which is not what most people want. For lonlat input, you could always use a global grid as the target and select the largest covered region from the final merge result. And if you put all of them on the same grid before the merge and the regions do not overlap (like here), you can parallelize this perfectly.
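For dozens of files, the step-by-step merge can be scripted. Below is a minimal sketch, assuming all inputs have already been remapped onto the same target grid. The function name merge_chain and the merge_step_N.nc intermediate file names are made up for illustration. It is a dry run by default (the commands are only printed); set CDO=cdo in the environment to execute them.

```shell
# Dry run by default: commands are printed, not executed.
# Set CDO=cdo in the environment to really run them.
CDO="${CDO:-echo cdo}"

# merge_chain OUTFILE FILE1 FILE2 [FILE3 ...]
# Hypothetical helper: merges the inputs one after another with mergegrid.
# Assumes every input is already on the same target grid.
merge_chain() {
  local out acc next i
  out="$1"; shift
  acc="$1"; shift
  i=0
  while [ "$#" -gt 0 ]; do
    i=$((i + 1))
    # The last step writes the final output, earlier steps a temporary file.
    if [ "$#" -eq 1 ]; then next="$out"; else next="merge_step_${i}.nc"; fi
    $CDO mergegrid "$acc" "$1" "$next" || return 1
    acc="$next"
    shift
  done
}

merge_chain merged.nc \
  enlargeCON_MLLO2.1998.2019.cld.dat.nc \
  enlargeCON_MLLO1.1998.2019.cld.dat.nc \
  enlargeCON_MLLO3.1998.2019.cld.dat.nc
```

Each step passes the result so far as the first mergegrid input, which is the same order as in the manual two-step merge earlier in this thread. The intermediate files can be deleted once the final output exists.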

If global is not an option, you need to analyse the grids for generating the target grid. I am sure this can be done in 20 lines (in the language of my choice ;-)
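A rough sketch of that grid analysis in shell and awk, assuming all inputs are regular lonlat grids with the same increment and aligned origins (true for the three files listed earlier in this thread, not checked by the script). The lon/lat ranges are typed in by hand from the grid listing; the function name target_grid is made up for illustration.

```shell
# One line per input file: lon_min lon_max lat_min lat_max
# (values copied from the grid listing of the three MLLO files).
ranges="-76.75 -74.25 45.25 47.75
-78.25 -76.75 45.75 47.25
-74.25 -73.25 45.25 46.25"
inc=0.5   # common grid increment in degrees (assumed identical for all files)

# Print a CDO lonlat grid description covering the bounding box of all ranges.
target_grid() {
  printf '%s\n' "$ranges" | awk -v inc="$inc" '
    NR == 1 { x0 = $1; x1 = $2; y0 = $3; y1 = $4 }
    {
      if ($1 < x0) x0 = $1
      if ($2 > x1) x1 = $2
      if ($3 < y0) y0 = $3
      if ($4 > y1) y1 = $4
    }
    END {
      print "gridtype = lonlat"
      printf "xsize = %d\n", int((x1 - x0) / inc + 0.5) + 1
      printf "ysize = %d\n", int((y1 - y0) / inc + 0.5) + 1
      printf "xfirst = %s\n", x0
      printf "xinc = %s\n", inc
      printf "yfirst = %s\n", y0
      printf "yinc = %s\n", inc
    }'
}

target_grid
```

Redirecting the output (target_grid > targetGrid.txt) should give a file that can be passed to the remap operators in place of the one derived from -topo,global_0.5.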

cheers
ralf

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE almost 4 years ago

Ralf Mueller wrote:

hi Audace - and thx for your patience with me,

Here is what your files are about in terms of their grids:[...]

Obviously they have locations in common: (1) and (2) share the complete columns at lon = -76.75, similar holds true for (1) and (3). So a simple merge can not guarantee that all values will be kept. Here is my workflow to get close to it

  1. create a target grid, in which all files fit:[...]
  2. remap all inputs onto this grid with minimal (or no) value change:[...]or in plain bash[...]you can experiment with other interpolation methods of course.
  3. merge the results step by step:
    1. [...]
    2. [...]

here are some plot of the inputs on the enlarged grid so get an idea on how the locations are distributed and the final merge results:

input 2 input 1 input 3
intermediate result final result

Please note, that the colorbars are all slightly different.

hope i could help you a bit
ralf

Dear Ralf

You did a lot for me. It's exactly what I am looking for.

Thank you so much dear.

RE: Merge datasets sorted by spatial coordinates? - Added by Ralf Mueller almost 4 years ago

you are welcome - was a nice excuse to work on my debugging plot script

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE almost 4 years ago

Ralf Mueller wrote:

you are welcome - was a nice excuse to work on my debugging plot script

Nice :)

Select and aggregate ncdf4 data? - Added by Audace ADANTODE almost 4 years ago

Hello,
Please, I need your help about this.

I'm processing a ncdf4 climate database with CDO. It contains daily temperature at 1 km x 1 km resolution and covers all of North America. What I need is daily temperature by state. For that, I tried to use sellonlatbox to select the data by state with coordinates and fldmean to aggregate. Unfortunately, the process is too slow: it will take more than 2 weeks.

I'm sure there is a much faster way. Could someone help me with that, please?

RE: Merge datasets sorted by spatial coordinates? - Added by Ralf Mueller almost 4 years ago

some questions first:

  1. do you have proper sellonlatbox calls for all states?
  2. how large is the file?
  3. do you have a parallel file system, a SSD or enough RAM for this file?

here is the recipe:

  1. bring the input to the fastest filesystem you have: best is '/dev/shm' (or '/tmp' on systems where it is a tmpfs), since these are mapped into RAM. But their size is limited - check your RAM first!
  2. create a sh script that selects all states, each in a single command line, and chains the selection with the fldmean operator to create a small data file as output
  3. don't forget to mask out all non-relevant cells, i.e. set them to missval.
  4. call the script with "GNU parallel"
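Steps 2 and 4 could look roughly like the sketch below. It only prints one chained CDO call per region; nothing is executed. The ARIZONA box and the input file name are the ones used later in this thread, while REGION_B with its coordinates, the function name make_jobs and the output file names are made-up placeholders. The masking from step 3 is not included, so each mean still covers the whole box.

```shell
# Input file name as used in this thread.
infile="daymet_v3_tmax_1999_na.nc4"

# One line per region: NAME lon1 lon2 lat1 lat2.
# ARIZONA is the box from this thread; REGION_B is a made-up placeholder.
boxes="ARIZONA -114.86 -109 31.33 37
REGION_B -100 -95 40 45"

# Print one chained call per region: select the box, then take the field mean.
# Chaining avoids writing the large intermediate box file to disk.
make_jobs() {
  printf '%s\n' "$boxes" | while read -r name lon1 lon2 lat1 lat2; do
    printf 'cdo -fldmean -sellonlatbox,%s,%s,%s,%s %s %s_fldmean.nc\n' \
      "$lon1" "$lon2" "$lat1" "$lat2" "$infile" "$name"
  done
}

make_jobs
# To run the jobs, assuming GNU parallel is installed:
#   make_jobs > jobs.txt
#   parallel -j 4 < jobs.txt
```

The number of parallel jobs (4 here) is an arbitrary choice; with limited RAM a smaller value may be safer, since every job reads the same large input file.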

I explained those tips here in more detail

cheers
ralf

RE: Select and aggregate ncdf4 data - Added by Audace ADANTODE almost 4 years ago

Thank you Ralf for your Answer.

My RAM is 8 GB.
I started to create a data subset for each state by using: cdo sellonlatbox,lon1,lon2,lat1,lat2 infile outfile
and I will aggregate just with: cdo fldmean infile outfile

That is what I am doing. Although I don't clearly understand what you said, I think that your method would be the best. Could you explain it with code, please?

Here is the link of my data base: https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/1328/1999/catalog.html?dataset=1328/1999/daymet_v3_tmax_1999_na.nc4
on that page I choose: "HTTPServer: /thredds/fileServer/ornldaac/1328/1999/daymet_v3_tmax_1999_na.nc4" for downloading

Thanks:)

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE almost 4 years ago

For ARIZONA I am using these codes:

cd /cygdrive/C/Users/USER/Desktop/spmo/Day

PATH="$PATH:.:"

cdo sinfon daymet_v3_tmax_1999_na.nc4
cdo sellonlatbox,-114.86,-109,31.33,37 daymet_v3_tmax_1999_na.nc4 ARIZONA.daymet_v3_tmax_1999_na.nc4
cdo fldmean ARIZONA.daymet_v3_tmax_1999_na.nc4 ARIZONA_v3_tmax_1999_na.nc4

RE: Merge datasets sorted by spatial coordinates? - Added by Ralf Mueller almost 4 years ago

I will check the data, thank you.
My point is: usually regions do not fit into a regular lon-lat box. There will be cells inside the box that do not belong to the region of interest (just because the region is not rectangular). You might have to eliminate those points from the computation.

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE almost 4 years ago

You're right, Ralf. I did not remove those points just because I don't know how to do it. With your help, I should be able to fix that mistake next time :)
Thanks

RE: Merge datasets sorted by spatial coordinates? - Added by Karin Meier-Fleischer almost 4 years ago

Hi Audace and Ralf,

may I weigh in here? You need the lon/lat values of the Arizona state outline, which you can get from a shapefile, for instance. Then you can use NCL to create a mask file, or you can write the polygon values to a file and use CDO's maskregion operator (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#subsection.2.6.11).
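A minimal sketch of the polygon-file variant, assuming the format described in the CDO documentation (one "lon lat" pair per line) and an input grid that maskregion supports. The four points are just the corners of the Arizona box used in this thread, standing in for a real state outline taken from a shapefile; the file names region_polygon.txt and ARIZONA_masked_fldmean.nc are made up. It is a dry run by default; set CDO=cdo to execute.

```shell
# Dry run by default: the command is printed, not executed.
CDO="${CDO:-echo cdo}"

polygon="region_polygon.txt"

# Polygon file: one "lon lat" pair per line.
# Placeholder only: box corners, NOT the real Arizona outline.
cat > "$polygon" <<'EOF'
-114.86 31.33
-109 31.33
-109 37
-114.86 37
EOF

# Mask everything outside the polygon, then average the remaining cells.
$CDO -fldmean -maskregion,"$polygon" daymet_v3_tmax_1999_na.nc4 ARIZONA_masked_fldmean.nc
```

With a real outline in the polygon file, cells inside the bounding box but outside the state are set to missing values and no longer enter the field mean.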

-Karin

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE almost 4 years ago

Thank you for your guidance, Karin.

cheers

RE: Merge datasets sorted by spatial coordinates? - Added by Karin Meier-Fleischer almost 4 years ago

I had just a little time and wrote a short article about maskregion.

https://code.mpimet.mpg.de/boards/53/topics/9689

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE almost 4 years ago

Thank you again, Karin.
Very kind:)

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE over 3 years ago

Audace ADANTODE wrote:

Hello Dear.

I am trying to merge some files using this nice method suggested by Ralf:

cdo -griddes -sellonlatbox,-80,-70,45,48 -topo,global_0.5 > targetGrid.txt
for f in MLLO2.tmin_1999_na.nc4 MLLO3.tmin_1999_na.nc4; do echo $f; cdo remapcon,targetGrid.txt $f enlargeCON_$f; done
cdo mergegrid enlargeCON_MLLO2.tmin_1999_na.nc4 enlargeCON_MLLO3.tmin_1999_na.nc4 MLLO.tmin_1999_na.nc4

but I have got this message:

cdo remapcon (Abort): Source grid cell corner coordinates missing!
HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 0:
#000: /cygdrive/e/cyg_pub/devel/hdf5/hdf5-1.10.2-1.x86_64/src/hdf5-1.10.2/src/H5T.c line 1734 in H5Tclose(): not a datatype
major: Invalid arguments to routine
minor: Inappropriate type

Error (cdf_close): NetCDF: HDF error
MLLO2.tmin_1999_na.nc4

May someone help me please?
Thank you so much.

RE: Merge datasets sorted by spatial coordinates? - Added by Karin Meier-Fleischer over 3 years ago

IMO without the grid cell corners you can't use remapcon. Instead you can use remapnn, remapbil,...
To get rid of the multiple grids problem delete the unused yearday variable before remapping.

cdo -griddes -sellonlatbox,-80,-70,45,48 -topo,global_0.5 > targetGrid.txt

cdo -delname,yearday MLLO2.tmin_1999_na.nc4 MLLO2.tmin_1999_na_without_yearday.nc4
cdo -delname,yearday MLLO3.tmin_1999_na.nc4 MLLO3.tmin_1999_na_without_yearday.nc4

for f in MLLO2.tmin_1999_na_without_yearday.nc4 MLLO3.tmin_1999_na_without_yearday.nc4
do
   echo $f
   cdo -remapnn,targetGrid.txt $f enlargeCON_$f
done

cdo -O -mergegrid enlargeCON_MLLO2.tmin_1999_na_without_yearday.nc4 \
                  enlargeCON_MLLO3.tmin_1999_na_without_yearday.nc4 \
                  MLLO.tmin_1999_na_without_yearday.nc4

RE: Merge datasets sorted by spatial coordinates? - Added by Audace ADANTODE over 3 years ago

Karin Meier-Fleischer wrote:

IMO without the grid cell corners you can't use remapcon. Instead you can use remapnn, remapbil,...
To get rid of the multiple grids problem delete the unused yearday variable before remapping.

[...]

Hello Karin,

It works perfectly.
Thank you so much and have a nice day.:)

Audace
