Large memory usage in CDO collgrid command
I have 78 netcdf files, each around 17 MB with shape (time=1, h=2048, w=2048), that I want to merge spatially. All 78 files share the same single timestep. The collgrid merge command below was able to produce the output netcdf of size 1.3 GB, but the memory usage during the merge was 5 GB.
INPUT_DIR="/home/muye/Merge-tiles/nctiles"
OUTPUT_FILE="/home/muye/Merge-tiles/merged.nc"
mkdir -p "$(dirname "$OUTPUT_FILE")"
/usr/bin/time -v cdo --single -r collgrid "$INPUT_DIR/Crop17_??_??.nc" "$OUTPUT_FILE"
Logs from the command: cdo collgrid: Processed 327155712 values from 78 variables over 78 timesteps [12.87s 5046MB].
An individual netcdf has the following structure:
<xarray.Dataset> Size: 17MB
Dimensions: (x: 2048, y: 2048, time: 1)
Coordinates:
* x (x) float64 16kB 143.3 143.3 143.3 143.3 ... 143.5 143.5 143.5
* y (y) float64 16kB -34.8 -34.8 -34.8 -34.8 ... -35.0 -35.0 -35.0
* time (time) datetime64[ns] 8B 2017-01-01
Data variables:
var0 (time, y, x) float32 17MB ...
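For scale, here is a rough back-of-the-envelope check of the data volume (a Python sketch, assuming uncompressed float32 values):

# Rough sanity check of the data volumes involved (assumes uncompressed float32).
n_tiles = 78
ny, nx = 2048, 2048
bytes_per_value = 4  # float32

tile_bytes = ny * nx * bytes_per_value   # ~16.8 MB, matches the ~17 MB files
merged_bytes = n_tiles * tile_bytes      # ~1.31 GB, matches the 1.3 GB output

print(f"values processed: {n_tiles * ny * nx}")          # 327155712, matches the log
print(f"one merged field: {merged_bytes / 1e9:.2f} GB")
print(f"observed peak:    ~5 GB, about {5e9 / merged_bytes:.1f}x one merged field")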
I wonder whether the 5 GB memory usage makes sense (is it really required when the output grid is created, or is something inefficient?), and whether it is possible to reduce the memory usage of this operation. Thanks in advance for any ideas!
Replies (8)
RE: Large memory usage in CDO collgrid command - Added by Karin Meier-Fleischer 6 days ago
Can you upload e.g. 4 files for a 2x2 grid example?
RE: Large memory usage in CDO collgrid command - Added by Dale Chen 6 days ago
Karin Meier-Fleischer wrote in RE: Large memory usage in CDO collgrid command:
Can you upload e.g. 4 files for a 2x2 grid example?
No problem, here are 4 tiles.
As a related matter, I wonder if there are ways (e.g. command-line flags) to enable sequential processing or otherwise reduce runtime memory consumption. Thanks in advance!
Crop18_04_03.nc (16 MB)
Crop18_04_02.nc (16 MB)
Crop18_05_02.nc (16 MB)
Crop18_05_03.nc (16 MB)
RE: Large memory usage in CDO collgrid command - Added by Uwe Schulzweida 6 days ago
I can reproduce the memory problem. The required memory could be reduced by a factor of 2: for unstructured grids all grid cell indices have to be stored, which is not necessary for regular grids. But it will not be less than twice the full grid size, as CDO only processes complete horizontal fields, sorry.
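To illustrate the difference (a minimal numpy sketch of the idea only, not CDO's actual code, using a toy 2x2 mosaic in place of the 78 tiles):

import numpy as np

# Toy 2x2 mosaic of 4x4 tiles standing in for the real 78 tiles of 2048x2048.
ny, nx = 4, 4
rows, cols = 2, 2
merged = np.empty((rows * ny, cols * nx), dtype=np.float32)

for r in range(rows):
    for c in range(cols):
        tile = np.full((ny, nx), r * cols + c, dtype=np.float32)  # stand-in tile data
        # Regular lon/lat grid: the destination slice follows from (r, c),
        # so nothing beyond the merged field itself needs to be kept in memory.
        merged[r*ny:(r+1)*ny, c*nx:(c+1)*nx] = tile

# Unstructured grid: an explicit target index is needed for every cell of every
# tile, and that index array alone is as large as the full merged field, e.g.
# cell_index = np.arange(rows * cols * ny * nx)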
RE: Large memory usage in CDO collgrid command - Added by Dale Chen 6 days ago
Uwe Schulzweida wrote in RE: Large memory usage in CDO collgrid command:
I can reproduce the memory problem. The required memory could be reduced by a factor of 2: for unstructured grids all grid cell indices have to be stored, which is not necessary for regular grids. But it will not be less than twice the full grid size, as CDO only processes complete horizontal fields, sorry.
Cool, that is already a big reduction. Could you explain what regular grids are? I thought my netcdfs already have proper x and y coordinates in EPSG:4326. How would I turn them into regular grids for a less memory-intensive collgrid operation?
RE: Large memory usage in CDO collgrid command - Added by Uwe Schulzweida 5 days ago
By regular I mean normal lon/lat grids, like your data.
RE: Large memory usage in CDO collgrid command - Added by Dale Chen 3 days ago
Uwe Schulzweida wrote in RE: Large memory usage in CDO collgrid command:
By regular I mean normal lon/lat grids, like your data.
I see. So my netcdfs do have the potential to reduce memory consumption in collgrid by a factor of 2, if I avoid storing all grid cell indices. Can I achieve that by deleting e.g. all lon indices in each netcdf? A brief example would be greatly appreciated.
RE: Large memory usage in CDO collgrid command - Added by Dale Chen 3 days ago
Dale Chen wrote in RE: Large memory usage in CDO collgrid command:
Uwe Schulzweida wrote in RE: Large memory usage in CDO collgrid command:
By regular I mean normal lon/lat grids, like your data.
I see. So my netcdfs do have the potential to reduce memory consumption in collgrid by a factor of 2, if I avoid storing all grid cell indices. Can I achieve that by deleting e.g. all lon indices in each netcdf? A brief example would be greatly appreciated.
I actually tried ds = ds.drop_vars('x') for each input netcdf before collgrid, but the memory consumption is still 5 GB during the operation. I wonder if the collgrid operator, or another one, could help achieve 'not storing all grid cell indices'?
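For reference, this is roughly what I ran before collgrid (a sketch; the input path follows my original command, and the output filename suffix is only for illustration):

import glob
import xarray as xr

# Strip the x coordinate from each tile before running collgrid.
# This did not change collgrid's memory usage.
for path in glob.glob("/home/muye/Merge-tiles/nctiles/Crop17_??_??.nc"):
    ds = xr.open_dataset(path)
    ds = ds.drop_vars("x")
    ds.to_netcdf(path.replace(".nc", "_nox.nc"))
    ds.close()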
RE: Large memory usage in CDO collgrid command - Added by Uwe Schulzweida 1 day ago
The cell indices are calculated in CDO because the method used in collgrid requires them. For regular lon/lat grids, a different method must be implemented in collgrid so that the cell indices are not necessary. This feature will be available in the next CDO version 2.5.3. Your data does not need to be changed.