Speed on remapnn
Added by Antonio Rodriges over 7 years ago
I use the data from ftp://podaac-ftp.jpl.nasa.gov/allData/merged_alt/L4/cdr_grid
Here is the metadata of one of the files:
netcdf ssh_grids_v1609_2014122612 { dimensions: Latitude = 960 ; Longitude = 2160 ; Time = UNLIMITED ; // (1 currently) nv = 2 ; variables: float Lat_bounds(Latitude,nv) ; Lat_bounds:units = "degrees_north" ; Lat_bounds:comment = "latitude values at the north and south bounds of each pixel." ; float Latitude(Latitude) ; Latitude:bounds = "Lat_bounds" ; Latitude:point_spacing = "even" ; Latitude:long_name = "latitude" ; Latitude:standard_name = "latitude" ; Latitude:units = "degrees_north" ; Latitude:axis = "Y" ; float Lon_bounds(Longitude,nv) ; Lon_bounds:units = "degrees_east" ; Lon_bounds:comment = "longitude values at the west and east bounds of each pixel." ; float Longitude(Longitude) ; Longitude:bounds = "Lon_bounds" ; Longitude:point_spacing = "even" ; Longitude:long_name = "longitude" ; Longitude:standard_name = "longitude" ; Longitude:units = "degrees_east" ; Longitude:axis = "X" ; float SLA(Time,Longitude,Latitude) ; SLA:_FillValue = 9.96921e+36f ; SLA:scale_factor = 1.f ; SLA:coordinates = "Sea Level Anomaly Estimate" ; SLA:add_offset = 0.f ; SLA:long_name = "Sea Level Anomaly Estimate" ; SLA:standard_name = "Sea Level Anomaly Estimate" ; SLA:units = "m" ; float SLA_ERR(Time,Longitude,Latitude) ; SLA_ERR:_FillValue = 9.96921e+36f ; SLA_ERR:scale_factor = 1.f ; SLA_ERR:coordinates = "Sea Level Anomaly Error Estimate" ; SLA_ERR:add_offset = 0.f ; SLA_ERR:long_name = "Sea Level Anomaly Error Estimate" ; SLA_ERR:standard_name = "Sea Level Anomaly Error Estimate" ; SLA_ERR:units = "m" ; float Time(Time) ; Time:bounds = "Time_bounds" ; Time:long_name = "Time" ; Time:standard_name = "time" ; Time:units = "Days since 1985-01-01 00:00:00" ; Time:calendar = "gregorian" ; Time:axis = "T" ; float Time_bounds(Time,nv) ; Time_bounds:units = "Days since 1985-01-01 00:00:00" ; Time_bounds:comment = "Time bounds for each time value, same value as time variable. The time variable is defined on points instead of on bounding boxes." ; // global attributes: :geospatial_lon_min = 0.08333334f ; :geospatial_lat_max = 79.91666f ; :time_coverage_end = "2014-12-26" ; :title = "Sea Level Anormaly Estimate based on Altimeter Data" ; :geospatial_lat_min = -79.91666f ; :time_coverage_start = "2014-12-26" ; :Conventions = "CF-1.6" ; :date_created = "2016-09-11T18:07:15.140757" ; :summary = "Sea level anomaly grids from altimeter data using Kriging technique, which gives best linear prediction based upon prior knowledge of covariance. " ; :ncei_template_version = "NCEI_NetCDF_Grid_Template_v2.0" ; :Institution = "Jet Propulsion Laboratory" ; :geospatial_lon_max = 359.9167f ; :version_number = "1609" ; } // group /
My CDO version is below
cdo -V Climate Data Operators version 1.7.2 (http://mpimet.mpg.de/cdo) Compiled: by scidb on scidb-vm (x86_64-unknown-linux-gnu) Sep 19 2016 13:48:25 Compiler: gcc -std=gnu99 -g -O2 -fopenmp version: gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5 Features: DATA PTHREADS OpenMP3 HDF5 NC4/HDF5/threadsafe OPeNDAP PROJ.4 SSE2 Libraries: HDF5/1.8.15 proj/4.8 Filetypes: srv ext ieg grb grb2 nc nc2 nc4 nc4c CDI library version : 1.7.2 of Sep 19 2016 13:46:56 CGRIBEX library version : 1.7.5 of Jun 3 2016 14:44:00 GRIB_API library version : 1.13.1 NetCDF library version : 4.3.3.1 of Sep 19 2016 13:23:52 $ HDF5 library version : 1.8.15 threadsafe SERVICE library version : 1.4.0 of Sep 19 2016 13:46:34 EXTRA library version : 1.4.0 of Sep 19 2016 13:46:26 IEG library version : 1.4.0 of Sep 19 2016 13:46:31 FILE library version : 1.8.2 of Sep 19 2016 13:46:26
I try to perform nearest-neighbor interpolation like this (increase the resolution twice, 2x):
time cdo remapnn,r4320x1920 ssh_grids_v1609_2014122612.nc o.nc Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Sea Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Level Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Anomaly Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Error Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Estimate cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1411453) Error (gridDefBoundsGeneric) : Allocation of 265420800 bytes failed. [ line 3136 file grid.c ] System error message : Cannot allocate memory real 5m6.203s user 5m2.712s sys 0m1.492sThere are two problems:
- It takes too long for remapnn to complete. Nearest-neighbor is not a time-intensive operation, why does it take this time? To compare: when I convert this file to GeoTIFF and do the same with gdalwarp it takes just several seconds vs 5 minutes...
- Memory allocation failure.
What I am doing wrong? Is there a way to speed up the processing?
Thanks
Replies (7)
RE: Speed on remapnn - Added by Ralf Mueller over 7 years ago
the setup for the coordinates
attribute in
float SLA_ERR(Time,Longitude,Latitude) ; SLA_ERR:_FillValue = 9.96921e+36f ; SLA_ERR:scale_factor = 1.f ; SLA_ERR:coordinates = "Sea Level Anomaly Error Estimate" ; SLA_ERR:add_offset = 0.f ; SLA_ERR:long_name = "Sea Level Anomaly Error Estimate" ; SLA_ERR:standard_name = "Sea Level Anomaly Error Estimate" ; SLA_ERR:units = "m" ;is wrong. just remove it and rerun cdo. In the current status of the data CDO cannot really figure out, what are the correct coordinates, because this attribute is missleading. If set, it should have the value "Longitude Latitude", but thats not needed.
In general nearest neighbour is not an easy task to do IMO, because you have to look for it! Esp. with missing values, this can be demanding. Anyhow, you can speed up the interpolation weight computation by using multiple threads with the option
-P
hth
ralf
RE: Speed on remapnn - Added by Antonio Rodriges over 7 years ago
Now it works, but 9 minutes...
For a general grid, you are right: you need to compute a lot of distances to possibly many points.
However, for a regular grid and NN interpolation you must consider only 4 neighbors.
time cdo remapnn,r4320x1920 in.nc o.nc cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1411453) cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1351150) cdo remapnn: Processed 4147200 values from 2 variables over 1 timestep ( 583.34s ) real 9m44.281s user 9m42.632s sys 0m0.732s
RE: Speed on remapnn - Added by Uwe Schulzweida over 7 years ago
There are some parameter which influence the performance of the operator remapnn. The most important one with your data is the environment variable
export REMAP_EXTRAPOLATE=offThe default is REMAP_EXTRAPOLATE=on. This will extrapolate all target grid points which are outside the region of the source grid. The extrapolation is not well optimized.
RE: Speed on remapnn - Added by Antonio Rodriges over 7 years ago
Now it is much faster: 19 sec VS 9 minutes.
What do you mean by "grid points which are outside the region of the source grid"?
Points that are at the border of the grid? There are should not be too many points, aren't there?
Can I use a CDO command line argument instead of "export REMAP_EXTRAPOLATE=off"?
time cdo remapnn,r4320x1920 in.nc o.nc cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1411453) cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1351150) cdo remapnn: Processed 4147200 values from 2 variables over 1 timestep ( 19.06s ) real 0m19.564s user 0m16.988s sys 0m2.096s
Are there any other options to speed up the processing? I just ran gdalwarp on the same file that does NN interpolation in 0.89 seconds.
RE: Speed on remapnn - Added by Ralf Mueller over 7 years ago
Again, use
-P <numberOfThreads>for using openmp parallelization.
RE: Speed on remapnn - Added by Antonio Rodriges over 7 years ago
It took 15 seconds (4 seconds less) with -P 2
RE: Speed on remapnn - Added by Uwe Schulzweida over 7 years ago
What do you mean by "grid points which are outside the region of the source grid"?
The latitude range of your data is -79.91666 to 79.91666 degree. The range of the target grid r4320x1920 is -89.95312 to 89.95312.
That mean 921600 points of the target grid are outside the region of the source grid.