Project

General

Profile

Speed on remapnn

Added by Antonio Rodriges almost 7 years ago

I use the data from ftp://podaac-ftp.jpl.nasa.gov/allData/merged_alt/L4/cdr_grid

Here is the metadata of one of the files:

netcdf ssh_grids_v1609_2014122612 {
  dimensions:
    Latitude = 960 ;
    Longitude = 2160 ;
    Time = UNLIMITED ; // (1 currently)
    nv = 2 ;

  variables:
    float Lat_bounds(Latitude,nv) ;
      Lat_bounds:units = "degrees_north" ;
      Lat_bounds:comment = "latitude values at the north and south bounds of each pixel." ;

    float Latitude(Latitude) ;
      Latitude:bounds = "Lat_bounds" ;
      Latitude:point_spacing = "even" ;
      Latitude:long_name = "latitude" ;
      Latitude:standard_name = "latitude" ;
      Latitude:units = "degrees_north" ;
      Latitude:axis = "Y" ;

    float Lon_bounds(Longitude,nv) ;
      Lon_bounds:units = "degrees_east" ;
      Lon_bounds:comment = "longitude values at the west and east bounds of each pixel." ;

    float Longitude(Longitude) ;
      Longitude:bounds = "Lon_bounds" ;
      Longitude:point_spacing = "even" ;
      Longitude:long_name = "longitude" ;
      Longitude:standard_name = "longitude" ;
      Longitude:units = "degrees_east" ;
      Longitude:axis = "X" ;

    float SLA(Time,Longitude,Latitude) ;
      SLA:_FillValue = 9.96921e+36f ;
      SLA:scale_factor = 1.f ;
      SLA:coordinates = "Sea Level Anomaly Estimate" ;
      SLA:add_offset = 0.f ;
      SLA:long_name = "Sea Level Anomaly Estimate" ;
      SLA:standard_name = "Sea Level Anomaly Estimate" ;
      SLA:units = "m" ;

    float SLA_ERR(Time,Longitude,Latitude) ;
      SLA_ERR:_FillValue = 9.96921e+36f ;
      SLA_ERR:scale_factor = 1.f ;
      SLA_ERR:coordinates = "Sea Level Anomaly Error Estimate" ;
      SLA_ERR:add_offset = 0.f ;
      SLA_ERR:long_name = "Sea Level Anomaly Error Estimate" ;
      SLA_ERR:standard_name = "Sea Level Anomaly Error Estimate" ;
      SLA_ERR:units = "m" ;

    float Time(Time) ;
      Time:bounds = "Time_bounds" ;
      Time:long_name = "Time" ;
      Time:standard_name = "time" ;
      Time:units = "Days since 1985-01-01 00:00:00" ;
      Time:calendar = "gregorian" ;
      Time:axis = "T" ;

    float Time_bounds(Time,nv) ;
      Time_bounds:units = "Days since 1985-01-01 00:00:00" ;
      Time_bounds:comment = "Time bounds for each time value, same value as time variable. The time variable is defined on points instead of on bounding boxes." ;

  // global attributes:
    :geospatial_lon_min = 0.08333334f ;
    :geospatial_lat_max = 79.91666f ;
    :time_coverage_end = "2014-12-26" ;
    :title = "Sea Level Anormaly Estimate based on Altimeter Data" ;
    :geospatial_lat_min = -79.91666f ;
    :time_coverage_start = "2014-12-26" ;
    :Conventions = "CF-1.6" ;
    :date_created = "2016-09-11T18:07:15.140757" ;
    :summary = "Sea level anomaly grids from altimeter data using Kriging technique, which gives best linear prediction based upon prior knowledge of covariance. " ;
    :ncei_template_version = "NCEI_NetCDF_Grid_Template_v2.0" ;
    :Institution = "Jet Propulsion Laboratory" ;
    :geospatial_lon_max = 359.9167f ;
    :version_number = "1609" ;

} // group /

My CDO version is below

cdo -V
Climate Data Operators version 1.7.2 (http://mpimet.mpg.de/cdo)
Compiled: by scidb on scidb-vm (x86_64-unknown-linux-gnu) Sep 19 2016 13:48:25
Compiler: gcc -std=gnu99 -g -O2 -fopenmp
 version: gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5
Features: DATA PTHREADS OpenMP3 HDF5 NC4/HDF5/threadsafe OPeNDAP PROJ.4 SSE2
Libraries: HDF5/1.8.15 proj/4.8
Filetypes: srv ext ieg grb grb2 nc nc2 nc4 nc4c
     CDI library version : 1.7.2 of Sep 19 2016 13:46:56
 CGRIBEX library version : 1.7.5 of Jun  3 2016 14:44:00
GRIB_API library version : 1.13.1
  NetCDF library version : 4.3.3.1 of Sep 19 2016 13:23:52 $
    HDF5 library version : 1.8.15 threadsafe
 SERVICE library version : 1.4.0 of Sep 19 2016 13:46:34
   EXTRA library version : 1.4.0 of Sep 19 2016 13:46:26
     IEG library version : 1.4.0 of Sep 19 2016 13:46:31
    FILE library version : 1.8.2 of Sep 19 2016 13:46:26

I try to perform nearest-neighbor interpolation like this (increase the resolution twice, 2x):

 time cdo remapnn,r4320x1920 ssh_grids_v1609_2014122612.nc o.nc
Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Sea
Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Level
Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Anomaly
Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Error
Warning (cdfScanVarAttributes) : NetCDF: Variable not found - Estimate
cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1411453)

Error (gridDefBoundsGeneric) : Allocation of 265420800 bytes failed. [ line 3136 file grid.c ]
System error message : Cannot allocate memory

real    5m6.203s
user    5m2.712s
sys     0m1.492s

There are two problems:
  • It takes too long for remapnn to complete. Nearest-neighbor is not a time-intensive operation, why does it take this time? To compare: when I convert this file to GeoTIFF and do the same with gdalwarp it takes just several seconds vs 5 minutes...
  • Memory allocation failure.

What I am doing wrong? Is there a way to speed up the processing?

Thanks


Replies (7)

RE: Speed on remapnn - Added by Ralf Mueller almost 7 years ago

the setup for the coordinates attribute in

    float SLA_ERR(Time,Longitude,Latitude) ;
      SLA_ERR:_FillValue = 9.96921e+36f ;
      SLA_ERR:scale_factor = 1.f ;
      SLA_ERR:coordinates = "Sea Level Anomaly Error Estimate" ;
      SLA_ERR:add_offset = 0.f ;
      SLA_ERR:long_name = "Sea Level Anomaly Error Estimate" ;
      SLA_ERR:standard_name = "Sea Level Anomaly Error Estimate" ;
      SLA_ERR:units = "m" ;
is wrong. just remove it and rerun cdo. In the current status of the data CDO cannot really figure out, what are the correct coordinates, because this attribute is missleading. If set, it should have the value "Longitude Latitude", but thats not needed.

In general nearest neighbour is not an easy task to do IMO, because you have to look for it! Esp. with missing values, this can be demanding. Anyhow, you can speed up the interpolation weight computation by using multiple threads with the option

-P

hth
ralf

RE: Speed on remapnn - Added by Antonio Rodriges almost 7 years ago

Now it works, but 9 minutes...
For a general grid, you are right: you need to compute a lot of distances to possibly many points.
However, for a regular grid and NN interpolation you must consider only 4 neighbors.

time cdo remapnn,r4320x1920 in.nc o.nc
cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1411453)
cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1351150)
cdo remapnn: Processed 4147200 values from 2 variables over 1 timestep ( 583.34s )

real    9m44.281s
user    9m42.632s
sys     0m0.732s

RE: Speed on remapnn - Added by Uwe Schulzweida almost 7 years ago

There are some parameter which influence the performance of the operator remapnn. The most important one with your data is the environment variable

export REMAP_EXTRAPOLATE=off
The default is REMAP_EXTRAPOLATE=on. This will extrapolate all target grid points which are outside the region of the source grid. The extrapolation is not well optimized.

RE: Speed on remapnn - Added by Antonio Rodriges almost 7 years ago

Now it is much faster: 19 sec VS 9 minutes.

What do you mean by "grid points which are outside the region of the source grid"?
Points that are at the border of the grid? There are should not be too many points, aren't there?

Can I use a CDO command line argument instead of "export REMAP_EXTRAPOLATE=off"?

time cdo remapnn,r4320x1920 in.nc o.nc
cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1411453)
cdo remapnn: Nearest neighbor weights from lonlat (2160x960) to lonlat (4320x1920) grid, with source mask (1351150)
cdo remapnn: Processed 4147200 values from 2 variables over 1 timestep ( 19.06s )

real    0m19.564s
user    0m16.988s
sys     0m2.096s

Are there any other options to speed up the processing? I just ran gdalwarp on the same file that does NN interpolation in 0.89 seconds.

RE: Speed on remapnn - Added by Ralf Mueller almost 7 years ago

Again, use

-P <numberOfThreads>
for using openmp parallelization.

RE: Speed on remapnn - Added by Antonio Rodriges almost 7 years ago

It took 15 seconds (4 seconds less) with -P 2

RE: Speed on remapnn - Added by Uwe Schulzweida almost 7 years ago

What do you mean by "grid points which are outside the region of the source grid"?

The latitude range of your data is -79.91666 to 79.91666 degree. The range of the target grid r4320x1920 is -89.95312 to 89.95312.
That mean 921600 points of the target grid are outside the region of the source grid.

    (1-7/7)