'timselmean' method too slow
Added by Xiangfei LI over 2 years ago
I use the 'timselmean' method to calculate the mean of a 2D variable (stations x time: 43119x4464) over every 6 timesteps (NetCDF file size ~220 MB). The results are correct, but it is too slow (~8 minutes for one file!). How can I speed up the computation?
I also tried another 2D variable with the same number of rows but fewer columns (i.e., the uploaded file, stations x time: 43119x1200). It takes < 0.5 s! It seems that large dimensions greatly slow down the calculation?
Here is the script: cdo timselmean,6 test.nc out.nc
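For readers unfamiliar with the operation, here is a minimal Python sketch of the arithmetic described above: the mean of each consecutive group of 6 timesteps for a single station. The function name block_means and the sample values are made up for illustration. This is not CDO's implementation, and averaging a shorter trailing group is a choice of this sketch, not a statement about how CDO behaves.

```python
def block_means(values, n):
    """Return the mean of each consecutive, non-overlapping group of n values."""
    means = []
    for start in range(0, len(values), n):
        block = values[start:start + n]      # last block may be shorter than n
        means.append(sum(block) / len(block))
    return means


# Hypothetical water levels for one station at 12 consecutive timesteps.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(block_means(series, 6))

# 4464 timesteps in groups of 6 give 744 output timesteps.
print(len(block_means(list(range(4464)), 6)))
```

Since the file holds 10-minute data, groups of 6 correspond to hourly means.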
Thank you very much!
Replies (4)
RE: 'timselmean' method too slow - Added by Karin Meier-Fleischer over 2 years ago
Hi Xiangfei,
I guess that it takes so long because of the number of time steps. Is there only one variable in the large file? If there are multiple variables, the timselmean operation is processed for each variable, so it might be better to extract the needed variable first.
RE: 'timselmean' method too slow - Added by Xiangfei LI over 2 years ago
Thank you for the prompt reply!
I have posted the structure of the large file below; only one variable ('waterlevel') has a 'time' dimension.
I also tried extracting the 'waterlevel' variable (cdo selname,waterlevel in.nc out.nc), which takes another ~8 minutes. But when I apply the 'timselmean' method to the extracted file, it indeed takes much less time (~1.3 s).
- large file structure:
ncdisp('largeFile.nc')
Source:
C:\myWorks\3.CoastalErosionExtremeSeaLevel\DATA\TotalWaterLevel\HadGEM3-GC31-HM_future_waterlevel_2041_01_v1.nc
Format:
netcdf4
Global Attributes:
Conventions = 'CF-1.6'
featureType = 'timeSeries'
id = 'GTSMv3_totalwaterlevels'
naming_authority = 'https://deltares.nl/en'
Metadata_Conventions = 'Unidata Dataset Discovery v1.0'
title = '10-minute timeseries of total water levels'
summary = 'This dataset has been produced with the Global Tide and Surge Model (GTSM) version 3.0. GTSM was forced with wind speed and pressure fields from HadGEM3-GC31-HM dataset'
date_created = '2021-03-31 15:11:54.110827 UTC'
project = 'GTSMip and C3S_435_Lot8 Deltares'
acknowledgment = 'The development of this dataset was financed with Deltares Strategic Research Program. Additional funding was received by Contract C3S_435_Lot8 Deltares'
license = 'Copernicus Products License'
institution = 'Deltares & Vrije Universiteit Amsterdam'
sea_name = 'global'
source = 'GTSMv3 forced with HadGEM3-GC31-HM dataset'
keywords = 'sea-level rise; climate change; water level; climate; tides; hydrography; global tide and surge model;'
keywords_vocabulary = 'http://www.eionet.europa.eu/gemet'
standard_name_vocabulary = ''
geospatial_lat_min = '-84.712'
geospatial_lat_max = '83.65'
geospatial_lon_min = '-179.985'
geospatial_lon_max = '179.956'
geospatial_lat_units = 'degrees_north'
geospatial_lat_resolution = 'point'
geospatial_lon_units = 'degrees_east'
geospatial_lon_resolution = 'point'
geospatial_vertical_min = '-7.351'
geospatial_vertical_max = '8.41'
geospatial_vertical_units = 'm'
geospatial_vertical_positive = 'up'
time_coverage_start = '2041-01-01 00:00:00'
time_coverage_end = '2041-01-31 23:50:00'
experiment = 'highres-future'
date_modified = '2021-05-14 23:11:37.704130 UTC'
_NCProperties = 'version=2,netcdf=4.7.3,hdf5=1.10.4'
contact = 'Please contact Copernicus User Support on the Copernicus Climate Change Service website (https://climate.copernicus.eu/).'
history = 'This is version 1 of the dataset'
Dimensions:
time = 4464
stations = 43119
Variables:
waterlevel
Size: 43119x4464
Dimensions: stations,time
Datatype: int16
Attributes:
_FillValue = -999
coordinates = 'station_x_coordinate station_y_coordinate'
scale_factor = 0.001
station_x_coordinate
Size: 43119x1
Dimensions: stations
Datatype: int32
Attributes:
_FillValue = -999
units = 'degrees_east'
short_name = 'longitude'
long_name = 'longitude'
crs = 'EPSG:4326'
scale_factor = 0.001
station_y_coordinate
Size: 43119x1
Dimensions: stations
Datatype: int32
Attributes:
_FillValue = -999
units = 'degrees_north'
short_name = 'latitude'
long_name = 'latitude'
crs = 'EPSG:4326'
scale_factor = 0.001
time
Size: 4464x1
Dimensions: time
Datatype: double
Attributes:
_FillValue = NaN
axis = 'T'
long_name = 'time'
short_name = 'time'
units = 'seconds since 1900-01-01'
calendar = 'proleptic_gregorian'
stations
Size: 43119x1
Dimensions: stations
Datatype: uint16
- extracted file structure:
ncdisp('extractedFile.nc')
Source:
C:\myWorks\3.CoastalErosionExtremeSeaLevel\DATA\TotalWaterLevel\test.nc
Format:
netcdf4
Global Attributes:
CDI = 'Climate Data Interface version 1.9.9rc1 (https://mpimet.mpg.de/cdi)'
Conventions = 'CF-1.6'
history = 'Thu Sep 08 20:57:44 2022: cdo selname,waterlevel HadGEM3-GC31-HM_future_waterlevel_2041_01_v1.nc test.nc
This is version 1 of the dataset'
_NCProperties = 'version=2,netcdf=4.7.3,hdf5=1.10.4'
source = 'GTSMv3 forced with HadGEM3-GC31-HM dataset'
institution = 'Deltares & Vrije Universiteit Amsterdam'
featureType = 'timeSeries'
id = 'GTSMv3_totalwaterlevels'
naming_authority = 'https://deltares.nl/en'
Metadata_Conventions = 'Unidata Dataset Discovery v1.0'
title = '10-minute timeseries of total water levels'
summary = 'This dataset has been produced with the Global Tide and Surge Model (GTSM) version 3.0. GTSM was forced with wind speed and pressure fields from HadGEM3-GC31-HM dataset'
date_created = '2021-03-31 15:11:54.110827 UTC'
project = 'GTSMip and C3S_435_Lot8 Deltares'
acknowledgment = 'The development of this dataset was financed with Deltares Strategic Research Program. Additional funding was received by Contract C3S_435_Lot8 Deltares'
license = 'Copernicus Products License'
sea_name = 'global'
keywords = 'sea-level rise; climate change; water level; climate; tides; hydrography; global tide and surge model;'
keywords_vocabulary = 'http://www.eionet.europa.eu/gemet'
geospatial_lat_min = '-84.712'
geospatial_lat_max = '83.65'
geospatial_lon_min = '-179.985'
geospatial_lon_max = '179.956'
geospatial_lat_units = 'degrees_north'
geospatial_lat_resolution = 'point'
geospatial_lon_units = 'degrees_east'
geospatial_lon_resolution = 'point'
geospatial_vertical_min = '-7.351'
geospatial_vertical_max = '8.41'
geospatial_vertical_units = 'm'
geospatial_vertical_positive = 'up'
time_coverage_start = '2041-01-01 00:00:00'
time_coverage_end = '2041-01-31 23:50:00'
experiment = 'highres-future'
date_modified = '2021-05-14 23:11:37.704130 UTC'
contact = 'Please contact Copernicus User Support on the Copernicus Climate Change Service website (https://climate.copernicus.eu/).'
CDO = 'Climate Data Operators version 1.9.9rc1 (https://mpimet.mpg.de/cdo)'
Dimensions:
time = 4464 (UNLIMITED)
stations = 43119
Variables:
time
Size: 4464x1
Dimensions: time
Datatype: double
Attributes:
standard_name = 'time'
long_name = 'time'
units = 'seconds since 1900-01-01'
calendar = 'proleptic_gregorian'
axis = 'T'
station_x_coordinate
Size: 43119x1
Dimensions: stations
Datatype: int32
Attributes:
standard_name = 'longitude'
long_name = 'longitude'
units = 'degrees_east'
station_y_coordinate
Size: 43119x1
Dimensions: stations
Datatype: int32
Attributes:
standard_name = 'latitude'
long_name = 'latitude'
units = 'degrees_north'
waterlevel
Size: 43119x4464
Dimensions: stations,time
Datatype: int16
Attributes:
CDI_grid_type = 'unstructured'
coordinates = 'station_y_coordinate station_x_coordinate'
add_offset = 0
scale_factor = 0.001
_FillValue = -999
missing_value = -999
RE: 'timselmean' method too slow - Added by Xiangfei LI over 2 years ago
Sorry for the mess.
I have uploaded them as a TXT file.
fileStructure.txt (9.8 KB)
RE: 'timselmean' method too slow - Added by Uwe Schulzweida over 2 years ago
An analysis of performance problems is only possible with the original file.
I assume that the station data were chunked over time. You can check it with:
ncdump -h -s filename.nc | grep _ChunkSizes
result: waterlevel:_ChunkSizes = 4464, 1 ;
This creates a very unfavorable access pattern to the data in CDO. Unfortunately there is no possibility to accelerate this.
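To illustrate why chunk sizes of 4464, 1 are unfavorable, here is a rough Python sketch of the arithmetic. It assumes the dimension order is (time, stations), that one full time step (all stations) is read at a time, and that no chunk stays cached between reads, which is a worst case. The function names and the alternative chunk layout are hypothetical and only serve the comparison; this is not a description of CDO internals.

```python
from math import ceil


def chunks_per_time_step(shape, chunk_sizes):
    """Count the chunks that one full time step (all stations) intersects.

    Both arguments are (time, stations) tuples.
    """
    n_stations = shape[1]
    chunk_stations = chunk_sizes[1]
    return ceil(n_stations / chunk_stations)


def read_amplification(shape, chunk_sizes):
    """Values held in the touched chunks divided by values actually needed."""
    touched = chunks_per_time_step(shape, chunk_sizes)
    values_in_chunks = touched * chunk_sizes[0] * chunk_sizes[1]
    return values_in_chunks / shape[1]


shape = (4464, 43119)       # time, stations (dimensions from the thread)
reported = (4464, 1)        # _ChunkSizes quoted above
hypothetical = (1, 43119)   # assumed alternative: one chunk per time step

for label, chunk_sizes in [("reported", reported), ("hypothetical", hypothetical)]:
    print(label,
          chunks_per_time_step(shape, chunk_sizes),
          read_amplification(shape, chunk_sizes))
```

Under these assumptions, every time step touches all 43119 chunks in the reported layout, and each of those chunks carries the full time series of its station even though only one value from it is needed.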