Getting rid of duplicate data within a netCDF file
Added by Navajyoth MP over 7 years ago
Hey,
I have a netCDF file, file.nc, which is supposed to contain monthly data spanning January 1960 to December 2007 (576 months). However, the file contains 803 time frames. When I viewed the file using ncview, I could see that there is duplicated data within the file: time frames 1 to 576 contain data for the actual period of observation, while time frames 577 to 803 contain a duplicate of a shorter time period.
How do I get rid of time frames 577 to 803 using cdo? Also, what could be the possible reason for such duplication within the dataset? In other words, is this error purely technical, or could there be another reason behind it?
Replies (5)
RE: Getting rid of duplicate data within a netCDF file - Added by Karin Meier-Fleischer over 7 years ago
Hi,
how did you create the data file? Without the data file itself, I can only guess that something went wrong when the file was created.
Bye,
Karin
RE: Getting rid of duplicate data within a netCDF file - Added by Michael Böttinger over 7 years ago
Hi,
in any case, when you are sure that the first 576 time steps are ok,
you can select only those by using
cdo seltimestep,1/576 <file1.nc> <file2.nc>
As for the reason of the duplicated time steps - how did you create that file?
"ncdump -h <file1.nc>" will show you (besides other header information) the file history.
Cheers,
Michael
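If you prefer Python over cdo, the same truncation can be sketched with xarray. This is only an illustration: the file and variable names are made up, and a synthetic dataset stands in for file1.nc.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for file1.nc: 803 monthly time steps, of which
# only the first 576 (Jan 1960 - Dec 2007) are wanted.
time = pd.date_range("1960-01-01", periods=803, freq="MS")
ds = xr.Dataset({"temp": ("time", np.arange(803.0))}, coords={"time": time})

# Equivalent of: cdo seltimestep,1/576 file1.nc file2.nc
ds_first = ds.isel(time=slice(0, 576))
print(ds_first.sizes["time"])   # 576
# ds_first.to_netcdf("file2.nc")  # write the trimmed file
```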
RE: Getting rid of duplicate data within a netCDF file - Added by Saumya Singh over 2 years ago
@Michael Böttinger
Hi, there are duplicate records in the raw file itself. What do I do? I need to delete the duplicate records. Kindly help.
RE: Getting rid of duplicate data within a netCDF file - Added by Karin Meier-Fleischer over 2 years ago
Hi Saumya,
do you mean duplicate times in a netCDF file? What do you mean by 'raw data'?
Here is an example of time duplicates in a netCDF file.
If you know the time indices of the duplicated time steps, for instance by means of

cdo infon data_with_duplicate_time_records.nc

    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
     1 : 2000-07-16 06:00:00       0    18432       0 :      214.46      278.60      305.98 : temp2
     2 : 2000-07-16 06:00:00       0    18432       0 :      214.46      278.60      305.98 : temp2
     3 : 2001-07-16 06:00:00       0    18432       0 :      214.78      278.65      306.47 : temp2
     4 : 2001-07-16 06:00:00       0    18432       0 :      214.78      278.65      306.47 : temp2
     5 : 2002-07-16 06:00:00       0    18432       0 :      214.58      278.65      305.64 : temp2
     6 : 2003-07-16 06:00:00       0    18432       0 :      214.72      278.79      306.28 : temp2
     7 : 2003-07-16 06:00:00       0    18432       0 :      214.72      278.79      306.28 : temp2
     8 : 2004-07-16 06:00:00       0    18432       0 :      214.88      278.88      306.48 : temp2
     9 : 2005-07-16 06:00:00       0    18432       0 :      214.04      278.83      306.52 : temp2
    10 : 2005-07-16 06:00:00       0    18432       0 :      214.04      278.83      306.52 : temp2
You can use the time indices to delete the duplicates with CDO
cdo -delete,timestep=2,4,7,10 data_with_duplicate_time_records.nc outfile.nc
If there are too many duplicates, you can use a short Python script that uses xarray and numpy to delete them:
import xarray as xr
import numpy as np

# read the dataset
infile = 'data_with_duplicate_time_records.nc'
ds = xr.open_dataset(infile)
print(ds['time'])

# Numpy provides the function np.unique to create an index list of unique
# time records of a dataset or array
_, index = np.unique(ds['time'], return_index=True)
print(index)

# create a new dataset which doesn't contain time record duplicates
ds_unique = ds.isel(time=index)
print(ds_unique['time'])

# write the dataset with unique time records to a new netCDF file
ds_unique.to_netcdf('temp2_unique_timesteps.nc')
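The np.unique step can be checked on its own with a small toy time axis (made up for illustration; it is not the file above):

```python
import numpy as np

# Toy time axis with duplicates at positions 2 and 5 (one-based)
times = np.array(["2000-07", "2000-07", "2001-07", "2002-07", "2002-07"])

# return_index gives, for each unique value, the zero-based index of
# its first occurrence in the original array
_, index = np.unique(times, return_index=True)
print(index)   # [0 2 3]
```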
-Karin
RE: Getting rid of duplicate data within a netCDF file - Added by Saumya Singh over 2 years ago
@Karin
Hi,
Thank you very much for the quick response. Yes, I was referring to the duplicate times in the netCDF file. By raw data I meant the file as downloaded from the source; I did not create it myself. It turns out that every even time step is a duplicate value for 4 consecutive years. I will try the script you provided and hope it works for me. I was wondering if there is a way to just delete the even time steps in cdo.
Thanks a lot.
Have a nice day!
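For the even-time-step case described above, a minimal Python sketch using xarray's isel with a stride may help. This is only an illustration with made-up names and a synthetic dataset; it assumes every second time step really is a duplicate of the step before it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative stand-in for the downloaded file: 10 monthly time steps
# where every even step (2, 4, ...) duplicates the step before it.
time = pd.date_range("2000-01-01", periods=10, freq="MS")
data = np.repeat(np.arange(5.0), 2)          # 0,0,1,1,2,2,3,3,4,4
ds = xr.Dataset({"temp": ("time", data)}, coords={"time": time})

# Keep only the odd time steps 1, 3, 5, ... (indices 0, 2, 4, ... zero-based)
ds_odd = ds.isel(time=slice(0, None, 2))
print(ds_odd["temp"].values)                 # [0. 1. 2. 3. 4.]
# ds_odd.to_netcdf("data_without_even_steps.nc")
```

With cdo itself, the odd time-step indices could also be listed explicitly as arguments to seltimestep.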