converting .csv to .nc to perform time-based statistics
Added by Kyle Clem over 4 years ago
Hello,
I have a .csv file containing data on atmospheric rivers: Year (yyyy), Month (mm), Day (dd), Hour (hh), Length, Mean IVT, Direction, Max, Mean at Landfall, Direction at Landfall. These data are not gridded, but are organized as a set of rows and columns in .csv format.
I would like to perform daily and monthly statistics on these data (i.e., monmean, daymean, etc). This is a bit of a shot in the dark, but is this possible to do with CDO? I realize CDO cannot perform monmean or daymean on a .csv file, so I am wondering if it is possible to convert these data to .nc format. However, I don't know how I would do this since the data are not gridded (i.e., lat/lon). Has anyone ever done anything like this before?
Thank you for any advice you can provide.
Regards,
Kyle
Replies (17)
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
hi Kyle!
this depends on the specific input format. you can check the input operator for that. As an example input file you can use IN.csv. It's nothing but these two lines
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Then the following command does the job
cdo -f nc input,r1x8 o.nc <IN.csv
The input operator needs a grid description and optionally an indicator for the vertical axis. Otherwise consecutive lines of values will be read in as timesteps:
ncdump o.nc
netcdf o {
dimensions:
    time = UNLIMITED ; // (2 currently)
    lon = 1 ;
    lat = 8 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "hours since 1-1-1 00:00:00" ;
        time:calendar = "proleptic_gregorian" ;
        time:axis = "T" ;
    double lon(lon) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
    double lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
    float var1(time, lat, lon) ;
// global attributes:
        :CDI = "Climate Data Interface version 1.9.8 (https://mpimet.mpg.de/cdi)" ;
        :Conventions = "CF-1.6" ;
        :history = "Thu Apr 16 10:02:40 2020: cdo -f nc input,r1x8 o.nc" ;
        :CDO = "Climate Data Operators version 1.9.8 (https://mpimet.mpg.de/cdo)" ;
data:
 time = -9552, -9552 ;
 lon = 0 ;
 lat = -78.75, -56.25, -33.75, -11.25, 11.25, 33.75, 56.25, 78.75 ;
 var1 = 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8 ;
}
you can also reset the grid and the time axis with setgrid and settaxis.
hth
ralf
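For readers who would rather script this step, the same call can be driven from Python. This is only a minimal sketch, assuming a cdo binary is on the PATH; the helper names build_input_command and run_input are made up for illustration, while the grid r1x8 and the file name o.nc are the ones used in the post above.

```python
import shutil
import subprocess


def build_input_command(grid, outfile):
    """Return the argv list for: cdo -f nc input,<grid> <outfile>."""
    return ["cdo", "-f", "nc", f"input,{grid}", outfile]


def run_input(text, grid, outfile):
    """Feed `text` to the CDO input operator on stdin (needs a cdo binary)."""
    if shutil.which("cdo") is None:
        raise RuntimeError("cdo executable not found on PATH")
    subprocess.run(build_input_command(grid, outfile),
                   input=text, text=True, check=True)


# Two lines of eight values, the same content as IN.csv above
rows = "1 2 3 4 5 6 7 8\n1 2 3 4 5 6 7 8\n"

# Show the command only; call run_input(rows, "r1x8", "o.nc") to execute it
print(" ".join(build_input_command("r1x8", "o.nc")))
```

Passing the values on stdin mirrors the `<IN.csv` redirection of the shell command, so no temporary file is needed.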
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Hi Ralf,
Many thanks for your reply! First of all, I just want to confirm that "input" operator is a file containing all the information that you have shown above, correct? Secondly, my interpretation of what you did is set var1 values of 1, 2, 3, 4, 5, 6, 7, 8 to longitude=0, latitude=-78.75, -56.25, -33.75, -11.25, 11.25, 33.75, 56.25, 78.75 for two time steps, correct?
I am attaching my .csv file. In my case, I want to copy over the time coordinates, yyyy mm dd hh (UTC), and assign the corresponding six variables to each time step. So for my case, would I want to define my time steps as you did:
time:units = "hours since 1-1-1 00:00:00",
then 1) add all my time steps under data->time as 1979-03-01 06:00:00, 1979-03-01 18:00:00, ..... etc until my last time step?
Then 2) set lat=0, set lon=0 (even though I don't have lat/lon)
Lastly 3) set my six variables (var1, var2,...,var6), which equal the same number of time steps?
Then would this write a NetCDF file that has vars 1-6 for each time step at arbitrary lat/lon values of 0,0?
Thanks Ralf!!
-Kyle
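On the question of how time steps are stored: a netCDF time axis holds plain numbers that count from the reference date named in the units attribute, so the time stamps are not typed in as text. The sketch below illustrates that encoding for one row of the data. The reference date 1979-01-01 and the function names are assumptions chosen for the example, not something CDO prescribes.

```python
from datetime import datetime

# Hypothetical reference date; it would appear in the units attribute as
# "hours since 1979-01-01 00:00:00"
REFERENCE = datetime(1979, 1, 1)


def hours_since_reference(year, month, day, hour, reference=REFERENCE):
    """Return whole hours between `reference` and the given UTC time stamp."""
    delta = datetime(year, month, day, hour) - reference
    return int(delta.total_seconds() // 3600)


def iso_timestamp(year, month, day, hour):
    """Format the four time columns as YYYY-MM-DD hh:mm:ss."""
    return datetime(year, month, day, hour).strftime("%Y-%m-%d %H:%M:%S")


# One whitespace-separated row: four time columns, then the variables
row = "1979 3 10 12 2846.5 364.4 296.0 236.4 157.6 290.8".split()
year, month, day, hour = (int(x) for x in row[:4])
values = [float(x) for x in row[4:]]

print(iso_timestamp(year, month, day, hour))          # 1979-03-10 12:00:00
print(hours_since_reference(year, month, day, hour))  # 1644
print(len(values))                                    # 6
```

Going through datetime also pads single-digit months, days and hours, so 3 becomes 03 in the formatted time stamp.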
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
1979 3 10 12 2846.5 364.4 296.0 236.4 157.6 290.8
means what?
1979-03-10 12:00:00
and the rest is a single value for 6 variables? or 6 values of the same variable? could be 6 horizontal locations or 6 levels ...
taking into account what you posted, it is six different variables, no levels, no locations.
with "input operator" i meant the CDO operator called input. It's an operation, not a file.
I used this ruby script:
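require 'cdo'
require 'parallel'

cdo = Cdo.new

# each row: year month day hour value1 value2 ... (whitespace separated)
inputs = File.open(ARGV[0]).readlines.map(&:chomp).map(&:split)

# write one single-point netCDF file per variable and time step
Parallel.each(inputs, in_threads: 12) {|input|
  year, month, day, hour, *values = input
  puts '############################## '+year
  values.each_with_index {|value,index|
    cdo.settaxis("#{year}-#{month}-#{day},#{hour}:00:00,1hours",
                 input: "-setname,var#{index} -const,#{value},r1x1",
                 output: "var#{index}_#{year}-#{month}-#{day}_#{hour}.nc",
                 options: '-f nc')
  }
}

# number of value columns after year, month, day, hour
n_vars = inputs.first.size - 4

# concatenate the per-timestep files of each variable into VAR<i>.nc
(0...n_vars).each {|i|
  cdo.cat(input: Dir.glob("var#{i}_*.nc").sort, output: "VAR#{i}.nc")
}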
some signs are broken, so I uploaded kyle.rb. The other uploads are the result files - one for each variable.
for using this script you have to install a ruby interpreter, CDO and two extra packages: cdo and parallel. you can install them both with
gem install parallel cdo --user
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Dear Ralf,
I am sorry for my slow reply. Thank you for the helpful information you provided and for sharing your output. I've checked your netcdf output and they don't match with the correct time steps. I am attaching the text data with the headers. The data are organized into 10 columns ordered by time step. The first four columns, by which the data are organized, are year (yyyy), month (m), day (dd), and hour (UTC). The last six columns are just atmospheric river variables associated with each time step; there are no horizontal locations or vertical levels. These six variables are what I would like to convert to netcdf files corresponding to each time step, exactly like you have done.
However, the data begin at 1979-03-10 12:00:00 while your netcdf file begins at 1979-10-10 12:00:00 (and the data end at 2019-03-29 12:00:00 while yours ends at 2018-09-09 18:00:00). So it appears there is some sort of mismatch here, maybe because I didn't clarify that the first four columns are time variables and the last six columns are variables? Any idea what the problem might be?
Thank you,
Kyle
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
hi!
Please check the Var0.nc file, it starts with 1979-03-10T12:00:00:
cdo sinfov VAR0.nc
File format : NetCDF2
-1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter name
1 : unknown unknown v instant 1 1 1 1 F32 : const
Grid coordinates :
1 : lonlat : points=1 (1x1)
lon : 0 degrees_east
lat : 0 degrees_north
Vertical coordinates :
1 : surface : levels=1
Time coordinate : 12 steps
RefTime = 1979-03-10 12:00:00 Units = hours Calendar = proleptic_gregorian
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1979-03-10 12:00:00 1979-03-11 18:00:00 1979-03-12 00:00:00 1979-03-13 06:00:00
1979-03-19 06:00:00 1979-03-26 06:00:00 1979-04-10 18:00:00 1979-04-12 18:00:00
1979-04-20 18:00:00 1979-04-21 06:00:00 1979-04-02 18:00:00 1979-04-08 06:00:00
cdo sinfon: Processed 1 variable over 12 timesteps [0.07s 44MB].
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Hi Ralf,
Great to hear from you, thanks for your reply.
I'm not sure what the Var0.nc file is...do you mean VAR0.nc? For me it starts with 1979-10-10 12:00:00:
cdo sinfov VAR0.nc
File format: netCDF2
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
1 : unknown unknown instant 1 1 1 1 F32 : var0
Grid coordinates :
1 : lonlat > size : dim = 1 nx = 1 ny = 1
lon : first = 0 degrees_east
lat : first = 0 degrees_north
Vertical coordinates :
1 : surface : 0
Time coordinate : 4003 steps
RefTime = 1979-10-10 12:00:00 Units = hours Calendar = proleptic_gregorian
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1979-10-10 12:00:00 1979-10-10 18:00:00 1979-10-10 06:00:01 1979-10-14 00:00:00
1979-10-14 12:00:00 1979-10-14 18:00:00 1979-10-14 06:00:00 1979-10-15 00:00:00
1979-10-15 06:00:00 1979-10-16 00:00:00 1979-10-19 12:00:00 1979-10-19 18:00:00
1979-10-01 12:00:00 1979-11-23 00:00:00 1979-11-23 12:00:00 1979-11-23 18:00:00
1979-11-23 06:00:00 1979-11-24 00:00:00 1979-11-24 12:00:00 1979-11-24 18:00:00
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
interesting ... I checked the file on disk, rather than the upload itself. seems to be different.
let me reprocess the whole thing - I am sure we can clear this up. And sorry for the confusion. Will take some minutes, I guess.
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Ah ha! Well that's good to know, we may not have a problem with the script, but rather an issue of which files uploaded. Thank you for checking.
Kyle
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
I think I know what happened: days, months and hours do not have leading zeros. So the whole time axis was messed up, and I didn't put a timsort after the cat operators. Here is the re-processed output, including the new ruby script.
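The effect described here can be reproduced without CDO. The sketch below is an illustration only, not the re-uploaded script: it builds file names from unpadded and from zero-padded date fields and sorts them as text, which is what happens to the file list before the cat step. The function names are hypothetical.

```python
def unpadded_name(index, year, month, day, hour):
    """File name built from the date fields exactly as read from the file."""
    return f"var{index}_{year}-{month}-{day}_{hour}.nc"


def padded_name(index, year, month, day, hour):
    """Same name with zero-padded fields, so text order equals time order."""
    return (f"var{index}_{int(year):04d}-{int(month):02d}-"
            f"{int(day):02d}_{int(hour):02d}.nc")


def settaxis_args(year, month, day, hour):
    """Zero-padded argument string for settaxis: date,time,increment."""
    return (f"{int(year):04d}-{int(month):02d}-{int(day):02d},"
            f"{int(hour):02d}:00:00,1hours")


stamps = [("1979", "3", "10", "12"),
          ("1979", "10", "10", "12"),
          ("1979", "3", "11", "18")]

# Text sorting puts "10" before "3" because the character "1" sorts before "3"
print(sorted(unpadded_name(0, *s) for s in stamps)[0])  # var0_1979-10-10_12.nc
print(sorted(padded_name(0, *s) for s in stamps)[0])    # var0_1979-03-10_12.nc
print(settaxis_args(*stamps[0]))                        # 1979-03-10,12:00:00,1hours
```

With unpadded names the October file sorts first, which is consistent with the concatenated file starting at 1979-10-10. Padding the fields fixes the file order, and running timsort on the concatenated file is a further safeguard.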
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Hi Ralf,
Wow, thank you for your effort to help me! This was my fault, I told you dd, mm, etc. Sorry I made this a pain. Your script is brilliant. You really went above and beyond what I was expecting. How do I cite/acknowledge your help in publications?
Thanks,
Kyle
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Hi Ralf,
I am having trouble running the ruby script. I installed the ruby interpreter, CDO and extra packages as you instructed:
gem install parallel cdo --user
Successfully installed parallel-1.19.1
Parsing documentation for parallel-1.19.1
Done installing documentation for parallel after 0 seconds
Successfully installed cdo-1.5.0
Parsing documentation for cdo-1.5.0
Done installing documentation for cdo after 0 seconds
And then ran the script:
ruby kyle.rb
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- numru/netcdf_miss (LoadError)
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Users/kclem/.gem/ruby/2.3.0/gems/cdo-1.5.0/lib/cdo.rb:355:in `loadOptionalLibs'
from /Users/kclem/.gem/ruby/2.3.0/gems/cdo-1.5.0/lib/cdo.rb:56:in `initialize'
from kyle.rb:3:in `new'
from kyle.rb:3:in `<main>'
I have a couple of questions, though you may see what the problem is. First, I have CDO 1.6.3 already installed and that is my default CDO path...do I need to install CDO again as you instructed? And/or do I need to somehow link parallel to my existing CDO libraries? And lastly, how is the .csv file read into the script? I don't see where the data is read in.
Thank you for your help.
Best wishes,
Kyle
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
you need the ruby-netcdf package, too. sorry, my bad.
gem install ruby-netcdf --user
should do it, but you need to install the netcdf library (header + shared object) first, maybe with macports. In case you don't manage to install netcdf, you can remove the line
require "numru/netcdf_miss"
because this feature is not used in the script (kyle.rb).
the csv file needs to be given on the command line like
ruby kyle.rb ar_myData.csv
An update of CDO would be really helpful. again: macports has CDO in the current release version.
hth
ralf
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Hi Ralf,
Hmm, I don't follow what you mean by remove the line
require "numru/netcdf_miss"
because in the Ruby script we only have
require 'cdo'
require 'parallel'
Just an FYI I have CDO versions 1.6.3 as well as 1.9.5 both installed. I don't know what the gem install of parallel cdo does, e.g. when I run
gem install parallel cdo --user
Successfully installed parallel-1.19.1
Parsing documentation for parallel-1.19.1
Done installing documentation for parallel after 0 seconds
Successfully installed cdo-1.5.0
Parsing documentation for cdo-1.5.0
Done installing documentation for cdo after 0 seconds
2 gems installed
what exactly did this do that would allow me to run your Ruby script? There doesn't seem to be a "parallel" command, e.g.
which parallel
parallel: Command not found.
I'm sorry for my ignorance, this is why I don't understand why the script is failing.
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
hi Kyle!
I guess this is my bad: I am so used to this stuff that I leave out the essential parts.
gem is a ruby package manager like pip for python or npm for node. you can install many ruby libraries on your own into system directories (with root access rights) or into your $HOME directory. To install the missing library, you need to run
gem install ruby-netcdf --user-install
Whatever you installed is a ruby library and does not necessarily come with an executable that could be found with which. There are such gem packages, but cdo and parallel aren't. Since you already installed the other ruby libraries (cdo and parallel) the script should be ready to go if ruby-netcdf is on your system, too.
RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem over 4 years ago
Ah, I see! This is helpful to know, thank you.
Unfortunately I can't install ruby-netcdf:
gem install ruby-netcdf --user-install
Building native extensions. This could take a while...
ERROR: Error installing ruby-netcdf:
ERROR: Failed to build gem native extension.
current directory: /Users/kclem/.gem/ruby/2.3.0/gems/narray-0.6.1.2/src
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/bin/ruby -r ./siteconf20200506-1581-1b78jpu.rb extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/include/ruby.h
extconf failed, exit code 1
Gem files will remain installed in /Users/kclem/.gem/ruby/2.3.0/gems/narray-0.6.1.2 for inspection.
Results logged to /Users/kclem/.gem/ruby/2.3.0/extensions/universal-darwin-18/2.3.0/narray-0.6.1.2/gem_make.out
I suspect it is because I have not installed the netcdf (header + shared object) library you mentioned. I've never heard of this...is this a ruby package, or a stand alone package like my netcdf-c libraries?
You also mentioned that if I don't install netcdf, I can remove the line
require "numru/netcdf_miss"
Can you please clarify what you mean because this line is not in the script? I realize this is getting quite off topic, I'm sorry. I understand if you need to focus on more CDO related items.
Thank you,
Kyle
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
I am not an expert with MacOS, but did you install CDO using macports?
RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller over 4 years ago
I uploaded a python version, maybe it's easier to install its dependencies. you need
- python3 (macports)
- py-netcdf4 (macports)
- cdo (current release 1.9.8, macports) - this is optional since you seem to already have a CDO binary running
then you need the cdo-python bindings:
pip install cdo --user
hth
ralf
PS: starting the script works identical to the ruby version:
python kyle.py ar_characteristics.csv
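Once the per-variable files exist, the daily and monthly means asked for at the top of the thread can be computed with the daymean and monmean operators. The sketch below is not the uploaded kyle.py. It assumes the cdo Python bindings installed with the pip command above and a file named VAR0.nc, and the helper names are made up for illustration.

```python
def output_name(infile, operator):
    """Derive an output file name such as VAR0_monmean.nc from VAR0.nc."""
    stem = infile[:-3] if infile.endswith(".nc") else infile
    return f"{stem}_{operator}.nc"


def time_statistics(infile, operators=("daymean", "monmean")):
    """Run each CDO time statistic on `infile`; return the output names."""
    # Imported here so output_name() works without the bindings installed
    from cdo import Cdo

    cdo = Cdo()
    outputs = []
    for op in operators:
        outfile = output_name(infile, op)
        # getattr(cdo, "monmean") is the same method as cdo.monmean
        getattr(cdo, op)(input=infile, output=outfile)
        outputs.append(outfile)
    return outputs


# Show the file names only; call time_statistics("VAR0.nc") to run CDO
print(output_name("VAR0.nc", "daymean"))  # VAR0_daymean.nc
print(output_name("VAR0.nc", "monmean"))  # VAR0_monmean.nc
```

The same results should be obtainable on the command line with cdo daymean VAR0.nc VAR0_daymean.nc and cdo monmean VAR0.nc VAR0_monmean.nc.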