converting .csv to .nc to perform time-based statistics

Added by Kyle Clem about 4 years ago

Hello,

I have a .csv file containing data on atmospheric rivers: Year (yyyy), Month (mm), Day (dd), Hour (hh), Length, Mean IVT, Direction, Max, Mean at Landfall, Direction at Landfall. These data are not gridded, but are organized as a set of rows and columns in .csv format.

I would like to perform daily and monthly statistics on these data (e.g., monmean, daymean, etc.). This is a bit of a shot in the dark, but is this possible to do with CDO? I realize CDO cannot perform monmean or daymean on a .csv file, so I am wondering if it is possible to convert these data to .nc format. However, I don't know how I would do this since the data are not gridded (i.e., no lat/lon). Has anyone ever done anything like this before?

Thank you for any advice you can provide.

Regards,
Kyle
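As an illustration of what the requested statistic means for this kind of table: a monthly mean groups the rows by calendar month and averages each variable over the time steps in that month. A minimal Ruby sketch of that idea, without CDO or netCDF; it assumes whitespace-separated rows without a header line, and `monthly_means` and the toy rows are made up for illustration:

```ruby
# Sketch: group rows of "year month day hour value..." by calendar month
# and average every value column over the rows in that month.
def monthly_means(lines)
  groups = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    year, month, _day, _hour, *values = line.split
    groups[[Integer(year, 10), Integer(month, 10)]] << values.map { |v| Float(v) }
  end
  groups.keys.sort.map do |key|
    columns = groups[key].transpose # one array per variable
    [key, columns.map { |col| col.reduce(0.0, :+) / col.size }]
  end
end

# Made-up toy rows with two value columns, for illustration only.
toy = ["1979 3 10 12 1.0 2.0", "1979 3 11 18 3.0 4.0", "1979 4 2 18 5.0 6.0"]
monthly_means(toy).each do |(year, month), means|
  puts format("%04d-%02d %p", year, month, means)
end
# prints "1979-03 [2.0, 3.0]" and "1979-04 [5.0, 6.0]"
```

Every time step in a month gets the same weight here, so irregular sampling within a month is not corrected for.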


Replies (17)

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller about 4 years ago

hi Kyle!

this depends on the specific input format. you can check the input operator for that. As an example input file you can use IN.csv. It's nothing but these two lines:

1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Then the following command does the job:
cdo -f nc input,r1x8 o.nc <IN.csv
The input operator needs a grid description and optionally an indicator for the vertical axis. Otherwise, consecutive lines of values will be read in as timesteps:
ncdump o.nc
netcdf o {
dimensions:
    time = UNLIMITED ; // (2 currently)
    lon = 1 ;
    lat = 8 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "hours since 1-1-1 00:00:00" ;
        time:calendar = "proleptic_gregorian" ;
        time:axis = "T" ;
    double lon(lon) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
    double lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
    float var1(time, lat, lon) ;

// global attributes:
        :CDI = "Climate Data Interface version 1.9.8 (https://mpimet.mpg.de/cdi)" ;
        :Conventions = "CF-1.6" ;
        :history = "Thu Apr 16 10:02:40 2020: cdo -f nc input,r1x8 o.nc" ;
        :CDO = "Climate Data Operators version 1.9.8 (https://mpimet.mpg.de/cdo)" ;
data:

 time = -9552, -9552 ;

 lon = 0 ;

 lat = -78.75, -56.25, -33.75, -11.25, 11.25, 33.75, 56.25, 78.75 ;

 var1 =
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8 ;
}
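To make the reading rule concrete: the ncdump output shows the 16 numbers of IN.csv stored as 2 time steps on a 1 x 8 grid. A minimal Ruby sketch of that bookkeeping, assuming (per the description above) that without a vertical axis each line is one time step; `lines_to_timesteps` is a hypothetical helper, not part of CDO:

```ruby
# Sketch of the rule described above (an illustration, not CDO's code):
# on a grid with nx*ny points, each line of values becomes one time step.
def lines_to_timesteps(text, nx, ny)
  gridsize = nx * ny
  text.each_line.map(&:split).reject(&:empty?).map do |fields|
    unless fields.size == gridsize
      raise ArgumentError, "expected #{gridsize} values per line, got #{fields.size}"
    end
    fields.map { |f| Float(f) }
  end
end

# The two lines of IN.csv on the r1x8 grid (1 longitude x 8 latitudes).
in_csv = "1 2 3 4 5 6 7 8\n1 2 3 4 5 6 7 8\n"
steps = lines_to_timesteps(in_csv, 1, 8)
puts "#{steps.size} time steps with #{steps.first.size} values each"
# prints "2 time steps with 8 values each"
```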

you can also reset the grid and the time axis with setgrid and settaxis.
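For the time axis, settaxis takes a date (YYYY-MM-DD), a time (hh:mm:ss) and an optional increment. A small Ruby sketch that formats those parameters from CSV fields; `settaxis_params` and the file names `in.nc`/`out.nc` are hypothetical, and zero-padding the fields is my own choice to get unambiguous dates:

```ruby
# Hypothetical helper (not part of CDO): format the settaxis parameters
# "date,time,increment" from year/month/day/hour fields, zero-padded.
def settaxis_params(year, month, day, hour, increment = "1hour")
  y, m, d, h = [year, month, day, hour].map { |f| Integer(f.to_s, 10) }
  format("%04d-%02d-%02d,%02d:00:00,%s", y, m, d, h, increment)
end

# Fields as they appear in a CSV row: strings without leading zeros.
params = settaxis_params("1979", "3", "10", "12")
puts "cdo -f nc settaxis,#{params} in.nc out.nc"
# prints "cdo -f nc settaxis,1979-03-10,12:00:00,1hour in.nc out.nc"
```

With the Ruby cdo gem, the same parameter string would be the first argument to cdo.settaxis(...), as in the script further down this thread.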

hth
ralf

IN.csv (32 Bytes)

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem about 4 years ago

Hi Ralf,

Many thanks for your reply! First of all, I just want to confirm that the "input" operator is a file containing all the information that you have shown above, correct? Secondly, my interpretation of what you did is: set var1 values of 1, 2, 3, 4, 5, 6, 7, 8 to longitude=0, latitude=-78.75, -56.25, -33.75, -11.25, 11.25, 33.75, 56.25, 78.75 for two time steps, correct?

I am attaching my .csv file. In my case, I want to copy over the time coordinates, yyyy mm dd hh (UTC), and assign the corresponding six variables to each time step. So for my case, would I want to define my time steps as you did:

time:units = "hours since 1-1-1 00:00:00",

then 1) add all my time steps under data->time as 1979-03-01 06:00:00, 1979-03-01 18:00:00, ..... etc until my last time step?

Then 2) set lat=0, set lon=0 (even though I don't have lat/lon)

Lastly 3) set my six variables (var1, var2,...,var6), which equal the same number of time steps?

Then would this write a NetCDF file that has vars 1-6 for each time step at arbitrary lat/lon values of 0,0?
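In a netCDF file the time variable holds numbers, not date strings: each value is an offset in the units given by time:units (the ncdump above shows "hours since 1-1-1 00:00:00"). CDO computes these numbers when it sets the time axis, so they do not need to be typed in by hand. A Ruby sketch of the conversion, assuming units of "hours since 1979-01-01 00:00:00" (a reference date chosen only for illustration); `hours_since_reference` is hypothetical:

```ruby
# Sketch: numeric offsets a netCDF time variable would store for the
# assumed units "hours since 1979-01-01 00:00:00".
REFERENCE = Time.utc(1979, 1, 1, 0)

def hours_since_reference(year, month, day, hour)
  seconds = Time.utc(year, month, day, hour) - REFERENCE # Float seconds
  (seconds / 3600).round
end

times = [[1979, 3, 10, 12], [1979, 3, 11, 18]].map { |t| hours_since_reference(*t) }
p times # prints [1644, 1674]
```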

Thanks Ralf!!
-Kyle

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller about 4 years ago

1979 3 10 12 2846.5 364.4 296.0 236.4 157.6 290.8
means what? 1979-03-10 12:00:00 and the rest is a single value for each of 6 variables? or 6 values of the same variable? could be 6 horizontal locations or 6 levels ...

taking into account what you posted, it is six different variables, no levels, no locations.
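Under that reading, each row is one time stamp plus six scalar values. A Ruby sketch that splits a row accordingly, using the row quoted above; the variable names are my own shorthand for the column headings in the first post, and `parse_row` is hypothetical:

```ruby
# Sketch: split one row into its UTC time stamp and six named values.
# Names are shorthand for: Length, Mean IVT, Direction, Max,
# Mean at Landfall, Direction at Landfall.
VARIABLES = %w[length mean_ivt direction max mean_landfall direction_landfall].freeze

def parse_row(line)
  fields = line.split
  unless fields.size == 4 + VARIABLES.size
    raise ArgumentError, "expected #{4 + VARIABLES.size} columns, got #{fields.size}"
  end
  year, month, day, hour = fields.first(4).map { |f| Integer(f, 10) }
  values = fields.drop(4).map { |f| Float(f) }
  [Time.utc(year, month, day, hour), VARIABLES.zip(values).to_h]
end

time, values = parse_row("1979 3 10 12 2846.5 364.4 296.0 236.4 157.6 290.8")
puts time        # prints 1979-03-10 12:00:00 UTC
puts values.size # prints 6
```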

with "input operator" i meant the CDO operator called input. It's an operation, not a file.

I used this ruby script:

require 'cdo'
require 'parallel'

cdo = Cdo.new
# one array of whitespace-separated fields per line of the input file
inputs = File.readlines(ARGV[0]).map(&:split)

Parallel.each(inputs, in_threads: 12) {|input|
  year, month, day, hour, *values = input
  puts '############################## ' + year
  values.each_with_index {|value, index|
    # one single-point file per variable and time step
    cdo.settaxis("#{year}-#{month}-#{day},#{hour}:00:00,1hours",
                 input: "-setname,var#{index} -const,#{value},r1x1",
                 output: "var#{index}_#{year}-#{month}-#{day}_#{hour}.nc",
                 options: '-f nc')
  }
}

# concatenate the single-step files into one file per variable (six variables)
(0..5).each {|i|
  cdo.cat(input: Dir.glob("var#{i}_*.nc").sort, output: "VAR#{i}.nc")
}
some characters got mangled here, so I uploaded kyle.rb.
The other uploads are the result files, one for each variable.

for using this script you have to install a ruby interpreter, CDO and two extra packages: cdo and parallel. you can install them both with

gem install parallel cdo --user

VAR5.nc (128 KB)
VAR4.nc (128 KB)
VAR3.nc (128 KB)
VAR2.nc (128 KB)
VAR1.nc (128 KB)
VAR0.nc (128 KB)
kyle.rb (573 Bytes)

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Dear Ralf,

I am sorry for my slow reply. Thank you for the helpful information you provided and for sharing your output. I've checked your netCDF output and it doesn't match the correct time steps. I am attaching the text data with the headers. The data are organized into 10 columns ordered by time step. The first four columns, by which the data are organized, are year (yyyy), month (m), day (dd), and hour (UTC). The last six columns are atmospheric river variables associated with each time step; there are no horizontal locations or vertical levels. These six variables are what I would like to convert to netCDF files corresponding to each time step, exactly as you have done.

However, the data begin at 1979-03-10 12:00:00 while your netCDF file begins at 1979-10-10 12:00:00 (and the data end at 2019-03-29 12:00:00 while yours ends at 2018-09-09 18:00:00). So it appears there is some sort of mismatch here, maybe because I didn't clarify that the first four columns are time variables and the last six columns are data variables? Any idea what the problem might be?

Thank you,
Kyle

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

hi!

Please check the Var0.nc file, it starts with 1979-03-10T12:00:00:

cdo sinfov VAR0.nc
   File format : NetCDF2
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter name
     1 : unknown  unknown  v instant       1   1         1   1  F32  : const         
   Grid coordinates :
     1 : lonlat                   : points=1 (1x1)
                              lon : 0 degrees_east
                              lat : 0 degrees_north
   Vertical coordinates :
     1 : surface                  : levels=1
   Time coordinate :  12 steps
     RefTime =  1979-03-10 12:00:00  Units = hours  Calendar = proleptic_gregorian
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1979-03-10 12:00:00  1979-03-11 18:00:00  1979-03-12 00:00:00  1979-03-13 06:00:00
  1979-03-19 06:00:00  1979-03-26 06:00:00  1979-04-10 18:00:00  1979-04-12 18:00:00
  1979-04-20 18:00:00  1979-04-21 06:00:00  1979-04-02 18:00:00  1979-04-08 06:00:00
cdo    sinfon: Processed 1 variable over 12 timesteps [0.07s 44MB].

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Hi Ralf,
Great to hear from you, thanks for your reply.

I'm not sure what the Var0.nc file is...do you mean VAR0.nc? For me it starts with 1979-10-10 12:00:00:

cdo sinfov VAR0.nc
   File format: netCDF2
    -1 : Institut Source   Ttype    Levels Num  Gridsize Num Dtype : Parameter name
     1 : unknown  unknown  instant       1   1         1   1  F32  : var0
   Grid coordinates :
     1 : lonlat       > size      : dim = 1  nx = 1  ny = 1
                              lon : first = 0  degrees_east
                              lat : first = 0  degrees_north
   Vertical coordinates :
     1 : surface                  : 0
   Time coordinate :  4003 steps
     RefTime =  1979-10-10 12:00:00  Units = hours  Calendar = proleptic_gregorian
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1979-10-10 12:00:00  1979-10-10 18:00:00  1979-10-10 06:00:01  1979-10-14 00:00:00
  1979-10-14 12:00:00  1979-10-14 18:00:00  1979-10-14 06:00:00  1979-10-15 00:00:00
  1979-10-15 06:00:00  1979-10-16 00:00:00  1979-10-19 12:00:00  1979-10-19 18:00:00
  1979-10-01 12:00:00  1979-11-23 00:00:00  1979-11-23 12:00:00  1979-11-23 18:00:00
  1979-11-23 06:00:00  1979-11-24 00:00:00  1979-11-24 12:00:00  1979-11-24 18:00:00

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

interesting ... I checked the file on disk, rather than the upload itself. It seems to be different.

let me reprocess the whole thing; I am sure we'll clear this up. And sorry for the confusion. It will take a few minutes, I guess.

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Ah ha! Well, that's good to know; we may not have a problem with the script, but rather an issue with which files were uploaded. Thank you for checking.
Kyle

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

I think I know what happened: days, months and hours do not have leading zeros. So the whole time axis was messed up, and I didn't put a timsort after the cat operators. Here is the re-processed output, including the new Ruby script:

kyle.rb (775 Bytes)
VAR5.nc (134 KB)
VAR4.nc (134 KB)
VAR2.nc (134 KB)
VAR3.nc (134 KB)
VAR1.nc (134 KB)
VAR0.nc (134 KB)
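The effect of the missing leading zeros can be reproduced in a few lines of Ruby. The file names below follow the pattern used in the first script (var<index>_<year>-<month>-<day>_<hour>.nc); the helpers `time_key` and `padded_name` are hypothetical. A plain string sort puts 1979-10-10 before 1979-3-10 and hour 6 after hour 12, which would be consistent with the start date and the 12:00, 18:00, 06:00 ordering in Kyle's sinfov listing. Ralf's fix on the CDO side was a timsort after cat; the sketch shows two script-side alternatives:

```ruby
# Sketch: unpadded dates in file names break a plain string sort.
names = %w[var0_1979-10-10_6.nc var0_1979-3-10_12.nc var0_1979-10-10_12.nc]

# String comparison sees "1" < "3" and "1" < "6", so October comes first
# and hour 12 sorts before hour 6.
p names.sort
# prints ["var0_1979-10-10_12.nc", "var0_1979-10-10_6.nc", "var0_1979-3-10_12.nc"]

# Option 1: sort on the numbers in the name, dropping the variable index.
def time_key(name)
  name.scan(/\d+/).drop(1).map(&:to_i) # [year, month, day, hour]
end
p names.sort_by { |n| time_key(n) }
# prints ["var0_1979-3-10_12.nc", "var0_1979-10-10_6.nc", "var0_1979-10-10_12.nc"]

# Option 2: zero-pad when writing the names, so string order = time order.
def padded_name(index, year, month, day, hour)
  format("var%d_%04d-%02d-%02d_%02d.nc", index, year, month, day, hour)
end
p padded_name(0, 1979, 3, 10, 12) # prints "var0_1979-03-10_12.nc"
```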

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Hi Ralf,

Wow, thank you for your effort to help me! This was my fault; I told you dd, mm, etc. Sorry I made this a pain. Your script is brilliant. You really went above and beyond what I was expecting. How do I cite/acknowledge your help in publications?

Thanks,
Kyle

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Hi Ralf,
I am having trouble running the ruby script. I installed the ruby interpreter, CDO and extra packages as you instructed:

gem install parallel cdo --user
Successfully installed parallel-1.19.1
Parsing documentation for parallel-1.19.1
Done installing documentation for parallel after 0 seconds
Successfully installed cdo-1.5.0
Parsing documentation for cdo-1.5.0
Done installing documentation for cdo after 0 seconds

And then ran the script:

ruby kyle.rb
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- numru/netcdf_miss (LoadError)
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Users/kclem/.gem/ruby/2.3.0/gems/cdo-1.5.0/lib/cdo.rb:355:in `loadOptionalLibs'
from /Users/kclem/.gem/ruby/2.3.0/gems/cdo-1.5.0/lib/cdo.rb:56:in `initialize'
from kyle.rb:3:in `new'
from kyle.rb:3:in `<main>'

I have a couple of questions, though you may see what the problem is. First, I have CDO 1.6.3 already installed and that is my default CDO path...do I need to install CDO again as you instructed? And/or do I need to somehow link parallel to my existing CDO libraries? And lastly, how is the .csv file read into the script? I don't see where the data is read in.

Thank you for your help.
Best wishes,
Kyle

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

you need the ruby-netcdf package, too. sorry, my bad. gem install ruby-netcdf --user should do it, but you need to install the netcdf library (header + shared object) first, maybe with macports. In case you don't manage to install netcdf, you can remove the line

require "numru/netcdf_miss"
because this feature is not used in the script (kyle.rb).

the csv file needs to be given on the command line like

ruby kyle.rb ar_myData.csv
An update of CDO would be really helpful. Again: macports has CDO in the current release version.

hth
ralf
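The script reads the file whose name is passed as the first command-line argument (ARGV[0]). Since the data Kyle attached later include a header line, a slightly more defensive reader may help. A sketch, assuming that header and blank lines should be skipped and that fields are separated by spaces or commas; `read_rows` is a hypothetical helper:

```ruby
# Sketch: read whitespace- or comma-separated rows from a file, skipping
# blank lines and any line that does not start with a 4-digit year
# (e.g. a header such as "Year Month Day Hour ...").
def read_rows(path)
  File.readlines(path)
      .map { |line| line.strip.split(/[\s,]+/) }
      .reject(&:empty?)
      .select { |fields| fields.first =~ /\A\d{4}\z/ }
end
```

In kyle.rb, `inputs = read_rows(ARGV[0])` could then stand in for the line that reads the file.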


RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Hi Ralf,

Hmm, I don't follow what you mean by "remove the line"

require "numru/netcdf_miss" 

because in the Ruby script we only have

require 'cdo'
require 'parallel'

Just an FYI: I have both CDO 1.6.3 and 1.9.5 installed. I don't know what the gem install of parallel and cdo does; e.g., when I run

gem install parallel cdo --user
Successfully installed parallel-1.19.1
Parsing documentation for parallel-1.19.1
Done installing documentation for parallel after 0 seconds
Successfully installed cdo-1.5.0
Parsing documentation for cdo-1.5.0
Done installing documentation for cdo after 0 seconds
2 gems installed

what exactly did this do that would allow me to run your Ruby script? There doesn't seem to be a "parallel" command, e.g.

which parallel
parallel: Command not found.

I'm sorry for my ignorance, this is why I don't understand why the script is failing.

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

hi Kyle!

I guess this is my bad: I am so used to this stuff that I leave out the essential parts.

gem is a Ruby package manager, like pip for Python or npm for Node. You can install Ruby libraries either into system directories (with root access rights) or into your $HOME directory. To install the missing library, you need to run

gem install ruby-netcdf --user-install
Whatever you installed is a Ruby library and does not necessarily come with an executable that could be found with which. Some gem packages do ship executables, but cdo and parallel don't.
Since you already installed the other Ruby libraries (cdo and parallel), the script should be ready to go once ruby-netcdf is on your system, too.
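One way to see the library/executable distinction is to ask Ruby itself whether a library loads, instead of looking for a command with which. A minimal sketch; `library_available?` is a hypothetical helper:

```ruby
# Sketch: a gem such as cdo or parallel is a library, so the useful check
# is whether `require` succeeds, not whether `which` finds a command.
def library_available?(feature)
  require feature
  true  # loaded now, or had already been loaded earlier
rescue LoadError
  false # not installed, or not on Ruby's load path
end
```

For example, `%w[cdo parallel numru/netcdf_miss].each { |lib| puts "#{lib}: #{library_available?(lib)}" }`. This only checks loading: in the traceback above, `require 'cdo'` itself succeeded and the LoadError came later, from `Cdo.new`.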

RE: converting .csv to .nc to perform time-based statistics - Added by Kyle Clem almost 4 years ago

Ah, I see! This is helpful to know, thank you.

Unfortunately I can't install ruby-netcdf:

gem install ruby-netcdf --user-install
Building native extensions.  This could take a while...
ERROR:  Error installing ruby-netcdf:
    ERROR: Failed to build gem native extension.

    current directory: /Users/kclem/.gem/ruby/2.3.0/gems/narray-0.6.1.2/src
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/bin/ruby -r ./siteconf20200506-1581-1b78jpu.rb extconf.rb
mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/include/ruby.h

extconf failed, exit code 1

Gem files will remain installed in /Users/kclem/.gem/ruby/2.3.0/gems/narray-0.6.1.2 for inspection.
Results logged to /Users/kclem/.gem/ruby/2.3.0/extensions/universal-darwin-18/2.3.0/narray-0.6.1.2/gem_make.out

I suspect it is because I have not installed the netcdf (header + shared object) library you mentioned. I've never heard of this...is this a ruby package, or a stand alone package like my netcdf-c libraries?

You also mentioned that if I don't install netcdf, I can remove the line

require "numru/netcdf_miss" 

Can you please clarify what you mean because this line is not in the script? I realize this is getting quite off topic, I'm sorry. I understand if you need to focus on more CDO related items.
Thank you,
Kyle

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

I am not an expert with macOS, but did you install CDO using macports?

RE: converting .csv to .nc to perform time-based statistics - Added by Ralf Mueller almost 4 years ago

I uploaded a python version; maybe it's easier to install its dependencies. You need:

  • python3 (macports)
  • py-netcdf4 (macports)
  • cdo (current release 1.9.8, macports) - this is optional since you seem to already have a CDO binary running

then you need the cdo-python bindings:

pip install cdo --user

hth
ralf

PS: starting the script works identically to the ruby version:

 python kyle.py ar_characteristics.csv

kyle.py (1.22 KB)