Project

General

Profile

Analyze

This is the main access point to all installed analysis tools and to the history. The tools are implemented as plug-ins to the system. For more information on how to create a plugin, check the Developing a Plugin guides. An overview of the installed tools is available via --list-tools (see below).

Basic Usage

To get the help:

$ analyze --help
analyze [opt] query 
opt:
[...]

To list all available analysis tools:

$ analyze --list-tools
PCA: Principal Component Analysis

This output gives the overview of the tools installed on the system.

To select a particular tool:

$ analyze --tool pca
Missing required configuration for: input, variable

As you can see, the PCA tool complains about the incomplete configuration: the mandatory options input and variable are missing.

To get the help of a particular tool:

$ analyze --tool pca --help
PCA (v3.1.0): Principal Component Analysis
Options:
areaweight     (default: False)
               Whether or not you want to have your data area weighted. This is
               done per latitude with sqrt(cos(latitude)).

boots          (default: 100)
               Number of bootstraps.
[...]
input          (default: None) [mandatory]
               An arbitrary NetCDF file. There are only two restrictions to your
               NetCDF file: a) Time has to be the very first dimension in the
               variable you like to analyze. b) All dimensions in your variable
               need to be defined as variables themselves with equal names.
               Both, a) and b), are usually true.
[...]

Here you see each configuration parameter, its default value (None means no value is set), whether it is mandatory (the [mandatory] marker next to the default value), and a description of the parameter.

To pass values to the tool, just use the key=value construct like this:

$ analyze --tool pca input=myfile.nc outputdir=/tmp eofs=3
[...]

You may even define variables in terms of other variables, as the default value of the projection parameter does. When doing so from the shell, remember that you need to escape the $ sign with a backslash (\) or put the value in single quotes (double quotes don't work, because the shell expands $ inside them). For example:

$ analyze --tool pca input=myfile_\${eofs}.nc outputdir=/tmp eofs=3
#or
$ analyze --tool pca 'input=myfile_${eofs}.nc' outputdir=/tmp eofs=3

If you want to know more about this bash feature, see the bash manual sections on quoting and shell parameter expansion.

Quoting is very important in any shell, so if you use quotes, be sure you know how they work. It may help you avoid losing data!
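
A quick way to check what the tool will actually receive is to echo the argument first (a small illustration, assuming no shell variable named eofs is set in your session):

$ # with double quotes the shell expands ${eofs} itself (here to an empty string)
$ echo "input=myfile_${eofs}.nc"
input=myfile_.nc
$ # with single quotes the string is passed through literally, so the tool can resolve it later
$ echo 'input=myfile_${eofs}.nc'
input=myfile_${eofs}.nc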

Configuring the tools

You may want to save the configuration of the tool:

$ analyze --save-config --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3
INFO:__main__:Configuration file saved in /home/<user_account>/evaluation_system/config/pca/pca.conf

Note that this starts the tool. To just save the configuration without starting the tool, use the -n or --dry-run flag.
Also note that this stores the configuration in a special directory structure so the system can find it again.
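
For example, to store the same user configuration without triggering the PCA computation (the same options as above, only with --dry-run added):

$ analyze --save-config --dry-run --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3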

You can save the configuration somewhere else:

$ analyze --save-config --config-file myconfig --dry-run --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3
INFO:__main__:Configuration file saved in myconfig

The stored configuration will be used to override the default one. This is a possible use case:
  1. Change the defaults to suit your general needs:
    $ analyze --save-config --dry-run --tool pca outputdir=/my_output_dir shiftlats=false
    
  2. Prepare some configurations you'll be using recurrently
    $ analyze --save-config --dry-run --config-file pca.tas.conf --tool pca variable=tas
    $ analyze --save-config --dry-run --config-file pca.uas.conf --tool pca variable=uas
    
  3. Use the configurations to your needs
    $ analyze --config-file pca.tas.conf --tool pca input=my_tas_file
    [...]
    $ analyze --config-file pca.uas.conf --tool pca input=my_uas_file
    [...]
    

You may also edit the configuration file manually. This is what it looks like:

[PCA]
#: Filename of the projection (back-transformation). If SESSION=1 this is not 
#:  applicable, if SESSION=2 this is output.
projection=$input.pro.$variable.nc

#: [mandatory] The name of the variable in the NetCDF INPUT file you want to 
#:  analyze.
variable=<THIS MUST BE DEFINED!>

#: Number of bootstraps.
boots=100
[...]

Comments start with the # character and are ignored. You may comment out variables so they are not considered (the defaults will be used instead).
Undefined optional variables without a default value are automatically commented out.
Undefined mandatory variables without a default value are marked with a <THIS MUST BE DEFINED!> message.
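
A minimal hand-edited file could therefore look like this (the values are purely illustrative):

[PCA]
#: [mandatory] The name of the variable in the NetCDF INPUT file you want to analyze.
variable=tas

#: [mandatory] An arbitrary NetCDF file.
input=myfile.nc

#: Number of bootstraps. Commented out, so the default (100) is used.
#boots=100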

You may also skip your user configuration and load the default one:

$ analyze --tool pca --use-defaults

And you can just view the resulting configuration:

$ analyze --tool pca --show-config
  areaweight: - (default: False)
       boots: 100
   bootstrap: - (default: False)
 eigvalscale: - (default: False)
        eofs: 3
       input: myfile.nc
     latname: lat
missingvalue: 1e+38
   normalize: - (default: False)
   outputdir: /tmp
     pcafile: $input.pca.$variable.nc
  principals: True
  projection: $input.pro.$variable.nc
     session: 1
   shiftlats: - (default: False)
  testorthog: - (default: False)
     threads: 7
    variable: tas

$ analyze --tool pca --show-config --use-defaults
  areaweight: - (default: False)
       boots: - (default: 100)
   bootstrap: - (default: False)
 eigvalscale: - (default: False)
        eofs: - (default: -1)
       input: - *MUST BE DEFINED!*
     latname: - (default: lat)
missingvalue: - (default: 1e+38)
   normalize: - (default: False)
   outputdir: - (default: $USER_OUTPUT_DIR)
     pcafile: - (default: $input.pca.$variable.nc)
  principals: - (default: True)
  projection: - (default: $input.pro.$variable.nc)
     session: - (default: 1)
   shiftlats: - (default: False)
  testorthog: - (default: False)
     threads: - (default: 8)
    variable: - *MUST BE DEFINED!*

History

To get the history:

$ analyze --history
24) pca [2013-01-14 10:46:44.575529] <THIS MUST BE DEFINED!>.pca.<THIS MUST BE DEFINED!>.nc {u'normalize...
23) pca [2013-01-14 10:46:01.322760] None.pca.None.nc {u'normalize': u'true', u'testorthog': u'true', u'...
22) nclplot [2013-01-11 14:51:40.910996] first_plot.eps {u'plot_name': u'first_plot', u'file_path': u'tas_Am...
21) nclplot [2013-01-11 14:44:15.297102] first_plot.eps {u'plot_name': u'first_plot', u'file_path': u'tas_Am...
20) nclplot [2013-01-11 14:43:37.748200] first_plot.eps {u'plot_name': u'first_plot', u'file_path': u'tas_Am...
[...]

It shows only the 10 latest entries, i.e. the 10 most recent analyses that were performed. To create more complex queries, check the help:

$ analyze --help --history
Displays the last 10 entries with a one-line compact description.
The first number you see is the entry id, which you might use to select single entries.
To store the resulting configuration use --save-config output_configuration_file.

Arguments
full_text       If present shows the complete configuration stored

limit=n         Where n is the number of entries to be displayed 
                (don't set it or set it to < 0 to display all)

tool=name       Display only entries from tool "name" 
since=date      Retrieve entries newer than date (see DATE FORMAT)
until=date      Retrieve entries older than date (see DATE FORMAT)
entry_ids=ids   Select entries whose ids are in "ids" 
                (single number or comma separated list, e.g. entry_ids=1,2 or entry_ids=5)

DATE FORMAT
   Dates can be given in "YYYY-MM-DD HH:mm:ss.n" or any less accurate subset of it.
   These are all valid: "2012-02-01 10:08:32.1233431", "2012-02-01 10:08:32",
   "2012-02-01 10:08", "2012-02-01 10", "2012-02-01", "2012-02", "2012".

   These are *NOT*: "01/01/2010", "10:34", "2012-20-01" 

   Missing values are assumed to be the minimal allowed value. For example:
   "2012" == "2012-01-01 00:00:00.0" 

   Please note that in the shell you need to escape spaces. 
   All these are valid examples (at least for the bash shell):    
   analyze --history since=2012-10-1\ 10:35
   analyze --history since=2012-10-1" "10:35
   analyze --history "since=2012-10-1 10:35" 
   analyze --history 'since=2012-10-1 10:35'

You can view the configuration used at any time, as well as the status of the created files (i.e. whether the files are still there or have been modified):

$ analyze --history tool=pca limit=1 full_text
26) pca v3.1.0 [2013-01-14 10:51:26.244553] 
Configuration:
     areaweight=false
          boots=100
      bootstrap=false
    eigvalscale=false
           eofs=-1
          input=test.nc
        latname=lat
   missingvalue=1e+38
      normalize=false
      outputdir=/home/user/evaluation_system/output/pca
        pcafile=test.nc.pca.tas.nc
     principals=true
     projection=test.nc.pro.tas.nc
        session=1
      shiftlats=false
     testorthog=false
        threads=7
       variable=tas
Output:
  /home/user/evaluation_system/output/pca/test.nc.pca.tas.nc (deleted)

You cannot directly run a tool from a configuration stored in the history, but you can store that configuration in a file and use it:

$ analyze --history tool=pca limit=1 store_file=old.cfg
Configuration stored in old.cfg
$ analyze --tool pca --config-file old.cfg
[...]
$ analyze --history tool=pca limit=1 full_text
28) pca v3.1.0 [2013-01-14 10:58:00.516257] 
Configuration:
     areaweight=false
          boots=100
      bootstrap=false
    eigvalscale=false
           eofs=-1
          input=test.nc
        latname=lat
   missingvalue=1e+38
      normalize=false
      outputdir=/home/user/evaluation_system/output/pca
        pcafile=test.nc.pca.tas.nc
     principals=true
     projection=test.nc.pro.tas.nc
        session=1
      shiftlats=false
     testorthog=false
        threads=7
       variable=tas
Output:
  /home/user/evaluation_system/output/pca/test.nc.pca.tas.nc (available)

The history offers a more direct way to re-run tools. The option return_command shows the analyze command belonging to the configuration. Here is an example for the tool movieplotter:

$ analyze --history tool=movieplotter limit=1 return_command

It returns:
/miklip/home/zmaw/u290038/git/evaluation_system/bin/analyze --tool movieplotter latlon='None' polar='None' work='/home/zmaw/u290038/evaluation_system/cache/movieplotter/1387364295586' reverse='False' range_min='None' collage='False' range_max='None' earthball='False' level='0' ntasks='24' input=''/miklip/integration/data4miklip/projectdata/DroughtClip/output/MPI-M/MPI-ESM-LR/dec08o1914/mon/atmos/tas/r1i1p1/tas_Amon_MPI-ESM-LR_dec08o1914_r1i1p1_191501-192412.nc'' loops='0' colortable='ncl_default' animate='True' cacheclear='True' resolution='800' outputdir='/home/zmaw/u290038/evaluation_system/output/movieplotter' secperpic='1.0'

This is not a handy expression, but it is very useful. A re-run of the tool could easily be performed in a bash shell with
$(analyze --history tool=movieplotter limit=1 return_command)
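
If you prefer to inspect the command before executing it, you can capture it in a shell variable first (a small bash sketch; it assumes the history contains at least one movieplotter entry):

$ cmd=$(analyze --history tool=movieplotter limit=1 return_command)
$ echo "$cmd"    # check what will be executed
$ eval "$cmd"    # then actually run it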

Scheduling

Instead of running your job directly in the terminal, you can submit it to the SLURM scheduler.

To run the tool murcss analyzing the variable tas the command is

$ analyze --tool murcss variable=tas

The execution takes a certain time (here roughly one minute) and prints:
Searching Files
Remapping Files
Calculating ensemble mean
Calculating crossvalidated mean
Calculating Anomalies
Analyzing year 2 to 9
Analyzing year 1 to 1
Analyzing year 2 to 5
Analyzing year 6 to 9
Finished.
Calculation took 63.4807469845 seconds

To schedule the same task you would use

$ analyze --batchmode true --tool murcss variable=tas

instead. The output changes to
Scheduled job with history id 414
You can view the job's status with the command squeue
Your job's progress will be shown with the command
tail -f  /home/zmaw/u290038/evaluation_system/slurm/murcss/slurm-1437.out

The last line shows the command for following the output created by the tool.
In this example you would type:

$ tail -f  /home/zmaw/u290038/evaluation_system/slurm/murcss/slurm-1437.out

For jobs with a long run time, or when submitting a large number of jobs, you should consider
scheduling them.
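
Scheduling works the same way for the other tools; for instance, the PCA run shown earlier on this page could be submitted to SLURM like this (assuming --batchmode behaves identically for other tools):

$ analyze --batchmode true --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3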

--help

$ analyze --help
analyze [opt] query 
opt:
  -h, --help           : displays this help or that of the given context.
  --history            : provides access to the configuration history (use also --help for more help).
  --tool <value>       : defines which tool should be used (use also --help for more help).
  --use-defaults       : skips user configuration and use system defaults.
  --list-tools         : lists all available tools.
  --config-file <value>: uses the given configuration file.
  --save-config <value>: saves the configuration at the given file path.
  --save               : saves the configuration locally for this user.
  --show-config        : shows the resulting configuration (implies dry-run).
  -d, --debug          : turn on debugging info and show stack trace on exceptions.
  -n, --dry-run        : dry-run, perform no computation. This is used for viewing and handling the configuration.

Applies some analysis to the given data.
See https://code.zmaw.de/projects/miklip-d-integration/wiki/Analyze for more information.

The "query" part is a key=value list used for configuring the tool. It's tool dependent so check that tool help.

For Example:
    analyze --tool pca eofs=4 bias=False input=myfile.nc outputdir=/tmp/test