Analyze
This is the main access point to all installed analysis tools and to the history. The tools are implemented as plug-ins to the system. For more information on how to create a plugin, check the Developing a Plugin guides. An overview of the installed tools can be obtained with --list-tools (see below).
Basic Usage
To get the help:
$ analyze --help
analyze [opt] query
opt:
[...]
To list all available analysis tools:
$ analyze --list-tools
PCA: Principal Component Analysis
This gives an overview of the tools installed.
To select a particular tool:
$ analyze --tool pca
Missing required configuration for: input, variable
You see here that the PCA tool is complaining because of an incomplete configuration.
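For instance (the key=value syntax is explained further below), the complaint disappears once the two mandatory options are supplied; the file name and variable here are only placeholders:

$ analyze --tool pca input=myfile.nc variable=tas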
To get the help of a particular tool:
$ analyze --tool pca --help
PCA (v3.1.0): Principal Component Analysis
Options:
areaweight (default: False)
           Whether or not you want to have your data area weighted. This is
           done per latitude with sqrt(cos(latitude)).
boots      (default: 100)
           Number of bootstraps.
[...]
input      (default: None) [mandatory]
           An arbitrary NetCDF file. There are only two restrictions to your
           NetCDF file:
           a) Time has to be the very first dimension in the variable you
              like to analyze.
           b) All dimensions in your variable need to be defined as variables
              themselves with equal names.
           Both, a) and b), are usually true.
[...]
Here you see each configuration parameter, its default value (None means no value is set), whether it is mandatory (the [mandatory] marker next to the default value) and an explanation of the parameter.
To pass the values to the tool you just need to use the key=value construct, like this:
$ analyze --tool pca input=myfile.nc outputdir=/tmp eofs=3 [...]
You may even define variables in terms of other variables, like the projection name above. While doing so from the shell, remember that you need to escape the $ sign with a backslash (\) or set the value in single quotes (no, double quotes don't work). For example:
$ analyze --tool pca input=myfile_\${eofs}.nc outputdir=/tmp eofs=3
# or
$ analyze --tool pca 'input=myfile_${eofs}.nc' outputdir=/tmp eofs=3
If you want to know more about this bash feature, see the bash documentation on quoting and parameter expansion.
Quoting is very important in any shell, so if you use quotes, be sure you know how they work. It may help you avoid losing data!
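A minimal illustration of the difference using plain echo (nothing here is specific to analyze): inside double quotes the shell expands $eofs before the tool ever sees it, while single quotes or a backslash pass the literal text through:

$ eofs=3
$ echo "myfile_${eofs}.nc"     # shell expands: myfile_3.nc
$ echo 'myfile_${eofs}.nc'     # literal: myfile_${eofs}.nc
$ echo myfile_\${eofs}.nc      # literal: myfile_${eofs}.nc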
Configuring the tools
You may want to save the configuration of the tool:
$ analyze --save-config --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3
INFO:__main__:Configuration file saved in /home/<user_account>/evaluation_system/config/pca/pca.conf
Note that this also starts the tool. To just save the configuration without starting the tool, use the -n or --dry-run flag. Also note that this stores the configuration in a special directory structure so the system can find it again.
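For example, the same call as above with the flag added would only write the configuration file and not start the PCA (the values are the ones used before):

$ analyze --save-config --dry-run --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3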
You can save the configuration somewhere else:
$ analyze --save-config --config-file myconfig --dry-run --tool pca variable=tas input=myfile.nc outputdir=/tmp eofs=3
INFO:__main__:Configuration file saved in myconfig

The stored configuration will be used to override the defaults. This is a possible use case:
- Change the defaults to suit your general needs:
$ analyze --save-config --dry-run --tool pca outputdir=/my_output_dir shiftlats=false
- Prepare some configurations you'll be using recurrently
$ analyze --save-config --dry-run --config-file pca.tas.conf --tool pca variable=tas
$ analyze --save-config --dry-run --config-file pca.uas.conf --tool pca variable=uas
- Use the configurations as you need them
$ analyze --config-file pca.tas.conf --tool pca input=my_tas_file [...]
$ analyze --config-file pca.uas.conf --tool pca input=my_uas_file [...]
You may also edit the configuration file manually. This is what it looks like:
[PCA]
#: Filename of the projection (back-transformation). If SESSION=1 this is not
#: applicable, if SESSION=2 this is output.
projection=$input.pro.$variable.nc
#: [mandatory] The name of the variable in the NetCDF INPUT file you want to
#: analyze.
variable=<THIS MUST BE DEFINED!>
#: Number of bootstraps.
boots=100
[...]
Comments start with the # character and are ignored. You may comment out variables so they are not considered (the defaults will be used instead). Undefined optional variables without a default value are automatically commented out. Undefined mandatory variables without a default value are marked with a <THIS MUST BE DEFINED!> message.
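As an illustrative, hand-edited excerpt (the values are hypothetical): commenting out boots means its default of 100 will be used again, while the mandatory variable has been given a value:

[PCA]
#: Number of bootstraps.
#boots=500
#: [mandatory] The name of the variable in the NetCDF INPUT file you want to
#: analyze.
variable=tas
[...]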
You may also skip the configuration and load the default one:
$ analyze --tool pca --use-defaults
And you can just view the resulting configuration:
$ analyze --tool pca --show-config
areaweight: - (default: False)
boots: 100
bootstrap: - (default: False)
eigvalscale: - (default: False)
eofs: 3
input: myfile.nc
latname: lat
missingvalue: 1e+38
normalize: - (default: False)
outputdir: /tmp
pcafile: $input.pca.$variable.nc
principals: True
projection: $input.pro.$variable.nc
session: 1
shiftlats: - (default: False)
testorthog: - (default: False)
threads: 7
variable: tas

$ analyze --tool pca --show-config --use-defaults
areaweight: - (default: False)
boots: - (default: 100)
bootstrap: - (default: False)
eigvalscale: - (default: False)
eofs: - (default: -1)
input: - *MUST BE DEFINED!*
latname: - (default: lat)
missingvalue: - (default: 1e+38)
normalize: - (default: False)
outputdir: - (default: $USER_OUTPUT_DIR)
pcafile: - (default: $input.pca.$variable.nc)
principals: - (default: True)
projection: - (default: $input.pro.$variable.nc)
session: - (default: 1)
shiftlats: - (default: False)
testorthog: - (default: False)
threads: - (default: 8)
variable: - *MUST BE DEFINED!*
History
To get the history:
$ analyze --history
24) pca [2013-01-14 10:46:44.575529] <THIS MUST BE DEFINED!>.pca.<THIS MUST BE DEFINED!>.nc {u'normalize...
23) pca [2013-01-14 10:46:01.322760] None.pca.None.nc {u'normalize': u'true', u'testorthog': u'true', u'...
22) nclplot [2013-01-11 14:51:40.910996] first_plot.eps {u'plot_name': u'first_plot', u'file_path': u'tas_Am...
21) nclplot [2013-01-11 14:44:15.297102] first_plot.eps {u'plot_name': u'first_plot', u'file_path': u'tas_Am...
20) nclplot [2013-01-11 14:43:37.748200] first_plot.eps {u'plot_name': u'first_plot', u'file_path': u'tas_Am...
[...]
It shows just the 10 latest entries, i.e. the 10 latest analyses that were performed. To create more complex queries, check the help:
$ analyze --help --history
Displays the last 10 entries with a one-line compact description.
The first number you see is the entry id, which you might use to select single entries.
To store the resulting configuration use --save-config output_configuration_file.

Arguments
full_text       If present shows the complete configuration stored
limit=n         Where n is the number of entries to be displayed (don't set it
                or set it to < 0 to display all)
tool=name       Display only entries from tool "name"
since=date      Retrieve entries older than date (see DATE FORMAT)
until=date      Retrieve entries newer than date (see DATE FORMAT)
entry_ids=ids   Select entries whose ids are in "ids" (single number or comma
                separated list, e.g. entry_ids=1,2 or entry_ids=5)

DATE FORMAT
Dates can be given in "YYYY-MM-DD HH:mm:ss.n" or any less accurate subset of it.
These are all valid: "2012-02-01 10:08:32.1233431", "2012-02-01 10:08:32",
"2012-02-01 10:08", "2012-02-01 10", "2012-02-01", "2012-02", "2012".
These are *NOT*: "01/01/2010", "10:34", "2012-20-01"
Missing values are assumed to be the minimal allowed value. For example:
"2012" == "2012-01-01 00:00:00.0"
Please note that in the shell you need to escape spaces.
All these are valid examples (at least for the bash shell):
analyze --history since=2012-10-1\ 10:35
analyze --history since=2012-10-1" "10:35
analyze --history "since=2012-10-1 10:35"
analyze --history 'since=2012-10-1 10:35'
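For example, the arguments can be combined; the tool name, dates and entry ids below are just placeholders:

$ analyze --history tool=pca limit=5
$ analyze --history tool=pca "since=2013-01-01 10:00" full_text
$ analyze --history entry_ids=22,24 full_text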
You can view the configuration used at any time, as well as the status of the created files (i.e. whether the files are still there or have been modified):
$ analyze --history tool=pca limit=1 full_text
26) pca v3.1.0 [2013-01-14 10:51:26.244553]
Configuration:
    areaweight=false
    boots=100
    bootstrap=false
    eigvalscale=false
    eofs=-1
    input=test.nc
    latname=lat
    missingvalue=1e+38
    normalize=false
    outputdir=/home/user/evaluation_system/output/pca
    pcafile=test.nc.pca.tas.nc
    principals=true
    projection=test.nc.pro.tas.nc
    session=1
    shiftlats=false
    testorthog=false
    threads=7
    variable=tas
Output:
    /home/user/evaluation_system/output/pca/test.nc.pca.tas.nc (deleted)
You cannot directly run a tool from a configuration stored in the history, but you can store it in a file and use it:
$ analyze --history tool=pca limit=1 store_file=old.cfg
Configuration stored in old.cfg

$ analyze --tool pca --config-file old.cfg
[...]

$ analyze --history tool=pca limit=1 full_text
28) pca v3.1.0 [2013-01-14 10:58:00.516257]
Configuration:
    areaweight=false
    boots=100
    bootstrap=false
    eigvalscale=false
    eofs=-1
    input=test.nc
    latname=lat
    missingvalue=1e+38
    normalize=false
    outputdir=/home/user/evaluation_system/output/pca
    pcafile=test.nc.pca.tas.nc
    principals=true
    projection=test.nc.pro.tas.nc
    session=1
    shiftlats=false
    testorthog=false
    threads=7
    variable=tas
Output:
    /home/user/evaluation_system/output/pca/test.nc.pca.tas.nc (available)
The history offers a more direct way to re-run tools. The option return_command shows the analyze command belonging to the configuration. Here is an example for the tool movieplotter:
$ analyze --history tool=movieplotter limit=1 return_command
It returns:
/miklip/home/zmaw/u290038/git/evaluation_system/bin/analyze --tool movieplotter latlon='None' polar='None' work='/home/zmaw/u290038/evaluation_system/cache/movieplotter/1387364295586' reverse='False' range_min='None' collage='False' range_max='None' earthball='False' level='0' ntasks='24' input=''/miklip/integration/data4miklip/projectdata/DroughtClip/output/MPI-M/MPI-ESM-LR/dec08o1914/mon/atmos/tas/r1i1p1/tas_Amon_MPI-ESM-LR_dec08o1914_r1i1p1_191501-192412.nc'' loops='0' colortable='ncl_default' animate='True' cacheclear='True' resolution='800' outputdir='/home/zmaw/u290038/evaluation_system/output/movieplotter' secperpic='1.0'
This is not a handy expression, but it is very useful. A re-run of the tool in the bash shell can easily be performed with
$(analyze --history tool=movieplotter limit=1 return_command)
Scheduling
Instead of running your job directly in the terminal, you can hand it over to the SLURM scheduler.
To run the tool murcss analyzing the variable tas, the command is
$ analyze --tool murcss variable=tas
The execution takes a certain time (here: roughly 1 minute) and prints
Searching Files
Remapping Files
Calculating ensemble mean
Calculating crossvalidated mean
Calculating Anomalies
Analyzing year 2 to 9
Analyzing year 1 to 1
Analyzing year 2 to 5
Analyzing year 6 to 9
Finished.
Calculation took 63.4807469845 seconds
To schedule the same task you would use
$ analyze --batchmode true --tool murcss variable=tas
instead. The output changes to
Scheduled job with history id 414
You can view the job's status with the command squeue
Your job's progress will be shown with the command tail -f /home/zmaw/u290038/evaluation_system/slurm/murcss/slurm-1437.out
The last line shows you the command for following the output created by the tool.
In this example you would type
$ tail -f /home/zmaw/u290038/evaluation_system/slurm/murcss/slurm-1437.out
For jobs with a long run time, or when running a large number of jobs, you should consider scheduling them.
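Putting the pieces together, a typical scheduled run might look like this (the tool, variable and SLURM output path are only examples):

$ analyze --batchmode true --tool murcss variable=tas
$ squeue -u $USER    # check whether the job is still pending or running
$ tail -f /home/zmaw/u290038/evaluation_system/slurm/murcss/slurm-1437.out    # follow the tool's output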
--help
$ analyze --help
analyze [opt] query
opt:
  -h, --help            : displays this help or that of the given context.
  --history             : provides access to the configuration history
                          (use also --help for more help).
  --tool <value>        : defines which tool should be used
                          (use also --help for more help).
  --use-defaults        : skips user configuration and use system defaults.
  --list-tools          : lists all available tools.
  --config-file <value> : uses the given configuration file.
  --save-config <value> : saves the configuration at the given file path.
  --save                : saves the configuration locally for this user.
  --show-config         : shows the resulting configuration (implies dry-run).
  -d, --debug           : turn on debugging info and show stack trace on exceptions.
  -n, --dry-run         : dry-run, perform no computation. This is used for
                          viewing and handling the configuration.

Applies some analysis to the given data. See
https://code.zmaw.de/projects/miklip-d-integration/wiki/Analyze for more information.

The "query" part is a key=value list used for configuring the tool.
It's tool dependent so check that tool help.

For Example:
  analyze --tool pca eofs=4 bias=False input=myfile.nc outputdir=/tmp/test