Pack'em ESMs! Tools for (un-)archiving Earth system model data

Updated almost 2 years ago by Karl-Hermann Wieners
Information on this page refers to packems-2.0.2.

Overview

Pack'em ESMs! (packems) contains a set of tools for packing, archiving, listing, and retrieval of earth system model and other data.

packems takes directories to be packed into tar files, and optionally pushes these directly to the tape archiving system. Features include parallel operation, batch job support, and error recovery. It keeps track of the user's archiving operations so that this information can be re-used for later retrieval of data. Currently only StrongLink's slk interface is supported for archiving. packems_wrapper is provided for use with sbatch.

listems takes the information recorded by packems and allows you to survey and examine the archived data without having to unarchive or unpack large data files. It features UNIX and regular expression style search patterns for file selection and also allows you to include user-defined storage index information.

unpackems uses the listems interface to unarchive and unpack the selected data. The retrieval process allows for transforming the original directory structure as needed by the user, e.g. using a different base directory, renaming parts of the path name, or flattening out the tar file contents into a single directory.

tapeinit allows you to easily create and renew the StrongLink login tokens needed for the tape archive.

Preparation

Tasks to do prior to each usage

Make sure the module is loaded

The tools are now available as a standard module. To load them, enter

module add packems

or, to request a specific version,

module add packems/2.0.2

Check StrongLink login token

StrongLink authentication works with so-called login tokens, which currently expire after one month. To check on your token, simply enter

tapeinit

This will do nothing if your token is valid; otherwise it asks for your DKRZ user name and password to create a new token.
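A small guard at the top of a job script can catch a forgotten module add before any archiving starts. The following is only a sketch: the helper name ensure_tools is made up for this page, and it assumes that the module puts packems, listems, unpackems and tapeinit on the PATH as ordinary commands.

```shell
# ensure_tools is a hypothetical helper, not part of packems.
# It returns non-zero if any of the given commands is not on the PATH.
ensure_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing command: $tool (did you run 'module add packems'?)" >&2
            return 1
        fi
    done
    return 0
}

# Typical use at the top of a job script (assumed command names):
# ensure_tools packems listems unpackems tapeinit || exit 1
```

The check only tests whether the commands can be found; it does not verify the login token, which is still the job of tapeinit.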

Quick Tour (examples taken from MPI-ESM)

Packems

Basic use

  1. I recommend starting the job in the experiment directory on /work and specifying the directories or files to be archived relative to it. In doing so, the archives will also contain relative path names and can easily be unpacked to other locations later on.
    cd /work/xy1234/m123456/mpiesm/experiments/abc1234
    
  2. -j JOBS specifies how many jobs are started in parallel. Useful in combination with SLURM, e.g. as
    sbatch -A xy1234 -p prepost,compute,compute2 --exclusive --mail-type=FAIL packems_wrapper -j 12 ...
    
  3. Because our example directory ends in abc1234, the archives are named abc1234_001.tar, abc1234_002.tar, and so on. With -o OUTPUT you may specify a different base name for them.
    ... -o abc1234_outdata_echam ...
    
  4. As our example directory is under /work, the .tar files will be archived under the same directory structure on tape, i.e. xy1234/m123456/mpiesm/experiments/abc1234.
    Use -S to change the directory where the files are stored.
    ... -S xy1234/common_data/abc1234 ...
    
  5. -p deletes the packed archives automatically after archiving. Otherwise you can manually delete the files in the packing directory afterwards.
    ... -p ...
    
    (!) A similar option, -P, which also deletes the original files, should only be used with extreme caution! Restarting after abnormal program termination requires proper knowledge of the behavior of the -M and -A options to avoid data loss. We usually recommend removing the original files in a second step.
  6. Finally, define the directory which you would like to archive.
    ... outdata/echam6
    
  7. You can add more than one input directory, but you can set the output only once:
    ... -o abc1234_outdata outdata/echam6 outdata/jsbach outdata/mpiom outdata/hamocc
    
  8. For restart files, the temporal relationship is very important. Therefore, sort them by timestamp:
    ... -O by_time restart
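The individual steps can be combined into a single call. The sketch below only prints the resulting commands so they can be inspected first; it uses the placeholder values from the steps above. The two default_* variables merely illustrate the naming rules from steps 3 and 4 with shell string operations; they are not how packems itself computes them.

```shell
# Sketch only: print the commands instead of running them.
# All values are the placeholder values used in the steps above.
exp_dir=/work/xy1234/m123456/mpiesm/experiments/abc1234

# Defaults described in steps 3 and 4, derived here with plain shell
# string operations (an illustration, not packems' own code):
default_name=${exp_dir##*/}          # last path component
default_tape_dir=${exp_dir#/work/}   # path below /work
first_pack=$(printf '%s_%03d.tar' "$default_name" 1)

# Options from steps 2 to 7 combined into one call.
packems_args="-j 12 -o abc1234_outdata -S xy1234/common_data/abc1234 -p"
input_dirs="outdata/echam6 outdata/jsbach outdata/mpiom outdata/hamocc"
job_cmd="sbatch -A xy1234 -p prepost,compute,compute2 --exclusive --mail-type=FAIL packems_wrapper $packems_args $input_dirs"

echo "default pack name: $first_pack"
echo "default tape dir:  $default_tape_dir"
echo "cd $exp_dir"
echo "$job_cmd"
```

Once the printed commands look right, they can be pasted into the shell or the echo in front of the last two lines can be dropped.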
    

Restart after premature termination

When packems was stopped due to temporary conditions, e.g. a batch job time-out, it is usually safe to just restart with the same options. packems will pick up any prepared .tar packs and remove partially transferred packs from the tape archive before continuing.

Lock time-out

If running more than one packems process in parallel using the same setting for -S/--archive-subdir, packems uses a lock directory on tape to synchronize index file access. If one process fails to properly remove this lock during termination, all subsequent processes using the same archive directory will time out. Use the --lock-break option to recover from this situation.

Caution when using -P/--purge

When using the -P/--purge option to remove original files as soon as the corresponding pack is ready, there is no easy way to restart. You may try a cleanup run by adding the -M/--make-only option to your command. This will try to continue the process without re-processing the input files. There will be errors due to the input files that were removed in the first run, but packems will continue to process all files and packs that still exist. To avoid data loss, check carefully if there are any other errors during the cleanup.

Other useful options

  • -d/--destination determines the directory in which the .tar files are packed (by default packems/OUTPUT under your /scratch directory). For large pack sizes or many parallel jobs, the 15 TB quota on /scratch might not be appropriate.
  • -L/--dereference will pack the files that are referenced by symbolic links. This should be used if the referenced files are not stored in the same or another packems call. By default, packems will just store the links themselves but not the file data to avoid duplicate copies.

To get an overview of all options, run

packems --help

Listems

Input and filtering

  1. -i INDEX_FILE specifies an INDEX file located on the local file system (see below for files on HPSS); only files provided by -i are ingested if -l is not specified; please note subsection Scope in which files/expressions are searched/evaluated below
    ... -i /home/user/my_index_files/INDEX_file.txt ...
    
  2. -i t:INDEX_FILE specifies an INDEX file located on HPSS by prepending t: to the path; please note subsection Scope in which files/expressions are searched/evaluated below
    ... -i t:/hpss/arch/bm0146/k204221/INDEX_file.txt ...
    
  3. -i INDEX_FILE_LIST specifies a list of INDEX files (separated by ;); Note: enclose by ' because the shell otherwise treats ; as a command separator
    ... -i 't:/hpss/arch/bm0146/k204221/INDEX_file_1.txt;t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt' ...
    
  4. -i INDEX_FILE_WILDCARD specifies INDEX files via wildcard; Note: enclose by ' to prevent automatic evaluation by the shell
    ... -i 't:/hpss/arch/bm0146/k204221/INDEX_file_*.txt' ...
    
  5. -i r:INDEX_FILE_REGULAR_EXPRESSION specifies INDEX files via regular expression; prepend r:; the order of r: and t: has no effect; Note: enclose by ' to prevent automatic evaluation by the shell
    ... -i 'r:t:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt' ...
    ... -i 't:r:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt' ...
    
  6. -i INDEX_FILE_MIXED_EXPRESSION specifies INDEX files via a mixture of wildcard and regular expression; Note: enclose by ' to prevent automatic evaluation by the shell
    ... -i 't:/hpss/arch/bm0146/k204221/other/INDEX_file.txt;t:/hpss/arch/bm0146/k204221/icon/INDEX_file_*.txt;r:t:/hpss/arch/bm0146/k204221/data_c/INDEX_file_[0-9].txt' ...
    
  7. -s LIST_SEPARATOR -i INDEX_FILE_LIST specifies a list of INDEX files and manually sets the list-separator
    ... -s ',' -i t:/hpss/arch/bm0146/k204221/INDEX_file_1.txt,t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt ...
    
  8. -l -i INDEX_FILE specifies an INDEX file located on the local file system in addition to the default INDEX files; please note subsection Scope in which files/expressions are searched/evaluated below
    ... -l -i /home/user/my_index_files/INDEX_file.txt ...
    ... -i /home/user/my_index_files/INDEX_file.txt -l ...
    ... -l '' -i /home/user/my_index_files/INDEX_file.txt ...
    ... -i /home/user/my_index_files/INDEX_file.txt -l '' ...
    
  9. -l INDEX_LIST_FILE specifies a file that contains the path(s) to INDEX file(s); only INDEX files listed in INDEX_LIST_FILE are ingested if no additional -l is set (see next example); same wildcards and regular expressions as for -i can be used in the INDEX-file-list file and for INDEX_LIST_FILE
    ... -l my_dummy_index_list.txt ...
    
  10. -l -l INDEX_LIST_FILE specifies a file that contains the path(s) to INDEX file(s) in addition to the default INDEX files; same wildcards and regular expressions as for -i can be used in the INDEX-file-list file and for INDEX_LIST_FILE; alternatively to a pure -l the user can provide an empty string to -l
    ... -l -l my_dummy_index_list.txt ...
    ... -l my_dummy_index_list.txt -l ...
    ... -l '' -l my_dummy_index_list.txt ...
    ... -l my_dummy_index_list.txt -l '' ...
    ... -l ';my_dummy_index_list.txt' ...
    ... -l 'my_dummy_index_list.txt;' ...
    
  11. -a TAR_ARCHIVE select one(/more) tar archive(s) exclusively; only tar archives, which are listed in the provided INDEX file(s), are included; no new tar archives can be added; same usage of wildcards and regular expressions as for -i
    ... -a search_only_this_archive.tar ...
    
  12. -x EXCLUDE_TAR_ARCHIVE exclude one(/more) tar archive(s) that are listed in the provided INDEX files; no new tar archives can be added; same usage of wildcards and regular expressions as for -i
    ... -x contains_bad_data.tar ...
    
  13. Note: unpackems also has a -A/--archive-file option to retrieve and extract not-indexed files
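When the list of INDEX files is assembled in a script, it helps to build the separated string first and quote it as a whole. A minimal sketch, using the example paths from above and the default separator; the variable names are arbitrary, and the call is only printed, not executed:

```shell
# Join INDEX file locations with the list separator (default ';').
sep=';'
index_list=''
for f in \
    t:/hpss/arch/bm0146/k204221/INDEX_file_1.txt \
    t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt
do
    # Append the separator only if the list already has an entry.
    index_list="${index_list:+$index_list$sep}$f"
done

# Print the call with the list in single quotes so that the shell
# passes ';' through instead of treating it as a command separator.
printf "listems -i '%s'\n" "$index_list"
```

With a different separator, set sep accordingly and pass the same character to -s.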

Output formatting

  1. --long print detailed/long output (more columns)
    ... --long ...
    
  2. -t OUTPUT_FORMAT choose an output format: txt/text (default), csv, json or html
    ... -t json ...
    
  3. -o OUTPUT_FILE writes default output into a file
    ... -o output.txt ...
    
  4. -t OUTPUT_FORMAT -o OUTPUT_FILE writes output into a file of specific format; file extensions are not automatically recognized
    ... -t json -o output.json ...
    

Storing local copies of INDEX files

By default, the INDEX files are copied/retrieved to a local temporary directory and are deleted after they have been read by listems. A prefix t: in the beginning of the INDEX file path indicates that the files should be retrieved from HPSS. Omitting the t: prefix indicates that the files should be copied from the local file system.

  1. -N dry run; stop after a list of INDEX files is created; don't download INDEX files from HPSS; meant to check whether input from -i and -l was properly processed
    ... -N ...
    
  2. -w TMP_DIRECTORY user-provided working directory to retrieve the INDEX files into; the directory will be created if it does not exist; the INDEX files will be kept if purge is not set (-p)
    ... -w store/index/files/here/ ...
    
  3. -p -w TMP_DIRECTORY remove INDEX files from user-provided directory after they were imported into listems
    ... -p -w store/index/files/here/ ...
    
  4. --use-old-index-files-n-seconds TIME take existing INDEX files when they are available in TMP_DIRECTORY and younger than TIME (seconds); default is 3600 seconds = 1 hour;
    ... --use-old-index-files-n-seconds 7200 -w store/index/files/here/ ...
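The age rule behind --use-old-index-files-n-seconds can be pictured as a comparison between a file's modification time and the current time. The function below is an illustration of that rule only, not the code listems uses; the name is_fresh is made up, and it assumes GNU stat as found on Linux systems.

```shell
# is_fresh FILE MAX_AGE_SECONDS
# Succeeds if FILE exists and was modified less than MAX_AGE_SECONDS ago.
# Uses GNU stat (-c %Y); this is an assumption about the platform.
is_fresh() {
    [ -e "$1" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    [ $((now - mtime)) -lt "$2" ]
}

# Example with a hypothetical file name inside the working directory:
# is_fresh store/index/files/here/INDEX.txt 7200 && echo "would be re-used"
```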
    

Other options/arguments

  1. -v activate verbose output; the program prints information on ingested INDEX files and processing of the index; might help to diagnose bad input
    ... -v ...
    

Unpackems

Overview of the Options

Please see section on packems (above) for options -b, -F, -n, -N and -j.

Please see section on listems (above) for options -l, -i, -a, -x, -s, -v, and --use-old-index-files-n-seconds.

Information on these flags is provided here: -A, -d, -D, --flatten, -f, -o, -p, -K, -O, -q, -w, --no-untar

Note: The option -A does not allow the usage of -i, -l, -a, -x and files. Either files listed in INDEX files will be retrieved (-i, -l, -a, -x and files) or files not listed in INDEX files (-A). --force is ignored when -A is set because all files specified by -A will be automatically retrieved.

Setting working and destination directories; modifying extraction paths

  1. -d DESTINATION_DIR: destination directory into which the files should be extracted from the tar files
    ... -d /work/bm0146/k204221/model_results ...
    
  2. -D REPLACE_DESTINATION_DIR: replace the first n folders of the archived files by the folder provided by -D; n is the number of folders provided (4 in the example below); see Rules to construct/modify the output path of extracted files below for details
    ... -D new/folder/for/data ...
    
  3. -d DESTINATION_DIR -D REPLACE_DESTINATION_DIR: extract files (from tar archives) into /work/bm0146/k204221/model_results/new_folder and drop first directory of each archived file
    ... -d /work/bm0146/k204221/model_results -D new_folder ...
    
  4. -d DESTINATION_DIR -D REPLACE_DESTINATION_DIR: extract files (from tar archives) into /work/bm0146/k204221/model_results/new_folder/subfolder and drop first two directories of each archived file
    ... -d /work/bm0146/k204221/model_results -D new_folder/subfolder ...
    
  5. --flatten flatten/remove directory tree of files stored in tar archives
    ... --flatten ...
    
  6. -w WORK_DIR: specify a working directory into which the INDEX and tar files are retrieved.
    ... -w /scratch/k/k204221 ...
    

Selecting files for retrieval

  1. -A FILE_PATH: retrieve the file(s) specified by -A. If these are tar balls, they will be un-tar-ed except if --retrieve-only is provided. Non-tar files will be retrieved into the working/tmp directory and then moved into the destination directory. INDEX files and tar balls listed within them will not be considered.
    ... -A 't:/hpss/arch/my_project/my_user/data/example_01.nc' ...
    

Un-tarring, keeping and overwriting files

  1. --retrieve-only: only retrieve and do not un-tar the files; the retrieved files are left in the working/tmp directory
    ... --retrieve-only ...
    
  2. no -K or -O: unpackems reports an error if a file already exists or if a file would be extracted twice (or more) to the same location; this is not fail-safe, e.g. during parallel extractions
  3. -K: keep existing files during extraction; warn if keeping is expected
    ... -K ...
    
  4. -O: overwrite existing files during extraction; warn if overwriting is expected
    ... -O ...
    
  5. -q: suppress warnings thrown by -K and -O
    ... -q ...
    
  6. --no-untar: just retrieve files from HPSS but do not un-tar them
    ... --no-untar ...
    
  7. If non-tar files are listed in INDEX files (and selected for retrieval), they are retrieved but no attempt is made to unpack them.
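The three collision policies (no flag, -K, -O) can be summarized as a small decision function. This is a simplified model for a single file, written for this page; place_file and its mode names are hypothetical and do not reflect how unpackems is implemented.

```shell
# place_file MODE SRC DST
# MODE is one of: error (no flag), keep (like -K), overwrite (like -O).
place_file() {
    mode=$1
    src=$2
    dst=$3
    if [ -e "$dst" ]; then
        case $mode in
            keep)
                echo "warning: keeping existing $dst" >&2
                return 0
                ;;
            overwrite)
                echo "warning: overwriting $dst" >&2
                ;;
            *)
                echo "error: $dst already exists" >&2
                return 1
                ;;
        esac
    fi
    cp -- "$src" "$dst"
}
```

As with unpackems itself, a check like this is not safe against two processes writing the same target at the same time.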

Cleanup after extraction

  1. -p: purge INDEX files from working directory provided by -w
    ... -p ...
    

NOTE: tar archives are not automatically removed after retrieval and successful extraction when -p is not set.

Other options/arguments

  1. -f/--force: force extraction of all available files even if it is a huge amount of data; -f is not needed when (a) a file for retrieval is specified via -A or (b) a list of files for extraction is provided to the call of unpackems. If you are not sure whether you need it: don't use -f, and unpackems will tell you when it needs -f to be set. If -f did not exist, all files listed in the available INDEX files would be retrieved whenever unpackems was called without any arguments. You don't want that in most situations.
    ... -f ...
    
  2. -o NAME_MAKEFILE: set name of the Makefile to create
    ... -o example_makefile_name ...
    

Detailed explanations on some topics

Regular Expressions and shell wildcards (listems and unpackems)

General Notes

  • enclose the expression with '; e.g. 'data/results/*.nc', 'file_?.nc'
  • regular expressions are indicated by leading r:; e.g. 'r:file_[0-9].nc'
  • wildcards have to match the whole path; e.g. 'file_?.nc' will not match data/results/file_1.nc; but, '*/file_?.nc' will match data/results/file_1.nc
  • for -a, -x and names/expressions without preceding flag: if a regular expression should only match the whole path, we need to add line beginning (^) and line end characters ($); e.g.: '^file_[0-9].nc$' will not match data/results/file_1.nc;
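The difference between whole-path wildcard matching and unanchored regular expressions can be tried out with standard shell tools. The two helper functions below are stand-ins written for this page: they assume that the wildcard matching behaves like a shell case pattern and that regular expressions are searched anywhere in the path unless anchored, as the notes above describe. They are not the matching code of listems or unpackems.

```shell
path='data/results/file_1.nc'

# Wildcard: the pattern has to match the whole path.
matches_wildcard() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Regular expression: matches anywhere unless anchored with ^ and $.
matches_regex() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

matches_wildcard "$path" 'file_?.nc'        || echo "wildcard file_?.nc: no match"
matches_wildcard "$path" '*/file_?.nc'      && echo "wildcard */file_?.nc: match"
matches_regex    "$path" 'file_[0-9].nc'    && echo "regex file_[0-9].nc: match"
matches_regex    "$path" '^file_[0-9].nc$'  || echo "regex ^file_[0-9].nc\$: no match"
```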

RegEx lookup on local file system:

  • If relative path is given: resolve from current working directory
  • If absolute path is given:
    • we look whether the first three folders exist (not via regex matching but as fixed expression; e.g. /first/second/third); if they don't exist, we stop evaluation of regex
    • reason: users should be prevented from using expressions like [a-zA-Z_/]*/my_file.txt, which cause a lot of traffic on the file system / metadata server
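The three-folder rule can be pictured as cutting the fixed prefix off the expression and testing it as a directory. A minimal sketch of that idea follows; the function names are invented here, and the real check inside the tools may differ in detail.

```shell
# fixed_prefix PATH_EXPRESSION
# Prints the first three folders of an absolute path expression,
# taken literally (no pattern matching).
fixed_prefix() {
    printf '%s\n' "$1" | cut -d/ -f1-4
}

# regex_lookup_allowed PATH_EXPRESSION
# Succeeds only if the fixed three-folder prefix exists as a directory.
regex_lookup_allowed() {
    [ -d "$(fixed_prefix "$1")" ]
}

fixed_prefix '/first/second/third/[a-z]*/my_file.txt'
```

An expression whose first three folders already contain pattern characters fails this test in most cases, because no directory with that literal name exists.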

RegEx and Wildcard lookup on HPSS:

  • currently deactivated
  • implemented but not activated:
    • we look whether the first three folders exist (not via regex matching but as fixed expression; e.g. /first/second/third); if they don't exist, we stop evaluation of regex
    • reason: users should be prevented from using expressions like [a-zA-Z_/]*/my_file.txt or *.txt, which cause a lot of traffic on the file system / metadata server

Scope in which files/expressions are searched/evaluated (listems and unpackems)

  • -l look locally
  • -i:
    • no prefix (or l:): look locally
    • t: prefix: look at HPSS
  • -a and -x search for tar archives in last column of the provided INDEX files (resp.: in the content of the column)
  • arguments without flag (files/expressions attached to the call): files/expressions are looked up in the files listed in the provided INDEX files; the available list has been filtered by -a and -x previously

Available prefixes for file names/expressions (listems and unpackems)

  • r:: evaluate the following expression as a regular expression
  • t:: expect file to be located on HPSS
  • l:: expect file to be located on local file system (optional; will be ignored; same as omitting t:)
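Because the prefixes may appear in any order, they can be thought of as being stripped from the front one at a time. The function below sketches that reading; parse_prefixes is a hypothetical name, and the output format (location, regex flag, remaining expression) was chosen for this example only.

```shell
# parse_prefixes SPEC
# Prints "<location> <is_regex> <remaining expression>".
parse_prefixes() {
    spec=$1
    is_regex=no
    location=local
    while :; do
        case $spec in
            r:*) is_regex=yes;   spec=${spec#r:} ;;
            t:*) location=hpss;  spec=${spec#t:} ;;
            l:*) location=local; spec=${spec#l:} ;;
            *)   break ;;
        esac
    done
    printf '%s %s %s\n' "$location" "$is_regex" "$spec"
}

# Both orders give the same result:
parse_prefixes 'r:t:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt'
parse_prefixes 't:r:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt'
```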

Rules to construct/modify the output path of extracted files (unpackems)

files stored in tar archive file:

data/mask.nc
data/forcing/sst.nc
data/forcing/emis.nc
data/output/wind.nc

extraction with -d /work/bm0146/k204221:

/work/bm0146/k204221/data/mask.nc
/work/bm0146/k204221/data/forcing/sst.nc
/work/bm0146/k204221/data/forcing/emis.nc
/work/bm0146/k204221/data/output/wind.nc

extraction with -d ./old_data:

./old_data/data/mask.nc
./old_data/data/forcing/sst.nc
./old_data/data/forcing/emis.nc
./old_data/data/output/wind.nc

extraction with --flatten:

./mask.nc
./sst.nc
./emis.nc
./wind.nc

extraction with -d old_data --flatten:

./old_data/mask.nc
./old_data/sst.nc
./old_data/emis.nc
./old_data/wind.nc

extraction with -D new_dir:

./new_dir/mask.nc
./new_dir/forcing/sst.nc
./new_dir/forcing/emis.nc
./new_dir/output/wind.nc

extraction with -D new_dir/second_dir:

./new_dir/second_dir/mask.nc
./new_dir/second_dir/sst.nc
./new_dir/second_dir/emis.nc
./new_dir/second_dir/wind.nc

extraction with -d old_data -D new_dir:

./old_data/new_dir/mask.nc
./old_data/new_dir/forcing/sst.nc
./old_data/new_dir/forcing/emis.nc
./old_data/new_dir/output/wind.nc
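The listings above follow a few simple rules, which the sketch below reproduces in plain shell. It is a model derived from these examples, not the code of unpackems: the function name out_path is made up, and letting --flatten take precedence over -D is an assumption, since that combination is not shown above.

```shell
# out_path DEST REPLACE FLATTEN ARCHIVED_PATH
#   DEST     value of -d (empty: current directory)
#   REPLACE  value of -D (empty: not set)
#   FLATTEN  yes or no (--flatten)
out_path() {
    dest=${1:-.}
    repl=$2
    flatten=$3
    p=$4
    # Show relative destinations with a leading ./ as in the listings.
    case $dest in
        /*|./*|.) ;;
        *) dest=./$dest ;;
    esac
    if [ "$flatten" = yes ]; then
        p=${p##*/}                      # keep only the file name
    elif [ -n "$repl" ]; then
        rest=$repl
        while :; do
            # Drop one leading folder per folder in REPLACE,
            # but never the file name itself.
            case $p in */*) p=${p#*/} ;; esac
            case $rest in */*) rest=${rest#*/} ;; *) break ;; esac
        done
        p=$repl/$p
    fi
    printf '%s/%s\n' "$dest" "$p"
}

out_path old_data new_dir no data/forcing/sst.nc
out_path '' new_dir/second_dir no data/output/wind.nc
```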

Detailed Examples

Please download this zip archive and extract it in your testing/training directory to run the example commands below (those aimed at the local file system).

Listems

Locally stored INDEX files

# read index files from file lists provided by `-l` but don't use default INDEX list files; local index files
listems -v -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'

# same as above but include default INDEX list file
listems -v -l -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
listems -v -l '' -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
listems -v -l ';examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'

# as two above but other separator for file lists
listems -v -s ',' -l 'examples/index_file_lists/commented_file_list.txt,examples/index_file_lists/empty_line_file_list.txt,examples/index_file_lists/plain_file_list.txt'

# print as text table
listems -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' data/ocean_day3d_t_pocp_emep_2012.nc

# print extended output
listems --long -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' 'data/ocean_day3d_t_pocp_emep_201?.nc'

# more files to look for
listems -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# wildcard in -l
listems -l "examples/index_file_lists/*.txt" data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# wildcard in -l and simple file in -x
listems -l "examples/index_file_lists/*.txt" -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# regex for -l
listems -v -l "r:examples/index_file_lists/[a-zA-Z-_]*.txt" -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# use several files in -a
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# use wildcard in -a
listems -l "examples/index_file_lists/*.txt" -a '*4.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# use regex in -a
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# use wildcard in -x
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# use wildcards in files to search
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar;abc.tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc '*.nc'

# `warnow_river...` was not found in the calls before because it is in a subfolder; we solve it here:
listems -l "examples/index_file_lists/*.txt" data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc'

# use regular expressions in files to search
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar;abc.tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc r:data/ocean_day3d_u_emep_20[0-9][0-9].nc

# some other call ...
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# print as json; with verbose flag
listems -v -t json -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# look via regex for files to retrieve and print as json
listems -t json -l 'examples/index_file_lists/*.txt' 'r:.*warnow_river_phoswam_v0[0-9]_[a-zA-Z0-9]+.nc'

# some verbose output
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc *warnow_river_phoswam_v04_ist.nc'

# write output into file `output_listems.txt`
listems -o output_listems.txt -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# write output into an html file; if `-o` is omitted, the output is printed to the command line
listems -t html -o output_listems.html -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# create download directory
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_*.nc' data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -t json -w tmp

# create download directory and remove/purge
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_*.nc' data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -t json -w tmp -p

INDEX files on HPSS

# call for searching in hpss:
listems -v -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc'

# a slightly extended call
listems -v -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc' '*warnow_river*'

Unpackems

Locally stored INDEX files, dry runs for testing

see examples for listems for more ideas

# an example call
unpackems -N -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# provide a name to the make file
unpackems -N -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -o test_makefile

# select some files to extract
unpackems -N -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# retrieve and unpack a specific tar ball that is not listed in our INDEX files
unpackems -N -A 't:/hpss/arch/bm0146/k204221/iow/some_tar_001.tar'

# retrieve (no unpack) a specific tar ball that is not listed in our INDEX files
unpackems -N -A 't:/hpss/arch/bm0146/k204221/iow/some_tar_001.tar' --retrieve-only

# retrieve and unpack everything that is listed in our INDEX files; be very careful with this option; better check via `listems` first how many files would be retrieved
unpackems -f

# retrieve all tar files that contain files like `*day3d_area_t_phoswam_v04_15_1995.nc` and are listed in `t:/hpss/arch/bm0146/k204221/iow/INDEX.txt`
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc'

# will not work because no files are specified; needs `--force`/`-f`
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt

# retrieves all files listed in `t:/hpss/arch/bm0146/k204221/iow/INDEX.txt` and extracts them
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt --force
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt -f

First create some problematic files, then call unpackems:

# create some files that should be extracted
mkdir -p abc/def/ghi/
touch abc/def/ghi/warnow_river_phoswam_v04_ist.nc

# should throw error
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi

# should throw warning
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi -K

# will overwrite files and be quiet about it
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi -O -q