Pack'em ESMs! Tools for (un-)archiving Earth system model data

Updated 4 months ago by Karl-Hermann Wieners
Information on this page refers to packems-1.1.0
Older versions: packems-1.0.2 packems-1.0.1


New in this release

packems
  • archiving of packed data is now enabled by default
  • changed make process for better parallelization (archive packs asap)
  • archiving of pre-packed data/data with purged sources (-A, --archive-only)
  • option to set the target location for unpackems (-R, --restore-to)
  • allow use of absolute directories in archive (e.g. /hpss/double)
unpackems
  • unarchiving of named tar files (-A, --archive-file)
  • option to suppress unpacking (--retrieve-only)
  • restoring of files to the original directory
listems
  • processing of non-tar files listed in index
  • recognition and handling of symbolic links in index files
  • enable index file caching

Overview

Pack'em ESMs! (packems) contains a set of tools for packing, archiving, listing, and retrieving Earth system model and other data.

packems packs directories into tar files and optionally pushes these directly to the tape archiving system. It supports parallel operation, batch jobs, and error recovery. It keeps track of the user's archiving operations so that this information can be re-used for later retrieval of data. Currently, only HPSS's pftp interface is supported for archiving. packems_wrapper is provided for use with sbatch.

listems takes the information recorded by packems and lets you survey and examine the archived data without having to unarchive or unpack large data files. It supports UNIX wildcard and regular expression search patterns for file selection and can also include user-defined storage index information.

unpackems uses the listems interface to unarchive and unpack the selected data. The retrieval process can transform the original directory structure as needed, e.g. by using a different base directory, renaming parts of the path name, or flattening the tar file contents into a single directory.

tapeinit makes it easy to renew or create the Kerberos tickets needed for the tape archive.


Preparation

Tasks to do once prior to the first usage

Passwordless access to the tape archive (HPSS)

Before using the packems tools for the first time, you must register for DKRZ's Kerberos service if you haven't done so before.

Tasks to do prior to each usage

Make sure the module is loaded

The tools are now available as a standard module. To load them, enter

module add packems

Check Kerberos ticket

Kerberos authentication works with so-called tickets, which currently expire after one week. To check on the tickets, simply enter

tapeinit

This will renew the tickets if possible, and otherwise ask for your Kerberos password to create a new ticket.


Quick Tour (examples taken from MPI-ESM)

Packems

  1. We recommend starting the job in the experiment directory on work and specifying the directories or files to be archived relative to it. That way, the archives will also contain relative path names and can easily be unpacked to other locations later on.
    cd /work/xy1234/m123456/mpiesm/experiments/abc1234
    
  2. -j JOBS specifies how many jobs are started in parallel. Useful in combination with SLURM, e.g. as
    sbatch -A xy1234 -p prepost,compute,compute2 --exclusive --mail-type=FAIL packems_wrapper -j 12 ...
    
  3. Because our example directory ends in abc1234, the archives are named abc1234_001.tar, abc1234_002.tar, and so on. With -o OUTPUT you may specify a different base name for them.
    ... -o abc1234_outdata_echam ...
    
  4. -d determines the directory in which the .tar files are packed (by default, the current directory). This is important if you archive directories in which you do not have write permission.
    ... -d packing ...
    
  5. As our example directory is under /work, the .tar files will be archived under the same directory structure on tape, i.e. xy1234/m123456/mpiesm/experiments/abc1234.
    Use -S to change the directory where the files are stored.
    ... -S xy1234/common_data/abc1234 ...
    
  6. -p deletes the packed archives automatically after archiving. Otherwise you can manually delete the files in packing afterwards.
    ... -p ...
    
    (!) A similar option, -P, which also deletes the original files, should only be used with extreme caution! Restarting after abnormal program termination requires a proper understanding of how the -M and -A options behave, to avoid data loss. We usually recommend removing the original files in a second step.
  7. Finally, define the directory which you would like to archive.
    ... outdata/echam6
    
  8. You can add more than one input directory, but you can set the output only once:
    ... -o abc1234_outdata outdata/echam6 outdata/jsbach outdata/mpiom outdata/hamocc
    
  9. For restart files, the temporal relationship is very important. Therefore, sort them by timestamp:
    ... -O by_time restart
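
Putting the steps above together: the following sketch assembles the option fragments from steps 2 to 7 into one call and illustrates the default archive name (step 3) and the default tape directory (step 5). The shell variables are purely illustrative and not part of packems, and the call is only printed, not submitted.

```shell
# Illustration only: the variables below are hypothetical helpers, not part of packems.
exp_dir=/work/xy1234/m123456/mpiesm/experiments/abc1234

# Step 3: the default base name is the last component of the experiment directory,
# followed by a running three-digit number.
default_base=$(basename "$exp_dir")
first_archive=$(printf '%s_%03d.tar' "$default_base" 1)

# Step 5: the default tape directory mirrors the path below /work.
default_tape_dir=${exp_dir#/work/}

# Steps 2 to 7 combined into a single batch submission (printed, not executed).
submit_cmd="sbatch -A xy1234 -p prepost,compute,compute2 --exclusive --mail-type=FAIL \
packems_wrapper -j 12 -o abc1234_outdata_echam -d packing \
-S xy1234/common_data/abc1234 -p outdata/echam6"

echo "first archive:    $first_archive"
echo "default tape dir: $default_tape_dir"
echo "cd $exp_dir && $submit_cmd"
```

Printing the command first makes it easy to check the option order before handing the job to SLURM.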
    

Listems

Input and filtering

  1. -i INDEX_FILE specifies an INDEX file located on the local file system (see below for files on HPSS); only files provided by -i are ingested if -l is not specified; please note subsection Scope in which files/expressions are searched/evaluated below
    ... -i /home/user/my_index_files/INDEX_file.txt ...
    
  2. -i t:INDEX_FILE specifies an INDEX file located on HPSS by prepending t: to the path; please note subsection Scope in which files/expressions are searched/evaluated below
    ... -i t:/hpss/arch/bm0146/k204221/INDEX_file.txt ...
    
  3. -i INDEX_FILE_LIST specifies a list of INDEX files (separated by ;)
    ... -i t:/hpss/arch/bm0146/k204221/INDEX_file_1.txt;t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt ...
    
  4. -i INDEX_FILE_WILDCARD specifies INDEX files via wildcard; Note: enclose by ' to prevent automatic evaluation by the shell
    ... -i 't:/hpss/arch/bm0146/k204221/INDEX_file_*.txt' ...
    
  5. -i r:INDEX_FILE_REGULAR_EXPRESSION specifies INDEX files via regular expression; prepend r:; the order of r: and t: has no effect; Note: enclose by ' to prevent automatic evaluation by the shell
    ... -i 'r:t:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt' ...
    ... -i 't:r:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt' ...
    
  6. -i INDEX_FILE_MIXED_EXPRESSION specifies INDEX files via a mixture of wildcard and regular expression; Note: enclose by ' to prevent automatic evaluation by the shell
    ... -i 't:/hpss/arch/bm0146/k204221/other/INDEX_file.txt;t:/hpss/arch/bm0146/k204221/icon/INDEX_file_*.txt;r:t:/hpss/arch/bm0146/k204221/data_c/INDEX_file_[0-9].txt' ...
    
  7. -s LIST_SEPARATOR -i INDEX_FILE_LIST specifies a list of INDEX files and manually sets the list-separator
    ... -s ',' -i t:/hpss/arch/bm0146/k204221/INDEX_file_1.txt,t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt ...
    
  8. -l -i INDEX_FILE specifies an INDEX file located on the local file system in addition to the default INDEX files; please note subsection Scope in which files/expressions are searched/evaluated below
    ... -l -i /home/user/my_index_files/INDEX_file.txt ...
    ... -i /home/user/my_index_files/INDEX_file.txt -l ...
    ... -l '' -i /home/user/my_index_files/INDEX_file.txt ...
    ... -i /home/user/my_index_files/INDEX_file.txt -l '' ...
    
  9. -l INDEX_LIST_FILE specifies a file that contains the path(s) to INDEX file(s); only INDEX files listed in INDEX_LIST_FILE are ingested if no additional -l is set (see next example); same wildcards and regular expressions as for -i can be used in the INDEX-file-list file and for INDEX_LIST_FILE
    ... -l my_dummy_index_list.txt ...
    
  10. -l -l INDEX_LIST_FILE specifies a file that contains the path(s) to INDEX file(s) in addition to the default INDEX files; same wildcards and regular expressions as for -i can be used in the INDEX-file-list file and for INDEX_LIST_FILE; alternatively to a pure -l the user can provide an empty string to -l
    ... -l -l my_dummy_index_list.txt ...
    ... -l my_dummy_index_list.txt -l ...
    ... -l '' -l my_dummy_index_list.txt ...
    ... -l my_dummy_index_list.txt -l '' ...
    ... -l ';my_dummy_index_list.txt' ...
    ... -l 'my_dummy_index_list.txt;' ...
    
  11. -a TAR_ARCHIVE select one(/more) tar archive(s) exclusively; only tar archives, which are listed in the provided INDEX file(s), are included; no new tar archives can be added; same usage of wildcards and regular expressions as for -i
    ... -a search_only_this_archive.tar ...
    
  12. -x EXCLUDE_TAR_ARCHIVE exclude one(/more) tar archive(s) that are listed in the provided INDEX files; no new tar archives can be added; same usage of wildcards and regular expressions as for -i
    ... -x contains_bad_data.tar ...
    
  13. Note: unpackems also has a -A/--archive-file option to retrieve and extract not-indexed files
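
The -l variants in items 9 and 10 can be pictured as a simple split on the list separator, where an empty entry stands for the default INDEX files. The function below is a hypothetical illustration of that reading (expand_l_value is not part of listems and does not reproduce its actual parser).

```shell
# expand_l_value VALUE [SEPARATOR]  (hypothetical helper, not part of listems)
# Prints one line per list entry; an empty entry is reported as DEFAULTS.
expand_l_value() {
    sep="${2:-;}"
    printf '%s\n' "$1" | tr "$sep" '\n' | while IFS= read -r entry; do
        if [ -z "$entry" ]; then
            echo "DEFAULTS"
        else
            echo "FILE $entry"
        fi
    done
}

# Leading empty entry: defaults plus the given list file.
expand_l_value ';my_dummy_index_list.txt'
# Custom separator, as set with -s ','.
expand_l_value 'a_list.txt,b_list.txt' ','
```

Under this reading, -l '' and a bare -l are equivalent, because both yield a single empty entry.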

Output formatting

  1. --long print detailed/long output (more columns)
    ... --long ...
    
  2. -t OUTPUT_FORMAT choose an output format: txt/text (default), csv, json or html
    ... -t json ...
    
  3. -o OUTPUT_FILE writes default output into a file
    ... -o output.txt ...
    
  4. -t OUTPUT_FORMAT -o OUTPUT_FILE writes output into a file of specific format; file extensions are not automatically recognized
    ... -t json -o output.json ...
    

Storing local copies of INDEX files

By default, the INDEX files are copied/retrieved to a local temporary directory and are deleted after they have been read by listems. A prefix t: in the beginning of the INDEX file path indicates that the files should be retrieved from HPSS. Omitting the t: prefix indicates that the files should be copied from the local file system.

  1. -N dry run; stop after a list of INDEX files is created; don't download INDEX files from HPSS; meant to check whether input from -i and -l was properly processed
    ... -N ...
    
  2. -w TMP_DIRECTORY user-provided working directory to retrieve the INDEX files into; the directory will be created if it does not exist; the INDEX files will be kept if purge is not set (-p)
    ... -w store/index/files/here/ ...
    
  3. -p -w TMP_DIRECTORY remove INDEX files from user-provided directory after they were imported into listems
    ... -p -w store/index/files/here/ ...
    
  4. --use-old-index-files-n-seconds TIME reuse existing INDEX files when they are available in TMP_DIRECTORY and younger than TIME (seconds); default is 3600 seconds = 1 hour
    ... --use-old-index-files-n-seconds 7200 -w store/index/files/here/ ...
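
The caching rule in item 4 amounts to an age check on the local copy. A minimal sketch of such a check, assuming GNU stat and date are available (as on typical Linux systems); is_fresh and the example path are hypothetical and not part of listems:

```shell
# is_fresh FILE MAX_AGE_SECONDS  (hypothetical helper)
# Succeeds if FILE exists and was modified less than MAX_AGE_SECONDS ago.
is_fresh() {
    [ -f "$1" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$1")    # GNU stat: modification time in seconds since the epoch
    [ $((now - mtime)) -lt "$2" ]
}

index_copy=store/index/files/here/INDEX_file.txt   # example path, assumed
if is_fresh "$index_copy" 7200; then
    echo "reuse local copy"
else
    echo "retrieve INDEX file again"
fi
```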
    

Other options/arguments

  1. -v activate verbose output; the program prints information on ingested INDEX files and processing of the index; might help to diagnose bad input
    ... -v ...
    

Unpackems

Overview of the Options

Please see section on packems (above) for options -b, -F, -n, -N and -j.

Please see section on listems (above) for options -l, -i, -a, -x, -s, -v, and --use-old-index-files-n-seconds.

Information on the following flags is provided here: -A, -d, -D, --flatten, -f, -o, -p, -K, -O, -q, -w, --no-untar

Note: The option -A does not allow the usage of -i, -l, -a, -x and files. Either files listed in INDEX files will be retrieved (-i, -l, -a, -x and files) or files not listed in INDEX files (-A). --force is ignored when -A is set because all files specified by -A will be automatically retrieved.

Setting working and destination directories; modifying extraction paths

  1. -d DESTINATION_DIR: destination directory into which the files should be extracted from the tar files
    ... -d /work/bm0146/k204221/model_results ...
    
  2. -D REPLACE_DESTINATION_DIR: replace the first n folders of the archived files by the folder provided by -D; n is the number of folders provided (4 in the example below); see Rules to construct/modify the output path of extracted files below for details
    ... -D new/folder/for/data ...
    
  3. -d DESTINATION_DIR -D REPLACE_DESTINATION_DIR: extract files (from tar archives) into /work/bm0146/k204221/model_results/new_folder and drop first directory of each archived file
    ... -d /work/bm0146/k204221/model_results -D new_folder ...
    
  4. -d DESTINATION_DIR -D REPLACE_DESTINATION_DIR: extract files (from tar archives) into /work/bm0146/k204221/model_results/new_folder/subfolder and drop first two directories of each archived file
    ... -d /work/bm0146/k204221/model_results -D new_folder/subfolder ...
    
  5. --flatten flatten/remove directory tree of files stored in tar archives
    ... --flatten ...
    
  6. -w WORK_DIR: specify a working directory into which the INDEX and tar files are retrieved.
    ... -w /scratch/k/k204221 ...
    

Selecting files for retrieval

  1. -A FILE_PATH: retrieve the file(s) specified by -A. If these are tar balls, they will be un-tar-ed except if --retrieve-only is provided. Non-tar files will be retrieved into the working/tmp directory and then moved into the destination directory. INDEX files and tar balls listed within them will not be considered.
    ... -A 't:/hpss/arch/my_project/my_user/data/example_01.nc' ...
    

Un-taring, keeping and overwriting files

  1. --retrieve-only: only retrieve and do not un-tar the files; the retrieved files are left in the working/tmp directory
    ... --retrieve-only ...
    
  2. no -K or -O: unpackems raises an error if a file already exists or would be extracted twice (or more) to the same location; this check is not fail-safe, e.g. during parallel extractions
  3. -K: keep existing files during extraction; warn if keeping is expected
    ... -K ...
    
  4. -O: overwrite existing files during extraction; warn if overwriting is expected
    ... -O ...
    
  5. -q: suppress warnings thrown by -K and -O
    ... -q ...
    
  6. --no-untar: just retrieve files from HPSS but do not un-tar them
    ... --no-untar ...
    
  7. If non-tar files are listed in INDEX files (and selected for retrieval), they are retrieved but no attempt is made to unpack them.
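
The interplay of -K, -O and -q from items 2 to 5 can be summarised as a small decision function. This is a hypothetical illustration of the documented behaviour (decide_action is not part of unpackems, and the message texts are invented for the sketch):

```shell
# decide_action TARGET MODE QUIET  (hypothetical helper)
#   MODE:  "" (neither flag), K (-K) or O (-O)
#   QUIET: yes if -q was given
decide_action() {
    target=$1
    mode=$2
    quiet=$3
    if [ ! -e "$target" ]; then
        echo extract
        return 0
    fi
    case $mode in
        K)  [ "$quiet" = yes ] || echo "warning: keeping existing $target" >&2
            echo keep ;;
        O)  [ "$quiet" = yes ] || echo "warning: overwriting $target" >&2
            echo overwrite ;;
        *)  echo "error: $target already exists" >&2
            return 1 ;;
    esac
}

# An existing file without -K or -O is an error; here the current directory serves as "existing".
decide_action . "" no || echo "extraction refused"
```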

Cleanup after extraction

  1. -p: purge INDEX files from working directory provided by -w
    ... -p ...
    

NOTE: tar archives are not automatically removed after retrieval and successful extraction when -p is not set.

Other options/arguments

  1. -f/--force: force extraction of all available files even if it is a huge amount of data; -f is not needed when (a) a file for retrieval is specified via -A or (b) a list of files for extraction is provided in the call to unpackems. If you are not sure whether you need it: don't use -f, and unpackems will tell you when it needs -f to be set. If -f did not exist, all files listed in the available INDEX files would be retrieved whenever unpackems was called without any arguments, which is rarely what you want.
    ... -f ...
    
  2. -o NAME_MAKEFILE: set name of the Makefile to create
    ... -o example_makefile_name ...
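
The rules for -A, file arguments and -f described in this section can be condensed into one check. The function below is a hypothetical summary of those rules (validate_selection and its messages are not part of unpackems):

```shell
# validate_selection A_GIVEN INDEX_OPTS_GIVEN FILES_GIVEN FORCE_GIVEN  (each yes/no)
# INDEX_OPTS_GIVEN stands for any of -i, -l, -a, -x. Hypothetical helper.
validate_selection() {
    if [ "$1" = yes ]; then
        if [ "$2" = yes ] || [ "$3" = yes ]; then
            echo "error: -A cannot be combined with -i, -l, -a, -x or files"
            return 1
        fi
        echo "retrieve files named by -A"    # --force is ignored in this case
        return 0
    fi
    if [ "$3" = yes ]; then
        echo "retrieve selected files"
        return 0
    fi
    if [ "$4" = yes ]; then
        echo "retrieve everything listed in the INDEX files"
        return 0
    fi
    echo "error: no files selected; use -f to retrieve everything"
    return 1
}

validate_selection no yes yes no    # -i INDEX plus file patterns
```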
    

Detailed explanations on some topics

Regular Expressions and shell wildcards (listems and unpackems)

General Notes

  • enclose the expression with '; e.g. 'data/results/*.nc', 'file_?.nc'
  • regular expressions are indicated by leading r:; e.g. 'r:file_[0-9].nc'
  • wildcards have to match the whole path; e.g. 'file_?.nc' will not match data/results/file_1.nc; but, '*/file_?.nc' will match data/results/file_1.nc
  • for -a, -x and names/expressions without a preceding flag: if a regular expression should only match the whole path, we need to add line beginning (^) and line end ($) characters; e.g.: '^file_[0-9].nc$' will not match data/results/file_1.nc
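
The difference between the two matching styles can be tried out in a plain shell. The sketch below uses the shell's own case patterns and grep -E as stand-ins; this is an assumption for illustration, the matching engine used by listems and unpackems may differ in details. Both function names are hypothetical.

```shell
# wildcard_match PATTERN PATH: the wildcard has to cover the whole path
wildcard_match() {
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# regex_match REGEX PATH: a regular expression may match anywhere in the path
regex_match() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

path=data/results/file_1.nc
wildcard_match 'file_?.nc' "$path"        || echo "wildcard file_?.nc: no match"
wildcard_match '*/file_?.nc' "$path"      && echo "wildcard */file_?.nc: match"
regex_match 'file_[0-9].nc' "$path"       && echo "regex file_[0-9].nc: match"
regex_match '^file_[0-9].nc$' "$path"     || echo "regex ^file_[0-9].nc\$: no match"
```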

RegEx lookup on local file system:

  • If relative path is given: resolve from current working directory
  • If absolute path is given:
    • we look whether the first three folders exist (not via regex matching but as fixed expression; e.g. /first/second/third); if they don't exist, we stop evaluation of regex
    • reason: users should be prevented from using expressions such as [a-zA-Z_/]*/my_file.txt, which cause a lot of traffic on the file system / metadata server
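
The first-three-folders rule can be sketched as follows. The helper names are hypothetical, and the sketch only illustrates the check described above, not the tools' actual implementation.

```shell
# fixed_prefix PATTERN: print the first three folders of an absolute pattern
fixed_prefix() {
    printf '%s\n' "$1" | cut -d/ -f1-4
}

# may_evaluate PATTERN: relative patterns always pass (they are resolved from the
# current working directory); absolute ones pass only if their first three
# folders exist as a literal directory.
may_evaluate() {
    case $1 in
        /*) [ -d "$(fixed_prefix "$1")" ] ;;
        *)  return 0 ;;
    esac
}

fixed_prefix '/first/second/third/[a-z]*/my_file.txt'
may_evaluate '/[a-zA-Z_/]*/my_file.txt' || echo "pattern rejected"
```

Because the prefix is tested as a fixed string, a pattern that starts with a character class right after the root is rejected before any directory scan takes place.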

RegEx and Wildcard lookup on HPSS:

  • currently deactivated
  • implemented but not activated:
    • we look whether the first three folders exist (not via regex matching but as fixed expression; e.g. /first/second/third); if they don't exist, we stop evaluation of regex
    • reason: users should be prevented from using expressions such as [a-zA-Z_/]*/my_file.txt or *.txt, which cause a lot of traffic on the file system / metadata server

Scope in which files/expressions are searched/evaluated (listems and unpackems)

  • -l look locally
  • -i:
    • no prefix (or l:): look locally
    • t: prefix: look at HPSS
  • -a and -x search for tar archives in last column of the provided INDEX files (resp.: in the content of the column)
  • arguments without flag (files/expressions attached to the call): files/expressions are looked up in the files listed in the provided INDEX files; the available list has been filtered by -a and -x previously

Available prefixes for file names/expressions (listems and unpackems)

  • r:: evaluate the following expression as a regular expression
  • t:: expect file to be located on HPSS
  • l:: expect file to be located on local file system (optional; will be ignored; same as omitting t:)
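
As an illustration of how these prefixes combine, the following hypothetical function strips them in any order and reports where an entry would be looked up and how it would be interpreted. It is a sketch of the rules listed above, not code taken from the tools.

```shell
# classify_entry ENTRY  (hypothetical helper)
# Prints "<location> <kind> <path>", where location is local or hpss and
# kind is regex or wildcard (plain paths and shell wildcards).
classify_entry() {
    entry=$1
    location=local
    kind=wildcard
    # r:, t: and l: may appear in any order; strip them one at a time.
    while :; do
        case $entry in
            r:*) kind=regex;     entry=${entry#r:} ;;
            t:*) location=hpss;  entry=${entry#t:} ;;
            l:*) location=local; entry=${entry#l:} ;;
            *)   break ;;
        esac
    done
    printf '%s %s %s\n' "$location" "$kind" "$entry"
}

classify_entry 'r:t:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt'
classify_entry 't:r:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt'
classify_entry '/home/user/my_index_files/INDEX_file.txt'
```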

Rules to construct/modify the output path of extracted files (unpackems)

files stored in tar archive file:

data/mask.nc
data/forcing/sst.nc
data/forcing/emis.nc
data/output/wind.nc

extraction with -d /work/bm0146/k204221:

/work/bm0146/k204221/data/mask.nc
/work/bm0146/k204221/data/forcing/sst.nc
/work/bm0146/k204221/data/forcing/emis.nc
/work/bm0146/k204221/data/output/wind.nc

extraction with -d ./old_data:

./old_data/data/mask.nc
./old_data/data/forcing/sst.nc
./old_data/data/forcing/emis.nc
./old_data/data/output/wind.nc

extraction with --flatten:

./mask.nc
./sst.nc
./emis.nc
./wind.nc

extraction with -d old_data --flatten:

./old_data/mask.nc
./old_data/sst.nc
./old_data/emis.nc
./old_data/wind.nc

extraction with -D new_dir:

./new_dir/mask.nc
./new_dir/forcing/sst.nc
./new_dir/forcing/emis.nc
./new_dir/output/wind.nc

extraction with -D new_dir/second_dir:

./new_dir/second_dir/mask.nc
./new_dir/second_dir/sst.nc
./new_dir/second_dir/emis.nc
./new_dir/second_dir/wind.nc

extraction with -d old_data -D new_dir:

./old_data/new_dir/mask.nc
./old_data/new_dir/forcing/sst.nc
./old_data/new_dir/forcing/emis.nc
./old_data/new_dir/output/wind.nc
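
The examples above follow a simple pattern that can be written down as a small shell function. rewrite_path is a hypothetical helper that reproduces the listed results; it is not taken from unpackems. The combination of -D with --flatten is not covered by the examples and is not modeled here (the sketch ignores -D when flattening).

```shell
# rewrite_path ARCHIVED_PATH [DEST] [REPLACE] [FLATTEN]  (hypothetical helper)
#   DEST    value of -d (default: current directory)
#   REPLACE value of -D (empty if not given)
#   FLATTEN yes if --flatten was given
rewrite_path() {
    path=$1
    dest=${2:-.}
    replace=$3
    flatten=$4
    if [ "$flatten" = yes ]; then
        rel=${path##*/}                 # keep the file name only
    elif [ -n "$replace" ]; then
        rel=$path
        rest=$replace
        # Drop one leading folder per component of REPLACE, never the file name.
        while [ -n "$rest" ]; do
            case $rel in
                */*) rel=${rel#*/} ;;
            esac
            case $rest in
                */*) rest=${rest#*/} ;;
                *)   rest= ;;
            esac
        done
        rel=$replace/$rel
    else
        rel=$path
    fi
    # Relative destinations are shown with a leading ./ as in the listings above.
    case $dest in
        /*|./*|.) ;;
        *) dest=./$dest ;;
    esac
    printf '%s\n' "$dest/$rel"
}

rewrite_path data/forcing/sst.nc old_data new_dir
rewrite_path data/mask.nc "" new_dir/second_dir
```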

Detailed Examples

Please download this zip archive and extract it in your testing/training directory to run the example commands below (those aimed at the local file system).

Listems

Locally stored INDEX files

# read index files from file lists provided by `-l` but don't use default INDEX list files; local index files
listems -v -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'

# same as above but include default INDEX list file
listems -v -l -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
listems -v -l '' -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
listems -v -l ';examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'

# as the first call above, but with a different separator for the file lists
listems -v -s ',' -l 'examples/index_file_lists/commented_file_list.txt,examples/index_file_lists/empty_line_file_list.txt,examples/index_file_lists/plain_file_list.txt'

# print as text table
listems -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' data/ocean_day3d_t_pocp_emep_2012.nc

# print extended output
listems --long -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' 'data/ocean_day3d_t_pocp_emep_201?.nc'

# more files to look for
listems -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# wildcard in -l
listems -l "examples/index_file_lists/*.txt" data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# wildcard in -l and simple file in -x
listems -l "examples/index_file_lists/*.txt" -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# regex for -l
listems -v -l "r:examples/index_file_lists/[a-zA-Z-_]*.txt" -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# use several files in -a
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# use wildcard in -a
listems -l "examples/index_file_lists/*.txt" -a '*4.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# use regex in -a
listems -l "examples/index_file_lists/*.txt" -a 'r:iow_data_00[0-9].tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# use wildcard in -x
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# use wildcards in files to search
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar;abc.tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc '*.nc'

# `warnow_river...` was not found in the calls before because it is in a subfolder; we solve it here:
listems -l "examples/index_file_lists/*.txt" data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc'

# use regular expressions in files to search
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar;abc.tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc r:data/ocean_day3d_u_emep_20[0-9][0-9].nc

# some other call ...
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# print as json; with verbose flag
listems -v -t json -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# look via regex for files to retrieve and print as json
listems -t json -l 'examples/index_file_lists/*.txt' 'r:.*warnow_river_phoswam_v0[0-9]_[a-zA-Z0-9]+.nc'

# some verbose output
listems -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# write output into file `output_listems.txt`
listems -o output_listems.txt -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# write output into an html file; if `-o` is omitted, the output is printed to the command line
listems -t html -o output_listems.html -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'

# create download directory
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_*.nc' data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -t json -w tmp

# create download directory and remove/purge
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_*.nc' data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -t json -w tmp -p

INDEX files on HPSS

# call for searching in hpss:
listems -v -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc'

# extended call: search for a second pattern as well
listems -v -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc' '*warnow_river*'

Unpackems

Locally stored INDEX files, dry runs for testing

See the listems examples above for more ideas.

# an example call
unpackems -N -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc

# provide a name for the Makefile
unpackems -N -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -o test_makefile

# select some files to extract
unpackems -N -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 

# retrieve and unpack a specific tar ball that is not listed in our INDEX files
unpackems -N -A 't:/hpss/arch/bm0146/k204221/iow/some_tar_001.tar'

# retrieve (no unpack) a specific tar ball that is not listed in our INDEX files
unpackems -N -A 't:/hpss/arch/bm0146/k204221/iow/some_tar_001.tar' --retrieve-only

# retrieve and unpack everything that is listed in our INDEX files. Be very careful with this option: check via `listems` first how many files would be retrieved.
unpackems -f

# retrieve all tar files that contain files like `*day3d_area_t_phoswam_v04_15_1995.nc` and are listed in `t:/hpss/arch/bm0146/k204221/iow/INDEX.txt`
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc'

# will not work because no files are specified; needs `--force`/`-f`
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt

# retrieves all files listed in `t:/hpss/arch/bm0146/k204221/iow/INDEX.txt` and extracts them
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt --force
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt -f

Create some problematic files first, then call unpackems:

# create a file at a location to which an archived file would be extracted
mkdir -p abc/def/ghi/
touch abc/def/ghi/warnow_river_phoswam_v04_ist.nc

# should throw error
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi

# should throw warning
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi -K

# will overwrite files and be quiet about it
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi -O -q