Pack'em ESMs! Tools for (un-)archiving Earth system model data
Updated almost 4 years ago by Karl-Hermann Wieners
Information on this page refers to packems-1.2.1
Current Beta version: packems-1.2.2beta1
Older versions: packems-1.1.0 packems-1.0.2 packems-1.0.1
New in this release¶
packems¶
- allows files exceeding target size to be handled as single file pack
- better locking behaviour when processing large pack numbers
- improved restart behaviour (keep tar files, tmp files on tape, --lock-break)
- additional option checks (no data, -d directory within input directory)
- adapt digits of tar file names to total pack count
unpackems¶
- new option to allow customization of the target directory (-R)
- handle usage of absolute and relative paths in tar packs
- allow unpacking of all MIME defined archive files
listems¶
- handle usage of absolute and relative paths in tar packs
Overview¶
Pack'em ESMs! (packems) contains a set of tools for packing, archiving, listing, and retrieval of earth system model and other data.
packems takes directories to be packed into tar files, and optionally pushes these directly to the tape archiving system. Features include parallel operation, batch job support, and error recovery. It keeps track of the user's archiving operations so that this information can be re-used for later retrieval of data. Currently only HPSS's pftp interface is supported for archiving. packems_wrapper is provided for use with sbatch.
listems takes the information recorded by packems and allows you to survey and examine the archived data without having to unarchive or unpack large data files. It features UNIX and regular expression style search patterns for file selection and also allows you to include user-defined storage index information.
unpackems uses the listems interface to unarchive and unpack the selected data. The retrieval process allows for transforming the original directory structure as needed by the user, e.g. using a different base directory, renaming parts of the path name, or flattening out the tar file contents into a single directory.
tapeinit allows you to easily renew and create the Kerberos tickets needed for the tape archive.
Preparation¶
Tasks to do once prior to the first usage¶
Passwordless access to the tape archive (HPSS)¶
Before using the packems tools for the first time, you must register for DKRZ's Kerberos service if you haven't done so before.
Tasks to do prior to each usage¶
Make sure the module is loaded¶
The tools are now available as a standard module. To load them, enter
module add packems
Check Kerberos ticket¶
Kerberos authentication works with so-called tickets, which currently expire after one week. To check on the tickets, simply enter
tapeinit
This will renew the tickets if possible, and otherwise ask for your Kerberos password to create a new ticket.
Quick Tour (examples taken from MPI-ESM)¶
Packems¶
Basic use¶
- I recommend starting the job in the experiment directory on work and specifying the directories or files to be archived relative to it. That way, the archives will also contain relative path names and can easily be unpacked to other locations later on.
cd /work/xy1234/m123456/mpiesm/experiments/abc1234
- -j JOBS specifies how many jobs are started in parallel. This is useful in combination with SLURM, e.g. as
sbatch -A xy1234 -p prepost,compute,compute2 --exclusive --mail-type=FAIL packems_wrapper -j 12 ...
- Because our example directory ends in abc1234, the archives are named abc1234_001.tar, abc1234_002.tar, and so on. With -o OUTPUT you may specify a different base name for them.
... -o abc1234_outdata_echam ...
- As our example directory is under /work, the .tar files will be archived under the same directory structure on tape, i.e. xy1234/m123456/mpiesm/experiments/abc1234. Use -S to change the directory where the files are stored.
... -S xy1234/common_data/abc1234 ...
- -p deletes the packed archives automatically after archiving. Otherwise you can manually delete the files in packing afterwards.
... -p ...
A similar option, -P, which also deletes the original files, should only be used with extreme caution! Restarting after abnormal program termination requires proper knowledge of the behavior of the -M and -A options to avoid data loss. We usually recommend removing the original files in a second step.
- Finally, define the directory which you would like to archive.
... outdata/echam6
- You can add more than one input directory, but you can set the output only once:
... -o abc1234_outdata outdata/echam6 outdata/jsbach outdata/mpiom outdata/hamocc
- For restart files, the temporal relationship is very important. Therefore, sort them by timestamp:
... -O by_time restart
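The option fragments above are meant to be combined into a single call. The following sketch only prints such a call instead of running it, so it can be tried without access to the tape archive. The function name build_packems_cmd is hypothetical, the values are the placeholder names used in this tour, and the chosen combination of options is an illustration, not a recommended default.

```shell
#!/bin/sh
# Sketch: assemble a packems call from the options described above and print it
# (a dry run; nothing is packed or archived).
# build_packems_cmd is a hypothetical helper, not part of packems.
build_packems_cmd() {
    njobs=$1; base=$2; tape_dir=$3; shift 3
    # -j parallel jobs, -o base name of the tar files, -S archive directory
    # on tape, -p delete the local tar files after archiving; inputs follow
    echo "packems -j $njobs -o $base -S $tape_dir -p $*"
}

build_packems_cmd 12 abc1234_outdata xy1234/common_data/abc1234 \
    outdata/echam6 outdata/jsbach
```

Keeping the input directories relative, as recommended at the start of this tour, keeps the path names inside the archives relative as well.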
Restart after premature termination¶
When packems was stopped due to temporary conditions, e.g. a batch job time-out, it is usually safe to just restart with the same options. packems will pick up any prepared .tar packs and remove partially transferred packs from the tape archive before continuing.
Lock time-out¶
If more than one packems process runs in parallel using the same setting for -S/--archive-subdir, packems uses a lock directory on tape to synchronize index file access. If one process fails to properly remove this lock during termination, all subsequent processes using the same archive directory will time out. Use the --lock-break option to recover from this situation.
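How packems implements its lock on tape is not described on this page. As a local analogue only, the sketch below uses mkdir, which either creates a directory or fails, to show why a lock directory that is never removed makes every later process time out. The function names and the retry scheme are assumptions made for illustration.

```shell
#!/bin/sh
# Local analogue of a lock directory (not the packems implementation):
# mkdir either creates the directory or fails, so only one process can
# hold the lock at a time.
acquire_lock() {
    lock_dir=$1; tries=$2
    while [ "$tries" -gt 0 ]; do
        if mkdir "$lock_dir" 2>/dev/null; then
            return 0                      # lock acquired
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then sleep 1; fi
    done
    return 1                              # timed out: lock is held or stale
}

release_lock() {
    rmdir "$1"
}

work=$(mktemp -d)
acquire_lock "$work/lock" 1 && echo "first process: got the lock"
acquire_lock "$work/lock" 1 || echo "second process: timed out"
release_lock "$work/lock"                 # the step a crashed process skips
acquire_lock "$work/lock" 1 && echo "after cleanup: got the lock"
rm -rf "$work"
```

If the first process terminates without calling release_lock, the directory stays behind and all later attempts fail until it is removed by hand, which is the situation --lock-break is described as recovering from.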
Caution when using -P/--purge¶
When using the -P/--purge option to remove original files as soon as the corresponding pack is ready, there is no easy way to restart. You may try a cleanup run by adding the -M/--make-only option to your command. This will try to continue the process without re-processing the input files. There will be errors due to the input files that were removed in the first run, but packems will continue to process all files and packs that still exist. To avoid data loss, check carefully whether there are any other errors during the cleanup.
Other useful options¶
- -d/--destination determines the directory in which the .tar files are packed (by default packems/OUTPUT under your /scratch directory). For large pack sizes or many parallel jobs, the 15 TB quota on /scratch might not be sufficient.
- -L/--dereference will pack the files that are referenced by symbolic links. This should be used if the referenced files are not stored in the same or another packems call. By default, packems will just store the links themselves but not the file data to avoid duplicate copies.
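The difference between storing a link and storing the referenced data can be reproduced with plain tar, whose -h option also dereferences symbolic links. How packems invokes tar internally is not shown on this page, so the sketch below is only an illustration of the concept; the file names and the helper make_demo_archives are made up.

```shell
#!/bin/sh
# Plain tar illustration (not a packems call): store a symbolic link itself
# versus the file data it points to.
make_demo_archives() {
    (
        cd "$1" || exit 1
        mkdir input
        echo "payload" > shared.nc
        ln -s ../shared.nc input/link.nc  # link pointing outside of input/
        tar -cf links.tar input           # default: the link itself is stored
        tar -chf data.tar input           # -h: the referenced file data is stored
    )
}

demo=$(mktemp -d)
make_demo_archives "$demo"
tar -tvf "$demo/links.tar"                # link.nc is listed as a symbolic link
tar -tvf "$demo/data.tar"                 # link.nc is listed as a regular file
rm -rf "$demo"
```

Unpacking links.tar somewhere else yields a dangling link unless shared.nc is archived as well, which is the case the description of -L refers to.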
To get an overview of all options, run
packems --help
Listems¶
Input and filtering¶
- -i INDEX_FILE specifies an INDEX file located on the local file system (see below for files on HPSS); only files provided by -i are ingested if -l is not specified; please note the subsection "Scope in which files/expressions are searched/evaluated" below.
... -i /home/user/my_index_files/INDEX_file.txt ...
- -i t:INDEX_FILE specifies an INDEX file located on HPSS by prepending t: to the path; please note the subsection "Scope in which files/expressions are searched/evaluated" below.
... -i t:/hpss/arch/bm0146/k204221/INDEX_file.txt ...
- -i INDEX_FILE_LIST specifies a list of INDEX files (separated by ;). Note: enclose the list in ' because the shell otherwise treats ; as a command separator.
... -i 't:/hpss/arch/bm0146/k204221/INDEX_file_1.txt;t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt' ...
- -i INDEX_FILE_WILDCARD specifies INDEX files via wildcard. Note: enclose the expression in ' to prevent automatic evaluation by the shell.
... -i 't:/hpss/arch/bm0146/k204221/INDEX_file_*.txt' ...
- -i r:INDEX_FILE_REGULAR_EXPRESSION specifies INDEX files via regular expression; prepend r:; the order of r: and t: has no effect. Note: enclose the expression in ' to prevent automatic evaluation by the shell.
... -i 'r:t:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt' ...
... -i 't:r:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt' ...
- -i INDEX_FILE_MIXED_EXPRESSION specifies INDEX files via a mixture of wildcard and regular expression. Note: enclose the expression in ' to prevent automatic evaluation by the shell.
... -i 't:/hpss/arch/bm0146/k204221/other/INDEX_file.txt;t:/hpss/arch/bm0146/k204221/icon/INDEX_file_*.txt;r:t:/hpss/arch/bm0146/k204221/data_c/INDEX_file_[0-9].txt' ...
- -s LIST_SEPARATOR -i INDEX_FILE_LIST specifies a list of INDEX files and manually sets the list separator.
... -s ',' -i t:/hpss/arch/bm0146/k204221/INDEX_file_1.txt,t:/hpss/arch/bm0146/k204221/INDEX_file_2.txt ...
- -l -i INDEX_FILE specifies an INDEX file located on the local file system in addition to the default INDEX files; please note the subsection "Scope in which files/expressions are searched/evaluated" below.
... -l -i /home/user/my_index_files/INDEX_file.txt ...
... -i /home/user/my_index_files/INDEX_file.txt -l ...
... -l '' -i /home/user/my_index_files/INDEX_file.txt ...
... -i /home/user/my_index_files/INDEX_file.txt -l '' ...
- -l INDEX_LIST_FILE specifies a file that contains the path(s) to INDEX file(s); only INDEX files listed in INDEX_LIST_FILE are ingested if no additional -l is set (see next example); the same wildcards and regular expressions as for -i can be used in the INDEX-file-list file and for INDEX_LIST_FILE.
... -l my_dummy_index_list.txt ...
- -l -l INDEX_LIST_FILE specifies a file that contains the path(s) to INDEX file(s) in addition to the default INDEX files; the same wildcards and regular expressions as for -i can be used in the INDEX-file-list file and for INDEX_LIST_FILE; as an alternative to a bare -l, the user can provide an empty string to -l.
... -l -l my_dummy_index_list.txt ...
... -l my_dummy_index_list.txt -l ...
... -l '' -l my_dummy_index_list.txt ...
... -l my_dummy_index_list.txt -l '' ...
... -l ';my_dummy_index_list.txt' ...
... -l 'my_dummy_index_list.txt;' ...
- -a TAR_ARCHIVE selects one or more tar archives exclusively; only tar archives that are listed in the provided INDEX file(s) are included; no new tar archives can be added; same usage of wildcards and regular expressions as for -i.
... -a search_only_this_archive.tar ...
- -x EXCLUDE_TAR_ARCHIVE excludes one or more tar archives that are listed in the provided INDEX files; no new tar archives can be added; same usage of wildcards and regular expressions as for -i.
... -x contains_bad_data.tar ...
- Note: unpackems also has a -A/--archive-file option to retrieve and extract files that are not indexed.
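The list form of -i, together with a custom separator set via -s, amounts to splitting one string into individual INDEX file expressions. The sketch below shows that splitting step in isolation. The helper split_index_list is hypothetical, and dropping empty entries is a simplification chosen here, not documented listems behaviour.

```shell
#!/bin/sh
# Sketch: split an INDEX file list at a separator, one entry per line.
# split_index_list is a hypothetical helper, not part of listems.
split_index_list() {
    list=$1; sep=${2:-";"}              # ; is the documented default separator
    old_ifs=$IFS
    IFS=$sep
    set -f                              # keep wildcards such as * unexpanded
    for entry in $list; do
        if [ -n "$entry" ]; then printf '%s\n' "$entry"; fi
    done
    set +f
    IFS=$old_ifs
}

# The list is quoted so that the shell does not treat ; as a command separator.
split_index_list 't:/hpss/arch/bm0146/k204221/INDEX_file_1.txt;t:/hpss/arch/bm0146/k204221/INDEX_file_*.txt'
split_index_list 'INDEX_file_1.txt,INDEX_file_2.txt' ','
```

Each resulting entry can then carry its own prefixes and wildcards, as in the mixed-expression example above.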
Output formatting¶
- --long prints detailed/long output (more columns).
... --long ...
- -t OUTPUT_FORMAT chooses an output format: txt/text (default), csv, json or html.
... -t json ...
- -o OUTPUT_FILE writes the default output into a file.
... -o output.txt ...
- -t OUTPUT_FORMAT -o OUTPUT_FILE writes the output into a file of a specific format; file extensions are not automatically recognized.
... -t json -o output.json ...
Storing local copies of INDEX files¶
By default, the INDEX files are copied/retrieved to a local temporary directory and are deleted after they have been read by listems. A prefix t: at the beginning of the INDEX file path indicates that the files should be retrieved from HPSS. Omitting the t: prefix indicates that the files should be copied from the local file system.
- -N performs a dry run; it stops after a list of INDEX files is created and doesn't download INDEX files from HPSS; meant to check whether the input from -i and -l was properly processed.
... -N ...
- -w TMP_DIRECTORY sets a user-provided working directory to retrieve the INDEX files into; the directory will be created if it does not exist; the INDEX files will be kept if purge (-p) is not set.
... -w store/index/files/here/ ...
- -p -w TMP_DIRECTORY removes INDEX files from the user-provided directory after they were imported into listems.
... -p -w store/index/files/here/ ...
- --use-old-index-files-n-seconds TIME takes existing INDEX files when they are available in TMP_DIRECTORY and younger than TIME (seconds); default is 3600 seconds = 1 hour.
... --use-old-index-files-n-seconds 7200 -w store/index/files/here/ ...
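The age rule behind --use-old-index-files-n-seconds can be pictured as a comparison between a file's modification time and the current time. The sketch below is an assumption about how such a check could look, not listems code; the helpers file_mtime and is_fresh are made up.

```shell
#!/bin/sh
# Sketch: decide whether a local INDEX copy is young enough to be re-used.
# file_mtime and is_fresh are hypothetical helpers, not part of listems.
file_mtime() {
    # modification time in seconds since the epoch: GNU stat, then BSD stat
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

is_fresh() {
    file=$1; max_age=${2:-3600}         # 3600 s is the documented default
    [ -f "$file" ] || return 1          # no local copy, nothing to re-use
    age=$(( $(date +%s) - $(file_mtime "$file") ))
    [ "$age" -lt "$max_age" ]
}

tmp=$(mktemp -d)
touch "$tmp/INDEX.txt"
is_fresh "$tmp/INDEX.txt" 7200 && echo "re-use the local copy"
touch -t 200001010000 "$tmp/INDEX.txt"  # pretend the copy is from the year 2000
is_fresh "$tmp/INDEX.txt" 7200 || echo "retrieve the file again"
rm -rf "$tmp"
```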
Other options/arguments¶
- -v activates verbose output; the program prints information on ingested INDEX files and processing of the index; might help to diagnose bad input.
... -v ...
Unpackems¶
Overview of the Options¶
Please see the section on packems (above) for options -b, -F, -n, -N and -j.
Please see the section on listems (above) for options -l, -i, -a, -x, -s, -v, and --use-old-index-files-n-seconds.
Information on these flags is provided here: -A, -d, -D, --flatten, -f, -o, -p, -K, -O, -q, -w, --no-untar
Note: The option -A does not allow the usage of -i, -l, -a, -x and files. Either files listed in INDEX files will be retrieved (-i, -l, -a, -x and files) or files not listed in INDEX files (-A). --force is ignored when -A is set because all files specified by -A will be automatically retrieved.
Setting working and destination directories; modifying extraction paths¶
- -d DESTINATION_DIR: destination directory into which the files should be extracted from the tar files
... -d /work/bm0146/k204221/model_results ...
- -D REPLACE_DESTINATION_DIR: replace the first n folders of the archived files by the folder provided by -D; n is the number of folders provided (4 in the example below); see "Rules to construct/modify the output path of extracted files" below for details
... -D new/folder/for/data ...
- -d DESTINATION_DIR -D REPLACE_DESTINATION_DIR: extract files (from tar archives) into /work/bm0146/k204221/model_results/new_folder and drop the first directory of each archived file
... -d /work/bm0146/k204221/model_results -D new_folder ...
- -d DESTINATION_DIR -D REPLACE_DESTINATION_DIR: extract files (from tar archives) into /work/bm0146/k204221/model_results/new_folder/subfolder and drop the first two directories of each archived file
... -d /work/bm0146/k204221/model_results -D new_folder/subfolder ...
- --flatten: flatten/remove the directory tree of files stored in tar archives
... --flatten ...
- -w WORK_DIR: specify a working directory into which the INDEX and tar files are retrieved
... -w /scratch/k/k204221 ...
Selecting files for retrieval¶
- -A FILE_PATH: retrieve the file(s) specified by -A. If these are tar balls, they will be un-tarred unless --retrieve-only is provided. Non-tar files will be retrieved into the working/tmp directory and then moved into the destination directory. INDEX files and tar balls listed within them will not be considered.
... -A 't:/hpss/arch/my_project/my_user/data/example_01.nc' ...
Un-tarring, keeping and overwriting files¶
- --retrieve-only: only retrieve and do not un-tar the files; the retrieved files are left in the working/tmp directory
... --retrieve-only ...
- no -K or -O: unpackems raises an error if a file already exists or if a file would be extracted twice (or more) to the same location; this is not fail-safe, e.g. during parallel extractions
- -K: keep existing files during extraction; warn if keeping is expected
... -K ...
- -O: overwrite existing files during extraction; warn if overwriting is expected
... -O ...
- -q: suppress warnings thrown by -K and -O
... -q ...
- --no-untar: just retrieve files from HPSS but do not un-tar them
... --no-untar ...
- If non-tar files are listed in INDEX files (and selected for retrieval), they are retrieved but no attempt is made to unpack them.
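The three conflict modes described in this list (default, -K, -O) can be summarised as a small decision function. The sketch below restates the documented rules for a single target file and is not taken from unpackems; the function name decide_action and the mode letters passed to it are assumptions.

```shell
#!/bin/sh
# Sketch of the documented conflict handling for a single target file.
# $1 = target path, $2 = mode: "" (default), K (keep) or O (overwrite)
# decide_action is a hypothetical helper, not part of unpackems.
decide_action() {
    target=$1; mode=$2
    if [ ! -e "$target" ]; then
        echo extract                    # no conflict, just extract
    elif [ "$mode" = K ]; then
        echo keep                       # -K: leave the existing file alone
    elif [ "$mode" = O ]; then
        echo overwrite                  # -O: replace the existing file
    else
        echo error                      # default: refuse to touch it
    fi
}

tmp=$(mktemp -d)
touch "$tmp/exists.nc"
decide_action "$tmp/new.nc" ""          # prints: extract
decide_action "$tmp/exists.nc" ""       # prints: error
decide_action "$tmp/exists.nc" K        # prints: keep
decide_action "$tmp/exists.nc" O        # prints: overwrite
rm -rf "$tmp"
```

A check of this kind looks at the file system at one moment only, which illustrates why the default is described as not fail-safe during parallel extractions.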
Cleanup after extraction¶
- -p: purge INDEX files from the working directory provided by -w
... -p ...
NOTE: tar archives are not automatically removed after retrieval and successful extraction when -p is not set.
Other options/arguments¶
- -f/--force: force extraction of all available files even if it is a huge amount of data; -f is not needed when (a) a file for retrieval is specified via -A or (b) a list of files for extraction is provided to the call of unpackems. If you are not sure whether you need it or not: don't use -f, and unpackems will tell you when it needs -f to be set. If -f did not exist, all files listed in the available INDEX files would be retrieved when unpackems was called without any arguments. You don't want that in most situations.
... -f ...
- -o NAME_MAKEFILE: set the name of the Makefile to create
... -o example_makefile_name ...
Detailed explanations on some topics¶
Regular Expressions and shell wildcards (listems and unpackems)¶
General Notes¶
- enclose the expression in '; e.g. 'data/results/*.nc', 'file_?.nc'
- regular expressions are indicated by a leading r:; e.g. 'r:file_[0-9].nc'
- wildcards have to match the whole path; e.g. 'file_?.nc' will not match data/results/file_1.nc, but '*/file_?.nc' will match data/results/file_1.nc
- for -a, -x and names/expressions without a preceding flag: if a regular expression should only match the whole path, we need to add line beginning (^) and line end ($) characters; e.g. '^file_[0-9].nc$' will not match data/results/file_1.nc
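The difference between whole-path wildcards and unanchored regular expressions can be tried out with standard shell tools. The sketch below uses case patterns and grep -E as stand-ins; whether listems uses the same matching engines is an assumption, but the results agree with the examples given in the notes above. The helper names are made up.

```shell
#!/bin/sh
# Sketch: whole-path wildcard matching versus unanchored regular expressions,
# using shell case patterns and grep -E as stand-ins for the listems matcher.
wildcard_match() {                      # $1 = wildcard pattern, $2 = path
    case $2 in
        $1) return 0 ;;                 # a case pattern must match the whole string
        *)  return 1 ;;
    esac
}

regex_match() {                         # $1 = regular expression, $2 = path
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

path=data/results/file_1.nc
wildcard_match 'file_?.nc' "$path"        || echo "wildcard without folder part: no match"
wildcard_match '*/file_?.nc' "$path"      && echo "wildcard with */ in front: match"
regex_match 'file_[0-9].nc' "$path"       && echo "unanchored regex: match"
regex_match '^file_[0-9].nc$' "$path"     || echo "anchored regex: no match"
```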
RegEx lookup on local file system:¶
- If a relative path is given: resolve from the current working directory
- If an absolute path is given:
  - we check whether the first three folders exist (not via regex matching but as a fixed expression; e.g. /first/second/third); if they don't exist, we stop evaluation of the regex
  - reason: users should be prevented from using expressions such as [a-zA-Z_/]*/my_file.txt, which cause a lot of traffic on the file system / metadata server
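The guard for absolute paths can be sketched as two steps: cut the first three folders off the expression as a literal prefix, then test whether that prefix exists before any regex evaluation starts. The helpers fixed_prefix and may_evaluate below are illustrative names, and the code is an assumption about the approach, not the actual listems implementation.

```shell
#!/bin/sh
# Sketch: take the first three folders of an absolute path literally and only
# evaluate the rest as a regular expression if that fixed prefix exists.
fixed_prefix() {
    # /first/second/third/[a-z]*/x.txt -> /first/second/third
    printf '%s\n' "$1" | cut -d/ -f1-4
}

may_evaluate() {
    [ -d "$(fixed_prefix "$1")" ]
}

fixed_prefix '/first/second/third/[a-zA-Z_/]*/my_file.txt'
may_evaluate '/first/second/third/[a-zA-Z_/]*/my_file.txt' \
    || echo "fixed prefix does not exist: regex is not evaluated"
```

An expression such as /[a-zA-Z_/]*/my_file.txt has no usable literal prefix, so it is rejected before the file system is searched.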
RegEx and Wildcard lookup on HPSS:¶
- currently deactivated
- implemented but not activated:
  - we check whether the first three folders exist (not via regex matching but as a fixed expression; e.g. /first/second/third); if they don't exist, we stop evaluation of the regex
  - reason: users should be prevented from using expressions such as [a-zA-Z_/]*/my_file.txt or *.txt, which cause a lot of traffic on the file system / metadata server
Scope in which files/expressions are searched/evaluated (listems and unpackems)¶
- -l: look locally
- -i:
  - no prefix (or l:): look locally
  - t: prefix: look at HPSS
- -a and -x: search for tar archives in the last column of the provided INDEX files (or rather, in the content of that column)
- arguments without flag (files/expressions attached to the call): files/expressions are looked up in the files listed in the provided INDEX files; the available list has previously been filtered by -a and -x
Available prefixes for file names/expressions (listems and unpackems)¶
- r: evaluate the following expression as a regular expression
- t: expect the file to be located on HPSS
- l: expect the file to be located on the local file system (optional; will be ignored; same as omitting t:)
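Because the order of the prefixes has no effect, parsing them comes down to stripping known prefixes from the front until none is left. The sketch below shows one way to do that; the function parse_prefixes and its output format are made up for illustration and do not describe how listems or unpackems work internally.

```shell
#!/bin/sh
# Sketch: strip the r:, t: and l: prefixes from an expression in any order.
# parse_prefixes is a hypothetical helper; its output format is made up.
parse_prefixes() {
    rest=$1; is_regex=no; location=local
    while :; do
        case $rest in
            r:*) is_regex=yes;  rest=${rest#r:} ;;
            t:*) location=hpss; rest=${rest#t:} ;;
            l:*) rest=${rest#l:} ;;     # optional, same as giving no prefix
            *)   break ;;
        esac
    done
    echo "regex=$is_regex location=$location path=$rest"
}

parse_prefixes 'r:t:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt'
parse_prefixes 't:r:/hpss/arch/bm0146/k204221/INDEX_file_[0-9].txt'
parse_prefixes '/home/user/my_index_files/INDEX_file.txt'
```

The first two calls print the same line, which mirrors the statement that the order of r: and t: has no effect.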
Rules to construct/modify the output path of extracted files (unpackems)¶
files stored in tar archive file:
data/mask.nc
data/forcing/sst.nc
data/forcing/emis.nc
data/output/wind.nc
extraction with -d /work/bm0146/k204221:
/work/bm0146/k204221/data/mask.nc
/work/bm0146/k204221/data/forcing/sst.nc
/work/bm0146/k204221/data/forcing/emis.nc
/work/bm0146/k204221/data/output/wind.nc
extraction with -d ./old_data:
./old_data/data/mask.nc
./old_data/data/forcing/sst.nc
./old_data/data/forcing/emis.nc
./old_data/data/output/wind.nc
extraction with --flatten:
./mask.nc
./sst.nc
./emis.nc
./wind.nc
extraction with -d old_data --flatten:
./old_data/mask.nc
./old_data/sst.nc
./old_data/emis.nc
./old_data/wind.nc
extraction with -D new_dir:
./new_dir/mask.nc
./new_dir/forcing/sst.nc
./new_dir/forcing/emis.nc
./new_dir/output/wind.nc
extraction with -D new_dir/second_dir:
./new_dir/second_dir/mask.nc
./new_dir/second_dir/sst.nc
./new_dir/second_dir/emis.nc
./new_dir/second_dir/wind.nc
extraction with -d old_data -D new_dir:
./old_data/new_dir/mask.nc
./old_data/new_dir/forcing/sst.nc
./old_data/new_dir/forcing/emis.nc
./old_data/new_dir/output/wind.nc
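The path listings in this section follow a small set of rules, which the sketch below reimplements as a shell function so that individual cases can be checked. It is a reconstruction from the listed examples, not unpackems code. The function name target_path and its argument order are assumptions, and the combination of --flatten with -D is not covered by the examples, so the sketch simply lets --flatten win.

```shell
#!/bin/sh
# Sketch of the path rules listed in this section (not unpackems' own code).
# $1 = archived path, $2 = -d value (default: .), $3 = -D value (may be empty),
# $4 = "flatten" to mimic --flatten
target_path() {
    path=$1; dest=${2:-.}; repl=$3; flatten=$4
    case $dest in
        /*|./*|.) ;;                      # absolute or already marked relative
        *) dest=./$dest ;;                # show relative targets with leading ./
    esac
    if [ "$flatten" = flatten ]; then
        path=${path##*/}                  # keep the file name only
    elif [ -n "$repl" ]; then
        # drop one leading folder per folder given to -D, never the file name
        n=$(printf '%s\n' "$repl" | awk -F/ '{print NF}')
        while [ "$n" -gt 0 ]; do
            case $path in
                */*) path=${path#*/} ;;
                *)   break ;;
            esac
            n=$((n - 1))
        done
        path=$repl/$path
    fi
    printf '%s\n' "$dest/$path"
}

target_path data/forcing/sst.nc /work/bm0146/k204221        # -d only
target_path data/forcing/sst.nc "" "" flatten               # --flatten
target_path data/forcing/sst.nc "" new_dir                  # -D new_dir
target_path data/mask.nc "" new_dir/second_dir              # -D new_dir/second_dir
target_path data/output/wind.nc old_data new_dir            # -d old_data -D new_dir
```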
Detailed Examples¶
Please download this zip archive and extract it in your testing/training directory to run the example commands below (those aimed at the local file system).
Listems¶
Locally stored INDEX files¶
# read index files from file lists provided by `-l` but don't use default INDEX list files; local index files
listems -v -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
# same as above but include default INDEX list file
listems -v -l -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
listems -v -l '' -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
listems -v -l ';examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt'
# as above but with a different separator for the file lists
listems -v -s ',' -l 'examples/index_file_lists/commented_file_list.txt,examples/index_file_lists/empty_line_file_list.txt,examples/index_file_lists/plain_file_list.txt'
# print as text table
listems -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' data/ocean_day3d_t_pocp_emep_2012.nc
# print extended output
listems --long -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' 'data/ocean_day3d_t_pocp_emep_201?.nc'
# more files to look for
listems -l 'examples/index_file_lists/commented_file_list.txt;examples/index_file_lists/empty_line_file_list.txt;examples/index_file_lists/plain_file_list.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# wildcard in -l
listems -l "examples/index_file_lists/*.txt" data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# wildcard in -l and simple file in -x
listems -l "examples/index_file_lists/*.txt" -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# regex for -l
listems -v -l "r:examples/index_file_lists/[a-zA-Z-_]*.txt" -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# use several files in -a
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# use wildcard in -a
listems -l "examples/index_file_lists/*.txt" -a '*4.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# use regex in -a
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# use wildcard in -x
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# use wildcards in files to search
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar;abc.tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc '*.nc'
# `warnow_river...` was not found in the calls before because it is in a subfolder; we solve it here:
listems -l "examples/index_file_lists/*.txt" data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc'
# use regular expressions in files to search
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_00[0-9].tar;abc.tar' -x '*6.tar' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc 'r:data/ocean_day3d_u_emep_20[0-9][0-9].nc'
# some other call ...
listems -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# print as json; with verbose flag
listems -v -t json -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'
# look via regex for files to retrieve and print as json
listems -t json -l 'examples/index_file_lists/*.txt' 'r:.*warnow_river_phoswam_v0[0-9]_[a-zA-Z0-9]+.nc'
# some verbose output
listems -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'
# write output into file `output_listems.txt`
listems -o output_listems.txt -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'
# write output into an html file; if `-o` is omitted, the output is printed to the command line
listems -t html -o output_listems.html -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*warnow_river_phoswam_v04_ist.nc'
# create download directory
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_*.nc' data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -t json -w tmp
# create download directory and remove/purge
listems -v -l 'examples/index_file_lists/*.txt' 'data/ocean_day3d_t_pocp_emep_*.nc' data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -t json -w tmp -p
INDEX files on HPSS¶
# call for searching in hpss:
listems -v -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc'
# an extended call with a second search pattern
listems -v -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc' '*warnow_river*'
Unpackems¶
Locally stored INDEX files, dry runs for testing¶
see examples for listems for more ideas
# an example call
unpackems -N -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# provide a name for the Makefile
unpackems -N -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc -o test_makefile
# select some files to extract
unpackems -N -l "examples/index_file_lists/*.txt" -a 'iow_data_001.tar;iow_data_004.tar;iow_data4_002.tar;iow_data_006.tar' -x iow_data4_002.tar data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc warnow_river_phoswam_v04_ist.nc
# retrieve and unpack a specific tar ball that is not listed in our INDEX files
unpackems -N -A 't:/hpss/arch/bm0146/k204221/iow/some_tar_001.tar'
# retrieve (no unpack) a specific tar ball that is not listed in our INDEX files
unpackems -N -A 't:/hpss/arch/bm0146/k204221/iow/some_tar_001.tar' --retrieve-only
# retrieve and unpack everything that is listed in our INDEX files; be very careful with this option: better check via `listems` first how many files would be retrieved
unpackems -f
# retrieve all tar files that contain files like `*day3d_area_t_phoswam_v04_15_1995.nc` and are listed in `t:/hpss/arch/bm0146/k204221/iow/INDEX.txt`
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt '*day3d_area_t_phoswam_v04_15_1995.nc'
# will not work because no `files` are specified; needs `--force`/`-f`
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt
# retrieves all files listed in `t:/hpss/arch/bm0146/k204221/iow/INDEX.txt` and extracts them
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt --force
unpackems -i t:/hpss/arch/bm0146/k204221/iow/INDEX.txt -f
Create some problematic files first, and then call unpackems:
# create some files that should be extracted
mkdir -p abc/def/ghi/
touch abc/def/ghi/warnow_river_phoswam_v04_ist.nc
# should throw error
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi
# should throw warning
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi -K
# will overwrite files and be quiet about it
unpackems -N -v -l 'examples/index_file_lists/*.txt' data/ocean_day3d_t_pocp_emep_2012.nc data/ocean_day3d_t_pocp_emep_2004.nc '*/warnow_river_phoswam_v04_ist.nc' -D abc/def/ghi -O -q