compresm - tools for COMPRessing Earth System Model data
Updated almost 2 years ago by Karl-Hermann Wieners
[2023-03-17]
Unfortunately, due to a bug in version 1.3.1, GRIB files were no longer recognized and were therefore compressed with gzip. A new official version, 1.3.2, has been installed and fixes this problem.
compresm - compression for NetCDF, GRIB, and arbitrary tar and text files
Usage
compresm is installed via DKRZ's module system. To use it, enter
module load compresm
on levante.
Information on test versions
For a moderate number of files to be compressed, compresm may now simply be called as
compresm [-n] [-j njobs] [file_or_directory ...] [find_option ...]
Without any options, compresm determines the MIME type of all files within or below the current directory. Text files are compressed with xz, which gives the best result for the usual model log files. GRIB and NetCDF files are processed with cdo -z szip clone or nccopy -d 1 -s, respectively. The contents of *.tar archives are unpacked, the individual files are handled as described, and the archive is re-packed. Empty or already compressed files are ignored. Files of any other type are compressed with gzip.
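The dispatch described above can be pictured roughly as follows. This is only a sketch and not compresm's actual implementation: the function name choose_compressor and the category labels (text, grib, netcdf, empty, compressed) are made up for illustration, and neither the MIME type lookup nor the unpacking of *.tar archives is shown.

```shell
# Hypothetical helper: print the compression command for a file category.
# The category labels are illustrative; the commands are the ones named above.
choose_compressor() {
    case "$1" in
        text)             echo "xz" ;;
        grib)             echo "cdo -z szip clone" ;;
        netcdf)           echo "nccopy -d 1 -s" ;;
        empty|compressed) : ;;            # ignored, nothing to do
        *)                echo "gzip" ;;  # fallback for any other type
    esac
}

choose_compressor netcdf
choose_compressor something_else
```

The fallback branch mirrors the rule that any file type not handled explicitly ends up with gzip.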
compresm uses make internally, so if the program stops for some reason, you may simply restart it, and it will pick up work from where it stopped.
Customization of file selection
To restrict the search, you may use the same syntax that is understood by find (see man find). For instance, you may search only certain files and/or subdirectories,
compresm some_directory another_file.grb
or omit the *.tar files
compresm \! -name '*.tar'
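Because the selection syntax is that of find, one way to check an expression before handing it to compresm is to run it through find itself. The sketch below does this in a scratch directory. The file names are invented, and it assumes that compresm applies the expression the same way find does, which is worth confirming with the -n option.

```shell
# Build a scratch directory with made-up file names
workdir=$(mktemp -d)
touch "$workdir/run.log" "$workdir/output.grb" "$workdir/restart.tar"

# List all regular files except *.tar, using the same expression as above
selected=$(cd "$workdir" && find . -type f \! -name '*.tar' | LC_ALL=C sort)
echo "$selected"

# Clean up the scratch directory
rm -rf "$workdir"
```

Only run.log and output.grb are listed; restart.tar is filtered out by the negated -name test.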
Get diagnostics or help
When the first option is -n ('no-op'), compresm will only print the compression commands, but not execute them.
For a summary of compresm and find options, enter
compresm -h
Performance considerations, SLURM
With -j njobs, compresm uses up to njobs parallel jobs for the actual compression. Because compressing large files requires a considerable amount of memory, running compresm in a SLURM job is strongly recommended for anything more than a small number of medium-sized files.
For long-running, parallelized jobs on a large number of files, you may run SLURM jobs directly from the command line and in any directory, as in
# make sure module compresm (or possibly compresm-dev) is loaded!
sbatch -A mh0287 -p compute -o %x_%j.log compresm -j 120 [...]
Output and error messages will be written to the current directory as compresm_<job_id>.log.
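The same submission could also be kept in a small batch script, which may be convenient when the job is repeated. The following is a sketch under assumptions: the file name compresm_job.sh is arbitrary, the account and partition are simply copied from the command above and need to be replaced by your own, and it assumes that the module command is available inside the batch environment.

```shell
# Write a hypothetical batch script; the file name compresm_job.sh is arbitrary
cat > compresm_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=compresm
#SBATCH --account=mh0287
#SBATCH --partition=compute
#SBATCH --output=%x_%j.log

# Assumption: the module command is available in the batch environment
module load compresm

# Compress everything below the submission directory, up to 120 parallel jobs
compresm -j 120
EOF

# Submit with: sbatch compresm_job.sh
```

The job name is set explicitly because %x expands to the job name, which for a batch script otherwise defaults to the script's file name; with this setting the log is still written as compresm_<job_id>.log.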