compresm - tools for COMPRessing Earth System Model data

Updated over 1 year ago by Karl-Hermann Wieners

[2023-03-17]
Unfortunately, due to a bug in version 1.3.1, GRIB files were no longer recognized and were therefore compressed with gzip. The new official version 1.3.2 has been installed and fixes this problem.

compresm - compression for NetCDF, GRIB, and arbitrary tar and text files

Usage

compresm is installed via DKRZ's module system. To use it, enter

module load compresm
on Levante.

For a moderate number of files to be compressed, compresm may now simply be called as

compresm [-n] [-j njobs] [file_or_directory ...] [find_option ...]

Without any options, compresm will look up the MIME type of all files in or below the current directory. Text files are compressed with xz, which gives the best result for the usual model log files. GRIB and NetCDF files are processed with cdo -z szip clone or nccopy -d 1 -s, respectively. The contents of *.tar archives are unpacked, the individual files are handled as described, and the archive is re-packed. Empty or already compressed files are ignored. Files of any other type are compressed with gzip.
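The selection rules above can be summarised as a small lookup table. The following is only an illustrative sketch, not compresm's actual implementation: the function name choose_command is hypothetical, and the MIME type strings it matches are assumptions about what a lookup such as file --mime-type might report (GRIB files in particular may be reported differently on a given system).

```shell
#!/bin/bash
# Hypothetical helper: map a MIME type string to the handling described above.
# The MIME strings are assumptions for illustration, not taken from compresm.
choose_command() {
    case "$1" in
        text/*)               echo "xz" ;;
        application/x-grib)   echo "cdo -z szip clone" ;;
        application/x-netcdf) echo "nccopy -d 1 -s" ;;
        application/x-tar)    echo "unpack, handle members, re-pack" ;;
        inode/x-empty|application/gzip|application/x-xz)
                              echo "skip" ;;
        *)                    echo "gzip" ;;
    esac
}

# Example lookups: a log file and a file of unknown binary type.
choose_command text/plain
choose_command application/octet-stream
```

The catch-all branch comes last so that any type not listed explicitly falls back to gzip, mirroring the final rule in the paragraph above.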

compresm uses make internally, so if the program stops for some reason, you may simply restart it, and it will pick up work from where it stopped.

Customization of file selection

To restrict the search, you may use the same syntax that is understood by find (see man find). For instance, you may search only certain files and/or subdirectories,

compresm some_directory another_file.grb

or omit the *.tar files

compresm \! -name '*.tar'
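Because the selection syntax is that of find, an expression can be tried out with find itself before it is passed to compresm. The sketch below assumes that compresm hands the expression to find unchanged, which the description above suggests but does not state explicitly; the directory and file names are made up for the demonstration.

```shell
#!/bin/bash
# Build a throwaway directory with a few example files (names are hypothetical).
demo_dir=$(mktemp -d)
touch "$demo_dir/run.log" "$demo_dir/output.grb" "$demo_dir/restart.tar"

# List the regular files selected by the expression \! -name '*.tar'.
selected=$(cd "$demo_dir" && find . -type f \! -name '*.tar' | sort)
echo "$selected"

# Clean up the demonstration directory.
rm -rf "$demo_dir"
```

The listing contains run.log and output.grb but not restart.tar, which is the set of files the same expression would be expected to leave for compression.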

Get diagnostics or help

When the first option is -n ('no-op'), compresm will only print the compression commands, but not execute them.

For a summary of compresm and find options, enter

compresm -h

Performance considerations, SLURM

With -j njobs, compresm runs up to njobs compression jobs in parallel. Because compressing large files requires a considerable amount of memory, running compresm inside a SLURM job is strongly recommended for anything more than a small number of medium-sized files.

For long-running, parallelized jobs on a large number of files, you may submit SLURM jobs directly from the command line and from any directory, as in

# make sure module compresm (or possibly compresm-dev) is loaded!
sbatch -A mh0287 -p compute -o %x_%j.log compresm -j 120 [...]

Output and error messages will be written to the current directory as compresm_<job_id>.log.