compresm - tools for COMPRessing Earth System Model data
Updated almost 3 years ago by Karl-Hermann Wieners
[2023-03-17]
Unfortunately, due to a bug in version 1.3.1, GRIB files were no longer recognized and were therefore compressed with gzip. A new official version, 1.3.2, has been installed and fixes this problem.
compresm - compression for NetCDF, GRIB, and arbitrary tar and text files
Usage
compresm is installed via DKRZ's module system. To use it on levante, enter
module load compresm
Information on test versions
For a moderate number of files to be compressed, compresm may simply be called as
compresm [-n] [-j njobs] [file_or_directory ...] [find_option ...]
Without any options, compresm will look up the MIME type of every file within or below the current directory and handle each file according to its type:
- Text files are compressed with xz, which gives the best result for the usual model log files.
- GRIB files are processed with cdo -z szip clone.
- NetCDF files are processed with nccopy -d 1 -s.
- The contents of *.tar archives are unpacked, the individual files handled as described above, and re-packed.
- Empty or already compressed files are ignored.
- Files of any other type are compressed with gzip.
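The per-type dispatch above can be sketched as a simple MIME-type lookup. This is an illustration of the idea, not compresm's actual code, and the GRIB/NetCDF MIME strings are assumptions (what `file` reports varies by version):

```shell
# Illustrative sketch (not compresm itself): choose a compression command
# from a file's MIME type, roughly following the rules described above.
pick_compressor() {
  case "$1" in
    text/*)                  echo "xz" ;;                 # best for model log files
    */x-grib*)               echo "cdo -z szip clone" ;;  # assumed GRIB MIME type
    */x-netcdf*)             echo "nccopy -d 1 -s" ;;     # assumed NetCDF MIME type
    */gzip|*/x-xz|*/x-bzip2) echo "skip" ;;               # already compressed
    *)                       echo "gzip" ;;               # everything else
  esac
}
```

A file's MIME type can be obtained with, for example, `pick_compressor "$(file --brief --mime-type model_run.log)"`.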
compresm uses make internally, so if the program stops for some reason, you may simply restart it, and it will pick up work from where it stopped.
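The restart behaviour follows from ordinary make semantics: a target that already exists and is up to date is simply skipped on the next run. A minimal, hypothetical demonstration of that mechanism (not compresm's real makefile):

```shell
# Minimal demo of make-based restartability (hypothetical, not compresm's makefile):
# once a compressed target exists, rerunning make does not redo the work.
demo=$(mktemp -d)
cd "$demo"
printf 'hello\n' > a.txt
# Pattern rule: build <file>.gz from <file>; -k keeps the original for the demo.
printf 'all: a.txt.gz\n%%.gz: %%\n\tgzip -k $<\n' > Makefile
make        # first run compresses a.txt
make        # second run: finished work is not repeated
```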
Customization of file selection¶
To restrict the search, you may use the same syntax that is understood by find (see man find). For instance, you may search only certain files and/or subdirectories,
compresm some_directory another_file.grb
or omit the *.tar files
compresm \! -name '*.tar'
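Since these options are passed on in find syntax, you can preview which files a given selection matches by running find itself first (scratch-directory demo with made-up file names):

```shell
# Preview a selection with plain find before handing the options to compresm
# (hypothetical file names in a scratch directory):
demo=$(mktemp -d)
cd "$demo"
touch run.log data.nc archive.tar
find . -type f \! -name '*.tar'   # matches run.log and data.nc, skips archive.tar
```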
Get diagnostics or help
When the first option is -n ('no-op'), compresm will only print the compression commands, but not execute them.
For a summary of compresm and find options, enter
compresm -h
Performance considerations, SLURM
With -j njobs, compresm runs up to njobs parallel jobs doing the actual compression. As compression of large files requires considerable memory, using a SLURM job is strongly recommended when running compresm on anything more than a small number of medium-sized files.
For long running, parallelized jobs on a large number of files, you may run SLURM jobs directly from the command line and in any directory, as in
# make sure module compresm (or possibly compresm-dev) is loaded!
sbatch -A mh0287 -p compute -o %x_%j.log compresm -j 120 [...]
Output and error messages will be written to the current directory as compresm_<job_id>.log.
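The same job can also be submitted as a batch script. This is a hypothetical sketch using the account, partition, and job size from the example above; adjust them to your own project:

```shell
#!/bin/bash
#SBATCH --account=mh0287
#SBATCH --partition=compute
#SBATCH --output=%x_%j.log
# Hypothetical batch-script form of the sbatch one-liner above.
module load compresm
compresm -j 120
```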