compresm - tools for COMPRessing Earth System Model data

Updated over 2 years ago by Karl-Hermann Wieners

[2023-03-17]
Unfortunately, due to a bug in version 1.3.1, GRIB files were no longer recognized and were therefore compressed with gzip. A new official version, 1.3.2, has been installed and fixes this problem.

compresm - compression for NetCDF, GRIB, and arbitrary tar and text files

Usage

compresm is installed via DKRZ's module system. To use it, enter

module load compresm

on Levante.

To use the latest test version instead, run

module unload compresm
module use ~m221078/etc/Modules
module load compresm-dev

To compress a moderate number of files, compresm may simply be called as

compresm [-n] [-j njobs] [file_or_directory ...] [find_option ...]

Without any options, compresm looks up the MIME type of every file in or below the current directory and chooses a compression method accordingly. Text files are compressed with xz, which gives the best results for typical model log files. GRIB and NetCDF files are processed with cdo -z szip clone and nccopy -d 1 -s, respectively. *.tar archives are unpacked, their members are handled as described, and the archive is re-packed. Empty files and files that are already compressed are ignored. Files of any other type are compressed with gzip.
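
The selection described above can be pictured as a lookup from MIME type to compression command. The following is a minimal sketch, not compresm's actual implementation: the function name choose_compressor is hypothetical, and the MIME type strings (in particular those for GRIB, NetCDF, and already compressed files) are assumptions, since this page does not say how compresm identifies these formats.

```shell
#!/bin/sh
# Illustrative only: map a MIME type to the tool named on this page.
# All MIME strings below are assumptions, not taken from compresm itself.
choose_compressor() {
    case "$1" in
        text/*)
            echo "xz" ;;
        application/x-grib)                       # assumed MIME type for GRIB
            echo "cdo -z szip clone" ;;
        application/x-netcdf|application/x-hdf5)  # assumed MIME types for NetCDF
            echo "nccopy -d 1 -s" ;;
        application/x-tar)
            echo "unpack, compress members, re-pack" ;;
        inode/x-empty|application/gzip|application/x-xz)
            echo "skip" ;;                        # empty or already compressed
        *)
            echo "gzip" ;;                        # fallback for any other type
    esac
}

choose_compressor text/plain
choose_compressor application/octet-stream
```

For a real file, the MIME type could be obtained with file --brief --mime-type FILE and passed to the function.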

compresm uses make internally, so if the program stops for some reason, you may simply restart it, and it will pick up work from where it stopped.

Customization of file selection

To restrict the search, you may use the same syntax that find understands (see man find). For instance, you may process only certain files and/or subdirectories,

compresm some_directory another_file.grb

or omit the *.tar files

compresm \! -name '*.tar'
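
Because the selection arguments use find syntax, a plain find call can be used to preview what an expression matches before handing it to compresm. The sketch below assumes that compresm passes these arguments to find essentially unchanged, which this page implies but does not spell out; the directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Build a throwaway directory tree (hypothetical file names).
demo=$(mktemp -d)
mkdir "$demo/run1"
touch "$demo/run1/output.grb" "$demo/run1/model.log" "$demo/archive.tar"

# Same expression as in the compresm call above: everything except *.tar.
# -type f is added here only so that directories are not listed.
selected=$(cd "$demo" && find . -type f \! -name '*.tar' | sort)
printf '%s\n' "$selected"

# Clean up the demonstration tree.
rm -rf "$demo"
```

The listing contains the GRIB and log files but not archive.tar, which is the set a call with the same expression would be expected to consider.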

Get diagnostics or help

When the first option is -n ('no-op'), compresm will only print the compression commands, but not execute them.

For a summary of compresm and find options, enter

compresm -h

compresm [-h] [-n] [-j njobs] [-l[=options]] [-v] [-f] [-d destdir] [file_or_directory ...] [find_option ...]

-n: no-op: only print the compression commands, but do not execute them
-j njobs: jobs: run up to njobs compression tasks in parallel (default: 1)
-l[=options]: use lossy compression for NetCDF
-v: verbose mode
-f: force re-compression of already compressed data
-d destdir: save compressed files to alternative destination
-h: help
file_or_directory, find_option:
[output of 'find --help'...]

Performance considerations, SLURM

With -j njobs, compresm runs up to njobs compression tasks in parallel. Because compressing large files requires a considerable amount of memory, running compresm inside a SLURM job is strongly recommended for anything more than a small number of medium-sized files.

For long-running, parallelized jobs on a large number of files, you may submit SLURM jobs directly from the command line and from any directory, as in

# make sure module compresm (or possibly compresm-dev) is loaded!
sbatch -A mh0287 -p compute -o %x_%j.log compresm -j 120 [...]

Output and error messages will be written to the current directory as compresm_<job_id>.log.
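
For repeated use, the same submission could be kept in a small batch script. The following is a sketch under assumptions: the script name compresm_job.sh and the target some_directory are placeholders, --job-name is set explicitly so that %x in the log name still expands to compresm, and loading the module inside the job assumes that the module command is available in batch shells. Account, partition, and job count are copied from the command above.

```shell
#!/bin/sh
# Write a batch script equivalent to the one-line sbatch call above.
# compresm_job.sh and some_directory are placeholder names.
cat > compresm_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=mh0287
#SBATCH --partition=compute
#SBATCH --job-name=compresm
#SBATCH --output=%x_%j.log

# Load the module inside the job (assumes 'module' works in batch shells).
module load compresm
compresm -j 120 some_directory
EOF

# Nothing is submitted here; the script only prints the command to run.
echo "submit with: sbatch compresm_job.sh"
```

Submitting the generated file with sbatch compresm_job.sh should then write its log to compresm_<job_id>.log, as with the direct call.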
