Land masking many .nc files using Python subprocess loop and CDO

Added by Michael Pletcher over 2 years ago

Hi everyone,

I am attempting to mask land in a large number of .nc files using Python's subprocess module and CDO, but I keep receiving this error: 'cdo (Abort): Unprocessed Input, could not process all Operators/Files'. My code is below.

import datetime as dt
from datetime import datetime
import subprocess
import glob

directory = '/uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/*.nc4'
hourly    = ("00.V06B.HDF5.nc4", "20.V06B.HDF5.nc4", "40.V06B.HDF5.nc4", "60.V06B.HDF5.nc4", "80.V06B.HDF5.nc4")

curDT = datetime(2020, 7, 1, 0)
endDT = datetime(2020, 7, 1, 23)

while curDT < endDT:
  for filepath in glob.iglob(directory):
    if filepath.endswith(hourly):
      mask = subprocess.call("cdo -expr,'topo = ((topo<0.0)) ? 1.0 : topo/0.0' /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/3B*.nc4 /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG_landmasks/{0}_IMERG_mask.nc".format(curDT.strftime('%m%d%H00')), shell = True)
      land_masking = subprocess.call("cdo -mul /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG_landmasks/*.nc /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/3B*.nc4 {0}_land_masked_IMERG.nc4.nc4".format(curDT.strftime('%m%d%H00')), shell = True)

    curDT = curDT + dt.timedelta(hours = 0.5)

Is this an issue with the formatting of the code, or am I misunderstanding how this loop works? Any help is greatly appreciated.

Thank you,
Michael


Replies (7)

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Karin Meier-Fleischer over 2 years ago

Hi Michael,

If you need to access a huge number of files, you may have exceeded the maximum length of the argument list in the subprocess command.

You can retrieve the system's maximum value with

getconf ARG_MAX

-Karin

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Michael Pletcher over 2 years ago

Hi Karin,

I have attempted to fix this issue by adding getconf ARG_MAX to the beginning of my subprocess command as shown below:

mask = subprocess.call("getconf ARG_MAX cdo -expr,'topo = ((topo<0.0)) ? 1.0 : topo/0.0' /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/3B*.nc4 /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG_landmasks/{0}_IMERG_mask.nc".format(curDT.strftime('%m%d%H00')), shell = True)
land_masking = subprocess.call("getconf ARG_MAX cdo -mul /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG_landmasks/*.nc /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/*.nc4 {0}_land_masked_IMERG.nc4.nc4".format(curDT.strftime('%m%d%H00')), shell = True)

And the following is what the command output:

Usage: getconf [-v specification] variable_name [pathname]
       getconf -a [pathname]
Usage: getconf [-v specification] variable_name [pathname]
       getconf -a [pathname]

The actual CDO operation did not occur, only the above text was output. In addition, the maximum number of files I will be processing is 24. Is there anything else I can do differently here?

Thanks,
Michael

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Michael Pletcher over 2 years ago

Sorry, I realized that I didn't format the above response correctly. Here's my updated reply.

Hi Karin,

I have attempted to fix this issue by adding getconf ARG_MAX to the beginning of my subprocess command as shown below:

mask = subprocess.call("getconf ARG_MAX cdo -expr,'topo = ((topo<0.0)) ? 1.0 : topo/0.0' /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/3B*.nc4 /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG_landmasks/{0}_IMERG_mask.nc".format(curDT.strftime('%m%d%H00')), shell = True)
land_masking = subprocess.call("getconf ARG_MAX cdo -mul /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG_landmasks/*.nc /uufs/chpc.utah.edu/common/home/zpu-group10/pu03/mpletch/thesis/IMERG/July/gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/2020/183/*.nc4 {0}_land_masked_IMERG.nc4.nc4".format(curDT.strftime('%m%d%H00')), shell = True)

And the following is what each command output:

Usage: getconf [-v specification] variable_name [pathname]
       getconf -a [pathname]
Usage: getconf [-v specification] variable_name [pathname]
       getconf -a [pathname]

The actual CDO operation did not occur, only the above text was output. In addition, the maximum number of files I will be processing is 24. Is there anything else I can do differently here?

Thanks,
Michael

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Karin Meier-Fleischer over 2 years ago

Hi Michael,

getconf is a shell command that prints the maximum length of a command line on your system. It is not an environment variable setting, so don't put it in your script.

For more information, run the following in a terminal:

getconf ARG_MAX

The huge number of input files seems to exceed this maximum length, which gives an error.
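
The same limit can also be read from inside Python, separately from the cdo call. Prefixing the cdo command with getconf ARG_MAX makes getconf treat the rest of the line as its own arguments, which is why only the usage message appeared. The following is a minimal sketch, assuming a POSIX system where os.sysconf exposes SC_ARG_MAX. The helper name fits_in_arg_max and the sample command are made up for illustration, and the check is only a rough estimate: it does not expand wildcards, and environment variables count towards the same limit.

```python
import os
import shlex


def fits_in_arg_max(command: str) -> bool:
    """Roughly check whether a command line stays below the system ARG_MAX."""
    arg_max = os.sysconf("SC_ARG_MAX")  # the value that `getconf ARG_MAX` prints
    if arg_max < 0:
        return True  # -1 means the system reports no definite limit
    # Each argument is passed to the program as a NUL-terminated string.
    needed = sum(len(arg.encode()) + 1 for arg in shlex.split(command))
    return needed < arg_max


# Hypothetical command, used only to exercise the helper.
example = "cdo -mul mask.nc input.nc4 output.nc4"
print(os.sysconf("SC_ARG_MAX"), fits_in_arg_max(example))
```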

-Karin

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Michael Pletcher over 2 years ago

Hi Karin,

When I ran getconf ARG_MAX in my terminal, the number I received was 4611686018427387903. Is this an issue?

Thank you,
Michael

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Karin Meier-Fleischer over 2 years ago

Now I see what the problem is; I was on the wrong track. You are trying to multiply multiple files in one call, which is not possible. The mul operator takes only two input fields.
See https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#subsection.2.7.4
To multiply all fields, you have to use a loop.
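
One way such a loop could look is sketched below. It expands the file list in Python and passes exactly one input file to each -expr call and two to each -mul call, so no wildcard is left for the shell to expand into several inputs. The names build_cdo_commands and mask_all, the output file naming, and the directories in the dry run are assumptions for illustration. The expr string is copied unchanged from the original post and has not been checked against the variables in the IMERG files. Nothing is executed unless run=True.

```python
import glob
import os
import subprocess

# Expression copied unchanged from the original post.
EXPR = "topo = ((topo<0.0)) ? 1.0 : topo/0.0"


def build_cdo_commands(input_files, mask_dir, out_dir):
    """Return one (expr, mul) command pair per input file."""
    commands = []
    for infile in sorted(input_files):
        stem = os.path.basename(infile)
        mask = os.path.join(mask_dir, stem + "_mask.nc")  # hypothetical naming
        masked = os.path.join(out_dir, stem + "_land_masked.nc")  # hypothetical naming
        # Argument lists (no shell=True) avoid wildcard expansion and quoting issues.
        expr_cmd = ["cdo", "-expr," + EXPR, infile, mask]
        # mul takes exactly two input files followed by one output file.
        mul_cmd = ["cdo", "-mul", mask, infile, masked]
        commands.append((expr_cmd, mul_cmd))
    return commands


def mask_all(pattern, mask_dir, out_dir, run=False):
    """Build the commands for every file matching pattern; execute only if run=True."""
    commands = build_cdo_commands(glob.glob(pattern), mask_dir, out_dir)
    if run:
        for expr_cmd, mul_cmd in commands:
            subprocess.run(expr_cmd, check=True)
            subprocess.run(mul_cmd, check=True)
    return commands


# Dry run with made-up file names to show the commands that would be issued.
for expr_cmd, mul_cmd in build_cdo_commands(["a.nc4", "b.nc4"], "masks", "out"):
    print(" ".join(expr_cmd))
    print(" ".join(mul_cmd))
```

Building the commands first and running them in a separate step makes it easy to print and inspect each call before any file is written.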

RE: Land masking many .nc files using Python subprocess loop and CDO - Added by Michael Pletcher over 2 years ago

Hi Karin,

I'm not sure if you fully read through my code. I am actually already using nested loops in my Python code, and each time the loop runs it takes only two fields.
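
A quick way to check how many input files each call actually receives is to expand the wildcard in Python before handing the command to cdo. With shell = True, patterns such as 3B*.nc4 and IMERG_landmasks/*.nc are expanded by the shell on every iteration to all matching files, independent of the surrounding Python loops. The sketch below uses a temporary directory with made-up file names in place of the real data, and count_matches is a hypothetical helper; Python's glob approximates the shell's expansion here.

```python
import glob
import os
import tempfile


def count_matches(pattern: str) -> int:
    """Count the files a wildcard pattern expands to."""
    return len(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Made-up stand-ins for a few half-hourly IMERG files of one day.
    for name in ("3B_0000.nc4", "3B_0030.nc4", "3B_0100.nc4"):
        open(os.path.join(tmp, name), "w").close()
    n = count_matches(os.path.join(tmp, "3B*.nc4"))
    print(n)  # more than one match means the operator gets more than one input file
```

If the count is larger than one, the -expr call receives several input files and the -mul call receives more than two, which would be consistent with the 'Unprocessed Input' abort.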
