Trying to parallelize a CDO operation - is there any reason that launching these in parallel is not using the multiple cores on the system?

Added by Maxim Mayo almost 2 years ago

We are trying to parallelize a CDO operation as shown below. We can get some parallelism with OpenMP, but we want to see whether launching multiple CDO tasks in parallel gives a better speedup. However, we see odd behavior: although multiple CDO tasks are launched, they all run on a single core (we monitor CPU utilization with htop). Is there any reason that launching these in parallel does not use the multiple cores on the system?

We have also tried the GNU parallel approach mentioned in the forum (https://code.mpimet.mpg.de/boards/53/topics/6672), but we see similar behavior, i.e. only one core of the system is used.

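COUNTPP=0
for FILE in ${DATAFILELIST}
do
  # Run each CDO call in a background subshell
  (
    FILEOUT=$(basename "${FILE}" .nc)
    ${CDO} -s -P ${OMP_THREADS} ${GCM_REMAP},${INIDIR}/triangular-grid.nc -selname,u,v,w,pres,temp,qv,qc,qi,z_ifc${ICON_INPUT_OPTIONAL} "${FILE}" "${OUTDIR}/${YYYY_MM}/${FILEOUT}_lbc.nc"
  ) &
  (( COUNTPP = COUNTPP + 1 ))
  # Once MAXCORES jobs have been launched, wait for the whole batch to finish
  if [ "${COUNTPP}" -ge "${MAXCORES}" ]
  then
    COUNTPP=0
    wait
  fi
done
# Wait for any jobs left over from the last, partial batch
wait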

Any advice would be very helpful.
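
A side note on the loop above: it waits for a whole batch of MAXCORES jobs before starting the next one, so cores sit idle while the slowest job of each batch finishes. The following is only a sketch of one alternative, assuming bash 4.3 or newer (for wait -n). The names MAXJOBS, work and run_throttled are hypothetical, and work is a stand-in for the real CDO call. It does not address the single-core behavior itself.

```shell
#!/usr/bin/env bash
# Hypothetical limit on concurrently running jobs (plays the role of MAXCORES)
MAXJOBS=${MAXJOBS:-4}

# Hypothetical stand-in for one CDO call on one input file
work() {
  sleep 0.2
  echo "done $1"
}

# Start one background job per argument, never more than MAXJOBS at a time
run_throttled() {
  local item
  for item in "$@"; do
    # Block while MAXJOBS jobs are already running
    while (( $(jobs -rp | wc -l) >= MAXJOBS )); do
      wait -n   # returns as soon as any one background job exits
    done
    work "${item}" &
  done
  wait          # wait for the remaining jobs
}

run_throttled a b c d e f
```

With this pattern a new job starts as soon as any running job finishes, instead of after the whole batch.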


Replies (3)

RE: Trying to parallelize a CDO operation - is there any reason that launching these in parallel is not using the multiple cores on the system? - Added by Ralf Mueller almost 2 years ago

Hi!

How many cores do you have on your system? (Call nproc.)

This looks like spawning regular Unix processes. Instead of this complex scripting, you can write a single script with as many calls as you want, run it, and watch the output of top.

If your system puts all processes on a single core, I cannot help you.

BUT: please test this without OpenMP, i.e. without the -P option.

good luck
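
The test suggested above can be sketched without CDO at all. The following is a minimal sketch, assuming a Linux system with bash and the procps version of ps. The function busy is a hypothetical stand-in for a CDO call, and NJOBS is an illustrative variable. It starts a few plain background processes and prints the core each one is currently running on (the psr column).

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a CDO call: keeps one core busy for N seconds
busy() {
  local end=$(( SECONDS + ${1:-2} ))
  while (( SECONDS < end )); do :; done
}

NJOBS=${NJOBS:-4}
pids=()
for i in $(seq 1 "${NJOBS}"); do
  busy 2 &          # plain Unix background process, no OpenMP involved
  pids+=("$!")
done

sleep 1
# psr is the processor each process is currently assigned to
ps -o pid,psr,pcpu,comm -p "$(IFS=,; echo "${pids[*]}")"
wait
```

If the psr column shows the same number for every process while other cores are idle, something in the environment is pinning the processes, independent of CDO.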

RE: Trying to parallelize a CDO operation - is there any reason that launching these in parallel is not using the multiple cores on the system? - Added by Maxim Mayo almost 2 years ago

Thanks for the response. We identified the problem: an OpenMP affinity setting in our scripts was causing this behavior in our environment.
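
The thread does not say which affinity setting was involved. As a hedged sketch, assuming Linux and bash, the following lists the OpenMP affinity variables commonly honored by the GNU and Intel runtimes (OMP_PROC_BIND, OMP_PLACES, GOMP_CPU_AFFINITY, KMP_AFFINITY), shows which CPUs the shell is allowed to use, and unsets the variables before any parallel jobs are launched. The function names show_omp_affinity and clear_omp_affinity are made up for illustration. A setting such as GOMP_CPU_AFFINITY=0 is applied by every process independently, which is one way several separately launched processes could all end up on the same core.

```shell
#!/usr/bin/env bash
# Print the OpenMP affinity-related variables that are set in this shell
show_omp_affinity() {
  env | grep -E '^(OMP_PROC_BIND|OMP_PLACES|GOMP_CPU_AFFINITY|KMP_AFFINITY)=' \
    || echo "no OpenMP affinity variables set"
}

# Remove them so child processes are not pinned by the OpenMP runtime
clear_omp_affinity() {
  unset OMP_PROC_BIND OMP_PLACES GOMP_CPU_AFFINITY KMP_AFFINITY
}

show_omp_affinity

# CPUs this shell and its children may run on (Linux only)
grep Cpus_allowed_list /proc/self/status 2>/dev/null || true

clear_omp_affinity
```

Whether unsetting is the right fix depends on why the variable was set in the first place; it may have been intended for a different, single-process run.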

RE: Trying to parallelize a CDO operation - is there any reason that launching these in parallel is not using the multiple cores on the system? - Added by Ralf Mueller almost 2 years ago

Ah, thanks for posting the final solution. My suggestion to try it without OpenMP was meant to separate the system's Unix process handling from its thread handling, to get closer to the cause of the issue.

Cheers
Ralf
