Gentoo Forums
[How-To] Using multiple cpu cores in a script
Akkara
Bodhisattva


Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Sat Mar 17, 2007 12:58 pm    Post subject: [How-To] Using multiple cpu cores in a script

I have often had to run long batch jobs over a large number of files, but the usual scripting tools (e.g., find -type f ... -exec ...) fire off only one job at a time, which leaves most of a multi-core machine idle.

So I wrote a script that runs several jobs at once and starts a new one whenever any of them finishes.

Here it is, in case it is useful to others.


What it does: It reads stdin one line at a time and feeds each line to one of several "sh" processes it controls, depending on which one is not busy.

How to use it: prepare a list of fully-spelled-out, independently-runnable shell commands, one per line. Then instead of feeding the commands to the shell, feed them to this script.

For example, let's say you have lots of .wav files that you'd like to convert to flac (and you've named this script parallel-sh):
Quote:
find . -type f -name "*.wav" | sed -e "s;';'"'"'"'"'"'"';g" -e "s;.*;flac -8 -V '&';" | parallel-sh
(The second sed expression wraps the filename in single quotes in case it contains spaces or other shell-significant characters; the first sed expression escapes any single quotes already in the filename. There's probably a better way to do all that.)
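For anyone puzzling over that run of quotes, here is a rough sketch of the same quoting rule as a standalone function. The name shell_quote is made up for illustration and is not part of the script below; it assumes the string contains no newlines, the same limitation the line-oriented pipeline already has.

```shell
#!/bin/sh
# Hypothetical helper (not part of parallel-sh): single-quote a string
# so that a shell reads it back as one literal word.
shell_quote() {
    # Step 1: replace each ' with '"'"' (close the quote, emit a
    #         double-quoted ', reopen the quote).
    # Step 2: wrap the whole result in single quotes.
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\"'\"'/g")"
}

# Possible use when building a command line for one file:
#   echo "flac -8 -V $(shell_quote "$file")"
```

This should produce the same text as the two sed expressions above, just one filename at a time.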

Code:
#!/bin/bash
#-------------------------------------------------------------------------------
# Parallel-execute shell commands, by Akkara
#
# Read stdin one line at a time and feed each line to one of several
# concurrent "sh" processes trying to keep each one busy.
#
# Commands can be run in any order, therefore the lines should be
# independent of each other for best results.
#
# Options:
#   -j N   Run N jobs in parallel (default is the number of cpu cores).
#   -jN   Like -j N
#   -f file   Take commands from file rather than from stdin
#   -nice N   Run the jobs with nice-value N
#   -v   Verbose.  Say what's happening.
#   -test   Run the parallelizing code but don't actually do anything.
#      (Useful with -v to test the algorithm.)
#
# Bugs:
#   The poll-sleep loop has a granularity of 1 second which adds
#   significant dead-time when used with fast-finishing jobs
#
# Example:
#   find . -type f -name "*.wav" |   \
#       sed -e "s;';'"'"'"'"'"'"';g" -e "s;.*;flac -8 -V '&';" | parallel-sh
#-------------------------------------------------------------------------------


# syncfiles are named $SYNCFILE.N for N = 0, 1, etc.
SYNCFILE="/dev/shm/parallel-sh-$$"

# default number of jobs = number of cores
N=`grep "^processor" </proc/cpuinfo | wc -l`
THIS=`basename "$0"`
SLEEP="sleep 1"
VERBOSE="false"
EVAL="eval"
FILE=""

# Parse options
while [[ "$1" != "" ]]; do
    case "$1" in
        -j)
            N=$2
            shift 2 ;;
        -j*)
            N=`expr substr $1 3 100`
            shift 1 ;;
        -f)
            FILE="$2"
            shift 2 ;;
        -v)
            VERBOSE="true"
            shift 1 ;;
        -nice)
            test "$EVAL" != "true"  &&  EVAL="nice -n $2 sh -c"
            shift 2 ;;
        -test)
            EVAL="true"
            shift 1 ;;
        -h)
            echo "Usage: $THIS [-j N] [-f file] [-nice n] [-v] [-test] [-help]" >&2
            exit ;;
        -*help)
            echo "Usage: $THIS [-j N] [-f file] [-nice n] [-v] [-test] [-help]" >&2
            sed -n '/^# Options/,/^#$/s;^# *;;p' <"$0" >&2
            exit ;;
        *)
            echo "$THIS: Unknown option $1" >&2
            exit 2 ;;
    esac
done


CORES=`eval "echo {1..$N}"`
$VERBOSE  &&  echo "Processes:   <$CORES>" >&2


# Make sure we can write the sync files
for i in $CORES; do
    SYNC="$SYNCFILE.$i"
    if echo -n >"$SYNC" && rm -f "$SYNC"; then
        :
    else
        echo "$THIS: Cannot set up sync files.  Exiting." >&2
        exit 1
    fi
done


# If input file specified, redirect stdin to read from there
if [[ "$FILE" != "" ]]; then
    exec 0<"$FILE"
fi


# Start the background listen/execute processes:
#   - wait for the syncfile to appear, sleep until it does
#   - if it's a null file that means we're done, exit
#   - execute the command found in the syncfile
#   - remove the syncfile as a way of signalling to the parent
#     to supply more work
for i in $CORES; do
    SYNC="$SYNCFILE.$i"
    while :; do
        if [[ ! -f "$SYNC" ]]; then
            $SLEEP
        elif [[ -s "$SYNC" ]]; then
            mv "$SYNC" "$SYNC.now"
            CMD=`cat "$SYNC.now"`
            $VERBOSE  &&  echo "$i:   $CMD" >&2
            $EVAL "$CMD"
            rm "$SYNC.now"
        else
            $VERBOSE  &&  echo "$i:   <exiting>" >&2
            rm "$SYNC"
            exit
        fi
    done &
done



# Feed them lines from stdin, one at a time, as they finish
#   - if any syncfile doesn't exist, it means that process needs work
#   - synchronize using mv and rm which are atomic at the filesystem
#     level, rather than echo >file which can result in a partial read
#   - any line starting with 'mkdir' we do ourselves to avoid races
#     where a subsequent command might run before the directory exists
while read -r LINE; do
    if echo "$LINE" | grep -q "^mkdir"; then
        $VERBOSE  &&  echo "Perform:   $LINE" >&2
        $EVAL "$LINE"
        continue
    fi
    while :; do
        for i in $CORES; do
            SYNC="$SYNCFILE.$i"
            if [[ ! -f "$SYNC" ]]; then
                $VERBOSE  &&  echo "Give $i:   $LINE" >&2
                printf '%s\n' "$LINE" >"$SYNCFILE"
                mv "$SYNCFILE" "$SYNC"
                break 2      # back out to the read-line loop
            fi
        done
        $SLEEP
    done
done

# Hit EOF - give a null file to signal done, and wait for exit
while [[ "$CORES" != "" ]]; do
    while :; do
        for i in $CORES; do
            SYNC="$SYNCFILE.$i"
            if [[ ! -f "$SYNC" ]]; then
                CORES=`echo " $CORES " | sed -e "s; $i ; ;" -e "s;^  *;;" -e "s;  *$;;"`
                $VERBOSE  &&  echo "Done $i Remaining: <$CORES>" >&2
                echo -n >"$SYNC"   # empty file signals done
                break 2      # back out to the while-cores loop
            fi
        done
        $SLEEP
    done
done

wait
$VERBOSE  &&  echo "-- All done --" >&2


[Edit 20070417] small improvements, better overlap of feed-work process with the worker processes
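
On the 1-second polling granularity listed under Bugs: one possible way around it, sketched here under assumptions and not the script's actual method, is to let the shell itself block until a child exits. This needs bash 4.3 or newer for `wait -n`. The function name run_lines and its single argument (the maximum number of jobs) are made up for illustration, and the sketch leaves out the mkdir special case, -nice, and -test.

```shell
#!/bin/bash
# Sketch only: a polling-free dispatcher, NOT the script from this post.
# Requires bash >= 4.3 for `wait -n`.
# run_lines is a hypothetical name; $1 is the maximum number of jobs.
run_lines() {
    local max=$1 running=0 line
    while IFS= read -r line; do
        # All slots busy: block until any one job exits.
        if [ "$running" -ge "$max" ]; then
            wait -n
            running=$((running - 1))
        fi
        # Run the line in its own sh; keep it away from our stdin.
        sh -c "$line" </dev/null &
        running=$((running + 1))
    done
    wait    # let the remaining jobs finish
}

# Intended use, reading one command per line from stdin:
#   run_lines 4 <commands.txt
```

Because nothing sleeps, a new job should start as soon as a slot frees up, at the cost of depending on a bash-only feature.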
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 355
Location: USA

Posted: Fri Mar 20, 2009 10:27 am

Mostly commenting because what's listed above is the DIY way of getting parallelization. Just use xargs -P <desired-#-of-jobs> instead (the upside is that it can also do null delimiting).
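
A minimal sketch of that suggestion, assuming an xargs that supports -0 and -P (GNU and BSD versions do; neither option is guaranteed everywhere). The wrapper name each_parallel and the JOBS variable are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: run "$@" once per NUL-delimited name read from
# stdin, with up to $JOBS invocations at a time.
# JOBS is an assumed variable; it defaults to 4 here.
JOBS=${JOBS:-4}
each_parallel() {
    # -0    names are separated by NUL bytes, so any filename is safe
    # -n 1  pass one name per invocation
    # -P    run up to $JOBS invocations concurrently
    xargs -0 -n 1 -P "$JOBS" "$@"
}

# Intended use for the .wav example from the first post:
#   find . -type f -name '*.wav' -print0 | each_parallel flac -8 -V
```

With null delimiting the filenames never pass through a shell, so none of the quoting from the first post is needed.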
_________________
I don't want to be buried in a pet cemetery. ~Ramones