A simple way to arrange for parallel execution is to use recursive subdivision. Each node is given a list of processors on which to run a command. It divides that list in half and sends the upper half to the first processor in that half, which then handles that half in the same way. The subdivision continues until each list contains only one processor. This takes $\lceil \log_2 p \rceil$ steps for p processors.
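
The subdivision can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the prototype's code: the function name `subdivide`, the choice to let the lower half keep the extra element when a list has odd length, and the tuple format used to record each hand-off are all hypothetical. No command is actually started; the code only tracks which processor would forward which sublist at each step.

```python
def subdivide(procs):
    """Simulate recursive subdivision over a list of processor names.

    Returns (steps, messages). Each message is a tuple
    (step, sender, receiver, sublist) recording that `sender` handed
    `sublist` to `receiver`, the first processor in that sublist.
    """
    messages = []
    # Each group is a list owned by its first element.
    active = [list(procs)]
    steps = 0
    while any(len(group) > 1 for group in active):
        steps += 1
        next_active = []
        for group in active:
            if len(group) == 1:
                # Nothing left to hand off; this processor just runs the command.
                next_active.append(group)
                continue
            # Assumption: the lower half keeps the extra element for odd lengths.
            mid = (len(group) + 1) // 2
            lower, upper = group[:mid], group[mid:]
            messages.append((steps, group[0], upper[0], upper))
            next_active.extend([lower, upper])
        active = next_active
    return steps, messages


if __name__ == "__main__":
    steps, messages = subdivide(list(range(8)))
    for step, sender, receiver, sublist in messages:
        print(f"step {step}: {sender} -> {receiver} with {sublist}")
    print("steps:", steps)
```

With eight processors the simulation finishes in three steps and records seven hand-offs, one for every processor other than the originator.
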
Note that all of these commands can execute faster if a server process is
always running on each of the parallel processors. Such a server is not
required, however; the prototype implementation is written entirely in terms of
shell scripts.