Ben is a simple and versatile batch job scheduler. It lets you run a queue of jobs, in parallel, on multiple machines. It comes in a single executable,
ben, and does not require any configuration, only a working ssh setup.
By default, ben relies on Unix-domain sockets and piggy-backs on file permissions for security. It uses ssh forwarding under the hood (ssh -L and ssh -R) for networking. As an alternative, ben can also use TCP/IP sockets directly [1]. Ben is free software (GPLv3) written in C with no dependencies.
Say we want to transcode a hundred mp4/H.264 videos (v00.mp4 to v99.mp4) to ogv/Theora (v00.ogv to v99.ogv). We want to run commands like:
$ ffmpeg -i v00.mp4 v00.ogv
Now, transcoding takes time, and we assume for this example that ffmpeg’s encoder is single-threaded (this is actually true for the Theora encoder in the current default build of ffmpeg). Therefore, we want to run multiple instances in parallel.
We can do it as follows. First, we start the ben server. It will maintain the job queue, dispatch jobs and handle coordination.
$ ben server -d
The -d option puts the server “in the background”, so that we get our command prompt back. If we omit it, log information is displayed in the terminal instead. Next, we start a client. The client will take jobs off the queue (first-in-first-out) and run them. The server and client are separate processes because, as we will see later, they can be run on distinct machines.
$ ben client -n 4 -d
The -d option has the same meaning as for the server: it puts the client “in the background”. The -n 4 option specifies that we allow up to 4 jobs to run simultaneously. We are now ready to transcode. Let us start with the first six videos:
$ ben add -c 'ffmpeg -i $job.mp4 $job.ogv' -j v00 v01 v02 v03 v04 v05
The -c option specifies the command to run. Each command is run with the environment variable job set to the job name, so we can use $job to refer to it. The output of the jobs (the output of ffmpeg) is stored in files named after the jobs: $job.log. So we will get 12 files: six .ogv videos and six .log files.
Here, we added all six jobs in a single command, but we could have added them equivalently with multiple
ben add commands, for example one at a time, or two by two. Jobs can be added to the queue at any time. As soon as some client is available, it will run jobs from the queue. In this example, the first four will start running immediately. As soon as one of them is finished, the fifth will start.
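The role of the single quotes and of the job variable can be seen without ben at all. The following is a rough emulation, not ben's actual implementation: it assumes the command is handed to sh -c and that standard output and standard error both end up in the log file, and it uses echo in place of a real ffmpeg run.

```shell
# Work in a scratch directory so the emulation leaves no stray files behind.
cd "$(mktemp -d)" || exit 1

# Single quotes keep $job literal here; it is expanded only when the job runs.
cmd='echo ffmpeg -i $job.mp4 $job.ogv'

for name in v00 v01; do
    # Set job in the environment of this one command (assumed to mirror what
    # ben does), and capture its output in a log file named after the job.
    job="$name" sh -c "$cmd" > "$name.log" 2>&1
done

cat v00.log
```

With double quotes instead of single quotes, the calling shell would expand $job itself (most likely to an empty string) before the command ever reached ben.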
It is often more convenient to store the job command and the job names in files instead of passing them as arguments to
ben add. Many ben options have a capital-letter variant that lets you do exactly that. The above
ben add command can thus be replaced by:
$ echo 'ffmpeg -i $job.mp4 $job.ogv' > command.sh
$ echo 'v00 v01 v02 v03 v04 v05' > job-list.txt
$ ben add -C command.sh -J job-list.txt
Of course, we want job-list.txt to contain all 100 videos, not just the first six. For example, we could generate the appropriate file as follows:
$ basename -s .mp4 v*.mp4 > job-list.txt
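If the video files are not present on the machine where the job list is written, the same list can be built from the naming pattern alone. A minimal sketch, assuming the videos are named v00 through v99 as in this example and that one name per line is acceptable (which is what the basename command above produces):

```shell
# Write the job names v00 ... v99 to job-list.txt, one per line.
i=0
while [ "$i" -lt 100 ]; do
    printf 'v%02d\n' "$i"
    i=$((i + 1))
done > job-list.txt

# Sanity check: count the generated names.
wc -l < job-list.txt
```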
Four parallel jobs is still not much. Assume that all the previous commands were run as user user-A on machine-A.host.com. Furthermore, we have ssh access as user user-B to a larger computer, machine-B.host.com. We can start a client there, either remotely:
[user-A@machine-A ~]$ ben client -r user-B@machine-B.host.com -n 48 -d
or by first logging into it, then contacting
machine-A from there:
[user-A@machine-A ~]$ ssh user-B@machine-B.host.com
[user-B@machine-B ~]$ ben client -f user-A@machine-A.host.com -n 48 -d
Note that the latter is more robust, because remote-starting a client requires ben to be installed on the remote host in a specific way [2]. Ben uses ssh under the hood, so we may be prompted for a password. We can now check that everything worked with the command:
[user-A@machine-A ~]$ ben nodes
# node R P
0 machine-A 0 4
1 machine-B 0 48
2 machine-A: ben nodes - C
We can see that we have two clients connected, which can run up to 4 and 48 simultaneous jobs, respectively. The last line indicates the connection that
ben nodes is currently using to communicate with the server; it is a “control” client that does not execute jobs. If we proceed to add the first 6 jobs, we will see that the clients get busy (the
R column gives the number of running jobs).
[user-A@machine-A ~]$ ben nodes
# node R P
0 machine-A 3 4
1 machine-B 3 48
4 machine-A: ben nodes - C
We can also display the queue with the command
[user-A@machine-A ~]$ ben list
# id dir job S node duration
0 v00 r 0 machine-A >00:00:27
2 v02 r 0 machine-A >00:00:27
4 v04 r 0 machine-A >00:00:27
1 v01 r 1 machine-B >00:00:27
3 v03 r 1 machine-B >00:00:27
5 v05 r 1 machine-B >00:00:27
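The listing is plain text, so it can be summarised with standard tools. The sketch below counts running jobs per machine from the sample output shown above. It rests on assumptions that may not hold in general: that r in the S column marks a running job, that fields are separated by whitespace, and that the dir column is empty (as in this sample), which places the machine name in the fifth field. The helper name running_per_node is made up for this illustration.

```shell
# Sample `ben list` output copied from the example above (dir column empty).
list_output='# id dir job S node duration
0 v00 r 0 machine-A >00:00:27
2 v02 r 0 machine-A >00:00:27
4 v04 r 0 machine-A >00:00:27
1 v01 r 1 machine-B >00:00:27
3 v03 r 1 machine-B >00:00:27
5 v05 r 1 machine-B >00:00:27'

# Hypothetical helper: skip the header line, keep rows whose third field
# is "r", and count them by machine name (fifth field in this sample).
running_per_node() {
    awk '!/^#/ && $3 == "r" { count[$5]++ }
         END { for (n in count) print n, count[n] }' | sort
}

printf '%s\n' "$list_output" | running_per_node
```

On the sample, this reports three running jobs for each machine, which agrees with the R column of ben nodes. On a live system, the output of ben list would be piped into the helper instead of the saved sample.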
After adding jobs, one can remove them with
ben rm. If we remove a job that is currently running, it is stopped and its (partial) output files are removed. If it is completed, the output is preserved.
One can dynamically change the maximum number of simultaneous jobs on a client with
ben scale. If the number of jobs currently running on that client exceeds the new maximum, some jobs will be interrupted (and re-queued [3]), unless the
--retire option is specified. In the latter case, the client will not accept new jobs until its load falls below the new maximum, but no jobs will be interrupted. Similarly,
ben kill, which disconnects a client, also has a
--retire option, letting currently running jobs finish beforehand.
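Interrupted jobs go back to the beginning of the queue rather than the end. The toy model below only illustrates that ordering; it is not how ben stores its queue. The job names are hypothetical, and the relative order of the re-queued jobs among themselves is an assumption.

```shell
# Toy model only: the queue is a space-separated list, head first.
pending="v06 v07 v08"

# Hypothetical: v02 and v04 were interrupted when a client was scaled down.
interrupted="v02 v04"

# Interrupted jobs are put back at the front of the queue ...
pending="$interrupted $pending"

# ... so the next job handed out is one of them, not v06.
set -- $pending
next=$1
echo "next job: $next"
echo "queue:    $pending"
```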
Another useful command is
ben exec. It queues one special “sync” job per client. Each of those sync jobs can only run on its specified client, and cannot run simultaneously with any other job. Sync jobs are useful for scheduling code updates or recompilations, or for getting notified when all previously queued jobs are completed. Unlike ben add, ben exec hangs until the sync jobs are run, and it displays their output. However, it can be interrupted (for example with control+c) without affecting the sync jobs: only their output is discarded.
See the manual for more detailed info.
[1] Using TCP/IP sockets directly is possible, but comes with no authentication or encryption. It is then strongly advised to bind to localhost and manually set up ssh port forwarding. This is still not perfect (everyone with access to localhost needs to be trusted), but better than having open public ports. It is the only option with old versions of ssh that do not support Unix-domain sockets.
[2] This requires ben to be installed in the default $PATH on machine-B. Because of the way ssh works, entries added to the $PATH by user-profile scripts (e.g. .bash_profile) are not taken into account. Furthermore, when ben client is called with option -r, it is essentially a wrapper for ssh -R remote_socket:local_socket "ben client ...". By default, remote_socket is a Unix-domain socket placed in a directory that ssh will not create on its own. As a result, the first time one uses ben client -r on a machine, multiple round-trips will be necessary to first create the directory, then create the socket.
[3] Interrupted jobs are requeued at the beginning of the queue: they will be run before the other jobs currently pending in the queue. Note that the same happens if a client is unintentionally interrupted (with SIGKILL for example): its running jobs are requeued.