Keep a program running after logoff

Shared clusters often have job schedulers to handle job submissions from multiple users. On the cloud, however, the entire server belongs to you, so there’s generally no need for a scheduler.

Note

Have multiple jobs? Why schedule them? Just launch multiple instances and run all of them at the same time. For the same instance type, running 5 instances for 1 hour costs the same as running 1 instance for 5 hours. The former approach, however, finishes in one fifth of the wall-clock time (an 80% saving) without incurring any additional charges.

Thus, instead of using qsub (with PBS) or sbatch (with Slurm), you simply run the executable ./geos.mp from the terminal. To keep the program running after you log off or your internet connection drops, use a simple tool such as the nohup command, GNU screen, or tmux. I personally like tmux because it is very easy to use and also supports advanced terminal management, such as splitting one window into several panes, if needed. It is also quite useful for managing other time-consuming computations such as big data processing or training machine learning models, so it is worth learning.
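For the simplest of these options, nohup, a minimal sketch looks like the following. It assumes you are in the run directory that contains ./geos.mp. The log file name run.log is just an example.

```shell
# Start the model in the background, ignoring the hangup signal (SIGHUP)
# that processes started from a terminal can receive when you log off.
# stdout and stderr both go to run.log; the trailing & frees the terminal.
nohup ./geos.mp > run.log 2>&1 &

# $! holds the process ID of the job just started; note it down
echo "model started with PID $!"
```

You can follow progress with tail -f run.log and check whether the run is still alive with ps -p followed by that PID. Unlike screen or tmux, nohup gives you no way to return to an interactive terminal for the run. You only get the log file.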

Use GNU Screen

The screen command creates terminal sessions that persist after logoff. Here’s a nice tutorial offered by Harvard Research Computing.

Start a screen session with any name you like:

$ screen -S run-geoschem

Inside the screen session, run the model as usual:

$ ./geos.mp | tee run.log

(Here I use tee to send the model log to both the terminal and a file.)
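A plain pipe only passes standard output to tee. Any messages the program writes to standard error still show up on the terminal, but they do not reach the log file. The sketch below demonstrates this with a hypothetical stand-in function, fake_model, in place of ./geos.mp.

```shell
# Hypothetical stand-in for ./geos.mp: one line to stdout, one to stderr
fake_model() {
  echo "timestep 1 done"
  echo "WARNING: example message" >&2
}

# Plain pipe: only stdout goes through tee, so the warning appears
# on the terminal but is NOT written to run.log
fake_model | tee run.log

# 2>&1 merges stderr into stdout before the pipe, so run_all.log gets both lines
fake_model 2>&1 | tee run_all.log
```

For the real run, the second form becomes ./geos.mp 2>&1 | tee run.log.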

Press Ctrl + a, then d, to detach from the current session. You will return to the normal terminal, while the model keeps running inside the detached session. You can now log off the server and log back in later.

List existing sessions with:

$ screen -ls
There is a screen on:
      13279.run-geoschem      (03/12/2018 12:25:39 AM)        (Detached)
1 Socket in /var/run/screen/S-ubuntu.

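Resume that session with:

$ screen -x run-geoschem

(The -x flag attaches to the session even if it is still attached elsewhere. For a detached session, screen -r run-geoschem also works.)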
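You can also script the interactive steps above, which may help when launching runs on several instances at once. Here is a minimal sketch. The session name demo-run and the placeholder command sleep 300 are hypothetical stand-ins for your own session name and model command. The sketch also shows how to close a session from outside with -X quit, which stops whatever is still running inside it.

```shell
# Start a session that is detached from the beginning (-d -m), named with -S.
# "sleep 300" stands in for the model so this runs anywhere screen is installed;
# for a real run the command would be, e.g., bash -c './geos.mp | tee run.log'
screen -dmS demo-run sleep 300

# The new session appears in the list just like an interactively created one
screen -ls | grep demo-run

# Send the "quit" command to the session, which terminates it and the
# program running inside it
screen -S demo-run -X quit
```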