Keep a program running after logoff
Shared clusters often have job schedulers to handle multiple users’ job submissions. On the cloud, however, the entire server belongs to you so there’s generally no need for a scheduler.
Have multiple jobs? Why schedule them? Just launch multiple instances to run all of them at the same time. Since you pay per instance-hour, running 5 instances for 1 hour costs the same as running 1 instance for 5 hours, but the former approach cuts the wall-clock time by 80% without incurring any additional charges.
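The arithmetic behind that claim can be checked with a few lines of shell. This is only a sketch: the hourly price is a made-up number, and it assumes billing is strictly proportional to instance-hours with no per-instance overhead.

```shell
# Hypothetical numbers, for illustration only.
price_cents_per_hour=10   # assumed hourly price of one instance, in cents
jobs=5                    # number of independent jobs, each taking 1 hour

serial_hours=$(( jobs * 1 ))   # 1 instance runs the jobs back to back
parallel_hours=1               # 5 instances, one job each

# Cost = number of instances * hours per instance * hourly price
serial_cost=$(( 1 * serial_hours * price_cents_per_hour ))
parallel_cost=$(( jobs * parallel_hours * price_cents_per_hour ))

# Percentage of wall-clock time saved by running in parallel
time_saved_pct=$(( 100 * (serial_hours - parallel_hours) / serial_hours ))

echo "serial:   ${serial_hours} h, ${serial_cost} cents"
echo "parallel: ${parallel_hours} h, ${parallel_cost} cents"
echo "wall-clock time saved: ${time_saved_pct}%"
```

Both costs come out equal (5 instance-hours either way), while the parallel run finishes in one fifth of the time.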
Thus, instead of using qsub (with PBS) or sbatch (with Slurm), you would simply run the executable ./geos.mp from the terminal. To keep the program running after logoff or an internet interruption, use simple tools such as the nohup command, GNU Screen, or tmux. I personally like tmux as it is very easy to use and also allows advanced terminal management if needed. It is also quite useful for managing other time-consuming computations such as big data processing or training machine learning models, so it is worth learning.
Use nohup command (not recommended)
nohup is a built-in Linux command that lets a program keep running after you log off, by making it ignore the hangup signal (SIGHUP). This works but is not recommended, since monitoring nohup jobs is kind of a mess. Instead, use tmux as detailed in the tmux section below. I include the basic nohup commands here just for the record.
Start the simulation under nohup:
$ nohup ./geos.mp > run.log &
nohup: ignoring input and redirecting stderr to stdout
Press Ctrl + c to go back to the normal terminal prompt. Use tail -f run.log to monitor the log file if necessary. You can log off and log back in to the server if you like.
List your processes, including the nohup job, with:
$ ps x
13067 pts/0 Rl 4:56 ./geos.mp
If necessary, kill the job by its process ID (the first column of the ps output). In this case, it is:
$ kill 13067
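Instead of looking the ID up with ps, you can record it when the job is launched. The sketch below uses sleep as a stand-in for ./geos.mp so that it runs anywhere, and run.pid is a hypothetical file name chosen for illustration.

```shell
# Stand-in for ./geos.mp (assumption, so the sketch can run anywhere).
nohup sleep 30 > run.log 2>&1 &
job_pid=$!                  # $! holds the PID of the most recent background job
echo "$job_pid" > run.pid   # hypothetical file name; keeps the PID across logins

# Later, possibly after logging back in: "kill -0" sends no signal,
# it only tests whether the process still exists.
if kill -0 "$(cat run.pid)" 2>/dev/null; then
    echo "job $(cat run.pid) is still running"
fi

# Stop the job if necessary.
kill "$(cat run.pid)"

# "wait" only works in the shell that launched the job; it returns the
# job's (non-zero) exit status after a kill, hence the "|| true".
wait "$job_pid" 2>/dev/null || true
```

Saving the PID to a file avoids guessing which line of the ps output belongs to which run when several jobs are active.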
Use GNU Screen
The screen command creates terminal sessions that persist after logoff. Here’s a nice tutorial offered by Harvard Research Computing.
Start a screen session with any name you like:
$ screen -S run-geoschem
Inside the screen session, run the model as usual:
$ ./geos.mp | tee run.log
(Here I use tee to print the model log to both the terminal screen and a file.)
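One detail worth knowing: a plain pipe only passes standard output to tee, so anything the program writes to standard error shows up on the terminal but not in the file. The sketch below illustrates this with fake_model, a hypothetical stand-in for ./geos.mp; whether the real model writes anything to standard error is not something this example assumes.

```shell
# Hypothetical stand-in for ./geos.mp: one line to stdout, one to stderr.
fake_model() {
    echo "timestep 1 done"
    echo "warning: something odd" >&2
}

# Plain pipe: only stdout reaches tee, so the warning is not in the file.
fake_model | tee stdout_only.log

# With 2>&1, stderr is merged into stdout before the pipe,
# so both lines end up in the file.
fake_model 2>&1 | tee both.log
```

If you want error messages preserved in the log, the second form is the safer default.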
Press Ctrl + a, and then type d, to detach from the current session. You will be back at the normal terminal, but the model is still running inside the detached session. You can log off the server and log back in if you like.
List existing sessions by:
$ screen -ls
There is a screen on:
13279.run-geoschem (03/12/2018 12:25:39 AM) (Detached)
1 Socket in /var/run/screen/S-ubuntu.
Resume that session by:
$ screen -x run-geoschem
Use tmux (recommended)
The tmux command behaves almost the same as screen for single-pane sessions. But it is also useful for splitting one terminal window into multiple panes (tons of quick tutorials online, say this, and this). screen also does terminal splitting but is not as convenient as tmux.
Start a new session by:
$ tmux
Inside the session, run the model as usual, just like in the screen example:
$ ./geos.mp | tee run.log
Press Ctrl + b, and then type d, to detach from the current session. Use tmux ls to list existing sessions and tmux a (shortcut for tmux attach) to resume the session.
To handle multiple sessions, use tmux new -s session_name to create a session with a name and tmux a -t session_name to resume that specific session.
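Named sessions also make it possible to drive tmux from a script without ever attaching. The following is a minimal sketch, not part of the original workflow: it starts a detached session, checks that it exists, and cleans up. The sleep command is a stand-in for ./geos.mp | tee run.log, and the guard is there only because tmux may not be installed on every machine.

```shell
session=run-geoschem   # any session name you like

if command -v tmux >/dev/null 2>&1; then
    # -d starts the session detached; the quoted command stands in for
    # './geos.mp | tee run.log' (assumption, so the sketch runs anywhere).
    tmux new-session -d -s "$session" 'sleep 30'

    # has-session exits with status 0 if the named session exists.
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "session $session is running"
    fi

    tmux ls                           # list existing sessions
    tmux kill-session -t "$session"   # clean up the demo session
else
    echo "tmux is not installed on this machine"
fi
```

Note that a session started this way closes as soon as its command exits, so writing the model output to a file with tee is what preserves the log.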