PBS and Maui

Introduction

The Portable Batch System (PBS) was originally developed by NASA. It consists of a resource manager, a scheduler, and clients. The resource manager and the scheduler usually run on the server (head node), while the clients run on the computing nodes. Both the commercial PBS Pro and the open-source Torque are available; only Torque is considered here.

As their names suggest, the resource manager handles resource management and configuration: it checks the availability of resources such as CPU and memory and assigns them to jobs. The scheduler accepts jobs submitted by users and maintains one or more queues.

The built-in scheduler is a simple FIFO scheduler, but it can be replaced by another one. Maui, for instance, is a popular and powerful scheduler that supports advanced, customizable scheduling policies.

Installation

On Arch Linux, Torque and Maui can be installed from the AUR, for example with yaourt.

yaourt torque
...
yaourt maui
...

Configuration

Server (head node)

/var/spool/torque/server_priv/nodes

node1 np=2
node2 np=4
...
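
Each line of the nodes file names one computing node and the number of processors (np) it offers. For more than a handful of nodes the file can be generated rather than typed. The following is a minimal sketch; the helper name render_nodes and its "host:np" argument format are assumptions made for illustration and are not part of Torque.

```shell
#!/bin/sh
# render_nodes: hypothetical helper that turns "host:np" pairs into
# lines in the server_priv/nodes format ("host np=N").
render_nodes() {
    for pair in "$@"; do
        host=${pair%%:*}   # text before the first colon
        np=${pair#*:}      # text after the first colon
        printf '%s np=%s\n' "$host" "$np"
    done
}

# Example with the two nodes from the listing. On the head node the
# output would be redirected (as root) to
# /var/spool/torque/server_priv/nodes.
render_nodes node1:2 node2:4
```

The function only prints to standard output, so the result can be inspected before it replaces the real file.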

PBS/Torque server

qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue COMMON
#
create queue COMMON
set queue COMMON queue_type = Execution
set queue COMMON resources_default.nodes = 1
set queue COMMON resources_default.walltime = 240:00:00
set queue COMMON enabled = True
set queue COMMON started = True
#
# Create and define queue VIP
#
create queue VIP
set queue VIP queue_type = Execution
set queue VIP resources_default.nodes = 1
set queue VIP resources_default.walltime = 240:00:00
set queue VIP enabled = True
set queue VIP started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = head
set server default_queue = COMMON
set server log_events = 511
set server mail_from = server
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server mail_domain = example.com
set server keep_completed = 0
set server next_job_number = 1
set server moab_array_compatible = True
set server nppcu = 1
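
The listing is what qmgr prints for a server that is already configured. Because qmgr also reads directives from standard input, the same lines can be fed back to it to build a queue on a fresh server. A sketch under that assumption follows; the helper name queue_directives is hypothetical, and the attribute values are simply the ones from the listing.

```shell
#!/bin/sh
# queue_directives: hypothetical helper that prints the qmgr directives
# for one execution queue, using the values from the listing.
queue_directives() {
    q=$1
    cat <<EOF
create queue $q
set queue $q queue_type = Execution
set queue $q resources_default.nodes = 1
set queue $q resources_default.walltime = 240:00:00
set queue $q enabled = True
set queue $q started = True
EOF
}

# Print the directives for review. To apply them, run as root on the
# head node:  queue_directives VIP | qmgr
queue_directives VIP
```

Printing first and piping into qmgr as a separate step keeps the sketch safe to run on a machine without Torque.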

Maui

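Append the following lines to maui.cfg. Maui treats each Torque queue as a class, so these settings give jobs in VIP a far higher priority than jobs in COMMON.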

CLASSWEIGHT   1
CLASSCFG[COMMON]   PRIORITY=1
CLASSCFG[VIP]   PRIORITY=100000
CLASSLIST   [COMMON VIP]
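
Appending by hand works, but re-running a setup script would duplicate the lines. A sketch of an idempotent append follows. The helper name append_once is hypothetical, and because the location of maui.cfg depends on how Maui was installed, the file is passed as an argument; the demonstration writes to a temporary file rather than the real configuration.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats the brackets as literal text
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

cfg=$(mktemp)         # stand-in for the real maui.cfg
for pass in 1 2; do   # the second pass must not add anything
    append_once "$cfg" 'CLASSWEIGHT   1'
    append_once "$cfg" 'CLASSCFG[COMMON]   PRIORITY=1'
    append_once "$cfg" 'CLASSCFG[VIP]   PRIORITY=100000'
    append_once "$cfg" 'CLASSLIST   [COMMON VIP]'
done
cat "$cfg"
```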

Client (computing node)

/var/spool/torque/server_name

head

/var/spool/torque/mom_priv/config

$pbsserver head
$logevent 255
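
Both client-side files are short, so they are easy to write from a script when many nodes have to be set up. The sketch below is one way to do it; the helper name write_client_config and its prefix argument are assumptions for illustration. Note the spelling $pbsserver, which is the directive name pbs_mom expects. The demonstration writes into a scratch directory, so it needs no root access.

```shell
#!/bin/sh
# write_client_config PREFIX SERVER: hypothetical helper that writes the
# two client-side files under PREFIX (normally /var/spool/torque).
write_client_config() {
    prefix=$1
    server=$2
    mkdir -p "$prefix/mom_priv"
    printf '%s\n' "$server" > "$prefix/server_name"
    # Single quotes keep $pbsserver and $logevent literal.
    printf '$pbsserver %s\n$logevent 255\n' "$server" > "$prefix/mom_priv/config"
}

demo=$(mktemp -d)   # scratch directory standing in for /var/spool/torque
write_client_config "$demo" head
cat "$demo/mom_priv/config"
```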

Configure the services

Server (head node)

Resource manager (PBS/Torque server)

Create /etc/systemd/system/torque-server.service

[Unit]
Description=TORQUE server
Wants=basic.target
After=basic.target network.target

[Service]
Type=forking
PIDFile=/var/spool/torque/server_priv/server.lock
ExecStart=/usr/local/sbin/pbs_server

[Install]
WantedBy=multi-user.target

Scheduler [1]

Maui

Create /etc/systemd/system/torque-maui.service

[Unit]
Description=Maui scheduler
Wants=torque-server.service
After=torque-server.service

[Service]
Type=forking
PIDFile=/usr/local/maui/maui.pid
ExecStart=/usr/local/maui/sbin/maui

[Install]
WantedBy=multi-user.target

PBS scheduler

Create /etc/systemd/system/torque-scheduler.service

[Unit]
Description=TORQUE scheduler
Wants=torque-server.service
After=torque-server.service

[Service]
Type=forking
PIDFile=/var/spool/torque/sched_priv/sched.lock
ExecStart=/usr/local/sbin/pbs_sched

[Install]
WantedBy=multi-user.target

Client (computing node)

PBS client

Create /etc/systemd/system/torque-node.service

[Unit]
Description=TORQUE node
Wants=basic.target
After=basic.target network.target

[Service]
Type=forking
PIDFile=/var/spool/torque/mom_priv/mom.lock
ExecStart=/usr/local/sbin/pbs_mom

[Install]
WantedBy=multi-user.target

Enable and start the services

Server (head node)

systemctl enable torque-server.service
systemctl start torque-server.service
systemctl enable torque-maui.service
systemctl start torque-maui.service

Client (computing node)

systemctl enable torque-node.service
systemctl start torque-node.service
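
Once the services are running, a short test job can confirm that the server, the scheduler, and the nodes talk to each other. The sketch below writes a minimal job script and submits it only if qsub is on the PATH. The file name hello.pbs, the job name, and the resource requests are illustrative assumptions; the queue name COMMON is taken from the server configuration.

```shell
#!/bin/sh
# Write a minimal job script. Lines starting with "#PBS" are read by qsub
# as submission options; to the shell they are ordinary comments.
job=${TMPDIR:-/tmp}/hello.pbs
cat > "$job" <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -q COMMON
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
# PBS_O_WORKDIR is set by Torque to the directory qsub was run from.
cd "${PBS_O_WORKDIR:-.}"
echo "running on $(hostname)"
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub "$job"      # prints the id of the new job
    qstat            # lists jobs known to the server
    pbsnodes -a      # lists every node together with its state
else
    echo "qsub not found; job script left at $job"
fi
```

If the job stays queued and never runs, the scheduler service and the node states reported by pbsnodes are the first things to check.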

Footnotes:

[1] Only one scheduler is needed: either the Maui scheduler or the built-in PBS scheduler.