Division of Computing and Information Systems

PBS – Portable Batch System Scheduler

Definition and Primary Roles

Definition: PBS is a distributed workload management system. It manages and monitors the computational workload on a set of computers.

Queuing: Users submit tasks or “jobs” to the resource management system, where they are queued up until the system is ready to run them.

Scheduling: The process of selecting which jobs to run, when, and where, according to a predetermined policy. The aim is to balance competing needs and goals on the system(s) to maximize efficient use of resources.

Monitoring: Tracking and reserving system resources and enforcing usage policy. This includes both software enforcement of usage limits and user or administrator monitoring of scheduling policies.

 

Submitting jobs to PBS: qsub command

The qsub command submits a batch job to PBS. Submitting a PBS job specifies a task, requests resources, and sets job attributes, all of which can be defined in an executable script file. The qsub syntax recommended on ZEUS is:

> qsub [options] scriptfile

PBS script files (PBS shell scripts, described in the section on script structure) should be created in the user’s directory.

To obtain detailed information about qsub options, please use the command:

> man qsub

Job Identifier (JOB_ID): Upon successful submission of a batch job, PBS returns a job identifier in the following format:

> sequence_number.server_name
> 12345.zeus
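
The two parts of the identifier can be separated with plain shell parameter expansion, which is handy when a later command needs only the sequence number. The following is a minimal sketch: the function names job_sequence and job_server and the script name my_script.pbs are hypothetical, and the sample value is the one shown above.

```shell
#!/bin/sh
# Split a PBS job identifier of the form sequence_number.server_name.

job_sequence() {
    # Remove the first dot and everything after it.
    printf '%s\n' "${1%%.*}"
}

job_server() {
    # Remove everything up to and including the first dot.
    printf '%s\n' "${1#*.}"
}

# On the cluster the identifier would typically be captured at submission:
#   JOB_ID=$(qsub my_script.pbs)    # my_script.pbs is a hypothetical name
JOB_ID="12345.zeus"                 # sample identifier

echo "sequence number: $(job_sequence "$JOB_ID")"
echo "server name:     $(job_server "$JOB_ID")"
```

Saving the identifier in a variable at submission time makes it available to the status and deletion commands without copying it by hand.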

The PBS shell script sections

Shell specification: #!/bin/sh

PBS directives: used to request resources or set attributes. A directive begins with the default string “#PBS”.

Tasks (programs or commands)

– environment definitions
– I/O specifications
– executable specifications

NB: Other lines starting with # are comments.
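
To see how the sections are told apart, the sketch below prints only the directive lines of a script, that is, the lines beginning with the default string “#PBS”. The function name list_pbs_directives and the demonstration script are hypothetical, and the sketch assumes the default directive prefix has not been changed.

```shell
#!/bin/sh
# Print only the PBS directive lines of a script (lines starting with "#PBS").
list_pbs_directives() {
    grep '^#PBS' "$1"
}

# Build a small demonstration script in a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
#PBS -N demo_job
#PBS -l walltime=00:10:00
# an ordinary comment, not a directive
echo "task starts here"
EOF

# Prints the two "#PBS" lines and skips the shell line, comment, and task.
list_pbs_directives "$demo"
rm -f "$demo"
```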

Zeus Public Queues

The queues available for public use on compute nodes are zeus_all_q (24 h), zeus_long_q (72 h), and zeus_short_q (3 h).
There are 24 compute nodes, each with 80 cores and 378 GB RAM.
The queue available for public use on GPU nodes is gpu_v100_q.
There are two GPU nodes, each containing 4 GPUs.
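
Choosing a CPU queue can be expressed as a small lookup. The sketch below assumes that the hours listed next to each queue are maximum walltimes, which the text does not state explicitly, and picks the queue with the smallest limit that still fits the request. The function name pick_queue is hypothetical.

```shell
#!/bin/sh
# Pick the public CPU queue with the smallest limit that fits the request.
# Assumption: the hours listed per queue are maximum walltimes.
pick_queue() {
    hours=$1
    if [ "$hours" -le 3 ]; then
        echo zeus_short_q
    elif [ "$hours" -le 24 ]; then
        echo zeus_all_q
    elif [ "$hours" -le 72 ]; then
        echo zeus_long_q
    else
        echo "no public queue allows ${hours} h" >&2
        return 1
    fi
}

pick_queue 2     # prints zeus_short_q
pick_queue 48    # prints zeus_long_q
```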

PBS script example for multicore user code

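#!/bin/sh
# Job name and target queue
#PBS -N job_name
#PBS -q queue_name
# Send mail when the job aborts (a), begins (b), and ends (e)
#PBS -m abe
#PBS -M user@technion.ac.il
# One chunk with N cores and P GB of memory, for at most 24 hours
#PBS -l select=1:ncpus=N:mem=Pgb
#PBS -l walltime=24:00:00
# Run from the directory that holds the program and its input
PBS_O_WORKDIR=$HOME/mydir
cd $PBS_O_WORKDIR

./program.exe < input.file > output.file 2>&1

Here N is the number of cores and P is the amount of memory in GB.

For other examples, see
http://hpc.technion.ac.il/doc/Local-help/PBS-scripts/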

You can use the PBS script generator here
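
A script of this shape can also be produced from a few parameters. The following is only a sketch of the idea, not the site's generator: the function name make_pbs_script is hypothetical, and the resource line assumes the PBS Professional chunk syntax select=1:ncpus=N:mem=Pgb. It relies on PBS setting PBS_O_WORKDIR to the directory from which qsub was run.

```shell
#!/bin/sh
# Write a single-node PBS script to standard output.
# Arguments: job name, queue, cores, memory in GB, walltime (HH:MM:SS).
make_pbs_script() {
    cat <<EOF
#!/bin/sh
#PBS -N $1
#PBS -q $2
#PBS -l select=1:ncpus=${3}:mem=${4}gb
#PBS -l walltime=$5
cd \$PBS_O_WORKDIR
./program.exe < input.file > output.file 2>&1
EOF
}

# Show the generated script for a 4-core, 8 GB, one-hour job.
make_pbs_script test_job zeus_short_q 4 8 01:00:00

# On the cluster the output would be saved and submitted, for example:
#   make_pbs_script test_job zeus_short_q 4 8 01:00:00 > test_job.pbs
#   qsub test_job.pbs
```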

Checking the job/queue status: qstat command

The qstat command requests the status of batch jobs, queues, or servers.
Detailed information: > man qstat
qstat output structure: run the command on Zeus to see it.
Useful commands:
> qstat -a                     jobs of all users in all queues (default)
> qstat -1n                    all jobs in the system, with node names
> qstat -1nu username          all of the user’s jobs, with node names
> qstat -f JOB_ID              extended output for the job
> qstat -Q                     list of all queues in the system
> qstat -Qf queue_name         extended queue details
> qstat -1Gn queue_name        all jobs in the queue, with node names
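
The status command can also be polled from a script to wait until a job has left the system. This is a minimal sketch under one stated assumption: that qstat JOB_ID exits with a non-zero status once the job is no longer queued or running. The function name wait_for_job and the default polling interval are hypothetical.

```shell
#!/bin/sh
# Block until qstat no longer reports the given job.
# Assumption: qstat exits with a non-zero status once the job has left the system.
wait_for_job() {
    job_id=$1
    interval=${2:-60}    # seconds between polls, 60 by default
    while qstat "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job_id is no longer queued or running"
}

# Typical use on the cluster (not executed here):
#   wait_for_job 12345.zeus 120
```

A generous interval keeps the load on the PBS server low; polling every few seconds is rarely useful for batch jobs.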

Removing a job from a queue: qdel command

The qdel command deletes queued or running jobs. The job’s running processes are killed. A PBS job may be deleted by its owner or by the administrator.

Detailed information: > man qdel
Useful commands:
> qdel JOB_ID                  delete the job from a queue
> qdel -W force JOB_ID         force deletion of the job

Checking job results and troubleshooting

Save the JOB_ID for further inspection.
Check the error and output files:
job_name.eJOB_ID; job_name.oJOB_ID (the suffix is the sequence number part of JOB_ID)
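
The file names can be derived from the job name and the saved identifier. The sketch below assumes the default PBS naming, in which the suffix after .o and .e is the sequence number rather than the full JOB_ID. The function name job_files is hypothetical.

```shell
#!/bin/sh
# Print the default output and error file names for a job.
# Assumption: PBS appends only the sequence number part of JOB_ID.
job_files() {
    name=$1
    number=${2%%.*}    # drop ".server_name" from the identifier
    printf '%s.o%s\n%s.e%s\n' "$name" "$number" "$name" "$number"
}

# Prints job_name.o12345 and job_name.e12345 on separate lines.
job_files job_name 12345.zeus
```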

Inspect the job’s details (including jobs from up to N days back): > tracejob [-n N] JOB_ID

A job in the E (exiting) state still occupies resources and will be deleted.

Running an interactive batch job (for debugging): > qsub -I pbs_script
The job is sent to an execution node, the PBS directives are processed, and the job then awaits the user’s commands.

Checking the job on an execution node: > ssh node_name
> hostname
> top                          inside top, press u and enter a user name to show that user’s processes; press 1 to show per-CPU usage
> kill -9 PID                  remove the job’s process from the node
> ls -rtl /gtmp                check error, output, and other files owned by the user

Output can be copied from the node to the home directory.

© All rights reserved to Division of Computing and Information Systems