Triton LSF Commands

Common LSF commands and descriptions:

Command             Purpose
bsub                Submits a job to LSF. Define resource requirements with flags.
bsub < scriptfile   Submits a job to LSF via a script file. The redirection symbol < is required when submitting a job script file.
bjobs               Displays your running and pending jobs.
bhist               Displays historical information about your finished jobs.
bkill               Removes/cancels a job or jobs from the queue.
bqueues             Shows the current configuration of queues.
bhosts              Shows the load on each node.
bpeek               Displays stderr and stdout from your unfinished job.

Scheduling Jobs

The command bsub submits a job for processing. You must include the information LSF needs to allocate the resources your job requires, handle standard I/O streams, and run the job. For more information about flags, type bsub -h at the Triton prompt. Detailed information can be displayed with man bsub. On submission, LSF returns the job ID, which can be used to keep track of your job.

[username@mgt3.summit ~]$ bsub -J jobname -o %J.out -e %J.err -q normal -P myproject myprogram
Job <2607> is submitted to queue <normal>.

The Job Scripts section has more information about organizing multiple flags into a job script file for submission.
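
Because LSF prints the job ID on submission, a script can capture it for later use with bjobs, bpeek, or bkill. The following is a minimal sketch of one way to do that. The function name extract_jobid is a hypothetical helper, not an LSF command, and it assumes only that the confirmation line begins with Job <ID>.

```shell
# extract_jobid: hypothetical helper (not part of LSF) that reads the
# bsub confirmation line on stdin and prints only the numeric job ID.
extract_jobid() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Demonstration on a sample confirmation line.
echo 'Job <2607> is submitted to queue <normal>.' | extract_jobid
```

On the cluster, the same helper could be used as jobid=$(bsub < scriptfile | extract_jobid), after which $jobid can be passed to bjobs -l, bpeek, or bkill.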

Monitoring Jobs

bjobs

The command bjobs displays information about your own pending, running, and suspended jobs.

[username@mgt3.summit ~]$ bjobs
JOBID  USER     STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
4225   usernam  RUN   normal  mgt3       16*t030    testjob   Mar  2 11:53
                                         16*t031
                                         16*t032
                                         16*t033

For details about a particular job, issue the command bjobs -l jobID, where jobID is taken from the JOBID field of the bjobs output above. To display a specific user's jobs, use bjobs -u username. To display all users' jobs one page at a time, pipe the output to less:

[username@mgt3.summit ~]$ bjobs -u all | less
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
5990529   axt651  RUN   interactiv mgt3        t035        bash       Feb 13 15:23
6010636   zxh69   RUN   normal     mgt3        16*t030    *acsjob-01  Feb 23 11:36
                                               16*t031
                                               16*t032
                                               16*t033
6014246   swishne RUN   interactiv t034.mgt3   t034        bash       Feb 24 14:10
...
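
When the job list is long, a per-state count can be easier to read than the full listing. The sketch below tallies jobs by the STAT column. The helper name count_by_state is hypothetical, and the sample input is made up for illustration (it includes a PEND job that does not appear in the listing above).

```shell
# count_by_state: hypothetical helper that reads bjobs output on stdin
# and prints the number of jobs in each state (RUN, PEND, ...).
# Only lines that start with a numeric job ID are counted, which skips
# the header and the continuation lines listing extra execution hosts.
count_by_state() {
    awk '$1 ~ /^[0-9]+$/ { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demonstration on made-up sample text.
count_by_state <<'EOF'
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
5990529   axt651  RUN   interactiv mgt3        t035        bash       Feb 13 15:23
6010636   zxh69   RUN   normal     mgt3        16*t030    *acsjob-01  Feb 23 11:36
                                               16*t031
6014246   swishne PEND  normal     mgt3        -           bash       Feb 24 14:10
EOF
```

On the cluster, the equivalent would be bjobs -u all | count_by_state.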

bhist

bhist displays information about your recently finished jobs. CPU time is not normalized in bhist output. To see your finished and unfinished jobs, use bhist -a.

bkill

By default, bkill kills the last job submitted by the user running the command. The command bkill jobID removes a specific job from the queue and terminates it if it is running. bkill 0 kills all jobs belonging to the current user.

[username@mgt3.summit ~]$ bkill 4225
Job <4225> is being terminated

On Triton (Unix), SIGINT and SIGTERM are sent to give the job a chance to clean up before termination, then SIGKILL is sent to kill the job.
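
A job script can use the interval before SIGKILL to remove temporary files. The fragment below is a minimal sketch of that pattern using the shell's trap builtin. The scratch directory and the file name partial.dat are hypothetical placeholders, not paths used on Triton.

```shell
#!/bin/sh
# Sketch of a job script fragment that cleans up when LSF signals it.
# The scratch directory is a hypothetical example created with mktemp.
workdir=$(mktemp -d)

cleanup() {
    # Remove temporary files created by this job.
    rm -rf "$workdir"
}

# SIGINT and SIGTERM arrive before SIGKILL; handle the first two by
# cleaning up and exiting with a nonzero status.
trap 'cleanup; exit 1' INT TERM

# Placeholder for the job's real work.
echo "intermediate data" > "$workdir/partial.dat"

# Normal completion: clean up as well.
cleanup
```

SIGKILL cannot be caught, so the cleanup handler should finish quickly.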

bqueues

bqueues displays information about queues such as queue name, queue priority, queue status, job slot statistics, and job state statistics. CPU time is normalized by CPU factor.

[username@mgt3.summit ~]$ bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
admin            50  Open:Active       -    -    -    -     0     0     0     0
owners           43  Open:Active       -    -    -    -     0     0     0     0
priority         43  Open:Active       -    -    -    -     0     0     0     0
night            40  Open:Inact        -    -    -    -     0     0     0     0
short            35  Open:Active       -    -    -    -     0     0     0     0
dataq            33  Open:Active       -    -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -     0     0     0     0
interactive      30  Open:Active       -    -    -    -     1     0     1     0
idle             20  Open:Active       -    -    -    -     0     0     0     0
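
To pick a queue for submission, it can help to list only the queues that are currently accepting and dispatching jobs. The sketch below filters on the STATUS column. The helper name active_queues is hypothetical, and the sample input is a shortened copy of the listing above.

```shell
# active_queues: hypothetical helper that reads bqueues output on stdin
# and prints the names of queues whose STATUS column is Open:Active.
active_queues() {
    awk 'NR > 1 && $3 == "Open:Active" { print $1 }'
}

# Demonstration on two rows from the listing above.
active_queues <<'EOF'
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
night            40  Open:Inact        -    -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -     0     0     0     0
EOF
```

On the cluster, the equivalent would be bqueues | active_queues.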

bhosts

bhosts displays information about all hosts, such as host name, host status, job state statistics, and job slot limits. bhosts -s displays information about numeric resources (shared or host-based) and their associated hosts. bhosts hostname displays information about an individual host, and bhosts -w displays more detailed host status. A status of closed_Full means the configured maximum number of running jobs has been reached: running jobs are not affected, but no new jobs will be assigned to the host.

[username@mgt3.summit ~]$ bhosts -w | less
HOST_NAME          STATUS          JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
mgt3               ok              -     32      1      1      0      0      0
t030               ok              -     40      0      0      0      0      0
t031               ok              -     40      0      0      0      0      0
t032               ok              -     40      0      0      0      0      0
t033               ok              -     40      0      0      0      0      0
t034               ok              -     40      0      0      0      0      0
t035               ok              -     40      0      0      0      0      0
t036               ok              -     40      0      0      0      0      0
t037               ok              -     40      0      0      0      0      0
t038               ok              -     40      0      0      0      0      0
t039               ok              -     40      0      0      0      0      0
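
The MAX and NJOBS columns can be combined to estimate how many job slots remain on each host. The sketch below does this for hosts in ok status. The helper name free_slots is hypothetical, and the closed_Full row in the sample input is made up for illustration.

```shell
# free_slots: hypothetical helper that reads `bhosts -w` output on stdin
# and prints each host in "ok" status with MAX - NJOBS free job slots.
# Hosts with no slot limit (MAX shown as "-") are skipped.
free_slots() {
    awk 'NR > 1 && $2 == "ok" && $4 != "-" { print $1, $4 - $5 }'
}

# Demonstration; the closed_Full row is a made-up example.
free_slots <<'EOF'
HOST_NAME          STATUS          JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
mgt3               ok              -     32      1      1      0      0      0
t030               ok              -     40      0      0      0      0      0
t031               closed_Full     -     40     40     40      0      0      0
EOF
```

On the cluster, the equivalent would be bhosts -w | free_slots.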

bpeek

Use bpeek jobID to monitor the progress of a job and identify errors. If errors are observed, terminating the erroneous job with bkill jobID saves user time and system resources. By default, bpeek displays the standard output and standard error produced by one of your unfinished jobs, up to the time the command is invoked. bpeek -q queuename operates on your most recently submitted job in that queue, and bpeek -m hostname operates on your most recently submitted job dispatched to the specified host. bpeek -f jobID displays live output from a running job; press Ctrl-C to stop following the output (the job itself keeps running).
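
Scanning bpeek output for error messages can be scripted. The sketch below prints the first line that mentions "error" and reports through its exit status whether one was found. The helper name first_error is hypothetical, the sample lines are made up, and a real job may need a different search pattern.

```shell
# first_error: hypothetical helper that reads job output on stdin and
# prints the first line containing "error" (case-insensitive).
# Exit status is 0 if such a line was found, 1 otherwise.
first_error() {
    awk 'tolower($0) ~ /error/ { print; found = 1; exit } END { exit !found }'
}

# Demonstration on made-up job output.
printf '%s\n' 'step 1 ok' 'MPI ERROR: rank 3 aborted' 'step 2 ok' | first_error
```

On the cluster, bpeek jobID | first_error could be used to decide whether a job is worth terminating with bkill jobID.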

Examining Job Output

Once your job has completed, examine the contents of your job’s output files. Note the script submission under User input, whether the job completed, and the Resource usage summary.

[nra20@mgt3.summit ~]$ cat 391.out
Sender: LSF System <lsfadmin@t037>
Subject: Job 391: <mpi_hello_world> in cluster <t1> Done

Job <mpi_hello_world> was submitted from host <mgt3> by user <nra20> in cluster <t1> at Wed Apr  9 10:22:26 2025
Job was executed on host(s) <4*t037>, in queue <normal>, as user <nra20> in cluster <t1> at Wed Apr  9 10:04:52 2025
                            <4*t030>
                            <4*t039>
</projectnb/triton/home/nra20> was used as the home directory.
</scratch/projects/hpc/nra20/mpi_test> was used as the working directory.
Started at Wed Apr  9 10:04:52 2025
Terminated at Wed Apr  9 10:05:08 2025
Results reported at Wed Apr  9 10:05:08 2025

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/sh
#BSUB -P hpc
#BSUB -J mpi_hello_world
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal
#BSUB -n 12
#BSUB -R "span[ptile=4]"
#BSUB -R "rusage[mem=128M]"

module load spectrum-mpi/10.4.0.6-20230210

mpirun -n 12 ./mpi_hello_world

------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   7.67 sec.
    Max Memory :                                 34 MB
    Average Memory :                             24.75 MB
    Total Requested Memory :                     384.00 MB
    Delta Memory :                               350.00 MB
    Max Swap :                                   -
    Max Processes :                              5
    Max Threads :                                9
    Run time :                                   14 sec.
    Turnaround time :                            0 sec.

The output (if any) follows:

Hello world from processor t037, rank 0 out of 12 processors
Hello world from processor t037, rank 1 out of 12 processors
Hello world from processor t037, rank 2 out of 12 processors
Hello world from processor t037, rank 3 out of 12 processors
Hello world from processor t030, rank 5 out of 12 processors
Hello world from processor t039, rank 10 out of 12 processors
Hello world from processor t030, rank 6 out of 12 processors
Hello world from processor t039, rank 11 out of 12 processors
Hello world from processor t030, rank 7 out of 12 processors
Hello world from processor t039, rank 8 out of 12 processors
Hello world from processor t039, rank 9 out of 12 processors
Hello world from processor t030, rank 4 out of 12 processors


PS:

Read file <391.err> for stderr output of this job.
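
The Resource usage summary can be used to check whether a job's memory request was well sized. The sketch below extracts the peak and requested memory and prints the difference, which corresponds to the Delta Memory line in the output above. The helper name mem_summary is hypothetical, and it assumes both values are reported in MB as they are here.

```shell
# mem_summary: hypothetical helper that reads an LSF job output file on
# stdin and prints the peak and requested memory from the
# "Resource usage summary" section, plus the unused difference.
# Assumes both values are reported in MB.
mem_summary() {
    awk -F ':' '
        /Max Memory/             { max = $2 + 0 }
        /Total Requested Memory/ { req = $2 + 0 }
        END { printf "max=%d MB requested=%d MB unused=%d MB\n", max, req, req - max }
    '
}

# Demonstration on the two relevant lines from the output above.
mem_summary <<'EOF'
    Max Memory :                                 34 MB
    Total Requested Memory :                     384.00 MB
EOF
```

On the cluster, the equivalent would be mem_summary < 391.out. A large unused value suggests the rusage[mem=...] request could be lowered for similar jobs.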