LoadLeveler Basic Usage Guide


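The operational system at my workplace uses LoadLeveler to manage jobs. The runtime environment and helper scripts are already configured, so in general only the command-line commands are needed.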

Basic LoadLeveler commands

  • llsubmit: submit a job
  • llq: show job status
  • llcancel: cancel a job
  • llcancel -u: cancel a user's jobs
  • llstatus: show node status
  • llclass: show class information

### llsubmit: submit a job

In the SMS system, jobs are submitted through a wrapper script, llsubmit2, which I have not used yet.

#!/bin/ksh  
# llsubmit2 jobname taskname ## host  
set -ex  
SUBMITLOG=$WORKDIR/sublog/submit.log  
test -d $WORKDIR/sublog ||mkdir -p $WORKDIR/sublog  
export SUBMITLOG  
if [ $# -ne 2 ] ; then  
echo …  
exit 1  
fi  
jobname=$1  
taskname=$2  
#echo $jobname $taskname >>$SUBMITLOG  
# host=$2  
#  
# host=sp04n01  
#  
# rsh $host llsubmit - < $jobname  
if [[ $USER = nwp ]] ; then  
nameofsms=nwpc_op  
fi  
if [[ $USER = nwp_qu ]] ; then  
nameofsms=nwpc_qu  
fi  
if [[ $USER = nwp_sp ]] ; then  
nameofsms=nwpc_sp  
fi  
if [[ $USER = nwp_ex ]] ; then  
nameofsms=nwpc_ex  
fi  
if [[ $USER = nwp_xp ]] ; then  
nameofsms=nwpc_xp  
fi  
name=$(llsubmit $jobname 2>>$SUBMITLOG)  
rid=$(echo $name | cut -d '"' -f 2)  
cdp <<EOF  
login $nameofsms $USER 1  
alter -v $taskname SMSRID $rid  
exit  
EOF  
#rsh llq  
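
The last steps of the wrapper can be tried in isolation. The sketch below assumes that llsubmit confirms a submission with a message of the form `llsubmit: The job "<id>" has been submitted.` (the sample message is illustrative), and it rewrites the chain of `if` tests as a `case` statement. The function names `sms_name_for_user` and `extract_rid` are made up for this example and are not part of llsubmit2.

```shell
#!/bin/sh
# Map a login name to the SMS name used for the cdp login,
# mirroring the if-chain in llsubmit2.
sms_name_for_user() {
    case "$1" in
        nwp)    echo nwpc_op ;;
        nwp_qu) echo nwpc_qu ;;
        nwp_sp) echo nwpc_sp ;;
        nwp_ex) echo nwpc_ex ;;
        nwp_xp) echo nwpc_xp ;;
        *)      return 1 ;;      # unknown user: nothing to log in to
    esac
}

# Return the text between the first pair of double quotes in the message.
extract_rid() {
    printf '%s\n' "$1" | cut -d '"' -f 2
}

# Illustrative message; in llsubmit2 the text comes from `llsubmit $jobname`.
msg='llsubmit: The job "cma19n02.38237" has been submitted.'
extract_rid "$msg"          # prints cma19n02.38237
sms_name_for_user nwp_qu    # prints nwpc_qu
```

Unlike the original if-chain, the `case` version returns a non-zero status for an unknown user, so a caller could stop before attempting the cdp login.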

### llq: show job status

llq lists the jobs in the queue.

Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
cma20n02.26337.0         zhangli     8/2  07:18 R  50  normal       cma11n24
cma19n02.26580.0         zhangli     8/2  07:27 R  50  normal       cma09n07
cma19n02.28964.0         yurc        8/3  03:43 R  50  normal       cma05n10
cma20n02.33084.0         xinyf       8/5  00:21 R  50  normal       cma18n13
cma19n02.36616.0         zhanglin    8/5  20:41 R  50  serial       cma18n06
cma19n02.36944.0         majzh       8/5  23:33 R  50  normal       cma01n26
cma20n02.36797.0         wangzzh     8/6  00:15 R  50  normal       cma05n03
cma19n02.37164.0         weimin      8/6  00:34 R  50  normal       cma15n10
cma20n02.37238.0         wangzzh     8/6  02:46 R  50  normal       cma07n03
cma19n02.37470.0         weimin      8/6  02:46 R  50  normal       cma11n28
cma19n02.37987.0         dengguo     8/6  05:48 R  50  normal       cma12n23
cma19n02.38103.0         wangzhl     8/6  06:18 R  50  normal       cma11n17
cma20n02.37872.0         dengguo     8/6  06:19 R  50  normal       cma02n15
cma20n02.37874.0         wutw        8/6  06:20 R  50  normal       cma09n25
cma20n02.37916.0         dengguo     8/6  06:34 R  50  normal       cma11n08
cma20n02.37922.0         wangzhl     8/6  06:36 R  50  normal       cma08n21
cma19n02.38159.0         wangzhl     8/6  06:39 R  50  normal       cma07n28
cma20n02.37937.0         chenlsh     8/6  06:41 R  50  normal       cma14n01
cma19n02.38185.0         typ_xp      8/6  06:47 R  50  normal       cma17n13
cma20n02.37969.0         quax        8/6  07:03 R  50  serial       cma19n05
cma20n02.37986.0         quax        8/6  07:06 R  50  normal       cma17n27
cma19n02.38218.0         zhanglin    8/6  07:06 R  50  normal       cma11n04
cma20n02.37987.0         quax        8/6  07:07 R  50  serial       cma19n07
cma19n02.38219.0         quax        8/6  07:07 R  50  serial       cma19n01
cma20n02.37988.0         quax        8/6  07:07 R  50  serial       cma20n01
cma19n02.38220.0         typ_xp      8/6  07:07 R  50  normal       cma12n04
cma19n02.38199.0         liuyzh      8/6  07:00 RP 50  normal
26 job step(s) in queue, 0 waiting, 0 pending, 26 running, 0 held, 0 preempted

The listing can be filtered by different criteria:

  • -u userlist: by user
  • -h hostlist: by host
  • -c classlist: by class

For example:

$ llq -u wangzhl
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
cma19n02.38103.0         wangzhl     8/6  06:18 R  50  normal       cma11n17
cma20n02.37922.0         wangzhl     8/6  06:36 R  50  normal       cma08n21
cma19n02.38159.0         wangzhl     8/6  06:39 R  50  normal       cma07n28
3 job step(s) in query, 0 waiting, 0 pending, 3 running, 0 held, 0 preempted
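
When the queue is long, it can help to summarise the listing instead of reading it row by row. The following is a minimal sketch that counts job steps per owner; it assumes the column layout shown above (ID in the first column, owner in the second), and `count_by_owner` is a hypothetical helper name.

```shell
#!/bin/sh
# Count job steps per owner from llq-style text on standard input.
count_by_owner() {
    awk '$1 ~ /\.[0-9]+\.[0-9]+$/ { n[$2]++ }    # job step rows only
         END { for (o in n) print o, n[o] }' | sort
}

# Sample rows in the llq format; in practice: llq | count_by_owner
count_by_owner <<'EOF'
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
cma19n02.38103.0         wangzhl     8/6  06:18 R  50  normal       cma11n17
cma20n02.37922.0         wangzhl     8/6  06:36 R  50  normal       cma08n21
cma20n02.37969.0         quax        8/6  07:03 R  50  serial       cma19n05
EOF
```

The header and separator lines are skipped because their first field does not end in the `.number.number` suffix of a job step ID.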

Use llq -l with a job step ID to show the detailed information for a single job:

$ llq -l cma19n02.38237.0
===== Job Step cma19n02.38237.0 =====
        Job Step Id: cma19n02.38237.0
           Job Name: cma19n02.38237
          Step Name: 0
  Structure Version: 10
              Owner: nwp
         Queue Date: Tue Aug  6 07:13:32 2013
             Status: Running
     Reservation ID:
  Requested Res. ID:
   Flexible Res. ID:
          Recurring: False
 Scheduling Cluster:
 Submitting Cluster:
    Sending Cluster:
  Requested Cluster:
     Schedd History:
   Outbound Schedds:
    Submitting User:
   Eligibility Time: Tue Aug  6 07:13:32 2013
      Dispatch Time: Tue Aug  6 07:13:32 2013
    Completion Date:
    Completion Code:
        Favored Job: No
      User Priority: 50
       user_sysprio: 0
      class_sysprio: 0
      group_sysprio: 0
    System Priority: -504520
          q_sysprio: -504520
 Previous q_sysprio: 0
      Notifications: Error
 Virtual Image Size: 15 kb
         Large Page: N
              Trace: no
         Coschedule: no
       SMT required: as_is
    MetaCluster Job: no
     Checkpointable: no
    Ckpt Start Time:
Good Ckpt Time/Date:
   Ckpt Elapse Time: 0 seconds
Fail Ckpt Time/Date:
    Ckpt Accum Time: 0 seconds
    Checkpoint File:
   Ckpt Execute Dir:
  Restart From Ckpt: no
 Restart Same Nodes: no
            Restart: no
        Preemptable: yes
 Preempt Wait Count: 0
     Hold Job Until:
     User Hold Time: 00:00:00 (0 seconds)
               RSet: RSET_NONE
Mcm Affinity Option:
      Task Affinity:
      Cpus Per Core:  0
   Parallel Threads:  0
                Cmd: /cma/g1/nwp/SMSOUT/gmf_ssi_v1/T213/00/vortex/vortex_relocat.job1
               Args:
                Env:
                 In: /dev/null
                Out: /cma/g1/nwp/SMSOUT/gmf_ssi_v1/T213/00/vortex/vortex_relocat.1
                Err: /cma/g1/nwp/SMSOUT/gmf_ssi_v1/T213/00/vortex/vortex_relocat.1.err
Initial Working Dir: /cma/u/nwp/smsworks/sms
         Dependency:
Data Stg Dependency:
          Resources:
     Node Resources:
     Step Resources:
       Requirements:
        Preferences:
          Step Type: Serial
     Min Processors:
     Max Processors:
     Allocated Host: cma18n03
         Node Usage: shared
    Submitting Host: cma20n03
        Schedd Host: cma19n02
      Job Queue Key:
        Notify User: nwp@cma20n03
              Shell: /usr/bin/ksh
  LoadLeveler Group: No_Group
              Class: serial
    Ckpt Hard Limit: undefined
    Ckpt Soft Limit: undefined
     Cpu Hard Limit: undefined
     Cpu Soft Limit: undefined
    Data Hard Limit: undefined
    Data Soft Limit: undefined
      As Hard Limit: undefined
      As Soft Limit: undefined
   Nproc Hard Limit: undefined
   Nproc Soft Limit: undefined
 Memlock Hard Limit: undefined
 Memlock Soft Limit: undefined
   Locks Hard Limit: undefined
   Locks Soft Limit: undefined
  Nofile Hard Limit: undefined
  Nofile Soft Limit: undefined
    Core Hard Limit: undefined
    Core Soft Limit: undefined
    File Hard Limit: undefined
    File Soft Limit: undefined
   Stack Hard Limit: undefined
   Stack Soft Limit: undefined
     Rss Hard Limit: undefined
     Rss Soft Limit: undefined
Step Cpu Hard Limit: undefined
Step Cpu Soft Limit: undefined
Wall Clk Hard Limit: 100+00:00:10 (8640010 seconds)
Wall Clk Soft Limit: 100+00:00:00 (8640000 seconds)
            Comment: T213
            Account:
         Unix Group: nwpop
Negotiator Messages:
      Bulk Transfer: No
Adapter Requirement:
          Step Cpus: 0
Step Virtual Memory: 0.000 mb
   Step Real Memory: 0.000 mb
Step Large Page Mem: 0.000 mb
     Cluster Option: none
Topology Requirement: none
     Network Usages:
Stripe Min Networks: False
    Monitor Program:
1 job step(s) in query, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
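
The long listing is easier to use in scripts when a single field is pulled out of it. The sketch below assumes the `label: value` layout shown above, with right-aligned labels; `llq_field` is a hypothetical helper, not a LoadLeveler command.

```shell
#!/bin/sh
# Print the value of one field from `llq -l` style text on standard input.
# $1 is the label exactly as printed, for example "Status".
llq_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop the alignment padding
            if (index(line, key ":") == 1) {    # label must start the line
                value = substr(line, length(key) + 2)
                sub(/^[ \t]+/, "", value)
                print value
                exit
            }
        }'
}

# Sample lines copied from the listing above;
# in practice: llq -l <job step id> | llq_field Status
llq_field 'Allocated Host' <<'EOF'
===== Job Step cma19n02.38237.0 =====
        Job Step Id: cma19n02.38237.0
             Status: Running
     Allocated Host: cma18n03
              Class: serial
EOF
```

Matching on the label at the start of the trimmed line keeps values that themselves contain colons, such as dates, intact.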

### llcancel: cancel jobs

llcancel jobid cancels the job with the given job ID.

llcancel -u username cancels all jobs that belong to the user.
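
To cancel only some of a user's jobs, the IDs can be taken from llq output and passed to llcancel as a list. The sketch below prints the resulting command instead of running it, so it can be checked first. It assumes the llq column layout shown earlier (class in the seventh column), and `cancel_cmd_for_class` is a hypothetical helper name.

```shell
#!/bin/sh
# Build (but do not run) an llcancel command for all job steps of one class.
# $1 is the class name; llq-style text is read from standard input.
cancel_cmd_for_class() {
    awk -v want="$1" '
        $1 ~ /\.[0-9]+\.[0-9]+$/ && $7 == want { ids = ids " " $1 }
        END { if (ids != "") print "llcancel" ids }'
}

# Sample rows copied from the llq listing; in practice:
#   llq -u quax | cancel_cmd_for_class serial
cancel_cmd_for_class serial <<'EOF'
cma20n02.37969.0         quax        8/6  07:03 R  50  serial       cma19n05
cma20n02.37986.0         quax        8/6  07:06 R  50  normal       cma17n27
cma20n02.37987.0         quax        8/6  07:07 R  50  serial       cma19n07
EOF
```

For the sample rows this prints `llcancel cma20n02.37969.0 cma20n02.37987.0`; nothing is printed when no job step matches.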

### llstatus: show node status

$ llstatus
Active       556/556
Schedd         2/2                 27 job steps
Startd       554/554             4327 running tasks
The Central Manager is defined on cma20n02
Absent:         0
Startd:      Down    Drained   Draining      Flush    Suspend
                0          0          0          0          0
Schedd:      Down    Drained   Draining
                0          0          0
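
The summary lines can also be checked by a script, for example from a monitoring job. The sketch below reads the `x/y` pairs and treats them as "available out of defined", which is an interpretation of the output above rather than something stated in it; `check_llstatus` is a hypothetical helper name.

```shell
#!/bin/sh
# Report every summary line whose x/y pair is not complete.
# Returns non-zero if at least one such line is found.
check_llstatus() {
    awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
             split($2, part, "/")
             if (part[1] != part[2]) {
                 print $1 ": " part[1] " of " part[2]
                 bad = 1
             }
         }
         END { exit bad + 0 }'
}

# Sample lines copied from the output above; in practice: llstatus | check_llstatus
sample='Active       556/556
Schedd         2/2                 27 job steps
Startd       554/554             4327 running tasks'

if printf '%s\n' "$sample" | check_llstatus; then
    echo "all counted daemons are available"
fi
```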

### llclass: show class information

$ llclass
Name                 MaxJobCPU     MaxProcCPU  Free   Max Description
                    d+hh:mm:ss     d+hh:mm:ss Slots Slots
--------------- -------------- -------------- ----- ----- ---------------------
tmp_largemem         undefined      undefined   256   256
test                 undefined      undefined   256   256
mediummem            undefined      undefined  1344  1344
lowmem               undefined      undefined  1344  1344
highmem              undefined      undefined  1344  1344
largemem             undefined      undefined  1600  1600
serial               undefined      undefined   478   480
normal               undefined      undefined  9979 9999+
operation            undefined      undefined   896   896
--------------------------------------------------------------------------------
"Free Slots" value of the class "normal" is constrained by the MAX_STARTERS limit(s).

The -l option shows the details of a class:

$ llclass -l operation
=============== Class operation ===============
                    Name: operation
                Priority: 0
           Exclude_Users:
           Include_Users: loadl nwp nwp_qu
          Exclude_Groups:
          Include_Groups:
              Exclude_Bg:
              Include_Bg:
                   Admin:
                Max_node: -1
                 Maxjobs: -1
    Resource_requirement:
       Node Resource Req:
           Max Resources: ConsumableMemory(100.000 gb)
      Max Node Resources: ConsumableMemory(100.000 gb)
           Class_comment:
          Class_ckpt_dir:
              Ckpt_limit: undefined, undefined
        Wall_clock_limit: 100+00:00:10, 100+00:00:00 (8640010 seconds, 8640000 seconds)
Default_wall_clock_limit: 100+00:00:10, 100+00:00:00 (8640010 seconds, 8640000 seconds)
           Job_cpu_limit: undefined, undefined
               Cpu_limit: undefined, undefined
              Data_limit: undefined, undefined
                As_limit: undefined, undefined
             Nproc_limit: undefined, undefined
           Memlock_limit: undefined, undefined
             Locks_limit: undefined, undefined
            Nofile_limit: undefined, undefined
              Core_limit: undefined, undefined
              File_limit: undefined, undefined
             Stack_limit: undefined, undefined
               Rss_limit: undefined, undefined
                    Nice: 0
              Free_slots: 896
           Maximum_slots: 896
         Max_total_tasks: -1
     Max_proto_instances: 2
     Stripe_min_networks: False
           Preempt_class:
             Start_class:
            User default: maxidle(-1) maxqueued(-1) maxjobs(-1) max_total_tasks(-1)
        Imm_send_buffers: 1
       Collective_groups: 0
                 Restart: yes
               Endpoints: 1
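
The summary form of llclass (without -l) can be reduced to the classes that currently have slots in use. This is a rough sketch: it takes "Max Slots minus Free Slots" as the number of slots in use, which is an assumption, and for a class such as normal, whose maximum is printed as `9999+` and whose free count is constrained by MAX_STARTERS, the result is only approximate. `busy_classes` is a hypothetical helper name.

```shell
#!/bin/sh
# List classes whose free slot count is below the maximum.
# Expects llclass summary text (Name, two CPU limits, Free, Max) on stdin.
busy_classes() {
    awk 'NF >= 5 && $4 ~ /^[0-9]+$/ {
             used = ($5 + 0) - $4        # "9999+" is read as 9999
             if (used > 0) print $1, used
         }'
}

# Sample rows copied from the llclass output above; in practice: llclass | busy_classes
busy_classes <<'EOF'
Name                 MaxJobCPU     MaxProcCPU  Free   Max Description
                    d+hh:mm:ss     d+hh:mm:ss Slots Slots
--------------- -------------- -------------- ----- ----- ---------------------
serial               undefined      undefined   478   480
normal               undefined      undefined  9979 9999+
operation            undefined      undefined   896   896
EOF
```

For the sample rows this prints `serial 2` and `normal 20`; operation is omitted because all of its slots are free.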

Jobs

Queues

An example of a queue configuration:

| Compute nodes (memory) | CPU cores | Description |
|---|---|---|
| 425 (128 GB) | 14496 | Regular compute nodes, used for research and development jobs. |
| 28 (128 GB) | 896 | Regular compute nodes, used for operational and quasi-operational jobs. |
| 58 (256 GB) | 1856 | Large-memory compute nodes, used for jobs with high memory demands. |
| 15 (256 GB) | 480 | Pre- and post-processing nodes, used for serial and interactive jobs. |

Job script reference

Serial jobs

  
#!/bin/ksh  
# @ job_type = serial  
# @ initialdir = /u/sunjing/loadl  
# @ comment = WRF (model name)  
# @ input = /dev/null  
# @ error = ./out/$(jobid).err  
# @ output = ./out/$(jobid).out  
# @ executable = example1  
# @ notification = complete  
# @ notify_user = sunjing@cma18n01  
# @ class = interactive  
# @ queue  
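
  • job_type: the job type, serial for a serial job or parallel for a parallel job (which also needs the node and tasks_per_node keywords)
  • initialdir: the initial working directory of the job; defaults to the current working directory at the time the job is submitted
  • input: a file used in place of standard input; defaults to /dev/null
  • output: standard output; defaults to std.out
  • error: standard error; defaults to std.err
  • comment: a comment describing the job
  • class: the name of the queue (class) to use; see the queue table in the previous section
  • notification: when to send an email notification to notify_user; the following options are supported
    • always
    • error
    • start
    • never
    • complete
  • notify_user: the user to be notified, in the form user@host
  • executable: for a serial job, the program to run; for a parallel job, set it to poe or to a script that calls poe. If the keyword is omitted, the job script itself is executed (the operational system always uses this approach).
  • checkpoint: whether the job saves intermediate state (interval, yes, no). Defaults to no.
  • restart: whether a job that did not finish is restarted. If set to no, an unfinished job is cancelled. Defaults to yes (set to no in the operational system?).
  • node_usage: whether this job step shares nodes with other job steps (shared or not_shared; in the operational system, serial jobs use not_shared and parallel jobs use shared).
  • queue: tells LoadLeveler to queue the job step for execution and marks the end of the keyword statements for that step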
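
Putting the keywords together, a serial job that omits executable, so that the script itself is run, might look like the sketch below. The class name serial is taken from the llclass listing earlier; the comment text, the output file names, and the body of the script are placeholders. LOADL_STEP_ID is an environment variable that LoadLeveler sets for a running job step; the fallback value only matters when the script is run by hand.

```shell
#!/bin/sh
# Minimal serial job command file (sketch). No "executable" keyword is given,
# so LoadLeveler runs this file itself as the job.
# @ job_type = serial
# @ class = serial
# @ comment = demo
# @ input = /dev/null
# @ output = demo.$(jobid).out
# @ error = demo.$(jobid).err
# @ notification = error
# @ restart = no
# @ node_usage = not_shared
# @ queue

# Shell commands after "queue" form the body of the job.
step_id=${LOADL_STEP_ID:-unknown}   # set by LoadLeveler inside a running job
echo "job step $step_id running on $(uname -n)"
```

The file would be submitted with `llsubmit` followed by the file name. Because the `# @` lines are ordinary comments to the shell, the same file can also be run directly to test the body.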

MPI jobs

#!/bin/ksh  
# @ job_type = parallel  
# @ initialdir = /u/sunjing/loadl  
# @ comment = WRF (model name)  
# @ error = ./out/$(jobid).err  
# @ output =./out/$(jobid).out  
# @ notification = complete  
# @ notify_user = sunjing@cma18n02  
# @ network.MPI = sn_all,shared,us  
# @ node = 6  
# @ tasks_per_node = 32  
# @ class = normal  
# @ queue  
export TARGET_CPU_LIST=-1  
poe launch wrf.exe #(launch automatically binds processes to CPUs, which improves performance)  

  • node: the number of nodes to use, in the form node = [min][,max]
  • tasks_per_node: the number of tasks to run on each node
  • network: how tasks communicate with each other, in the form network_type, usage, mode
    • network.MPI: Message Passing Interface
    • network.LAPI: Low-Level Application Programming Interface
    • network_type: ethernet or sn_single (sn_all)
    • usage: whether the network adapter can be shared, shared or not_shared
    • mode: the communication mode, IP (Internet Protocol) or US (User Space)
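
The total number of MPI tasks is the product of node and tasks_per_node, so the example above starts 6 × 32 = 192 tasks. The sketch below reads both keywords from a job script and prints the product; it assumes node is given as a single number rather than a min,max pair, and `total_tasks` is a hypothetical helper name.

```shell
#!/bin/sh
# Print node * tasks_per_node for a job command file read from standard input.
total_tasks() {
    awk -F= '
        /^#[ \t]*@[ \t]*node[ \t]*=/           { nodes = $2 + 0 }
        /^#[ \t]*@[ \t]*tasks_per_node[ \t]*=/ { per_node = $2 + 0 }
        END { print nodes * per_node }'
}

# Keyword lines taken from the MPI example above.
total_tasks <<'EOF'
# @ job_type = parallel
# @ node = 6
# @ tasks_per_node = 32
# @ class = normal
# @ queue
EOF
```

A result of 0 means that at least one of the two keywords was not found.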

OpenMP+MPI jobs

#!/bin/ksh  
# @ job_type = parallel  
# @ initialdir = /u/sunjing/loadl  
# @ comment = WRF (model name)  
# @ error = ./out/$(jobid).err  
# @ output =./out/$(jobid).out  
# @ notification = complete  
# @ notify_user = sunjing@cma18n02  
# @ network.MPI = sn_all,shared,us  
# @ node = 6  
# @ tasks_per_node = 32  
# @ class = normal  
# @ queue  

export TARGET_CPU_LIST=-1  
poe hybrid_launch wrf.exe #(hybrid_launch automatically binds processes to CPUs, which improves performance)  
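
In a hybrid job the cores of a node are divided between MPI tasks and OpenMP threads. The sketch below works out a thread count per task, assuming 32 cores per node (derived from the queue table: 896 cores on 28 nodes) and a hypothetical tasks_per_node of 8. OMP_NUM_THREADS is the standard OpenMP environment variable; whether the site's launch scripts expect it to be set this way is not stated in the source.

```shell
#!/bin/sh
cores_per_node=32    # assumption: 896 cores / 28 nodes in the queue table

# Threads each MPI task can use when $1 tasks share one node.
threads_per_task() {
    echo $(( cores_per_node / $1 ))
}

tasks_per_node=8     # hypothetical value for a hybrid run
OMP_NUM_THREADS=$(threads_per_task "$tasks_per_node")
export OMP_NUM_THREADS
echo "tasks_per_node=$tasks_per_node OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With tasks_per_node = 32, as in the job script above, the same calculation leaves one thread per task.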
