# LoadLeveler Basic Usage Guide
The operational systems at our site use LoadLeveler to manage jobs. The runtime environment and job scripts are already configured, so day-to-day work normally involves only the command-line tools.
## Basic LoadLeveler Commands

- `llsubmit`: submit a job
- `llq`: show job status
- `llcancel`: cancel a job
- `llcancel -u`: cancel all jobs of a user
- `llstatus`: show node status
- `llclass`: show class (queue) information
### llsubmit: Submit a Job
In the SMS system, jobs are submitted through a wrapper script, `llsubmit2`; I have not used it directly yet.
```ksh
#!/bin/ksh
# llsubmit2 jobname taskname ## host
set -ex
SUBMITLOG=$WORKDIR/sublog/submit.log
test -d $WORKDIR/sublog || mkdir -p $WORKDIR/sublog
export SUBMITLOG
if [ $# -ne 2 ] ; then
  echo …
  exit 1
fi
jobname=$1
taskname=$2
#echo $jobname $taskname >>$SUBMITLOG
# host=$2
#
# host=sp04n01
#
# rsh $host llsubmit - < $jobname
if [[ $USER = nwp ]] ; then
  nameofsms=nwpc_op
fi
if [[ $USER = nwp_qu ]] ; then
  nameofsms=nwpc_qu
fi
if [[ $USER = nwp_sp ]] ; then
  nameofsms=nwpc_sp
fi
if [[ $USER = nwp_ex ]] ; then
  nameofsms=nwpc_ex
fi
if [[ $USER = nwp_xp ]] ; then
  nameofsms=nwpc_xp
fi
name=$(llsubmit $jobname 2>>$SUBMITLOG)
rid=$(echo $name | cut -d '"' -f 2)
cdp <<EOF
login $nameofsms $USER 1
alter -v $taskname SMSRID $rid
exit
EOF
#rsh llq
```
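The wrapper records the job ID that `llsubmit` prints and passes it to SMS. That parsing step can be illustrated on its own with a hard-coded sample message; the exact confirmation text is an assumption inferred from the `cut` call in the script, and `llsubmit` itself is not invoked here:

```shell
# Sample llsubmit confirmation line (format assumed from the cut call in the wrapper)
name='llsubmit: The job "cma19n02.38237" has been submitted.'
# Extract the job ID between the double quotes, exactly as the wrapper does
rid=$(echo "$name" | cut -d '"' -f 2)
echo "$rid"
```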
### llq: Show Job Status

`llq` lists the jobs in the queue:
```
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
cma20n02.26337.0         zhangli    8/2  07:18  R  50  normal       cma11n24
cma19n02.26580.0         zhangli    8/2  07:27  R  50  normal       cma09n07
cma19n02.28964.0         yurc       8/3  03:43  R  50  normal       cma05n10
cma20n02.33084.0         xinyf      8/5  00:21  R  50  normal       cma18n13
cma19n02.36616.0         zhanglin   8/5  20:41  R  50  serial       cma18n06
cma19n02.36944.0         majzh      8/5  23:33  R  50  normal       cma01n26
cma20n02.36797.0         wangzzh    8/6  00:15  R  50  normal       cma05n03
cma19n02.37164.0         weimin     8/6  00:34  R  50  normal       cma15n10
cma20n02.37238.0         wangzzh    8/6  02:46  R  50  normal       cma07n03
cma19n02.37470.0         weimin     8/6  02:46  R  50  normal       cma11n28
cma19n02.37987.0         dengguo    8/6  05:48  R  50  normal       cma12n23
cma19n02.38103.0         wangzhl    8/6  06:18  R  50  normal       cma11n17
cma20n02.37872.0         dengguo    8/6  06:19  R  50  normal       cma02n15
cma20n02.37874.0         wutw       8/6  06:20  R  50  normal       cma09n25
cma20n02.37916.0         dengguo    8/6  06:34  R  50  normal       cma11n08
cma20n02.37922.0         wangzhl    8/6  06:36  R  50  normal       cma08n21
cma19n02.38159.0         wangzhl    8/6  06:39  R  50  normal       cma07n28
cma20n02.37937.0         chenlsh    8/6  06:41  R  50  normal       cma14n01
cma19n02.38185.0         typ_xp     8/6  06:47  R  50  normal       cma17n13
cma20n02.37969.0         quax       8/6  07:03  R  50  serial       cma19n05
cma20n02.37986.0         quax       8/6  07:06  R  50  normal       cma17n27
cma19n02.38218.0         zhanglin   8/6  07:06  R  50  normal       cma11n04
cma20n02.37987.0         quax       8/6  07:07  R  50  serial       cma19n07
cma19n02.38219.0         quax       8/6  07:07  R  50  serial       cma19n01
cma20n02.37988.0         quax       8/6  07:07  R  50  serial       cma20n01
cma19n02.38220.0         typ_xp     8/6  07:07  R  50  normal       cma12n04
cma19n02.38199.0         liuyzh     8/6  07:00  RP 50  normal

26 job step(s) in queue, 0 waiting, 0 pending, 26 running, 0 held, 0 preempted
```
Queries can be filtered by different conditions:

- `-u userlist`: by user
- `-h hostlist`: by host
- `-c classlist`: by class

For example:
```
$ llq -u nwp
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
cma19n02.38103.0         wangzhl    8/6  06:18  R  50  normal       cma11n17
cma20n02.37922.0         wangzhl    8/6  06:36  R  50  normal       cma08n21
cma19n02.38159.0         wangzhl    8/6  06:39  R  50  normal       cma07n28

3 job step(s) in query, 0 waiting, 0 pending, 3 running, 0 held, 0 preempted
```
`llq -l` with a job ID shows detailed information for a single job:
```
$ llq -l cma19n02.38237.0
===== Job Step cma19n02.38237.0 =====
Job Step Id: cma19n02.38237.0
Job Name: cma19n02.38237
Step Name: 0
Structure Version: 10
Owner: nwp
Queue Date: Tue Aug 6 07:13:32 2013
Status: Running
Reservation ID:
Requested Res. ID:
Flexible Res. ID:
Recurring: False
Scheduling Cluster:
Submitting Cluster:
Sending Cluster:
Requested Cluster:
Schedd History:
Outbound Schedds:
Submitting User:
Eligibility Time: Tue Aug 6 07:13:32 2013
Dispatch Time: Tue Aug 6 07:13:32 2013
Completion Date:
Completion Code:
Favored Job: No
User Priority: 50
user_sysprio: 0
class_sysprio: 0
group_sysprio: 0
System Priority: -504520
q_sysprio: -504520
Previous q_sysprio: 0
Notifications: Error
Virtual Image Size: 15 kb
Large Page: N
Trace: no
Coschedule: no
SMT required: as_is
MetaCluster Job: no
Checkpointable: no
Ckpt Start Time:
Good Ckpt Time/Date:
Ckpt Elapse Time: 0 seconds
Fail Ckpt Time/Date:
Ckpt Accum Time: 0 seconds
Checkpoint File:
Ckpt Execute Dir:
Restart From Ckpt: no
Restart Same Nodes: no
Restart: no
Preemptable: yes
Preempt Wait Count: 0
Hold Job Until:
User Hold Time: 00:00:00 (0 seconds)
RSet: RSET_NONE
Mcm Affinity Option:
Task Affinity:
Cpus Per Core: 0
Parallel Threads: 0
Cmd: /cma/g1/nwp/SMSOUT/gmf_ssi_v1/T213/00/vortex/vortex_relocat.job1
Args:
Env:
In: /dev/null
Out: /cma/g1/nwp/SMSOUT/gmf_ssi_v1/T213/00/vortex/vortex_relocat.1
Err: /cma/g1/nwp/SMSOUT/gmf_ssi_v1/T213/00/vortex/vortex_relocat.1.err
Initial Working Dir: /cma/u/nwp/smsworks/sms
Dependency:
Data Stg Dependency:
Resources:
Node Resources:
Step Resources:
Requirements:
Preferences:
Step Type: Serial
Min Processors:
Max Processors:
Allocated Host: cma18n03
Node Usage: shared
Submitting Host: cma20n03
Schedd Host: cma19n02
Job Queue Key:
Notify User: nwp@cma20n03
Shell: /usr/bin/ksh
LoadLeveler Group: No_Group
Class: serial
Ckpt Hard Limit: undefined
Ckpt Soft Limit: undefined
Cpu Hard Limit: undefined
Cpu Soft Limit: undefined
Data Hard Limit: undefined
Data Soft Limit: undefined
As Hard Limit: undefined
As Soft Limit: undefined
Nproc Hard Limit: undefined
Nproc Soft Limit: undefined
Memlock Hard Limit: undefined
Memlock Soft Limit: undefined
Locks Hard Limit: undefined
Locks Soft Limit: undefined
Nofile Hard Limit: undefined
Nofile Soft Limit: undefined
Core Hard Limit: undefined
Core Soft Limit: undefined
File Hard Limit: undefined
File Soft Limit: undefined
Stack Hard Limit: undefined
Stack Soft Limit: undefined
Rss Hard Limit: undefined
Rss Soft Limit: undefined
Step Cpu Hard Limit: undefined
Step Cpu Soft Limit: undefined
Wall Clk Hard Limit: 100+00:00:10 (8640010 seconds)
Wall Clk Soft Limit: 100+00:00:00 (8640000 seconds)
Comment: T213
Account:
Unix Group: nwpop
Negotiator Messages:
Bulk Transfer: No
Adapter Requirement:
Step Cpus: 0
Step Virtual Memory: 0.000 mb
Step Real Memory: 0.000 mb
Step Large Page Mem: 0.000 mb
Cluster Option: none
Topology Requirement: none
Network Usages:
Stripe Min Networks: False
Monitor Program:

1 job step(s) in query, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
```
### llcancel: Cancel a Job

- `llcancel jobid`: cancel a job directly by its ID
- `llcancel -u username`: cancel all jobs of a user
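`llcancel` also accepts a list of job-step IDs, so one user's jobs can be cancelled selectively by filtering `llq` output first. A sketch of the filtering half, run here against a captured `llq` listing (the `llcancel` call itself is left commented out, since it needs a live LoadLeveler):

```shell
# Captured llq output (two rows taken from the listings above)
llq_output='Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
cma19n02.38103.0         wangzhl    8/6  06:18  R  50  normal       cma11n17
cma20n02.37969.0         quax       8/6  07:03  R  50  serial       cma19n05'

# Select the job-step IDs owned by one user (column 1 is Id, column 2 is Owner)
ids=$(echo "$llq_output" | awk '$2 == "quax" {print $1}')
echo "$ids"
# llcancel $ids   # on a live system this would cancel the selected steps
```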
### llstatus: Show Node Status
```
$ llstatus
Active   556/556
Schedd   2/2      27 job steps
Startd   554/554  4327 running tasks
The Central Manager is defined on cma20n02

Absent:  0
Startd:  Down     Drained  Draining Flush    Suspend
         0        0        0        0        0
Schedd:  Down     Drained  Draining
         0        0        0
```
### llclass: Show Class Information
```
$ llclass
Name                 MaxJobCPU     MaxProcCPU  Free   Max  Description
                    d+hh:mm:ss     d+hh:mm:ss Slots Slots
--------------- -------------- -------------- ----- ----- ---------------------
tmp_largemem         undefined      undefined   256   256
test                 undefined      undefined   256   256
mediummem            undefined      undefined  1344  1344
lowmem               undefined      undefined  1344  1344
highmem              undefined      undefined  1344  1344
largemem             undefined      undefined  1600  1600
serial               undefined      undefined   478   480
normal               undefined      undefined  9979 9999+
operation            undefined      undefined   896   896
--------------------------------------------------------------------------------
"Free Slots" value of the class "normal" is constrained by the MAX_STARTERS limit(s).
```
Add `-l` for a detailed view of a class:
```
$ llclass -l operation
=============== Class operation ===============
Name: operation
Priority: 0
Exclude_Users:
Include_Users: loadl nwp nwp_qu
Exclude_Groups:
Include_Groups:
Exclude_Bg:
Include_Bg:
Admin:
Max_node: -1
Maxjobs: -1
Resource_requirement:
Node Resource Req:
Max Resources: ConsumableMemory(100.000 gb)
Max Node Resources: ConsumableMemory(100.000 gb)
Class_comment:
Class_ckpt_dir:
Ckpt_limit: undefined, undefined
Wall_clock_limit: 100+00:00:10, 100+00:00:00 (8640010 seconds, 8640000 seconds)
Default_wall_clock_limit: 100+00:00:10, 100+00:00:00 (8640010 seconds, 8640000 seconds)
Job_cpu_limit: undefined, undefined
Cpu_limit: undefined, undefined
Data_limit: undefined, undefined
As_limit: undefined, undefined
Nproc_limit: undefined, undefined
Memlock_limit: undefined, undefined
Locks_limit: undefined, undefined
Nofile_limit: undefined, undefined
Core_limit: undefined, undefined
File_limit: undefined, undefined
Stack_limit: undefined, undefined
Rss_limit: undefined, undefined
Nice: 0
Free_slots: 896
Maximum_slots: 896
Max_total_tasks: -1
Max_proto_instances: 2
Stripe_min_networks: False
Preempt_class:
Start_class:
User default: maxidle(-1) maxqueued(-1) maxjobs(-1) max_total_tasks(-1)
Imm_send_buffers: 1
Collective_groups: 0
Restart: yes
Endpoints: 1
```
## Job Queues

An example of a queue (class) configuration:

| Compute nodes | CPU cores | Description |
| ------------- | --------- | ----------- |
| 425 (128GB)   | 14496     | Ordinary compute nodes, used for research and development jobs. |
| 28 (128GB)    | 896       | Ordinary compute nodes, used for operational/quasi-operational jobs. |
| 58 (256GB)    | 1856      | Large-memory compute nodes, used for jobs with large memory requirements. |
| 15 (256GB)    | 480       | Pre- and post-processing nodes, used for serial and interactive jobs. |
## Job Script Reference

### Serial Jobs
```ksh
#!/bin/ksh
# @ job_type = serial
# @ initialdir = /u/sunjing/loadl
# @ comment = WRF (model name)
# @ input = /dev/null
# @ error = ./out/$(jobid).err
# @ output = ./out/$(jobid).out
# @ executable = example1
# @ notification = complete
# @ notify_user = sunjing@cma18n01
# @ class = interactive
# @ queue
```
- `job_type`: job type; `serial` for serial jobs, `parallel` for parallel jobs (which also require the `node` and `tasks_per_node` keywords)
- `initialdir`: initial working directory of the job; defaults to the current working directory of the subtask
- `input`: file to use in place of standard input; defaults to `/dev/null`
- `output`: standard output; defaults to `std.out`
- `error`: standard error; defaults to `std.err`
- `comment`: job comment
- `class`: name of the compute queue to use, one of the names from the table in the previous section
- `notification`: when to send an email notification to `notify_user`; supported values are `always`, `error`, `start`, `never`, and `complete`
- `notify_user`: the user to notify, in the form `user@host`
- `executable`: for serial jobs, the program to execute; for parallel jobs, set this to `poe` or a script containing `poe`. If the keyword is omitted, the job script file itself is executed (the operational systems all use this method).
- `checkpoint`: whether to save intermediate job state (`interval`, `yes`, `no`); defaults to `no`
- `restart`: whether an unfinished job is restarted; if set to `no`, an unfinished job is canceled. Defaults to `yes` (set to `no` in the operational systems?).
- `node_usage`: whether this job step shares nodes with other job steps (`shared`, `not_shared`; the operational systems use `not_shared` for serial jobs and `shared` for parallel jobs)
- `queue`: tells LoadLeveler to run the job; marks the end of the job script
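Putting the keywords above together, a minimal serial command file can be written like this; the file name, class, and echo payload are illustrative, not taken from a real system (on a cluster the file would then be submitted with `llsubmit hello.cmd`):

```shell
# Write a minimal LoadLeveler serial command file (illustrative names)
cat > hello.cmd <<'EOF'
#!/bin/ksh
# @ job_type = serial
# @ output = hello.$(jobid).out
# @ error = hello.$(jobid).err
# @ class = serial
# @ queue
echo "hello from $(hostname)"
EOF
# Since no executable keyword is given, the script body after the directives
# is what LoadLeveler runs (the method the operational systems use)
grep -c '^# @ ' hello.cmd
```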
### MPI Jobs
```ksh
#!/bin/ksh
# @ job_type = parallel
# @ initialdir = /u/sunjing/loadl
# @ comment = WRF (model name)
# @ error = ./out/$(jobid).err
# @ output = ./out/$(jobid).out
# @ notification = complete
# @ notify_user = sunjing@cma18n02
# @ network.MPI = sn_all,shared,us
# @ node = 6
# @ tasks_per_node = 32
# @ class = normal
# @ queue
export TARGET_CPU_LIST=-1
poe launch wrf.exe  # launch binds processes to CPUs automatically, which improves performance
```
- `node`: number of nodes to use, in the form `node = [min],[max]`
- `tasks_per_node`: number of tasks run on each node
- `network`: how tasks communicate with each other, in the form `network_type, usage, mode`
  - `network.MPI`: Message Passing Interface; `network.LAPI`: Low-Level Application Programming Interface
  - `network_type`: `ethernet` or `sn_single` (`sn_all`)
  - `usage`: whether the network adapter may be shared, `shared` or `not_shared`
  - `mode`: communication mode, `IP` (the Internet Protocol) or `US` (User Space)
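The `node` and `tasks_per_node` keywords together fix the total MPI task count. For the values used in the example above:

```shell
# node = 6 and tasks_per_node = 32, as in the MPI example above
node=6
tasks_per_node=32
total_tasks=$((node * tasks_per_node))
echo "$total_tasks"   # 6 x 32 = 192 MPI tasks in total
```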
### OpenMP+MPI Jobs
```ksh
#!/bin/ksh
# @ job_type = parallel
# @ initialdir = /u/sunjing/loadl
# @ comment = WRF (model name)
# @ error = ./out/$(jobid).err
# @ output = ./out/$(jobid).out
# @ notification = complete
# @ notify_user = sunjing@cma18n02
# @ network.MPI = sn_all,shared,us
# @ node = 6
# @ tasks_per_node = 32
# @ class = normal
# @ queue
export TARGET_CPU_LIST=-1
poe hybird_launch wrf.exe  # hybird_launch binds processes to CPUs automatically, which improves performance
```
## References

- *IBM Cluster 1350 Introduction*
- *LoadLeveler Command File Syntax*
- *Introduction to LoadLeveler*
- *SP Parallel Programming Workshop – LoadLeveler*