GRAPES MESO Model Study Notes 08-1: Running Python Scripts with LoadLeveler

August 01, 2014 (last modified: November 19, 2019)

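The LoadLeveler job scripts currently used in operations are all shell scripts, with LoadLeveler directives added at the top (lines beginning with #, much like shell comments).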

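In LoadLeveler, the job file and the program to run are two distinct concepts: the job runs the program named by the executable keyword. When executable is not set, the program to run is the job script itself. A script written this way can be run directly or submitted through LoadLeveler, which makes testing easy.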
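To make the rule above concrete, here is a minimal sketch in Python 3 that reads the "# @" lines of a job script and decides which program would run. The helper names parse_directives and program_to_run are hypothetical, and the parsing is a simplification for illustration (no line continuation, only the first job step). It is not LoadLeveler's actual parser.

```python
import re

# Matches directive lines such as "# @ class= normal" or "# @ queue".
DIRECTIVE = re.compile(r"^#\s*@\s*(\w+)\s*(?:=\s*(.*?))?\s*$")


def parse_directives(script_text):
    """Return keyword -> value for the directives before the first 'queue'."""
    directives = {}
    for line in script_text.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue  # shebang, ordinary comments, and script body
        keyword, value = match.group(1).lower(), match.group(2)
        if keyword == "queue":
            break  # 'queue' closes the job step in this simplified model
        directives[keyword] = value or ""
    return directives


def program_to_run(job_file, script_text):
    """Use 'executable' when it is set, otherwise the job file itself."""
    return parse_directives(script_text).get("executable") or job_file
```

For a script without an executable line, program_to_run("myjob.py", text) returns "myjob.py", which is why the same file works both as a job file and as a directly runnable script.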

Shell, Python, and Perl scripts can all be executed directly on UNIX systems, so LoadLeveler can run Python scripts as well.

Basic usage

A Python script begins with #!/bin/env python. A simple job script looks like this:

#!/bin/env python
# @ job_type=serial
# @ input=/dev/null
# @ output = ./myjob.out.$(stepid)
# @ error =  ./myjob.out.err.$(stepid)
# @ notification = never
# @ checkpoint = no
# @ restart = no
# @ class= normal
# @ comment = WRF
# @ node_usage = shared
# @ queue

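import subprocess

# Parenthesized single-argument print works under both Python 2 and Python 3.
print("Begin to start 10 tasks...")

for i in range(10):
    print("Task %d hello" % i)
    # The child shell inherits stdout and echoes the task number.
    subprocess.call("sleep 1;echo {i}".format(i=i), shell=True)
    print("Task %d done" % i)

print("Task Done")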

The output file myjob.out.0 contains:

0
1
2
3
4
5
6
7
8
9
Begin to start 10 tasks...
Task 0 hello
Task 0 done
Task 1 hello
Task 1 done
Task 2 hello
Task 2 done
Task 3 hello
Task 3 done
Task 4 hello
Task 4 done
Task 5 hello
Task 5 done
Task 6 hello
Task 6 done
Task 7 hello
Task 7 done
Task 8 hello
Task 8 done
Task 9 hello
Task 9 done
Task Done

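The output order is clearly not what we expected: everything written by print appears at the end, while the output of the commands run by subprocess.call appears first.
This happens because Python buffers its standard output when it is not connected to a terminal (here it is redirected to a file), so the print output stays in the script's buffer until the interpreter exits. The commands started by subprocess.call are separate processes that write to the same output file and flush their own output when they finish, so their lines reach the file first. subprocess.call simply invokes subprocess.Popen and waits for it. In Python 2, Popen defaults to unbuffered (bufsize=0), but that setting only applies to the pipe objects Popen creates and has no effect on the script's own sys.stdout. The source of subprocess.call: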

def call(*popenargs, **kwargs):
    """Run command with arguments.  Wait for command to complete, then
    return the returncode attribute.
    The arguments are the same as for the Popen constructor.  Example:
    retcode = call(["ls", "-l"])
    """
    return Popen(*popenargs, **kwargs).wait()

And the subprocess.Popen documentation (Python 2) on bufsize:

bufsize, if given, has the same meaning as the corresponding argument to the built-in open() function: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size. A negative bufsize means to use the system default, which usually means fully buffered. The default value for bufsize is 0 (unbuffered).
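The reordering can be reproduced without LoadLeveler by redirecting a script's standard output to a file. The following is a minimal sketch under stated assumptions: it targets Python 3, shortens the job to three tasks with no sleep, and uses a second Python process in place of the shell echo. The names JOB and run_job are made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile

# Shortened job body: three tasks, each starting a child that prints its index.
JOB = r'''
import subprocess, sys
print("Begin")
for i in range(3):
    print("Task %d hello" % i)
    subprocess.call([sys.executable, "-c", "print(%d)" % i])
    print("Task %d done" % i)
print("Task Done")
'''


def run_job():
    """Run JOB with stdout redirected to a file and return the file's lines."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # make sure default buffering applies
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "job.py")
        out_path = os.path.join(tmp, "job.out")
        with open(script, "w") as f:
            f.write(JOB)
        with open(out_path, "w") as out:
            subprocess.check_call([sys.executable, script], stdout=out, env=env)
        with open(out_path) as f:
            return f.read().splitlines()


# The children's digits are expected ahead of every line printed by the job.
buffered = run_job()
print(buffered)
```

The job's own lines are only written when its interpreter exits and flushes the buffer, which matches the order seen in myjob.out.0.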

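Going further

Python's output buffering can be turned off, either with the -u command-line option or by setting the PYTHONUNBUFFERED environment variable to a non-empty value.
LoadLeveler sets environment variables through the environment keyword, so add the following line: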

# @ environment = COPY_ALL; PYTHONUNBUFFERED = 1

This sets PYTHONUNBUFFERED to 1 on every node where LoadLeveler runs the job.
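Both switches can be checked locally before submitting a job. The sketch below assumes Python 3 and captures the output through a pipe, which is buffered in the same way as a file. JOB, EXPECTED, and run are hypothetical names, and the child command is another Python process rather than a shell echo.

```python
import os
import subprocess
import sys

# Two-task version of the job body, passed to the interpreter with -c.
JOB = (
    "import subprocess, sys\n"
    "for i in range(2):\n"
    "    print('Task %d hello' % i)\n"
    "    subprocess.call([sys.executable, '-c', 'print(%d)' % i])\n"
    "    print('Task %d done' % i)\n"
)

EXPECTED = [
    "Task 0 hello", "0", "Task 0 done",
    "Task 1 hello", "1", "Task 1 done",
]


def run(flags=(), env_extra=None):
    """Run JOB with optional interpreter flags and extra environment variables."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from default buffering
    env.update(env_extra or {})
    cmd = [sys.executable] + list(flags) + ["-c", JOB]
    return subprocess.check_output(cmd, env=env).decode().splitlines()


print(run() == EXPECTED)                                     # default buffering
print(run(flags=["-u"]) == EXPECTED)                         # -u option
print(run(env_extra={"PYTHONUNBUFFERED": "1"}) == EXPECTED)  # environment variable
```

The environment variable is the more convenient choice for a LoadLeveler job because the #!/bin/env python line cannot portably carry extra options such as -u.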

With this directive added, the earlier example becomes:

#!/bin/env python
# @ environment = COPY_ALL;PYTHONUNBUFFERED=1
# @ job_type=serial
# @ input=/dev/null
# @ output = ./myjob.out.$(stepid)
# @ error =  ./myjob.out.err.$(stepid)
# @ notification = never
# @ checkpoint = no
# @ restart = no
# @ class= normal
# @ comment = WRF
# @ node_usage = shared
# @ queue
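import subprocess

# Parenthesized single-argument print works under both Python 2 and Python 3.
print("Begin to start 10 tasks...")

for i in range(10):
    print("Task %d hello" % i)
    # The child shell inherits stdout and echoes the task number.
    subprocess.call("sleep 1;echo {i}".format(i=i), shell=True)
    print("Task %d done" % i)

print("Task Done")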

After submitting the job to LoadLeveler, the output is:

Begin to start 10 tasks...
Task 0 hello
Task 0 done
Task 1 hello
1
Task 1 done
Task 2 hello
2
Task 2 done
Task 3 hello
3
Task 3 done
Task 4 hello
4
Task 4 done
Task 5 hello
5
Task 5 done
Task 6 hello
6
Task 6 done
Task 7 hello
7
Task 7 done
Task 8 hello
8
Task 8 done
Task 9 hello
9
Task 9 done
Task Done

With the variable set, the output appears in the expected order.
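If changing the job environment is not an option, flushing explicitly after each message should give the same ordering. This alternative is not part of the original notes. It is a minimal sketch with hypothetical helper names log and run_tasks, and the sleep is dropped to keep it short.

```python
import subprocess
import sys


def log(message, stream=None):
    """Write one line and flush it at once, whatever the buffering mode."""
    stream = sys.stdout if stream is None else stream
    stream.write(message + "\n")
    stream.flush()


def run_tasks(count):
    """Job body from the example, with log() in place of print."""
    log("Begin to start %d tasks..." % count)
    for i in range(count):
        log("Task %d hello" % i)
        # The child's echo now lands after the line flushed just above.
        subprocess.call("echo {i}".format(i=i), shell=True)
        log("Task %d done" % i)
    log("Task Done")


if __name__ == "__main__":
    run_tasks(3)
```

The trade-off is that every message costs a write call, which is negligible for a job that only logs a few lines per task.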

Summary

Using Python scripts with LoadLeveler is just as convenient as using shell scripts.