ecFlow Study Notes 02.2.4: Checking Job Creation

This is part of the ecFlow tutorial. For the complete tutorial, see "ecFlow Study Notes 02: Tutorial".

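In the previous sections we implemented our first task (the t1.ecf file). The t1.ecf script has to be pre-processed to produce a job file. ecflow_server does this automatically when it is about to run the task.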

We can also check job creation before the suite definition is loaded into ecflow_server.

Text

Checking job creation is only available through Python.
If ecflow_server cannot locate the ecf script, see the ecf file location algorithm.

Python

The job creation process can be checked before the suite definition is loaded into the server. The check covers:

  1. Locating the ecf script file for each task in the suite definition
  2. Performing the pre-processing
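To make the pre-processing step more concrete, here is a toy sketch of variable substitution. It is not ecFlow's implementation: `substitute_variables` is a hypothetical helper, and the input line assumes that t1.ecf contains an echo statement referencing `%ECF_HOME%`. Real pre-processing also handles include files and other directives, which this sketch ignores.

```python
import re

def substitute_variables(line, variables):
    """Replace every %NAME% in line with variables[NAME] (toy version)."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            # A missing variable is treated as an error in this sketch.
            raise KeyError("undefined variable: " + name)
        return variables[name]
    return re.sub(r"%(\w+)%", replace, line)

# Assumed line from t1.ecf; the value mirrors the ECF_HOME used in these notes.
line = 'echo "I am part of a suite that lives in %ECF_HOME%"'
print(substitute_variables(line, {"ECF_HOME": "/home/windroc/course"}))
```

A script line that references an undefined variable makes this sketch raise an error, which is the kind of problem the job creation check is meant to surface before the suite is loaded.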

When the suite definition is long and contains many ecf scripts, this check can save a lot of time.
Keep the following points in mind when checking job creation:

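  1. The check is independent of ecflow_server, so ECF_PORT and ECF_NODE are set to their default values.
  2. The job file has the extension .job0, whereas job files generated by the server have the extension .job<1-n>, because ECF_TRYNO is never 0 there.
  3. By default the job file is created in the same directory as the ecf script. See ECF_JOB in the glossary.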
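Points 2 and 3 can be illustrated with a small sketch that builds the path where the job file should appear. It assumes the default ECF_JOB layout (ECF_HOME, followed by the node path, followed by `.job` and the try number). `expected_job_path` is a hypothetical helper, not part of the ecflow module.

```python
def expected_job_path(ecf_home, node_path, try_no=0):
    """Build the job file path under the assumed default ECF_JOB layout."""
    # node_path is the ECF_NAME of the task and starts with "/", e.g. "/test/t1".
    return ecf_home.rstrip("/") + node_path + ".job" + str(try_no)

# try_no=0 corresponds to the file written by the job creation check.
print(expected_job_path("/home/windroc/course", "/test/t1"))
# prints /home/windroc/course/test/t1.job0

# A first run under the server would use try number 1.
print(expected_job_path("/home/windroc/course", "/test/t1", try_no=1))
# prints /home/windroc/course/test/t1.job1
```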

Use ecflow.Defs.check_job_creation to run the check. Modify test.py:

#!/usr/bin/env python2.7
import os
import ecflow
print "Creating suite definition"
defs = ecflow.Defs()
suite = defs.add_suite("test")
suite.add_variable("ECF_HOME", os.path.join(os.getenv("HOME"), "course"))
suite.add_task("t1")
print defs
print "Checking job creation: .ecf -> .job0"
print defs.check_job_creation()
# We can assert, so that we only progress, once all job creation works
# assert len(defs.check_job_creation()) == 0, "Job generation failed"
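The commented-out assert can also be wrapped in a small guard so that a script stops before loading a definition whose jobs cannot be generated. This is a minimal sketch: `ensure_job_creation` is a hypothetical helper, and it assumes, as the assert above does, that `check_job_creation()` returns an empty string on success and a description of the problems otherwise.

```python
def ensure_job_creation(defs):
    """Return defs unchanged if job creation succeeds, otherwise raise."""
    errors = defs.check_job_creation()
    if len(errors) != 0:
        # Include the reported problems so the failing script is easy to find.
        raise RuntimeError("Job generation failed:\n" + errors)
    return defs
```

In test.py this would be called as `ensure_job_creation(defs)` after the suite has been built and before the definition is loaded into the server.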

Running the script above generates t1.job0 in the test directory, with the following contents:

#!/bin/ksh
set -e # stop the shell on first error
set -u # fail when using an undefined variable
set -x # echo script lines as they are executed
# Defines the variables that are needed for any communication with ECF
export ECF_PORT=3141    # The server port number
export ECF_NODE=localhost    # The name of ecf host that issued this task
export ECF_NAME=/test/t1    # The name of this current task
export ECF_PASS=S75oxLzE    # A unique password
export ECF_TRYNO=0  # Current try number of the task
export ECF_RID=$$             # record the process id. Also used for zombie detection
# Define the path where to find ecflow_client
# make sure client and server use the *same* version.
# Important when there are multiple versions of ecFlow
export PATH=/usr/local/apps/ecflow/4.0.9/bin:$PATH
# Tell ecFlow we have started
ecflow_client --init=$$
# Define a error handler
ERROR() {
   set +e                      # Clear -e flag, so we don't fail
   wait                        # wait for background process to stop
   ecflow_client --abort=trap  # Notify ecFlow that something went wrong, using 'trap' as the reason
   trap 0                      # Remove the trap
   exit 0                      # End the script
}
# Trap any calls to exit and errors caught by the -e flag
trap ERROR 0
# Trap any signal that may cause the script to fail
trap '{ echo "Killed by a signal"; ERROR ; }' 1 2 3 4 5 6 7 8 10 12 13 15
echo "I am part of a suite that lives in /home/windroc/course"
wait                      # wait for background process to stop
ecflow_client --complete  # Notify ecFlow of a normal end
trap 0                    # Remove all traps
exit 0                    # End the shell

It is strongly recommended to use the job creation check in the examples that follow.

What to do

Add the job creation check
Examine the job file $HOME/course/test/t1.job0
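When examining the job file, it can help to pull out the exported ECF_ variables and confirm the values discussed earlier (default port and host, try number 0). The following is a minimal sketch. `read_exports` is a hypothetical helper, and the sample text is copied from the t1.job0 listing above rather than read from disk.

```python
import re

# Matches lines such as: export ECF_PORT=3141    # The server port number
EXPORT_LINE = re.compile(r"^export\s+(\w+)=(\S*)")

def read_exports(job_text):
    """Collect NAME -> value for every 'export NAME=value' line."""
    exports = {}
    for line in job_text.splitlines():
        match = EXPORT_LINE.match(line)
        if match:
            exports[match.group(1)] = match.group(2)
    return exports

sample = """export ECF_PORT=3141    # The server port number
export ECF_NODE=localhost    # The name of ecf host that issued this task
export ECF_NAME=/test/t1    # The name of this current task
export ECF_TRYNO=0  # Current try number of the task
"""

settings = read_exports(sample)
print(settings["ECF_NAME"])   # prints /test/t1
print(settings["ECF_TRYNO"])  # prints 0
```

To inspect the real file, pass the contents of $HOME/course/test/t1.job0 to `read_exports` instead of the sample string.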

Glossary

ECF_TRYNO
ECF_JOB