This post was last edited by panzerkiller on 2010-3-31 22:10
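Principle:
A script monitors the process list. If it finds that fah is not running, it starts it automatically; if fah is already running, it waits a number of minutes and checks again.
Extension:
You can create a joblist.lst file that holds different launch commands, each with the -oneunit parameter added, so that runs with different parameters can be scheduled in advance. In my case, for example, after finishing one large bigadv unit I want to rest for a couple of days and run smp2 for only half of each day, so joblist.lst can be prepared ahead of time as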
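The check-then-start idea can be sketched in a few lines of bash. This is only an illustration: the function names `is_running` and `watch_once` are hypothetical, and it assumes `pgrep` (from procps) is installed.

```shell
#!/bin/bash
# is_running NAME: succeeds if a process whose name is exactly NAME exists.
is_running() {
    pgrep -x "$1" > /dev/null
}

# watch_once NAME CMD...: start CMD in the background only when NAME is
# not running. Prints "started" or "running" so the decision can be logged.
watch_once() {
    local name=$1
    shift
    if is_running "$name"; then
        echo "running"
    else
        "$@" > /dev/null 2>&1 &
        echo "started"
    fi
}

# Usage sketch (not executed here; the 1800-second interval is an example):
#   while true; do watch_once fah6 ./fah6 -local -smp 8; sleep 1800; done
```

Matching on the exact process name with `pgrep -x` avoids the need to filter out `grep` and `SCREEN` lines from `ps` output.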
“
screen -d -m ./fah6 -local -smp 8 -bigadv -verbosity 9 -oneunit
screen -d -m ./fah6 -local -smp 7 -advmethods -verbosity 9 -oneunit
screen -d -m ./fah6 -local -smp 7 -advmethods -verbosity 9 -oneunit
screen -d -m ./fah6 -local -smp 7 -advmethods -verbosity 9 -oneunit
screen -d -m ./fah6 -local -smp 8 -bigadv -verbosity 9 -oneunit
(and so on)”
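Treating joblist.lst as a queue comes down to "take the first line, then delete it". A minimal sketch of that step follows; the helper name `pop_job` is hypothetical, and the file is assumed to hold one command per line.

```shell
#!/bin/bash
# pop_job FILE: print the first line of FILE and remove it from FILE.
# Returns non-zero when FILE is empty or missing.
pop_job() {
    local file=$1 first
    [ -s "$file" ] || return 1
    first=$(head -n 1 "$file")
    # Rewrite the file without its first line, via a temporary copy.
    tail -n +2 "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    printf '%s\n' "$first"
}

# Usage sketch (not executed here):
#   job=$(pop_job /home/xxx/folding/joblist.lst) && $job &
```

Because each command carries -oneunit, the client exits after one work unit, and the next call to `pop_job` supplies the following entry.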
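Here is the code. I do not have a programming background and have no real tricks for handling complicated cases, so if anyone can help simplify it, I would be very grateful.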
#!/bin/bash
# The original code came from http://bbs.ustc.edu.cn/cgi/bbscon?bn=Linux&fn=M4B696189
sleep 600                               # Wait 10 minutes so the system finishes booting
list="/home/xxx/folding/joblist.lst"    # File that contains the submit-job commands
job=$(head -n1 "$list")                 # Load the first line
num_job=$(wc -l < "$list")              # Number of commands left in the list
job_exe="fah6"                          # Process name to look for
num_cpu=1                               # Maximum number of jobs running at once
rm -f joblist.tmp
while [ "$num_job" -ge 2 ]
do
    for ((i=0; i<1000; i++))
    do
        job_count=$(ps -ef | grep "$job_exe" | grep -v "grep" | grep -v "SCREEN" | wc -l)
        while [ "$job_count" -ge "$num_cpu" ]
        do
            sleep 1800  # num_cpu jobs are already running; check again in 1800 seconds
            job_count=$(ps -ef | grep "$job_exe" | grep -v "grep" | grep -v "SCREEN" | wc -l)
            echo "$job_count"
        done
        # A slot is free now, so start one job
        pushd /home/xxx/folding/smp
        $job &
        date
        echo "$job"
        popd
        # Remove the submitted job from joblist
        cp "$list" joblist.tmp; sed 1d joblist.tmp > "$list"; rm -f joblist.tmp
        sleep 120
        job=$(head -n1 "$list")         # Load the next line
        [ -z "$job" ] && break 2        # The list is empty: stop instead of looping with nothing to run
        num_job=$(wc -l < "$list")
        echo "$num_job"
        sleep 600                       # Wait 10 minutes to make sure fah starts properly
    done
done