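Background: our lab's compute cluster has the ROCKS cluster management software installed; the OS is CentOS and the PBS implementation is Torque.
Problem: after I submit a job, it just sits in the Q state (queued, waiting to be scheduled).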
[bgb@cluster test]$ qstat
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
25.cluster 0.01-0.00001 bgb 0 Q default
Trying to force it to run:
[bgb@cluster test]$ qrun 25.cluster
pbs_iff: Access from host not allowed, or unknown host MSG=request not authorized from host cluster.local
pbs_iff: Access from host not allowed, or unknown host MSG=request not authorized from host cluster.local
qrun: Unknown Job Id MSG=cannot locate job 25.cluster.local
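I looked up pbs_iff: reportedly it is related to user authentication and supplies PBS credentials to the pbs server. But opening pbs_iff in vi shows nothing but garbage.
I've spent a whole day on this and still don't know what the cause is.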
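One thing that may be worth checking (a guess, not a confirmed diagnosis): the error says the request came from host cluster.local, while the qmgr -c 'p s' output later in this post shows acl_host_enable = True with acl_hosts = cluster.hpc.org. The sketch below is a simplified model of a host ACL check, not Torque's actual code; the helper name host_allowed and the shell-style wildcard matching via fnmatch are assumptions made for illustration.

```python
from fnmatch import fnmatch


def host_allowed(requesting_host: str, acl_host_enable: bool, acl_hosts: list[str]) -> bool:
    """Hypothetical, simplified host ACL check (NOT Torque's implementation).

    If the ACL is disabled, every host is accepted. Otherwise the requesting
    host must match at least one ACL entry; entries are treated as
    shell-style patterns (e.g. '*.local'), compared case-insensitively.
    """
    if not acl_host_enable:
        return True
    host = requesting_host.lower()
    return any(fnmatch(host, pattern.lower()) for pattern in acl_hosts)


# Values taken from the pbs_iff error message and the qmgr output.
print(host_allowed("cluster.local", True, ["cluster.hpc.org"]))                   # False
print(host_allowed("cluster.local", False, ["cluster.hpc.org"]))                  # True
print(host_allowed("cluster.local", True, ["cluster.hpc.org", "cluster.local"]))  # True
```

If this is the cause, the corresponding change would be to add cluster.local to acl_hosts (for example with qmgr -c 'set server acl_hosts += cluster.local') or to disable the host ACL. Note also that qrun normally requires operator or manager privileges, and bgb is not listed as a manager in either configuration shown here.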
I also have a question about PBS queue configuration:
Under /opt/torque I found a file named pbs.default (default is a queue I defined). It contains the following:
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default keep_completed = 120
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server managers = maui@cluster.hpc.org
set server managers += root@cluster.hpc.org
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server allow_node_submit = True
set server moab_array_compatible = True
It differs from the queue configuration I get with the qmgr -c 'p s' command:
[bgb@cluster test]$ qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default acl_host_enable = True
set queue default acl_user_enable = True
set queue default acl_users = bgb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = cluster.hpc.org
set server acl_users = root@*
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server poll_jobs = True
set server mom_job_sync = True
set server auto_node_np = True
set server next_job_number = 26
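So which configuration is actually in effect? Part of the qmgr configuration was written by me and probably contains quite a few mistakes; I'd be grateful if anyone could point them out.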
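To see exactly where the two listings differ, a small script can parse the set lines of each into a dictionary and compare them. This is a sketch assuming both texts use the "set <object> <name> <attribute> = <value>" (or "+=") form shown above; the sample strings are excerpts of the two listings, and parse_qmgr and diff are hypothetical helpers. As a hedged note, qmgr -c 'p s' queries the running pbs_server, so it reflects the configuration currently in use; a file of qmgr commands such as pbs.default would only take effect if something feeds it to qmgr.

```python
def parse_qmgr(text: str) -> dict[str, str]:
    """Collect qmgr-style 'set' lines into {'<object> <attribute>': value}.

    '+=' appends to an existing value (joined with ','); comments,
    'create' lines and blank lines are ignored.
    """
    attrs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("set "):
            continue
        body = line[len("set "):]
        if "+=" in body:
            left, value = body.split("+=", 1)
            key, value = left.strip(), value.strip()
            attrs[key] = f"{attrs[key]},{value}" if key in attrs else value
        else:
            # e.g. "server acl_users = root@*" -> ("server acl_users", "root@*")
            left, value = body.split("=", 1)
            attrs[left.strip()] = value.strip()
    return attrs


def diff(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """List every attribute whose value differs (None = not set)."""
    out = []
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key), b.get(key)
        if va != vb:
            out.append(f"{key}: {va!r} -> {vb!r}")
    return out


# Excerpts of pbs.default and of the qmgr -c 'p s' output above.
PBS_DEFAULT = """
set queue default keep_completed = 120
set server acl_host_enable = False
set server managers = maui@cluster.hpc.org
set server managers += root@cluster.hpc.org
set server default_queue = default
"""

QMGR_PS = """
set queue default acl_users = bgb
set server acl_host_enable = True
set server acl_hosts = cluster.hpc.org
set server default_queue = default
"""

for change in diff(parse_qmgr(PBS_DEFAULT), parse_qmgr(QMGR_PS)):
    print(change)
```

Pasting the complete listings into the two strings would show every attribute that was changed after pbs.default was written, including acl_host_enable and acl_hosts.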
Thanks.