
[Help] Error when running an MPI program

Posted on 2008-9-11 11:01:01
Below is the output I get after running mpirun. I can't solve the problem and would appreciate any help.

[mpiuser@nd15 cf2w]$ /cluster/mpich-1.2.7-ssh/bin/mpirun -np 104 -machinefile mf ./cf2w
   Re     nx     ny     nstep
  20000. 1040 1040 20000000
   nsave      nout          ct
200000 200000 0
les      cles  
0  0.165000007
knu. num =  0.00001329  vis =  0.00005000  sound sp =  6.666667
it, time, tau 1  0.  7.21153847E-05
it, time, tau 200001  13.1176453  6.55378753E-05
it, time, tau 400001  26.1293888  6.5490829E-05
it, time, tau 600001  39.0993614  6.53959578E-05
it, time, tau 800001  52.0693321  6.52170202E-05
it, time, tau 1000001  65.100441  6.5070657E-05
it, time, tau 1200001  78.8333511  6.50126312E-05
it, time, tau 1400001  92.5662613  6.49166977E-05
p56_15548: (524700.113281) net_recv failed for fd = 7
p56_15548:  p4_error: net_recv read, errno = : 104
p9_20188:  p4_error: net_recv read:  probable EOF on socket: 1
p7_18696:  p4_error: net_recv read:  probable EOF on socket: 1
p6_18672:  p4_error: net_recv read:  probable EOF on socket: 1
p38_15819:  p4_error: net_recv read:  probable EOF on socket: 1
p14_20293:  p4_error: net_recv read:  probable EOF on socket: 1



rm_l_56_15565: (524700.113281) net_send: could not write to fd=5, errno = 32
rm_l_33_15731: (524706.675781) net_send: could not write to fd=5, errno = 32
rm_l_22_15668: (524705.625000) net_send: could not write to fd=5, errno = 32
rm_l_9_20205: (524709.511719) net_send: could not write to fd=5, errno = 32
rm_l_25_15424: (524705.011719) net_send: could not write to fd=5, errno = 32
rm_l_6_18690: (524709.445312) net_send: could not write to fd=5, errno = 32
rm_l_7_18713: (524709.269531) net_send: could not write to fd=5, errno = 32
rm_l_5_18665: (524709.621094) net_send: could not write to fd=5, errno = 32
rm_l_41_15663: (524703.386719) net_send: could not write to fd=5, errno = 32
rm_l_14_20310: (524708.640625) net_send: could not write to fd=5, errno = 32
rm_l_39_15857: (524705.625000) net_send: could not write to fd=5, errno = 32
rm_l_18_15582: (524706.332031) net_send: could not write to fd=5, errno = 32
rm_l_26_15445: (524704.839844) net_send: could not write to fd=5, errno = 32
rm_l_35_15773: (524706.328125) net_send: could not write to fd=5, errno = 32
rm_l_31_15550: (524703.953125) net_send: could not write to fd=5, errno = 32
rm_l_82_16096: (524697.078125) net_send: could not write to fd=5, errno = 32
rm_l_63_15712: (524698.878906) net_send: could not write to fd=5, errno = 32


p82_16079: (524909.273438) net_send: could not write to fd=5, errno = 32
p83_16100: (524909.097656) net_send: could not write to fd=5, errno = 32
p100_15089: (524905.191406) net_send: could not write to fd=5, errno = 32
p64_15914: (524911.781250) net_send: could not write to fd=5, errno = 32
p47_15773: (524914.531250) net_send: could not write to fd=5, errno = 32
p103_15152: (524904.660156) net_send: could not write to fd=5, errno = 32
p93_15576: (524906.082031) net_send: could not write to fd=5, errno = 32
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
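
The two errno values in the log above can be decoded, which narrows things down a little. A minimal sketch, assuming the cluster runs Linux (errno numbers are platform-specific); the table, the regular expression, and the function name `decode_errno` are hypothetical helpers written for this illustration, not part of MPICH:

```python
import re

# Linux errno values seen in the log. Assumption: the nodes run Linux;
# other platforms number these errors differently.
LINUX_ERRNO = {
    32: ("EPIPE", "Broken pipe"),
    104: ("ECONNRESET", "Connection reset by peer"),
}

# Matches both "errno = 32" and the odd "errno = : 104" form in the log.
ERRNO_RE = re.compile(r"errno\s*=\s*:?\s*(\d+)")


def decode_errno(line):
    """Return (code, name, description) for a log line, or None if no errno is present."""
    match = ERRNO_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
    return code, name, text


sample = [
    "p56_15548:  p4_error: net_recv read, errno = : 104",
    "rm_l_56_15565: (524700.113281) net_send: could not write to fd=5, errno = 32",
]
for line in sample:
    print(decode_errno(line))
```

Both codes say that the other end of a socket went away, so these messages are probably symptoms of one process or connection dying first rather than the root cause itself.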

Posted on 2008-12-25 09:58:51
With only the error messages to go on, how is anyone supposed to find the bug?

Posted on 2010-3-13 08:46:55
To the original poster: did you ever solve this? I have recently run into the same problem.

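Posted on 2011-6-16 12:45:48
Last edited by zhuliting on 2011-6-16 12:55

The files do not have sufficient write permission!
For example, suppose every node in the cluster shares node 0's zmpi directory and your program sits in zmpi/myuser, but the nodes other than node 0 have no write permission on the files in that directory.
Solution:
chmod -R 777 myuser
Also: in a shared directory, several processes may end up writing to the same file, so the code needs to account for that as well.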
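
Both points in this reply can be checked from code before a long run starts. The following is only a sketch under stated assumptions: `can_write` and `rank_output_path` are hypothetical helper names, the file-name pattern is made up for illustration, and the rank is passed in as a plain integer (in a real MPI program it would come from the MPI library, e.g. `MPI_Comm_rank`).

```python
import os
import tempfile


def can_write(directory):
    """Return True if a scratch file can be created in the directory, else False."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers a missing directory as well as a permission error.
        return False


def rank_output_path(directory, rank, stem="out"):
    """Build a per-rank file name so that no two processes write to the same file."""
    return os.path.join(directory, f"{stem}.{rank:04d}.dat")


workdir = tempfile.mkdtemp()  # stand-in for the shared directory (e.g. zmpi/myuser)
print(can_write(workdir))
print(rank_output_path(workdir, 56))
```

Running the writability check on every node at startup would surface a permission problem immediately instead of hours into the job, and giving each rank its own file sidesteps concurrent writes to a single shared file.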