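xinjun posted on 2008-9-11 11:01:01

Error while running an MPI program

Below is what I get after launching the job with mpirun. I can't figure it out and would appreciate any help.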

$ /cluster/mpich-1.2.7-ssh/bin/mpirun -np 104 -machinefile mf ./cf2w
   Re   nx   ny   nstep
20000. 1040 1040 20000000
   nsave      nout          ct
200000 200000 0
les      cles
0 0.165000007
knu. num = 0.00001329 vis = 0.00005000 sound sp = 6.666667
it, time, tau 1 0. 7.21153847E-05
it, time, tau 20000 113.1176453 6.55378753E-05
it, time, tau 40000 126.1293888 6.5490829E-05
it, time, tau 60000 139.0993614 6.53959578E-05
it, time, tau 80000 152.0693321 6.52170202E-05
it, time, tau 100000 165.100441 6.5070657E-05
it, time, tau 120000 178.8333511 6.50126312E-05
it, time, tau 140000 192.5662613 6.49166977E-05
p56_15548: (524700.113281) net_recv failed for fd = 7
p56_15548:p4_error: net_recv read, errno = : 104
p9_20188:p4_error: net_recv read:probable EOF on socket: 1
p7_18696:p4_error: net_recv read:probable EOF on socket: 1
p6_18672:p4_error: net_recv read:probable EOF on socket: 1
p38_15819:p4_error: net_recv read:probable EOF on socket: 1
p14_20293:p4_error: net_recv read:probable EOF on socket: 1



rm_l_56_15565: (524700.113281) net_send: could not write to fd=5, errno = 32
rm_l_33_15731: (524706.675781) net_send: could not write to fd=5, errno = 32
rm_l_22_15668: (524705.625000) net_send: could not write to fd=5, errno = 32
rm_l_9_20205: (524709.511719) net_send: could not write to fd=5, errno = 32
rm_l_25_15424: (524705.011719) net_send: could not write to fd=5, errno = 32
rm_l_6_18690: (524709.445312) net_send: could not write to fd=5, errno = 32
rm_l_7_18713: (524709.269531) net_send: could not write to fd=5, errno = 32
rm_l_5_18665: (524709.621094) net_send: could not write to fd=5, errno = 32
rm_l_41_15663: (524703.386719) net_send: could not write to fd=5, errno = 32
rm_l_14_20310: (524708.640625) net_send: could not write to fd=5, errno = 32
rm_l_39_15857: (524705.625000) net_send: could not write to fd=5, errno = 32
rm_l_18_15582: (524706.332031) net_send: could not write to fd=5, errno = 32
rm_l_26_15445: (524704.839844) net_send: could not write to fd=5, errno = 32
rm_l_35_15773: (524706.328125) net_send: could not write to fd=5, errno = 32
rm_l_31_15550: (524703.953125) net_send: could not write to fd=5, errno = 32
rm_l_82_16096: (524697.078125) net_send: could not write to fd=5, errno = 32
rm_l_63_15712: (524698.878906) net_send: could not write to fd=5, errno = 32


p82_16079: (524909.273438) net_send: could not write to fd=5, errno = 32
p83_16100: (524909.097656) net_send: could not write to fd=5, errno = 32
p100_15089: (524905.191406) net_send: could not write to fd=5, errno = 32
p64_15914: (524911.781250) net_send: could not write to fd=5, errno = 32
p47_15773: (524914.531250) net_send: could not write to fd=5, errno = 32
p103_15152: (524904.660156) net_send: could not write to fd=5, errno = 32
p93_15576: (524906.082031) net_send: could not write to fd=5, errno = 32
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer
Read from remote host nd7: Connection reset by peer

bracetoy posted on 2008-12-25 09:58:51

With nothing but the error messages to go on, how is anyone supposed to find the problem?

lianggy posted on 2010-3-13 08:46:55

Did the original poster ever solve this? I have run into the same problem recently.

zhuliting posted on 2011-6-16 12:45:48

This post was last edited by zhuliting on 2011-6-16 12:55

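Insufficient write permission on the files!
For example, suppose every node in the cluster shares the zmpi directory exported by node 0, and your program lives in zmpi/myuser, but the nodes other than node 0 have no write permission on the files in that directory.
Fix:
chmod -R 777 myuser
Also: in a shared directory, several processes can end up writing to the same file, so the code has to account for that as well.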
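
To confirm whether write permission is really the cause before changing modes, a quick probe on each node can help. The following is only a minimal sketch: the function name check_writable and the probe file name are made up for illustration and do not come from the thread. It tries to create and remove a small file in the given directory, and it puts the node name and PID in the file name so that probes started from several nodes do not collide in the shared directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report whether the current
# user can create a file in the given directory.
check_writable() {
    dir=$1
    # Node name and PID make the probe name unique, so probes run from
    # several nodes at once do not overwrite each other.
    probe="$dir/.write_probe.$(uname -n).$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}
```

One way to use it, assuming you can log in to the compute nodes: run check_writable zmpi/myuser on node 0 and on at least one other node listed in mf. If only the non-zero nodes report "NOT writable", that matches the situation described above.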