
[Discussion] Checkpoint not found after restart?

Posted 2014-3-11 13:09:48
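A while back I put this machine on the yoyo competition, and this Sunday I switched it back to running FAH. The very first work unit ran into trouble: after a reboot the client could not find the checkpoint, and a bigadv unit that was more than 70% done was lost...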
[00:01:20] Completed 172500 out of 250000 steps  (69%)
[00:18:51] Completed 175000 out of 250000 steps  (70%)
[00:36:23] Completed 177500 out of 250000 steps  (71%)
[00:53:59] Completed 180000 out of 250000 steps  (72%)
[01:11:48] Completed 182500 out of 250000 steps  (73%)
[01:24:59] ***** Got a SIGTERM signal (15)
[01:24:59] Killing all core threads

Folding@Home Client Shutdown.

--- Opening Log file [March 11 02:34:07 UTC]

# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /usr/local/fah
Executable: ./fah6
Arguments: -bigadv -verbosity 9 -smp 32

[02:34:07] - Ask before connecting: No
[02:34:07] - User name: Vorfeed (Team 3213)
[02:34:07] - User ID: 1801BDFD51DF3F7C
[02:34:07] - Machine ID: 1
[02:34:07]
[02:34:07] Loaded queue successfully.
[02:34:07]
[02:34:07] + Processing work unit
[02:34:07] Core required: FahCore_a5.exe
[02:34:07] Core found.
[02:34:07] - Autosending finished units... [March 11 02:34:07 UTC]
[02:34:07] Trying to send all finished work units
[02:34:07] + No unsent completed units remaining.
[02:34:07] - Autosend completed
[02:34:07] Working on queue slot 00 [March 11 02:34:07 UTC]
[02:34:07] + Working ...
[02:34:07] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 00 -np 32 -checkpoint 15 -verbose -lifeline 1084 -version 634'

[02:34:07]
[02:34:07] *------------------------------*
[02:34:07] Folding@Home Gromacs SMP Core
[02:34:07] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[02:34:07]
[02:34:07] Preparing to commence simulation
[02:34:07] - Looking at optimizations...
[02:34:07] - Files status OK
[02:34:10] - Expanded 30301187 -> 33130012 (decompressed 109.3 percent)
[02:34:10] Called DecompressByteArray: compressed_data_size=30301187 data_size=33130012, decompressed_data_size=33130012 diff=0
[02:34:10] - Digital signature verified
[02:34:10]
[02:34:10] Project: 8105 (Run 0, Clone 35, Gen 285)
[02:34:10]
[02:34:10] Assembly optimizations on if available.
[02:34:10] Entering M.D.
[02:34:16] Using Gromacs checkpoints
[02:34:18] Mapping NT from 32 to 32
[02:34:40] fcSaveRestoreState: I/O failed dir=0, var=00007F521E7F08E0, varsize=20
[02:34:40] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[02:34:40] mdrun returned 3
[02:34:40] Gromacs detected an invalid checkpoint.  Restarting...fcSaveRestoreState: I/O failed dir=0, var=00007F52217F68E0, varsize=20
[02:34:41] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash.
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Can't open checkpoint file
[02:34:41] Resuming from checkpoint
[02:34:41] Can't open checkpoint file
[02:34:56]
[02:34:56] Folding@home Core Shutdown: UNKNOWN_ERROR
[02:34:57] CoreStatus = 62 (98)
[02:34:57] + Restarting core (settings changed)
[02:34:57]
[02:34:57] + Processing work unit
[02:34:57] Core required: FahCore_a5.exe
[02:34:57] Core found.
[02:34:57] Working on queue slot 00 [March 11 02:34:57 UTC]
[02:34:57] + Working ...
[02:34:57] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 00 -np 32 -checkpoint 15 -notermcheck -verbose -lifeline 1084 -version 634'

[02:34:57]
[02:34:57] *------------------------------*
[02:34:57] Folding@Home Gromacs SMP Core
[02:34:57] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[02:34:57]
[02:34:57] Preparing to commence simulation
[02:34:57] - Looking at optimizations...
[02:34:57] - Not checking prior termination.
[02:34:59] - Expanded 30301187 -> 33130012 (decompressed 109.3 percent)
[02:34:59] Called DecompressByteArray: compressed_data_size=30301187 data_size=33130012, decompressed_data_size=33130012 diff=0
[02:34:59] - Digital signature verified
[02:34:59]
[02:34:59] Project: 8105 (Run 0, Clone 35, Gen 285)
[02:34:59]
[02:35:00] Assembly optimizations on if available.
[02:35:00] Entering M.D.
[02:35:06] Mapping NT from 32 to 32
[02:35:30] Completed 0 out of 250000 steps  (0%)
[02:53:31] Completed 2500 out of 250000 steps  (1%)
[03:10:53] Completed 5000 out of 250000 steps  (2%)
[03:28:22] Completed 7500 out of 250000 steps  (3%)
[03:45:34] Completed 10000 out of 250000 steps  (4%)
[04:03:42] Completed 12500 out of 250000 steps  (5%)
[04:21:48] Completed 15000 out of 250000 steps  (6%)
[04:39:12] Completed 17500 out of 250000 steps  (7%)
[04:56:52] Completed 20000 out of 250000 steps  (8%)
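
To put a number on how much work was thrown away, the progress lines in the log above can be parsed into a rough time-per-percent figure. This is only a sketch: the function names `parse_progress` and `seconds_per_percent` are made up for illustration, the sample text is the five progress lines logged before the SIGTERM, and the code assumes the lines span less than one day because the log timestamps carry no date.

```python
import re
from datetime import datetime

# Matches client progress lines such as:
# [00:01:20] Completed 172500 out of 250000 steps  (69%)
STEP_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")

# The five progress lines logged before the SIGTERM, copied from the log.
SAMPLE = """\
[00:01:20] Completed 172500 out of 250000 steps  (69%)
[00:18:51] Completed 175000 out of 250000 steps  (70%)
[00:36:23] Completed 177500 out of 250000 steps  (71%)
[00:53:59] Completed 180000 out of 250000 steps  (72%)
[01:11:48] Completed 182500 out of 250000 steps  (73%)
"""


def parse_progress(log_text):
    """Return a list of (time, steps_done, steps_total) tuples."""
    rows = []
    for match in STEP_RE.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        rows.append((stamp, int(match.group(2)), int(match.group(3))))
    return rows


def seconds_per_percent(rows):
    """Average wall-clock seconds per 1% between the first and last row."""
    (t0, s0, total), (t1, s1, _) = rows[0], rows[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed < 0:
        # Assumption: the timestamps wrapped past midnight exactly once.
        elapsed += 24 * 3600
    percent_done = (s1 - s0) * 100 / total
    return elapsed / percent_done


if __name__ == "__main__":
    pace = seconds_per_percent(parse_progress(SAMPLE))
    print(f"{pace:.0f} s per percent")
    print(f"estimated time to reach 73%: {73 * pace / 3600:.1f} h")
```

On these five lines the pace works out to about 1057 seconds per percent, so if that pace held for the whole unit (an assumption, since the earlier part of the log is not shown), the 73% that was lost corresponds to roughly 21 hours of computation.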

Posted 2014-3-11 13:52:47
Are you running this in a virtual machine or on the native 1.3.4 image? If it is a virtual machine, which image are you using?

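If the client is started through the web control interface and the shutdown happens to coincide with one of the client's periodic checkpoint writes, this kind of checkpoint loss can occur.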
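
For readers wondering why a badly timed shutdown can destroy a checkpoint: if a program rewrites its checkpoint file in place and is killed halfway through, what is left on disk is a truncated file that fails validation on the next start. The sketch below shows the usual defence, writing to a temporary file and then renaming it over the old one. It is a general illustration only, not how FahCore_a5 or the FAH client actually saves its state, and `save_checkpoint` is a hypothetical name.

```python
import os
import tempfile


def save_checkpoint(path, data):
    """Write `data` (bytes) to `path` so that a kill mid-write cannot corrupt it.

    The new contents go to a temporary file in the same directory, are
    flushed to disk, and only then renamed over the old checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        # os.replace is an atomic rename when source and target are on
        # the same filesystem, so readers see either the old or new file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, a SIGTERM that arrives during the write leaves the previous checkpoint untouched, so at worst the work since the last successful checkpoint is lost rather than the whole unit.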


