China Distributed Computing Forum

Views: 450 | Replies: 6

[Help] NVIDIA GPU tasks fail with a computation error

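Posted on 2020-6-27 12:44:56
I started running Einstein@home today, and every NVIDIA GPU task (the Gravitational Wave search O2 Multi-Directional GPU v2.07 (GW-opencl-nvidia) application) fails with a computation error after about 7 to 8 minutes. Here are the details of one such task.

Here is the standard error output first: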
<core_client_version>7.16.5</core_client_version>
<![CDATA[
<message>
- exit code 114 (0x72)</message>
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2020-06-27 12:00:42.2152 (9560) [normal]: This program is published under the GNU General Public License, version 2
2020-06-27 12:00:42.2222 (9560) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2020-06-27 12:00:42.2262 (9560) [normal]: This Einstein@home App was built at: Dec 19 2019 12:14:49
2020-06-27 12:00:42.2312 (9560) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2MDF_2.07_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2020-06-27 12:00:43.2126 (9560) [debug]: BSGL output files
2020-06-27 12:00:43.2375 (9560) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2020-06-27 12:00:43.2435 (9560) [debug]: Set up communication with graphics process.
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
2020-06-27 12:00:46.9237 (9560) [normal]: Reading input data ... 2020-06-27 12:06:48.3530 (9560) [normal]: Search FstatMethod used: 'ResampOpenCL'
2020-06-27 12:06:48.3540 (9560) [normal]: Recalc FstatMethod used: 'DemodSSE'
2020-06-27 12:06:48.3560 (9560) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce MX150 (Platform: NVIDIA CUDA, global memory: 2048 MiB)'
2020-06-27 12:06:48.3580 (9560) [normal]: OpenCL version is used for the semi-coherent step!
XLAL Error - XLALTransferVectorToCLMEMVector (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:494): Transferring host memory to GPU failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALTransferVectorToCLMEMVector (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:494): Internal function call failed
XLAL Error - XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/LFTandTSutils.c:1284): Check failed: XLALTransferVectorToCLMEMVector ( (CLMEMVector*) multiTimeSeries->data[X]->data, tempX->data, tempX->length, sizeof(tempX->data[0]), (1 == 1) ) == XLAL_SUCCESS
XLAL Error - XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/LFTandTSutils.c:1284): Internal function call failed
XLAL Error - XLALSetupFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:311): Check failed: XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM ( resamp->multiTimeSeries_SRC_b ) == XLAL_SUCCESS
XLAL Error - XLALSetupFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:311): Internal function call failed
XLAL Error - XLALSetupFstatResamp (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp.c:242): Check failed: XLALSetupFstatResamp_OpenCL ( common, numSamplesMax_SRC, resamp, funcs ) == XLAL_SUCCESS
XLAL Error - XLALSetupFstatResamp (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp.c:242): Internal function call failed
XLAL Error - XLALCreateFstatInput (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:603): Check failed: (setupFuncMethod) ( &input->method_data, common, funcs, multiSFTs, &optArgs ) == XLAL_SUCCESS
XLAL Error - XLALCreateFstatInput (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:603): Internal function call failed
SetUpSFTs: XLALCreateFstatInput() failed with errno=1024Error[1] 14: function SetUpSFTs, file /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c, line 2807, $Id$
        ABORT: XLAL function call failed
Level 0: $Id$
        Function call `SetUpSFTs( &status, &usefulParams )' failed.
        file /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c, line 1049
Level 1: $Id$
        Status code 14: XLAL function call failed
        function SetUpSFTs, file /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c, line 2807
2020-06-27 12:08:15.4391 (9560) [CRITICAL]: BOINC_LAL_ErrHand(): xlalErrno = 1024
2020-06-27 12:08:15.4401 (9560) [CRITICAL]: BOINC_LAL_ErrHand(): now calling boinc_finish()
12:08:15 (9560): called boinc_finish
</stderr_txt>
]]>
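It is just a pile of "XLAL Error" and "Internal function call failed" messages. Updating the graphics driver did not help, while the tasks for the Intel GPU run without problems. Has anyone run into this? Is there a fix?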

Posted on 2020-6-27 21:19:57
The gravitational-wave search needs at least 3 GB of VRAM; most likely the task ran out of video memory.
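
One way to check this diagnosis against the log is to pull out the VRAM size the application reported and the OpenCL error code it failed with. The following is only a rough sketch: the function name `summarize` and the constant `ASSUMED_MIN_VRAM_MIB` are made up for illustration, the 3 GB threshold is the estimate from this reply and not an official project requirement, and the file path in the sample error line is shortened.

```python
import re

# Two lines taken from the stderr output in the first post.
# The source-file path in the second line is shortened to ".../" here.
LOG = """\
2020-06-27 12:06:48.3560 (9560) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce MX150 (Platform: NVIDIA CUDA, global memory: 2048 MiB)'
XLAL Error - XLALTransferVectorToCLMEMVector (.../OpenCLutils.c:494): Transferring host memory to GPU failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
"""

# Assumption: 3 GB is the figure quoted in the reply, not a documented limit.
ASSUMED_MIN_VRAM_MIB = 3 * 1024


def summarize(log: str, min_mib: int = ASSUMED_MIN_VRAM_MIB) -> dict:
    """Extract the reported VRAM and any OpenCL error codes from a stderr log."""
    mem = re.search(r"global memory:\s*(\d+)\s*MiB", log)
    vram_mib = int(mem.group(1)) if mem else None
    # OpenCL status codes are upper-case identifiers that start with "CL_".
    errors = sorted(set(re.findall(r"\bCL_[A-Z_]+\b", log)))
    return {
        "vram_mib": vram_mib,
        "opencl_errors": errors,
        "below_assumed_minimum": vram_mib is not None and vram_mib < min_mib,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

For the log above this reports 2048 MiB of VRAM together with CL_MEM_OBJECT_ALLOCATION_FAILURE, which is consistent with the card running out of memory while the input data is copied to the GPU.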
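Posted on 2020-6-27 21:34:08
牵牛星 wrote on 2020-6-27 21:19:
The gravitational-wave search needs at least 3 GB of VRAM; most likely the task ran out of video memory.

If you do not run several tasks on the GPU at once, 1 GB is enough.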
Posted on 2020-6-27 23:53:56

On the 2 GB version of the 1050 Ti, tasks still fail most of the time even when only one runs at a time.
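OP | Posted on 2020-6-30 13:37:59
牵牛星 wrote on 2020-6-27 21:19:
The gravitational-wave search needs at least 3 GB of VRAM; most likely the task ran out of video memory.

I think so too. My MX150 has only 2 GB of VRAM.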
Posted on 2020-7-1 14:15:14
My 474 has been throwing all kinds of errors lately as well, so I have stopped running the gravitational-wave tasks.