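周安国 posted on 2020-6-27 12:44:56

NVIDIA GPU tasks fail with computation errors

I started running Einstein@home today, and every NVIDIA GPU task (application: Gravitational Wave search O2 Multi-Directional GPU v2.07 (GW-opencl-nvidia)) fails with a computation error after about 7 to 8 minutes. Below are the details for one such task.

Here is the standard error output first: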
<core_client_version>7.16.5</core_client_version>
<![CDATA[
<message>
- exit code 114 (0x72)</message>
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2020-06-27 12:00:42.2152 (9560) : This program is published under the GNU General Public License, version 2
2020-06-27 12:00:42.2222 (9560) : For details see http://einstein.phys.uwm.edu/license.php
2020-06-27 12:00:42.2262 (9560) : This Einstein@home App was built at: Dec 19 2019 12:14:49

2020-06-27 12:00:42.2312 (9560) : Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2MDF_2.07_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2020-06-27 12:00:43.2126 (9560) : BSGL output files
2020-06-27 12:00:43.2375 (9560) : Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2020-06-27 12:00:43.2435 (9560) : Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)

2020-06-27 12:00:46.9237 (9560) : Reading input data ... 2020-06-27 12:06:48.3530 (9560) : Search FstatMethod used: 'ResampOpenCL'
2020-06-27 12:06:48.3540 (9560) : Recalc FstatMethod used: 'DemodSSE'
2020-06-27 12:06:48.3560 (9560) : OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce MX150 (Platform: NVIDIA CUDA, global memory: 2048 MiB)'
2020-06-27 12:06:48.3580 (9560) : OpenCL version is used for the semi-coherent step!
XLAL Error - XLALTransferVectorToCLMEMVector (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:494): Transferring host memory to GPU failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALTransferVectorToCLMEMVector (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:494): Internal function call failed
XLAL Error - XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/LFTandTSutils.c:1284): Check failed: XLALTransferVectorToCLMEMVector ( (CLMEMVector*) multiTimeSeries->data->data, tempX->data, tempX->length, sizeof(tempX->data), (1 == 1) ) == XLAL_SUCCESS
XLAL Error - XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/LFTandTSutils.c:1284): Internal function call failed
XLAL Error - XLALSetupFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:311): Check failed: XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM ( resamp->multiTimeSeries_SRC_b ) == XLAL_SUCCESS
XLAL Error - XLALSetupFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:311): Internal function call failed
XLAL Error - XLALSetupFstatResamp (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp.c:242): Check failed: XLALSetupFstatResamp_OpenCL ( common, numSamplesMax_SRC, resamp, funcs ) == XLAL_SUCCESS
XLAL Error - XLALSetupFstatResamp (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp.c:242): Internal function call failed
XLAL Error - XLALCreateFstatInput (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:603): Check failed: (setupFuncMethod) ( &input->method_data, common, funcs, multiSFTs, &optArgs ) == XLAL_SUCCESS
XLAL Error - XLALCreateFstatInput (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:603): Internal function call failed
SetUpSFTs: XLALCreateFstatInput() failed with errno=1024Error 14: function SetUpSFTs, file /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c, line 2807, $Id$
      ABORT: XLAL function call failed
Level 0: $Id$
        Function call `SetUpSFTs( &status, &usefulParams )' failed.
        file /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c, line 1049

Level 1: $Id$
        Status code 14: XLAL function call failed
        function SetUpSFTs, file /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c, line 2807
2020-06-27 12:08:15.4391 (9560) : BOINC_LAL_ErrHand(): xlalErrno = 1024
2020-06-27 12:08:15.4401 (9560) : BOINC_LAL_ErrHand(): now calling boinc_finish()
12:08:15 (9560): called boinc_finish

</stderr_txt>
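]]>

It is just a long chain of XLAL Error and "Internal function call failed" messages. Updating the graphics driver did not help, while the tasks for the Intel GPU run without problems. Has anyone else run into this? Is there a fix?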
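Editor's sketch: in the log above, only the first XLAL Error line names an actual failure (CL_MEM_OBJECT_ALLOCATION_FAILURE while copying host memory to the GPU); the later "Check failed" and "Internal function call failed" lines appear to be the same error propagating up the call stack. A small Python helper along these lines could pull that first line out of a saved stderr dump. The function names and the shortened paths in SAMPLE are made up for illustration and are not part of BOINC or LALSuite.

```python
import re

# Matches lines of the form seen in the log:
#   XLAL Error - <function> (<file>:<line>): <message>
XLAL_LINE = re.compile(
    r"^XLAL Error - (?P<func>\w+) \((?P<file>.+?):(?P<line>\d+)\): (?P<msg>.*)$"
)

def xlal_errors(stderr_text):
    """Return every XLAL Error entry, in order of appearance, as a dict."""
    entries = []
    for raw in stderr_text.splitlines():
        match = XLAL_LINE.match(raw.strip())
        if match:
            entries.append(match.groupdict())
    return entries

def root_cause(stderr_text):
    """Return the first entry that is not a propagation message, or None."""
    for entry in xlal_errors(stderr_text):
        msg = entry["msg"]
        if msg.startswith("Check failed:") or msg == "Internal function call failed":
            continue  # these only report that a callee already failed
        return entry
    return None

# Three lines modelled on the log above; file paths shortened for the sample.
SAMPLE = """\
XLAL Error - XLALTransferVectorToCLMEMVector (src/OpenCLutils.c:494): Transferring host memory to GPU failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALTransferVectorToCLMEMVector (src/OpenCLutils.c:494): Internal function call failed
XLAL Error - XLALSetupFstatResamp_OpenCL (src/ComputeFstat_Resamp_OpenCL.c:311): Check failed: XLALTransferMultiCOMPLEX8TimeSeriesToCLMEM ( resamp->multiTimeSeries_SRC_b ) == XLAL_SUCCESS
"""

cause = root_cause(SAMPLE)
print(cause["func"], "->", cause["msg"])
```

Applied to the full stderr text, the same call should point at the memory transfer in OpenCLutils.c rather than at the later messages.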

牵牛星 posted on 2020-6-27 21:19:57

The gravitational-wave tasks need at least 3 GB of video memory. Most likely the GPU ran out of VRAM.
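Editor's sketch: the log reports the device as "GeForce MX150 (... global memory: 2048 MiB)", which is below the 3 GB figure given in this reply. To check a card before enabling these tasks, something like the following Python snippet could read the total memory from nvidia-smi. The 3 GB threshold is the poster's estimate, not an official requirement, and the helper names are made up for illustration. It assumes nvidia-smi is on the PATH and reports a numeric memory.total.

```python
import subprocess

# The "at least 3 GB" figure from the reply above; an assumption, not an official number.
REQUIRED_MIB = 3 * 1024

def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' rows (no header, no units) into (name, MiB) tuples."""
    gpus = []
    for row in csv_text.strip().splitlines():
        name, total = row.rsplit(",", 1)  # split on the last comma only
        gpus.append((name.strip(), int(total)))
    return gpus

def query_gpu_memory():
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)

def has_enough(total_mib, required_mib=REQUIRED_MIB):
    """True if the card has at least the required amount of memory."""
    return total_mib >= required_mib
```

On a machine with an NVIDIA driver installed, `for name, mib in query_gpu_memory(): print(name, mib, has_enough(mib))` lists each card. With the 2048 MiB reported in the log, `has_enough(2048)` is False.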

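freestman posted on 2020-6-27 21:34:08

牵牛星 posted on 2020-6-27 21:19
The gravitational-wave tasks need at least 3 GB of video memory. Most likely the GPU ran out of VRAM.

If you do not run several tasks on the GPU at the same time, 1 GB is enough.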
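Editor's sketch: whether several tasks share one GPU is normally controlled in BOINC through an app_config.xml file in the project directory, where a gpu_usage of 1.0 means one task per GPU and 0.5 means two. The Python snippet below generates such a file as a minimal illustration. The application name is a placeholder (the real short name would have to be looked up, for example in client_state.xml), and build_app_config is a made-up helper.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, gpu_usage=1.0, cpu_usage=1.0):
    """Build an app_config.xml string that sets GPU/CPU usage for one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage 1.0 -> one task per GPU; 0.5 -> two tasks share a GPU
    ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")

# "APP_NAME_PLACEHOLDER" is hypothetical; substitute the project's real app name.
xml_text = build_app_config("APP_NAME_PLACEHOLDER", gpu_usage=1.0)
print(xml_text)
```

The output would be saved as app_config.xml in the Einstein@home project folder and picked up after the client re-reads its config files. Limiting the GPU to one task this way reduces memory pressure, but it cannot help if a single task already needs more memory than the card has.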

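牵牛星 posted on 2020-6-27 23:53:56

freestman posted on 2020-6-27 21:34
If you do not run several tasks on the GPU at the same time, 1 GB is enough.

On a 2 GB 1050 Ti running only one task at a time, tasks still fail most of the time.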

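周安国 posted on 2020-6-30 13:37:59

牵牛星 posted on 2020-6-27 21:19
The gravitational-wave tasks need at least 3 GB of video memory. Most likely the GPU ran out of VRAM.

I think so too. My MX150 only has 2 GB of video memory.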

aisong220 posted on 2020-7-1 14:15:14

My 474 has also been throwing all kinds of errors lately, so I have stopped running the gravitational-wave tasks.