Download link:
http://boinc.thesonntags.com/collatz/download/collatz_2.04_windows_intelx86__cuda23_v1.4.zip
The cmdline option can now be used.
Collatz 2.04 CUDA Optimized Application Readme
==============================================
There are 3 command line parameters that can be used to control the performance and resource utilization of the Collatz v2.04 CUDA application.
In the examples below, a sample workunit that is only 2% of the normal workunit size was used to calculate the times. All timings listed are for a 9800 GTX+ at stock speeds.
To override the default settings, edit the cmdline elements (each of them) in the app_info.xml file. The order of the parameters is not important. Do NOT put spaces between the parameter and the value. (e.g. L5 is OK, L 5 is not)
For example, to use the same settings as the default:
<cmdline>L5 I8 S1</cmdline>
To run as fast as possible (max GPU WUs per day):
<cmdline>L13 I8 S0</cmdline>
To run as fast as possible on both CPU and GPU:
<cmdline>L13 I8 S1</cmdline>
To run on a GPU with poor response:
<cmdline>L3 I5 S1</cmdline>
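The formatting rules above (each letter immediately followed by its value, parameters separated by spaces, any order) can be checked before editing app_info.xml. The following is only a minimal Python sketch and is not part of the Collatz distribution. The helper name build_cmdline is hypothetical, and the ranges are copied from the "Valid Values" lines later in this readme.

```python
# Hypothetical helper, not shipped with Collatz: builds a <cmdline> element
# from the three parameters, using the valid ranges listed in this readme.
RANGES = {"L": (1, 13), "I": (5, 8), "S": (0, 4294967295)}


def build_cmdline(L=5, I=8, S=1):
    """Return a <cmdline> element; defaults match the readme's defaults."""
    values = {"L": L, "I": I, "S": S}
    parts = []
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}{value} is outside {name}{low}..{name}{high}")
        parts.append(f"{name}{value}")  # no space between letter and value
    return "<cmdline>" + " ".join(parts) + "</cmdline>"


print(build_cmdline())           # the default settings
print(build_cmdline(L=13, S=0))  # the "as fast as possible" settings
```

An out-of-range value such as I=9 raises ValueError instead of producing a line the application may not accept.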
All 3 files (app_info.xml, collatz_2.04_windows_intelx86__cuda23.exe, and cudart.dll) get placed in the project folder. For XP the location is:
"c:\documents and settings\all users\application data\boinc\projects\boinc.thesonntags.com_collatz"
For Windows Vista, Windows 7, and Windows Server 2008, the location is:
C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz
Note: ProgramData is a hidden folder by default so you will either need to unhide it or use the full location in Windows Explorer to access the folder.
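To confirm that all three files ended up in the project folder, a small check like the one below may help. It is a sketch only: the function name missing_files is hypothetical, the file names are the ones listed in this readme, and the path is the Vista / Windows 7 / Server 2008 location given above (adjust it for XP).

```python
from pathlib import Path

# File names as listed in this readme.
REQUIRED = [
    "app_info.xml",
    "collatz_2.04_windows_intelx86__cuda23.exe",
    "cudart.dll",
]


def missing_files(project_dir):
    """Return the required files that are not present in project_dir."""
    folder = Path(project_dir)
    return [name for name in REQUIRED if not (folder / name).is_file()]


# Location for Windows Vista, Windows 7, and Windows Server 2008.
project = r"C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz"
print(missing_files(project))  # an empty list means all three files are there
```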
Ix
Default Value: I8
Valid Values: I5 through I8
Purpose: Controls the number of items per loop. The setting is the power of 2 used for each dimension of the two-dimensional array of items being calculated, e.g. I5 = 2^5 rows by 2^5 columns = 32x32 = 1024 numbers calculated per loop, and I8 = 2^8 rows by 2^8 columns = 256x256 = 65536 numbers calculated per loop. Values below I5 (32 per dimension) could be used but would leave the GPU only partially utilized. Values above I8 exceed the amount of memory allowed per CUDA kernel. Using the parameters L13 I5 takes 257 seconds to complete the sample workunit; using L13 I8 takes only 44 seconds. Anything below I7 drastically increases the run time and will require more GPU time to complete the same workunit.
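The arithmetic behind the I setting can be reproduced with a few lines of Python. This is just an illustration of the 2^I by 2^I rule described above; the function name items_per_loop is made up for the example.

```python
def items_per_loop(i):
    """Numbers calculated per loop for setting I<i>: 2^i rows by 2^i columns."""
    side = 2 ** i
    return side * side


# Valid values are I5 through I8.
for i in range(5, 9):
    side = 2 ** i
    print(f"I{i}: {side}x{side} = {items_per_loop(i)} numbers per loop")
```

Each step up in I quadruples the number of items per loop, which is why I5 and I8 differ by a factor of 64.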
Lxx
Default Value: L5
Valid Values: L1 through L13
Purpose: Controls the number of loops per reduction. The higher the number, the better the GPU utilization and the faster the workunit will complete. Also, the higher the number, the less responsive the system will be. Machines which are dedicated crunchers will likely want to use L13. Machines used while crunching will want to use a value from 1-5. The lower the number, the higher the elapsed time will be. For example, a value of L1 runs at 73% GPU utilization and takes 67 seconds whereas L13 runs at 99% GPU utilization and takes 43 seconds. By comparison, the v2.03 application takes about 51 seconds. The value is actually the power of 2 that is used, so L3 = 2^3 = 8 loops per reduction and L13 = 2^13 = 8192 loops per reduction. There is about a 2% difference in run time and a 1-2% difference in GPU utilization using L5 versus L13 on a 9800 GTX+.
In general, the more items per loop (Ix) and the more loops per reduction (Lxx), the faster the workunit will complete and the worse the video response will be.
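The L setting follows the same power-of-2 rule, which a short Python sketch can make concrete. The function name loops_per_reduction is made up for this example; the two timings are the ones quoted above for the sample workunit on a 9800 GTX+.

```python
def loops_per_reduction(l):
    """Loops per reduction for setting L<l>, i.e. 2 to the power l."""
    return 2 ** l


for l in (1, 3, 5, 13):
    print(f"L{l}: {loops_per_reduction(l)} loops per reduction")

# Timings quoted in this readme for the sample workunit.
l1_seconds, l13_seconds = 67, 43
ratio = l1_seconds / l13_seconds
print(f"L1 takes {ratio:.2f}x as long as L13")
```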
Sxxxx
Default Value: S1
Valid Values: S0 through S4294967295
Purpose: Controls the number of milliseconds to wait for the application to complete the loops and reduction. Setting the value to 0 will cause it to use the CPU while waiting for the GPU to finish its calculations but will result in the fastest elapsed time. It will not increase or reduce the GPU time needed. Settings from 1 to 10 will have little effect on the runtime if using many loops per reduction (e.g. L13) but will drastically reduce GPU utilization when using fewer loops per reduction (e.g. L3). For example, using S10 results in an elapsed time of 83 seconds with L3 and 44 seconds with L13. Note: Setting this to the max value will require 136 YEARS to complete a workunit.
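To get a feel for the upper end of the S range, the maximum value can be converted from milliseconds to days. This is only a unit conversion sketch; the name wait_days is made up, and nothing here is taken from the application's source.

```python
MAX_S = 4294967295  # largest valid S value, equal to 2**32 - 1 milliseconds


def wait_days(s_ms):
    """Convert a wait given in milliseconds to days."""
    return s_ms / 1000 / 60 / 60 / 24


print(f"S{MAX_S} = {wait_days(MAX_S):.1f} days per wait")
```

A single wait at the maximum setting is therefore about 49.7 days. The 136-year figure above would be consistent with roughly a thousand such waits per workunit, though the readme does not state that count.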
Hints:
Old Slow CUDA crunchers:
For those with very slow graphics cards which have poor video response using the stock settings, a low setting for L will likely be all that is needed to improve response (L1 or L2). You may also try using a lower setting for I (I6 or I7) in combination with the lower L setting.
If anyone wants to translate this, feel free.
Or wait a month for me to do it - -