
[Collatz Conjecture] The latest CUDA 2.04 client supports cmdline settings

Posted 2010-5-30 21:39:31
Download link:
http://boinc.thesonntags.com/collatz/download/collatz_2.04_windows_intelx86__cuda23_v1.4.zip
cmdline can now be used.

Collatz 2.04 CUDA Optimized Application Readme
==============================================

There are 3 command line parameters that can be used to control the performance and resource utilization of the Collatz v2.04 CUDA application.

In the examples below, a sample workunit that is only 2% of the normal workunit size was used to calculate the times.  All timings listed are for a 9800 GTX+ at stock speeds.

To override the default settings, edit the cmdline elements (each of them) in the app_info.xml file.  The order of the parameters is not important.  Do NOT put spaces between the parameter and the value.  (e.g. L5 is OK, L 5 is not)

For example, to use the same settings as the default:
<cmdline>L5 I8 S1</cmdline>  

To run as fast as possible (max GPU WUs per day):
<cmdline>L13 I8 S0</cmdline>  

To run as fast as possible on both CPU and GPU:
<cmdline>L13 I8 S1</cmdline>  

To run on a GPU with poor response:
<cmdline>L3 I5 S1</cmdline>  
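
Since the readme says to edit each cmdline element, here is a minimal Python sketch that rewrites every one of them in a single pass. The helper name set_all_cmdlines and the trimmed SAMPLE document are hypothetical stand-ins; the real app_info.xml shipped in the zip has more elements than shown here.

```python
import xml.etree.ElementTree as ET


def set_all_cmdlines(xml_text: str, cmdline: str) -> str:
    """Return xml_text with the text of every <cmdline> element replaced."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("cmdline"):
        elem.text = cmdline
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed stand-in for app_info.xml (assumption:
# the real file contains more than one <cmdline> element).
SAMPLE = """<app_info>
  <app_version><cmdline>L5 I8 S1</cmdline></app_version>
  <app_version><cmdline>L5 I8 S1</cmdline></app_version>
</app_info>"""

updated = set_all_cmdlines(SAMPLE, "L13 I8 S0")
print(updated)
```

To apply this to the real file, read app_info.xml from the project folder, pass its contents through the function, and write the result back while BOINC is stopped.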

All 3 files (app_info.xml, collatz_2.04_windows_intelx86__cuda23.exe, and cudart.dll) get placed in the project folder.  For XP the location is:
"c:\documents and settings\all users\application data\boinc\projects\boinc.thesonntags.com_collatz"

For Windows Vista, Windows 7, and Windows Server 2008, the location is:
C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz

Note: ProgramData is a hidden folder by default so you will either need to unhide it or use the full location in Windows Explorer to access the folder.


Ix
Default Value: I8
Valid Values: I5 through I8
Purpose: Controls the number of items per loop.  The setting represents the power of 2 that will be used for each dimension of the two-dimensional array of items being calculated.  e.g. I5 = 2^5 rows by 2^5 columns = 32x32 = 1024 numbers calculated per loop and I8 = 2^8 rows by 2^8 columns = 256x256 = 65536 numbers calculated per loop.  Values below I5 (fewer than 32 rows and columns) could be used but would leave the GPU only partially utilized.  Values above I8 exceed the amount of memory allowed per CUDA kernel.  Using the parameters L13 I5 takes 257 seconds to complete the sample workunit.  Using L13 I8 takes only 44 seconds.  Anything below I7 drastically increases the run time and will require more GPU time to complete the same workunit.
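
The arithmetic behind the I setting can be checked with a short Python sketch. The function name items_per_loop is made up for illustration; only the 2^x by 2^x rule and the I5 through I8 range come from the readme.

```python
def items_per_loop(i: int) -> int:
    """Numbers calculated per loop for setting I<i>: a 2^i by 2^i array."""
    if not 5 <= i <= 8:
        raise ValueError("the readme lists I5 through I8 as valid values")
    side = 2 ** i
    return side * side


for i in range(5, 9):
    side = 2 ** i
    print(f"I{i}: {side}x{side} = {items_per_loop(i)} numbers per loop")
```

Each step up in I quadruples the numbers handled per loop, because both dimensions of the array double.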

Lxx
Default Value: L5
Valid Values: L1 through L13
Purpose: controls the number of loops per reduction.  The higher the number, the better the GPU utilization and the faster the workunit will complete.  Also, the higher the number, the less responsive the system will be.  Machines which are dedicated crunchers will likely want to use L13.  Machines used while crunching will want to use a value from 1-5.  The lower the number, the higher the elapsed time will be.  For example, a value of L1 runs at 73% GPU utilization and takes 67 seconds whereas L13 runs at 99% GPU utilization and takes 43 seconds.  By comparison, the v2.03 application takes about 51 seconds.  The value is actually the power of 2 that is used, so L3 = 2^3 = 8 loops per reduction.  L13 = 2^13 = 8192 loops per reduction.  There is about a 2% difference in run time and a 1-2% difference in GPU utilization using L5 versus L13 on a 9800 GTX+.

In general the more items per loop (Ix) and the more loops per reduction (Lxx) the faster the workunit will complete and the worse the video response will be.
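
To see how L and I combine, the sketch below multiplies loops per reduction by items per loop. The function names are hypothetical, and the product is an inferred quantity: it assumes every loop processes one full 2^I by 2^I array, which the readme implies but does not state outright.

```python
def loops_per_reduction(l: int) -> int:
    """Loops per reduction for setting L<l>, i.e. 2^l."""
    if not 1 <= l <= 13:
        raise ValueError("the readme lists L1 through L13 as valid values")
    return 2 ** l


def numbers_per_reduction(l: int, i: int) -> int:
    """Assumed work between reductions: loops times a 2^i by 2^i array."""
    return loops_per_reduction(l) * (2 ** i) ** 2


for l, i in [(5, 8), (13, 8), (3, 5)]:
    print(f"L{l} I{i}: {numbers_per_reduction(l, i)} numbers per reduction")
```

Under that assumption, a larger product means more GPU work between reductions, which is consistent with the readme's note that higher settings make the video response worse.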

Sxxxx
Default Value: S1
Valid Values: S0 through S4294967295
Purpose: controls the number of milliseconds to wait for the application to complete the loops and reduction.  Setting the value to 0 will cause it to use CPU while waiting for the GPU to finish its calculations but will result in the fastest elapsed time.  It will not increase or reduce the GPU time needed.  Settings from 1 to 10 will have little effect on the runtime if using many loops per reduction (e.g. L13) but will drastically reduce GPU utilization when using fewer loops per reduction (e.g. the stock setting of L3).  For example, using S10 results in an elapsed time of 83 seconds with L3 and 44 seconds with L13.  Note: Setting this to the max value will require 136 YEARS to complete a workunit.
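
Before pasting a cmdline string into app_info.xml, it can be checked against the ranges listed in this readme. The following Python sketch does that. parse_cmdline is a hypothetical helper, not part of the application, and falling back to the default for an omitted parameter is an assumption about how the real program behaves.

```python
import re

# Valid ranges and defaults as listed in the readme.
LIMITS = {"L": (1, 13), "I": (5, 8), "S": (0, 4294967295)}
DEFAULTS = {"L": 5, "I": 8, "S": 1}


def parse_cmdline(text: str) -> dict:
    """Parse a string such as 'L13 I8 S0'; parameter order does not matter."""
    settings = dict(DEFAULTS)  # assumption: omitted parameters keep defaults
    for token in text.split():
        match = re.fullmatch(r"([LIS])(\d+)", token)
        if match is None:
            # 'L 5' splits into 'L' and '5', so both tokens are rejected here.
            raise ValueError(f"bad token {token!r}: no space between letter and value")
        name, value = match.group(1), int(match.group(2))
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{token} is outside {name}{low} through {name}{high}")
        settings[name] = value
    return settings


print(parse_cmdline("L13 I8 S0"))
```

A string that passes this check only satisfies the documented ranges; it says nothing about whether the setting suits a particular card.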


Hints:
Old Slow CUDA crunchers:
For those with very slow graphics cards which have poor video response using the stock settings, a low setting for L will likely be all that is needed to improve response (L1 or L2).  You may also try using a lower setting for I (I6 or I7) in combination with the lower L setting.


Anyone who wants to translate this is welcome to.
Or wait a month for me to do it - -

Posted 2010-5-30 23:27:06
The default parameters should be fine, right?

Posted 2010-5-31 02:36:26
I'll wait that month for you first o(∩_∩)o... haha!!!

Posted 2010-5-31 14:17:27
Reply to 3# muclemanxb


    I can roughly follow these parameters, but since last night I have not been able to get any WUs.

Original poster | Posted 2010-5-31 20:24:30
Here is a translation of the first part:

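Version 2.04 of the Collatz CUDA application adds 3 command line parameters for controlling performance and resource usage.

The examples below use a sample WU that is 2% of the size of a normal WU to compare run times. All timings are for a 9800 GTX+ at stock speeds.

The command line parameters are changed by editing the cmdline elements in the app_info.xml file. The parameters can appear in any order. However, a space inside a parameter is not valid (for example, "L5" is valid and "L 5" is not).


For example, these are the default settings: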
<cmdline>L5 I8 S1</cmdline>  

To make the GPU run as fast as possible:
<cmdline>L13 I8 S0</cmdline>  

To run as fast as possible on both GPU and CPU:
<cmdline>L13 I8 S1</cmdline>  

For lower-end graphics cards, use:
<cmdline>L3 I5 S1</cmdline>  

The optimized application consists of 3 files: app_info.xml, collatz_2.04_windows_intelx86__cuda23.exe, and cudart.dll. These files go in the BOINC project folder.

On Windows XP the default location is:
"c:\documents and settings\all users\application data\boinc\projects\boinc.thesonntags.com_collatz"

On Windows Vista, Windows 7, and Windows Server 2008 the default location is:
C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz

Note: these folders are hidden by default.

Ix
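Default value: I8
Valid values: I5 through I8
Purpose: controls the number of items processed per loop. The value x means that each loop works on a two-dimensional array whose length and width are both 2^x. For example, I5 means each side is 2^5 = 32, so 32x32 = 1024 items are processed per loop. Arrays smaller than 32x32 (I values below 5) are possible, but they waste part of the GPU's resources. With values above 8, the program's memory use exceeds the limit allowed per CUDA kernel. On the sample WU, "L13 I5" takes 257 seconds while "L13 I8" takes only 44 seconds. For the same WU, settings below I7 increase the run time significantly.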

Original poster | Posted 2010-5-31 22:08:05
I used:
<cmdline>L13 I8 S0</cmdline>  

The GPU temperature shot up to 95°C =。=
2.04 really is faster: some WUs take less than half an hour, though most need 50+ minutes (on a 9600GSO at stock clocks).
http://boinc.thesonntags.com/collatz/workunit.php?wuid=15742586

Posted 2010-6-1 12:48:38
Last edited by zglloo on 2010-6-1 13:33

95°C is a bit high~ I'm using <cmdline>L13 I8 S1</cmdline>. L13 could probably be set even higher!

Set both <cmdline></cmdline> elements to the same value~

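Original poster | Posted 2010-6-1 19:45:36
Last edited by cuihao on 2010-6-1 19:47

It seems that whatever I set cmdline to makes little difference here: the temperature stays just as high and the speed stays the same =。=
...What's going on?


Oh, it turns out there are two cmdline elements. Take note, everyone.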

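Original poster | Posted 2010-6-1 20:00:59
The temperature has been climbing too fast lately.
Today it actually passed 100°C.

I have to rein it in and find a balance between performance and temperature.
I'll rework the cooling over the summer break.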

Original poster | Posted 2010-6-1 20:10:13
<cmdline>L4 I6 S1</cmdline>  
This holds the temperature at around 80°C, but throughput is too low: a little over 0.01% per second, so a WU takes 2.5 hours.

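Original poster | Posted 2010-6-1 20:14:02
A quick summary of the parameters:
L: think of it as "lag", because the larger the value, the more sluggish the system. It has some effect on speed.
I: raising it speeds up the computation significantly.
S: best left alone.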

Posted 2010-6-1 20:42:45
Can I go above 8?

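Original poster | Posted 2010-6-1 21:54:52
Reply to 12# zglloo


    The official readme says that with I above 8, memory use goes over the limit.
    I don't fully understand it, because in practice I does not seem to affect video memory usage much.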

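Original poster | Posted 2010-6-1 22:04:01
The temperature has gotten too high for comfort *_*
I downclocked the 9600GSO to C500/M800/S1242.
CC now takes a little over 2 hours per WU.

The temperature holds at 80°C.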

Posted 2010-6-1 23:26:14
With the side panel off it sits at around 75°C. 9800GT, about 1.5 h per WU.