Original poster
Posted on 2008-4-28 13:59:24
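Heh, there's nothing to be disappointed about. Given that the results must stay reliable no matter what, the project team has already worked hard on optimizing the application. Let me translate a few posts.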
http://einstein.phys.uwm.edu/for ... p;nowrap=true#81998
Someone asked:
Since we're talking about feature detection now, are there any plans to go up to SSE2, or would the costs of doing so outweigh the benefits?
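(Translation) Now that the instruction set can be detected automatically, are there plans to release applications for SSE2 or other instruction sets? Or would the cost of doing so outweigh the benefit?
Bernd replied: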
There's still some room for improvements of the SSE code, I'll try that first. My rough guess is that SSE2 would gain less than 10% over the best possible SSE App.
However with the new way of feature-based App switching the "costs" (for the project) are lowered too, compared to the cumbersome mixed-linked Apps we had before.
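(Translation) The current SSE code still has room for improvement, and I'll try that first. My rough estimate is that SSE2 would gain less than 10%. That said, since the instruction set is now detected automatically, preparing several optimized variants costs the project comparatively less than before.
Someone else replied: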
Fair enough. There will definitely be diminishing returns on the efforts. If you provide SSE2, then I'm sure people will want SSE3, SSSE3, SSE4, etc... My guess, based on what is being seen with SETI, is that SSE3 is where meaningful improvements would stop. For AMD processors, it became apparent that SSE3 had negligible differences (perhaps due to missing HyperThreading?)...
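(Translation) The more you optimize, the smaller the payoff becomes. If an SSE2 version were provided, people would immediately ask for SSE3, SSSE3, and so on. Based on what I've seen in the SETI project, there is no meaningful improvement beyond SSE3, and on AMD processors SSE3 made almost no difference.
Bernd added: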
In the new code we tried to avoid double precision as much as possible, so we already can perform most calculations in SSE. In the two functions that take the most time there is not much left to the compiler to optimize. Benefits from double-precision vectorization, more registers etc. are actually pretty minimal (e.g. the current kernel loop only uses 5 of 8 xmm registers, there is simply no benefit from having twice as many or even more).
There are a few specific features of SSE2 that are helpful, but only if the instructions are carefully placed into the code, probably in assembler (inline assembler in the code or using some well-coded math library). The full-blown 64Bit/SSE2 experiment where I left most to the compiler was nothing less than disappointing.
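(Translation) In the new code we avoided double-precision data as much as possible, so most of the computation can be done with SSE. The two functions that take the most time leave little room for further optimization, and SSE2 or later instruction sets would bring very little benefit. A few SSE2 features could help, but only if they are introduced into the code very carefully. So far the test results have been disappointing.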