China Distributed Computing Forum

Original poster: vmzy

[Standalone platform] [Life sciences] Folding@Home

Original poster | Posted on 2007-3-23 16:06:10
3/22/2007 Stanford School of Medicine Network Outage
The main network for the School of Medicine (and much of Folding@home) went down at about 8am Pacific time. For now, stats and the assignment servers are down. Many of the data servers are up, as they are spread out on other networks. Stanford IT is working to get it up ASAP. We will post more news as we hear it.

3/22/2007 UPDATE: network back
The network is back up and running.

Summary:
Last night the Stanford School of Medicine's network went down; it is now back up.

Original poster | Posted on 2007-3-24 22:25:57
3/23/2007 PS3 launch early results
With the PS3 launch, we've added a lot of CPU power to Folding@Home, with the total FLOPS now greatly increased and on its way to a petaflop. Also, we've gotten a lot of crossover interest in the other Folding@Home clients (Win, Lin, OSX; SMP; and GPU), which is also wonderful.

Finally, due to all the interest, our web pages are getting hit very hard. We have been working to improve this, especially with the understanding that more PS3s could be on the way. So far, the performance is still pretty snappy, but we've made changes to make the process run more smoothly. First, we now allow the stats update script exclusive access to the db during updates. This speeds updates, but limits what donors can do during the update period to see their stats. We have also made several changes to improve caching of the stats data and overall performance. We are also working to get additional hardware to help out.

With these changes, we should be ready for a lot more clients!

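Summary:
The PS3s bring so much computing power that our total is now approaching a petaflop.

With the surge in traffic, the servers are struggling somewhat. The stats update now gets exclusive access to the database to speed up updates, which means the stats cannot be viewed while an update is running. We are also looking into hardware upgrades.

Hopefully this will let us handle more clients.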

Original poster | Posted on 2007-3-26 10:25:12
3/25/2007 Working towards a petaflop
With the addition of more PS3 clients, we're working our way up towards a petaflop. The performance of the project depends on machines being left on running Folding@Home. There was a performance drop as certain machines started taking longer to do work units (most likely because these machines are not running Folding@Home 24/7, naturally). This drop is expected as we move away from the launch date (when people were running FAH for extended periods) and into a more steady-state set of numbers for the PS3 performance. We are also looking into different ways to evaluate FLOPS, as our current method has its own pros and cons. As reaching a petaflop is an important milestone, we want to make sure that we use methods which allow our FLOPS count to be directly comparable to others cited.

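Summary:
With a big boost from the PS3 clients, our computing speed is heading towards a petaflop (an important milestone). To make the figures more accurate, we are considering improving how FLOPS are measured.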

Original poster | Posted on 2007-3-27 21:29:41
3/26/2007 Update on FLOPS count
We have been looking into the FLOPS count and its large variations and have found one more issue. The initial stats were based on the average we had seen during testing (yielding approximately 25 GFLOPS for a single PS3). However, the pre-launch testing period used big proteins, which result in higher GFLOPS utilization. When we went live, we started our initial post-launch phase with small proteins to test the scientific validity; these smaller proteins have more overhead (since they spend less time calculating the force -- which is highly optimized) and thus the GFLOPS are lower now. As we switch back to the larger proteins, we expect to see an increase in the FLOPS per machine, and thus the overall FLOPS count will change dramatically. We stress that there is a wide variation in the FLOPS we can get (easily a factor of 3x) and so we expect the number to vary widely until we reach some steady-state average.
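To get a feel for how much a 3x swing in per-machine FLOPS moves the total, here is a minimal back-of-the-envelope sketch. It assumes the aggregate figure is simply the number of active clients times the average GFLOPS per client; the post does not spell out the exact method, so treat that as an assumption. The 25 GFLOPS figure comes from the post; the function names are hypothetical.

```python
import math

PETAFLOP = 1e15  # FLOPS


def aggregate_flops(active_clients: int, avg_gflops_per_client: float) -> float:
    """Total FLOPS, assuming total = clients * average per-client rate."""
    return active_clients * avg_gflops_per_client * 1e9


def clients_needed(avg_gflops_per_client: float,
                   target_flops: float = PETAFLOP) -> int:
    """Smallest number of clients whose combined rate reaches the target."""
    return math.ceil(target_flops / (avg_gflops_per_client * 1e9))


if __name__ == "__main__":
    # 25 GFLOPS is the pre-launch average quoted in the post.
    print("PS3s needed at 25 GFLOPS each:", clients_needed(25.0))
    # The same fleet at one third of that rate (the "factor of 3x" case).
    fleet = clients_needed(25.0)
    print("Same fleet at 25/3 GFLOPS each, in PFLOPS:",
          aggregate_flops(fleet, 25.0 / 3) / PETAFLOP)
```

Under this simple model the same set of machines can report anywhere from about a third of a petaflop to a full petaflop depending only on which proteins are being run.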

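Summary:
The PS3 FLOPS figures were off: the small proteins used for post-launch testing run at lower FLOPS. Now that the project is switching back to large proteins, speed should rise sharply, and a petaflop is within reach.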

Original poster | Posted on 2007-3-28 17:02:50
3/27/2007 Two Million CPUs have returned work
We have just passed the 2,000,000 CPU mark -- 2M CPUs have at some time contributed to FAH. Right now, over 200,000 CPUs are actively returning work. With the addition of PS3 donors, Folding@Home is the most powerful distributed computing resource on the planet, and for the calculations we run (parallel independent molecular dynamics trajectories), the most powerful supercomputer of any type (distributed or otherwise).

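Summary:
The total number of CPUs that have ever returned work has passed 2 million (over 200,000 are currently active).

Translator's note: this is worth celebrating.

[ Last edited by vmzy on 2007-3-28 17:09 ]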

Original poster | Posted on 2007-3-28 17:07:32
3/27/2007
General update
We've had a pretty busy few days. Unfortunately, on the day of the PS3 launch, the network went down throughout the whole Stanford Medical School, taking FAH servers off the internet. That unfortunately led to a major outage last Thursday, but that was resolved and now FAH is running smoothly. The PS3 machines are isolated from the rest of FAH in that they have their own AS and data servers, and all seems to be running smoothly there.

The main issue over the last few days has been slow stats and web access. We've rewritten how the stats work to improve performance. There are a couple of new changes:

1) We've changed how the daily_*_summary.txt files get updated and we can now update them more frequently than before the PS3 launch (we are now updating them every 3 hours instead of 6). Note that our bandwidth scripts check for IPs that download these and other files too often, so to avoid getting caught by that script, keep the downloads of each of these files to under 10 per day. Since there are only 24/3 = 8 updates per day, this should hopefully not be a problem.

2) We've instituted a new policy where we update the stats db every hour with new WU's, but turn off the cgi web pages to read the stats during the update. This will avoid some of the very long updates seen in the past. The main downside is that the stats are down every hour for about 10 minutes (roughly from the 10 to 20 minute period in each hour). We are considering ways to improve this, including updating the stats every 2 hours (leading to less down time).
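For anyone scripting against the stats, the two rules above can be sketched as a pair of checks. This is only an illustration, assuming a script that keeps its own list of past download times; the constants are taken from the post, while the function names and the trailing-24-hour interpretation of "per day" are assumptions, not anything the project has published.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL_HOURS = 3    # from the post: summary files refresh every 3 hours
MAX_DOWNLOADS_PER_DAY = 10   # from the post: stay *under* 10 per file per day
STATS_DOWN_START_MIN = 10    # from the post: stats are down roughly from minute 10
STATS_DOWN_END_MIN = 20      # to minute 20 of each hour


def updates_per_day(interval_hours: int = UPDATE_INTERVAL_HOURS) -> int:
    """Number of fresh summary files published per day."""
    return 24 // interval_hours


def may_download(previous_downloads: list[datetime], now: datetime) -> bool:
    """True if one more download keeps the trailing-24h count under the limit."""
    cutoff = now - timedelta(hours=24)
    recent = [t for t in previous_downloads if t > cutoff]
    return len(recent) + 1 < MAX_DOWNLOADS_PER_DAY


def stats_pages_likely_down(now: datetime) -> bool:
    """True during the approximate hourly update window (minutes 10 to 19)."""
    return STATS_DOWN_START_MIN <= now.minute < STATS_DOWN_END_MIN


if __name__ == "__main__":
    now = datetime(2007, 3, 27, 12, 15)
    print("updates per day:", updates_per_day())
    print("stats pages likely down:", stats_pages_likely_down(now))
    print("may download:", may_download([], now))
```

Fetching each summary file once per 3-hour update uses 8 of the 9 allowed downloads, which leaves little room for retries. The window check only looks at the minute of the hour, so it gives the same answer in any whole-hour time zone.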

So, with the new changes, it looks like FAH is back to running smoothly. The PS3 clients bring a great new capability to our scientific research and so we're excited about what we'll be able to do now. It's important for us to stress that the other clients still play a key role, as the PS3 client (like the GPU clients) is limited in what it can do (although what it does do, it does fast). In particular, we are getting wonderful results and throughput from the SMP client and we expect that to play a very important role for years to come.

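Summary:
Changes to how the stats work: 1. The number of updates per day goes from 4 to 8. 2. Access to the stats pages is temporarily disabled while the stats are being updated.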

Original poster | Posted on 2007-3-29 14:27:47
28 Mar, 2007
Catalyst 7.3 compatible with GPU client (w/caveats) [WinXP]

The new Catalyst drivers released 2007/03/28, version 7.3 (8.351) appear to work successfully with the GPU client.

The console client will work as expected with the new drivers.

On some single GPU systems you will have to force the card detection by using the -gpu 0 flag.

The GUI client appears to have a few problems (similar to those found with Cat 7.2):

You cannot start the GUI GPU client with the GUI window open, as the GPU will fail to initialise correctly.

Opening the GUI window in the GUI GPU client (after it has successfully started) will cause the client to pause folding on the GPU. Closing the GUI window will resume folding on the GPU.

Neither of the above cases causes the client to crash or trash WUs, but both appear to run the WU in software mode only, resulting in extremely long frame times, and these WUs will certainly expire if left processing on the CPU. It is recommended that when you start the GPU client (console or GUI versions) you watch the temperatures and current draws using ATItool or ATI Tray Tools to confirm that processing has actually begun on the GPU itself.

In order to guarantee that the GPU will initialise, the GUI window should be closed when starting the client.

At present the cause of these problems is not known, and anyone with more information is requested to post with their experiences.

Note, there appears to be a small folding performance hit compared to previous working versions. It equates to roughly a 2% increase in frame time/drop in PPD.
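As a quick worked check of what a 2% longer frame time means for PPD, here is a small sketch. It assumes PPD is inversely proportional to frame time for a given work unit (same points, more time per frame), which is a simplification; the function name is hypothetical.

```python
def ppd_change(frame_time_increase: float) -> float:
    """Fractional change in PPD when frame time grows by the given fraction.

    Assumes PPD is proportional to 1 / frame time for a fixed work unit.
    """
    return 1.0 / (1.0 + frame_time_increase) - 1.0


if __name__ == "__main__":
    # A 2% increase in frame time, as reported for Catalyst 7.3.
    print(f"PPD change: {ppd_change(0.02):+.2%}")
```

Under that assumption the PPD drop is slightly less than 2%, so describing the two as roughly equal is a fair approximation for changes this small.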

The following versions of Catalyst are known to work with the GPU client:
  • 6.5
  • 6.10
  • 6.11
  • 7.2
  • 7.3
The GPU client will work with these drivers with a significant performance hit:
  • 6.6
  • 6.7
The following drivers do not work at all with the GPU client:
  • 6.8
  • 6.9
  • 6.12
  • 7.1

Tested with X1900XT 256 on Windows XP 32bit

For more information see this thread:
Catalyst 7.3 is up

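Summary:
ATI's new Catalyst 7.3 driver works with the GPU client.
The official post also notes problems with the GUI version of the GPU client, which is best avoided: starting it with the GUI window open means the GPU fails to initialise, and opening the window after a successful start pauses folding on the GPU. The console version has no such problems.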

Original poster | Posted on 2007-3-30 22:23:01
3/29/2007 Major stats overhaul: Monday April 2
We are going to take the stats down for several hours at 10am Pacific time on Monday April 2 (this coming Monday). We need to make updates to the stats system for v6 and test that these updates are working. When we go back on-line, we will hopefully have the upgraded stats working and would then be ready to launch v6. We are keeping a backup of the stats, such that we can at any time revert to the old stats system if there are any bugs in the code. So, the stats data is very much safe during this transition, but there may be some unforeseen problems (it's hard to predict those). This is actually mostly unrelated to all the stats work done last week to improve performance for the PS3.

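Summary:
Next Monday, April 2, the stats will switch over to a new stats system in preparation for the v6 client. Unforeseen problems may come up, so please be prepared.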

Original poster | Posted on 2007-4-19 10:35:21
18 Apr, 2007
Catalyst 7.4 compatible with GPU client

The new Catalyst drivers released 2007/04/18, version 7.4 (8.36) appear to work successfully with the GPU client.

The console client will work as expected with the new drivers.

On some single GPU systems you will have to force the card detection by using the -gpu 0 flag.

The GUI client appears to have a few problems (similar to those found with Cat 7.2 and 7.3):

You cannot start the GUI GPU client with the GUI window open, as the GPU will fail to initialise correctly.

Opening the GUI window in the GUI GPU client (after it has successfully started) will cause the client to pause folding on the GPU. Closing the GUI window will resume folding on the GPU.

Neither of the above cases causes the client to crash or trash WUs, but both appear to run the WU in software mode only, resulting in extremely long frame times, and these WUs will certainly expire if left processing on the CPU. It is recommended that when you start the GPU client (console or GUI versions) you watch the temperatures and current draws using ATItool or ATI Tray Tools to confirm that processing has actually begun on the GPU itself.

In order to guarantee that the GPU will initialise, the GUI window should be closed when starting the client.

At present the cause of these problems is not known, and anyone with more information is requested to post with their experiences.

Note, there appears to be a small folding performance hit compared to Catalyst 7.2, and no performance difference from 7.3. It equates to roughly a 2% increase in frame time/drop in PPD.

The following versions of Catalyst are known to work with the GPU client:
  • 6.5
  • 6.10
  • 6.11
  • 7.2
  • 7.3
  • 7.4
The GPU client will work with these drivers with a significant performance hit:
  • 6.6
  • 6.7
The following drivers do not work at all with the GPU client:
  • 6.8
  • 6.9
  • 6.12
  • 7.1

Tested with X1900XT 256 on Windows XP 32bit
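The three lists above can be captured as a small lookup table, sketched here for anyone who wants to check a driver version in a script. The version data is copied from the lists in this post (tested with an X1900XT 256 on Windows XP 32bit only); the names and the status labels are hypothetical.

```python
# Catalyst versions as listed in the post. They are kept as strings because
# "6.10" and "6.1" are different releases and would collide as floats.
WORKS = {"6.5", "6.10", "6.11", "7.2", "7.3", "7.4"}
SLOW = {"6.6", "6.7"}                    # work, with a significant performance hit
BROKEN = {"6.8", "6.9", "6.12", "7.1"}   # do not work at all


def catalyst_status(version: str) -> str:
    """Return 'works', 'slow', 'broken', or 'unknown' for a Catalyst version."""
    if version in WORKS:
        return "works"
    if version in SLOW:
        return "slow"
    if version in BROKEN:
        return "broken"
    return "unknown"  # not covered by the lists in the post


if __name__ == "__main__":
    for v in ("7.4", "6.7", "7.1", "8.0"):
        print(v, catalyst_status(v))
```

Versions that do not appear in any of the lists are reported as unknown rather than guessed at.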


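Summary:
The new Catalyst 7.4 ATI driver supports the GPU client.
On single-GPU systems, if the console client cannot detect the card, use the "-gpu 0" flag.
The GUI client is still not recommended because it is error-prone (opening the GUI window before initialisation finishes stops the GPU from initialising, and opening it afterwards pauses folding on the GPU).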

Original poster | Posted on 2007-4-20 17:24:35
13 Apr, 2007
Strange hardware failure on vspg machines

We are seeing a strange hardware failure on all of the vspg machines. The fact that all the machines are suddenly behaving strangely suggests that the problem is not the machines themselves, but the networking. We are looking into this right now. Unfortunately, it's 10pm Pacific time and the Stanford networking support is out for the night, so this will likely go unresolved until the morning.

19 Apr, 2007
Update on server status

Here's an update. We have been working on servers the last few days. The reset due to the power outage caused problems with 2 servers, which didn't want to come back up after they were shut down.

We've gotten one of the big ones back up (171.65.103.162), but we're still working on another (171.65.103.68). The latter one is giving some strange results and so we don't have an ETA on it.

Finally, we've found a problem with the collection server and it's now running again, although there may be some problems with the CS taking WU's from 171.65.103.68. We are looking into that.

Summary:
Two servers went down. One has been restored; the other is still being recovered.

Original poster | Posted on 2007-4-28 10:44:16
26 Apr, 2007
Collection server off line
We have taken the collection server off line while we work on a major upgrade/overhaul for it. Many WU's were being rejected by the CS and we now know why. We're working on a fix, which will require a mix of hardware and software updates.

Stanford IT finds network issue
Stanford IT has sent this to me just now. We'll keep you updated. It's almost 5pm Pacific time, so it may be a rough night.

We are currently experiencing a network outage that is affecting a number of buildings and services on campus. Engineers are working on the problem, there is no estimate for repair at this time.

Stanford-PCG
--
itss-service-alerts@lists is an internal list to notify ITSS staff about service outages and updates. The Incident Report form is available on the web at <http://itss-incident-report.stanford.edu/>.

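Summary:
Stanford IT has identified a network outage affecting many of the servers and is working on a repair (many work units cannot be uploaded in the meantime). There is no estimate yet for recovery.

[ Last edited by vmzy on 2007-4-28 10:45 ]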

Original poster | Posted on 2007-5-4 00:25:06
5/2/2007
Network upgrades at Stanford
Stanford is upgrading the network to one of Folding@home's data centers to try to help make the system more redundant and to get us on Stanford's new 10Gig backbone. However, this will lead to a network outage on some of our machines (those in the 171.64.65.xx subnet) on two occasions. Tomorrow (Tues) at 5am Pacific time there will be some work, but likely not an outage. On Thursday at 5am Pacific time, there will be an outage for about 1 hour.

Summary:
Stanford is moving the network onto its 10Gig backbone, so the network will be unstable for a short while. Please be aware.

Original poster | Posted on 2007-5-5 00:06:14
5/3/2007
Server and server room upgrades
We've got a series of new servers that will double the server capability of FAH. We've had the hardware for almost two weeks, but we're still waiting on the University to get the networking completed to our new server room. We've been told that this will be completed by the end of next week, so we're likely still 2 weeks out from getting these new servers up and going. We'll give an update as we get more info.

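Summary:
Server capacity will be doubled. The hardware arrived some time ago and is waiting on the network work, which should be finished by the end of next week; the new servers should be up in about 2 weeks. More details will follow.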

Original poster | Posted on 2007-5-7 22:07:45
5/6/2007
Update on server and server room upgrades
Looks like we're on schedule to get the new servers on line by the end of the week, which means the new WU's can go out on Monday or Tuesday the week after that, if all goes well. These new machines will basically double the usable storage of FAH, adding almost an extra 150TB and adding 56 cores to the mix.

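Summary:
The upgrade should be finished by the end of this week, and new work units can start going out on the Monday or Tuesday after that. The upgrade roughly doubles FAH's usable storage, adding almost 150TB of space and 56 cores.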

Original poster | Posted on 2007-5-21 10:16:09
5/20/2007
Update on collection server and server & server room upgrades
Since people have been curious, I thought it would be important to comment on these issues in a very visible place (I've addressed this on our forum, but it's better to make a posting in a more visible place like this). The main issue with the CS right now is that, with all the FAH WU's, the current CS hardware needs to be upgraded to handle the load. We have a plan to handle it. Short term, we have put only certain machines on the CS to get it at least partially working (better than not working at all). We have been waiting for new hardware for the real long-term solution: new and more hardware. The new hardware will be much more beefy to handle the issue. Also, we will move to having multiple collection servers, which will also lessen the load and the requirements of each individual CS. The new machines are here and finally networked (as of Friday). Our sysadmins need to install the OS on the machines and we should be ready to roll.
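Summary:
The new collection server hardware was in place and networked as of last Friday; the operating system is now being installed, and it should come online soon.

[ Last edited by vmzy on 2007-5-21 10:17 ]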
