
[Project News] Drug Design & Optimization Lab (D2OL) [Ended]

Posted on 2005-3-5 22:52:58
All, we are moving DNS servers, which has become a less than pleasant experience. During this time, domain information may be unavailable for a period of time. Obviously, this will affect the project.

My advice is to hoard some tasks and sit pretty. If everything goes as *planned* the storm will pass. If not, everyone remains productive.

Our IP address is not changing in any way - but moving registrars will blow away existing entries and then we need to re-input them and propagate.

apologies in advance - wish this worked faster on the web. (and no, we have no interest in hosting our own DNS server)
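Translation:
     Hello all, we are moving DNS servers, which has been an unpleasant experience. During the move, domain information may be unavailable for a while. Obviously, this will affect the project.
    My advice is to stock up on some tasks.
    Our IP address is not changing, but moving registrars will wipe the existing entries, which then have to be re-entered and propagated.
    Apologies in advance; I wish this went faster on the web. (No, we have no interest in hosting our own DNS server.)

Translated by VMZY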


Thread starter | Posted on 2005-5-5 00:01:06
May 3, 2005
All, I know everyone has been frustrated by the lack of news from the management team - no one more so than I, who hear your complaints and cries for information and rarely have much to offer.

Let's start at the heart of the matter - Yes, the project is still running. It's going strong. The Rothberg Institute remains committed to D2OL (and CommunityTSC) and will continue to support both the project and the community we have built.

Could we be better about updating? Heck yes. But behind that is the truth that in science, negative results are good data. The point of this project is not simply to find a cure for pathogen A. That's not going to happen - we need a wet lab for that. However, to whittle the possibilities down from a vast universe to a manageable dataset is a huge achievement.

It's slow, but the client is thousands of times faster than any manual method. Every bit of data we produce is something that does not need to be done by lab technicians. They focus on the results that MIGHT be good instead of the ones that are definitely a dead end.

So - I'm working on ways to communicate the successes we do have, even if they are not the kinds for which we all cross our fingers. Without a client update to support a stat overhaul - this is a challenge. I'll see what I can come up with.

Thanks to those of you who have kept with us despite our silence.

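Translation:
Hello all, I know everyone is frustrated by the lack of news from the management team, and no one more than I, who hear your complaints and rarely have much to offer.

Let's start with the core question. Yes, the project is still running, and it is going strong. The Rothberg Institute remains committed to D2OL (and CommunityTSC) and will continue to support the project and the community we have built.

Could we do better at posting updates? Yes. But in science, negative results are also good data. The goal of this project is not simply to find a cure for a particular pathogen; that takes a wet lab. Narrowing a vast universe of possibilities down to a manageable dataset is itself a huge achievement.

The computation is slow, but the client is thousands of times faster than any manual method. Every bit of data we process is work that lab technicians do not have to do. They can focus on the results that might be good instead of the ones that are definite dead ends.

So I am working on ways to communicate the successes we do have, even if they are not everything we hoped for. Without a client update to support a stats overhaul, this is a challenge. I will see what I can do.

Thanks to those who have stuck with D2OL despite our long silence.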


Thread starter | Posted on 2005-9-14 00:09:38
9/12/05
On Friday, September 9th, 530 Whitfield (Home to The Rothberg Institute) experienced a total hardware failure to both their firewall and their phone system as a result of an electrical event in the telecom room. This isolated us from the outside world removing our email, our phones, and our website.
I regret that I had nowhere to post the details of this outage and I was diligently working on restoring service - this included locating and acquiring replacement hardware and restoring the configurations. Rest assured - the project is alive and we have not disappeared. I will post more as soon as we catch up on email & voice mail.
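September 12, 2005
On Friday, September 9, 530 Whitfield (home of The Rothberg Institute) suffered hardware failures in both its firewall and its phone system because of an electrical fault in the telecom room. This cut our email, phones, and website off from the outside world.
I am sorry that I had nowhere to post details of the outage. I have been working hard on restoring service, which included locating and acquiring replacement hardware and restoring the configurations. Rest assured, the project is alive and we have not disappeared. I will post more once we have caught up on email and voice mail.

PS: D2OL has finally posted an update. I was starting to think that D2OL, like FAAH, had nobody looking after it!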


Thread starter | Posted on 2005-9-29 23:27:34
September 26, 2005
All, as we all know, JBOSS takes a holiday at semi-regular intervals and needs to be restarted. Yesterday morning was one such day, and although I restarted the project, JBOSS shut down again for reasons unknown.
When JBOSS is frozen and the stat processor is running, then stats are frozen and we get the awful error with which we are all familiar. If the stat processor is frozen, then the "to do" list piles up and stats are stuck until it is kicked.
What I learned today is that if JBOSS is actually shut down and the stat processor is running (albeit idle), there is no page.
We're swell now, I just took a little longer than usual to try to figure out why JBOSS was hung.
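September 26, 2005
Hello all. As everyone knows, JBOSS goes down from time to time and needs a restart. Yesterday morning I restarted the project, and JBOSS went down again for reasons unknown.
When JBOSS is frozen while the stat processor is running, stats freeze and we get the awful error we all know. If the stat processor is frozen, the "to do" list piles up and stats stay stuck until it is restarted.
What I learned today is that when JBOSS is actually shut down and the stat processor is running (though idle), there is no stats page at all.
We are fine now; it just took me longer than usual to work out why JBOSS was hung.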
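
The failure modes described in this post can be summarized as a small lookup. The sketch below is only an illustration of the post's description; the `State` enum, the function name, and the symptom strings are hypothetical and are not part of the D2OL software.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    FROZEN = "frozen"
    SHUT_DOWN = "shut down"


def expected_symptom(jboss: State, stat_processor: State) -> str:
    """Map the two component states to the symptom described in the post."""
    if jboss is State.SHUT_DOWN and stat_processor is State.RUNNING:
        # The case the post says was newly learned: no page is served at all.
        return "no stats page at all"
    if jboss is State.FROZEN and stat_processor is State.RUNNING:
        return "stats frozen, familiar error page"
    if stat_processor is State.FROZEN:
        return "to-do list piles up, stats stuck until restarted"
    if jboss is State.RUNNING and stat_processor is State.RUNNING:
        return "normal"
    # Combinations the post does not mention are left undetermined.
    return "not described in the post"
```

Calling `expected_symptom(State.SHUT_DOWN, State.RUNNING)` returns the "no page" case; combinations the post does not cover fall through to the last branch instead of being guessed.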


Thread starter | Posted on 2005-10-9 01:19:04
October 07, 2005
I experienced a power failure today from which not all of my hardware has recovered. Right now I am trying to bring the MySQL database back up. This houses the members, nodes, etc which is fairly critical to displaying stats. This is technically not a project server, but only a stat server - so please, keep crunching!
I will be working diligently to obtain the hardware repair.
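October 7, 2005
We had a power failure today, and some hardware has still not recovered. I am now bringing the MySQL database back up. It holds the members, nodes, and other data that are critical for displaying stats. Technically this is a stat server, not a project server, so please keep crunching!
I will work hard to get the hardware repaired.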


Thread starter | Posted on 2005-10-11 23:06:49
October 10, 2005
A fallout from the power outages we received over the weekend was a few lost hard drives. Among these was a bad file system for the Result Handler.
As the name suggests, this is the applet that receives all data and sends it into the right buckets for processing stats, analyzing the docking results, etc. For once, the Stats Processor is UP, but something upstream from it is DOWN.
While clearly frustrating for the stat hounds - the system refuses to load data without a handshake with the server - you are not sending results into a black hole. You are, however, hanging on to the pile of results until the file system recovers.
And that is what I am working on right now.
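October 10, 2005
The power outages over the weekend cost us a few hard drives, and left the Result Handler with a bad file system.
As the name suggests, this is the program that receives all data and routes it to the right place for stats processing, analysis of docking results, and so on. This time the problem is not the Stats Processor but something upstream of it.
This is clearly frustrating for stats watchers, since the system refuses to load data without a handshake with the server. Your results are not going into a black hole, though; you are simply holding on to them until the file system recovers.
That is what I am working on right now.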


Thread starter | Posted on 2005-10-13 15:52:26
October 10, 2005
methinks we have recovered cleanly - my apologies for the outage.
October 10, 2005
I think we have recovered cleanly. My apologies for the outage!


Thread starter | Posted on 2005-10-24 23:26:04
October 23, 2005 02:20 PM
Power Failure
We have experienced a power failure from which the Results NAS did not recover gracefully.

When you return results, they arrive at the Results Handler - this box evaluates the result for structure and credits your account (which gets all rolled up into a bundle with the stats processor). Then the Results Handler ships the packets off to the Results NAS.

Later, at intervals and entirely behind the scenes, a separate engine called DRAS (Docking Result Analysis System) crunches through the results looking for ones that are scientifically interesting. It takes the results for a target off the NAS and crunches locally, in the end just holding the few that were best.

The NAS is a way station for data - its damage does not set the project back - just ... stalls it ... until I can replace the drives. It's never fun to lose several drives in a RAID.
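October 23, 2005, 02:20 PM
Power Failure
We had a power failure from which the Results NAS did not recover cleanly.
When you upload results, they first arrive at the Results Handler, which checks each result's structure and credits your account (this is bundled up with the stats processor). The Results Handler then ships the packets to the Results NAS.
Later, at intervals and entirely in the background, a separate engine called DRAS (Docking Result Analysis System) goes through the results looking for scientifically interesting ones. It pulls the results for a target off the NAS, crunches them locally, and keeps only the best few.
The NAS is a way station for data. The damage does not set the project back; it only stalls it until I can replace the drives. Losing several drives in a RAID is never fun.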
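
The flow described in this post (Results Handler, then Results NAS, then DRAS) might be sketched roughly as follows. This is a toy model, not the project's code: the `Result` fields, the in-memory dictionaries standing in for the NAS and the credit table, and the assumption that a lower score is better are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Result:
    member: str
    target: str
    score: float       # hypothetical docking score; lower is treated as better here
    well_formed: bool  # stands in for the handler's structure check


def handle_result(result, credits, nas):
    """Results Handler step: check structure, credit the member, store on the NAS."""
    if not result.well_formed:
        return False
    credits[result.member] = credits.get(result.member, 0) + 1
    nas.setdefault(result.target, []).append(result)
    return True


def analyze_target(nas, target, keep=3):
    """DRAS step: pull one target's results off the NAS and keep only the best few."""
    return heapq.nsmallest(keep, nas.get(target, []), key=lambda r: r.score)
```

In this model the NAS only holds data between the two steps, which matches the post's point that losing it stalls the pipeline without undoing the crediting that has already happened.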


Thread starter | Posted on 2005-10-27 23:53:16
October 25, 2005 09:48 AM
Due to some data corruption that occurred during the power failure, we were forced to roll back stats to early Sunday AM.

Once we brought the process back online and began receiving files, it became clear that some critical files had been mid-write when we lost power. (After the building lost power and then the generator overloaded and then the UPS ran down...) I apologize for the disappointing need to return to the last clean file set.

Of note - our weather forecast suggests continued storms. I apologize in advance for any unforeseen outages.
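October 25, 2005, 09:48 AM
Because of data corruption during the power failure, we were forced to roll stats back to early Sunday morning.

Once we brought the process back online and began receiving files, it became clear that some critical files had been mid-write when the power went out. (The building lost power, then the generator overloaded, and finally the UPS ran down...) I apologize for having to return to the last clean file set.

Note: the weather forecast calls for continued storms. I apologize in advance for any unforeseen outages.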
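
Files caught mid-write by a power loss are a common source of this kind of corruption. One widely used mitigation, shown here only as a general illustration and not as anything the D2OL servers are known to do, is to write to a temporary file and rename it over the target. The function name `atomic_write` is made up for this sketch.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the bytes to disk
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the partial temporary file; the old target is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern an interrupted write leaves the previous version in place instead of a half-written one. It does not protect against failing drives, which later posts identify as a separate problem.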


Thread starter | Posted on 2005-12-1 01:35:01
November 29, 2005
Amid the holidays I have been away too much - I am sorry.

The NAS has thrown some errors (again) and I believe the drives themselves have exceeded their natural lifespan. We've had failures coming out of boot, lost shares, dropped drives from the RAID, and routinely run out of memory. At this moment, the project is down and I am requesting hardware replacement from the vendor while also pricing out a new unit.

At the moment, I have no ETA for recovery, but obviously the NAS is a critical central piece of the project, handing out work units and receiving results. The absence of the NAS halts the project until it is replaced.
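November 29, 2005
I have been away too much over the holidays. Sorry.

The NAS has thrown errors (again), and I believe the drives have outlived their natural lifespan. We have had failures coming out of boot, lost shares, drives dropped from the RAID, and frequent out-of-memory conditions. The project is down for now; I am requesting replacement hardware from the vendor and pricing a new unit.

I have no ETA for recovery at the moment, but the NAS is clearly a critical central piece of the project: it hands out work units and receives results. The project stays halted until the NAS is replaced.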


Thread starter | Posted on 2005-12-14 23:42:10
12/13/2005
What is going on

The old NAS performs a chkdsk every time it boots, during which it finds no errors. This takes 4+ hours to validate the drives and is, how shall we say ... inconvenient.

The Drives check out OK physically, but the RAID does not. When it completes boot, at least one drive is deactivated and the RAID is offline. I must manually activate the Drive, bring the RAID back online and then ... wait ... while it rebuilds the array.

Oh, and guess what. It's a SOFTWARE RAID.

It has come to pass that there is irreparable data corruption in the operating system partition. With power failures and inadequate generators, when the NAS fails, this is bound to happen. With software RAID, this is a tragic thing.

The current NAS has been offline for some time during this diagnosis and evaluation. I have secured a new NAS that has upgraded spindle speed, a hardware backplane with hardware RAID, and improved network connectivity. It is en route.

The key here was to replace the device without the need to recode the software. Migrating the data will be dicey as the current NAS is not stable at all. I have unplugged it from the network to prevent y'all from using it and uploading doomed data.

Yes the project is down. No, the project has not ended. It will simply take longer than we had hoped to march forward.

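December 13, 2005
What is going on?

The old NAS runs chkdsk every time it boots and finds no errors. Validating the drives takes 4+ hours, which is, shall we say, inconvenient.

The drives check out fine physically, but the RAID does not. When boot completes, at least one drive is deactivated and the RAID is offline. I have to activate the drive manually, bring the RAID back online, and then wait while it rebuilds the array.

Oh, and guess what: it is a software RAID.

There is now irreparable data corruption in the operating system partition. With power failures and inadequate generators, this is bound to happen when the NAS fails. With software RAID, it is a tragedy.

The current NAS has been offline for some time during this diagnosis. I have secured a new NAS with faster spindle speed, a hardware backplane with hardware RAID, and better network connectivity. It is on its way.

The key was to replace the device without having to recode the software. Migrating the data will be risky because the current NAS is not stable at all. I have unplugged it from the network so that nobody uses it and uploads doomed data.

Yes, the project is down. No, the project has not ended. It will simply take longer than we had hoped to move forward.


P.S. It looks like D2OL's result analysis server will be down for quite a while. (In the meantime, data can be downloaded but not uploaded.)
    D2OLers, please keep crunching at your own pace. When the new server arrives, you can upload tens of thousands of results in one go, which will feel like a real achievement. D2OL tasks have no deadline anyway, so keeping them on your machine for a few more days does no harm.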


Thread starter | Posted on 2005-12-21 23:54:29
12/20/2005
NAS Ship Date
The New NAS has shipped. The Holiday season may affect shipping times but I will begin configuration as soon as it arrives. We may encounter obstacles migrating data and we will tackle them one at a time - my priority is to get a working NAS in place ASAP. Target is to complete this before the end of the year.

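December 20, 2005
NAS Ship Date
The new NAS has shipped. The holiday season may affect shipping times, but I will begin configuration as soon as it arrives. We may run into obstacles migrating the data and will tackle them one at a time. My priority is to get a working NAS in place as soon as possible; the target is to finish before the end of the year.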


Thread starter | Posted on 2006-1-19 02:33:45
January 17, 2006
Stats froze - but that is nothing new. When this happens, we restart the process and use the backup, flushing the current queue (and the offending task that it was processing). However, we have experienced a serious file error during recovery and are currently analyzing what our options are. This is the first time the backup has been bad.

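January 17, 2006
Stats froze, which is nothing new. When this happens, we restart the process and use the backup, flushing the current queue (along with the offending task it was processing). During recovery, however, we hit a serious file error and are now working out our options. This is the first time the backup has been bad.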

January 17, 2006
Things look poor at the moment - we experienced some disk errors on top of the typical "stats are down". Not sure what to expect as recovery options. :/

Because we are transitioning the development from Sengent to Adam, the recovery may take longer than average, as it is much more valuable to have Adam work with Sengent on the repair to facilitate knowledge transfer. I am coordinating now.

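January 17, 2006
Things do not look good at the moment. On top of the usual "stats are down", we have had some disk errors, and I am not sure what recovery options to expect. :/

Because development is being handed over from Sengent to Adam, recovery may take longer than usual: it is much more valuable to have Adam work with Sengent on the repair so that knowledge is transferred. I am coordinating this now.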


Thread starter | Posted on 2006-2-11 16:45:04
February 02, 2006
Task Outage
I have new hardware for the Structures NAS and it has been exceedingly difficult to migrate data while y'all are crushing it. It will be offline temporarily while I try and get the brand spanking new hardware up to speed.

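February 2, 2006
Task Outage
I have new hardware for the Structures NAS, and migrating the data while everyone is hammering it has been extremely difficult. It will be taken offline temporarily while I get the new hardware up to speed.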


Thread starter | Posted on 2006-2-11 16:50:05
February 08, 2006
Tasks ... what's going on?
The new hardware for the Task Assignment is just dandy, but getting the data moved from the old hardware to the new has turned into a nightmare.
Copying the data has repeatedly failed. A new file replication tool has produced the first successful result so far, but the hardware decay on the old NAS is such that the speed is severely limited. We are copying over 100 million files and 102 gigs of data at approximately .5 - 1 gig an hour. 200 hours of downtime is NOT what we signed up for, but we are more than half way there.

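February 8, 2006
Tasks ... what's going on?
The new hardware for task assignment is great, but moving the data from the old hardware has turned into a nightmare.
Copying the data has failed repeatedly. A new file replication tool has produced the first successful result, but hardware decay on the old NAS severely limits the speed. We are copying over 100 million files and 102 GB of data at roughly 0.5 to 1 GB per hour. 200 hours of downtime is not what we signed up for, but we are more than halfway there.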
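
The downtime figure in this post follows from the numbers given: 102 GB at 0.5 to 1 GB per hour. A quick check of the arithmetic, using a helper name (`transfer_hours`) that is made up for this sketch:

```python
def transfer_hours(total_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to copy total_gb at a steady rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return total_gb / rate_gb_per_hour


TOTAL_GB = 102  # figure quoted in the post

slowest = transfer_hours(TOTAL_GB, 0.5)  # 204.0 hours at the low end of the range
fastest = transfer_hours(TOTAL_GB, 1.0)  # 102.0 hours at the high end of the range
```

The quoted "200 hours" therefore corresponds to the slow end of the stated rate range. With more than 100 million files in the set, per-file overhead on the failing drives may matter as much as the raw byte count, though the post does not break that down.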

[ Last edited by vmzy on 2006-2-11 at 17:18 ]
