Posted on 2011-1-10 23:32:27
Update: the final part, questions 13, 14, 29, 37 and 38
13. Are work units generated automatically or manually?
BOINC-based work units are always generated in a fully automated manner from operator-supplied input files. RNA World currently relies on operator-curated input archives, which the RNA World server processes automatically to yield several thousand work units per archive. Archives can be placed in an on-hold queue so that, once the server is running low on work units, it can process new archives from this supply.
We are currently working on user job submission interfaces so that, under strict security guidelines, researchers can use the RNA World distributed supercomputer to process their own project files. For security reasons we do not plan to allow batch job processing here, and users will have to register and use a digital certificate for clear identification.
We also plan to derive work units fully automatically by regularly scanning RNA-relevant databases for novel sequences that could be analyzed.
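The on-hold queue described above can be pictured with a small sketch. This is only an illustration of the idea, not the RNA World server code: the function names, the `low_water_mark` threshold and the one-work-unit-per-sequence split are all assumptions made up for the example.

```python
from collections import deque


def split_archive(archive):
    # Hypothetical splitter: one work unit per input sequence in the archive.
    return [f"{archive['name']}_wu{i}" for i in range(archive["n_sequences"])]


def refill_work_units(ready, on_hold, low_water_mark):
    """Process queued archives while the ready pool is below the threshold."""
    while len(ready) < low_water_mark and on_hold:
        archive = on_hold.popleft()          # oldest archive first
        ready.extend(split_archive(archive))
    return ready


# Two archives waiting on hold, one work unit left on the server.
on_hold = deque([
    {"name": "archiveA", "n_sequences": 3},
    {"name": "archiveB", "n_sequences": 2},
])
ready = ["wu_old"]
refill_work_units(ready, on_hold, low_water_mark=3)
print(ready)         # archiveA has been processed
print(len(on_hold))  # archiveB is still waiting
```

Only as many archives are processed as are needed to get back above the threshold, so the remaining archives stay in reserve.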
14. Is a continuous work supply guaranteed?
Our objective is to keep bringing more RNA-relevant bioinformatics tools into RNA World. Moreover, the data sources containing RNA-relevant information that need analysis by RNA World are growing daily. For both reasons, we expect RNA World to require increasing compute capacity, so it should not run out of work any time soon. However, we compute on a per-project basis and are also building databases of pre-computed results, e.g. lists of potential RNA candidate genes in any given organism. Once our objectives are reached, we will naturally stop sending out work units until we have new projects in store. This will be announced in good time on the RNA World website so that machines do not sit idle.
29. What Internet traffic can be expected?
All files are transferred in compressed form, and most contain simple ASCII data, so they compress to roughly 30% of their original size. In general, CMBUILD and CMCALIBRATE work units are the smallest and should require less than 1 MB (usually even less than 100 kB) of data traffic. CMSEARCH work units cause somewhat higher download traffic, depending on the size of the genome to be analyzed:
Current upper limit: The largest input is one of the chromosomes of an opossum at 512 MB (uncompressed), so about 150 MB (compressed) would have to be transferred for a CMSEARCH work unit, plus a few kB for additional control files. The upload traffic contains only the result file, not the genome that was searched for RNA, and is therefore much smaller.
Normal traffic: A typical bacterial genome, such as that of E. coli, is about 4.6 MB (uncompressed). Hence about 1.3 MB (compressed) of data plus the control files (just a few kB) will be transferred.
Lower limit: Many viral genomes and plasmid sequences contain less than 10 kB of data in uncompressed form. Note, however, that small CMSEARCH work units are expected to complete quickly, so your machine may request new data over and over again, depending on your system's performance.
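As a rough check of the figures above, here is a small sketch of the arithmetic. The 30% ratio and the genome sizes come from the FAQ; the 5 kB default for control files is my own assumption standing in for "a few kB", and 1 MB is taken as 1000 kB.

```python
COMPRESSION_RATIO = 0.30  # compressed size / original size ("around 30%")


def estimated_download_mb(uncompressed_mb, control_files_kb=5.0):
    """Estimate download traffic in MB for one CMSEARCH work unit.

    control_files_kb is an assumed placeholder for "a few kB".
    """
    return uncompressed_mb * COMPRESSION_RATIO + control_files_kb / 1000.0


examples = [
    ("opossum chromosome (upper limit)", 512.0),
    ("E. coli genome (typical)", 4.6),
    ("small viral genome (lower limit)", 0.01),
]
for label, size_mb in examples:
    print(f"{label}: about {estimated_download_mb(size_mb):.3f} MB")
```

The FAQ's 150 MB and 1.3 MB are rounded versions of what a flat 30% gives (153.6 MB and 1.38 MB before control files), which is close enough given that the ratio itself is only approximate.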
37. I got the message "redundant result", what exactly does that mean?
First, a few remarks on the terms used in BOINC. A work unit is a computational job that we would like participants to complete. A result, by contrast, is a collective term for the files that the server generates and sends to the participants. A work unit is complete once enough results (the quorum) are successful (this covers the data transfer to the participant, computation of the job, return of the result files to the server, etc.) and have been validated (i.e. each is identical to at least one other result successfully returned to the server). For example, in RNA World, three results are generated for each CMSEARCH-based work unit and sent to three different machines. If two of these (the quorum) are successful and validated, the work unit is complete. As a consequence, the third result is no longer required, i.e. it is redundant (a "redundant result"). This third result then (1) will not be sent out (if it has not yet been sent), (2) will be aborted on the client machine if it has been sent out but computation has not yet started, or (3) will be completed and receive credit if its computation has already started.
We generate more results per work unit (three) than the quorum requires (two) in order to collect results more quickly. If we did not do it this way, then whenever a client failed to return its result we would have to wait for the deadline to pass before the server noticed that nothing more was coming in. Only then would the server generate an additional result, send it out, and once again wait for incoming data.
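To make the three cases concrete, here is a minimal sketch of the decision described in this answer. It is not BOINC's actual scheduler code; the state names and the `handle_redundant` function are invented for the illustration, and only the quorum of two out of three results comes from the FAQ.

```python
from enum import Enum


class State(Enum):
    UNSENT = "unsent"        # generated on the server, not yet sent out
    SENT = "sent"            # on a client, computation not started
    RUNNING = "running"      # computation already started on a client
    VALIDATED = "validated"  # returned successfully and validated


def handle_redundant(states, quorum=2):
    """Return what happens to each leftover result once the quorum is met."""
    validated = sum(1 for s in states if s is State.VALIDATED)
    if validated < quorum:
        return {}  # work unit not complete yet, nothing is redundant
    actions = {}
    for index, state in enumerate(states):
        if state is State.UNSENT:
            actions[index] = "do not send"
        elif state is State.SENT:
            actions[index] = "abort on client"
        elif state is State.RUNNING:
            actions[index] = "let finish, grant credit"
    return actions


print(handle_redundant([State.VALIDATED, State.VALIDATED, State.RUNNING]))
print(handle_redundant([State.VALIDATED, State.SENT, State.RUNNING]))
```

In the first call the quorum is met, so the third result is redundant but still finishes and gets credit. In the second call only one result is validated, so nothing is redundant yet.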
38. The progress bar is at 100% and seems to sit there for hours - what is happening here?
This is common behavior in BOINC projects, especially if you have just switched from another project to RNA World, or if the work units of a given BOINC project differ greatly from one another. RNA World work units are in fact extremely heterogeneous in their system requirements. For each computation, a series of small simulations is run on the server to estimate the time the job would need to complete on the server. Since your machine differs from our server hardware, the results of the benchmarks that run from time to time on your machine are used to scale the duration determined on the server to your machine. This scaling works well but is not perfectly accurate, so the first work units often differ noticeably in completion time from what the progress bar indicates. As more work units of that type arrive on your system, a calculation mechanism built into BOINC progressively corrects for the deviation, and over time this "sitting at 100%" should become increasingly rare. However, if the incoming work units are extremely different from each other in type (as is often the case for RNA World work units, even when they are based on the same application), the adjustment may again turn out to be inaccurate for the new work units, and an automatic re-adjustment will take place. In the worst case, this can give the impression that the progress bar is constantly unreliable.
The bottom line is that you should simply expect a work unit to take longer than indicated, and not conclude that something is wrong with the work unit or your hardware.
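The scaling and progressive correction described in this answer can be sketched roughly as follows. This is only an illustration under my own assumptions: the function names are made up, the benchmark scores are arbitrary numbers, and the simple smoothing rule stands in for whatever BOINC really does internally.

```python
def scaled_estimate(server_seconds, server_benchmark, client_benchmark):
    """Scale the server-side runtime estimate to the client's benchmark score."""
    return server_seconds * server_benchmark / client_benchmark


def update_correction(correction, estimated, actual, rate=0.5):
    """Move the correction factor part of the way toward the observed ratio."""
    observed_ratio = actual / estimated
    return correction + rate * (observed_ratio - correction)


# Server needs 1 hour; the client benchmarks at half the server's speed.
estimate = scaled_estimate(3600.0, server_benchmark=2.0, client_benchmark=1.0)
correction = 1.0

# The first work unit actually takes twice as long as estimated.
correction = update_correction(correction, estimated=estimate, actual=14400.0)

# The next estimate for the same kind of work unit is stretched accordingly.
print(estimate * correction)
```

With very mixed work units the observed ratio keeps jumping around, so the correction factor never settles, which matches the "constantly unreliable progress bar" impression mentioned above.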
Finally finished...