Posted 2006-4-1 20:35:47
This one is really long; the parts in red were hard to translate, and the page formatting isn't great either, a bit dizzying...
http://boinc.berkeley.edu/sched_locality.php
Locality scheduling
Last modified 12:23 AM UTC, March 17 2005
Locality scheduling is intended for projects for which
* Each workunit has a large input file (it may have other smaller input files as well).
* Each large input file is used by many workunits.
The goal of locality scheduling is to minimize the amount of data transfer to hosts. When sending work to a given host, the scheduler tries to send results that use input files already on the host.
To use locality scheduling, projects must do the following:
* Workunit names must be of the form FILENAME__*, where FILENAME is the name of the large input file used by that workunit. These filenames cannot contain '__'.
* The <file_info> for each large input file must contain the tags
<sticky/>
<report_on_rpc/>
* The config.xml file must contain
<locality_scheduling/>
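Taken together, the setup for a hypothetical large input file named data_2005.zip might look like this (the filename is illustrative; the tags are the ones listed above):

```xml
<!-- Workunits using this file would be named e.g. data_2005.zip__0001 -->
<file_info>
    <name>data_2005.zip</name>
    <sticky/>           <!-- keep the file on the host after use -->
    <report_on_rpc/>    <!-- host reports the file on each scheduler RPC -->
</file_info>

<!-- and in config.xml: -->
<locality_scheduling/>
```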
Locality scheduling works as follows:
* Each scheduler RPC contains a list of the large files already on the host, if any.
* The scheduler attempts to send results that use a file already on the host.
* For each file that is on the host and for which no results are available for sending, the scheduler instructs the host to delete the file.
On-demand work generation
This mechanism, which is used in conjunction with locality scheduling, lets a project create work in response to scheduler requests rather than creating all work ahead of time. The mechanism is controlled by an element in config.xml of the form:
<locality_scheduling_wait_period> N </locality_scheduling_wait_period>
where N is some number of seconds.
When a host storing file X requests work, and there are no available results using X, then the scheduler touches a 'trigger file'
PROJECT_ROOT/locality_scheduling/need_work/X
The scheduler then sleeps for N seconds, and makes one additional attempt to find suitable unsent results.
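The "touch trigger, sleep, query again" sequence can be sketched as follows (the function and argument names are illustrative — the real scheduler is a C++ program, and `find_unsent_results` stands in for its database query):

```python
import time
from pathlib import Path

def find_work_for_file(filename, need_work_dir, find_unsent_results, wait_seconds):
    """Try to find unsent results for a large file already on the host.

    If none exist, touch a trigger file so the project's on-demand work
    generator can create more, wait N seconds, then try once more.
    """
    results = find_unsent_results(filename)
    if results:
        return results
    # Tell the on-demand work generator that work is needed for this file.
    (Path(need_work_dir) / filename).touch()
    time.sleep(wait_seconds)  # N from <locality_scheduling_wait_period>
    return find_unsent_results(filename)
```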
The project must supply an 'on-demand work generator' daemon program that scans the need_work directory. If it finds an entry, it creates additional workunits for that file, and the transitioner then generates results for these workunits. N should be chosen large enough that both steps usually complete within N seconds (10 seconds is a reasonable estimate).
The work generator should delete the trigger file after creating work.
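A minimal work generator loop might look like this sketch; `create_work_for_file` stands in for whatever project-specific step (typically a wrapper around BOINC's work-creation tools) actually makes the workunits:

```python
import os
import time

def scan_need_work(need_work_dir, create_work_for_file):
    """One pass of an on-demand work generator.

    For each trigger file found in need_work_dir, create workunits for
    the corresponding large input file, then delete the trigger.
    Returns the filenames that were handled.
    """
    handled = []
    for name in sorted(os.listdir(need_work_dir)):
        create_work_for_file(name)                    # project-specific
        os.remove(os.path.join(need_work_dir, name))  # delete trigger after creating work
        handled.append(name)
    return handled

def run_daemon(need_work_dir, create_work_for_file, poll_seconds=5):
    """Daemon main loop: poll the need_work directory forever."""
    while True:
        scan_need_work(need_work_dir, create_work_for_file)
        time.sleep(poll_seconds)
```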
In addition, if the work generator (or some other project daemon) determines that no further workunits can be made for a file X, then it can touch a trigger file
PROJECT_ROOT/locality_scheduling/no_work_available/X
If the scheduler finds this trigger file then it assumes that the project cannot create additional work for this data file and skips the 'notify, sleep, query again' sequence above. Of course it still does the initial query, so if the transitioner has made some new results for an existing (old) WU, they will get picked up.
Implementation notes
Work is organized in a hierarchy:
File -> workunit -> result
Let's say there are N active hosts and target_nresults=M. Optimally, we'd like to send each file to M hosts, and have them process all the results for that file.
If the one_result_per_user_per_wu rule is in effect, a file may have work but be 'excluded' for a particular user.
Assigning work to a host with no files:
* maintain a working set of N/M files
* when a host with no file requests work, choose a file F uniformly (randomly or sequentially) from the working set.
* if F is excluded for this user, choose a file using a deterministic algorithm that doesn't involve the working set (don't want to do this in general to avoid flocking)
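The two selection paths above can be sketched like this (a hypothetical illustration, not the scheduler's actual code; hashing the user ID is one way to get a deterministic per-user fallback that avoids flocking):

```python
import hashlib
import random

def choose_file(working_set, all_files, user_id, excluded_for_user):
    """Pick a large file for a host that has no files.

    Normal case: uniform choice from the working set. If the chosen file
    is excluded for this user (one_result_per_user_per_wu), fall back to
    a deterministic per-user choice over all files, independent of the
    working set, so excluded users don't all flock to the same file.
    """
    f = random.choice(sorted(working_set))
    if f not in excluded_for_user:
        return f
    candidates = sorted(set(all_files) - set(excluded_for_user))
    if not candidates:
        return None
    # Deterministic: same user always maps to the same candidate.
    h = int(hashlib.sha256(str(user_id).encode()).hexdigest(), 16)
    return candidates[h % len(candidates)]
```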
The working set is represented by a directory
PROJECT/locality_scheduling/file_working_set/
whose contents are names of files in the working set. A project-specific 'working set manager' daemon is responsible for maintaining this.
If the scheduler finds that there are no sendable results for a file, it makes a file with that name in
PROJECT/sched_locality/files_no_work/
The working set manager should poll this directory and remove those files from the working set. NOTE: BOINC may later create more results for the file, so it may be necessary to add it to the working set again.
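One pass of such a working set manager could be sketched as follows (directory layout as described above; the function names are illustrative):

```python
import os

def prune_working_set(working_set_dir, no_work_dir):
    """Remove from the working set any file the scheduler has flagged as
    having no sendable results, then clear the flags.

    The transitioner may later create new results for a flagged file, in
    which case the project must add it back to the working set.
    """
    removed = []
    for name in os.listdir(no_work_dir):
        marker = os.path.join(working_set_dir, name)
        if os.path.exists(marker):
            os.remove(marker)                           # drop from working set
            removed.append(name)
        os.remove(os.path.join(no_work_dir, name))      # clear the flag
    return removed
```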
Assigning work to a host with a file F:
* send more results for file F. To do this efficiently, we maintain the following invariant: For a given user/file pair, results are sent in increasing ID order.
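The increasing-ID invariant means the scheduler only ever needs the lowest-ID unsent result above the highest ID already sent to that user for that file. A sketch, with results modeled as (id, sent) tuples standing in for the real database rows:

```python
def next_result(results, max_id_already_sent):
    """Return the lowest-ID unsent result whose ID exceeds the highest
    ID already sent to this user for this file, preserving the
    increasing-ID invariant. Returns None if no such result exists."""
    candidates = [rid for (rid, sent) in results
                  if not sent and rid > max_id_already_sent]
    return min(candidates) if candidates else None
```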
Some projects may want to generate work incrementally. They can do this by supplying a 'work generator' daemon that polls the directory
PROJECT/locality_scheduling/need_work/
and creates work for any filenames found there. To enable this, add the <locality_scheduling_wait_period> element to config.xml; this tells the scheduler how long to wait for work to appear.
NOTE: we assume that all results have app_versions for the same set of platforms. So if any result is rejected for this reason, we give up immediately instead of scanning everything.
[ Last edited by Youth on 2006-4-1 at 20:37 ]