Posted on 2005-5-11 19:48:14
I have only skimmed through it myself; the original text is as follows:
Client scheduling
Last modified 6:49 PM UTC, May 09 2005
This document describes the CPU scheduling policy and work-fetch policy used in the BOINC core client, starting with version 4.36.
Terminology
Debt
How much CPU time is 'owed' to a project in order to bring it into parity with other projects, based on the user's resource-share settings. Positive debt means that a project has not had enough CPU time to match its resource share. Negative debt means that a project has had more than its share. Long-term debt is tracked over the entire time the project is attached. Short-term debt is tracked only for projects that have work on the computer.
Deadlines
The 'deadline' of a result is the time by which it must be completed and reported. Deadlines are set by projects. Work that is returned after the deadline may or may not have any value to the project, and it may or may not be granted credit, even if it matches the results that were returned on time.
Goals
The goals of the CPU scheduler and work-fetch policies are:
To complete and report results by their deadline;
To honor resource shares;
To keep an interesting mix of work on the computer.
There may be times when fetching more work will result in missed deadlines.
The CPU scheduler has two modes, normal and panic.
In normal mode, the CPU scheduler does round-robin scheduling among results and attempts to honor the resource shares.
In panic mode, the CPU scheduler runs results with the nearest deadline. This allows the client to meet deadlines that would otherwise be missed. Panic mode is entered if either a work unit has a deadline that is very near, or the sum of remaining calculation times is nearly as large as the remaining calculation time available. If the CPU scheduler is in panic mode, no new work is fetched.
The CPU scheduler decides which mode it is in when a result is completed, when the end of the user specified work period is reached, when new work is downloaded, or when the user takes some action through the UI.
The work-fetch policy has three modes: no download, download OK, and download required.
Download required means that there are not enough results to keep all of the CPUs busy or there is not enough work to get to the next time that you have indicated that you are likely to connect. Work should be retrieved from someplace even if it means that work is retrieved from a project with negative long term debt.
No download means that the CPU scheduler is in panic mode.
In download OK mode, projects with high long-term debt can download work, but projects with very low long-term debt cannot. Projects with very low long-term debt have probably caused a panic mode recently, or they have been dominating the work on the computer in some other way.
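The three modes can be sketched as a small decision function. This is only an illustration, not the BOINC client's code: the names FetchMode and work_fetch_mode and all four parameters are hypothetical, and letting panic mode take precedence over "download required" is an assumption based on the statement that no new work is fetched in panic mode.

```python
from enum import Enum


class FetchMode(Enum):
    NO_DOWNLOAD = "no download"
    DOWNLOAD_OK = "download OK"
    DOWNLOAD_REQUIRED = "download required"


def work_fetch_mode(panic, idle_cpus, hours_of_work, hours_until_next_connect):
    """Pick a work-fetch mode from a simplified view of the client state.

    All parameters are hypothetical stand-ins for state the real client tracks.
    """
    # Assumption: panic mode wins over everything else.
    if panic:
        return FetchMode.NO_DOWNLOAD
    # Not enough results to keep every CPU busy, or not enough work to last
    # until the next expected connection.
    if idle_cpus > 0 or hours_of_work < hours_until_next_connect:
        return FetchMode.DOWNLOAD_REQUIRED
    return FetchMode.DOWNLOAD_OK
```

In download OK mode a per-project check on long-term debt would still apply; that check is left out of this sketch.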
BOINC work fetch and CPU policy design
Problem
The old work fetch policy and CPU scheduler policy can miss deadlines for a number of reasons: the computer is slow, too many projects are attached, a work unit with a short deadline is downloaded, or a work unit with a tight deadline is downloaded.
There is a difference between short deadlines and tight deadlines:
A short deadline is a deadline that would be missed because the debt did not increase to a level where the first time slice was given to the project before the work unit expired. For example the early work units from Pirates had a one hour deadline, and the CharMM work units from Protein Predictor have a 24 hour deadline.
A tight deadline is one where the time to crunch the work unit is a large fraction of the deadline. For example, on one of my machines a Sulfur Cycle work unit from Climate Prediction.Net is estimated to take 145 days and has a 180 day deadline, so it needs about 80% of the CPU's time until the deadline, which is well over half. In this case the deadline is not short, but it is tight.
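The arithmetic behind "tight" can be checked directly. The helper name deadline_tightness is made up for this illustration; the 145 and 180 day figures are the ones quoted above.

```python
def deadline_tightness(estimated_days, deadline_days):
    """Fraction of the time until the deadline that the work unit needs."""
    return estimated_days / deadline_days


# Sulfur Cycle example from the text: 145 days of crunching, 180 day deadline.
ratio = deadline_tightness(145, 180)
print(f"{ratio:.2f}")  # prints 0.81
```

A ratio near 1 means the work unit leaves almost no CPU time for other projects before its deadline.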
With the current policies, the slower the computer, the lower the fraction of time that the computer is on, and the tighter the deadlines for the projects that are attached to that computer, the fewer projects that computer may successfully attach to. In the case of the slowest computers the number may be one, even though there are several which could be run successfully individually.
Design goals
In order to keep the work the computer is running as varied as possible, each computer should be able to attach to as many projects as the user desires if that computer is capable of running each of the projects in isolation. The combination of the work fetch policy and the CPU scheduler should not download too much work for the CPU to complete on time, and should attempt to complete all work that is downloaded on time. Faster computers will be able to keep work from more different projects on hand than slower computers.
Design of the CPU scheduler
The CPU scheduler has two modes, normal and panic. In normal mode, the CPU scheduler uses the current debt calculations to attempt to balance the resource share with the work on hand. Some users with just a few projects and balanced resource shares may never leave this mode. In panic mode, the CPU scheduler processes the results with the nearest deadlines. It is possible to switch into panic mode at any time, but the CPU scheduler will finish the current time segment processing the current result. It is only possible to switch out of panic mode when the CPUs would be rescheduled. Having the CPU scheduler in panic mode is one of the drivers of the work fetch policy.
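A rough sketch of how the two modes differ when picking the next result to run. The function name, the dictionary keys, and the rule "run the result whose project has the highest debt" in normal mode are all assumptions made for illustration; the text only says that normal mode is debt-based and that panic mode runs the nearest deadline first.

```python
def pick_next_result(results, panic):
    """Choose the next result to run.

    results: list of dicts with hypothetical keys
             'name', 'hours_to_deadline', 'project_debt'.
    """
    if panic:
        # Panic mode: earliest deadline first.
        return min(results, key=lambda r: r["hours_to_deadline"])
    # Normal mode (assumed rule): favour the project that is owed the most CPU time.
    return max(results, key=lambda r: r["project_debt"])


queue = [
    {"name": "wu_a", "hours_to_deadline": 200.0, "project_debt": 3600.0},
    {"name": "wu_b", "hours_to_deadline": 12.0, "project_debt": -1200.0},
]
```

With this queue, normal mode would run wu_a (highest debt), while panic mode would run wu_b (nearest deadline).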
Design of the work fetch policy
The goals of the work fetch policy are:
to not get more work than the CPU can complete by the deadline.
to honor the resource shares that the user has specified.
to keep an interesting mix of work on the system.
The new work fetch policy limits how much work is on hand, and it maintains a debt even if a project does not have work on hand.
The work fetch will always be done in order of highest long term debt. Projects with negative long term debts will not be allowed to connect. This prevents a project with a tight deadline from dominating out of proportion to its resource share. If the user connects to two projects and one of them has a processing time to deadline ratio of 0.6 and the other has a processing time to deadline ratio of 0.1, the project with the deadline ratio above a half would tend to get a 0.6 fraction of the work because the CPU scheduler will occasionally give it several turns out of order to get it done by its deadline. If at that point, that project were allowed to download another work unit, then it would again have to have several turns out of order to meet this deadline as well.
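The ordering rule in the paragraph above can be written down in a few lines. This is a sketch under assumptions: fetch_order is a hypothetical name, debts are plain numbers, and a debt of exactly zero is treated as eligible because the text only excludes negative debts. It covers the ordinary case only, not the "download required" situation in which work may be taken even from a project with negative long-term debt.

```python
def fetch_order(long_term_debt):
    """Return project names in the order work would be requested.

    long_term_debt: dict mapping project name -> long-term debt.
    """
    # Projects with negative long-term debt are not allowed to connect.
    eligible = [(name, debt) for name, debt in long_term_debt.items() if debt >= 0]
    # Highest long-term debt first.
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible]


order = fetch_order({"A": 100.0, "B": -50.0, "C": 300.0})
print(order)  # prints ['C', 'A']
```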
The work fetch policy has several gates in order to prevent downloaded work from overloading the CPU.
The second trigger is to have a tight string of deadlines. Having the CPU scheduler in panic mode for a short deadline will not preclude the downloading of work. If the work unit is due today, but the work otherwise is not in time trouble, there is no reason not to download some more work.
The third trigger is to have the sum of the processing fractions greater than some fraction of the wall time. This gives long-term work units a chance to finish slowly instead of all at the end. This will normally be invoked soon enough to prevent the CPU scheduler from entering panic mode because of tight deadlines.
Details
Short deadline
Result deadline is less than 24 hours or has already passed. This triggers the CPU scheduler into panic mode.
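As a minimal sketch, the short-deadline test is a single comparison. The function name is_short_deadline is hypothetical, and "less than 24 hours" is read here as less than 24 hours from now.

```python
from datetime import datetime, timedelta


def is_short_deadline(deadline, now):
    """True if the deadline is under 24 hours away or has already passed."""
    # A passed deadline gives a negative difference, which is also < 24 hours.
    return deadline - now < timedelta(hours=24)


now = datetime(2005, 5, 11, 19, 48)
```

For example, a deadline 6 hours after now, or one that passed yesterday, would both put the CPU scheduler into panic mode under this reading.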
CPU queue overload
Sort the work units by deadline, earliest first. If at any point in this list the sum of the remaining processing time is greater than 0.8 * up_frac * time to deadline, the CPU queue is overloaded. This both stops work requests and switches the CPU scheduler to earliest-deadline-first scheduling.
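The overload test above could be sketched like this. The text does not define up_frac; it is assumed here to be the fraction of wall time the machine is available for crunching. The function name, the tuple layout, and the single-CPU simplification are also assumptions.

```python
def cpu_queue_overloaded(results, up_frac, safety=0.8):
    """Check the overload condition on a single-CPU queue.

    results: list of (remaining_cpu_hours, hours_to_deadline) tuples.
    up_frac: assumed fraction of wall time the machine can crunch (0..1).
    """
    cumulative = 0.0
    # Walk the queue in deadline order, earliest first.
    for remaining, to_deadline in sorted(results, key=lambda r: r[1]):
        cumulative += remaining
        # Work due by this deadline exceeds the usable time before it.
        if cumulative > safety * up_frac * to_deadline:
            return True
    return False
```

With results of (10, 20) and (30, 100) hours and up_frac = 1.0, the running sums are 10 against a limit of 16 and 40 against a limit of 80, so the queue is not overloaded. At up_frac = 0.5 the first limit drops to 8 and the same queue is overloaded.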
CPU queue fully loaded
For each work unit, compute the remaining processing time as a fraction of the time to deadline, and sum these fractions. If the sum is greater than 0.8 * up_frac, the CPU queue is fully loaded. This stops work fetch.
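A matching sketch for the fully-loaded test, with the same caveats: up_frac is assumed to be the fraction of wall time the machine can crunch, and the function name and tuple layout are hypothetical. Treating a result whose deadline has already passed as fully loading the queue is an extra assumption, since the text does not cover that case.

```python
def cpu_queue_fully_loaded(results, up_frac, safety=0.8):
    """Check the fully-loaded condition.

    results: list of (remaining_cpu_hours, hours_to_deadline) tuples.
    """
    load = 0.0
    for remaining, to_deadline in results:
        if to_deadline <= 0:
            # Assumption: a passed deadline counts as a full load.
            return True
        # Fraction of the time until the deadline this result needs.
        load += remaining / to_deadline
    return load > safety * up_frac
```

For results of (10, 20) and (40, 100) hours the fractions are 0.5 and 0.4, so the load of 0.9 exceeds 0.8 * 1.0 and no more work would be fetched.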
[ Last edited by Youth on 2005-5-11 at 19:54 ]