Developer: Laboratoire de Recherche en Informatique of the University of Paris-Sud
Desktop grid (DG) systems use the idle computing power of many volunteered desktop PCs to support large-scale computation and storage. For over a decade, DG systems have been the largest and most powerful distributed computing systems, offering abundant computing power at a fraction of the cost of supercomputers. The volunteer desktops participating in DG projects are volatile and heterogeneous, but there is little detailed information about their volatility and heterogeneity. Yet this characterization is essential for the simulation and modelling of such systems. We are conducting a project whose short-term goal is to obtain a detailed picture of the DG landscape, and whose long-term goal is to create a large-scale testbed for networking and distributed computing research. To this end, we have deployed XtremLab, a BOINC-based project that actively measures host CPU and network availability on volunteer desktops. The resulting resource measurement data and characterization will be useful for a broad range of research areas, including distributed and peer-to-peer computing, and fault tolerance. Ultimately, we believe the results will help broaden the range of applications that can utilize desktop grid systems, and accelerate discovery in a variety of scientific domains.
Since the late 1990s, DG systems such as SETI@Home have been the largest and most powerful distributed computing systems in the world, offering an abundance of computing power at a fraction of the cost of dedicated, custom-built supercomputers. Many applications from a wide range of scientific domains (including computational biology, climate prediction, particle physics, and astronomy) have utilized the computing power offered by DG systems. DG systems have allowed these applications to execute at a huge scale, often resulting in major scientific discoveries that would not otherwise have been possible.
The computing resources that power DG systems are shared with the owners of the machines. Because the resources are volunteered, utmost care is taken to ensure that DG tasks do not obstruct the activities of each machine's owner; a DG task is suspended or terminated whenever the machine is in use by another person. As a result, DG resources are volatile in the sense that any number of factors can prevent a task of a DG application from completing. These factors include mouse or keyboard activity, the execution of other user applications, machine reboots, and hardware failures. Moreover, DG resources are heterogeneous in the sense that they differ in operating system, CPU speed, network bandwidth, and memory and disk size. Consequently, the design of systems and applications that utilize these systems is challenging.
The long-term overall goal of XtremLab is to create a testbed for networking and distributed computing research. This testbed will allow for computing experiments at unprecedented scale (i.e., thousands of nodes or more) and accuracy (i.e., nodes that are at the "ends" of the Internet).
Currently, the short-term goal of XtremLab is to obtain a more detailed picture of the Internet computing landscape by measuring the network and CPU availability of many machines. While DG systems consist of volatile and heterogeneous computing resources, it is unknown exactly how volatile and heterogeneous these resources are. Previous characterization studies of Internet-wide computing resources have not taken into account causes of volatility such as mouse and keyboard activity, other user applications, and machine reboots. Moreover, these studies often report only coarse aggregate statistics, such as the mean time to failure of resources. Yet detailed resource characterization is essential for determining the utility of DG systems for various types of applications. This characterization is also a prerequisite for the simulation and modelling of DG systems, a research area where many results are obtained via simulation because it allows for controlled and repeatable experimentation.
For example, one direct application of the measurements is to create a better BOINC CPU scheduler, which is the software component responsible for distributing tasks of the application to BOINC clients. We plan to use our measurements to run trace-driven simulations of the BOINC CPU scheduler in an effort to identify ways it can be improved, and to test new CPU schedulers before they are widely deployed.
Approach
We conduct availability measurements by submitting real compute-bound tasks to the BOINC DG system. These tasks are executed only when the host is idle, as determined by the user's preferences and controlled by the BOINC client. The tasks continuously perform computation and periodically record their computation rates to a file. These files are collected and assembled to create a continuous time series of CPU availability for each participating host. Utmost care will be taken to ensure the privacy of participants. Our simple, active trace method allows us to measure exactly how much compute power a real, compute-bound application would be able to exploit. Compared to passive measurement techniques, our method is less susceptible to OS idiosyncrasies (e.g., in process scheduling) and takes into account keyboard and mouse activity and host load, all of which directly impact application execution.
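The assembly step described above can be illustrated with a minimal sketch. It is not the XtremLab implementation: the sample layout (a timestamp and a computation rate), the function names, and the `max_gap` threshold used to decide when a host was unavailable are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample layout: (timestamp in seconds, recorded computation rate).
Sample = Tuple[float, float]


def assemble_series(files: Iterable[List[Sample]]) -> List[Sample]:
    """Merge the samples of several result files into one time-ordered series."""
    merged = [sample for samples in files for sample in samples]
    merged.sort(key=lambda sample: sample[0])
    return merged


def availability_intervals(series: List[Sample],
                           max_gap: float) -> List[Tuple[float, float]]:
    """Group consecutive samples into (start, end) availability intervals.

    Assumption: a gap between samples longer than max_gap means the task
    was suspended or killed, so the host counts as unavailable in between.
    """
    intervals: List[Tuple[float, float]] = []
    if not series:
        return intervals
    start = prev = series[0][0]
    for timestamp, _rate in series[1:]:
        if timestamp - prev > max_gap:
            intervals.append((start, prev))
            start = timestamp
        prev = timestamp
    intervals.append((start, prev))
    return intervals


# Two made-up result files from the same host.
files = [[(0.0, 1.0), (10.0, 1.0)], [(20.0, 0.9), (100.0, 1.0), (110.0, 1.0)]]
series = assemble_series(files)
intervals = availability_intervals(series, max_gap=15.0)
```

In this sketch the choice of `max_gap` matters: it should be somewhat larger than the recording period, otherwise normal jitter between samples would be misread as unavailability.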
The results of this research will be useful to distributed computing research and other fields in a number of ways. First, the trace data will enable accurate simulation and modelling of DG systems. For example, the traces could be used either to drive simulation experiments directly or to create generative probability models of resource availability, which in turn can be used by simulators to explore a wide range of hypothetical scenarios.
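As a rough illustration of what a generative model of availability might look like, the sketch below fits an exponential distribution to observed availability durations and draws synthetic durations from it. The exponential form is purely an assumption chosen for simplicity; real availability traces may follow a very different distribution. The function names and input values are hypothetical.

```python
import random
from typing import List


def fit_exponential_rate(durations: List[float]) -> float:
    """Maximum-likelihood rate of an exponential distribution (1 / sample mean)."""
    if not durations:
        raise ValueError("need at least one duration")
    return len(durations) / sum(durations)


def sample_durations(rate: float, n: int, seed: int = 0) -> List[float]:
    """Draw n synthetic availability durations from the fitted model."""
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    return [rng.expovariate(rate) for _ in range(n)]


# Made-up availability durations in seconds, standing in for real trace data.
observed = [10.0, 20.0, 30.0]
rate = fit_exponential_rate(observed)
synthetic = sample_durations(rate, n=1000)
```

A simulator could consume `synthetic` in place of the measured durations to explore scenarios that the traces do not cover, for instance by scaling `rate` to model more or less volatile hosts.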
Second, because the traces will contain the temporal structure of availability, they will enable the assessment of the utility of DG systems for a wide range of applications. Currently, the range of applications that utilize DG systems effectively is limited to applications with loosely-coupled tasks that are independent of one another; the volatility and heterogeneity of DG resources make the execution of tightly-coupled applications with complex task dependencies extremely challenging. With the traces, we could conduct a cost-benefit analysis for a wide range of applications; specifically, we could determine the limitations that prevent certain types of applications from utilizing DG systems effectively, and suggest new research directions to address these limitations.
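One very simple form such an analysis could take is sketched below: given a host's availability intervals, count how many tasks of a fixed length would complete and how much available time would be wasted. The interval format (pairs of start and end times in seconds), the function names, and the assumption that an interrupted task restarts from scratch (no checkpointing) are all illustrative choices, not part of the project's method.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical (start, end) in seconds


def completed_tasks(intervals: List[Interval], task_seconds: float) -> int:
    """Count tasks that finish, assuming no checkpointing across intervals."""
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    return sum(int((end - start) // task_seconds) for start, end in intervals)


def wasted_fraction(intervals: List[Interval], task_seconds: float) -> float:
    """Fraction of available time that does not contribute to a finished task."""
    total = sum(end - start for start, end in intervals)
    if total == 0:
        return 0.0
    useful = completed_tasks(intervals, task_seconds) * task_seconds
    return 1.0 - useful / total


# Made-up availability intervals for one host.
host_intervals = [(0.0, 100.0), (200.0, 250.0), (300.0, 330.0)]
done = completed_tasks(host_intervals, task_seconds=40.0)
waste = wasted_fraction(host_intervals, task_seconds=40.0)
```

Repeating the calculation for several task lengths shows how quickly longer tasks become impractical on volatile hosts, which is the kind of limitation the analysis is meant to expose.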
In addition, we believe our measurements could be useful for other sub-domains in computer science such as fault tolerance, peer-to-peer computing, and Grid computing. For example, one issue relevant to the fault tolerance research community is how often resources crash and why. The data we collect will reflect the time to failure for each desktop resource and thus be a valuable data set for those researchers. We will make the traces publicly available to all these research communities.
Finally, we believe the results of this project will help improve performance and broaden the set of applications that can take advantage of DG systems. Currently, only applications with independent, compute-bound tasks can use desktop resources efficiently. We hope that the measurements collected in the near term will be useful in evaluating techniques for broadening the set of DG applications to include, for example, ones that are more communication-intensive and tightly coupled.
We have previously conducted a number of related research efforts. First, we measured and characterized several DG systems at the University of California at San Diego and the University of Paris-Sud. We obtained several months of traces of the availability of hundreds of desktop PCs within these organizations. We then characterized the DG systems by obtaining several aggregate and per-host statistics. This characterization formed the basis for a model describing the utility of the DG systems for different applications, and for developing efficient ways of scheduling tasks to DG resources. So that others could use our trace data sets, we created an online DG trace archive, publicly accessible at http://vs25.lri.fr:4320/dg. One limitation of this work, which we address in the XtremLab project, is that no measurements were taken of home desktop PCs, which contribute significantly to Internet-wide DG projects.
The XtremLab Team
The members of the XtremLab team belong to the Laboratoire de Recherche en Informatique (LRI, i.e., the computer science laboratory) of the University of Paris-Sud XI. Mr. Paul Malecot is a graduate student interested in distributed and parallel computing, and is the primary developer of XtremLab. Dr. Derrick Kondo (http://www.lri.fr/~dkondo/) and Dr. Gilles Fedak (http://www.lri.fr/~fedak/) both serve as academic advisors for the project. Dr. Kondo is an INRIA post-doctoral fellow interested in the simulation and modelling of large-scale distributed systems. Dr. Fedak is an INRIA research scientist interested in the design and implementation of distributed systems. Professor Franck Cappello is the director of the project and of the computer science laboratory.
The XtremLab project is funded by the Institut National de Recherche en Informatique et en Automatique (INRIA), the French national non-profit institution for computer science research.