HPCwire
The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / November 16, 2006
Grid@SC06
Converging Virtualization with Distributed Computing

On Friday at the Supercomputing conference in Tampa, the first IEEE/ACM International Workshop on Virtualization Technologies in Distributed Computing will be held. The convergence of virtualization technologies and distributed computing is an area of ongoing development and the subject of much current research. The workshop is intended as a forum for the exchange of ideas and experiences on the use of virtualization technologies in distributed computing, the challenges and opportunities offered by the development of virtual systems themselves, and case studies of application of virtualization.

Chairing the workshop will be Kate Keahey, an Argonne National Laboratory scientist working on the Globus Toolkit and other aspects of Grid technology. We recently had the opportunity to ask Keahey about what she would like to accomplish in the Friday workshop, the work she is involved with at Argonne's Distributed Systems Laboratory, and how virtualization is being used to implement Grid computing.

---

GRIDtoday: Can you describe the research work being done at the Distributed Systems Laboratory at Argonne and your involvement?

KATE KEAHEY: The Distributed Systems Laboratory does research on Grid computing and its various aspects. We also develop a variety of widely used Grid services collectively known as the Globus Toolkit. Within this framework, my research focuses on mechanisms to dynamically provision well-defined execution environments -- aka, "virtual workspaces" -- and the various resource and policy management issues that this entails.


Gt: Virtualization and distributed computing seem to permeate everything in IT today. Tell us about some of the ways virtualization is converging with distributed computing and how Grid technology fits in.

KEAHEY: I think of virtualization as a vehicle to realize the dream of Grid computing -- obtaining on-demand computational resources from distributed sources in the same simple and intuitive way we get electricity today. Today, in order to run a job on the grid, a user has to identify a set of platforms capable of running that job by virtue of having the right installation of operating system, libraries, tools, and the right configuration of environment variables, etc. In practice, this means that the choice of platforms will either be limited to a very narrow set, or the job first has to be made compatible with an environment supported by a large resource provider, such as TeraGrid. For some applications this is a significant hurdle. Furthermore, even if you do manage to identify such an environment, it is hard to guarantee that the resource will be available when needed, for as long as needed, and that the user will get his or her fair share of that resource.

Virtualization introduces a layer of abstraction that turns the question around from "let's see what resources are available and figure out if we can adapt our problem to use them" to "here is an environment I need to solve my problem -- I want to have it deployed on the grid as described." For a user this is a much simpler question. The issue is whether we can implement the middleware that will map such a virtual workspace onto physical resources. One way to implement it would be to provide an automated environment installation on a remote node.

But what really gives this idea a boost is using virtual machine technology to represent such a workspace. This makes the environment easy to describe (you just install it), easy to transport, fast to deploy and, thanks to recent research, very efficient. Best of all, virtual machine management tools nowadays allow you to enforce the resource quantum assigned to a specific virtual machine very accurately -- so you could for example test or demo your application in a virtual cluster making sparing use of resources, and redeploy the virtual cluster on a much more powerful resource for production runs. This is another powerful idea behind virtualization: the environment is no longer permanently tied to a specific amount of resource but rather this resource quantum can be adjusted on-demand.

Similarly, we can define virtual storage implemented using distributed storage facilities, or overlay networks implemented on top of networking infrastructure. We can compose those constructs to put together whole "virtual grids" and test their operation before requesting serious resource allocations. There are many exciting ongoing research efforts in this area and some of them will be represented at the VTDC workshop.

Further down the road, if the idea of running virtual machines becomes ubiquitous, we may find other ways of leveraging the fact that we can have more than one isolated "hardware device" on a physical resource. We could use it to host physical devices requiring isolation for security reasons. We could carry around pluggable virtualized environments the way we carry laptops today. We could rely on migration to a greater extent to provide uninterrupted services. All those potential applications will come more clearly into focus once we see how widespread the appeal of virtual machines will prove in practice.
 

Gt: What are the key advantages of the convergence?

KEAHEY: The key and most immediate advantages are simplicity, making resource sharing easier, greater manageability -- in other words things that improve your "quality of life" as a Grid user and make on-demand resource provisioning applicable to a broader set of applications. The other advantage is that virtualization provides a good vehicle for quality of service through the capability to enforce resource usage assigned to any particular environment. This again makes the Grid and distributed computing generally applicable to a wider set of applications by providing a mechanism whereby resources can be found exactly when they are needed. 


Gt: Can you give us some real-world examples?

KEAHEY: One interesting example comes from the Open Science Grid (OSG), which has been actively exploring the use of virtualization in various Grid settings for a while now. The Edge Services Project, conceived by Frank Wuerthwein of SDSC, is exploring the use of virtual machines for site access management. "Edge Services" are services that mediate access between a site and the external world, for example by handling job submissions, data movement requests, or providing caches of information pulled from external sources. There are two issues associated with those services. First, each of the virtual organizations (VOs) belonging to OSG may have its own requirements with respect to the Edge Services it is going to run. Those requirements often conflict, so each organization would have to configure its Edge Service on a different node -- resulting in a lot of potentially underutilized nodes. The other issue is fair share -- the Edge Services can easily become a bottleneck, so if the VOs share the same node and one of them experiences heavy traffic, the quality of service for all VOs goes down. An ideal situation would be if the Edge Services for different VOs could be brought up dynamically only as they are needed and isolated in terms of both configuration and resource consumption. This is exactly what using the virtual workspaces provides -- a VO administrator can request the deployment of a specific Edge Service (configured to reflect the needs of his or her VO) and assign to it a resource quantum that guarantees the resources independently of load experienced by other VOs. One of our collaborators, Abhishek Rana, is working on configuring Edge Service images for the CMS project and using this mode of Edge Service provisioning on the OSG SDSC site.

Another example is the use of virtualization to dynamically provision environments and resources for applications. We are now working with Doug Olsen at LBNL on using workspaces to provide on-demand cycles for the STAR application doing HEP calculations. Again, it is hard to dynamically find an environment capable of running an application just when you need it -- and even if you do, installing some applications could be hard. We were lucky here to find that the rPath company was already working with Doug on producing a Xen image of the application. By itself, the workspace service is a little bit like iTunes without the music, so starting out with a ready-made image was very important. Using the workspace service we can now deploy this image on-demand on the University of Chicago TeraPort cluster -- all Doug needs to do is specify an option in his usual job submission tool and new platforms are dynamically provisioned for him on the TeraPort cluster. This effort is in a proof-of-concept stage right now (and we are demonstrating its use at SC in the Argonne booth) but it looks very promising. What we would eventually like to do is give scientists a tool whereby they can provision resources easily and allow them to expand the resource pool as their needs grow -- we realize that it will probably take some more research and more dedicated individuals to make this vision happen.
 

Gt: How are virtual workspaces and the workspace service implemented?

KEAHEY: The workspace service allows a Grid client to dynamically and securely deploy workspaces on a pool of resources using a Web Services Resource Framework (WSRF) interface. A workspace deployed in this way can then be managed -- paused, restored, "resized" (in terms of the resource allocation assigned to a workspace), or terminated. Using the Globus Toolkit's implementation of WSRF allows us to leverage its many features such as security handling -- very important for our service. A typical installation consists of the Workspace service running on a node that serves as a front-end, and a very lightweight control script on any node that hosts the workspaces. Based on requests accepted by the workspace service, this control script interacts with a locally installed workspace management system to deploy and manage workspaces.

At this point we rely on Xen virtual machines for workspace implementation. As a dynamically moving open source project, Xen is an ideal platform for our communities, and in fast-paced research environments it has the capability to rapidly and reliably integrate new, much needed features. We would like to explore other implementations in the future -- our first prototype was using VMware -- but our most immediate focus is to make this service useful as soon as possible.
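
[Editor's sketch: The lifecycle Keahey describes (deploy, pause, restore, "resize," terminate) can be pictured as a small state machine. The Python below is only an illustration under our own assumptions, not the workspace service's actual interface: the names WorkspaceService, Workspace, cpu_share and memory_mb are hypothetical, and the real service exposes these operations through WSRF and drives Xen via a control script on each host, none of which is modeled here.]

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    TERMINATED = "terminated"


@dataclass
class Workspace:
    """Hypothetical in-memory stand-in for one deployed workspace."""
    image: str        # name of the VM image to run (illustrative only)
    cpu_share: int    # resource quantum: percent of one node's CPU (assumed unit)
    memory_mb: int    # resource quantum: memory in MB (assumed unit)
    state: State = State.RUNNING

    def _require(self, *allowed):
        # Reject operations that make no sense in the current state.
        if self.state not in allowed:
            raise RuntimeError(f"cannot act on a {self.state.value} workspace")

    def pause(self):
        self._require(State.RUNNING)
        self.state = State.PAUSED

    def restore(self):
        self._require(State.PAUSED)
        self.state = State.RUNNING

    def resize(self, cpu_share, memory_mb):
        # "Resizing" changes the resource quantum, not the environment itself.
        self._require(State.RUNNING, State.PAUSED)
        if not 0 < cpu_share <= 100 or memory_mb <= 0:
            raise ValueError("invalid resource quantum")
        self.cpu_share = cpu_share
        self.memory_mb = memory_mb

    def terminate(self):
        self._require(State.RUNNING, State.PAUSED)
        self.state = State.TERMINATED


class WorkspaceService:
    """Hypothetical front-end that hands out ids and tracks workspaces."""

    def __init__(self):
        self._workspaces = {}
        self._next_id = 1

    def deploy(self, image, cpu_share, memory_mb):
        ws = Workspace(image, cpu_share, memory_mb)
        ws.resize(cpu_share, memory_mb)  # reuse the quantum validation
        workspace_id = self._next_id
        self._next_id += 1
        self._workspaces[workspace_id] = ws
        return workspace_id

    def get(self, workspace_id):
        return self._workspaces[workspace_id]
```

[In this sketch, a user could deploy an image with a small quantum for a demo, call resize with a larger quantum for production runs, and terminate the workspace when finished, which mirrors the "adjust the resource quantum on demand" idea discussed earlier in the interview.]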


Gt: What would you like to accomplish with the Virtualization Technologies in Distributed Computing workshop you're chairing at SC06?

KEAHEY: Jose Fortes and I decided to organize the workshop because we thought that it would be useful to bring the virtualization and distributed computing communities together and facilitate the exchange of ideas and results. Some communities are working on investigating virtual machine management functions on a very fine-grain level, investigating, for example, how individual virtual machines should be scheduled to ensure timely processing of network traffic. Others are working on the level of constructing virtual grids. Those communities do not always talk to each other and so the requirements and results on various levels do not always propagate and are not necessarily well-understood. We essentially wanted all those people to get in the same room and have some exciting conversations that will lead to a better understanding of the challenges, increase the synergy and iterations on solutions -- and most of all, provoke new ideas. Judging by the quality of submissions, lots of people with exciting things to say shared this wish. We have a wonderful program and I am looking forward to the workshop!