Another Barrier Goes Down

By Kate Keahey | Posted in Uncategorized | 07-16-2010 | No responses

Right on the heels of Amazon’s groundbreaking news about Cluster Compute instances a couple of days ago comes this announcement about a partnership between CENIC, Pacific NorthWest GigaPoP (PNWGP), and Amazon: two 10 Gigabit per second (Gbps) connections to Amazon S3 and EC2. These connections will be available to CENIC and PNWGP member institutions (educational and research institutions on the West Coast and in the Pacific Northwest), including, among others, the many ocean scientists of the Ocean Observatories Initiative (OOI) with whom we are working to develop cloud-based scientific infrastructure.

In other words, we can now not only use IaaS to lease supercomputers – we can move data to those supercomputers fast. While so far these high-speed connections to Amazon are not generally available, it will be interesting to see what scientists will be able to do with them, what performance will be achievable in practice, and how it will change the scientific use of cloud computing. That’s two great developments for cloud computing for science in one week – it seems that the pace of progress on this front is accelerating ;-).

There is a New Supercomputer on the Block

By Kate Keahey | Posted in Uncategorized | 07-13-2010 | No responses

We all woke up to a game-changing announcement today: Amazon announced Cluster Compute instances designed to support the kinds of closely coupled workloads that high performance computing (HPC) relies on. Each Cluster Compute instance has a pair of quad-core Intel “Nehalem” processors, 23 GB of RAM, and 1690 GB of local instance storage. But by far the best part of the offering is the 10 Gbps network that connects Cluster Compute instances, which is essential for HPC applications.

The real headline, though, is that for the first time ever a virtual cluster could be featured on the Top500 list. Amazon published the result of the High Performance Linpack benchmark on a virtual cluster made up of 880 Cluster Compute instances (7040 cores) and measured the overall performance at 41.82 TeraFLOPS. This would place a virtual cluster made out of Cluster Compute instances in the 146th position on the Top500 list. For a sense of scale, the somewhat larger TACC Lonestar cluster, which serves as a computational resource in TeraGrid, currently occupies the 123rd position on this list.
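
To put the Linpack figure in per-node terms, here is a minimal sketch that uses only the numbers quoted above (880 instances, 7040 cores, 41.82 TeraFLOPS). The variable names are made up for illustration, and the result is a simple average of the measured benchmark, not a statement about peak hardware performance.

```python
# Figures quoted in the post; everything derived below is plain division.
LINPACK_TFLOPS = 41.82   # measured High Performance Linpack result
INSTANCES = 880          # Cluster Compute instances in the virtual cluster
CORES = 7040             # total cores reported for the run

# Sanity check: two quad-core processors per instance means 8 cores each.
cores_per_instance = CORES // INSTANCES

# Average measured throughput, converted from TeraFLOPS to GigaFLOPS.
gflops_per_instance = LINPACK_TFLOPS * 1000 / INSTANCES
gflops_per_core = LINPACK_TFLOPS * 1000 / CORES

print(f"cores per instance:   {cores_per_instance}")
print(f"GFLOPS per instance:  {gflops_per_instance:.2f}")
print(f"GFLOPS per core:      {gflops_per_core:.2f}")
```

The core count works out to exactly 8 per instance, which matches the pair of quad-core processors described in the announcement.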

How much does it all cost? A quick back-of-the-envelope calculation shows that at $1.60 per hour, the Cluster Compute on-demand instances cost about $14K per node per year. However, if you use reserved instances the price drops significantly. Based on 100% utilization of a 3-year reserved instance (which is more similar to buying a supercomputer for 3 years) you’d pay only about $0.81 per instance-hour ($6590 up front, amortized over the three years, plus $0.56 per hour), in other words, about $7K per node per year, and that’s all-inclusive, with no additional operating costs. This rough calculation does not include the cost of EBS and data transfer, which to some extent depend on how the cluster is used; still, something to keep in mind.
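
The back-of-the-envelope numbers can be reproduced with a few lines of Python. This is a rough sketch using only the prices quoted above; it assumes 100% utilization and a 365-day year, and the function names are made up for illustration. As in the text, EBS and data transfer costs are left out.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap days


def on_demand_cost_per_year(hourly_rate):
    """Yearly cost of one instance running around the clock on demand."""
    return hourly_rate * HOURS_PER_YEAR


def reserved_effective_hourly(upfront, hourly_rate, term_years):
    """Effective hourly price: upfront fee spread over the term, plus usage."""
    return upfront / (term_years * HOURS_PER_YEAR) + hourly_rate


# Prices quoted in the post (July 2010).
on_demand_yearly = on_demand_cost_per_year(1.60)
reserved_hourly = reserved_effective_hourly(6590, 0.56, 3)
reserved_yearly = reserved_hourly * HOURS_PER_YEAR

print(f"on-demand: ${on_demand_yearly:,.0f} per node per year")
print(f"reserved:  ${reserved_hourly:.2f} per hour, "
      f"${reserved_yearly:,.0f} per node per year")
```

Under these assumptions the on-demand figure comes to roughly $14K per node per year and the 3-year reserved figure to roughly $7K, in line with the estimates above.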

Grids versus Clouds

By Kate Keahey | Posted in Uncategorized | 12-19-2009 | One response

The issue of how exactly cloud computing differs from grid computing was responsible for much controversy in the last year. Here are my two cents on how Infrastructure-as-a-Service (IaaS) cloud computing and grid computing are different (also discussed in the Sky Computing paper).

At some level, both cloud computing and grid computing represent the idea of using remote resources. However, grid computing is built on the assumption that control over the manner in which resources are used stays with the site, reflecting local software and policy choices. These choices are not always useful to remote users who might need a different operating system, or login access instead of a batch scheduler interface to a site. Reconciling those choices between multiple groups of users proved to be complex, time-consuming, and expensive. Looking back, leaving complete control over the resources with the site was a pragmatic choice that enabled very fast adoption of a radically transformative technology. On the other hand, once the technology became successful, this factor made it difficult for it to scale to many user groups with different (and sometimes conflicting) requirements of what the resource should provide.

IaaS cloud computing represents a fundamental change of assumption: when a remote user “leases” a resource, the control of that resource is turned over to that user. This change in assumption was enabled by the availability of a free and efficient virtualization technology: the Xen hypervisor. Before virtualization, turning over the control to the user was fraught with danger: the user could easily subvert a site. But virtualization provides a way of isolating the leased resource from a site in a secure way that mitigates this danger. Virtual machines can be deployed very fast (on the order of milliseconds). Once the overhead and the price associated with a reliable virtualization technology also went down, it suddenly became viable and cost-effective to use virtual machines to lease resources to remote users.

The ability to turn the control of a remote resource over to the user makes it possible to develop tools, such as Nimbus, and provide services such as Amazon’s EC2 or Science Clouds that allow users to carve out their own custom “site” out of remote resources. At the same time, this change of assumption challenges the established notions of what it means to be a site as we continue to struggle with the new meaning and implications of domain names, site licenses, and established security practices.

EC2’s boot from EBS capability

By Tim Freeman | Posted in News | 12-04-2009 | No responses

Amazon AWS recently announced that EC2 instances can be configured to launch from EBS volumes instead of bundled disk images.

Science users launching heterogeneous clusters may be able to take advantage of this to streamline the bundling of images. Those clusters often share a base image layout. Because these AMIs can now reference any number of EBS volumes in their external description, including for the root disk, you can now work on customizing each partition and “mix and match” root disks and partitions more easily to make a cohesive cluster. That’s more convenient than maintaining such a partition organization separately and bundling images for each cluster node type, which is traditionally time-consuming.

Another change is that the AMI can be above 10GB (up to 1TB) when launched in this manner: some clusters we have seen are pushing that limit even without any data sets!

There is an added cost involved in using EBS which must be taken into account. And EBS is charged by both disk size and number of I/O operations, so this may not be useful in a lot of cases.

Cloud Computing and Bioinformatics: Notes from a Workshop

By Kate Keahey | Posted in General | 12-01-2009 | No responses

I recently attended an immensely interesting workshop on using cloud computing for systems biology computations. The workshop was co-held with SC09. The agenda and the presentations are available online from the workshop pages and are well worth a look. Here are some impressions from the workshop.

The workshop began with a discussion of current challenges in biosciences. One of the most compelling is personalized medicine, which helps physicians tailor treatments to individual patients based on feedback obtained at the genetic and molecular level. For example, knowledge of genetic variations can now help physicians better assess treatment risks, manage dosing of drugs, better detect diseases in early stages, and optimize treatments such as breast cancer therapy. In his introductory talk, Eugene Kolker said that today there were already hundreds of patients treated based on information obtained from their genetic signatures as part of experimental programs. He also emphasized that the main obstacle to progress in this area is not obtaining the data but the response time and the ability to store, process, and analyze it to obtain the right information. And this brings us to cloud computing, in this workshop the “prime suspect” for processing, analyzing, and storing data on demand.

Simon Twigger from the Medical College of Wisconsin made a very compelling case for why bioscientists need cloud computing. He based his case on an analogy with the pipette, a common tool in molecular biology and medicine that is typically equipped with a disposable tip. The analogy was particularly apt as probably 90% of the audience was using the pipette on a daily basis. Simon proposed the following: “Imagine that you are running your lab with only one pipette tip to share.” [Huge laughter from the audience.] He then went on to explain how this assumption would change the work pattern in his laboratory. First, everybody would have to wait in line to use the pipette tip. Because of this waiting, they would do a lot less work. They would also do only small-scale things because the imaginary pipette tip is small (moving large quantities of liquid would take weeks!). They would do fewer things because washing the pipette between uses is a pain. And finally they would not try to do something risky, because what if the pipette tip, for example, becomes clogged? Having only one 16-node cluster for the lab, Simon explained, was exactly like having only one pipette tip: it was a bottleneck for the work in the lab. You queue your program and can’t make progress till the results become available. Because of that you do less work. Since the cluster is small, you also try only small-scale things, as well as fewer types of things because different types of things may require configuration changes. And the risky stuff you don’t do at all.

The panel in the afternoon presented some options for cloud computing for science. Kathy Yelick from LBNL and our own Pete Beckman described the recently funded DOE Magellan project, a research project looking at how to build clouds for science. Afterwards, Owen White from the University of Maryland started a discussion on what makes cloud computing compelling to science. In addition to the issues brought up earlier by Simon, ease of use plays a very significant role. Owen described how his group tried to use the TeraGrid and found it too complex to use both procedurally and technically: they were not able to overcome the entry barrier despite the significant resource incentive. The ease-of-use question has many aspects. Pete summed it up by saying that half the users tell him that they want to develop their own VM images and half that they don’t. A rough show of hands showed that in that particular audience everybody thought that developing their own image was much simpler than adapting their application to an environment provided by somebody else (because this is effectively the alternative). This does raise an issue, however: for some people the need to develop their own image may be too high a barrier.

As if to address this issue, the panel was followed by a presentation from Sam Angiuoli from the University of Maryland. Sam described an appliance for automated analysis of sequence data developed for the bio community. It seems that a model is emerging where some users take the initiative to develop appliances as a service to their community. This is similar to, for example, the high-energy physics CernVM project that provides images supporting all four LHC experiments.

The workshop was wrapped up by a talk from Deepak Singh from Amazon Web Services who described AWS capabilities but also the different ways in which various projects use them. It’s fun to see new potential for science emerge!

Welcome to scienceclouds.org

By Kate Keahey | Posted in General | 11-16-2009 | One response

Today we are moving Science Clouds to its own web pages. In addition to enabling quite a few exploratory projects, the Science Clouds have so far served as a bit of a “cloud clinic” where various folks interested in using cloud computing for a scientific project would contact us and get advice and help on how to get started using Infrastructure-as-a-Service (IaaS) clouds.

Over time, these efforts resulted in shared images, papers, cloud evaluation projects, and other endeavors of which the following had the most impact:

The STAR “last minute” experiment: one of our most fascinating experiences last year was helping the scientists of the STAR nuclear physics experiment meet a conference deadline by conducting a significant last-minute run on Amazon’s EC2. The run took place over 300+ virtual nodes, deployed as virtual clusters with the help of the Nimbus Context Broker, and ran over roughly 10 days, consuming more than 36,000 hours of compute time. More details and perspectives are available in articles in ISGTW, HPCwire, and Newsweek.

The ALICE project with CernVM: the challenge was to elastically extend the globally distributed set of resources available to the ALICE HEP experiment (one of the four Large Hadron Collider experiments at CERN) in such a way that users don’t even notice whether their code runs on cloud-provided resources or not. This was achieved by sensing resource demand and dynamically deploying virtual machine images developed by the CernVM project. More details are available in HPCwire.

Sky Computing: can you create an environment deployed over a federation of cloud resources that is configured and protected in the same way as a local cluster? We defined and evaluated it in a paper that appeared in the September/October 2009 IEEE Internet Computing issue.

We appreciate all efforts and ideas that we have seen in the last year. With this new site we’d like to broaden the potential for discussions and share not just resources and images, but also papers, thoughts, and ideas on all sorts of topics related to how cloud computing can help science. If you have ideas, experiences or thoughts, let us know.