BaBar from the children’s books is a young elephant who comes to a big city, and brings back the benefits of civilization to other elephants in the jungle. He also happens to be a very apt mascot for a high-energy physics project.
The name BaBar actually derives from the B/B-bar subatomic particles produced at the SLAC collider in Stanford, California during electron-positron collisions. These experiments help us achieve a better understanding of the relationship between matter and antimatter and ultimately answer questions about the nature of the universe. This groundbreaking research is moving forward at a brisk pace: the BaBar scientists have petabytes of data, a plethora of insight, and a recent Nobel Prize to show for it.
Sifting through petabytes of data requires petaflops of computation. Here the BaBar scientists faced a problem common in science: the software required to process the data is complex, hard to maintain, and impossible to rewrite. To illustrate: we are talking about roughly 9 million lines of C++ and Fortran code developed by hundreds of scientists spread across 75 institutions in 10 different countries… Such software requires significant effort to port to new platforms and operating systems. In practice, this limits the scientists’ access to resources, since porting the code to, say, a new grid site takes a lot of work.
And this is also where cloud computing comes in. I was recently talking to Ian Gable from the University of Victoria (UVIC) who told me of their efforts to put BaBar in the clouds. The UVIC team’s idea was to put BaBar software on virtual machine (VM) images that could be run on any Infrastructure-as-a-Service (IaaS) cloud. They configured a base image which could be customized to support specific BaBar applications – this smoothed the way for the scientists to try cloud computing because it took the time-consuming configuration step out of the equation.
Running BaBar in the clouds proved to be a great success: so much so that the demand soon outstripped the capabilities of the Nimbus cloud at UVIC (named, appropriately enough, “Elephant” ;-). Fortunately, the UVIC team was able to provision additional cloud resources from the Alto Nimbus cloud at the National Research Council in Ottawa and from EC2 East. But another problem then arose: how can you present resources provisioned over distributed clouds to the users in a uniform fashion? Rather than sending people from cloud to cloud shopping for cycles, the UVIC team automated the process. They did so via an infrastructure called the Cloud Scheduler, a combination of cloud provisioning and the Condor scheduler typically used to submit BaBar jobs. Cloud Scheduler works roughly like this: a user submits a job to a Condor queue (as always) and the Cloud Scheduler monitors the queue. If there are jobs in the queue, Cloud Scheduler stands up VMs on the various available clouds and makes them available to the Condor installation; if there are no jobs in the queue, the VMs are terminated. With the aid of this infrastructure, BaBar computations have been happily churning away on the distributed clouds, demonstrating how the provisioning issues in a distributed environment can be effectively overcome.
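To make the queue-driven logic concrete, here is a minimal Python sketch of a single scheduling pass. It is not the UVIC team's code: the `Cloud` class, the `reconcile` function, the capacities, and the one-VM-per-job rule are all assumptions made up for illustration, and a real deployment would talk to Condor and to each cloud's provisioning interface rather than update counters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cloud:
    """Hypothetical stand-in for one IaaS cloud with a fixed VM capacity."""
    name: str
    capacity: int
    running: int = 0

    def free_slots(self) -> int:
        return self.capacity - self.running


def reconcile(queued_jobs: int, clouds: list[Cloud]) -> dict[str, int]:
    """Run one pass and return the change in VM count for each cloud.

    Assumptions (not from the original post): one VM per queued job,
    clouds are filled in list order, and VMs are only shut down once
    the queue is completely empty.
    """
    changes = {cloud.name: 0 for cloud in clouds}

    # Empty queue: terminate every VM on every cloud.
    if queued_jobs == 0:
        for cloud in clouds:
            changes[cloud.name] = -cloud.running
            cloud.running = 0
        return changes

    # Jobs are waiting: boot VMs until demand is met or capacity runs out.
    needed = queued_jobs - sum(cloud.running for cloud in clouds)
    for cloud in clouds:
        if needed <= 0:
            break
        boot = min(needed, cloud.free_slots())
        cloud.running += boot
        changes[cloud.name] = boot
        needed -= boot
    return changes


if __name__ == "__main__":
    # Capacities are invented for the example.
    clouds = [Cloud("elephant", 2), Cloud("alto", 2), Cloud("ec2-east", 10)]
    print(reconcile(5, clouds))  # local clouds fill first, the rest spills over
    print(reconcile(0, clouds))  # queue drained, so everything is terminated
```

Filling the clouds in list order is one simple way to prefer community resources over a commercial one; the user only ever sees the Condor queue, whichever cloud ends up running the job.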
The BaBar story highlights two trends that increasingly come up in the context of cloud computing for science. One is the need to provide appliance management for scientists who do not necessarily have the skills, or frankly the interest, to configure and maintain an operating system installation. This is something that previously came “bundled” with resource management: a system administrator would both keep the machines running and try to provide a software configuration reconciling the needs of different application groups. The latter became harder as more application groups coming from remote sites tried to use the resource, since there were more potentially conflicting requirements to resolve. Infrastructure-as-a-Service cuts the system administrator’s job in half. The cloud of course still needs to be operated, but the software it is running is now application-independent. The application software still has to be configured, but this is now done by configuring community-specific appliances developed by community experts such as the UVIC team, who stepped up as appliance providers for their community.
The second trend is using multiple clouds, what we called at some point “sky computing”. In provisioning resources for BaBar, the UVIC team successfully combined two distributed community clouds and one commercial cloud, thus demonstrating that resources can indeed be provisioned from multiple sources with minimal inconvenience to the user. Furthermore, they provided a mechanism to do it based on need. This pattern, elastically provisioning over multiple clouds, is becoming increasingly important as communities explore using multiple resources. We used it in our work with the ALICE experiment at CERN and on the ElasticSite project, which elastically adds resources provisioned in the cloud to a local scheduler, and we are implementing it in the infrastructure for the Ocean Observatories Initiative (OOI): something to watch.
All in all, an amazing effort by the UVIC team to make cloud computing easier to use for their community. When we think of elephants we tend to think of heavy, plodding animals that are hard to move… Not this one ;-).