Blog: Partly Cloudy with a Chance of Condor
02 Oct 2009 - Douglas Thain
We have been thinking about cloud computing quite a bit over the last month. As I noted earlier, cloud computing is hardly a new idea, but it does add a few new twists on some old concepts in distributed systems. So, we are spending some time understanding how we can take our existing big applications and make them work with cloud systems and software. It should come as no surprise that there are a number of ways to use Condor to harness clouds for big applications.
Two weeks ago, I gave a talk titled Science in the Clouds at an NSF workshop on Cloud Computing and the Geosciences. One of the points that I made was that although clouds make it easy to allocate new machines that have exactly the environment you want, they don't solve the problem of work management. That is, if you have one million tasks to do, how do you reliably distribute them between your workstation, your campus computer center, and your cloud workforce? For this, you need some kind of job execution system, which is largely what grid computing has focused on.
As it stands, Condor is pretty good at managing work across multiple different kinds of systems. In fact, today you can go to a commercial service like Cycle Computing, which can build an on-demand Condor pool by allocating machines from Amazon.
Just today, we hosted Dhruba Borthakur at Notre Dame. Dhruba is the project lead for the open source Apache Hadoop system. We are cooking up some neat ways for Condor and Hadoop to play together. As a first step, one of my students, Peter Bui, has written a module for Parrot that talks to HDFS, the Hadoop file system. This allows any Unix program, not just Java programs, to talk to HDFS without requiring the kernel configuration and other headaches of using FUSE. Then, you can submit your jobs into a Condor pool and allow them to access data in HDFS as if it were a local file system. The next step is to co-locate the Condor jobs with the Hadoop data that they want to access.
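To make the "as if it were a local file system" point concrete, here is a minimal sketch of the kind of program that benefits. The script and its function name `count_lines` are hypothetical, made up for illustration. It uses nothing but ordinary file I/O and knows nothing about Hadoop, so it behaves the same on a local file as on any path that the surrounding environment maps to a remote store.

```python
import sys


def count_lines(path):
    """Count the lines in a file using only ordinary file I/O.

    Nothing here is HDFS-specific: the program just opens a path.
    Whether that path is local or served by a remote file system is
    decided by the environment the program runs in, not by this code.
    """
    total = 0
    with open(path, "rb") as f:
        for _ in f:
            total += 1
    return total


if __name__ == "__main__" and len(sys.argv) == 2:
    # The argument is any readable path (hypothetical example input).
    print(count_lines(sys.argv[1]))
```

Assuming Parrot is installed with its HDFS module, one would expect to run such a script unchanged under `parrot_run` and hand it a path in Parrot's HDFS namespace instead of a local one. The exact path layout and any required Hadoop environment settings are not covered here and should be checked against the Parrot documentation.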
Finally, if you are interested in cloud computing, you should attend CCA09 (Cloud Computing and Applications), to be held in Chicago on October 20th. This will be a focused, one-day meeting with speakers from industry and academia who are both building and using cloud computers.