Layer
|
Cloud Stack
|
Workflow Stack
|
HPC Stack
|
Declarative Language (DL)
|
Pig
|
Weaver
|
Swift-T
|
Directed Acyclic Graph (DAG)
|
Map-Reduce
|
Makeflow
|
-
|
Bag of Tasks (BOT)
|
JobTracker
|
Work Queue Master
|
Turbine
|
Distributed Data
|
HDFS
|
Work Queue Workers
|
MPI
|
Each layer of the system fulfills a distinct need. The declarative language (DL) at the top is compact, expressive, and easy for end users, but is intractable to analyze in the general case because it may have a high order of complexity, possibly Turing-complete. The DL can be used to generate a (large) directed acyclic graph (DAG) that represents every single task to be executed. The DAG is not a great user-interface language, but it is much more suitable for a system to perform capacity management and optimization because it is a finite structure with discrete components. A DAG executor then plays the graph, dispatching individual tasks as their dependencies are satisfied. The BOT consists of all the tasks that are ready to run, and is then scheduled onto the underlying computer, using the data dependencies made available from the higher levels.
Why bother with this sort of model? It allows us to compare the fundamental capabilities and expressiveness of different kinds of systems. For example, in the realm of compilers, everyone knows that a proper compiler consists of a scanner, a parser, an optimizer, and a code generator. Through these stages, the input program is transformed from a series of tokens to an abstract syntax tree, an intermediate representation, and eventually to assembly code. Not every compiler uses all of these stages, much less the same code, but by using a common language, it is helpful to understand, compare, and design new systems.