While many projects focus on high performance, usually measured in floating point operations per second, Condor has taken a different approach: it focuses on high throughput, for which a better measure is floating point operations per year. This focus is called high throughput computing, or HTC [condor-2001-manual]. Condor uses spare cycles on workstations that form part of a cluster, which is similar to the concept behind the Mosix project. The idea originated with Miron Livny and was based on some of his graduate work.

Each cluster has one computer, called the ``central-manager'', that serves as a nest of sorts for the Condor cluster. When a user submits a job to the cluster, the manager looks over the network of computers and finds those with available resources where the program may run. This gives rise to the metaphor of a condor soaring over the desert looking for food.

The machines within Condor are commodity-level machines, mainly x86 Linux, Solaris and IRIX boxes. There is also support for Alpha Linux and Windows NT, but it is called ``clipped'' because it lacks support for check-pointing and remote system calls.

Furthermore, the computers within the Condor cluster do not need to be dedicated to the cluster. Workstation owners can specify the times at which they wish Condor to run, for example only from 6pm to 8am. Owners may also choose to have Condor run whenever the computer is idle, much like SETI@home does with Windows PCs [condor-1993-hunter]. In the same way, Condor does not require additional user accounts on the cluster computers; such a requirement would be unmanageable on a cluster with hundreds of workstations. Instead, it traps the library calls of Condor applications by using a modified version of GLIBC and relays the calls back over the network to the computer from which the request originated.
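
The relaying of calls described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names `SubmitSideServer` and `RemoteIO`, the JSON message format, and the in-process `send` function that stands in for the network are all hypothetical. They are not Condor's actual interfaces, which work at the C library level.

```python
import json
import os
import tempfile


class SubmitSideServer:
    """Hypothetical helper on the submitting machine.

    It performs file operations on behalf of a job running elsewhere.
    """

    def __init__(self):
        self._files = {}   # descriptor number -> open file object
        self._next_fd = 3  # arbitrary starting number for this sketch

    def handle(self, message):
        """Decode one request, run it locally, and encode the reply."""
        request = json.loads(message)
        op, args = request["op"], request["args"]
        if op == "open":
            fd = self._next_fd
            self._next_fd += 1
            self._files[fd] = open(args["path"], args["mode"])
            result = fd
        elif op == "write":
            result = self._files[args["fd"]].write(args["data"])
        elif op == "read":
            result = self._files[args["fd"]].read()
        elif op == "close":
            self._files.pop(args["fd"]).close()
            result = 0
        else:
            raise ValueError("unsupported operation: " + op)
        return json.dumps({"result": result})


class RemoteIO:
    """Hypothetical job-side stub.

    Each call is serialized and handed to `send` instead of touching
    the local disk of the executing machine.
    """

    def __init__(self, send):
        self._send = send  # assumption: a real system would use the network

    def _call(self, op, **args):
        reply = self._send(json.dumps({"op": op, "args": args}))
        return json.loads(reply)["result"]

    def open(self, path, mode="r"):
        return self._call("open", path=path, mode=mode)

    def write(self, fd, data):
        return self._call("write", fd=fd, data=data)

    def read(self, fd):
        return self._call("read", fd=fd)

    def close(self, fd):
        return self._call("close", fd=fd)


# The "network" here is just a direct function call into the server.
server = SubmitSideServer()
remote = RemoteIO(server.handle)

path = os.path.join(tempfile.mkdtemp(), "result.txt")
fd = remote.open(path, "w")
remote.write(fd, "answer=42\n")
remote.close(fd)

fd = remote.open(path, "r")
contents = remote.read(fd)
remote.close(fd)
print(repr(contents))
```

In this sketch the job only ever talks to the stub, so the file is created on the submitting side and the executing machine needs no user account or shared file system for the job.
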
Although this design makes it very easy to add more machines to the cluster, it also limits the physical scalability of the network. High-latency connections, such as those found over the Internet, make it difficult for Condor to operate, so most clusters are restricted to a campus LAN.

The ever-changing topology of workstations also poses a problem for Condor. When users return to their computers, they will usually want them to stop running Condor processes. This is accomplished by having programs linked with Condor periodically checkpoint themselves back to their host machine. A machine may lose up to an hour's worth of work when switching tasks, but with this technique users are assured that their tasks will always complete eventually. One of Condor's most powerful features is that no changes to the source code are required; a program only needs to be linked to the Condor libraries to obtain check-pointing and remote system calls.

Condor also allows pools to be hooked together, a technique called ``Flocking''. With flocking, when a job is submitted to a pool that is already in use, the job may be transferred to another pool, which may apply different execution rules to jobs that are not native to it.

Condor also allows complex ordering of jobs via directed acyclic graphs. Each job is represented by a node in a graph, and once the graph is submitted to Condor the jobs are executed in an order that respects their dependencies.

The ClassAds mechanism of Condor gives machines the ability to advertise the resources they have available. Thus, if a job is specified to require 128 MB of memory, it will not be placed on a computer with only 64 MB of RAM. Extremely large tasks will not monopolize all resources on a cluster either: under the ``Up-Down'' scheduling algorithm, the longer a process runs, the lower its priority becomes.
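
The effect of this kind of priority decay can be illustrated with a short Python sketch. It is a deliberately simplified model, not Condor's implementation: the `UserState` record, the `tick` and `next_to_serve` functions, and the rates `up_rate` and `down_rate` are all hypothetical names and values chosen for illustration. Each user carries an index that rises while the user holds machines and falls back toward zero otherwise; a lower index means a better priority.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    """Hypothetical per-user scheduling record for this sketch."""
    name: str
    index: float = 0.0     # lower index = better priority
    running: int = 0       # machines currently held by the user's jobs
    waiting: bool = False  # True if the user has jobs queued


def tick(users, up_rate=1.0, down_rate=0.5):
    """Advance one scheduling interval.

    The index goes up in proportion to the machines held, and drifts
    back down (never below zero) while the user holds none.
    """
    for user in users:
        if user.running > 0:
            user.index += up_rate * user.running
        else:
            user.index = max(0.0, user.index - down_rate)


def next_to_serve(users):
    """Return the waiting user with the lowest index, or None."""
    candidates = [user for user in users if user.waiting]
    return min(candidates, key=lambda user: user.index) if candidates else None


# One user has held four machines for a while; another has just arrived.
heavy = UserState("heavy", running=4, waiting=True)
light = UserState("light", running=0, waiting=True)

for _ in range(10):
    tick([heavy, light])

chosen = next_to_serve([heavy, light])
print(chosen.name, heavy.index, light.index)
```

Under these assumptions the user who has held four machines for ten intervals ends with a much worse index than the newcomer, so the newcomer's job is served first.
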
This decay in priority allows users with short jobs to avoid the endless queues that were common on the mainframe computers of old.

Finally, Condor can be linked to the Grid via the ``Glide-in'' method. Using this method, a job submitted on a Condor pool may be executed elsewhere on the global computing Grid. Currently Condor is working with the Globus toolkit for resource sharing [condor-2001-manual].

Although Condor was originally developed at, and still has its home at, the University of Wisconsin-Madison, it has since been adopted by many academic institutions. Some of those institutions are listed here [condor-homepage]:

* University of Michigan
* US Air Force Academy
* National University of Singapore
* University of Amsterdam

At first glance, Condor seems very similar to Beowulf and Mosix clusters. But it is fundamentally different in several ways, some of which make it more useful.

* Condor and Mosix

Mosix is a modification to the Linux kernel that creates a cluster of workstations. It is designed for high performance computing. Under the Mosix system, users need only compile their programs on a Mosix cluster node. Each forked process is able to run on a different node. The downside is that the network needs to be fairly homogeneous. Beyond that, migration in Mosix is still a long way from being stable or usable for anything beyond simple tests [mosix-homepage].

Condor does not require kernel modifications and does not require the maintenance of global accounts. However, processes in Condor may not use the fork system call, nor may they have long periods of I/O or use any IPC, because these cause problems for the automatic check-pointing.

Given such restrictions, Mosix would work well for a scalable array of web servers that have to do non-trivial calculations on incoming data, since the Apache server forks off processes. More generally, it is best suited to high performance computing. Condor works better for long-running jobs that require no interaction, such as climate and ocean simulations.

* Condor and Beowulf

Beowulf is a technology that originated in the summer of 1994 with Thomas Sterling and Donald Becker at CESDIS. The goal was to create a cluster out of commodity, off-the-shelf parts. The initial system was sixteen 486DX4 computers networked together. Beowulf is now recognized as a class of high performance computing, and there are many vendors who support and build Beowulf-class systems. Even IIT has a Beowulf cluster that is used in the Biological, Chemical and Physical Sciences department.

Within a Beowulf cluster the nodes are dedicated to the cluster and usable only for the cluster. A cluster also requires a central administration point. This tight coupling allows signals to be passed between nodes, something not possible on Condor, but it also makes a Beowulf cluster slightly more expensive to build because idle workstations cannot be harnessed. Condor does not provide a mechanism to send signals between nodes, and beyond that it disallows use of the SIGUSR2 and SIGTSTP signals. Furthermore, in a Condor program, system calls such as sleep, alarm and getitimer are prohibited.

Programming for Beowulf clusters is typically done using MPI or PVM. This is in contrast to Condor and Mosix, which require only that programs be linked to the proper libraries [beowulf-homepage].

Condor is still in production use at the University of Wisconsin-Madison. Work is currently underway on the Condor source tree to remove some Unix-specific functionality so that it works on NT systems. Condor is also evolving to work with the Grid as it develops. It is also flexible and powerful enough that outside organizations are able to use it. Currently the National Center for Supercomputing Applications in Urbana, Illinois makes heavy use of the University of Wisconsin-Madison cluster as a regional computing center.