Incubator/Falkon

FALKON: A FAST AND LIGHT-WEIGHT TASK EXECUTION FRAMEWORK

Falkon aims to enable the rapid and efficient execution of many tasks on large compute clusters, and to improve application performance and scalability using novel data management techniques. Falkon combines three techniques to achieve these goals: (1) multi-level scheduling, which separates resource provisioning from the dispatch of user tasks to those resources; (2) a streamlined task dispatcher able to achieve order-of-magnitude higher task dispatch rates than conventional schedulers; and (3) data caching combined with a data-aware scheduler, which leverages co-located computational and storage resources to minimize the use of shared storage infrastructure. Falkon’s integration of multi-level scheduling, streamlined dispatchers, and data management delivers performance not provided by any other system.

Falkon has been deployed and tested in a wide range of environments, from 100-node clusters, to Grids (TeraGrid), to specialized machines (SiCortex with 5832 CPUs), to supercomputers (IBM BlueGene/P with 160K CPUs). Micro-benchmarks have shown Falkon to achieve throughputs of over 15K tasks/sec, scale to millions of queued tasks, and execute billions of tasks per day. Large-scale applications from many domains (e.g., astronomy, medicine, biology, chemistry, molecular dynamics, economics, and analytics) have been successfully executed using the Falkon framework. Data diffusion has also been shown to improve application scalability and performance, achieving I/O rates of hundreds of Gb/s on modest-sized clusters, with Tb/s I/O rates on the horizon.

Falkon is actively being developed at the University of Chicago with funding from NSF, DOE, and NASA, and has been instrumental in several other proposals to DOE and NSF for additional funding.
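
To make the data-aware scheduling idea from the overview concrete, here is a minimal sketch in Python. It is not Falkon's implementation or API: the function names `pick_worker` and `record_dispatch`, the `cache_index` mapping, and the tie-breaking rule are all assumptions made for illustration. The sketch prefers the idle worker that already caches the most of a task's input files, so that reads are served from local disk rather than shared storage.

```python
def pick_worker(task_inputs, idle_workers, cache_index):
    """Pick the idle worker that caches the most of the task's input files.

    Hypothetical helper, not part of Falkon. `task_inputs` is a set of file
    names, `idle_workers` is an ordered list of worker ids, and `cache_index`
    maps a worker id to the set of file names it currently caches.
    Ties go to the worker that appears first in `idle_workers`.
    Returns None when no worker is idle.
    """
    best_worker, best_hits = None, -1
    for worker in idle_workers:
        # Count how many of the task's inputs this worker already holds.
        hits = len(task_inputs & cache_index.get(worker, set()))
        if hits > best_hits:
            best_worker, best_hits = worker, hits
    return best_worker


def record_dispatch(task_inputs, worker, cache_index):
    """Assume the worker keeps the task's inputs in its local cache afterwards."""
    cache_index.setdefault(worker, set()).update(task_inputs)
```

With an empty `cache_index`, the first task simply goes to the first idle worker; once `record_dispatch` has run, later tasks over the same files are steered toward that worker. A real scheduler would also have to handle cache eviction and load balancing, which this sketch ignores.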

Goals

  • Reduce task dispatch time by using a streamlined dispatcher that eliminates support for features such as multiple queues, priorities, and accounting.
  • Use an adaptive provisioner to acquire and/or release resources as application demand varies.
  • Improve application performance and scalability through data diffusion and data-aware scheduling, which leverage co-located computational and storage resources to offload shared file system I/O to local disk I/O.
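
The adaptive provisioning goal in the list above can be sketched as a simple sizing rule. The Python below is only an illustration under stated assumptions: the function names, the `tasks_per_worker` ratio, and the `min_workers`/`max_workers` bounds are hypothetical and are not taken from Falkon. It sizes the worker pool in proportion to the number of queued tasks and reports how many workers to acquire or release.

```python
import math


def target_workers(queued_tasks, tasks_per_worker, min_workers, max_workers):
    """Number of workers wanted for the current queue length, within bounds."""
    wanted = math.ceil(queued_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, wanted))


def provisioning_delta(queued_tasks, current_workers,
                       tasks_per_worker=10, min_workers=0, max_workers=64):
    """Positive result: acquire that many workers. Negative: release that many.

    All default values are illustrative assumptions, not Falkon settings.
    """
    target = target_workers(queued_tasks, tasks_per_worker,
                            min_workers, max_workers)
    return target - current_workers
```

Calling `provisioning_delta` periodically with the current queue length lets the pool grow under load and shrink when the queue drains. A production policy would likely add hysteresis or idle timeouts so that short bursts do not cause resources to be repeatedly acquired and released.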

Project Branches

NEWS

Activity to Date

For more details on the various activities of Falkon (and Swift in certain cases) to date, please see the Activity Wiki. The wiki has detailed graphs for the performance and scalability records outlined below, as well as log history records.

Records: Performance and/or Scalability

  • 10-16-08: Endurance Test on ANL/UC TG, 1B tasks on 128 CPUs in 19.2 hours (15558 tasks/sec)
  • 07-10-08: Scalability Test on BG/P, 1M tasks on 160K CPUs in 6 minutes (3071 tasks/sec)
  • 06-28-08: MARS application on BG/P, 1M tasks on 128K CPUs in 41 minutes (9.3 CPU years)
  • 06-21-08: DOCK5 application on BG/P, 913K tasks on 116K CPUs in 120 minutes (21.4 CPU years, 99.7% efficiency, 99.6% utilization)
  • 06-12-08: DOCK5 application on BG/P, 217K tasks on 32K CPUs in 50 minutes
  • 04-01-08: DOCK5 application on SiCortex, 90K tasks on 5.6K CPUs in 210 minutes (1.94 CPU years, 98% efficiency)
  • 04-01-08: MARS application on BG/P, 48K tasks on 2K CPUs in 27 minutes (0.1 CPU years, 97.3% efficiency)

Note: K denotes thousands (1024), M denotes millions (1024*1024 = 1048576), B denotes billions (1024*1024*1024 = 1073741824)
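
Because these units are binary, they matter when cross-checking the throughput figures in the records. The following Python sketch recomputes the average tasks/sec for two of the records from the task counts and durations listed above. The helper name `tasks_per_second` is ours, and the durations are taken as stated (19.2 hours and 6 minutes), so the results should be read as approximate.

```python
K, M, B = 1024, 1024**2, 1024**3  # binary units, as defined in the note above


def tasks_per_second(num_tasks, duration_seconds):
    """Average throughput over the whole run."""
    return num_tasks / duration_seconds


endurance = tasks_per_second(1 * B, 19.2 * 3600)   # 10-16-08 record
scalability = tasks_per_second(1 * M, 6 * 60)      # 07-10-08 record

print(f"endurance:   {endurance:.0f} tasks/sec (reported: 15558)")
print(f"scalability: {scalability:.0f} tasks/sec (reported: 3071)")
```

The recomputed values, roughly 15534 and 2913 tasks/sec, land near the reported figures but do not match them exactly, presumably because the listed durations are rounded. Using a decimal billion for the endurance test would give about 14468 tasks/sec, which is further from the reported 15558.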

Log History

  • December 2007 - April 2009 plot of Falkon across various systems (ANL/UC TG 316 CPU cluster, SiCortex 5832 CPU machine, IBM Blue Gene/P 4K and 160K CPU machines, and Sun Constellation with 62K CPUs)
  • Falkon-logs-history.jpg

Documents

Publications

Presentations

Other Documents

Mailing Lists

Developer discussion (falkon-dev) archive/subscribe/unsubscribe
User discussion (falkon-user) archive/subscribe/unsubscribe
Announcements (falkon-announce) archive/subscribe/unsubscribe
Commit notifications (falkon-commit) archive/subscribe/unsubscribe

How to subscribe
How to unsubscribe
Search the email archives

Bug Reporting

FAQ

Source Code

Code

To download the entire Falkon source tree, type:

svn co https://svn.globus.org/repos/falkon

For the latest snapshot (as well as past versions) of the code as a source tree archive, please see below:

Prerequisites

  • Java 1.4+
  • SVN: only needed for source control
  • Apache Ant: only needed to compile
  • Ploticus: included; only needed to generate graphs
  • Globus Toolkit 4: included; needed to compile and run
  • gcc or g++: only needed for a specialized C-based component that was built to run on the IBM BlueGene

Instructions

Committers, Contributors, and Sponsors

Committers

If you would like to become a committer, guidelines are here:

  • Ioan Raicu, The University of Chicago, Computer Science Department
  • Yong Zhao, Microsoft
  • Zhao Zhang, The University of Chicago, Computation Institute

Contributors

The Falkon project gratefully acknowledges the following contributions:

  • Alex Szalay, Johns Hopkins University
  • Catalin Dumitrescu, Fermi National Accelerator Laboratory
  • Ian Foster, Argonne National Laboratory (Math and Computer Science Div.) & The University of Chicago (Computer Science Department)
  • Mike Wilde, University of Chicago (Computation Institute) & Argonne National Laboratory
  • Ben Clifford, The University of Chicago, Computation Institute
  • Mihael Hategan, The University of Chicago, Computation Institute

Sponsors

The Falkon project gratefully acknowledges the following sponsors:

  • NASA Ames Research Center GSRP Grant Number NNA06CB89H
  • U.S. Dept. of Energy, Office of Advanced Scientific Computing Research, Office of Science, Mathematical, Information, and Computational Sciences Division Contract DE-AC02-06CH11357

Miscellaneous

Status

Accepted as a new Incubator Project on 11/06/2007, as defined by the Incubator Process Guidelines found at http://dev.globus.org/wiki/Incubator/Incubator_Process .

Roadmap

We plan to release official incremental code snapshots every few months. The current version is v0.9, and we plan to release v1.0 in June 2008. The transition from 0.9 to 1.0 will mostly address ease of installation and use, reliability, and robustness; v1.0 will also include basic data diffusion support. Check back soon for more details on the release schedule.

Policies

The Falkon project adheres to the following guidelines: Globus Alliance Project Guidelines.
