Incubator/Falkon
FALKON: A FAST AND LIGHT-WEIGHT TASK EXECUTION FRAMEWORK
Falkon aims to enable the rapid and efficient execution of many tasks on large compute clusters, and to improve application performance and scalability using novel data management techniques. Falkon combines three techniques to achieve these goals: (1) multi-level scheduling, which separates resource provisioning from the dispatch of user tasks to those resources; (2) a streamlined task dispatcher able to achieve order-of-magnitude higher task dispatch rates than conventional schedulers; and (3) data caching with a data-aware scheduler, which leverages co-located computational and storage resources to minimize the use of shared storage infrastructure. Falkon’s integration of multi-level scheduling, streamlined dispatchers, and data management delivers performance not provided by any other system. Falkon has been deployed and tested in a wide range of environments, from 100-node clusters, to Grids (TeraGrid), to specialized machines (SiCortex with 5832 CPUs), to supercomputers (IBM BlueGene/P with 160K CPUs). Micro-benchmarks have shown Falkon to achieve throughputs of over 15K tasks/sec, scale to millions of queued tasks, and execute billions of tasks per day. Large-scale applications from many domains (e.g., astronomy, medicine, biology, chemistry, molecular dynamics, economics, and analytics) have been successfully executed using the Falkon framework. Data diffusion has also been shown to improve application scalability and performance, achieving I/O rates of hundreds of Gb/s on modest-sized clusters, with Tb/s I/O rates on the horizon. Falkon is actively being developed at the University of Chicago with funding from NSF, DOE, and NASA, and has been instrumental in several other proposals to DOE and NSF for additional funding.
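To make the multi-level scheduling idea from the overview more concrete, the following is a minimal Python sketch of a provisioning decision that is kept separate from task dispatch. It is an illustration only, not Falkon's actual provisioner: the function names, the `tasks_per_executor` ratio, and the idle-timeout release rule are all assumptions made for this example.

```python
import math


def executors_to_request(queued_tasks, active_executors, max_executors,
                         tasks_per_executor=1):
    """Return how many additional executors to request.

    Hypothetical policy: aim for one executor per `tasks_per_executor`
    queued tasks, capped at `max_executors`, and never request a
    negative number.
    """
    wanted = math.ceil(queued_tasks / tasks_per_executor)
    target = min(wanted, max_executors)
    return max(0, target - active_executors)


def executors_to_release(idle_seconds_by_executor, idle_timeout):
    """Return the ids of executors idle for at least `idle_timeout` seconds.

    Hypothetical policy: resources are handed back once they have been
    idle long enough, so the allocation shrinks when demand drops.
    """
    return sorted(
        executor
        for executor, idle in idle_seconds_by_executor.items()
        if idle >= idle_timeout
    )
```

In this sketch the dispatcher never appears: the provisioner only looks at queue length and idle times, which is the separation of concerns the overview describes. For example, `executors_to_request(10, 4, 8)` asks for 4 more executors because the cap of 8 is reached before the queue length of 10.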
Goals
- Reduce task dispatch time by using a streamlined dispatcher that omits support for features such as multiple queues, priorities, and accounting.
- Use an adaptive provisioner to acquire and release resources as application demand varies.
- Improve application performance and scalability through data diffusion and data-aware scheduling, which leverage co-located computational and storage resources to offload shared file system I/O onto local disks.
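The data-aware scheduling goal above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Falkon's scheduling policy: the `cache_index` structure, the function names, and the "most cached inputs wins" rule are hypothetical choices made for this example.

```python
def pick_executor(task_inputs, idle_executors, cache_index):
    """Pick the idle executor that already caches the most input files.

    `cache_index` (hypothetical) maps executor id -> set of file names
    held on that executor's local disk. When no executor has any of the
    inputs, the first idle executor in sorted order is used, and the
    data would then be read from shared storage.
    """
    if not idle_executors:
        return None
    needed = set(task_inputs)

    def hits(executor):
        return len(needed & cache_index.get(executor, set()))

    # max() keeps the first maximal element, so sorting first gives a
    # deterministic tie-break.
    return max(sorted(idle_executors), key=hits)


def record_execution(task_inputs, executor, cache_index):
    """Record that `executor` now caches the task's input files."""
    cache_index.setdefault(executor, set()).update(task_inputs)
```

Calling `record_execution` after each task is what lets data "diffuse": files read once from shared storage become local cache entries that later tasks can be routed to.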
Project Branches
- Efficient Task Dispatch and Execution
- Dynamic Resource Provisioning
- Enabling Data Diffusion (Data Caching, Data Management, and Data-Aware Scheduling) in Falkon
- Executing Swift Workflows using Falkon
- Moving from Batch-Scheduled Grids to Economic Driven Resources: Falkon and the Amazon Elastic Compute Cloud (EC2)
- Falkon Support on Petascale Systems such as the IBM BlueGene/P
- AstroPortal Image Stacking Service
NEWS
- SC08 Activities
- Paper: Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster, Pete Beckman, Kamil Iskra, Ben Clifford. “Towards Loosely-Coupled Programming on Petascale Systems”, to appear at IEEE/ACM Supercomputing 2008.
- Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS08), co-located with SC08
- Presentation: "Accelerating Large-Scale Data Exploration through Data Diffusion", DADC 2008, June 24th, 2008.
- Presentation: "From the Heroic to the Logistical, Programming Model Implications of new Supercomputing Applications", CLADE 2008, June 23rd, 2008.
- Presentation: "Accelerating Large-Scale Data Exploration through Data Diffusion", DSLW 2008, May 22nd, 2008.
- Presentation: "Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers", GlobusWorld 2008, May 15th, 2008.
- Presentation: "Harnessing Grid Resources with Data-Centric Task Farms", NASA Ames Research Center, May 14th, 2008.
- Paper: Yong Zhao, Ioan Raicu, Ian Foster. “Scientific Workflow Systems for 21st Century e-Science, New Bottle or New Wine?”, Invited Paper, IEEE Workshop on Scientific Workflows 2008, co-located with IEEE International Conference on Services Computing (SCC) 2008.
- Paper: Ioan Raicu, Yong Zhao, Ian Foster, Alex Szalay. "Accelerating Large-scale Data Exploration through Data Diffusion", International Workshop on Data-Aware Distributed Computing 2008, co-located with ACM/IEEE International Symposium High Performance Distributed Computing (HPDC) 2008.
Activity to Date
For more details on the various activities of Falkon (and Swift in certain cases) to date, please see the Activity Wiki. That wiki has detailed graphs for the performance and scalability records outlined below, as well as the log history records.
Records: Performance and/or Scalability
- 10-16-08: Endurance Test on ANL/UC TG, 1B tasks on 128 CPUs in 19.2 hours (15558 tasks/sec)
- 07-10-08: Scalability Test on BG/P, 1M tasks on 160K CPUs in 6 minutes (3071 tasks/sec)
- 06-28-08: MARS application on BG/P, 1M tasks on 128K CPUs in 41 minutes (9.3 CPU years)
- 06-21-08: DOCK5 application on BG/P, 913K tasks on 116K CPUs in 120 minutes (21.4 CPU years, 99.7% efficiency, 99.6% utilization)
- 06-12-08: DOCK5 application on BG/P, 217K tasks on 32K CPUs in 50 minutes
- 04-01-08: DOCK5 application on SiCortex, 90K tasks on 5.6K CPUs in 210 minutes (1.94 CPU years, 98% efficiency)
- 04-01-08: MARS application on BG/P, 48K tasks on 2K CPUs in 27 minutes (0.1 CPU years, 97.3% efficiency)
Note: K denotes thousands (1024), M denotes millions (1024*1024 = 1048576), B denotes billions (1024*1024*1024 = 1073741824)
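As a worked example of the note above, the short Python sketch below applies the 1024-based units to two of the listed records and derives the task rates. The variable and function names are chosen for this example only. The results land near the reported rates rather than matching them exactly, presumably because the listed durations are rounded.

```python
# Units as defined in the note: powers of 1024, not powers of 1000.
K = 1024
M = K * K
B = K * K * K


def tasks_per_second(tasks, seconds):
    """Average dispatch rate over a whole run."""
    return tasks / seconds


# 10-16-08 endurance test: 1B tasks in 19.2 hours.
endurance_rate = tasks_per_second(1 * B, 19.2 * 3600)

# 07-10-08 scalability test: 1M tasks in 6 minutes.
scalability_rate = tasks_per_second(1 * M, 6 * 60)

print(f"endurance:   {endurance_rate:.0f} tasks/sec")
print(f"scalability: {scalability_rate:.0f} tasks/sec")
```

Using powers of 1000 instead would lower the endurance figure by roughly 7%, which is why the unit convention matters when comparing these records against other systems.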
Log History
- December 2007 - April 2009 plot of Falkon across various systems (ANL/UC TG 316 CPU cluster, SiCortex 5832 CPU machine, IBM Blue Gene/P 4K and 160K CPU machines, and Sun Constellation with 62K CPUs)
Documents
Publications
- Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster, Pete Beckman, Kamil Iskra, Ben Clifford. “Towards Loosely-Coupled Programming on Petascale Systems”, to appear at IEEE/ACM Supercomputing 2008.
- Yong Zhao, Ioan Raicu, Ian Foster. “Scientific Workflow Systems for 21st Century e-Science, New Bottle or New Wine?”, Invited Paper, IEEE Workshop on Scientific Workflows 2008, co-located with IEEE International Conference on Services Computing (SCC) 2008.
- Ioan Raicu, Yong Zhao, Ian Foster, Alex Szalay. "Accelerating Large-scale Data Exploration through Data Diffusion", International Workshop on Data-Aware Distributed Computing 2008, co-located with ACM/IEEE International Symposium High Performance Distributed Computing (HPDC) 2008.
- Yong Zhao, Ioan Raicu, Ian Foster, Mihael Hategan, Veronika Nefedova, Mike Wilde. “Realizing Fast, Scalable and Reliable Scientific Computations in Grid Environments”, book chapter in Grid Computing Research Progress, ISBN: 978-1-60456-404-4, Nova Publisher 2008.
- Ioan Raicu, Yong Zhao, Ian Foster, Alex Szalay. “A Data Diffusion Approach to Large Scale Scientific Exploration”, Microsoft Research eScience Workshop 2007.
- Ioan Raicu, Yong Zhao, Catalin Dumitrescu, Ian Foster, Mike Wilde. “Falkon: a Fast and Light-weight tasK executiON framework”, IEEE/ACM SuperComputing 2007.
- Ioan Raicu, Catalin Dumitrescu, Ian Foster. “Dynamic Resource Provisioning in Grid Environments”, TeraGrid Conference 2007.
- Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Ioan Raicu, Tiberiu Stef-Praun, Mike Wilde. “Swift: Fast, Reliable, Loosely Coupled Parallel Computation”, IEEE Workshop on Scientific Workflows 2007.
- Alex Szalay, Julian Bunn, Jim Gray, Ian Foster, Ioan Raicu. “The Importance of Data Locality in Distributed Computing Applications”, NSF Workflow Workshop 2006.
Presentations
- "Harnessing Grid Resources with Data-Centric Task Farms", Notre Dame University, CSE Department, August 20th, 2008.
- "Scientific Workflow Systems for 21st Century, New Bottle or New Wine?", IEEE Workshop on Scientific Workflows 2008, July 2008.
- "Accelerating Large-scale Data Exploration through Data Diffusion", ACM/IEEE International Workshop on Data-Aware Distributed Computing 2008, June 2008.
- "Accelerating Large-Scale Data Exploration through Data Diffusion", DSLW 2008, May 22nd, 2008.
- "Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers", GlobusWorld 2008, May 15th, 2008.
- "Harnessing Grid Resources with Data-Centric Task Farms", NASA Ames Research Center, May 14th, 2008.
- "Harnessing Grid Resources with Data-Centric Task Farms", Hyde Park Global Investments LLC, April 18th, 2008.
- "Harnessing Grid Resources with Data-Centric Task Farms", University of Chicago, Department of Computer Science, Dissertation Proposal, December 12th, 2007.
- "Falkon: a Fast and Light-weight tasK executiON framework for Grid Environments", IEEE/ACM SuperComputing 2007, November 15th, 2007.
- "Accelerating Large Scale Scientific Exploration with Falkon", IEEE/ACM SuperComputing 2007, Argonne National Laboratory Booth, November 14th, 2007.
- "A Data Diffusion Approach to Large Scale Scientific Exploration", University of Chicago, CS Department, DSL Seminar, October 24th, 2007.
- "A Data Diffusion Approach to Large Scale Scientific Exploration", 2007 Microsoft eScience Workshop at RENCI, October 21st, 2007.
- "Falkon: a Fast and Light-weight tasK executiON framework for Grid Environments", DSL Workshop 2007, University of Chicago, April 30th, 2007.
- "Towards Urgency Solutions in the Globus Toolkit", April 27th, 2007.
Other Documents
- "Falkon Brochure", November 2007.
- Ioan Raicu, Ian Foster, "Harnessing Grid Resources with Data-Centric Task Farms", Technical Report, Department of Computer Science, University of Chicago, November 2007.
- Ioan Raicu, Ian Foster. "Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets", NASA GSRP Proposals and Progress Reports, Ames Research Center, NASA, 2006 - 2008.
Mailing Lists
| Developer discussion (falkon-dev) | archive/subscribe/unsubscribe |
| User discussion (falkon-user) | archive/subscribe/unsubscribe |
| Announcements (falkon-announce) | archive/subscribe/unsubscribe |
| Commit notifications (falkon-commit) | archive/subscribe/unsubscribe |
How to subscribe
How to unsubscribe
Search the email archives
Bug Reporting
FAQ
Source Code
Code
To download the entire Falkon source tree, type:
svn co https://svn.globus.org/repos/falkon
For the latest snapshot (as well as past versions) of the code as a source tree archive, please see below:
Prerequisites
- Java 1.4+
- SVN: only needed for source control
- Apache Ant: only needed to compile
- Ploticus: included; only needed to generate graphs
- Globus Toolkit 4: included; needed to compile and run
- gcc or g++: only needed for a specialized C-based component that was built to run on the IBM BlueGene
Instructions
Committers, Contributors, and Sponsors
Committers
If you would like to become a committer, guidelines are here:
- Ioan Raicu, The University of Chicago, Computer Science Department
- Yong Zhao, Microsoft
- Zhao Zhang, The University of Chicago, Computation Institute
Contributors
The Falkon project gratefully acknowledges the following contributions:
- Alex Szalay, Johns Hopkins University
- Catalin Dumitrescu, Fermi National Labs
- Ian Foster, Argonne National Laboratory (Math and Computer Science Div.) & The University of Chicago (Computer Science Department)
- Mike Wilde, University of Chicago (Computation Institute) & Argonne National Laboratory
- Ben Clifford, The University of Chicago, Computation Institute
- Mihael Hategan, The University of Chicago, Computation Institute
Sponsors
The Falkon project gratefully acknowledges the following sponsors:
- NASA Ames Research Center GSRP Grant Number NNA06CB89H
- U.S. Dept. of Energy, Office of Advanced Scientific Computing Research, Office of Science, Mathematical, Information, and Computational Sciences Division Contract DE-AC02-06CH11357
Miscellaneous
Status
Newly accepted Incubator Project 11/06/2007, as defined by the Incubator Process Guidelines found at http://dev.globus.org/wiki/Incubator/Incubator_Process .
Roadmap
We plan to release official incremental code snapshots every few months. The current version is v0.9, and we plan to release v1.0 in June 2008. The transition from 0.9 to 1.0 will mostly address ease of installation and use, reliability, and robustness; v1.0 will also include basic data diffusion support. Check back soon for more details on the release schedule.
Policies
The Falkon project adheres to the following guidelines: Globus Alliance Project Guidelines.
