Google Summer of Code 2008 Ideas

Globus has been accepted as a Google Summer of Code 2008 mentoring organization. This page lists our proposed GSoC project ideas. The project ideas are grouped according to the Globus technology areas (common runtime, execution management, etc.). This list of ideas is by no means exclusive, so if you have a cool idea for a Globus-related project, please contact one of the GSoC mentors. There are also additional pages where you may be able to find inspiration for interesting summer projects.

Before submitting an application to GSoC with Globus as your mentoring organization, make sure you read our GSoC FAQ, which provides some pointers on how to write a successful application.

If you submit an application to GSoC with Globus as your mentoring organization, please remember that our code is licensed under the Apache 2 License. This means that, for your code to be included in the Globus Toolkit, you can only reuse existing code that is licensed under the Apache 2 License, or a compatible license (most notably, GPL'd code is ineligible for inclusion in the Globus Toolkit). Additionally, if your code gets committed to the official code repository at the end of the summer, you will be asked to sign an Individual Contributor's License.

Common runtime projects

NAT-friendly service hosting

Globus project: C WS Core

Description: Create a message handler which rewrites network addresses in EPRs returned by the service container with an external IP address. Interact with firewalls to allow connections to the service container's network ports. This will be useful for hosting services or receiving notifications when the client is behind an IP-address-rewriting firewall.

Mentor: Joe Bester
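
The address rewriting described above can be sketched in a few lines. This is only an illustration in Python, not C WS Core code: the function name, the example addresses, and the port map are all assumptions, and a real handler would operate on the EPR XML inside the container's message chain.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def rewrite_epr_address(address: str, external_host: str,
                        port_map: Optional[Dict[int, int]] = None) -> str:
    """Return the EPR address URI with the internal host (and, if mapped,
    the internal port) replaced by the externally visible values."""
    parts = urlsplit(address)
    port = parts.port
    if port_map and port in port_map:
        port = port_map[port]  # port forwarded by the firewall (hypothetical mapping)
    netloc = external_host if port is None else f"{external_host}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    internal = "https://192.168.1.10:8443/wsrf/services/CounterService"
    print(rewrite_epr_address(internal, "203.0.113.7", {8443: 50000}))
```

The path and query are left untouched, so only the network location visible to clients outside the firewall changes.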

Database-backed resource implementation

Globus project: C WS Core

Description: Create a database-backed implementation of the WSRF resource API to allow transparently persistent resources. Add support for reloading resource state when a container is restarted.

Mentor: Joe Bester
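
As a rough illustration of the idea (not the C WSRF resource API), the sketch below persists resource state in SQLite and reloads it after a simulated container restart. The class name, table layout, and JSON serialization are assumptions made for the example.

```python
import json
import sqlite3


class ResourceStore:
    """Hypothetical store that keeps each resource's state as a JSON document."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS resources ("
            "key TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self.db.commit()

    def save(self, key, state):
        # Write through on every update so a crash loses no committed state.
        self.db.execute(
            "INSERT OR REPLACE INTO resources (key, state) VALUES (?, ?)",
            (key, json.dumps(state)),
        )
        self.db.commit()

    def load_all(self):
        # Called once at startup to rebuild the in-memory resources.
        rows = self.db.execute("SELECT key, state FROM resources")
        return {key: json.loads(state) for key, state in rows}


def simulate_restart(path):
    """Save a resource, drop the store, then reload it from disk."""
    first = ResourceStore(path)
    first.save("counter-1", {"value": 41})
    first.save("counter-1", {"value": 42})
    first.db.close()

    second = ResourceStore(path)  # stands in for the restarted container
    state = second.load_all()
    second.db.close()
    return state
```

Writing through on each update keeps the sketch simple; a real implementation would have to weigh that against the cost of a database round trip per property change.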

Automated generation of command-line Web Services clients

Globus project: C WS Core

Description: The globus-wsrf-cgen program generates C language type and service bindings from WSDL and XML schema documents. Add code to generate a client command-line program for each service in a bindings package. Add support for constructing valid input documents via the command line. This would be useful for scripting and testing service implementations.

Mentor: Joe Bester

Data projects

XIO Compression Driver

Globus project: XIO

Description: Multicore systems open the door to data transfer compression drivers that enable "faster than network speed" transfers. GridFTP is a data transport protocol that can break up its transfer payload in such a way that streaming it through multiple cores is possible. With the additional parallel processing power added by multicore systems, it is possible to pipeline compression and packet switching in such a way that seemingly faster than network speed transfers are possible. The main deliverable is a new XIO driver which will use the libz library to compress and decompress data buffers as they pass through it. This driver will then be inserted into existing GridFTP servers for experimentation. An additional goal is to understand what factors in the multicore system and the compression algorithms affect the effective transfer speed and processing load for various types of data (ASCII, graphical data, random byte data sets, etc.). Additionally, the lessons learned would be used to improve the XIO driver development guide.

Mentor: John Bresnahan
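
The buffer-by-buffer compression the driver would perform can be sketched with Python's zlib module, which wraps the same libz library. This is not XIO driver code: the function names are made up for the example, and a real driver would be written in C against the XIO driver interface.

```python
import zlib


def compress_buffers(buffers, level=6):
    """Compress a stream of byte buffers, yielding output as it becomes available."""
    compressor = zlib.compressobj(level)
    for buf in buffers:
        out = compressor.compress(buf)
        if out:  # zlib may hold data back until it has enough input
            yield out
    tail = compressor.flush()
    if tail:
        yield tail


def decompress_buffers(buffers):
    """Inverse of compress_buffers, again one buffer at a time."""
    decompressor = zlib.decompressobj()
    for buf in buffers:
        out = decompressor.decompress(buf)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    payload = [b"GridFTP " * 1000, b"XIO " * 1000]
    compressed = list(compress_buffers(payload))
    sent = sum(len(chunk) for chunk in compressed)
    original = sum(len(chunk) for chunk in payload)
    print(f"{original} bytes in, {sent} bytes on the wire")
```

Highly repetitive input like this shrinks dramatically, whereas random bytes would not, which is the kind of data-dependent behavior the project proposes to measure.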

Integration of GridFTP with FreeLoader storage system

Globus project: GridFTP

Description: GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. It is based on the Internet FTP protocol, and it defines extensions for high-performance operation and security. Striped data transfer (also known as cluster-to-cluster data transfer) is a key feature that utilizes multiple CPUs and NICs to achieve higher performance. In striped mode, however, GridFTP assumes the support of a high-performance parallel file system, a relatively expensive resource. FreeLoader is a storage system that aggregates the idle storage space from workstations connected to a local area network to build a high-performance data store. FreeLoader breaks files into chunks and distributes these chunks across the storage nodes. This accelerates read/write operations, as they can benefit from parallel access to multiple disks. This project aims to integrate GridFTP and FreeLoader to reduce the cost and increase the performance of GridFTP deployments.

Links related to this project

Requires: C network programming experience and an understanding of the GridFTP framework and the FreeLoader storage system.

Mentor: Raj Kettimuthu
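
The chunk distribution described above can be illustrated with a minimal round-robin striping sketch. It makes no claim about how FreeLoader actually places chunks: the placement policy, the function names, and the in-memory "nodes" are assumptions for the example.

```python
def stripe(data: bytes, chunk_size: int, num_nodes: int):
    """Split data into fixed-size chunks and assign them round-robin to nodes.

    Each node receives a list of (chunk_index, payload) pairs, so the chunks
    can be read back in parallel and reordered afterwards.
    """
    nodes = [[] for _ in range(num_nodes)]
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        nodes[index % num_nodes].append((index, data[offset:offset + chunk_size]))
    return nodes


def reassemble(nodes):
    """Merge the chunks from all nodes back into the original byte string."""
    chunks = sorted(
        (chunk for node in nodes for chunk in node),
        key=lambda chunk: chunk[0],
    )
    return b"".join(payload for _, payload in chunks)


if __name__ == "__main__":
    layout = stripe(b"abcdefghij", chunk_size=3, num_nodes=2)
    print(layout)
    print(reassemble(layout))
```

Keeping the chunk index with each payload is what lets a striped reader fetch from every node concurrently without caring about arrival order.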

New CoG / Swift data transfer providers

Globus project: Swift

Description: The Java CoG kit provides an abstraction for file transfer (for example, local execution, local filesystem copy, over ssh/scp, GridFTP, GRAM2, GRAM4, direct submission to the PBS scheduling system). Providers can then be used by higher-level applications such as Swift in order to move data to execution sites without those applications needing to know the particular details of the mechanism. An interesting project might be to implement a provider for some existing data transfer mechanisms so that they could be used as part of CoG and Swift to distribute and transfer large amounts of data.

Requires: Decent Java programming skills. A favourite data transfer mechanism.

Mentor: Ben Clifford

Non-file-based (e.g., database or spatial volume) Swift datasets

Globus project: Swift

Description: Swift has a concept of 'datasets' which at present are simple in-memory values, files on disk, or collections of files on disk. There is a language construct to specify how a dataset is constructed out of collections of files. When Swift executes a job on a remote system, it transfers the appropriate input datasets to that remote system. When the dataset consists of files, this is accomplished with a file transfer mechanism such as GridFTP. It would be interesting to allow datasets to be specified in ways other than as collections of files; for example, specify that a dataset is a table in a database, and allow partitioning of that dataset into several pieces, with an appropriate data transfer mechanism to get data to the remote location and back (for example, dumping the relevant part of a source SQL database and getting it into a database running on a remote system). Support for datasets consisting of 3D volumes that need to be specified and moved around in interesting ways could also be implemented.
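
To make the database example concrete, here is a small sketch that partitions a table into pieces and loads one piece into a second database standing in for the remote system. It uses SQLite purely for illustration; the table, the column names, the round-robin split, and the function names are all assumptions, and nothing here is part of Swift.

```python
import sqlite3


def partition_rows(conn, table, num_parts):
    """Split the rows of `table` round-robin into `num_parts` lists."""
    # Table names cannot be bound as SQL parameters; this sketch trusts its caller.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()
    parts = [[] for _ in range(num_parts)]
    for index, row in enumerate(rows):
        parts[index % num_parts].append(row)
    return parts


def load_partition(rows, columns):
    """Create a fresh database (the 'remote' side) holding one partition."""
    remote = sqlite3.connect(":memory:")
    remote.execute(f"CREATE TABLE part ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    remote.executemany(f"INSERT INTO part VALUES ({placeholders})", rows)
    remote.commit()
    return remote


def make_source():
    """Build a small example source table (hypothetical data)."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE samples (id INTEGER, reading REAL)")
    source.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [(i, i * 0.5) for i in range(5)],
    )
    return source
```

In a real deployment the rows would travel over a data transfer mechanism between the two steps; here they are simply passed in memory.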

Execution projects

Windows support for Swift

Globus project: Swift

Description: The bulk of the Swift codebase is written in Java and will run on Windows. Some components (such as the execute-time environment) are very Unix-specific. This project would produce appropriate Windows versions of the execute-time code so that Swift can be used on a Windows platform (both in local mode and when submitting to a Windows cluster: initially Condor, but maybe other Windows clustering technology too). The end goal would be to run applications like the R stats package or GNU Octave under Windows through Swift. We can already run those through Swift when submitting to Unix platforms, and we have users who do that; one of the big benefits of this project would be to allow them to make use of the growing number of Windows workstations that are appearing in compute clusters.

Links related to this project

Requires: Working knowledge of Windows programming. The choice of implementation language is not so important, although one that is freely or commonly available is desirable.

Mentor: Ben Clifford

Type checking and inference for Swift

Globus project: Swift

Description: Swift scripts are written in a mostly-functional language called (unsurprisingly) SwiftScript. This language should be strongly typed, but the compiler implementation does not do much at the moment in that respect. There are two potential pieces to this project - someone might want to do both, or just one.

i) implement better compile-time type checking so that more type errors are caught at compile time rather than run time. This piece is desirable in the production/release codebase so would have a high chance of being used but would have a correspondingly high requirement on code quality.

ii) many type declarations can be inferred using some variant of Hindley-Milner type inference (think Haskell...). This piece would be a prototype/proof of concept of doing that type inference - very low chance of finding its way into production codebase, correspondingly lower requirements on code quality, perhaps more interesting for someone interested in programming language research.

Links related to this project

Requires: Decent Java programming skills. An interest in programming languages/compilers.

Mentor: Ben Clifford
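
For readers unfamiliar with part (ii), the sketch below shows unification-based type inference, the core step of Hindley-Milner, on a toy expression language. It omits let-polymorphism, has nothing to do with the actual SwiftScript compiler, and every name in it (the tuple encodings, `infer`, `unify`, the `inc` function) is invented for the example.

```python
import itertools

_counter = itertools.count()

INT = ("con", "int")
STRING = ("con", "string")


def fresh_var():
    """Return a new, unconstrained type variable."""
    return ("var", next(_counter))


def fn(arg, result):
    """Build the function type arg -> result."""
    return ("fn", arg, result)


def resolve(t, subst):
    """Apply the substitution to a type until no bound variables remain."""
    while t[0] == "var" and t in subst:
        t = subst[t]
    if t[0] == "fn":
        return fn(resolve(t[1], subst), resolve(t[2], subst))
    return t


def occurs(var, t, subst):
    """True if `var` appears inside `t` (which would create an infinite type)."""
    t = resolve(t, subst)
    if t == var:
        return True
    return t[0] == "fn" and (occurs(var, t[1], subst) or occurs(var, t[2], subst))


def unify(a, b, subst):
    """Extend `subst` so that types a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if a[0] == "var":
        if occurs(a, b, subst):
            raise TypeError("infinite type")
        subst[a] = b
    elif b[0] == "var":
        unify(b, a, subst)
    elif a[0] == "fn" and b[0] == "fn":
        unify(a[1], b[1], subst)
        unify(a[2], b[2], subst)
    else:
        raise TypeError(f"cannot unify {a} with {b}")


def infer(expr, env, subst):
    """Infer the type of a toy expression encoded as nested tuples."""
    kind = expr[0]
    if kind == "int":
        return INT
    if kind == "str":
        return STRING
    if kind == "name":
        return env[expr[1]]
    if kind == "lambda":
        _, param, body = expr
        param_type = fresh_var()  # the parameter type is left for unification to find
        return fn(param_type, infer(body, {**env, param: param_type}, subst))
    if kind == "call":
        _, func, arg = expr
        result_type = fresh_var()
        unify(infer(func, env, subst), fn(infer(arg, env, subst), result_type), subst)
        return result_type
    raise ValueError(f"unknown expression kind: {kind}")
```

Given an environment in which `inc` has type int -> int, inferring the type of `("lambda", "x", ("call", ("name", "inc"), ("name", "x")))` and resolving the result yields int -> int without any declaration on `x`, while applying `inc` to a string literal raises a `TypeError` before anything runs.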

New execution and data transfer providers

Globus project: Swift

Description: The Java CoG kit provides an abstraction for process execution and file transfer (for example, local execution, local filesystem copy, over ssh/scp, GridFTP, GRAM2, GRAM4, direct submission to the PBS scheduling system). Execution and transfer providers can then be used by higher level applications such as Swift in order to move data to execution sites and to perform application execution without needing to be particularly aware of how that execution and transfer is happening. An interesting project might be to implement a provider for some existing execution or transfer mechanisms so that they could be used as part of CoG.

Requires: Decent Java programming skills. A favourite execution or data transfer mechanism.

Mentor: Ben Clifford
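
The provider idea can be shown in miniature. The sketch below is written in Python for brevity and does not reproduce the Java CoG kit's interfaces: the class names, the registry, and `stage_file` are hypothetical, and only a local-copy provider is implemented.

```python
import shutil
from abc import ABC, abstractmethod


class TransferProvider(ABC):
    """Hypothetical interface that every transfer mechanism implements."""

    @abstractmethod
    def transfer(self, source: str, destination: str) -> None:
        """Copy the file at `source` to `destination`."""


class LocalCopyProvider(TransferProvider):
    """Simplest possible provider: a copy on the local filesystem."""

    def transfer(self, source: str, destination: str) -> None:
        shutil.copyfile(source, destination)


# Registry mapping a provider name to an implementation. A new mechanism is
# added by registering another TransferProvider subclass here.
PROVIDERS = {"local": LocalCopyProvider()}


def stage_file(provider_name: str, source: str, destination: str) -> None:
    """Higher-level code picks a provider by name and never sees its details."""
    PROVIDERS[provider_name].transfer(source, destination)
```

The caller of `stage_file` depends only on the abstract interface, which is the property that lets a new mechanism be plugged in without changing the application.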

KVM Backend to the Workspace Service

Globus project: Workspace Service

Description: The workspace service enables authorized clients to create "workspaces" -- environments needed by the client to do their work in the Grid. Currently the workspace service implements such workspaces using Xen virtual machines. The purpose of this project is to provide an alternative implementation based on KVM. The project would include designing and developing a virtual machine adapter and providing its implementation using KVM.

Requires: Decent Python, sh, and possibly Java programming skills. Linux administration and security experience. No WS/WSRF experience is necessary. Previous experience with virtualization and in particular KVM.

Mentor: Tim Freeman

Contextualized Virtual Cluster Library

Globus project: Workspace Service

Description: The workspace service enables authorized clients to create "workspaces" -- environments needed by the client to do their work in the Grid. Currently the workspace service implements such workspaces using Xen virtual machines. One of the next releases will include contextualization technology that can launch virtual machines in such a way that they are brought up at the same time with a secure context and mechanisms for lightweight, secure group communications. This context and the mechanisms are used to securely bootstrap virtual cluster configurations. Several virtual cluster samples are already in working form.

The project idea is to develop many such virtual clusters so that there are groups of template VMs that work together to accomplish what the end user wants with zero or minimal configuration. An interesting idea would be to develop a high level system/library (including documentation) where the groups of VMs are familiar in some way as one makes their way from using one cluster to the next, leveraging the user's previous learning.

Requires: Primarily Linux administration and security experience. Decent Python, sh, and possibly Java programming skills. Documentation expertise seems appropriate. No WS/WSRF experience is necessary. Previous experience with virtualization.

Mentor: Tim Freeman

Information projects

AJAX frontend to the MDS Index service

Globus project: MDS4

Description: AJAX provides seemingly real-time updates of data exposed by Web Services by issuing XMLHttpRequest calls in the background. This project would produce an AJAX frontend to data stored in the MDS Index service. This can be accomplished with a wide variety of free and open source tools, including those provided by Google.

Requires: Some knowledge of AJAX required. Some knowledge of WSRF and MDS4 interfaces is preferable, but can be learned over the summer.

Mentor: Ravi Madduri

Security projects

SAML Holder-of-Key Authentication

Globus project: GridShib

Description: SAML Web Browser SSO involves an authentication request from an HTTP user agent to a SAML identity provider. After identifying the user, the identity provider issues a signed SAML authentication assertion, which is returned to the HTTP user agent. The user, again acting through the HTTP user agent, presents the assertion to a SAML service provider in lieu of local authentication.

In accordance with the standard SAML Web Browser SSO Profile, an authentication assertion is a so-called bearer assertion, that is, the consumer of the assertion (namely, the service provider) assumes the subject of the assertion is the bearer of the assertion. A bearer assertion is a weak authentication token in that theft or misuse of the token enables an attacker to impersonate the subject.

A holder-of-key assertion, on the other hand, is a strong authentication token containing a public key. A service provider accepts a holder-of-key assertion if the presenter can prove possession of the corresponding private key. Thus holder-of-key assertions virtually eliminate the impersonation threat associated with bearer assertions.

The project goal is to implement an HTTP user agent that issues a <saml2:AuthnRequest> and requests a holder-of-key authentication assertion from a SAML V2.0 identity provider. Successful completion of the project will include a demo that displays the resulting holder-of-key assertion.

Use OpenSAML 2.0 to implement the client and Shibboleth 2.0 to implement an authentication request handler that issues holder-of-key assertions. The SAML exchange must conform to the OASIS SAML Holder-of-key Web Browser SSO Profile.

An interesting challenge is that the Holder-of-key Web Browser SSO Profile assumes the SAML requester is the service provider whereas we're interested in the case where the requester is the subject (a self-request, if you will). A project proposal that takes this into account will be given special consideration, and may in fact lead to an alternative specification being submitted to OASIS.

Links related to this project

Requires: A solid understanding of HTTP is required. Previous experience with XML is desirable; knowledge of SAML is a plus. Since both OpenSAML and the Shibboleth identity provider are implemented in Java, moderate to strong programming skill in that language is required.

Mentor: Tom Scavo

Linking Federated Identities

Globus project: GridShib

Description: Federated identity is the sharing of identity (and other security information) across security domains. A motivation for federated identity is the desire to minimize the number of credentials that the user and the relying party must manage.

To federate identity, the user, acting through an HTTP user agent, presents a SAML authentication assertion to a SAML service provider in lieu of local authentication. The authentication assertion contains an identifier for the user. We assume the identifier is persistent so that the service provider can correlate this particular access to a previous access by the same user.

The goal of this project is to extend a SAML service provider implementation so that the user's identity is persisted beyond the current session. Moreover, the identity in the authentication assertion should be mapped to a local identity for consistency and convenience. (The process of creating this mapping is sometimes called account linking.) The user will be known by this local identity throughout the service provider's security domain.

More generally, the task is to implement a Person Manager at the service provider. The Person Manager module links the identity in the authentication assertion to a local identity and persists this local identity in a Person Registry.

As far as we know, there is no open source implementation of such a Person Manager, and it's not entirely clear how this would be done in general. In that sense, this project is more open-ended than other projects listed here. That said, one practical approach to this problem might be to shib-enable a grid portal (such as GridSphere) and then figure out how to implement account linking in the grid portal. Having done this, a general solution to the account linking problem may be determined.

Links related to this project

Requires: A solid understanding of HTTP is required. A working knowledge of XML is desirable; familiarity with SAML and/or Shibboleth is a plus. Database programming experience is essential; JDBC programming experience is preferred.

Mentor: Tom Scavo
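
One way to picture the Person Registry is as a table that maps a federated identifier to a local identity. The sketch below uses SQLite from Python only to show the shape of the data; the schema, class name, and example identifiers are assumptions, and it does not address the harder question of how a link is established and verified.

```python
import sqlite3


class PersonRegistry:
    """Hypothetical registry linking (identity provider, name identifier) to a local identity."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS links ("
            "idp_entity_id TEXT NOT NULL, "
            "name_id TEXT NOT NULL, "
            "local_id TEXT NOT NULL, "
            "PRIMARY KEY (idp_entity_id, name_id))"
        )
        self.db.commit()

    def link(self, idp_entity_id, name_id, local_id):
        """Record (or update) the local identity for a federated identifier."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO links VALUES (?, ?, ?)",
                (idp_entity_id, name_id, local_id),
            )

    def lookup(self, idp_entity_id, name_id):
        """Return the linked local identity, or None on first contact."""
        row = self.db.execute(
            "SELECT local_id FROM links WHERE idp_entity_id = ? AND name_id = ?",
            (idp_entity_id, name_id),
        ).fetchone()
        return row[0] if row else None
```

The key includes the identity provider because a persistent identifier is only meaningful relative to the provider that issued it.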

Support for PKCS#12

Globus project: C Security and CoG JGlobus

Description: Currently, the Globus GSI layer relies on the PEM credential format. Support in the lowest layer for the PKCS#12 credential store format, combined with auto-detection of which format is used in a file and an environment variable that determines which format to write (PEM by default), would allow for easier integration with existing PKI environments and commercial CAs.

Mentor: Rachana Ananthakrishnan
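
Format auto-detection could start from a heuristic like the one sketched below. It relies on two facts: PEM files contain "-----BEGIN" armor lines, and a PKCS#12 file is an ASN.1 SEQUENCE whose first element is the INTEGER version 3. The function names and the environment variable `CREDENTIAL_OUTPUT_FORMAT` are hypothetical, and a real implementation in GSI would parse the structure fully instead of peeking at a few bytes.

```python
import os


def looks_like_pkcs12(data: bytes) -> bool:
    """Heuristic: outer SEQUENCE tag, then a length, then INTEGER 3 (the PFX version)."""
    if len(data) < 2 or data[0] != 0x30:
        return False
    length_byte = data[1]
    # Short form and indefinite form use one length byte; the long form
    # uses one byte plus (length_byte & 0x7F) further bytes.
    header_len = 2 if length_byte <= 0x80 else 2 + (length_byte & 0x7F)
    return data[header_len:header_len + 3] == b"\x02\x01\x03"


def detect_credential_format(data: bytes) -> str:
    """Return "PEM", "PKCS12", or "unknown" for the given file contents."""
    if b"-----BEGIN " in data:
        return "PEM"
    if looks_like_pkcs12(data):
        return "PKCS12"
    return "unknown"


def output_format(environ=os.environ) -> str:
    """Read the (hypothetical) environment variable, falling back to PEM."""
    value = environ.get("CREDENTIAL_OUTPUT_FORMAT", "PEM").upper()
    return value if value in ("PEM", "PKCS12") else "PEM"
```

A DER-encoded certificate also begins with a SEQUENCE tag, which is why the sketch checks the version field and does not stop at the first byte.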

SAML Attribute Query for X.509 Subjects

Globus project: GridShib

Description: SAML Attribute Query involves a back-channel exchange between a SAML requester and a SAML identity provider (or attribute authority). After mapping the subject of the query to a principal in its security domain, the identity provider issues a SAML attribute assertion. The requester uses the attributes in the assertion for the purposes of access control.

The project goal is to implement a SAML requester that issues a <saml2:AttributeQuery> and requests an attribute assertion from a SAML V2.0 identity provider. Successful completion of the project will include a demo that displays the resulting attribute assertion.

Use OpenSAML 2.0 to implement the client and Shibboleth 2.0 to implement an attribute request handler that issues attribute assertions. The SAML exchange must conform to the OASIS SAML Attribute Query Deployment Profile for X.509 Subjects.

Numerous research groups have successfully implemented the SAML Attribute Self-Query for X.509 Subjects (that is, the special case where the requester is the subject), but the case where the requester is acting on behalf of the subject has not been adequately addressed. In particular, in the use case scenario outlined in the SAML Attribute Query Deployment Profile for X.509 Subjects, how does the requester convince the identity provider that the subject initiated the request and is present? This is an interesting, unsolved problem.

Links related to this project

Requires: A solid understanding of HTTP is required. Previous experience with XML is desirable; knowledge of SAML and SOAP is a plus. Since both OpenSAML and the Shibboleth identity provider are implemented in Java, moderate to strong programming skill in that language is required.

Mentor: Tom Scavo

OpenVPN/GSI integration

Globus project: C Security and Virtual Workspaces

Description: OpenVPN is a proven networking tool that can already be used with X.509 certificates. It is useful for many grid computing configurations, virtual workspaces among them. The project would be to investigate and implement changes such that OpenVPN could support a) multiple CAs and b) proxy certificate extensions (RFC 3820).

Mentor: Tim Freeman

Distribution projects

Integrate Globus with standard Linux distributions

Globus project: Globus Toolkit

Description: Currently, the Globus Toolkit cannot be installed just by running "apt-get install globustoolkit" or "rpm gt4.0.5-full" with all configuration issues being handled automagically by a package manager. The goal of this project would be to package the toolkit in a Linux package format (RPM, DEB, ...).

Mentor: No mentor assigned yet. Please contact the Globus GSoC administrators if you have any questions.

"Blue sky" projects

The following are "blue sky" project ideas that some of our mentors have come up with. They are not as detailed as the above project proposals, and some of them might not even be feasible during a single summer. However, they could end up being the seed from which a really cool project springs. If any of these ideas seem interesting, don't hesitate to contact the corresponding mentor to discuss the idea further.

  • Write Windows client libraries for all the Globus Toolkit services. Ravi Madduri
  • MSN Passport integration with X.509. Ravi Madduri
  • Create cool AJAX web interfaces for Globus Toolkit services. Ravi Madduri
  • MiniGrid: Use virtual machines (VMs) to create a simple, didactic, self-contained Grid for educational purposes. Borja Sotomayor
  • Write a VM-aware cluster scheduler from scratch (instead of extending an existing job scheduler, which has already been done). Borja Sotomayor
  • Do something involving Grid Computing and OLPC. Borja Sotomayor
  • Integrate AccessGrid (www.accessgrid.org) with the Globus Project. Ravi Madduri
  • Grid job submission and management from a handheld device (such as an iPhone or BlackBerry). Ravi Madduri
  • Create OS X and Yahoo Widgets for Grid Service clients. Ravi Madduri

Other sources of project ideas

The above list of project ideas is by no means exclusive. You may find inspiration for other cool ideas in the following places:

  • Our mentors. Feel free to contact any mentor whose field of interest matches your own. If you are unsure of who to contact, or no mentor seems like a good match, please contact Globus GSoC administrators, and they will put you in touch with the right person.
  • Bugzilla enhancement requests. Our bugzilla server includes a lot of enhancement requests that users and developers have come up with. Some of them are large enough to be restated as a summer project.
  • Development campaigns. The Globus Alliance has several active development campaigns, which are listed in our Bugzilla server. Some of these could be restated as a project, or can provide inspiration for projects that are complementary to a campaign.
  • Project Ideas page. Some of the ideas in this page are taken from an existing page where Globus developers list ideas for interesting projects. In this GSoC page we have only listed those ideas that would make sense as a self-contained summer project, but the other ideas listed in Project Ideas may provide inspiration for additional projects.

Mentors

Our GSoC mentors (and their areas of expertise) are:

If you have an idea for a project, but none of the above mentors seem like a good match, please contact the Globus GSoC administrators and they will try to match you to an adequate mentor.

Instructions for adding a new idea

The Google Summer of Code (GSoC) application requires that we include a list of project ideas. To quote their instructions:

An "Ideas" list should be a list of suggested student projects. This list is meant
to introduce contributors to your project's needs and to provide inspiration to 
would-be student applicants. It is useful to classify each idea as specifically 
as possible, e.g. "must know Python" or "easier project; good for a student with 
more limited experience with C++." [...] Keep in mind that your Ideas list should 
be a starting point for student applications; we've heard from past mentoring 
organization participants that some of their best student projects are those that 
greatly expanded on a proposed idea or were blue-sky proposals not mentioned 
on the Ideas list at all.

If you have an idea for a GSoC student project, please add it to the "Ideas" list above, following the format used by the currently listed ideas. Please include the following information:

  • Globus Project: What project (in the dev.globus sense of the word) does this idea relate to?
  • Description: Include a 1-2 paragraph description of what has to be accomplished in this project. You do not need to completely specify the project, just give prospective students a good idea of what work is required (is it mainly development? will it involve a lot of independent research? is it easy or hard? etc.). Also, note that ideas don't necessarily have to be concrete tasks ("Add support for protocol FOO in component BAR") but can also be "blue-sky" ideas (e.g., "GridFTP is not currently capable of dealing with the latencies involved in transferring large files to Mars. Solve this."). In fact, Google encourages that we include a couple of these since they usually lead to the most interesting projects.
  • Links (Optional): Websites or papers related to this project. For example, if you want a student to implement an idea you proposed in a paper, include a link to that paper.
  • Requirements: What specific skills are required to do this project (languages, knowledge of protocols, whether students should already be familiar with GT4 or whether on-the-job training is OK, etc.)?
  • Mentor: Each project must have a mentor. The mentor is in charge of supervising students, tracking their progress, answering questions about the project, etc. If you would like to be the mentor for this project, please include your name and e-mail address here. If not, please leave this field blank, and we will assign a mentor from the mentor pool.

If you need additional inspiration on how to write up your idea, take a look at the GSoC 2007 list of ideas from other projects:
