Using gRAVI Services in Taverna

From Globus

Prerequisites

  • To follow this tutorial, you will need a service created by gRAVI. Consult the gRAVI Users Guide for instructions on creating and deploying one.
  • A basic understanding of Taverna will be helpful. See the Taverna User's Guide for help.

You will also need:

Building a Workflow

For this guide, we will use a basic gRAVI service called HelloWorld, created with Introduce and deployed on a Globus WS-Core container.

Start Taverna

  • Open Taverna; it starts with a blank workflow.

Image:Taverna_Main.png

Add WSDL Scavengers

In order for Taverna to find the service, we must add two WSDL scavengers: one for the service itself and one for its result resource. In our container, we point Taverna to:

When entering the services, be sure to put ?wsdl after the address of the service; see the images below.

Image:Wsdl_1.png Image:Wsdl_2.png

Add an Input Field

  • Right click on "Workflow Inputs" under the Advanced Model Explorer.
  • Add a new input and name it.

Image:Add_input.png

Add a Service Method

  • Add the blocking version of your service to Taverna by right clicking it and choosing "add to model".

The blocking version is used so that the service fully completes before the support services are called.

Image:Add_method.png

Show Ports

  • Click on Configure Diagram and select Show All Ports to see more details.
  • Now select Top-to-Bottom for easier viewing.

Image:All_ports.png

Add an XML Splitter

Services can take complex inputs that may need to be split into their individual components; use an XML splitter for this.

  • Add an XML splitter for the input parameters for your blocking service.
    • Right click on the input parameters and select "add XML splitter."
  • Make the workflow input feed into the new XML splitter.
    • Right click on it in the Advanced Model Explorer and select the "arguments" field of the XML splitter.

Image:Add_xml_splitter.png Image:Connect_input.png

  • Add another XML splitter for the output of the blocking service, again by right clicking on it in the advanced model explorer.
  • Choose a meaningful name for your XML splitter.

Image:Add_xml_2.png

  • Rename your first XML splitter to something meaningful.

Image:Rename.png

Add Support Methods

At this point, our workflow will run, but we also want to retrieve its output and check that it ran without any problems.

  • Add getStatus, getFiles, and getFile from the result resource of the service.

Image:Add_methods.png

  • Add input XML splitters for these methods.
  • Add an additional XML splitter for one of the resReferences, so that its input matches the output of the serviceOut XML splitter.

Image:Add_splitters.png

  • Connect the resReferences together by right clicking on an output and selecting the designated input.

Image:Connect_references.png

  • getFiles produces a list of files that getFile will need; connect them.

Image:Connect_files.png

Add Outputs

  • Add the following outputs in the same way the input was added, by right clicking on "Workflow Outputs" under the Advanced Model Explorer.
    • Status (the status of the service)
    • Files (the files the service produced)
    • resultReference (the reference used to refer to the service)
    • fileContentsBase64 (the encoded output file)
    • decodedFileContents (the contents of the output files)

Image:Add_outputs.png

  • Connect the outputs to the correct XML splitters as shown.

Image:Connect_output.png Image:Outputs_connected.png

Decode Output

The output of the service needs to be decoded from Base64. Luckily, Taverna offers utility methods for this.

  • Find and add the decode64_bit module to the workflow.

Image:Add_decoder.png

  • Find and add the byte[] to string method to your workflow twice.

Image:Add_string_converter.png Image:Parsers_ready.png

  • Sandwich the decoder between the 2 string converters.

Image:Connect_parsers.png

  • Insert this chain between getFile and the decodedFileContents output.

Image:Complete.png
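As a rough illustration of what the decoding chain does to the output of getFile, here is a minimal Java sketch. It is not Taverna's own code: the class name DecodeFileContents is made up for this example, the UTF-8 charset is an assumption (our service output is plain text), and java.util.Base64 requires Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DecodeFileContents {
    // Turns a Base64 string (like the fileContentsBase64 output) into text.
    // Assumption: the decoded bytes are UTF-8 text.
    static String decode(String fileContentsBase64) {
        byte[] raw = Base64.getDecoder().decode(fileContentsBase64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Encode a sample string first so the example is self-contained.
        String encoded = Base64.getEncoder()
                .encodeToString("Hello World".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

If the decoded output looks garbled, the file is probably binary rather than text, and only the Base64 step applies.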

Your workflow is now complete!

Run The Workflow

Now it's time to run the workflow.

  • Go to the menu of Taverna and select file->run this workflow

Image:Run.png

  • Our service wraps /bin/echo, so it should echo back whatever input we give it.
  • If multiple inputs are given, the service will be run multiple times, once for each input.

Image:Declare_input.png

  • The service produces "Hello World" as expected.

Image:Success.png

Advanced Biology Workflow

Now that we've made a basic "Hello World" workflow in Taverna, we can move on to something more advanced. This workflow will use four different services to analyze DNA data:

  • findTransposons
  • formatdb
  • blast
  • createReport

findTransposons is only run once on the input data, but the other three services are run in a loop for all NCBI prefixes, until one produces the desired output.

Image:Bio_workflow.png

Making the Services

  • The services are binaries and Perl files on the server computer.
  • Java 1.5, which gRAVI uses, launches programs directly rather than through a shell, so it doesn't support redirects such as "binary input > output".
  • So we wrapped each service program in a shell script that the service calls.
    • The service calls the shell script.
    • The shell script calls the service program and handles any redirects.
      • "binary input > output" becomes
      • "script.sh input output", where the content of the shell script is something like
      • "binary $1 > $2"
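To see why the wrapper is needed, the sketch below builds both command lines as argument lists, which is the form in which Java passes a command to the operating system (for example via new ProcessBuilder(list)). The class and method names are hypothetical, and "binary" and "script.sh" are the placeholders from the list above.

```java
import java.util.Arrays;
import java.util.List;

final class WrapperCommand {
    // Without a shell, ">" is not a redirect. It reaches the binary as an
    // ordinary third argument, and no output file is written.
    static List<String> naive(String binary, String input, String output) {
        return Arrays.asList(binary, input, ">", output);
    }

    // With the wrapper, input and output are plain arguments ($1 and $2),
    // and the redirect happens inside the shell script.
    static List<String> wrapped(String script, String input, String output) {
        return Arrays.asList(script, input, output);
    }

    public static void main(String[] args) {
        System.out.println(naive("binary", "input", "output"));
        System.out.println(wrapped("script.sh", "input", "output"));
    }
}
```

The design choice here is to keep all shell syntax out of the Java side, so the service only ever passes plain file names.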

Staging the Input Data

  • The workflow takes a working directory as an input.
    • Because paths are built by plain string concatenation, the working directory must end with a '/', as in "/working_dir/".
  • Inside the working directory, you need to have an input and an ncbi folder
    • The input folder should have the sequence files that are read into findTransposons
    • The ncbi folder should have all the NCBI files in it.
  • The workflow also takes a list of NCBI prefixes, which should correspond to files in the ncbi folder.
  • All input data must be on the machine where the services are running.
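The trailing '/' matters because of how the paths are put together. The following is a minimal sketch of that concatenation; the class and method names are hypothetical, and only the "input" and "ncbi" folder names come from the list above.

```java
final class StagedPaths {
    // Rejects a working directory that lacks the trailing '/'.
    static String requireTrailingSlash(String workingDir) {
        if (!workingDir.endsWith("/")) {
            throw new IllegalArgumentException(
                    "working directory must end with '/': " + workingDir);
        }
        return workingDir;
    }

    static String inputDir(String workingDir) {
        return requireTrailingSlash(workingDir) + "input/";
    }

    static String ncbiDir(String workingDir) {
        return requireTrailingSlash(workingDir) + "ncbi/";
    }

    public static void main(String[] args) {
        System.out.println(ncbiDir("/working_dir/")); // prints /working_dir/ncbi/
        // Plain concatenation without the slash silently yields a wrong path:
        System.out.println("/working_dir" + "ncbi/"); // prints /working_dirncbi/
    }
}
```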

Nested Workflows

This workflow has nested workflows; see the Taverna User's Guide for details of inserting them. First, we will make the innermost nested workflow, which is the core of the loop. Think of it as the body of a "for" loop, the part that is repeatedly executed.

  • Add the blocking version of the following services to the workflow.
    • formatdb
    • blast
    • createReport
  • Add XML splitters to their input arguments
  • Add a Java beanshell for each one
    • These will automatically pass in the arguments to each service
  • Coordinate each service from its predecessor, so that each one is run only after the previous service completes (this is why we use blocking methods).
  • Add inputs for NCBI and the working directory

Note: We are not using any supporting methods to check the status or get the output files or even to clean up the temporary directories that the services create. This is for two reasons:

  • Taverna doesn't properly handle SOAP headers yet, so it would be overly complex to do all these things.
  • The services send their output files to the working directory already.

Image:Core.png

Bean Shells

In the image above, the services are connected from their inputs to their XML splitters through bean shells. Bean shells are a very powerful tool in Taverna that allow a workflow to execute almost any Java code.

  • Right click on each bean shell and select configure to bring up a dialog box.
  • First add ports to your bean shells. Here's an example of the ports for the beanshell handling the arguments for createReport.
    • When changing the type of a port, for example when making the output a list (since we want a list of arguments), you will need to click on another item to save that change. This is a bug in Taverna. Simply closing the window will not register the change unless you've selected a different field first.

Image:Create_ports.png

  • Now add your code for the bean shells to format the arguments.

Image:Blast_beanshell.png Image:Format_beanshell.png Image:Create_beanshell.png

  • Now close the windows to save your changes.
  • You may want to re-open the configure dialog to make sure your changes are saved, as Taverna can be finicky.
  • Now save your workflow with a descriptive name. We will need it in later steps.
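The actual scripts are the ones shown in the images above. Purely to illustrate the general shape of such an argument-formatting bean shell, here is a hedged Java sketch: the method parameters stand in for input ports and the returned list stands in for the list-typed output port. The argument order and the idea of passing an output prefix are assumptions for illustration, not createReport's real interface; only the "_blast.out" suffix is taken from the output listing later in this guide.

```java
import java.util.ArrayList;
import java.util.List;

final class CreateReportArguments {
    // workingDir and ncbiPrefix play the role of the bean shell's input ports;
    // the returned list plays the role of its list-typed output port.
    static List<String> format(String workingDir, String ncbiPrefix) {
        List<String> arguments = new ArrayList<>();
        // Hypothetical argument order: the blast output to read, then the
        // prefix under which the report files should be written.
        arguments.add(workingDir + ncbiPrefix + "_blast.out");
        arguments.add(workingDir + ncbiPrefix);
        return arguments;
    }

    public static void main(String[] args) {
        System.out.println(format("/working_dir/", "NC_005945"));
    }
}
```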

Looping

The workflow centers around a loop. Taverna doesn't really have loop support: general "while" loops are currently not possible, but it is possible to hack a "for" loop together using Taverna's built-in iteration. If an output that is a list is connected to an input that takes a single, non-list value, then Taverna will run the service once for each element of the list. In our case, the NCBI files are a list, and we want to iterate through them until a condition is met.
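The iteration behavior described above can be pictured with a small Java sketch. This is only a model of the idea, not Taverna code: ImplicitIteration and iterate are hypothetical names, and the lambda stands in for a service that accepts a single value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ImplicitIteration {
    // Runs a single-value "service" once per list element and collects
    // the outputs in the same order.
    static <A, B> List<B> iterate(List<A> inputs, Function<A, B> service) {
        List<B> outputs = new ArrayList<>();
        for (A input : inputs) {
            outputs.add(service.apply(input));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("NC_005945", "NC_007322");
        System.out.println(iterate(prefixes, p -> "ran loop body for " + p));
    }
}
```

Note that nothing in this model can stop early, which is why the break condition has to be emulated separately.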

Think of this workflow as the loop control, with the nested workflow as the loop body.

  • First, add the workflow you just created and saved from above as a nested workflow.
  • Now add and connect inputs for the NCBI file and the working directory.
  • Add an output, which will be used for error detection.
  • Add two Java bean shells
  • Find and add the Taverna module "Fail_if_True"
    • This is a special service that will fail (and thus prevent any services below it from running) if it is given the string "true" as an input.
  • Coordinate the nested workflow from "Fail_if_True".
  • Also coordinate the second bean shell to run after the nested workflow.
    • You'll notice that Taverna pops up a seemingly random box labeled "Nested_WorkflowWORKFLOWINTERNALCONTROL". This is a bug in Taverna that should be fixed in later versions, but it doesn't affect how the workflow runs, so don't worry.

Image:Loop.png

  • Now configure the first bean shell.
    • The first bean shell looks for a file called "no_hits.false", which is created when createReport produces an empty no_blast_hits file. This is the break condition for the loop.
    • Unfortunately, Taverna doesn't support breaks, so instead we send "true" to "Fail_if_True", which then fails, so the nested workflow isn't run. The iteration still continues until all the NCBI prefixes are exhausted, failing each time.

Image:Empty_beanshell.png

  • Don't forget your ports.

Image:Empty_ports.png
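The real script is the one in the image above. As a minimal sketch of the check it performs, the Java below returns the string that would be sent on to "Fail_if_True". The class and method names are hypothetical, and placing "no_hits.false" directly in the working directory is an assumption.

```java
import java.io.File;

final class BreakCheck {
    // Returns "true" once the marker file exists, which makes Fail_if_True
    // fail and skips the nested workflow. Otherwise returns "false".
    // Assumption: the marker lives directly in the working directory.
    static String breakFlag(String workingDir) {
        File marker = new File(workingDir + "no_hits.false");
        return marker.exists() ? "true" : "false";
    }

    public static void main(String[] args) {
        String workingDir = args.length > 0 ? args[0] : "./";
        System.out.println(breakFlag(workingDir));
    }
}
```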

  • The second bean shell looks at the size of the no_blast_hits file created by create_report.
  • If the file doesn't exist, it gives an output of "error".
  • If it does exist and is of size zero, it sends "true" and creates the "no_hits.false" file to tell the first bean shell to send "true", as the break condition is met when the file is of size zero.
  • If the file exists, but is not empty, it is copied into the loop_in.fasta file to be fed into the loop's next run.

Image:Empty_2_beanshell.png

  • The ports for the second bean shell are the same as the first.
  • Save this workflow. You will need it for the next steps. Don't forget to select "embed nested workflows".
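The three cases handled by the second bean shell can be sketched in Java as follows. Again, the real script is the one in the image above. The names LoopStep and evaluate are hypothetical; the "<prefix>_no_blast_hits.fasta" file name is inferred from the output listing later in this guide, and returning "false" in the non-empty case is an assumption based on the workflow output described in the Run section.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class LoopStep {
    static String evaluate(String workingDir, String ncbiPrefix) throws IOException {
        Path noHits = Paths.get(workingDir + ncbiPrefix + "_no_blast_hits.fasta");
        if (!Files.exists(noHits)) {
            return "error"; // createReport produced nothing to inspect
        }
        if (Files.size(noHits) == 0) {
            // Break condition met: leave the marker for the first bean shell.
            Path marker = Paths.get(workingDir + "no_hits.false");
            if (!Files.exists(marker)) {
                Files.createFile(marker);
            }
            return "true";
        }
        // Sequences remain: feed them into the next pass of the loop.
        Files.copy(noHits, Paths.get(workingDir + "loop_in.fasta"),
                StandardCopyOption.REPLACE_EXISTING);
        return "false";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LoopStep <workingDir/> <ncbiPrefix>");
            return;
        }
        System.out.println(evaluate(args[0], args[1]));
    }
}
```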

The Main Workflow

Working from the inside out, we now want to put together the workflow outside of the loop. This part will run find_transposons and pass on inputs and outputs to/from the loop.

  • Add the blocking version of find_transposons
  • Add an XML splitter for its input
  • Add 3 inputs
    • List of NCBI prefixes
    • Pattern for find_transposons to search for
    • working directory
  • Add two bean shells
    • one for parsing the input of find_transposons (similar to the ones for blast and the other services above)
    • one for copying the output of find_transposons so it can be used in the loop without being overwritten
  • Add an output

Image:Full_workflow.png

  • Configure the first bean shell as before

Image:Trans_beanshell.png Image:Trans_ports.png

  • The second bean shell copies the output of find_transposons (trans.out) to the input of the loop (loop_in.fasta). It only takes the working directory as an input port.

Image:Copy_beanshell.png

  • Coordinate the copy from the find_transposons service so that it runs after the service method completes.
  • Coordinate the loop from the copy.
  • Connect the inputs and outputs as appropriate.
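The copy step is small enough to sketch in a few lines of Java. The class and method names are hypothetical and the real script is the one in the image above; the two file names and the single working-directory input come from the description of the second bean shell.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class CopyTransOut {
    // Copies trans.out to loop_in.fasta inside the working directory, so the
    // loop can overwrite loop_in.fasta without touching the original output.
    static void copy(String workingDir) throws IOException {
        Files.copy(Paths.get(workingDir + "trans.out"),
                Paths.get(workingDir + "loop_in.fasta"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CopyTransOut <workingDir/>");
            return;
        }
        copy(args[0]);
    }
}
```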

Run

Run the workflow as you would run any workflow. Be sure to give it the correct inputs:

  • The working directory (don't forget the '/' at the end!)
  • The list of NCBI prefixes to be iterated through (make sure you have the NCBI files too)
  • The pattern to be searched for

The workflow's output should be "true" if the end condition of the loop is met, "false" if it is not met, or "error" if something went wrong.

Output Data

All the output data should be prefixed with the respective NCBI prefixes. After a run, the working directory may look like this:

visitor123152:working_dir bio_user$ ls -l
total 184
-rw-r--r--   1 bio_user  staff  46127 Jul 24 16:07 NC_005945_blast.out
-rw-r--r--   1 bio_user  staff   1995 Jul 24 16:07 NC_005945_blast_hit_alignments.txt
-rw-r--r--   1 bio_user  staff   1713 Jul 24 16:07 NC_005945_blast_hits_with_genes.xls
-rw-r--r--   1 bio_user  staff    811 Jul 24 16:07 NC_005945_no_blast_hits.fasta
-rw-r--r--   1 bio_user  staff   9955 Jul 24 16:07 NC_007322_blast.out
-rw-r--r--   1 bio_user  staff    792 Jul 24 16:07 NC_007322_blast_hit_alignments.txt
-rw-r--r--   1 bio_user  staff    711 Jul 24 16:07 NC_007322_blast_hits_with_genes.xls
-rw-r--r--   1 bio_user  staff    241 Jul 24 16:07 NC_007322_no_blast_hits.fasta
drwxrwxrwx   9 bio_user  staff    306 Jul 24 15:08 input
-rw-r--r--   1 bio_user  staff    241 Jul 24 16:07 loop_in.fasta
drwxrwxrwx  20 bio_user  staff    680 Jul 24 16:07 ncbi
-rw-r--r--   1 bio_user  staff   1853 Jul 24 16:07 trans.out
visitor123152:working_dir bio_user$ 

Limitations of Taverna

Taverna is still under development, and its interface is likely to improve. But anyone making a workflow in Taverna should be aware of some of its limitations.

  • Taverna does not support SOAP headers.
    • A temporary work-around is to encode header data, such as state, in a special object and pass that object on to services and methods down the line in the workflow.
    • SOAP headers would also allow gRAVI services to be put together in Taverna workflows/pipelines with fewer methods (boxes) for a simpler feel. For example, the Destroy method, which cleans up the temporary folders created by a gRAVI service, uses SOAP headers, so it could be called without the overhead of XML splitters and passed references.
  • Taverna does not have global variables.
    • This means that any variables must be stored on the file system (of a service) or in a service itself.
    • A basic task like counting to 5 is extremely difficult in Taverna.
  • Taverna does not have loops.
    • Taverna does have iteration, which can be hacked to make a "for" loop, as in our example above.
    • But Taverna does not support "while" loops except in very special cases.
      • "while" loops may be achieved by using Taverna's "retry" feature, which will retry a failed service until it succeeds or a count limit is reached.
      • However, a failed service gives no output data, so the loop cannot feed its next iteration.
      • A "while" loop may be achieved if the condition is not dependent on input, such as waiting for a service to do something and polling it for a "success".
  • Taverna cannot open workflows unless all the services are accessible.
    • There is an offline mode, but it doesn't seem to help much.
    • One must edit the XML file manually to reflect changes in IP addresses before opening a workflow in Taverna.
  • Only Taverna can run Taverna workflows/pipelines.
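The retry-based "while" loop mentioned in the list above can be modeled in Java as a polling loop. This is only a sketch of the idea, not Taverna's retry implementation: the names are hypothetical, and a real poll would normally wait between attempts.

```java
import java.util.function.BooleanSupplier;

final class RetryPoll {
    // Calls the check until it succeeds or maxAttempts is reached.
    // Returns the number of attempts used, or -1 if every attempt failed.
    // No data passes from one attempt to the next, which is why the condition
    // must not depend on the previous iteration's output.
    static int pollUntilSuccess(BooleanSupplier check, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for a service that reports success on its third poll.
        System.out.println(pollUntilSuccess(() -> ++calls[0] >= 3, 10)); // prints 3
    }
}
```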