Monday, February 24, 2014

Progress Reflections: Feature Addition to Galaxy

To have something to actually talk about in this post, some work had to get done first. We had already picked out the feature we wanted to add to Galaxy: the ability to transpose data.

For a quick summary, here is the link to the pull request: https://bitbucket.org/galaxy/galaxy-central/pull-request/335/added-transpose-tool/diff

So let's go over the files mentioned in the diff of the pull request and see what is actually going on:

tools/filters/transpose.py
Now, the file itself carries a fair amount of bulk. Most of it is there to follow Galaxy's conventions for error handling and function calling: they prefer that you define a main function and call it from an 'if __name__ == "__main__"' check.

The code on the left is what actually transposes the data. The files are assumed to be tab-delimited (we could even fix this "bug" later by improving the tool). The user need only supply a file through Galaxy's interface, and the tool can be run on the data.
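
For readers who cannot see the original script, here is a minimal sketch of what a tool like this could look like. It is not the code from the pull request: the function names (transpose_rows, read_table, write_table) are made up for illustration, and padding short rows with empty strings is my own assumption about how ragged input might be handled.

```python
import sys
from itertools import zip_longest


def transpose_rows(rows):
    """Turn rows into columns; short rows are padded with empty strings."""
    return [list(column) for column in zip_longest(*rows, fillvalue="")]


def read_table(path):
    # Each line becomes a list of fields, split on tab characters.
    with open(path) as handle:
        return [line.rstrip("\r\n").split("\t") for line in handle]


def write_table(path, rows):
    # Fields are joined back together with tabs, one row per line.
    with open(path, "w") as handle:
        for row in rows:
            handle.write("\t".join(row) + "\n")


def main(input_path, output_path):
    write_table(output_path, transpose_rows(read_table(input_path)))
    return 0


if __name__ == "__main__":
    # Expected call: python transpose.py INPUT OUTPUT
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    sys.stderr.write("usage: transpose.py INPUT OUTPUT\n")
```

Keeping the transposition in its own small function means it can be checked without touching any files, while main only deals with reading and writing.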

Now, you may be wondering "What if someone does not use tab-delimited data?" or "How do people know that the data is supposed to be tab-delimited?"

This is all answered in the XML file:

tools/filters/transpose.xml
Now the entirety of the XML file is important for feature addition in Galaxy.
Line 1 of this file specifies the tool id (just a unique identifier; nothing else in this pull request references it), the name of the tool (the name you want users to recognize it by in Galaxy's interface), and a version for the tool (1.0.0, since it is new). The tag opened on this line is closed on the last line of the file.
Line 2 of the XML calls for a description of the tool. This is appended (with a preceding space) to the tool name in Galaxy's interface to give an extremely brief description of what the tool does and from where. So here we just say you can transpose data from a file (as opposed to data entered manually).
The next few lines (Galaxy does not require any particular number of them) specify the command interpreter and the actual command-line call Galaxy will make. Galaxy supports several scripting interpreters (perl and python are the only ones that come to mind). Since we are using Python here, we pass "python" as the interpreter attribute and then enclose our command. The first argument of the command is the Python file itself. After it come identifiers (marked with $) that refer to parameters defined later in the XML: input and output, which simply reference files.
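
To make the $ placeholders a little more concrete, here is a rough sketch of how a command template could be filled in with file paths. Galaxy does its own templating internally; Python's string.Template is only a stand-in here, and the template text, the build_command helper, and the /tmp paths are all made up for illustration.

```python
import shlex
from string import Template

# Hypothetical command text, modelled on the description above.
COMMAND_TEMPLATE = "transpose.py $input $output"


def build_command(interpreter, template, paths):
    """Fill the $placeholders and prepend the interpreter."""
    filled = Template(template).substitute(paths)
    return [interpreter] + shlex.split(filled)


# Made-up dataset paths standing in for the files Galaxy would supply.
argv = build_command(
    "python",
    COMMAND_TEMPLATE,
    {"input": "/tmp/dataset_1.dat", "output": "/tmp/dataset_2.dat"},
)
print(argv)
```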
So let's talk about those files, since they are defined in the next two sections (inputs and outputs). We have one input. The attributes used here are "format", "name", "type", and "label". The name attribute is what the command-line call refers back to. Format is an optional attribute for type="data" parameters that keeps users from selecting inappropriate datasets. So the tool, as it stands, only works on tabular (tab-delimited) data. Lastly, there is the label attribute: in the GUI representation of the tool, the label precedes the input field for the parameter.
Lastly, Galaxy has a help element for its tool XML files. The first item in this help section is a reference to a tool within Galaxy that can convert data to tab-delimited form. Essentially, this generalizes the tool to any kind of delimited data, since data can be converted to tab-delimited, transposed, and then converted back. While this is tedious, users can create workflows that carry out the whole task for them if they wish. Next in the help section is an example: in case a user is unsure what transposing actually does to their data, a simple before/after illustration shows the transposition of a small piece of data.
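
As a recap of the pieces described above, here is a small sketch that parses a tool definition shaped like that description and pulls out each part. The XML inside the string is a guess reconstructed from the walkthrough, not the file from the pull request: the id, name, label, and help text are placeholders, and summarize_tool is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition reconstructed from the walkthrough above.
TOOL_XML = """\
<tool id="transpose" name="Transpose" version="1.0.0">
  <description>data from a file</description>
  <command interpreter="python">transpose.py $input $output</command>
  <inputs>
    <param format="tabular" name="input" type="data" label="Dataset to transpose"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output"/>
  </outputs>
  <help>Placeholder help text.</help>
</tool>
"""


def summarize_tool(xml_text):
    """Collect the attributes the walkthrough talks about into one dict."""
    root = ET.fromstring(xml_text)
    command = root.find("command")
    description = root.findtext("description").strip()
    return {
        "id": root.get("id"),
        "version": root.get("version"),
        # The description follows the name, separated by a space.
        "display_name": root.get("name") + " " + description,
        "interpreter": command.get("interpreter"),
        "command": command.text.strip(),
        "inputs": [(p.get("name"), p.get("format")) for p in root.iter("param")],
        "outputs": [(d.get("name"), d.get("format")) for d in root.findall("outputs/data")],
    }


summary = summarize_tool(TOOL_XML)
print(summary["display_name"])
```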

tool_conf.xml.main
The next change is to the tool_conf.xml file. At first this was puzzling, because the stable user version of the Galaxy distribution reads tool_conf.xml, but it appears that the developer version of Galaxy appends ".main" to the name. So this is the file the entry was added to. Effectively, all that was done here was to reference the XML file of the transpose tool (which in turn references the Python file that actually gets run) within the text manipulation section. This makes the transpose tool visible and available for use (an image of this can be seen below the tool_conf image).
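
Here is a rough sketch of what that registration amounts to: a tool entry inside a section, which can be read back out by walking the XML. The fragment is an assumption on my part (the section id and the relative file path in particular), and tools_by_section is a made-up helper, not something Galaxy provides.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the relevant part of the tool configuration file.
TOOL_CONF_FRAGMENT = """\
<toolbox>
  <section name="Text Manipulation" id="textutil">
    <tool file="filters/transpose.xml"/>
  </section>
</toolbox>
"""


def tools_by_section(conf_text):
    """Map each section name to the tool XML files listed under it."""
    root = ET.fromstring(conf_text)
    return {
        section.get("name"): [tool.get("file") for tool in section.findall("tool")]
        for section in root.findall("section")
    }


print(tools_by_section(TOOL_CONF_FRAGMENT))
```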

test-data/transpose_in1.tabular
test-data/transpose_out1.tabular
Now here is where we get to those test tags in the XML that you may have noticed I skipped earlier. Galaxy has a built-in facility that mines the tool XML files for functional and unit tests, which is effective for making sure crazy bugs are not introduced between builds and versions of Galaxy. Here, one test is written just to show how transposing the matrix works (it is the same as the example used in the help section of the XML). The first file is the input that the functional test takes. The tool's result is then diffed against the expected output file that has been provided. If they differ, the test fails. If they are exactly the same, the test passes. Simple as that.
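
The pass/fail rule above can be sketched in a few lines. This is only an illustration of the idea, not Galaxy's test framework: the sample data is made up rather than copied from the two test files, and transpose_text and diff_against_expected are hypothetical helpers.

```python
import difflib
from itertools import zip_longest


def transpose_text(text):
    """Transpose tab-delimited text, padding short rows with empty strings."""
    rows = [line.split("\t") for line in text.splitlines()]
    columns = zip_longest(*rows, fillvalue="")
    return "".join("\t".join(column) + "\n" for column in columns)


def diff_against_expected(actual, expected):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


# Made-up stand-ins for the input and expected-output test files.
SAMPLE_INPUT = "1\t2\t3\n4\t5\t6\n"
SAMPLE_EXPECTED = "1\t4\n2\t5\n3\t6\n"

differences = diff_against_expected(transpose_text(SAMPLE_INPUT), SAMPLE_EXPECTED)
test_passed = differences == []
print("PASS" if test_passed else "\n".join(differences))
```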

And that's the story of my second pull request to Galaxy.

Music listened to while blogging: Ellie Goulding
