Tuesday, March 11, 2014

The Doc Is In!

For this post I will be blogging about Chapter 8 of Teaching Open Source and responding to a couple of the exercises.

So documentation... no one is a fan of making it, but everyone should be doing it and doing it well. You never know when you might have to go back to code written years back and perhaps written by someone else. Would you rather read a description of what is happening with that code and see a shorthand example or would you rather have to inject your own testing code to see what is happening? Not sold? Would you rather take about a minute or two to figure out how code works or take hours to figure out what is going on? The key to making code the best it can be is this documentation.
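To make the "description plus shorthand example" idea concrete, here is a minimal sketch in Python. The function `count_by_column` is a made-up helper for illustration and is not taken from Galaxy; the point is only that a short docstring with one example answers most questions before anyone has to run the code.

```python
from collections import Counter


def count_by_column(rows, column):
    r"""Count how often each value appears in one column of tab-separated rows.

    rows   -- iterable of tab-separated strings
    column -- zero-based index of the column to count

    Example:
        >>> count_by_column(["a\t1", "b\t2", "a\t3"], 0)
        {'a': 2, 'b': 1}
    """
    # Strip the trailing newline, split on tabs, and tally the chosen column.
    counts = Counter(line.rstrip("\n").split("\t")[column] for line in rows)
    return dict(counts)
```

Someone reading only the docstring knows what goes in, what comes out, and what a typical call looks like, which is the minute-or-two case rather than the hours case.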

So it is evident that documentation is crucial when you are working with others on a project, or when there is a chance that you will leave the project and someone else will take the lead on your section of it. But what if the project you are working on is done just by you? Well, as mentioned above, you will need to go back and change code at some point. As I have reflected upon in the past, degradation is going to take hold of your project if you are not constantly maintaining and improving it. When you go to maintain and improve this code, it is much nicer to just navigate the comments to figure out what the code is doing, so that you modify only what you intend to.

So there is an exercise in Chapter 8 of Teaching Open Source which is asking me to write thorough comments in all of my source code, make sense of source code through documentation alone, and write at least one wiki page of developer documentation for each program I am working on. Galaxy has a standard for writing documentation for code that is created for it.

Writing comments for all of the Galaxy source code would be impractical, as this is a project that has been built up over a long period of time (since 2005). The code that I have contributed is already marked up with comments (available here, here, and here). Now you may navigate to those pages and see that I did not add any Python comments of my own. That is because the comments already existed for the Group tool and I was merely improving its code. The Galaxy community has effectively perfected the art of documentation: there is enough to grasp what is going on and where, but not so much that it detracts from the code itself. I was able to go directly to the section of the code where I needed to add my changes. There is also example usage documentation within the XML of the tool itself (so when someone is using the tool they can see what is supposed to happen). Couple that markup with the code itself and the code comments, and it is evident what the code is supposed to do.

To demonstrate making sense of source code through documentation alone, I will present a tool in Galaxy which conducts a biological analysis that I am unfamiliar with. I randomly picked a toolset (phenotype_association) and randomly picked a Python file from inside this toolset (senatag.py). The comments for this code are different. Instead of comments spread throughout the code, there is one large block of comments at the beginning of the file which explains every piece of the code. Overall, senatag takes a file with identifiers for SNPs (single nucleotide polymorphisms) and a comma-separated file which has identifiers for different SNPs. Senatag then outputs a set of tag SNPs for the dataset provided (the comma-separated list). The comment markup can be seen below (as well as the breakdown of the step-by-step code):


To contribute to the wiki I was planning on writing a step-by-step tutorial on how to add tools to Galaxy (as I have outlined this in previous posts and it is something I know well enough to write a structured wiki page on). After navigating the "all wiki pages" section of Galaxy, I realized that adding a page would be extremely difficult because the Galaxy community has done just about everything when it comes to wiki documentation. For reference, the add tool page I was referring to can be found here.

The next exercise I am asked to perform is to pick a feature that sounds tantalizing but is not clearly documented. Using Galaxy for this, I realized that I do not have any experience actually tinkering with the graphical user interface (barring simple XML markup). So first I tried navigating through Galaxy's wiki to find information on how to understand or maybe customize the interface (as that documentation would effectively explain how to manipulate all parts of the GUI). Much to my disappointment, I was unable to find any information on this topic, but that does mean I can potentially add this documentation (and it really helps for this exercise). There is a .jar file that looks like it is what assembles graphics on the client side by calling dozens of Python and Bash scripts.

So here I will focus on one very simple part of the interaction with the user interface. If you deploy your own version of Galaxy, you are required to have Python 2.6 or 2.7 installed and on the environment path. Rather than having crazy errors occur because someone does not have the correct version of Python installed, the Galaxy developers have created a simple "check_python.py" which creates a message that gets printed to the user explaining why they cannot run anything.

Additionally, I learned something which has always puzzled me about Python. I had only seen triple quotes used for large block comments. Strictly speaking, though, a triple-quoted block is an ordinary string literal that happens to span several lines; it only behaves like a comment when nothing uses its value. In this file the message that gets printed to the user is written in that block style. So I did some tinkering in my own Python shell and have now learned that you can assign strings in this manner, adding a ton of readability to long strings (rather than dealing with a ridiculously long word-wrapped string).
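Here is a minimal sketch of that pattern, combining a version check with a triple-quoted message. This is not Galaxy's actual check_python.py: the message text, the `MINIMUM` value, and the function signature are all assumptions made for illustration.

```python
import sys

# Hypothetical message text; Galaxy's real wording is different.
# The backslash after the opening quotes keeps the string from
# starting with a blank line.
VERSION_MESSAGE = """\
ERROR: this program needs Python %d.%d or newer.
You are running Python %d.%d.
Install a supported version and put it first on your PATH.
"""

# Assumed minimum for this sketch only.
MINIMUM = (3, 6)


def check_python(version_info=None, minimum=MINIMUM):
    """Return None if the interpreter is new enough, else a message explaining why not."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current >= minimum:
        return None
    return VERSION_MESSAGE % (minimum[0], minimum[1], current[0], current[1])
```

A startup script could call `check_python()`, write any returned message to `sys.stderr`, and exit with a non-zero status. Passing a tuple such as `check_python((2, 7))` lets you see the message without needing an old interpreter.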

Music listened to while blogging: Childish Gambino
