Tuesday, March 18, 2014

Capstone: Documentation and Database Migration -> Countdown to Open Source

This post will focus primarily on the documentation that we have been adding to Learn2Mine to prepare it for the day we decide to open source it (the light at the end of this tunnel can be seen). I will also talk about some database issues that I had to deal with.

Documentation is something that I talked about in one of my posts for software engineering last week. No one likes doing it, but everyone needs it. Jake and I will be handing Learn2Mine off to some new students in the Anderson Lab at the end of this semester, and we do not want those students to waste time combing through code just to understand it. So we have put together a Google document, which is slowly growing. This document will, hopefully, turn into a README in the future, as a README is vital for any project that wants to succeed as open source.

Right now the document includes details about how to modify and add to the continuously growing components of Learn2Mine (e.g. adding a node/badge to the skill tree). The language used throughout is fairly colloquial and written for the Learn2Mine team, but we will clean it up and make it more formal in the future (most likely when we add installation and development instructions).

Learn2Mine is (as of today) being used in the data mining class here at the College of Charleston. Recently, we reinstalled Galaxy because we had a bunch of issues, and this also reset the database configuration we had set up for Galaxy, which used a Postgres database. During our recent development we had not noticed any difference in Galaxy's performance, even though we were back on Galaxy's default SQLite database. When dozens of students started to work on Galaxy at the same time, however, we ran into concurrency issues. Galaxy was not allowing students to run jobs or submit lessons because too many people were trying to interact with the database at once. This is a common problem with SQLite, which lets only one writer touch the database at a time, and it is why SQLite is not meant to back a server with many simultaneous users.

So I stopped Galaxy from running, effectively taking the site down momentarily. I delved into the universe_wsgi.ini file and pointed the database location to where the Postgres database lives on our virtual machine (the learn2mine-server that hosts Galaxy and RStudio). I then had to run "sh manage_db.sh upgrade", a script that looks at the configured database location and brings the schema in that database up to the version Galaxy expects.
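To illustrate the kind of locking we ran into, here is a minimal sketch in Python using only the standard sqlite3 module. It is not Galaxy's code; the table name `job` and the two "students" are made up for the example. One connection holds a write transaction open, and a second connection that tries to write at the same moment is turned away.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Return the error message the second writer gets, or None if it succeeded."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        # timeout=0 means "do not wait for a lock"; isolation_level=None gives
        # autocommit mode so the transactions below are controlled explicitly.
        writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            # Hypothetical table, only for illustration.
            writer_a.execute(
                "CREATE TABLE job (id INTEGER PRIMARY KEY, owner TEXT)"
            )
            # The first writer takes the write lock and keeps its transaction open.
            writer_a.execute("BEGIN IMMEDIATE")
            writer_a.execute("INSERT INTO job (owner) VALUES ('student_a')")
            try:
                # The second writer needs the same lock and cannot get it.
                writer_b.execute("INSERT INTO job (owner) VALUES ('student_b')")
                return None
            except sqlite3.OperationalError as exc:
                return str(exc)
        finally:
            if writer_a.in_transaction:
                writer_a.execute("ROLLBACK")
            writer_a.close()
            writer_b.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(second_writer_is_blocked())
```

With a real timeout the second writer would wait instead of failing right away, but with dozens of students writing at once the waiting adds up, which matches what we saw. Postgres is built to handle many concurrent writers, which is why pointing Galaxy back at it helped.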

So that took care of the migration and gave us a new Postgres database. Unfortunately, the users that existed in the SQLite database are not in the Postgres database. Dr. Anderson and I sat down and tried to take a dump of the SQLite database and merge it into the Postgres database. After about 20 minutes of trying different approaches, it seemed like way too much work to migrate a database with only a day's worth of information. All in all, for the users that just started today, it is not much hassle to recreate their accounts. This will cause some errors on the backend: the users' RStudio accounts will already exist, so trying to create them again will hit an "account already exists" error. That error will not stop the flow of Galaxy or RStudio, and the users will not see it, so there is no real drawback. It only affects the users that created accounts on Galaxy today, and only the next time they log in.
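For anyone who does need to carry users over in a situation like this, a first step could be reading them out of the old SQLite file. The sketch below is only that first step, and it rests on assumptions: the table name `galaxy_user` and the columns `email` and `username` are my guesses for illustration and should be checked against the actual schema. Loading the rows into Postgres is left out.

```python
import sqlite3


def read_users(sqlite_path, table="galaxy_user", columns=("email", "username")):
    """Return the rows of `table` as a list of dicts keyed by column name.

    The default table and column names are assumptions, not a confirmed schema.
    """
    # Identifiers cannot be passed as SQL parameters, so reject anything odd.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    query = f"SELECT {', '.join(columns)} FROM {table} ORDER BY {columns[0]}"
    conn = sqlite3.connect(sqlite_path)
    try:
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute(query)]
    finally:
        conn.close()
```

Each returned dict could then be inserted into the new database, or simply used as a list of people to email about recreating their accounts.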

Music listened to while blogging: Hopsin
