Wednesday, February 19, 2014

Refactoring Mindset

This post reflects on chapter 4 and parts of chapter 5 of Software Development: An Open Source Approach, and then talks about the next feature that Team Rocket will be adding to Galaxy.

Chapter 4 of Software Development is one that I skimmed over pretty quickly. The reason for this is that the chapter is on software architecture. A lot of people reading this do not have software architecture experience, but I have gone through an entire class where we studied software architecture and design. We used various design patterns in order to set up a project. The project can be viewed here.

There are a few topics covered in the chapter that we did not focus on much in that class. One example is the section entitled concurrency, race conditions, and deadlocks. Effectively, these are synchronization issues that can be difficult to debug. Concurrency, meaning several users working with the same software at the same time, is commonly handled in part by using randomly generated keys or sessions to keep track of who is who. A race condition is a very dangerous flaw to have in software. If two users are trying to input data where there is only room for one user's data, then a race condition can occur. The two transactions overlap in time, and the outcome is unpredictable because it depends on the order in which their steps happen to run, which the program does not control. Lastly, there is the concept of a deadlock. A deadlock can be thought of as the opposite of a race condition. With a race condition, both users are able to complete their task (or at least think they completed it), whereas with a deadlock no user is able to complete their task. If two users try to simultaneously access records in a table of a database, neither may be able to proceed, for example because each is holding something the other is waiting on. This usually comes down to a lack of forethought on the programmers' end. Careful locking can prevent these issues. Effectively, a programmer just wants to flag and tell the software "Hey, I'm accessing this, so if anyone else tries, let them know so they don't cause a race condition or lock us both out - then when I've relinquished control, they can jump in."
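To make the locking idea a bit more concrete, here is a minimal sketch in Python. It is not taken from the book or from any project mentioned in this post; the `Record` class, the `claim` method, and the user names are all hypothetical. The lock makes the "is it free? then take it" check a single step, so two users can never both believe they got the one available slot.

```python
import threading


class Record:
    """Hypothetical record with room for only one user's data."""

    def __init__(self):
        self.owner = None
        self._lock = threading.Lock()

    def claim(self, user):
        # Only one thread at a time may run the check-then-write below.
        # Without the lock, two threads could both see owner is None.
        with self._lock:
            if self.owner is None:
                self.owner = user
                return True
            return False


def run_claims(users):
    """Let every user try to claim the same record at the same time."""
    record = Record()
    results = {}

    def worker(user):
        results[user] = record.claim(user)

    threads = [threading.Thread(target=worker, args=(u,)) for u in users]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return record.owner, results


owner, results = run_claims(["alice", "bob"])
print(owner, results)
```

Which user wins can differ from run to run, but exactly one of them does, and the loser is told so instead of silently overwriting the winner's data. Because there is only one lock here, a deadlock is not possible; that risk shows up once code needs to hold two or more locks at once.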

Chapter 5 largely focuses on what you can do with code from an open source project. The sections I'm particularly focusing on for this discussion are Debugging and Extending the software for a new project. These sections are just a recapitulation of the work that has been done with Teaching Open Source. The debugging section talks about bugs and how different their natures can be. For example, a bug could just be a missing piece of the user interface, or, to tie back to chapter 4, it could be an unintentional race condition that has destructive results. Next, the chapter talks about project extensibility. A plethora of open source projects have a primary use that people love that piece of software for, but there are also users of these projects who see them with a very different vision. In my own research we have adapted the open source project Galaxy for a use different from the vision of the core developers of Galaxy, and this kind of adaptation is strongly encouraged. We use Galaxy as a launcher for machine learning algorithms, for grading students' code, and as a general communication tool between the branches of our project, Learn2Mine. Open source communities love having their projects extended, and for good reason. For starters, some bugs and missing features do not become apparent until someone uses the software for a purpose other than the original intention. Also, who wouldn't want their software being adopted by other people? That really is just a testament to how good of a job you did creating your software in the first place.

To close my discussion of Software Development: An Open Source Approach, I will talk about my experience working with RMH Homebase, first referenced in blog posts back in January 2014, while considering the aforementioned topics. Effectively, what I've done here is refactor code in RMH Homebase. There was a piece of ugly code which was repeated several times throughout RMH Homebase's codebase. Fixing this only required writing a function that performs that computation once, so that the function can just be called in each of those places. You might be asking "Why do all that extra work if it already works? We know what that code does and it's fine." Well, what if you want to change the code at some point? Will you remember to change every copy? Why take that risk when you can refactor the code, make it look cleaner, and save yourself heavy lifting later when the code starts to degrade and needs to be improved?
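As a rough illustration of that kind of refactoring, here is a small Python sketch. It is not the actual RMH Homebase code; the shift data, the function names, and the 40-hour limit are all made up for the example. The "before" functions each repeat the same computation, and the "after" functions call a single helper instead.

```python
# Before: the same computation is copy-pasted wherever it is needed.
def weekly_summary_before(shifts):
    total = sum(end - start for start, end in shifts)
    return "Hours this week: " + str(total)


def is_overtime_before(shifts, limit=40):
    total = sum(end - start for start, end in shifts)
    return total > limit


# After: the computation is written once and called from each place.
def total_hours(shifts):
    """Add up the length of every (start, end) shift, in hours."""
    return sum(end - start for start, end in shifts)


def weekly_summary(shifts):
    return "Hours this week: " + str(total_hours(shifts))


def is_overtime(shifts, limit=40):
    return total_hours(shifts) > limit


shifts = [(9, 17), (9, 13)]
print(weekly_summary(shifts))  # Hours this week: 12
```

If the way hours are counted ever changes, only `total_hours` has to be edited, and every caller picks up the change.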


Finally, I'd like to mention what my team has decided to work on next for Galaxy:

We will be working on a feature which has been requested here on Trello. A full description can be seen below:

Title: 633: New text manipulation tool: transpose matrix
Description:
Function: Transpose a matrix tabular infile of columns/rows.
The Python command would be (from Ross, 4/15/11):
you can transpose a matrix stored as a python list of lists with
t = map(None,*listoflists)
Reported by: Jennifer Jackson

So this feature implementation will allow Galaxy users to transpose data that they have uploaded to the Galaxy system.
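One thing worth noting about the card: `map(None, *listoflists)` only works in Python 2. In Python 3, `itertools.zip_longest` does the same job of transposing while padding short rows. Below is a minimal sketch of what the core of such a tool could look like. It is not Galaxy's implementation; the function names and the choice of an empty string as the padding value are my own assumptions.

```python
from itertools import zip_longest


def transpose(rows, fill=""):
    """Turn a list of rows into a list of columns.

    Short rows are padded with `fill`, similar to how the Python 2
    idiom map(None, *rows) padded them with None.
    """
    return [list(column) for column in zip_longest(*rows, fillvalue=fill)]


def transpose_tabular(text, fill=""):
    """Transpose tab-separated text and return it as tab-separated text."""
    rows = [line.split("\t") for line in text.splitlines()]
    return "\n".join("\t".join(column) for column in transpose(rows, fill))


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

A real Galaxy tool would also need a tool definition file and would read from and write to dataset files, which this sketch leaves out.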

Music listened to while blogging: Tech N9ne
