Wednesday, December 4, 2013

Final Product

So we presented our last deliverable last week. This means that we have finished our testing suite and gave our final presentation. Our poster can be found below. Due to size limits, I will not be posting our full deliverable here, but, if you are interested in seeing it, then send me an email or a message by other means (information to the right of my postings here).

There are a few errors in this poster, but it is not the final copy - the final copy of the poster has yet to be pushed to our repository, and I plan to update this post once that happens. There are a few typos and plurality disagreements - the result of exporting a Google Presentation straight into PowerPoint without much tinkering.


Friday, November 15, 2013

Deliverable 4 Experience

For this post, much like my other deliverable experience posts, I will be reflecting upon the feedback we received during the presentation of our fourth deliverable.

We started by simply running the main driver of our suite (runAllTests.sh). This produced an HTML output that Cameron formatted pretty well. Unfortunately, we formatted it for a typical laptop screen resolution (I'm not sure exactly what he targeted off the top of my head, but it looked nice on all of our laptops of varying sizes). That resolution did not hold up when we projected onto the screen in the room - this is something we will fix for the final presentation. We will definitely find a time to go into the room and test the HTML formatting there.

But the HTML format was not criticized as being poor in any regard, so kudos to Cameron for that. We inject records into the HTML as the driver runs, and each record creates a new row. At the time of our presentation, we were only showing which unit tests were run, a description of each test (the description included the method being tested and the requirement), and a pass/fail slot. In order to make this more readable for anyone and everyone, we are having to change it up.

To change it up: Tyrieke started working on separating the method into its own "Method:" field in the HTML and putting the requirement in its own field. This is trivial, as we are just manipulating a little bit of data that comes from the testCaseX.txt file. So we'll have the name of the input file we use for each test case, the name of the output, and the name of the oracle. But here's an issue... we can't really show the inputs, expected output, and actual output in our HTML because Galaxy likes to use big files with very unique formatting (FASTA, for example).

So here's where I'm stepping in with my idea. After Tyrieke and Cameron finish their formatting, I had the idea of making the inputs, outputs, and oracles function as links to the files themselves. So we can click on (or maybe even hover over - get some CSS3 and HTML5 in the mix) the file and view the inputs, outputs, and oracles. This would make our HTML report meet all the requirements, all while I'm brushing up on some HTML5 (since I have not worked with it since early Summer).
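A minimal sketch of the link idea, in the Python that builds our report (the helper name and cell layout here are hypothetical, not our actual driver code):

```python
import html

def file_link(path, label):
    # Render one report cell as a link to the raw input/output/oracle file,
    # escaping the path and label so odd filenames can't break the HTML.
    return '<td><a href="{0}">{1}</a></td>'.format(
        html.escape(path, quote=True), html.escape(label))

cell = file_link("../testCases/testCase1.txt", "testCase1.txt")
```

A hover preview would then just be CSS3 layered on top of these anchors.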

Music listened to while blogging: N/A

Wednesday, November 13, 2013

Deliverable 4: Getting Out The Door

So Tyrieke was able to implement the testing of nested methods within our testing suite. This completely rounds out the testing we can do for Galaxy, as any Python method can now be tested. Cameron has been working on the HTML - setting up our banner for our professional-grade report.

There was an interesting issue that Logan pointed out to the rest of the group. His running of the runAllTests.sh file would result in an error because a .svn file was being read as a testCaseX.txt file. This was really weird to Tyrieke and me because we have been able to run all the tests without any issue. Well, we boiled it down to potentially being an operating system issue.

Tyrieke is using Fedora
I'm using Mint
Logan and Cameron are using Ubuntu

Well, it turns out that when installing svn on Ubuntu, it is pulled from a different package repository than on Mint or Fedora. The Mint and Fedora installs pulled a newer version of Subversion than Ubuntu's.

This older version of svn puts .svn files in every single directory. The newer version only puts a .svn file within the highest level of the repository folder. We added a simple try/except to get around this issue in case someone wants to test this on any flavor of Linux.
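The guard amounts to something like this sketch (the function name is made up; the real suite wraps its file handling in try/except so a stray .svn entry cannot crash the run):

```python
import os

def list_test_cases(directory):
    # Collect testCaseX.txt files, skipping anything else (.svn entries, etc.).
    cases = []
    for name in sorted(os.listdir(directory)):
        try:
            if not (name.startswith("testCase") and name.endswith(".txt")):
                raise ValueError("not a test case file: " + name)
            cases.append(os.path.join(directory, name))
        except ValueError:
            continue  # e.g. a .svn entry left behind by an older svn client
    return cases
```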

Music listened to while blogging: N/A

Monday, November 11, 2013

On Our Way To A Full Testing Suite

So our group is on a pretty good track to get this deliverable out the door on time.

We need 25 test cases; we currently have 13. I know of around 5 test cases we need to add for the first method we tested (fasta_compute_length). There was a variable we had not even tinkered with, so that leaves room for test cases we can come up with. After introducing those test cases to our suite, we will be in a very good position to get all of our test cases done.

Tyrieke, Cameron, and I had a brief five-minute discussion about the state of the project when we saw each other today. Tyrieke and Cameron went to the lab and started working on getting the last generic version of testing completed (Tyrieke has found a method that would be trivial to test once that generic component is done). I had to go to a class, but we plan to meet up and hammer out the rest within the next day or two.

Concerning the deliverable: I went through and added detailed runs of the 5 test cases we identified in the third chapter of our deliverable. The detailed runs are pretty redundant, though, seeing as they all test the same method, with some parameters held constant the entire time while we tested other aspects of the method. Redundancies are nothing more than safeguards, though, and it would be abnormal if 5 similar test cases did not have 5 similar runs and outputs.

Music listened to while blogging: 2Pac

Wednesday, November 6, 2013

Boolean Bombers Update

So our next deliverable is next week. Currently, we still have to amass 20 more test cases and make a few more slight edits to our experience report from the third deliverable.

Tyrieke, a fellow group member, is on the verge of breaking through with a new set of test cases. Galaxy has a plethora of methods that open files to make sure they exist. The files can be opened in a variety of manners. Tyrieke knows all the specifics and will update us at our next team meeting. I am going to work on fleshing out the rest of the test cases for the fasta length count function that we found initially. We will likely only need to find one other function to test, as there are always a multitude of ways in which a method needs to be tested to fully exercise its functionality.

I went to work on the project a bit just now from home, but I ran into an issue. For some reason, VMware has decided to restart my Mint instance each time I log in. At first, I thought I might have an incorrect password, but typing gibberish into the password field yields an incorrect-password response. I have been doing most of my work from my lab computer on campus, so I will probably just resume that in the future.

The next few blog posts will probably be kept short until the post right before our deliverable because I imagine that is when we will be knocking out a lot of the work for our project, as that is how it seems to work.

Bright side: I finished a rough draft of my personal statement for my graduate school applications.
Not-so-bright side: It's a very rough draft.

Music listened to while blogging: N/A (watching The Office - S04E01)

Monday, November 4, 2013

Functional Programming

For this post, I will be reflecting upon what I've been spending most of my time on as of late: Functional programming.

So the concept of functional programming makes sense, coming from a strong mathematical background. The biggest hurdle in switching from a strictly procedural programming state of mind to a functional one is dealing with recursion within iterations. The concept of iteratively moving through a list makes perfect sense: it feels just like a for-each loop and operates pretty similarly. What I am having trouble with, however, is trying to create a lambda call (Ruby) within an iteration. Typically, a lambda produces a proc (or procedure) which you can then call with whatever value(s) you like. Doing this while trying to iterate over a list of functions is baffling me, for some reason. Hopefully I can figure this issue out and update this post at a later point, but, for now, this will have to be cut short as I pull the rest of my hair out finishing this assignment.

Update: So I have finally been able to get past my functional programming issue thanks to Lisa Smith. I was trying to go multiple calls deep with lambda functions, when really I just needed to use a mapping function in conjunction with my iteration. This makes a lot more sense and I escaped without pulling all my hair out.
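My assignment was in Ruby, but the same idea translates into Python terms, which is roughly what finally clicked:

```python
# The trap: nesting lambda calls while also iterating over a list of functions.
# The fix: map the data through each function instead of going calls-deep.
double = lambda x: x * 2
increment = lambda x: x + 1

functions = [double, increment]
values = [1, 2, 3]

# Apply every function to every value -- one map per function, no nesting.
results = [list(map(f, values)) for f in functions]
```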

On the bright side: I have gotten a jumpstart on my personal statement for graduate school applications and I have an outline set up for that. So hopefully I can knock that out this week and then be on a good track to knock out my research proposal/statement as well.

Music listened to while blogging: Aeroplane - In Flight Entertainment (Continuous Mix)

Wednesday, October 30, 2013

Deliverable #3 Experience

I'd like to start this post by showcasing the logo I made for my software engineering team, the Boolean Bombers, using Logo Garden. The logo can be seen on the right.

There were a few criticisms of our current testing suite for Galaxy. These were very quick, very easy fixes. I took the liberty to make the corrections.

For starters, we were creating a Python file for each test case, which is not what we should have been doing. What is funny, though, is that the only difference between these Python files was the filename used to find the testCaseX.txt file. So it was very easy to create a loop that runs the same code for each testCaseX.txt file within the testCases/ directory. This eliminates the duplicated files and works regardless of how many or how few test cases we have.

Additionally, I added a quick boolean to check if we even had a test case to begin with. If we didn't have a test case, then we jump into an if statement at the end of the driver that reports to the HTML that there were no tests.
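A sketch of what that driver loop plus the no-tests check might look like (the function names here are hypothetical, not our exact driver):

```python
import glob
import os

def run_all_cases(test_case_dir):
    # Loop over every testCaseX.txt in the directory, in a stable order.
    cases = sorted(glob.glob(os.path.join(test_case_dir, "testCase*.txt")))
    if not cases:
        return []  # the driver's trailing if-statement reports "no tests ran"
    processed = []
    for case_file in cases:
        # run_single_case(case_file)  # hypothetical per-case runner
        processed.append(case_file)
    return processed
```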

On top of all this, I added a quick line of code to the runAllTests.sh script that opens the browser with test results and, if a browser is already open, then it opens the results in a new tab in that browser.

To add to that, I made some quick changes to the CSS in the HTMLBackbone so the styling keeps our testing terminology (pass vs. fail, rather than success vs. fail/error).

All things considered, we are in excellent shape for the rest of this project.

Aside: I am a part of a Spotify internship right now and one of the things we have to do is try and refer people to Spotify. There's a contest going on right now where we can earn rewards based on who gets the most people referred. My referral URL is here

Music listened to while blogging: Donald Glover comedy album ~ Weirdo



Monday, October 28, 2013

Galaxy Deliverable #3

For this post, I will be reflecting upon the final edits of my team's Galaxy Testing Project before our next deadline (tomorrow).

For starters, the testing framework did not remove files from the temp directory. The temp directory held all the reports that are produced by tests (1 report per test) that are then compared to the oracles. The framework now does this through the following call at the beginning of runAllTests.sh:

rm ../temp/*.*

The reason we do it this way rather than doing a recursive remove of the temp directory is so that we preserve the existence of the temp directory. We very easily could have taken an alternate route where we do the following:

rm -r ../temp
mkdir ../temp

These two segments of code achieve the same goal in the end.

Lastly, the testCaseX.txt files were cleaned up. Before, the "Expected Outcome:" section contained the name of the oracle file and the contents of the oracle file. This was a bit too redundant for my taste, so I removed the contents of the expected outcome from the testCaseX.txt file. The Python code did not have to be edited here since we check the oracle for the actual results anyway.

What we did change in this part of the framework, though, was the way in which oracles are found. Before, validating test results against the oracle went through a method I created that took a test case number and constructed the oracle call from it. Now, the actual oracle filename is read out of the testCaseX.txt file and then sought out by the validation method.
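The lookup amounts to something like this sketch (assuming the oracle filename sits on the line after the "Expected Outcome:" header, which is how our testCaseX.txt files are laid out; the function name is made up):

```python
def oracle_filename(test_case_text):
    # Pull the oracle filename out of a testCaseX.txt body.
    lines = test_case_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Expected Outcome:" and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None  # malformed test case file: no oracle named
```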

Music listened to while blogging: N/A

Wednesday, October 23, 2013

Learn2Mine and Skill Trees

This post will not be reflective of my team's ongoing Galaxy Testing project as we have not met since my last post (we will be meeting and working tomorrow morning). So, instead, I will detail my latest work on my own research in the Anderson Lab here at the College of Charleston.

Learn2Mine utilizes gamification in order to help teach students data science. My latest work on the application is on revamping the way in which users view their skill tree.

Fig. 1 - Old Skill Tree
The best way to talk about these concepts is with the actual images. The initial skill tree implementation can be seen in figure 1. This skill tree used a regular block method to represent skills that users have unlocked, learned, or mastered. That method seems a bit too cut-and-dried, though. For starters, this implementation needed two skill trees: one to represent pattern recognition techniques and a second for R programming skills. Also, if a skill was 'locked', then users were not able to work with that skill.

Fig. 3 - Classification Subtree
Fig. 2 - New Skill Tree Prototype
Now we have departed from that. Figure 2 shows the entirety of our current skill tree prototype - this prototype is created through a JavaScript implementation of Cytoscape (a Flash-based graphical environment). We give a hierarchical progression of skills represented in tree form. This is not strict, though; the hierarchy serves more as a guideline for the order in which we feel users will learn best. There are many different progressions one can take through the skill tree. For example, you can go down the classification branch in the tree (as seen in figure 3). This will have you learning the basics of the K-Nearest Neighbor algorithm, Partial Least Squares Regression, and Neural Networks. We also have a mastery lesson for K-Nearest Neighbors that requires users to implement optimization techniques in order to improve their KNN classification over a specific threshold.

We make new lessons just about every week for Learn2Mine, as it goes hand-in-hand with Dr. Anderson's Data Science 101 class. It functions as a replacement for other workflow management systems that implement data science techniques. Typical systems used in DATA 101 include Weka and RapidMiner - these systems have lots of flaws, flaws we are trying to combat with our system while also focusing on the educational aspect of the process. This launch in the DATA 101 class also functions as our pilot deployment of the system.

Music listened to while blogging: 50 Cent and J. Cole


Monday, October 21, 2013

Further Information on Unit Testing with Galaxy

There has not been much headway with the Galaxy testing project since my last blog post, but we did run into some hurdles we did not foresee.

Our original implementation of our unit testing architecture banked on the usage of datasets. This seemed natural since Galaxy always uses datasets when doing its testing. The reason is that Galaxy's primary use is dealing with large biological datasets, which naturally lend themselves to dataset files rather than manual input. Since we are testing such small pieces of the architecture and tools, it may be more helpful for us to input actual values rather than forcing a dataset constraint.

In Python, this is extremely easy. Within our testCaseX.txt file, we just need to specify some kind of delimiter or boolean that can be programmatically read. The presence (or absence) of this value would mean we read the raw data value straight out of the testCaseX.txt file, rather than finding a dataset in a different folder. This will allow for ease when adding future tests. This also adds an extra way to break tests and see if they fail and break down as expected.
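For illustration, a sketch with a made-up "RAW:" delimiter (our actual marker is still undecided):

```python
def resolve_input(raw_line):
    # Decide whether a test input line is an inline value or a dataset
    # reference. Hypothetical convention: a leading "RAW:" marker means
    # take the value literally; anything else names a dataset file.
    if raw_line.startswith("RAW:"):
        return ("value", raw_line[len("RAW:"):].strip())
    return ("dataset", raw_line.strip())
```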

This, and much more, should be implemented before my next update here, and I will reflect on those changes, along with the associated obstacles, in my next post. All things considered, this project seems to be moving along smoothly, even though the team dynamic feels a bit lacking.

Music listened to while blogging: Kendrick Lamar and The C90s

Wednesday, October 16, 2013

Writing Unit Tests for Galaxy

In this post I will be examining and reflecting on the process that must be undergone in order to write a test case for Galaxy, from beginning to end.

The basic rundown of the calling hierarchy:


  1. sh runAllTests.sh
  2. testCaseX.py (testCasesExecutables/)
  3. testCaseX.txt (testCases/)  
  4. testCaseDatasets/
  5. testCaseXReport.txt (temp/) and testCaseXOracle.txt (oracles/)
  6. Team3_Testing_Report.html (reports/)

1. Running all the test cases
There is a script located in the scripts/ folder known as runAllTests.sh. This script first removes the last testing report that has been produced. In the future, this will be modified to also delete test case reports, but more on that later. Second, the script iterates through every file in the testCasesExecutables/ folder and runs them using a for-each loop with a python call. It is worth noting that this also runs our helper file, but, because it is only a declaration of functions, it does not interfere with any calls or act as a detriment to timing for these tests. So within each call of a test case executable file we have to consider how we will be...

2. Running the executable
The first executable that will be called is testCase1.py. This file looks very clean as we have abstracted a lot of work that would be repeated in every test case file. We make the imports to subprocess, os, and testCaseFunctions. Subprocess is a module that we can import to make command line calls, a pretty vital element to testing from the command line. The invocation using subprocess is seen toward the end of the executable where we use the line:
# Let's make the call to the shell
subprocess.call(testCaseScript,shell=True)

This allows us to make a call directly to the shell - shell=True is a security concern, but this is our own project running on open-source software, so it is not too much of a worry for us.
We make a call to os in order to have a default directory we are making calls from - this allows us to write relative paths in a more readable manner. This is done with the following line: 
# Change the working directory to Galaxy's Home section to make our calls simpler (this also enforces our helper method calls)
os.chdir('../project/src/galaxy-dist/')

Lastly, we call testCaseFunctions in most other lines of the file. This is to interpret the textual version of the test case, to assemble our command line call, validate results with our testing oracles, and, finally, produce a professional-grade testing report.

3. Textual test case
Located within the testCases/ folder, our textual test cases are a vital part of guiding our test cases. For explanation, I will be referring to testCase1.txt. There exists full documentation about this file within the README within the Docs/ folder of our project, but I will detail a very basic version of the rundown. 

Test ID:
A test ID is a textual description of the test being managed at that point. This ends up being passed to the final HTML. This ID is unique because no test case should be the same as one that already exists (but there are no preventative measures stopping someone from adding a duplicate test case).
Requirement Being Tested:
This section names the function we are dealing with (for HTML purposes) along with the atomic requirement being handled. In our example, this specific requirement is: "Calculate the length of a single fasta-encoded string. Length = 1."
Component Being Tested:
Galaxy's hierarchy very easily allows us to pick out what component is being tested with a specific requirement. Galaxy has toolsets which are made up of various tools. These tools are, typically, one-function tools (in this example, "fasta_compute_length" is one tool that performs one very basic function within the "fasta_tools" toolset). This allows us to pick out the name being managed and even use it when building relative directories into our test cases.
Method Being Tested:
This details the actual method that we are handling. In our example, the entire tool is one method and it is the one we are testing. 
Test Inputs:
Test inputs make up the instantiation of the method we are testing. There is a line for each set of inputs we would need in order to test a tool and this is all handled with Python scripting. The order of the inputs is very important, though, as we have to build command-line arguments with these inputs and misordering them can confuse the command line and produce errors or false results.
Expected Outcome:
Expected outcome details which oracle we should refer to whenever comparing our testing report. Additionally, there is a textual description of what we expect to see under the oracle file.

4. Test case datasets
The nature of Galaxy lends itself to using full-on datasets rather than small inputs. Galaxy is meant to be used for data-intensive biology, but there is nothing stopping us from using the methods and input types of large biology datasets with our own test cases. In fact, this is the only way to effectively conduct tests of this nature. For example, our testCaseDatasets/ folder currently contains 5 FASTA files with data that is interpreted by the individual test cases. Following the example from earlier, the testCase1.fasta file is a very simple file that contains a FASTA ID (no restrictions other than having to start with a ">") and then an encoded string on the very next line. The first test is about measuring a FASTA string that has only one character, hence only the value "C" being in the string section of that FASTA file.
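For illustration, here is a sketch of what fasta_compute_length effectively has to do with that file (this is not Galaxy's code, just the idea our test exercises):

```python
def fasta_lengths(fasta_text):
    # Compute sequence lengths from a minimal FASTA body: an ID line starting
    # with ">", then the encoded string on the following line(s).
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths
```

Run against the testCase1.fasta contents described above, the single-character string yields a length of 1.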

5. Textual test case reports
Each individual test that is executed creates a textual report within the temp/ directory. Currently, each time a test is run, the new test case report overwrites the old one. This works, but we are going to have the runAllTests.sh file also remove all files within this directory (which should be easy with an rm -R call). These reports are nothing more than the results of running the tool with our given inputs, and they are what get compared to the oracle files. The files have to be exactly the same or else the test is recorded as a failure.
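The comparison itself is as strict as it sounds - a byte-for-byte check, roughly like this sketch (the function name is made up):

```python
def files_match(report_path, oracle_path):
    # Exact byte-for-byte comparison between a test report and its oracle.
    # Any difference at all means the test fails.
    with open(report_path, "rb") as report, open(oracle_path, "rb") as oracle:
        return report.read() == oracle.read()
```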

6. Full testing report
While we compare each of those textual test reports to their oracles, we are slowly building up an HTML report for the final result within our reports/ section. This is done by taking an htmlBackbone.html file we obtained and appending individual testing results to the full report. The backbone came from running one of Galaxy's built-in functional tests and then stripping out everything that was not CSS. This way, we can use Galaxy's CSS with our tests and create a professional-grade report that mirrors the look and feel of Galaxy.
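Conceptually, the appending works like this sketch (the real backbone's markup is richer than a bare closing tag; the marker here is an assumption):

```python
def append_result_row(report_html, row_html):
    # Insert one test's result row just before the closing </body> tag,
    # so the report grows as each test finishes.
    marker = "</body>"
    if marker in report_html:
        return report_html.replace(marker, row_html + "\n" + marker, 1)
    return report_html + row_html  # fall back if the backbone lacks </body>
```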

Music listened to while blogging: Spotify radio station based off of C90s - Shine a Light (Flight Facilities Remix)

Wednesday, October 9, 2013

Deliverable 2 Meta-Experience

From my title you should be able to deduce that the bulk of this blog post will be about the experience I, and my team, had whenever we presented our second deliverable for our testing project.

What I thought was going to be a glorious experience turned out to be one of the more horrific experiences I have had. I thought I had set my team up to be pretty far ahead in our project by already coding up our test cases and having HTML output. Apparently, though, the way I was conducting the tests did not follow the exact specifications for my Software Engineering class. I incorporated a couple of the file calls from Galaxy into my test cases, and apparently that is not allowed; we have to reinvent the wheel instead. I was improving upon Galaxy's testing suite by implementing a way to conduct individual tests on individual tools (something they have listed in their documentation and wiki as something they need to do). Instead, we have to develop a testing suite that stands alone from the Galaxy project. The way I was doing it would be great to submit to Galaxy, but it would require too much work to do that and also fit the specifications for the Software Engineering team project. While this is disappointing, this is the work that I have been given and, much like in the real world when working for a company, the work given by the customer is the work that needs to be done.

All in all, this has been a good learning experience, though frustratingly wasteful for me when considering the amount of time I dumped into the functional testing.

The next step for my group and me is to examine the individual units that underlie the functions we have been examining. We have been examining the "Upload Data" toolset, and this toolset has a multitude of helper functions that it requires in order to run. By extracting these helper functions, we can test each of them individually - which is, in essence, unit testing. If we can establish unit tests that work on these helper functions, then we may be able to work our way all the way up to testing a function that utilizes them, which, in essence, creates a functional test. In my eyes, this would be an immense, intensive learning experience that everyone on my team can benefit from.

Music listened to while blogging: The C90s

Monday, October 7, 2013

Deliverable 2 Experience

For starters, this second deliverable tripped my team and me up quite a bit.

Initially, I thought we had to implement 5 test cases (not just identify them) so I took the leap and created those test cases and the HTML output to follow. This is a requirement for the third deliverable of this testing project so at least I am ahead of the game on that.

Subversion, however, has decided that we have to become mortal enemies. When I initially downloaded our team repository, I ended up pulling each of my teammates' individual development branches in addition to the trunk, which we were trying to avoid. We created those branches so we could simply merge when we were confident in our changes - we wanted to test our changes and tests before merging our branches into the head of the master branch.

When I tried to push my final testing scripts into Subversion, my GUI, RapidSVN, kept reporting "Unknown Error!" and a 405 error, which it really does not detail much. After a short search, I found that some people were able to fix this problem by re-checking out the repository and then proceeding to make changes. I am currently doing this, but Subversion likes to take its time - hours to pull down development branches - so I am currently waiting on that to finish.
Update: I checked out only the folder I needed from the repository and Subversion finished in a matter of minutes. I then committed my changes up and they can now be viewed in the team repository (link on the right side of the page).

My goal is to get these scripts up before our presentation of our second deliverable tomorrow. I created Python files that make a BASH call to a built-in Galaxy method that allows the running of functional tests. I decided to make the call this way because Galaxy creates a minimal HTML file with some important data about the test. So we develop tests and pass them to this BASH call.

The overarching structure:

runAllTests.sh loops and makes calls to each of the 5 test cases
The first test case generates the initial HTML file after the test finishes running
The succeeding test cases append to that HTML when their respective tests finish (using Python to generate the HTML code)
Once the runAllTests.sh loop finishes, the HTML that was created is moved to our reports folder to function as a professional-grade report.

The report is still a work in progress, but it currently outputs the 5 tests we were conducting and uses Galaxy's color and layout scheme for the tests, which looks really nice (and will look better when we finish all the updates to it).

To conclude, one thing I want to get working is creating a soft link from our src directory (currently empty) to the location of our trunk (master branch) version of Galaxy, since we have put it in a different place and our current relative paths are based on the current structure. If this soft link is created, then anyone navigating the folder structure will be able to do so however they please.
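In Python terms, the link amounts to something like this (equivalent to a shell `ln -s`; the paths here are illustrative, not our actual layout):

```python
import os
import tempfile

# Illustrative layout -- the real link would point src/ at the trunk checkout.
base = tempfile.mkdtemp()
trunk = os.path.join(base, "trunk")
os.mkdir(trunk)
link = os.path.join(base, "src")
os.symlink(trunk, link)  # equivalent to: ln -s trunk src
```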

Music listened to while blogging: The C90s & J. Cole

Wednesday, October 2, 2013

Reflection

For this post, I will be reflecting upon my recent Software Engineering test and doing a quick update.

The test shocked me at first, which it shouldn't have. It shocked me because I felt like I was writing essays upon essays, since it was entirely short answer. This shouldn't have shocked me, though, considering the nature of software engineering. One thing that did really mess with me on the test, however, was the way we had to recall answers. For example, one of the questions asked about the dual-use dilemma; really just define and describe it. If I am correct, the dual-use dilemma is that code you produce can always be misused by someone else for malicious purposes, and this is something to keep in mind when creating and disseminating code. This idea is second nature to most computer scientists in my position, but knowing it specifically by the name 'dual-use dilemma' proved to be the only challenge for that problem. I guess this all really stems from my dislike of typical testing. I prefer a more hands-on approach to tests, rather than just written exams.

Overall, though, I do believe the test went rather well and did test some useful techniques within the realm of computer science and software engineering. For example, we had to detail test cases for a method that was given to us. Figuring out test cases is always a tricky thing to do because of the dozens of trivial examples that can arise that need to be tested. There's usually endpoint cases, blank cases, maximum cases, etc.

As for my update: I'm currently in the process of getting my SpotifyU internship off the ground. It's essentially a marketing internship, but it gives me a chance to work with Spotify, a program that I am, pretty much, obsessed with. I've been enrolled in the intern program for a while, but they have been pretty slow to get us the material we need to actually do the work. My personal research with Learn2Mine is going very well. Obviously I wish I could spend more time on it, much like I did in the Summer, but classes get in the way of that. One of the reasons I am pumped for graduate school is that I can spend more time doing research (a 50/50 ratio of class to research is my hope). That being said, I'm still waiting to hear back from SIGCSE, where we have submitted a paper to be part of the conference proceedings.

Music listened to while blogging: Badfinger (Breaking Bad inspired this) & Ellie Goulding

Wednesday, September 25, 2013

General Work and Reflection

This post will be unlike a lot of the posts I've had in the past. I will be reflecting on work from most of my classes, rather than just Software Engineering. This is because there have not been any major things done to our Galaxy project since my previous blog posting.

For the Galaxy project, we presented the findings and results of our first deliverable to the class. This included an overview of the project, how we ran the built-in tests, and our experience. Galaxy has built-in Python files that can be run in order to conduct tests. One example is "run_functional_tests.py", which runs all the functional tests built into Galaxy (this takes hours). Optionally, you can specify a parameter that runs a specific subset of the functional tests. We will definitely be doing that whenever we go through our test cases, because it would be too much of a hassle to wait six hours every time we want to run our testing suite.

In some of my other classes, we have been working on various projects. In Bioinformatics, I am currently working on a partnered assignment where we have to find the longest common subsequence in two strands of DNA. It's odd because our Python code works on small versions of the problem and on the examples given, but it does not produce the correct answer when we submit it for a larger problem. It's very unlikely that the site we are submitting to has errors, but I'm starting to question it at this point. In Advanced Algorithms, we just received back tests on the computational complexity of various algorithms and applied analysis of others; we looked at topics such as searching and sorting algorithms, big O, big Θ, and big Ω, recurrence relations, and more. In Programming Language Concepts, we have been studying regular expressions, grammars, compilers (every part: Scanner -> Parser -> ...), C, and pointers. In Public Speaking, I am currently preparing a speech with the objective of informing the audience about the marvels of AI and how it will affect everyone in the future. Aside from that, work on my Bachelor's Essay has not gone into full swing yet, as we are still in a heavy development phase for Learn2Mine. We are in the process of beefing up the site's security, as we found an exploit that we are vigorously working to patch.
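The longest common subsequence assignment mentioned above is a classic dynamic programming exercise. Here is a generic sketch of the standard DP table approach (not our actual submission - the function name and example strings are my own):

```python
def lcs(a, b):
    """Length of the longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs("ACCGTTA", "ACGT"))  # 4 ("ACGT")
```

This runs in O(mn) time and space, which is fine for small examples but can become a problem on long DNA strands without the memory-reduced variant.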

Monday, September 23, 2013

First Deliverable Experience (Galaxy)

Working on Galaxy always turns out to be an interesting experience. Getting it installed on my virtual machine was pretty seamless once I figured out which dependencies were needed. Python was pre-installed with Linux Mint 15 so I did not need to worry about getting a compatible version of Python.

Unfortunately, we, my team and I, do not have access to our team SVN repository yet so we have been working in separate areas, still checking out files through SVN. I used my personal directory mentioned in an earlier blog post (./playground/Turner/galaxy/galaxy-dist/...)*.

Mercurial was needed to help initialize the Galaxy files. It can be installed very simply using the command:

sudo apt-get install mercurial

Mercurial is invoked using the "hg" command (Hg is the chemical symbol for mercury - a nifty little easter egg).

Once Mercurial is installed, you can run Galaxy. The source code does not need Mercurial to be extracted (or built), but Mercurial is needed for actually running the program. To start Galaxy, navigate into the /galaxy-dist/ folder and use the command:

sh run.sh

This runs a bash script on the local machine (hence the "sh", for shell); the /galaxy-dist/ folder contains a file named "run.sh" whose bash commands start the Galaxy server.

This initiates Galaxy, and it can then be reached by typing "localhost:8080" into a browser. Perhaps something else is already using your 8080 port, though. To change the port on which Galaxy runs, edit the "universe_wsgi.ini" file in the /galaxy-dist/ folder. There are lines in this file that read as follows:

# The port on which to listen.
port = 8080

If you change this to 8081, for example, then you would navigate to "localhost:8081" to find the Galaxy interface. For these changes to take effect, you have to stop the running Galaxy instance (stopping the sh run.sh command). This can be done with a keyboard interrupt (typically ctrl+c) in the terminal that started Galaxy. If, for some reason, Galaxy was started with & (which runs the process in the background, so a keyboard interrupt is not possible), then you have to find the process number in order to stop Galaxy.

This can be done through the following process:

top

Using the "top" command lets you view all the processes currently running on your machine. Galaxy runs as a Python process, so the key is finding the process number of a large Python process. Once found, make note of the number. Then the following command can be run to stop Galaxy:

sudo kill -9 $processnumber

where $processnumber is the process number of the Python process. Typically, kill -9 is frowned upon because it gives the process no chance to clean up after itself. My simple solution is to not run Galaxy in the background - just keep it somewhere you can easily stop it. An alternative is to issue the restart command: navigate to the /galaxy-dist/ directory and run:

sh run.sh --reload

This allows Galaxy to reload (restart).

All things considered, I think my team is really starting to understand Galaxy and I believe this testing project is going to go super-smoothly.

Music listened to while blogging: alt-j

*Update: We received access to our team repository as I was writing this blog so this will be updated accordingly.
The repository is now located at: https://svn.cs.cofc.edu/repos/CSCI362201302/team3/
Really, the only difference here is that instead of using /playground/Turner/ as the base of operations, we will be using /team3/ as the base of operations.

Wednesday, September 18, 2013

FOSS Project Decision: Galaxy

As I mentioned in my previous post, I am a part of a team for work on a project. This project is centered around a single FOSS Project. We have selected Galaxy as the project we are going to use.

Galaxy is probably one of the best choices for this project for a multitude of reasons. For starters, I am not completely unfamiliar with Galaxy, as I know some of its inner workings. Most of the implementation in Galaxy is done in Python and XML. Galaxy, on a local instance, makes calls to the machine's terminal (or command line) through "tools", a term Galaxy has coined.

These tools are merely abstractions of Python files.

The XML is used to set up the abstracted user interface that Galaxy produces. All of the inputs (text fields, dropdown boxes, checkboxes, etc.) are formatted through the XML. The XML file also references the Python file that will be run. Lastly, the XML lists the command-line arguments. For example, to pass an input to the Python file, you place it right after the Python file name (space-separated, as command-line arguments always are). Below, I will detail a full example of a small XML file:

<command interpreter="python"> example.py $input1 $input3  -z "$results" $putin</command>
<inputs>
      <param name="input1" size ="4" value="1054" type="integer" label="Number Input"/>
      <param name="input3" type="integer" label="A Second Number Input"/>
      <param name="putin" type="select" display="radio" label="Select One">
            <option value="option1">This One</option>
            <option value="2option">Or This One</option>
      </param>
</inputs>
<outputs>
      <data format="csv" name="results" from_work_dir="results.csv" label="CSV Results"/>
</outputs>

<tests>
      <test>
      </test>
</tests>
<help>
</help>

So, as you can see, this is a very flexible system. As long as your Python file is the first argument in the command interpreter, you are fine. I used different input names following the call (the names do not matter, but usually a convention is adopted for readability). The parameters are pretty self-explanatory: two integer inputs and a radio button selection. The values produced from this are indexed by their command-line argument position. So getting the first integer is as simple as reading sys.argv[1] (as the Python file's name is at [0]). The radio button value takes on either "option1" or "2option" in this case, even though the GUI presents them as "This One" and "Or This One".
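To make the argument mapping concrete, here is a minimal sketch of what a hypothetical example.py could look like for the command line above. The parsing is my own illustration - Galaxy simply substitutes the $-variables and invokes the script:

```python
import sys

def parse_args(argv):
    """Unpack arguments laid out as:
    example.py $input1 $input3 -z "$results" $putin
    """
    input1 = int(argv[1])   # "Number Input" integer param
    input3 = int(argv[2])   # "A Second Number Input" integer param
    assert argv[3] == "-z"  # literal flag from the command line
    results = argv[4]       # path substituted for "$results"
    putin = argv[5]         # "option1" or "2option" from the radio buttons
    return input1, input3, results, putin

# Simulated call, the way Galaxy would build it after substitution:
print(parse_args(["example.py", "1054", "7", "-z", "results.csv", "option1"]))
```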

One functionality I do not know much about, but will become increasingly important for this project, is the test(s) area of the XML file.  Hopefully this is where I can specify all test cases that need to pass in order for a tool to be 'functional'.
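From what I have gathered so far from Galaxy's tool documentation, each <test> pairs input values with an expected output file; something like the following (values hypothetical) would exercise the tool above:

```xml
<tests>
      <test>
            <param name="input1" value="1054"/>
            <param name="input3" value="7"/>
            <param name="putin" value="option1"/>
            <output name="results" file="expected_results.csv"/>
      </test>
</tests>
```

Here expected_results.csv would be a fixture file checked into the tool's test-data directory.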

Considering all things, Galaxy may seem convoluted at first, but it is a very useful platform for a wide variety of tasks. It can be extended to do almost anything, as it merely makes command-line calls. So with Galaxy, the galaxy's the limit.


Music listened to while blogging: Travis Barker, Notorious B.I.G.

Monday, September 16, 2013

FOSS Project Experience and On Visual Formalisms

For this post I will be reflecting upon my group's FOSS (Free and Open Source Software) project experience and briefly reflecting upon On Visual Formalisms.

My Team: Boolean Bombers

  • Cam Spell
  • Logan Minnix
  • Rob Hambrick
  • Tyrieke Morton
  • and Me!
In class we worked extremely effectively and efficiently, so the group could not have been put together any better. Currently, we are looking at the following FOSS projects for use in CSCI 360: Tor, Celestia, and Galaxy. I have a personal preference for Galaxy, as I have a significant biology background (my Data Science concentration is in Molecular Biology). Additionally, Galaxy does not have as much of a learning curve as some other FOSS projects. I feel this way because Galaxy is largely written in Python (the language everyone starts with here at the College of Charleston) with some XML on the side.

The On Visual Formalisms article reminded me of the daunting experience I had in my Introduction to Abstract Algebra class here at the College of Charleston. It had a lot to do with set theory and the mapping of functions across different sets, which is not a hard topic in and of itself, but having it serve as the introduction to formal proofs is what makes me feel that this memory should be repressed. I did take away all the knowledge from that class, though, and I am able to apply it to everything I know today. I took that class before ever thinking about taking a computer science course, so whenever I see people using union operators or saying that functions map 1-1 onto R^N, I know what it means in a heartbeat. That being said, the set data structure was the easiest to understand whenever I stepped into Programming II (Java). I have digressed a bit here, but I feel that is almost the point of a blog sometimes.

All in all, graphs, sets, and anything of the like have become somewhat of a strength of mine. I may not have ever heard of the terms 'hypergraphs' or 'Euler circles', but it was pretty easy to pick up on what the author was trying to convey: simply put, applying set theory principles to graph theory. In graph theory, you can connect points and add direction to the connection if desired. That is pretty basic, though. With Euler circles you can actually relate three or more points at a time - in fact, you are relating sets (picture crazily-shaped Venn diagrams). You can use operations to check intersections, complements, etc. In hypergraphs, shapes, locations, distances, and sizes do not matter. If you go to the article I hyperlinked at the beginning, you can see some of the abnormal-looking graphs. Regardless, at the end of the day, a hypergraph represents everything you can do with just one set, while Euler circles are ways to relate entire sets through structure.

But why? Why do we care? Can this help us in software development? Well, the immediate example that comes to my mind is diagramming. You could represent a class, complete with the superclasses (or interfaces) it implements, and show a hierarchy while gaining meaning from the different types of graphs you are using. You could break down the people involved in a company in this manner. There may be a person interface from which everyone inherits. Customers may be allowed to perform certain functions with the company (such as put in requests). Employees, however, could be broken up based upon their positions, gender, etc. Between all the people you could have attributes you would represent in a graph, such as isMarriedTo, livesWith, etc. These arrows would connect the different subclasses. Essentially, this feels like an alternative way to represent a class diagram, to put it in UML terms. This most recent example is actually exemplified by the use of higraphs.
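The set relationships behind these diagrams are easy to experiment with directly. Here is a toy sketch in Python (the names are made up for illustration) showing the operations an Euler circle diagram depicts visually:

```python
employees = {"Ann", "Bo", "Cy", "Dee"}
customers = {"Cy", "Dee", "Eve"}

both = employees & customers        # intersection: inside both circles
staff_only = employees - customers  # difference: employees outside the customer circle
everyone = employees | customers    # union: everything either circle encloses

print(sorted(both))        # ['Cy', 'Dee']
print(sorted(staff_only))  # ['Ann', 'Bo']
print(sorted(everyone))    # ['Ann', 'Bo', 'Cy', 'Dee', 'Eve']
```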

Music Listened to While Blogging: Kanye West

Tuesday, September 10, 2013

The Mythical Man-Month

For this response, I will be responding to the following article: The Mythical Man-Month

For starters, I think it is important to note that just because a program works does not mean it is done. I could write a program that solves the traveling salesperson problem (just a hypothetical, simplistic example, unfortunately) in polynomial time, but if there is no documentation, no way for anyone to use my results, and no way for someone to read my code, then what is the point? It would be much better to have this program written with ample documentation, readability, etc., along with directions for extensibility (assuming open source). Also, just because the program works does not mean it is the best. What if the program I wrote only works on my souped-up computer that has terabytes of space? What if someone wants to use this on a lighter rig? There is no way that anyone with a reasonably-priced computer would be able to use it. Memory space, I/O devices, and computer time are all important things to consider in code. Getting the program working is merely one part of the overall process that is software development and engineering.

Let's say you do get the program working. Well, how long did it take? Did it take longer than you expected? The answer is most likely yes. The Mythical Man-Month cites the optimism of programmers, and this makes sense in context. At the beginning of a project, you tend to overestimate your own planning and coding skills. This makes sense, though, because if you do not look at yourself and your skills in a good light, then how would you even get a job or have the motivation to do anything with your skills? Being an optimist produces results.

I had never heard the term man-month and did not know what it meant. A man-month is a unit used to estimate the cost and effort of a software project (the number of people working on the project multiplied by the months needed to complete it). It makes perfect sense that this would be a terrible measure in general. The author stresses that the two variables are not interchangeable. If this were a good measure, then I should be able to double the speed of a project's completion by doubling the number of people working on it. It just doesn't work that way; the relationship looks more logarithmic than linear (obviously not always, but in the typical case). You can reach a point where you have so many developers that one more is not going to add to the speed or quality of the project. You can also have a project that is doomed never to complete due to bad design and poor requirements elicitation early on; such projects will not make it to completion unless major changes are made. You cannot simply hire someone else and get a linear relation between months and workers. As the author puts it, sometimes there is no number of people that can affect the time at all (his analogy: no matter how many women are assigned, bearing a child still takes one fixed period of pregnancy).
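One common way to make this nonlinearity concrete (my illustration, not from the article) is communication overhead: with n people there are n(n-1)/2 pairwise channels, so coordination cost grows quadratically while the number of hands grows only linearly:

```python
def channels(n):
    """Number of pairwise communication paths among n developers."""
    return n * (n - 1) // 2

for n in (2, 5, 10, 20):
    print(f"{n} developers -> {channels(n)} channels")
```

Doubling the team from 10 to 20 people roughly quadruples the communication paths, which is one reason adding people can slow a late project down further.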

The next few sections of The Mythical Man-Month pretty much boil down to a few things I have stated here, plus a few others. Never undercut the time you think you need on a project. Always leave time for debugging and fixes - even "perfect" elicitation of requirements can still result in a number of bugs and issues. Make sure each member of your team fits the role they have on the project, and note that anybody can act in any role, but defining primary roles is always important (e.g., a software architect can aid in writing a systems test). This fits into the 'surgical team' analogy, where roles have been divided up. But even in surgery anything can happen, anything can go wrong, and improvisation can become a necessity, as mentioned above. The improvisation point is a little different from the author's intention, but I feel it is a corollary that needs to be added.

Considering everything, it is important to have a well-defined role in a project. It is crucial to have realistic (not to imply that optimistic cannot be realistic) goals and deadlines. It is vital to be able to understand other parts of the project. I added this last point because I feel it just needs to be said. If I'm a database administrator, I can work all day and really not know what is happening on the front end of my application, but is that what we really want? Of course not. I would need to understand what is happening in the application; the security requirements of the data may not be evident until I understand the rest of it. All things said, the man-month is a ridiculous idea, and this myth has been confirmed.

Music listened to while blogging: Robin Thicke & Sublime

Monday, September 9, 2013

The Future of Software Engineering and Programming

When formulating this response, I considered two articles in addition to my own thoughts and opinions:
The Future of Programming
Lifecycle Planning

The Future of Programming, by Robert Scoble, had me thinking before I even started to read or watch the video. I saw that the article was talking about programming in the cloud (using Cloud 9 as its example). Last year, I purchased a Chromebook and searched high and low for a cloud-based IDE that would allow me to execute and store code reliably. Cloud 9 was the platform I eventually landed on. Cloud 9 is not the greatest IDE - for example, hidden characters can get included in files and cause issues. In my case, hidden spacing characters in Python files resulted in terrible, relentless issues solvable only with vi. Which brings me to my next point: the terminal. Cloud 9 builds a lightweight virtual machine for each of its users. This has a multitude of benefits, including added security (one user cannot tamper with another), terminal usage (great for developers, like me, who move around using the command line a lot), and structured file management (as is inherent with a full virtual machine). I only used Cloud 9 for Python development, but, from the video, it is evident that Cloud 9 is a multi-purpose IDE. I could develop a full-fledged project with a group of collaborators because of the cloud-based nature of the program. The cloud-based environment allows for live collaboration, which means version control is less of an issue because merge conflicts just will not happen if all the coding is done on Cloud 9. Additionally, Cloud 9 is not slow (unless you have not logged in for a few weeks and they have to reboot your virtual machine). I was able to run pretty heavy-duty genetic algorithms on Cloud 9 when I was in my Data Mining class. All things considered, programming in the future is not going to require all these base installs. We're slowly moving to a cloud-based world, and cloud-based solutions are going to be the future.
As a user, do you want to navigate to a site, download a program, install it, and then have the ability to run it? Or would you rather just navigate to a site? The answer is obvious.

When considering Software Engineering, it is always important to consider your design model. Most everyone involved in any form of Software Engineering is familiar with the waterfall model. The waterfall model, essentially, flows like this:

Software Concept <-> Requirements Analysis <-> Architectural Design <-> Detailed Design <-> Coding and Debugging <-> System Testing

My biggest problem with this model is that it is impossible to have a complete idea of the software concept at the beginning of the project. More importantly, requirements analysis is a tedious problem to break down into all of its atomic elements from the beginning. Software developers do not know the specific details of each feature from the very beginning; it is a continuous process of finding out new things about the requirements, whether from limitations you discover later, a client that did not detail everything correctly, etc. While the waterfall model is bidirectional, climbing back up the waterfall (going left in the way I detailed the model) is always described as difficult. There is later mention of an overlapping waterfall model, but, really, that is just an attempt to borrow the good parts of the spiral model and add them to the waterfall model, since it lets you go back phases and elicit more requirements/functions/tests/etc. without ruining your model (more on spiral later).

Next, I will address the Code-and-Fix model. This model is pretty funny to me because it really is the model programmers use when they first start coding. You do not really know the structure of your applications and you just build in new features without good documentation or test cases. It really is the trial-and-error and 'hope I remember' form of coding. This is a bad form of coding because you never remember everything. Unfortunately, there are a few pieces of my big project right now where I have adopted this approach, but it is slowly undergoing major change in order to meet documentation and requirements analysis standards.

Next, the spiral model is, I feel, the most popular and efficient approach to software development. This is because, as I've previously mentioned, you will never be able to elicit all your requirements at the beginning. A continually developed set of requirements is the way to go about a project. This way new features can be added as needed, and tests can be developed alongside the software as new cases or exceptions are thought up. All in all, starting on an extremely small scale and steadily growing is how most projects tend to go, for these very reasons.

The evolutionary prototyping model is interesting because it really means you are having trouble with the requirements. I feel this model is good for showing the customer what their software will look like in the end, without all the tedious backend development, because the customer may simply not know what they actually want. The process runs into time management issues, though, because a lot of time can get wasted quickly on designs that the customer rejects.

The Staged Delivery and Design-to-Schedule models are extremely similar, so I will consider them together. The heart of each of these models seems to be rooted in the same place as agile development. With deliveries staged throughout the design process, the developers never fall behind, and, if a customer does have a problem with a requirement, it can be fixed earlier rather than later. This, I feel, is the preferred method of software development and engineering. Conducting specific tests on specific pieces of software built in iterations is the best way to do it: you can set up small test cases and build features that fit only those tests, and do so rather quickly, because the problem has been broken down in an almost atomic manner.

Considering everything, the type of software development and engineering cycle you use will depend on the type of project you are conducting, but there are reasons to use and not to use each of the models. What does this mean for the future of programming, though? With cloud resources becoming easier and easier to consume, could the software development and engineering cycle become the same way? Yes, yes it can. Just the fact that you can collaborate with someone on a single file from the opposite side of the world, at the same time, without running into version control conflicts, is amazing. If I'm a customer, I can test a product's features (provided I have a slight amount of domain expertise) without having to be in a physical meeting with a software team. This means that meetings can also take place over the cloud and could be even more efficient than in the past. The future is now, and we should all embrace it.

Music listened to while blogging: Mysonne & Tech N9ne

Thursday, September 5, 2013

CS 360 Homework 7

For this post, I will be responding to a few articles

The Magical Number Seven, Plus or Minus Two
Having taken cognitive psychology classes in the past, I was very familiar with the topic at hand. The article talks about how you can only hold about seven things in your working memory at once (plus or minus two, depending on the person or situation). This feels like a pretty arbitrary schematic for detailing working memory, though. What would actually constitute one of these 'things'? A better way to describe working memory, I feel, is through the central executive, an idea posited by Alan Baddeley in 1974 to model working memory (it has been refined since then). So instead of just remembering 7 ± 2 things, you have three main areas in which you can process: the phonological loop (language), the visuospatial sketchpad (visual semantics - often abbreviated "the pad"), and the episodic buffer (short-term, episodic memory). This article refers to Miller's law; Miller's law has stood the test of time and is generally accepted, but there is plentiful evidence against the 7 ± 2 figure from outside cognitive experiments. I believe Miller's law should just be taken as a rule of thumb for working memory, not as established fact. Additionally, Miller's law did touch on things that Baddeley's working memory model covers, but Baddeley's model just feels more elegant and has a lot more backing.

Security and Privacy Vulnerabilities of In-Car Wireless Networks:
A Tire Pressure Monitoring System Case Study
It is a known issue that cars can be hacked and terrible things can occur. The most common example when talking about cars getting hacked is the tire pressure monitoring system. According to this article, these systems can be attacked from as far as 40 meters away. The authors clearly detail how they conducted their experiment, with ample graphs and descriptions. They also clearly followed the scientific method: they identified a problem, tested it with two different tire pressure monitoring systems, and reached conclusions, all clearly outlined. Honestly, I wish the conclusion to this experiment were different. It is a little nerve-racking knowing that the wireless signals bouncing around my car have barely any encryption (some signals having none).

Planning for Failure in Cloud Applications
Cloud applications. They're great, they're awesome, and they rock. You can access them from anywhere, anytime. If something seems this good, then there are bound to be some downsides. I have my own cloud-based application (Learn2Mine), and it goes down occasionally, but 95% of the time it goes down because we rely on outside resources to keep our system safe and sound. Specifically, things as simple as Google APIs can take us down if their authentication causes a hiccup, which is known to happen. Other times, Portal (at CofC) will be down, which restricts users from two-thirds of the application. This article has me reconsidering our 'failure' pages for when services or certain operations do not work. It is a lot less shocking to see a "this page is temporarily down and will be back up soon" type of page than a "500 INTERNAL SERVER ERROR - HIDE YOUR KIDS AND WIVES, WE DON'T KNOW WHAT'S HAPPENING" kind of message. These pages carry very different connotations. In my eyes, if developers have taken the time to customize an error page like that, then they probably know what they are doing. Also, I had not considered creating 'retry' blocks of code. I have a NoSQL database that I reference, and if a call fails, we assume a user may not have been created. We could probably run into some weird errors if this hiccup were to happen to our application.
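A 'retry' block can be as simple as a loop with backoff. This is a generic sketch, not Learn2Mine's actual code - the flaky_call below is invented to simulate a datastore hiccup:

```python
import time

def with_retries(operation, attempts=3, base_delay=0.1):
    """Call operation(), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Simulate a call that fails twice before succeeding.
state = {"calls": 0}
def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("datastore hiccup")
    return "user created"

print(with_retries(flaky_call))  # "user created" on the third attempt
```

Wrapping a database write this way turns a transient hiccup into a short delay instead of a half-created user.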

Music listened to while blogging: J. Cole

Wednesday, September 4, 2013

CS 360 - Requirements Engineering

For this post, I will be responding to questions/scenarios about Requirements Engineering from Sommerville's 9th edition of Software Engineering.

Using the technique suggested here, where natural language descriptions are presented in a standard format, write plausible user requirements for the following functions:
-> An unattended petrol (gas) pump system that includes a credit card reader. The customer swipes the card through the reader then specifies the amount of fuel required. The fuel is delivered and the customer's account debited. (customer = user)
  • Alert USER whenever credit card is declined
  • Alert USER whenever credit card is not read correctly
  • Alert USER if card reader is not working correctly
  • Alert USER if amount of gas in the pump system is lower than their specified gas amount
  • Stop flow of fuel whenever the USER-inputted amount has been reached
  • Start pumping fuel whenever USER initiates pumping (mechanism not specified)
  • Print out receipt when finished pumping fuel
  • Deduct funds from the account of the USER upon finishing pumping
  • Alert USER if funds are not sufficient for the purchase
-> The cash-dispensing function in a bank ATM

  • Report insufficient funds to USER
  • Report empty ATM
  • Report invalid input for cash retrieval
  • Report insufficient amount of specific denominations
  • Dispense cash whenever prompted
  • Deduct funds from account upon dispensing cash

-> The spelling-check and correcting function in a word processor

  • Underline incorrectly spelled words
  • Analyze syntactical meaning of sentence with corrected word
  • Allow USER input to add new words
  • Allow USER input to add new definitions
  • Allow USER to override corrections


Suggest how an engineer responsible for drawing up a system requirements specification might keep track of the relationships between functional and non-functional requirements. 
Using diagrams can be very helpful in these situations. Drawing up UML diagrams, for example, is an extremely fruitful way to go about this. Sequence diagrams are useful for representing functional requirements as well as some non-functional requirements. Any function that has a lifeline within a sequence diagram is a functional requirement, because it is something explicitly required by the system - an explicit, specified method. How long that function is allowed to take could be a non-functional requirement: perhaps there is a requirement that a user be notified within 5 seconds of something occurring. The sequence diagram is just one example - all types of UML diagrams have their time, place, and meaning.

Using your knowledge of how an ATM is used, develop a set of use cases that could serve as a basis for understanding the requirements for an ATM system.
This is not an exhaustive list of requirements, by any means:
  • A USER withdraws a given amount of money
  • A USER deposits a given amount of money
  • A USER inputs an incorrect pin
  • A USER deposits a check that can be read
  • A USER deposits a check that cannot be read
  • A USER inserts too many bills for a deposit (30+)
  • A USER inserts too many checks for a deposit (30+)
  • The ATM is out of cash
  • The ATM does not have sufficient funds for the USER's withdrawal
  • A bank card cannot be read
  • A bank card is left in the ATM
  • A bank card is invalid
  • A bank card is flagged as a potentially stolen card
Use cases stemming from these requirements can proceed as follows (again, not exhaustive):
1. A user goes to the ATM. The user inserts their bank card into the ATM. The user keys in their pin. The user deposits checks (no more than 30). The user confirms the check totals on the screen. The user finalizes the deposit. The bank card is returned to the user.
2. A user goes to the ATM. The user inserts a fraudulent bank card. The ATM reports that the bank card is invalid. An external party is notified (e.g., the police).
3. A user goes to the ATM. The user inserts their bank card into the ATM. The user keys in their pin. The user attempts to withdraw more cash than the ATM has left. The ATM reports that it does not have sufficient funds. 
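Use case 3, for instance, reduces to a couple of guard checks. A minimal sketch, with hypothetical names and message strings:

```python
def withdraw(atm_cash, account_balance, amount):
    """Sketch of use case 3: a withdrawal against limited ATM cash."""
    if amount > atm_cash:
        # The ATM reports that it does not have sufficient funds
        return "The ATM reports that it does not have sufficient funds"
    if amount > account_balance:
        # Covers the related requirement that the USER's funds may fall short
        return "Insufficient funds in the account"
    return "Dispensing $%d and deducting it from the account" % amount
```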

Music listened to while blogging: J. Cole & STRFKR

Monday, September 2, 2013

Subversion Repository Experience

So for this post I will be blogging about my experience setting up a Subversion repository for my CS 360 (Software Engineering) class. We are setting these repositories up for future work in the class (which will definitely be talked about in later posts), and setting up the software itself is a learning experience.

In the past, I worked with Git under the brands of GitHub and BitBucket, and Subversion (SVN) has me missing them. I have done a little work with Git repositories at the command line, but I typically used a GUI for all those troubles, because GUIs make dealing with these version control systems really simple (the specific GUI I used back then can be installed with "sudo apt-get install git-gui" on any flavor of Linux). The only issue I ever had with those GUIs was the merge tools they provided. There are other tools for that, though, and that is beside the point here.

SVN confused me at first because I'm using Linux Mint as my system. I read on a few Mint support sites and Stack Overflow that Mint has had issues with SVN in some of its releases. Luckily, 15 (Olivia) was not of that nature, and from that point it was smooth sailing to install SVN. Cloning the class repository was also straightforward and took just a few commands. The only thing that threw me for a loop was the lack of any description of the "pull" command in the introduction I was reading. Typically, I've found, committing a change is not enough to expose your results to the web. With SVN, it seems that committing is the final step of the entire process (and detailing the commit is just as easy). I want to read into what implications this has for merging files, for future reference, because with the Git tools with which I am familiar you have to commit, then merge, before doing a final push that exposes your changes to the web.

All in all, setting up the repository was a good learning experience, because I now know how to use another version control system. The difficulties were minor - this exercise was not meant to cause too much trouble.

Here is a link to the repository.

Music listened to while setting up repository and blogging: 50 Cent, Ozzy Osbourne, & Wale

Sunday, September 1, 2013

CS 360 Homework 5

For this post I will be responding to 6 articles:

An Investigation of Therac-25 Accidents (Nancy Leveson & Clark Turner)

Rather than break down each article and talk about them individually, I will break my response up into different points, citing examples and arguments from the articles. Citations will use last names (with distinctions made for the two Leveson articles).

So, really, what is going on here? Software is failing and killing people. This is a complete violation of the dependability principles I listed in my previous blog post, namely the safety and reliability principles. Who is to blame for these accidents? Developers? Users? The software itself? While we could play the blame game for hours trying to debate who really is at fault, I will end that argument early and say that these issues are everyone's fault.

In the Therac-25 accidents (Leveson & Turner) there are multiple facets from which you can approach the issue at hand. The developer shares the blame because of the terrible interface design that resulted in cryptic, meaningless error messages, and because their first few attempts to patch these life-threatening bugs failed. At least they tried, but there are fundamental flaws that Leveson & Turner detail. The issue that sticks out the most to me is the unit testing flaws. With the right unit tests, it would be much harder for bugs to creep up and rear their heads. Additionally, Leveson & Turner state that documentation should not be an afterthought. As a programmer, I wholeheartedly understand how drab annotating software and writing ample documentation can be, but I also understand good software engineering practices. You have to have good documentation, you have to document as you code, and you have to make it good enough that even Joe Schmo, who happens to be an okay programmer, can read it and know exactly what is going on with the code.

The article written by Paul Roberts (FDA:...) states that software quality is becoming a more and more emphasized interest in the eyes of the FDA. This makes perfect sense considering all of the tragedies in these articles. Roberts describes an instance of an AED containing a vulnerability that allowed unsigned updates to be pushed to the device. So anyone with working knowledge of how these devices work could potentially, silently, take the life of anyone with the device. Obviously, this is an enormous problem. It mirrors an issue I saw in a TED Talk (All Your Devices Can Be Hacked ~ Avi Rubin), which showed how many devices can be hacked to perform duties and operations that should not be allowed. For example, a car could be hacked to do anything from something as innocuous as changing the radio station all the way to manipulating the signals coming from the tire pressure gauges. The implications of software coded without considering the principles of software engineering are always terrible. To reflect upon an earlier blog post, maybe there should be some sort of certification or test before people are allowed to work on software that could threaten lives. Essentially, employers should make sure they know whom they are getting in bed with before hiring them to work on major projects. So the burden is shared with project leaders and employers whenever software does not work as expected.

There are other issues that can arise with software projects - say, a terrible amount of inefficiency in development. Take the Sentinel project, for example: so many problems arose, as Nagesh details, and those problems lie within requirements that should have been clearly outlined at the beginning of the project. This project failed on the same level as the projects mentioned previously, but the consequences here are of a different nature and caliber. In the radiation incidents the consequences involved the taking of lives, but here they tend to fall around the loss of a lot of money and time. While the radiation incidents obviously had the worse consequences, the Sentinel project still fell in the realm of inefficient and terrible software engineering practice. This very same idea is recapitulated in the spacecraft incidents. Software has been the cause of a lot of those accidents, such as with the Ariane 501. Bad software caused a lot of those failures, whether it had to do with bad programming (engines failing) or with user error (reporting in different units - imperial and metric). There were less harmful faults, such as the SOHO issue where communication was lost for 4 months. Really, all the issues being examined here caused either the loss of life or the loss of a lot of money.

All things considered, good software engineering principles lead to good software that behaves as expected - delivered at the target cost, within the target time, and with the efficiency the customer expects. There is also a significant number of user errors that are easy to gloss over (as with the units error) and that should be caught during requirements elicitation. Good requirements lead to good software.


Music listened to while reading/blogging: Ellie Goulding & Jay-Z
TV watched while blogging: It's Always Sunny in Philadelphia

Wednesday, August 28, 2013

CS 360 Homework 4

For this post, I will be consulting the Software Engineering 9th edition textbook (by Sommerville) once again. I am going to give responses to the following scenarios:

Giving reasons for your answer, suggest which dependability attributes are likely to be most critical for the following systems:
First, the following are dependability properties/attributes:

  1. Availability <- Informally, the availability of a system is the probability that it will be up and running and able to deliver useful services to users at any given time.*
  2. Reliability <- Informally, the reliability of a system is the probability, over a given period of time, that the system will correctly deliver services as expected by the user.*
  3. Safety <- Informally, the safety of a system is a judgment of how likely it is that the system will cause damage to people or its environment.*
  4. Security <- Informally, the security of a system is a judgment of how likely it is that the system can resist accidental or deliberate intrusions.*
  5. Repairability <- With the inevitability of system failures, diagnosing, accessing, and fixing the issue quickly represents good repairability. Open source software makes this easier.
  6. Maintainability <- As software systems are used and live on, new requirements and features will emerge and maintaining the old usefulness of the system as well as accommodating for the new features represents good maintainability. Making changes and adding features should not break a software system.
  7. Survivability <- The ability of a system to continuously deliver service whilst under attack and whilst parts of the system are disabled.*
  8. Error Tolerance <- User input errors are inevitable and should not break the system; good error handling is important (whether errors are fixed automatically or the user is prompted for new input).
1-4 are the four principal dimensions of dependability.
5-8 are system properties that are also dependability properties.
* represents definitions straight from the textbook and a lack of * means that I inferred my own definition from their description and detailing.


  • An Internet server provided by an ISP with thousands of customers
Internet services tend to have their own niche of specialization when it comes to dependability attributes. If a service is hosted on the web, then that service should always be available and have many measures taken for safety and security. I home in on these three because the whole purpose of hosting a service over the web is lost if that website or application is not available. Those thousands of customers would have to halt whatever they are doing, and that is something that should just never happen. The Internet, inherently, has a need for safety and security. If any sensitive information (including simple logins) is being sent to a server, then proper precautions should be taken to secure customers' data.
  • A computer-controlled scalpel used in keyhole surgery
The attribute that cries out when reading this scenario is safety. When dealing with people's health, safety always comes first - especially here, because a scalpel controlled by a computer could easily nick an artery or, really, any tissue inside the body not meant to be touched with a scalpel during the surgery. A second attribute in this same vein is error tolerance. If the user on the computer end of the scalpel hits a wrong key or makes an odd motion, then the scalpel software should know to halt whatever it is doing rather than make some rash, expedient decision.
  • A directional control system used in a satellite launch vehicle
In this situation, maintainability and reliability are the most crucial dependability attributes. Because this launch vehicle will put a satellite where people can no longer physically access it, the software had better not fail because a bold programmer decided to add a new feature toward the end of the project. Also, if this control system were reused for a different satellite at a later time, the software should not produce unforeseen results and should still be trustworthy. If an error does occur, then the software had better have a good semblance of repairability so that issues do not last long enough to ruin the entire system.
  • An Internet-based personal finance management system
When it comes to any kind of personal system (and especially one of a fiscal nature), security is always going to be the most important dependability attribute. I'll just reference the first scenario and all the security issues I brought up there rather than recapitulate ideas I have already expounded upon. Also, because the data being handled is financial, it is vital to put a lot of focus on error tolerance. No one wants to be the victim of a financial hiccup that sub-par software caused.

In a medical system that is designed to deliver radiation to treat tumors, suggest one hazard that may arise and propose one software feature that may be used to ensure that the identified hazard does not result in an accident.
For starters, this scenario sounds eerily similar to the Therac-25 incidents. One hazard that can arise is that someone could accidentally deliver a dosage far higher than needed. For example, say the software works with hardware such that you enter a dosage amount on a keypad and press down to give a dosage to a patient. What happens when a nurse tries to give a dosage of 5 (let's ignore units since this is not my domain of expertise), but that 5 gets pressed twice? Now the dosage is 55 - eleven times what was intended - and there is a potential death, a lawsuit, and probably some other things no one should have to deal with. A software feature that could prevent this is a sanity check built into the program. Say the program has access to patient information (assuming the software has ample security measures to keep people from accessing sensitive information) and knows roughly what dosage the doctor/nurse should be giving the patient. If the program stopped the treatment whenever the dosage exited a certain bound of acceptance, then the doctor/nurse could double-check and make sure they are not about to kill their patient.
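That bound-of-acceptance check is easy to sketch. Here the function name and the 25% tolerance are my own hypothetical choices:

```python
def check_dosage(entered, expected, tolerance=0.25):
    """Allow treatment only if the entered dosage is near the expected one.

    tolerance is the allowed fractional deviation (hypothetical value).
    Returns True if treatment may proceed, False if the operator must re-confirm.
    """
    low = expected * (1 - tolerance)
    high = expected * (1 + tolerance)
    return low <= entered <= high

# The double-key-press scenario: 5 becomes 55 and treatment is halted.
# check_dosage(5, 5) passes; check_dosage(55, 5) does not.
```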

Using the MHC-PMS as an example, identify three threats to this system (in addition to the threat shown in Figure 11.8). Suggest controls that might be put in place to reduce the chances of a successful attack based on these threats.
  1. Asset <- The records of every patient who is receiving or has received treatment are at risk if the hacker gets into the system. This completely breaks doctor-patient confidentiality, and that is a huge problem for our medical system.
  2. Vulnerability <- The weak password system allowed users to use children's names as passwords and did not enforce any kind of heavily secure password. Without requiring capital letters, numbers, or symbols, people will be lazy and not use secure passwords. In a regular environment, like email, this is not that big of an issue, but here we are dealing with medical records, so passwords should be forced to be secure.
  3. Exposure <- There will be ample financial loss from an issue like this. The sports star could potentially sue the hospital, other patients could take their business elsewhere, and the hospital will have to pay for new software because their current software is obviously pretty bad.
In order to fix these issues, the software engineers could enforce extremely secure passwords (e.g., at least one uppercase letter, at least one lowercase letter, at least one symbol, at least one number, and no proper nouns or passwords related to your name). Additionally, they could set up a better protocol for identifying people who claim to be related to patients before giving information out. It does not make sense that this person was able to claim he/she was related to the sports star and thereby retrieve sensitive information about that person.
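The password rules above can be sketched as a small validator. The function name, and the idea of passing user-related words in as a parameter, are my own assumptions:

```python
import re

def password_ok(password, forbidden_words=()):
    """Check a password against the policy sketched above."""
    required = [
        r"[A-Z]",         # at least one uppercase letter
        r"[a-z]",         # at least one lowercase letter
        r"[0-9]",         # at least one number
        r"[^A-Za-z0-9]",  # at least one symbol
    ]
    if not all(re.search(pattern, password) for pattern in required):
        return False
    # No proper nouns or words related to the user's name (caller supplies them)
    lowered = password.lower()
    return not any(word.lower() in lowered for word in forbidden_words)
```

For example, a password passing all four character classes still fails if it contains one of the caller-supplied forbidden words.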

Music listened to while blogging: None because the computer I was working on will not allow me to use headphones (extremely frustrating)

Monday, August 26, 2013

CS 360 Homework 3

This post will consist of me answering two questions from chapter 10 of Sommerville's 9th edition of Software Engineering

A multimedia virtual museum system offering virtual experiences of ancient Greece is to be developed for a consortium of European museums. The system should provide users with the facility to view 3-D models of ancient Greece through a standard web browser and should also support an immersive virtual reality experience. What political and organizational difficulties might arise when the system is installed in the museums that make up the consortium?
The creators of the system are likely to run into a number of issues with this software. The following write-up assumes that not all museums are included in the consortium. Museums outside it are likely to lose revenue that they would normally rake in, even though this is, what I would consider, revolutionizing museums. Some museums are funded by external sources (taxes, subsidies, etc.), but not all of them; others sell concessions to maintain themselves as well as charging a minor fee for entry. The museums within the consortium are going to make more money, or at least not lose any, but that is beside the point when considering difficulties. Additionally, a lot of what happened in ancient Greece is not exactly G-rated. There are bound to be parent groups that would pull the severely uptight parent card and become infuriated that their elementary-school child was studying Greece and stumbled upon a 3-D model of a nude statue or a mildly gory painting depicting Alexander's armies in action. Granted, this issue may arise just because of the museum itself, but I have a feeling that the people complaining about these innovations are not exactly the most rational. Also, guides within museums may lose their importance because of the existence of a virtual tour guide through the museum (and more!). This could lead to issues, because when workers at your establishment are not happy, problems can arise - anything from a complaint to a strike. Regardless, the software will be blamed as the cause, and that is no good.

You are an engineer involved in the development of a financial system. During installation, you discover that this system will make a significant number of people redundant. The people in the environment deny you access to essential information to complete the system installation. To what extent should you, as a systems engineer, become involved in this situation? Is it your professional responsibility to complete the installation as contracted? Should you simply abandon the work until the procuring organization has sorted out the problem?
I am going to start this off by saying I hope I am never in a situation like this. I would say that as a systems engineer I would be extremely involved in this situation. The system is making a significant number of people redundant, and those people are now denying access to information essential for the installation; an incomplete or incorrect installation of a financial system could leave numbers out of date in certain places, resulting in misinformation that could exacerbate other problems and create a terrible spiral for the people involved and the financial company. This falls under the responsibility of the systems engineer because of the need to comply with external regulations. So not only is involvement the 'right' thing to do, it is also the 'legal' action to take. It would be laughable if someone thought the best solution were to abandon the work and not help at all. You are involved in the development of this system, and oftentimes you have to step outside your comfort zone. Even if you were not responsible whatsoever for this situation, you would still have an obligation because you are part of the project. It may not be encapsulated in your job title or description, but, as a team member, it falls on you as much as everyone else.

Music listened to while blogging: Pharell, alt-j, & Blink-182