An Investigation of Therac-25 Accidents (Nancy Leveson & Clark Turner)
After Stroke Scans, Patients Face Serious Health Risks (Walt Bogdanich)
The Role of Software in Spacecraft Accidents (Nancy Leveson)
Who Killed the Virtual Case File? (Harry Goldstein)
IG: FBI's Sentinel program still off-track, over budget (Gautham Nagesh)
So, really, what is going on here? Software is failing and killing people. This is a complete violation of the dependability principles I listed in my previous blog post, namely safety and reliability. Who is to blame for these accidents? Developers? Users? The software itself? We could play the blame game for hours debating who is really at fault, but I will end that argument early and say that these issues are everyone's fault.
In the Therac-25 accidents (Leveson & Turner), there are multiple angles from which you can approach the issue at hand. The developer shares the blame because of the terrible interface design, which produced cryptic, meaningless error messages, and because their first few attempts to patch these life-threatening bugs failed. At least they tried, but there are fundamental flaws that Leveson & Turner detail. The one that sticks out the most to me is the flawed unit testing. With the right unit tests, it would be a lot more difficult for bugs to creep in and rear their heads. Additionally, Leveson & Turner state that documentation should not be an afterthought. As a programmer, I wholeheartedly understand how drab annotating software and writing ample documentation can be, but I also understand good software engineering practices. You have to have good documentation, you have to document as you code, and you have to make sure it is good enough that even Joe Schmo, who happens to be an okay programmer, can read it and know exactly what is going on with the code.
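To make the unit-testing point concrete, here is a minimal sketch in Python of what I have in mind. Everything in it is hypothetical: the `check_dose` function, the `DoseLimitError` class, the centigray units, and the numbers are my own inventions for illustration and have nothing to do with the actual Therac-25 code. The idea is simply that a safety check should fail with a message a human can read, and that a test should pin that behavior down.

```python
import unittest


class DoseLimitError(ValueError):
    """Raised when a requested dose falls outside the allowed range."""


def check_dose(requested_cgy: float, max_cgy: float) -> float:
    """Return the requested dose if it is allowed, else raise a readable error.

    Hypothetical helper: the names, units, and limits are illustrative only.
    """
    if requested_cgy <= 0:
        raise DoseLimitError(
            f"Requested dose must be positive, got {requested_cgy} cGy."
        )
    if requested_cgy > max_cgy:
        raise DoseLimitError(
            f"Requested dose {requested_cgy} cGy exceeds the limit of "
            f"{max_cgy} cGy; treatment was not started."
        )
    return requested_cgy


class CheckDoseTests(unittest.TestCase):
    def test_accepts_dose_within_limit(self):
        self.assertEqual(check_dose(180.0, 200.0), 180.0)

    def test_rejects_overdose_with_readable_message(self):
        with self.assertRaises(DoseLimitError) as ctx:
            check_dose(25000.0, 200.0)
        # The operator should be told what went wrong, not shown a bare code.
        self.assertIn("exceeds the limit", str(ctx.exception))

    def test_rejects_non_positive_dose(self):
        with self.assertRaises(DoseLimitError):
            check_dose(0.0, 200.0)
```

Saved to a file, the tests can be run with `python -m unittest <file>`. A sketch like this would not catch every kind of bug, but it does show how cheap it is to document expected behavior in a form that is checked on every change.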
The article written by Paul Roberts (FDA:...) states that software quality is becoming an increasingly important concern for the FDA. This makes perfect sense considering all of the tragedies in these articles. Roberts describes an AED with a vulnerability that allowed unsigned updates to be pushed to the device. So anyone with working knowledge of how these devices work could potentially, and silently, take the life of anyone relying on the device. Obviously, this is an enormous problem. This issue mirrors one I saw in a TED Talk (All Your Devices Can Be Hacked ~ Avi Rubin). The talk showed how many devices could be hacked to perform operations that should not be allowed. For example, a car could be hacked to do anything from something as innocuous as changing the radio station all the way to manipulating the signals coming from the tire pressure gauges. The implications of software written without considering the principles of software engineering are always terrible. To reflect upon an earlier blog post, maybe there should be some sort of certification or test that people must pass before they work on software that could threaten lives. Essentially, employers should make sure that they know whom they are getting in bed with before hiring them to work on major projects. So the burden is shared with project leaders and employers whenever software does not work as expected.
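The unsigned-update problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sign_update`, `verify_update`, `install_update`) are hypothetical, and I am using a shared-secret HMAC from the standard library to keep the example self-contained. A real device would more likely use public-key signatures so that the signing key never lives on the device, and I am not claiming this is how any actual AED works.

```python
import hashlib
import hmac


def sign_update(firmware: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the firmware image (vendor side)."""
    return hmac.new(key, firmware, hashlib.sha256).digest()


def verify_update(firmware: bytes, signature: bytes, key: bytes) -> bool:
    """Return True only if the tag matches the firmware image."""
    expected = sign_update(firmware, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)


def install_update(firmware: bytes, signature: bytes, key: bytes) -> str:
    """Hypothetical installer: refuse any image that fails verification."""
    if not verify_update(firmware, signature, key):
        raise PermissionError("Update rejected: signature does not match.")
    return "installed"


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"
    image = b"firmware v2"
    tag = sign_update(image, key)
    print(install_update(image, tag, key))           # genuine image
    print(verify_update(image + b"!", tag, key))     # tampered image
```

The point is that the device checks before it installs. An update that was modified in transit, or that never came from the vendor at all, fails the check and is refused.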
There are other issues that can arise with software projects. In development, for instance, there may be a terrible amount of inefficiency. Take the Sentinel project: so many problems arose, as Nagesh details. These problems trace back to requirements that should have been clearly outlined at the beginning of the project. This project failed on the same level as the projects mentioned previously, but the consequences here are of a different nature and caliber. In the radiation incidents the consequences involved the loss of lives, but here they mostly amount to the loss of a lot of money and time. While it is obvious that the radiation incidents had the worse consequences, the Sentinel project still fell in the realm of inefficient and terrible software engineering practices. The very same idea shows up again in the spacecraft incidents. Software has been the cause of a lot of those accidents, such as with the Ariane 501. Bad software caused a lot of those crashes, whether it had to do with bad programming (engines failing) or with user error (reporting in different units, imperial and metric). There were less harmful faults too, such as the SOHO issue where communication was lost for 4 months. Really, all the issues being examined here caused either the loss of life or the loss of lots of money.
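The units mix-up is the kind of error that code can make hard to commit. Below is a minimal sketch in Python, assuming the quantity being reported is an impulse; the `Impulse` class and its constructor names are hypothetical and chosen only for illustration. Every value has to say which unit it came from, and everything is stored in one unit internally.

```python
from dataclasses import dataclass
from typing import Iterable

# One pound-force second expressed in newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in newton seconds (metric)."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary so mixed inputs cannot be added raw.
        return cls(value * LBF_S_TO_N_S)


def total_impulse(readings: Iterable[Impulse]) -> Impulse:
    """Sum readings that may have originated in different unit systems."""
    return Impulse(sum(r.newton_seconds for r in readings))


if __name__ == "__main__":
    readings = [
        Impulse.from_newton_seconds(10.0),
        Impulse.from_pound_force_seconds(1.0),
    ]
    print(total_impulse(readings).newton_seconds)
```

A bare `float` says nothing about its unit, so two teams can each be "right" and still produce a wrong answer together. Forcing the unit to be named at construction turns that silent disagreement into something a reviewer can see.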
All things considered, good software engineering principles lead to good software that performs as expected. By that I mean software that was delivered at the target cost and within the target time, and that works with the efficiency the customer expects. There is also a significant number of user errors that are easy to gloss over (as with the units error) and that should be addressed during requirements elicitation. Good requirements lead to good software.
Music listened to while reading/blogging: Ellie Goulding & Jay-Z
TV watched while blogging: It's Always Sunny in Philadelphia