April 01, 2003
High Availability Programming
I can remember reading a while back about how one of the Big Three American automobile makers was looking to drastically change their car designs. No longer would there be a steering column, or pedals like we're currently used to. The need to have a left or right hand only driving experience would go away as well, since the wheel could be transferred to either side (as could the pedals). Looking now I can't find the article or pictures about it, but the technology behind it was really interesting. Sensors to read the rotation of the wheel, pressure sensors on the pedals, and all sorts of other neat changes. The result would be a drastic cut in manufacturing cost, and hopefully a cheaper car for us to purchase with fewer parts that could malfunction.
Once the "wow! cool!" factor dissipated, it began to sink in that this isn't the coolest idea yet. This would change the state of a car from a mechanical system, to that of a total electronic device, meaning it would be controlled by software. Looking at the state of software development today, I'd find it near impossible for me to purchase such a system given my understanding of software. For example, what happens once an EM pulse disables your car while driving at high speeds? What about a software crash? Stack overflow? Out of memory?
At the time, I wanted to do a bit more research into how the industry planned to accomplish this, but details were limited to nonexistent. I ended my search with the concept of 'look into it more when a product is released' as the information would be available then and promptly put this in the back of my mind to forget.
The other day, I had a chance to listen to Dr. Rod Chapman discuss high reliability programming and SPARK. From the start I realized this would be a marketing speech with little bits of interesting ideas/concepts towards software reliability. He currently works for a company called Praxis Critical Systems, where SPARK was developed and is deployed for use in embedded systems. Typical application uses he threw out were Boeing 777's flight controls, Formula 1 racers, pacemakers, and various other embedded devices.
First, about the presentation. Dr. Chapman is from the UK, and as such has an accent that is fairly strong, yet clear enough to understand. His use of local phrases, though, was lost on many in the audience who are not used to such mannerisms. He presented everything at a remarkably fast pace. He threw out jokes that often took a whole sentence to register, followed by his own commentary on how badly they had flopped.
His talk started off by discussing what the problems were with programming in general, with verification being the ultimate goal for reliable software. If you can verify that something works as stated and only as stated, you have a piece of reliable software. The first problem he noted was not related to programming at all, but rather to requirements, and more specifically to the use of English, a language which is by nature very ambiguous. This ambiguity is not good for creating a strict and formal set of guidelines to follow that can be verified. It shows in the way a single word can have multiple definitions depending on context.
With English being a poor choice, what about a modeling language such as UML? As Dr. Chapman put it, the Unified Muddling Language is itself too ambiguous. My best example, by far, is the fact that no two of the designers will give you the same explanation of what an aggregation relationship is (see UML Distilled, Fowler). Dr. Chapman went further to state that Object Oriented Programming is heading the way of UML, with the use of polymorphism and exceptions. I can agree on polymorphism, as the ambiguity there is by design, but exceptions are one area of programming that I think should be used more. Granted, they add a new level of complexity to the code (and this was his point), but error condition handling is in my mind essential. His point is that exceptions add too much complexity, thus reducing the verifiability of the code.
So what methods did he like? Mathematical formulas for one. This follows in line with statements made by Alan Turing in 1948. Then there is the concept of "small steps" towards verification, with constant peer review of everything, the idea being that bugs fixed early in development cost far less than those discovered later in the development lifecycle, namely in testing. One example of this thinking that I've experienced is the roundtable airing of programming myths, where each member of the development team is free to state their beliefs about programming. Towards the end of the session each point is picked apart by everyone, and those that are left are probably safe to assume. This helped greatly in avoiding wasted effort, for example on building a system that shared too much state data and would have degraded overall performance.
His biggest idea for providing software verification (and as such, high reliability) was the process of selecting the right tool for the job. Typically this does not include any technology that is the current rage/trend/whatever. For example, Java is not going to be on his technology list any time soon for the simple reason that it has not been time tested yet. I find that I agree with this line of thinking. When a system is responsible not just for preserving data, but also for the lives of many, I would find it much more reassuring to know that many of the bugs in the development tools have been fixed. The amount of space a JVM takes up also makes it impractical for an embedded system requiring microsecond response times (e.g. the Eurofighter).
The rest of the talk went on to discuss how SPARK works around these problems using a few interesting techniques. First, it's based on Ada, a programming language originally designed for military use that has since spread into a lot of other areas. Second, it uses a "design by contract" system, forcing descriptions of how variables are used and altered by functions. Third, there is no dynamic memory, which does away with the stack overflow and pointer errors so common today.
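To make the second and third points a bit more concrete for myself, here is a rough sketch in Python. It is not SPARK and not anything Dr. Chapman showed: SPARK checks its contracts statically, before the program ever runs, while Python can only imitate the idea with runtime assertions. The class name BoundedStack and its methods are my own invention for illustration. The storage is allocated once up front and never grows, and each operation states what it requires and what it guarantees.

```python
class BoundedStack:
    """Hypothetical example: a stack whose storage is allocated once."""

    def __init__(self, capacity):
        # Precondition: the size must be known and positive up front.
        assert capacity > 0, "precondition: capacity must be positive"
        self._items = [0] * capacity  # allocated once, never resized
        self._count = 0

    def is_full(self):
        return self._count == len(self._items)

    def is_empty(self):
        return self._count == 0

    def push(self, value):
        # Contract: the caller must not push onto a full stack.
        assert not self.is_full(), "precondition: stack must not be full"
        old_count = self._count
        self._items[self._count] = value
        self._count += 1
        # Postcondition: exactly one more item is stored.
        assert self._count == old_count + 1

    def pop(self):
        # Contract: the caller must not pop from an empty stack.
        assert not self.is_empty(), "precondition: stack must not be empty"
        self._count -= 1
        return self._items[self._count]
```

Pushing onto a full stack here fails loudly with an AssertionError instead of quietly asking for more memory. The difference with a tool like SPARK, as I understood the talk, is that such a violation would be caught by analysis rather than at runtime.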
His third point brought up an interesting shift in thinking though, which made me question how useful these ideas are for non-embedded systems. When developing for an embedded device, there is typically a maximum amount of memory that can be used, a maximum number of controllers to keep track of, and in general a very limited number of variables to concern yourself with, all known at design time. In the case of a web server (for example), limiting the maximum number of connections could work, but it would really make your server pretty non-functional in the long run... or it would at least force random guessing at site popularity on a per-purchase basis.
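As a sketch of what that web server limit might look like, assuming a made-up design-time constant MAX_CONNECTIONS and a hypothetical ConnectionTable class (neither comes from the talk or from any real server), the table below has a fixed number of slots and simply refuses a client once they are all taken.

```python
MAX_CONNECTIONS = 4  # hypothetical limit, fixed at design time


class ConnectionTable:
    """Hypothetical example: a fixed set of connection slots."""

    def __init__(self, limit=MAX_CONNECTIONS):
        self._slots = [None] * limit  # one slot per allowed connection

    def accept(self, client_id):
        # Use the first free slot; return its index, or None if full.
        for index, slot in enumerate(self._slots):
            if slot is None:
                self._slots[index] = client_id
                return index
        return None  # table full: the client is turned away

    def release(self, index):
        # Free the slot so a later client can use it.
        self._slots[index] = None

    def active(self):
        return sum(1 for slot in self._slots if slot is not None)
```

The behaviour is completely predictable, which is the appeal for verification, but the refusal branch is exactly the trade-off I am worried about: pick the limit too low and a popular site starts turning visitors away.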
Looking into the current state of development though, I see none of this. The Open Source belief is that more eyes can look over the source, hence finding more of the bugs (etc etc) and improving reliability. This is great in theory, but how often is this actually done? More importantly, how often is it that a system is truly understood by others? How many reliability or verification tests have been run on Open Source software? With the results published and distributed? How about distributed with the software? I'm sure testing has happened, but finding the links to it can be quite frustrating. Is there typically design documentation (beyond source code) that could be followed? Mathematical proofs? After having worked on a few Open Source projects I can honestly say that I don't think so, but I don't know.
Eric Raymond wrote an article entitled "Open Source: Programming as if Quality Mattered" (Google Cache only as I can't find a valid link) in January 1999. Essentially he claimed that high reliability is what will cause Open Source to triumph over the traditional closed source application. I have yet to see the proof of Open Source providing a higher level of reliability. Much of the time I find my Open Source applications crashing as much as any commercial product.
So how can Open Source be made more reliable? I haven't yet figured out any ideas or suggestions, but I do believe the current idea/belief is broken.
Posted by Dan at April 1, 2003 03:08 PM