Programming is hard by Stephan Schmidt

Unit Testing, TDD and the Shuttle Disaster

I was reading Feynman's report on the Shuttle disaster, “Appendix F - Personal observations on the reliability of the Shuttle”, and I was freaked out by the similarities between military engine development and bottom-up, test-driven development. There is a short passage in the report about how military engines are built:

The usual way that such engines are designed (for military or civilian aircraft) may be called the component system, or bottom-up design. First it is necessary to thoroughly understand the properties and limitations of the materials to be used (for turbine blades, for example), and tests are begun in experimental rigs to determine those. With this knowledge larger component parts (such as bearings) are designed and tested individually. As deficiencies and design errors are noted they are corrected and verified with further testing. Since one tests only parts at a time these tests and modifications are not overly expensive. Finally one works up to the final design of the entire engine, to the necessary specifications. There is a good chance, by this time that the engine will generally succeed, or that any failures are easily isolated and analyzed because the failure modes, limitations of materials, etc., are so well understood. There is a very good chance that the modifications to the engine to get around the final difficulties are not very hard to make, for most of the serious problems have already been discovered and dealt with in the earlier, less expensive, stages of the process.

This sounds a lot like unit testing to me: writing a small part of an application, testing that part, then integrating it. And even if this is not TDD (is that even possible with hardware?), it sounds similar, and it is the opposite of writing all the code first and the tests last.
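To make the analogy concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the names `blade_stress`, `is_blade_safe` and `engine_ok`, the formula, and the numbers are assumptions, not real engineering. The only point is the order of work: the smallest part gets its own test before anything is built on top of it.

```python
import unittest


# Hypothetical "material" level: a toy formula, not real physics.
def blade_stress(rpm: float, load_factor: float) -> float:
    """Toy stress estimate that grows linearly with speed and load."""
    return rpm * load_factor


# Hypothetical "component" level, built on the already tested function.
def is_blade_safe(rpm: float, load_factor: float, limit: float) -> bool:
    """A blade counts as safe while its stress stays below the limit."""
    return blade_stress(rpm, load_factor) < limit


# Hypothetical "engine" level: fine only if every blade is safe.
def engine_ok(rpm: float, load_factors: list, limit: float) -> bool:
    return all(is_blade_safe(rpm, lf, limit) for lf in load_factors)


class BladeStressTest(unittest.TestCase):
    def test_stress_scales_with_rpm_and_load(self):
        self.assertEqual(blade_stress(1000, 0.5), 500.0)


class BladeSafetyTest(unittest.TestCase):
    def test_below_limit_is_safe(self):
        self.assertTrue(is_blade_safe(1000, 0.5, limit=600))

    def test_at_limit_is_not_safe(self):
        self.assertFalse(is_blade_safe(1000, 0.5, limit=500))


class EngineTest(unittest.TestCase):
    def test_one_bad_blade_fails_the_engine(self):
        self.assertFalse(engine_ok(1000, [0.1, 0.9], limit=600))
```

Saved to a file, this can be run with `python -m unittest <file>`. If `EngineTest` fails while the two blade tests pass, the fault is most likely in how the parts are put together rather than in the parts themselves, which is the kind of easy isolation Feynman describes.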

Compare this approach with the way NASA designed the Shuttle Main Engine:

The Space Shuttle Main Engine was handled in a different manner, top down, we might say. The engine was designed and put together all at once with relatively little detailed preliminary study of the material and components. Then when troubles are found in the bearings, turbine blades, coolant pipes, etc., it is more expensive and difficult to discover the causes and make changes. For example, cracks have been found in the turbine blades of the high pressure oxygen turbopump. Are they caused by flaws in the material, the effect of the oxygen atmosphere on the properties of the material, the thermal stresses of startup or shutdown, the vibration and stresses of steady running, or mainly at some resonance at certain speeds, etc.? How long can we run from crack initiation to crack failure, and how does this depend on power level? Using the completed engine as a test bed to resolve such questions is extremely expensive. One does not wish to lose an entire engine in order to find out where and how failure occurs. Yet, an accurate knowledge of this information is essential to acquire a confidence in the engine reliability in use. Without detailed understanding, confidence can not be attained.

A further disadvantage of the top-down method is that, if an understanding of a fault is obtained, a simple fix, such as a new shape for the turbine housing, may be impossible to implement without a redesign of the entire engine.

This sounds a lot like traditional, up-front software development, with the same problems. When errors occur, “are they caused by flaws in the material [...]” or where do they come from? It’s hard to decide which component is the root cause of an error in a complex system. Strikingly, Feynman names a second disadvantage of top-down design that carries over as well: problems that arise may be too big to fix in a conventional way, so the whole engine architecture needs to be redesigned. This happens with software too. If you do too much up-front architecture, you may end up with an architecture that doesn’t fit your problems (usually this means a long and difficult rewrite, something you should only do as a last resort). Going bottom-up, ideally with Test Driven Development (TDD), you are far less likely to end up with the wrong architecture (given merciless small refactorings and course corrections along the way, of course). And an architecture that was driven by unit testing usually leaves you flexible enough to react to the changes you meet on the way (scalability, performance, etc.).
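A small Python sketch of what those small refactorings look like in practice. The function names and the thrust example are hypothetical, chosen only to stay with the engine theme. The test describes what the component must do, not how it does it, so the internals can be replaced without touching anything else.

```python
# Hypothetical component: total thrust from per-engine readings.
def total_thrust_v1(readings):
    """First version: explicit loop."""
    total = 0.0
    for reading in readings:
        total += reading
    return total


def total_thrust_v2(readings):
    """Refactored version: same contract, simpler internals."""
    return float(sum(readings))


def check_contract(total_thrust):
    """Behavior every implementation has to satisfy."""
    assert total_thrust([]) == 0.0
    assert total_thrust([1.5, 2.5]) == 4.0
    assert total_thrust([10]) == 10.0


# Both versions pass the same checks, so swapping one for the other is safe
# as far as these tests can tell.
check_contract(total_thrust_v1)
check_contract(total_thrust_v2)
```

The design choice here is that the checks take the implementation as a parameter. That keeps the tests independent of any one version and makes a replacement cheap to verify.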

Compared side by side, the success of conventional engine development and the problems of the Shuttle show convincingly that developing in small steps, component by component, with merciless testing results in easy-to-debug components with a low error rate. You should test more.

Thanks for listening. As ever, please do share your thoughts and additional tips in the comments below, or on your own blog (I have trackbacks enabled).

About the author: Stephan Schmidt is currently a team manager at ImmobilienScout24 in Berlin. Stephan has worked as a head of development and CTO. He has used a lot of different technologies in the last 20 years, including Java, Rails and Python. Stephan's main field of interest is maintainability and productivity in software development. All views are only his own.

Comments

That’s a really excellent, thought-provoking, analogy. Thanks for sharing it.

Especially interesting (to me), is the issue of the negative implications of ending up “with an architecture which doesn’t fit your problems”. I used to get that a lot with my programs, before I started practicing TDD.

Back then, I used to make my design bulletproof (http://javadots.blogspot.com/2008/11/bullet-proof.html) in order to keep me away from mistakes. This resulted in rigidness, inflexibility, lack of interoperability, and downright degraded productivity.

TDD (and even unit testing, for that matter) made these problems go away.

stephan

@Itay: The Feynman article was thought provoking for me too, and then a flash hit me: This is like unit testing. I was electrified.

@Stephan: Right! I went, more or less, through the same feelings as I read your post.

Stephan,

Great post. This isn’t limited to the software development and astrophysics worlds! ;-)

Jason M.

“CALVIN: How do they know the load limit on bridges, Dad?
DAD: They drive bigger and bigger trucks over the bridge until it breaks. Then they weigh the last truck and rebuild the bridge.”

stephan

@Matt: Thanks

@Jason: I can remember that one :-)

Maik

Good post. I believe most “modern” software design methods are in fact very old and well understood - just not in the field of software development. I remember a similar moment not too long ago: someone commented on a well-known Scrum blog that all of this stuff (small empowered teams, quick daily meetings, a backlog with well-defined tasks, estimation and commitment) was exactly how, according to his father, engine maintenance was done at an Air Force base 40 years ago (and probably still is today). There are certain patterns that just emerge when dealing with complex systems. It’s ironic that we as software developers spend so much time reinventing the wheel (process-wise) when that is exactly what we try to avoid in our everyday (programming-related) work.
Oh, btw: Did you know the whole “design patterns” concept was originally taken from an architecture book? Interesting how much can be learned from other disciplines if we just keep an open mind. We are not as special as we might like to think ;)

nl0tz

I’m afraid that wouldn’t have helped the 1999 Mars missions: the Mars Polar Lander most likely crashed during landing because its descent engines shut off prematurely (about 130 feet above ground), and the Mars Climate Orbiter was lost because different subsystems were using different units of measurement (imperial vs. metric).

The moral of the story: You should communicate more.

stephan

@Maik: Thanks for the insight. “Oh, btw: Did you know the whole “design patterns” concept was originally taken from an architecture book?” Yes, I did, and I found that interesting, but it didn’t really catch on with architects, did it? And it seems that the pattern hype got too big as well (I’m not a pattern hater like some in the blogosphere, though).

@nl0tz: Unit testing shouldn’t free you from integration testing.

Excellent post, Stephan. Over the years I have come across numerous references to engineering and architecture overlapping with software development methodologies: UML, design patterns and now unit testing. I’m sure NASA have learnt from this and will use a bottom-up approach for the new Shuttle replacement program.