As I mentioned in the previous blog entry, Michael Nygard’s Release It! offers advice on how to deploy and run systems. But the most important point is that it changes the way people think about system development. There is a huge gap between developing a piece of code that runs in a lab and having a complete system that can run for days (weeks, months, years) without glitches and withstand all kinds of extreme and often unpredictable conditions. The more important and critical the system is, the more difficult and costly it is to ensure that level of reliability and availability.
The book shows some interesting what-could-go-wrong scenarios and outlines some solutions and patterns for addressing them, but its main value is that it illustrates how apparently unrelated things can come together and bring the whole system down. Check the recent Attack of Self-Denial post on Michael Nygard’s blog.