Oh boy, the fireworks fiasco at the Space Needle last night reminded me of one of my recent projects – “the horror”. In case you were one of those folks who slept through the big non-event last night, there was a tiny glitch with the fancy computerized pyrotechnics display on our local phallic symbol, er…tower of civic pride (just what kind of building is that thing?).
Anyway, the show sort of sputtered to a start, lasted for about 30 seconds, and then as the music swelled dramatically…nothing. The fireworks stopped completely. It was hilarious to watch the confusion on the faces of the television hosts for the show. No one had any idea what the problem was and the band just played on. About a minute later the fireworks sputtered to life again – now badly out of sync with the music. They continued for another minute or so – and died again!
At this point I was rolling on the couch at home laughing out loud. At first I thought the situation was funny, but then I realized I was laughing because I was glad it wasn’t me who was responsible for putting on that fireworks display. You know, kind of like when you see someone whack their funny bone. It hurts like hell, but you laugh because you’re glad it isn’t you.
Or maybe I was just losing my grip on reality…
I can only imagine the stress and panic of the technicians as they frantically tried to understand what the problem was and then figure out what to do about it. I’ve had to do product demos in high pressure situations before. And when they go bad (and I do mean bad), I’m sure the feeling is similar.
I heard this morning that the company responsible for putting on the fireworks display had successfully done a complete dry run earlier in the day without a problem. I would have expected that much at the very least. But in a situation as mission-critical as theirs, a simple dry run is not enough. They need to have some backup plans, some fail-safes. Something other than just punching the 1500 ignition buttons in a panic.
Here are a few suggestions:
- Create a failover system. If it becomes apparent that the data on system ‘A’ is corrupt, then switch to system ‘B’. This could be a very sophisticated technical solution, or somebody could just have a backup laptop with the same software installed and a backup copy of the data. For a $100,000 job, the purchase of a $1000 backup machine seems like it would be well worth the expense.
- Create backups. OK, so the data is corrupted. You should be able to simply restore a backup and get up and running again. Where were the backups? This is just a common-sense practice. Maybe they did have backups, but somehow I doubt it. Next time buy a USB drive – they’re cheap.
- Use a Mac instead.
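The failover idea doesn’t have to be fancy. Here’s a minimal sketch of what I mean (the names and the checksum scheme are my own invention, not anything the fireworks company actually uses): keep a known checksum of the show data, validate each copy against it, and run from the first copy that checks out.

```python
import hashlib

def load_show_data(sources, expected_sha256):
    """Try each (name, data) source in order; return the first copy
    whose checksum matches, or raise if every copy is corrupt."""
    for name, data in sources:
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return name, data
    raise RuntimeError("All copies corrupt -- time to panic")

# Hypothetical example: primary copy corrupted, backup laptop intact.
good = b"cue 001: launch shell battery A"
checksum = hashlib.sha256(good).hexdigest()
sources = [("system A", b"cue 0\x00garbled"), ("system B", good)]
used, data = load_show_data(sources, checksum)
```

Here system ‘A’ fails the checksum and the loader quietly falls back to system ‘B’ – a few lines of code versus a minute of dead air on live television.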
Don’t get lulled into a false sense of security. It struck me as interesting to note that this is the same company that has been doing the same display for the last 14 years. I imagine the process has become fairly routine for them by now. If they were constantly trying to improve their process and eliminate weaknesses, they probably wouldn’t have these sorts of problems. Instead, they were very likely just doing what worked the last time. At least that seems reasonable – I could see myself getting caught in that trap.
We do it often enough on Agile projects. Teams will get into a rhythm and just do the same thing each sprint without doing anything to really inspect and improve their process. You know what they say – if it ain’t broke, don’t fix it. Of course, if your process, whether it is pyrotechnics or software, looks the same the fifth time as it did the first time, then you aren’t really improving your process (or product) at all.
So as I watched the fireworks show trip and sputter along last night, I looked on with a sense of both humor and dread. Part of me was indulging that, “Wouldn’t it be funny if…” notion. But there was the other side of me that was thinking, “Those poor bastards…”
Happy New Year folks. I hope the rest of the year goes a little smoother…