Scott Beale reports that many Web 2.0 websites were affected by today’s power outage at 365 Main in San Francisco. While unfortunate, as a systems guy I have to assume things like this are going to happen. They shouldn’t happen, but they can and they will. At the data center level, there should be multiple levels of redundancy that minimize the probability of a power outage. Things such as multiple power circuits, redundant UPSes, and generators are standard. For a complete power outage to occur there should have to be multiple simultaneous system failures. I looked for a statement from 365 Main as to what the problem was, but couldn’t find one.
The system architecture behind WordPress.com and Akismet is designed to take entire data center failures into account. For WordPress.com, we serve live content in real-time from 3 data centers (33% from each data center) and in the event of a data center failure, traffic is automatically re-routed to the 2 remaining data centers. Syncing content in real-time between multiple data centers has not been easy, but at times like this I am sure that we made the right decision.