Redundancy and power outages

Scott Beale reports that many Web 2.0 websites were affected by today’s power outage at 365 Main in San Francisco. That is unfortunate, but as a systems guy I have to assume things like this are going to happen. They shouldn’t happen, but they can and they will. At the data center level, there should be multiple levels of redundancy that minimize the probability of a power outage. Things such as multiple power circuits, redundant UPSes, and generators are standard. For a complete power outage to occur, there would have to be multiple simultaneous system failures. I looked for a statement from 365 Main as to what the problem was, but couldn’t find one.
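
To put a rough shape on the "multiple simultaneous failures" point, here is a minimal sketch of the arithmetic. The per-layer failure probabilities are made-up illustrative numbers, not measurements from 365 Main or any other facility, and the function name is hypothetical. It also assumes the layers fail independently, which is exactly the assumption that breaks when one event (say, a bad transfer switch) takes out several layers at once.

```python
from math import prod


def outage_probability(layer_failure_probs):
    """Chance that every redundant layer fails at the same time.

    Assumes the layers fail independently of each other, so the
    individual probabilities can simply be multiplied together.
    """
    return prod(layer_failure_probs)


# Illustrative (assumed) chances that each layer fails when called upon.
layers = {
    "utility power": 0.01,
    "UPS": 0.01,
    "generator": 0.05,
}

# With these assumed numbers the result is roughly 5 in a million.
print(outage_probability(layers.values()))
```

Each extra independent layer multiplies the odds down, which is why a full outage suggests either several things went wrong together or the layers were not as independent as they looked on paper.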

The system architecture behind WordPress.com and Akismet is designed to take entire data center failures into account. For WordPress.com, we serve live content in real time from three data centers (roughly a third from each), and in the event of a data center failure, traffic is automatically re-routed to the two remaining data centers. Syncing content in real time between multiple data centers has not been easy, but at times like this I am sure that we made the right decision.
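
As a rough illustration of the idea (not the actual WordPress.com routing code, which is not described here), the sketch below splits traffic evenly across whichever data centers are still up. The data center names and both helper functions are hypothetical, and the real mechanism could just as well live in DNS or a load balancer.

```python
import hashlib

# Hypothetical data center names, used only for illustration.
DATA_CENTERS = ["dc-a", "dc-b", "dc-c"]


def traffic_shares(data_centers, down=frozenset()):
    """Split traffic evenly across the data centers that are still up."""
    up = [dc for dc in data_centers if dc not in down]
    if not up:
        raise RuntimeError("no data center available")
    return {dc: 1 / len(up) for dc in up}


def route(request_id, data_centers, down=frozenset()):
    """Pick a healthy data center for a request by hashing its id."""
    up = [dc for dc in data_centers if dc not in down]
    if not up:
        raise RuntimeError("no data center available")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return up[int.from_bytes(digest[:8], "big") % len(up)]


# Normal operation: each data center takes about a third of the traffic.
print(traffic_shares(DATA_CENTERS))

# One data center fails: the two survivors each take half.
print(traffic_shares(DATA_CENTERS, down={"dc-b"}))
```

The routing part is the easy half. The harder half, as noted above, is keeping content in sync so that any surviving data center can serve any request.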

21 responses to “Redundancy and power outages”

  1. […] sys admin (and former Laughing Squid sys admin) Barry Abrahamson has a great write-up on why data centers should have better power redundancy and what they have done with WordPress to help it survive possible outages like this. Related […]

  2. I guess it’s a coincidence that an article was published today by 365Main (REDENVELOPE REPORTS TWO YEARS OF CONTINUOUS UPTIME AT 365 MAIN’S SAN FRANCISCO DATA CENTER). I assume it was before their power outage.

    http://www.365main.com/press_releases/pr_7_24_07_red_envelope.html

  3. Yeah, bad timing I guess…

  4. I just “outsourced” Cernio’s blog to Automattic last night. TypePad was one of my alternatives, but I ruled them out for various reasons. Today, I see that I chose wisely. 🙂

    (Of course, our San Francisco facility stayed online, but our utility power wasn’t affected by the explosion so our datacenter wasn’t really put to the test today. It just looked good by comparison.)

    Graham

  5. Looks as if the page has since been removed from their site.

  6. Tyler,

    Yeah, I saw that. Not sure what to think about that.

  7. 365 Main removed the press release that boasted about 2 years of 100% uptime, but it’s available on prnewswire.com.

  8. […] local geeks have a hard time accepting (like Barry from WordPress) that a center of this kind was not able to trigger the multiple […]

  9. Yeah…. This is next on our list. Shouldn’t be too hard. We just need to get a dedicated operations team to support the extra machines.

  10. I work at a small company, and when our server went down we didn’t have backups, let alone a disaster plan. I found this article on the net, http://www.smartbrief.com/news/aaaa/industryBW-detail.jsp?id=B3A11DDD-AD9B-4399-9682-6E54C82E6757
    and we’ve been backing up regularly since. I didn’t even know data recovery was an option. Hopefully it won’t happen again, but if it does I will be taking my server to CBL. http://www.cbltech.com

  11. I read 365 Main’s explanation of what happened, and they imply that in the event of a power outage, they switch instantly to back-up generator power. This is, of course, not possible unless the generator is running, and even then it’s a problem. They have banks of batteries, I have to assume.

    My colo here in Seattle, digital forest, has two redundant and independent battery backups that can run the facility until the diesel (the size of a semi truck) fires up.

    I have nearly 600 days of uptime on one of my colo servers, and that’s only because I had to replace a drive 600 days ago. It had 200 days before that, with downtime only due to a facility move that digital forest made to a swank new location.

    (I have no financial interest in digital forest. I just dig ’em.)

  12. 365 main’s power and colo power

    The power redundancy at 365 Main is based on four or six flywheel UPS systems connected directly to a generator. This setup is really quite nice. It is not clear what happened, and I look forward to finding out. The fact is that not all of the colo went down. I have a computer located there and I experienced zero downtime through this.

    I think the more general problem for colo centers is that power requirements are going up: more CPU density and hotter CPUs. The facilities may not be engineered to handle the load.

    Making a data center stay up is also hard. This is not the first time 365 Main went down. (The last time was due to the failure of a fire sensor, which caused an automated shutdown of the power.) I have also been colocated in other facilities that have had full outages (generator failures) or partial outages (power distribution line failures). This stuff is hard. It may be that 365 Main is not as good as others, which remains to be seen, but expecting 100% uptime in one data center is a bad gamble.

  13. Try the guys at i/o Data Centers. They have redundant power that can go on for days and days. You said it, Rich. These companies gotta have redundant sites. Check out this article: http://www.bizjournals.com/phoenix/stories/2007/07/02/story15.html?from_rss=1

  14. […] Redundancy and power outages Scott Beale reports that many Web 2.0 websites were affected by today’s power outage at 365 Main in San […] […]

  15. The outages were a problem for a lot of people, especially if you use TypePad and a few other places. It didn’t affect me any, though. Thank goodness. Thinking about getting a dedicated server myself.

  16. Thanks, Barry.

  17. I feel they switch instantly to back-up generator power and assume it was before their power outage.

  18. I feel that without multiple simultaneous system failures it would not happen. Protective measures are taken so that it never happens. It may happen once in a blue moon, but I feel it’s rare.

  19. I spend 6 months of the year in the Bahamas and we have weekly outages. It’s a nightmare, but we get to deal with it with candles etc 🙂

  20. […] for more security, WordPress blogs are hosted in three different data centers; they have built this concern into the core of their architecture because historically they did not […]
