Redundancy and power outages

25 Jul

Scott Beale reports that many Web 2.0 websites were affected by today’s power outage at 365 Main in San Francisco. While this is unfortunate, as a systems guy I have to assume things like this are going to happen. They shouldn’t happen, but they can and they will. At the data center level, there should be multiple levels of redundancy that minimize the probability of a power outage. Things such as multiple power circuits, redundant UPSes, and generators are standard. For a complete power outage to occur, multiple systems should have to fail simultaneously. I looked for a statement from 365 Main as to what the problem was, but couldn’t find one.
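
To put a rough number on "multiple simultaneous failures", here is a back-of-the-envelope sketch. It assumes each layer (utility feed, UPS, generator) fails independently, which is optimistic, since a single event can take out several layers at once. The function name and the probabilities are made up for illustration and are not 365 Main's figures.

```python
from math import prod

def prob_all_fail(layer_failure_probs):
    """Chance that every layer fails at once, ASSUMING the layers fail independently."""
    return prod(layer_failure_probs)

# Hypothetical per-event failure probabilities, for illustration only:
# utility feed, UPS, generator.
layers = [0.01, 0.001, 0.01]
print(prob_all_fail(layers))  # a very small number under these assumptions
```

The product is tiny only as long as the independence assumption holds. Any shared component, such as a transfer switch or control system, puts a floor under it.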

The system architecture behind WordPress.com and Akismet is designed to take entire data center failures into account. For WordPress.com, we serve live content in real-time from 3 data centers (33% from each data center) and in the event of a data center failure, traffic is automatically re-routed to the 2 remaining data centers. Syncing content in real-time between multiple data centers has not been easy, but at times like this I am sure that we made the right decision.
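
The even split and automatic re-routing described above can be sketched in a few lines. This is a minimal illustration, not the actual WordPress.com routing logic: the function name and data center labels are hypothetical, and a real setup would drive this from health checks at the DNS or load balancer level.

```python
def route_shares(health):
    """Split traffic evenly across the data centers currently marked healthy.

    `health` maps a data center name to True (up) or False (down).
    """
    up = [name for name, ok in health.items() if ok]
    if not up:
        raise RuntimeError("no healthy data centers left")
    share = 1.0 / len(up)
    return {name: (share if ok else 0.0) for name, ok in health.items()}

# Hypothetical data center names, for illustration only.
normal = route_shares({"dc-a": True, "dc-b": True, "dc-c": True})
after_outage = route_shares({"dc-a": False, "dc-b": True, "dc-c": True})
```

With all three sites up, each gets one third of the traffic. With one down, the remaining two take half each, so each site needs enough spare capacity to absorb that jump.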

21 Responses to “Redundancy and power outages”

  1. Tyler Smalley July 25, 2007 at 1:14 am #

    I guess it’s a coincidence that an article was published today by 365 Main (REDENVELOPE REPORTS TWO YEARS OF CONTINUOUS UPTIME AT 365 MAIN’S SAN FRANCISCO DATA CENTER). I assume it was before their power outage.

    http://www.365main.com/press_releases/pr_7_24_07_red_envelope.html

  2. Barry July 25, 2007 at 1:36 am #

    Yeah, bad timing I guess…

  3. cerniotechcoop July 25, 2007 at 4:40 am #

    I just “outsourced” Cernio’s blog to Automattic last night. TypePad was one of my alternatives, but I ruled them out for various reasons. Today, I see that I chose wisely. :)

    (Of course, our San Francisco facility stayed online, but our utility power wasn’t affected by the explosion so our datacenter wasn’t really put to the test today. It just looked good by comparison.)

    Graham

  4. Tyler Smalley July 25, 2007 at 5:19 am #

    Looks as if the page has since been removed from their site.

  5. Barry July 25, 2007 at 5:50 am #

    Tyler,

    Yeah, I saw that. Not sure what to think about that.

  6. Kevin July 25, 2007 at 7:24 am #

    365 Main removed the press release that boasted about 2 years of 100% uptime, but it’s available on prnewswire.com.

  7. burtonator July 25, 2007 at 7:50 am #

    Yeah…. This is next on our list. Shouldn’t be too hard. We just need to get a dedicated operations team to support the extra machines.

  8. JT July 25, 2007 at 3:17 pm #

    I work at a small company and when our server went down, we didn’t have backups let alone a disaster plan. Found this article on the net http://www.smartbrief.com/news/aaaa/industryBW-detail.jsp?id=B3A11DDD-AD9B-4399-9682-6E54C82E6757
    and we’ve been backing up regularly. I didn’t even know data recovery was an option. Hopefully it won’t happen again but if it does I will be taking my server to CBL. http://www.cbltech.com

  9. Glenn Fleishman July 25, 2007 at 3:22 pm #

    I read 365 Main’s explanation of what happened, and they imply that in the event of a power outage, they switch instantly to back-up generator power. This is, of course, not possible unless the generator is running, and even then it’s a problem. They have banks of batteries, I have to assume.

    My colo here in Seattle, digital forest, has two redundant and independent battery backups that can run the facility until the diesel (the size of a semi truck) fires up.

    I have nearly 600 days of uptime on one of my colo servers, and that’s only because I had to replace a drive 600 days ago. It had 200 days before that, with downtime only due to a facility move that digital forest made to a swank new location.

    (I have no financial interest in digital forest. I just dig ‘em.)

  10. Jonathan Moore July 25, 2007 at 4:24 pm #

    365 main’s power and colo power

    The power redundancy at 365 Main is based on four or six flywheel UPS systems directly connected to a generator. This setup is really quite nice. It is not clear what happened and I look forward to finding out. The fact is that not all of the colo went down. I have a computer located there and I experienced zero downtime through this.

    I think the more general problem for colo centers is that power requirements are going up: more CPU density and hotter CPUs. They may not be engineered to handle the load.

    Making a data center stay up is also hard. This is not the first time 365 Main went down. (The last time was due to the failure of a fire sensor that caused an automated shutdown of the power.) I have also been colocated in other facilities that have had full outages (generator failures) or partial outages (power distribution line failures). This stuff is hard. It may be that 365 Main is not as good as others, still to be seen, but expecting 100% uptime in one data center is a bad gamble.

  11. DataGuy35 July 25, 2007 at 4:48 pm #

    Try the guys at i/o Data Centers. They have redundant power that can go on for days and days. Check out this article: You said it, Rich. These companies gotta have redundant sites. Check this out: http://www.bizjournals.com/phoenix/stories/2007/07/02/story15.html?from_rss=1

  12. Zaiaku July 28, 2007 at 1:51 pm #

    The outages were a problem for a lot of people, especially if you use TypePad and a few other places. It didn’t affect me any though. Thank goodness. Thinking about getting a dedicated server myself.

  13. rivafauziah August 26, 2007 at 9:24 am #

    Thank |barry|Key Master|

  14. Mitesh September 4, 2007 at 11:15 am #

    i feel they switch instantly to back-up generator power and assume it was before their power outage.

  15. Mitesh Rami October 9, 2007 at 6:04 am #

    I feel that without multiple simultaneous system failures it would not happen. Protective measures are taken so that it never happens. It may happen once in a blue moon, but I feel it’s rare.

  16. Nappy Rash October 8, 2008 at 6:54 am #

    I spend 6 months of the year in the Bahamas and we have weekly outages. It’s a nightmare, but we get to deal with it with candles etc. :)

Trackbacks/Pingbacks

  1. Power Outages In San Francisco Bring Down Major Websites | Laughing Squid - July 25, 2007

    [...] sys admin (and former Laughing Squid sys admin) Barry Abrahamson has a great write-up on why data centers should have better power redundancy and what they have done with WordPress to help it survive possible outages like this. Related [...]

  2. Bay Area Web 2.0 has a Near Death Experience Today « Frank for San Leandro - July 25, 2007

    [...] http://barry.wordpress.com/2007/07/25/power-redundancy/ [...]

  3. Transnets » Blog Archive » SF: le web en panne - July 25, 2007

    [...] local geeks have a hard time accepting (like Barry from WordPress) that a center of this kind was unable to trigger the multiple [...]

  4. Top Posts « WordPress.com - July 25, 2007

    [...] Redundancy and power outages Scott Beale reports that many Web 2.0 websites were affected by today’s power outage at 365 Main in San […] [...]

  5. Le nouveau datacenter de Wordpress.com - February 17, 2009

    [...] more security, WordPress blogs are hosted in three different data centers; they have built this concern into the core of their architecture because historically they [...]
