SeatGeek Blog: What An Earthquake Does To Page Load Time

dorianj · on Aug 23, 2011

Also interesting: a Sun engineer uses dtrace to show HDD latency spiking after he yells at his disks

http://www.youtube.com/watch?v=tDacjrSCeq4

nkassis · on Aug 23, 2011

Probably the same thing happened here. Vibration will cause latency in disk reads. If the HD platters are jumping around, the read head will have a hard time reading them. That probably explains the spike. Time to buy some SSDs ;p

tseabrooks · on Aug 23, 2011

Tried to upvote... and missed the arrow, landing squarely on the downvote. I wish I could recast my vote after the fact. Sorry.

brk · on Aug 23, 2011

Disappointingly lacking in details.

Why did page load time go up during the quake? Was there a fiber fault that took a few seconds to be routed around? Did the vibrations cause HDDs to temporarily suspend?

This could have some interesting data behind it, but as it is the article doesn't even have conclusive proof that the earthquake did cause this outage.

josegonzalez · on Aug 23, 2011

As for the reason, you'd have to ask someone who works on AWS. For obvious reasons, we don't have direct access to the Amazon datacenter.

As far as conclusive proof is concerned, no, we can't guarantee there wasn't a gravitational singularity that affected response time, but it's very likely that the earthquake was the cause.

egiva · on Aug 23, 2011

Yeah, the latency is most likely due to the vibrations affecting the various rack components, as commented here. Actually, earthquake-proofing datacenters is big business in places like the US West Coast and Japan: http://www.datacenterknowledge.com/archives/2007/07/17/earth...

pistoriusp · on Aug 23, 2011

I remember watching a video about how vibrations can have a negative impact on hard drive latency (the video is at the bottom):

http://blogs.oracle.com/brendan/entry/unusual_disk_latency

rhizome · on Aug 23, 2011

So, "AWS happened to it," then?

BCM43 · on Aug 23, 2011

"Over here at SeatGeek, we were excitedly discussing the tremor when Mike, our trusty sysadmin, realized that our Amazon AWS servers were all in Virginia, right near the epicenter. Did it impact the service at all?"

rhizome · on Aug 23, 2011

I was reflecting on my parent's comment that the actual reasons were opaque due to SG's use of AWS.

jmmcd · on Aug 23, 2011

Presumably the earthquake caused a spike in social network usage, microblogging of various popular types, and reload-mashing on cnn.com and similar. If any of those is hosted on AWS then they might steal some cycles from other AWS users.

eli · on Aug 23, 2011

Purely anecdotal, but here in downtown DC, cell networks were fine during and immediately after the quake, but were completely overwhelmed 5-10 minutes later by everyone pulling out their phone at once.

heliodor · on Aug 23, 2011

Another thing that comes to mind is that some computers have accelerometers in them that stop the hard disks if the machine experiences sudden acceleration. If AWS has the same system in place, that might affect their servers' responsiveness.

acslater00 · on Aug 23, 2011

Well, I have no idea what actually happened down there, but the timing would be a pretty incredible coincidence.

d2 · on Aug 23, 2011

"Earthquakes make Web Servers sad". Dude. What the FUCK?

josegonzalez · on Aug 24, 2011

Yeah dude! These things happen.

blantonl · on Aug 23, 2011

Here at RadioReference.com, we had a MySQL master server, hosted on AWS East in the N. Virginia data center, inexplicably crash on us right after the earthquake. The server uses a RAID-0 stripe across 4 EBS volumes and has been running for over a year without a reboot.

And, we were featured on CNN live right after the quake as a source for breaking news information.

We're scaled to handle traffic floods because we get them occasionally when something big happens public-safety-wise, but I'm really wondering whether this crash was due to a huge influx of people or some hardware anomaly during the quake (frozen disk, network problem, etc.).

A reboot of the server and an InnoDB recovery fixed the issue, and all is fine now.

tseabrooks · on Aug 23, 2011

I was imagining it was the sudden violent shaking of the HDD. Thus the "lesson learned" that "servers don't like earthquakes"

jordanmessina · on Aug 23, 2011

What does an earthquake do to ticket sales on the East Coast? Now THAT would be interesting.

techiferous · on Aug 24, 2011

"our Amazon AWS servers were all in Virginia, right near the epicenter."

Very little is near the epicenter. The epicenter was in one of the least populated areas of the entire state. Here's a map:

http://earthquake.usgs.gov/earthquakes/shakemap/global/shake...

There are probably fewer than 50,000 people living inside the yellow circle and there are no cities.

Amazon's data centers are in northern Virginia. This earthquake did not happen in northern Virginia; it happened in central Virginia, between Richmond and Charlottesville, about 60-90 miles away from northern Virginia.

JoeAltmaier · on Aug 24, 2011

For some definitions of 'near'. In a country 3000 miles wide, 60 miles is a fine definition of 'right near'.

techiferous · on Aug 24, 2011

I know, but it doesn't sound right if you live in Virginia. Just like saying "Los Angeles is near San Francisco" probably only sounds right to east coasters. And if you look at the shake map, it makes a big difference whether you are within 20 miles of the epicenter or 100 miles.

bradleyland · on Aug 23, 2011

I'm more interested in their real-time monitoring board setup.

kessler · on Aug 23, 2011

From the post:

"For about six months, we’ve been using a combination of StatsD, Graphite, and GeckoBoard to power a real-time dashboard of some of our system stats."

bradleyland · on Aug 23, 2011

I saw that much, but I'd love to see a write-up on their implementation. We primarily use Munin with a couple of custom plugins. It's fine for the sysadmin side, but we were thinking of pushing some app data stats to a customer facing interface. Tools like GeckoBoard look much better than Munin graphs.
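For context on the StatsD piece of the stack quoted above: StatsD accepts plain-text metrics over UDP, and a timer is just `name:value|ms`. The sketch below shows what pushing a page load time might look like. It is not SeatGeek's implementation; the metric name `web.page_load`, the host, and the helper names are hypothetical, and 8125 is only the conventional default StatsD port.

```python
import socket


def format_timing(metric: str, millis: int) -> bytes:
    """Build a StatsD timer line of the form 'name:value|ms'."""
    return f"{metric}:{millis}|ms".encode("ascii")


def send_timing(metric: str, millis: int,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a timer to a StatsD daemon over UDP.

    host and port are assumptions; point them at your own daemon.
    """
    payload = format_timing(metric, millis)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, purely for illustration.
    send_timing("web.page_load", 320)
```

Because it is UDP, the app does not block or fail if the daemon is down, which is a large part of why this pattern is cheap to add to request handlers. StatsD then aggregates and flushes to Graphite; how the graphs get from there into a dashboard is a separate question.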

ethank · on Aug 24, 2011

I'd love to know how they connected Graphite to Geckoboard.

singlow · on Aug 24, 2011

It would be useful to know whether there was a corresponding traffic spike, or whether the response time spiked under typical traffic.

Shenglong · on Aug 23, 2011

Looks like my server bandwidth graph during DDoS attacks.

DPS47 · on Aug 23, 2011

Pretty interesting article.