SeatGeek Blog: What An Earthquake Does To Page Load Time (seatgeek.com)
123 points by acslater00 on Aug 23, 2011 | 29 comments


Also interesting: a Sun engineer uses dtrace to show HDD latency spiking after he yells at his disks

http://www.youtube.com/watch?v=tDacjrSCeq4


Probably the same thing happened here. Vibration adds latency to disk reads: if the HD platters are jumping around, the read head will have a hard time tracking them. That probably explains the spike. Time to buy some SSDs ;p


Tried to upvote... and missed the arrow, landing squarely on the downvote. I wish I could recast my vote after the fact. Sorry.


Disappointingly lacking in details.

Why did page load time go up during the quake? Was there a fiber fault that took a few seconds to be routed around? Did the vibrations cause HDDs to temporarily suspend?

This could have some interesting data behind it, but as it is the article doesn't even have conclusive proof that the earthquake did cause this outage.


As far as the reason, you'd have to ask someone who works on AWS. For obvious reasons, we don't have direct access to the Amazon datacenter.

As far as conclusive proof is concerned, yes, we can't guarantee there wasn't a gravitational singularity that affected response time, but it's very likely that the earthquake was the cause.


Yeah, the latency is most likely due to the vibrations affecting the various rack components, as commented here. Actually, earthquake-proofing datacenters is big business in places like the US West Coast and Japan: http://www.datacenterknowledge.com/archives/2007/07/17/earth...


I remember watching a video about how vibrations can have a negative impact on hard drive latency (Video is at the bottom.):

http://blogs.oracle.com/brendan/entry/unusual_disk_latency


So, "AWS happened to it," then?


"Over here at SeatGeek, we were excitedly discussing the tremor when Mike, our trusty sysadmin, realized that our Amazon AWS servers were all in Virginia, right near the epicenter. Did it impact the service at all?"


I was reflecting on my parent's comment that the actual reasons were opaque due to SG's use of AWS.


Presumably the earthquake caused a spike in social network usage, microblogging of various popular types, and reload-mashing on cnn.com and similar. If any of those are hosted on AWS, they might steal some cycles from other AWS users.


Purely anecdotal, but here in downtown DC, cell networks were fine during and immediately after the quake, but were completely overwhelmed 5-10 minutes later by everyone pulling out their phone at once.


Another thing that comes to mind is that some computers have accelerometers in them that stop the hard disks if the machine experiences sudden acceleration. If AWS has the same system in place, that might affect their servers' responsiveness.


Well, I have no idea what actually happened down there, but the timing would be a pretty incredible coincidence.


"Earthquakes make Web Servers sad". Dude. What the FUCK?


Yeah dude! These things happen.


Here at RadioReference.com we had a MySQL master server, hosted on AWS East in the N. Virginia data center, inexplicably crash on us right after the earthquake. The server uses a RAID-0 stripe across 4 EBS volumes and has been running for over a year without a reboot.

And, we were featured on CNN live right after the quake as a source for breaking news information.

We're scaled to handle traffic floods because we get them occasionally when something big happens public-safety-wise, but I'm really wondering whether this crash was due to a huge influx of people or some hardware anomaly during the quake (frozen disk, network problem, etc.).

A reboot of the server and an InnoDB recovery fixed the issue, and all is fine now.


I was imagining it was the sudden violent shaking of the HDDs. Thus the "lesson learned" that "servers don't like earthquakes."


What does an earthquake do to ticket sales on the East Coast? Now THAT would be interesting.


"our Amazon AWS servers were all in Virginia, right near the epicenter."

Very little is near the epicenter. The epicenter was in one of the least populated areas of the entire state. Here's a map:

http://earthquake.usgs.gov/earthquakes/shakemap/global/shake...

There are probably fewer than 50,000 people living inside the yellow circle and there are no cities.

Amazon's data centers are in northern Virginia. This earthquake did not happen in northern Virginia; it happened in central Virginia, between Richmond and Charlottesville, about 60-90 miles away from northern Virginia.


For some definitions of 'near'. In a country 3000 miles wide, 60 miles is a fine definition of 'right near'.


I know, but it doesn't sound right if you live in Virginia. Just like saying "Los Angeles is near San Francisco" probably only sounds right to east coasters. And if you look at the shake map, it makes a big difference whether you are within 20 miles of the epicenter or 100 miles.


I'm more interested in their real-time monitoring board setup.


From the post:

"For about six months, we’ve been using a combination of StatsD, Graphite, and GeckoBoard to power a real-time dashboard of some of our system stats."
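For anyone unfamiliar with the StatsD half of that stack: a timer is a one-line UDP datagram of the form `name:value|ms`, which StatsD aggregates and flushes to Graphite. The post doesn't show SeatGeek's code, so the following is only a minimal sketch; the metric name `page.load`, the helper names, and the host are made up for illustration, and 8125 is simply StatsD's default port.

```python
import socket
import time


def format_timing(metric: str, millis: float) -> bytes:
    """Build a StatsD timer datagram: '<name>:<value>|ms'."""
    return f"{metric}:{millis:.0f}|ms".encode("ascii")


def send_timing(metric: str, millis: float,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget one timer sample over UDP (host/port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_timing(metric, millis), (host, port))


def timed(metric: str, func, *args, **kwargs):
    """Run func, then report its wall-clock duration in milliseconds."""
    start = time.monotonic()
    try:
        return func(*args, **kwargs)
    finally:
        send_timing(metric, (time.monotonic() - start) * 1000.0)
```

Wrapping a request handler as `timed("page.load", render_page)` (with `render_page` standing in for whatever does the work) would produce the kind of response-time series the post graphs. UDP is used so that a slow or dead metrics server never blocks the request itself.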


I saw that much, but I'd love to see a write-up on their implementation. We primarily use Munin with a couple of custom plugins. It's fine for the sysadmin side, but we were thinking of pushing some app data stats to a customer-facing interface. Tools like GeckoBoard look much better than Munin graphs.


I'd love to know how they connected Graphite to Geckoboard.
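The post doesn't say how they did it, but one plausible route is a small adapter: Graphite's render API can return JSON (`format=json`) as a list of series, each with a `target` and a `datapoints` list of `[value, timestamp]` pairs, and a dashboard widget can poll an endpoint that reshapes that. A rough sketch under those assumptions follows; the function names are hypothetical, and the widget payload shape in `to_number_widget` is an assumption to check against Geckoboard's custom widget docs, not a confirmed format.

```python
import json


def latest_value(render_json: str):
    """Return the most recent non-null value of the first series, or None."""
    series = json.loads(render_json)
    if not series:
        return None
    # Graphite pads recent buckets with null until data arrives, so scan backwards.
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return value
    return None


def to_number_widget(value, label: str) -> dict:
    """Wrap a value for a dashboard widget (payload shape is an assumption)."""
    return {"item": [{"value": value, "text": label}]}
```

The adapter would fetch the render URL for a metric, pass the response body to `latest_value`, and serve the resulting dict as JSON for the dashboard to poll.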


It would be useful to know whether there was a corresponding traffic spike, or whether response time spiked under typical traffic.


Looks like my server bandwidth graph during DDoS attacks.


Pretty interesting article.



