
When I read reports like this on HN, I am absolutely floored by the level of detail and quality of work that goes not only into writing them, but also into getting to the bottom (well, almost!) of the problem! Fantastic work. How do you do it? My server-side team is ~5 engineers (and 1 devops), and we struggle just to keep up with the incoming feature requests, let alone improve the infrastructure, and even further, let alone run an engineering blog or do this kind of research. Is there a good way to foster a culture that treats this as important?


It's easy to see the finished product and wish for that kind of time, but it's likely the culmination of several months' work by several people just to debug the problem, followed by the writeup sitting in someone's todo list for several weeks, with another few weeks of review.

One way to foster this culture is to write blog posts and close with a "We're hiring", as PagerDuty has done here.


It was indeed. Multiple engineers poked at this issue off and on for several months before we figured it out.


In this case, a bug that takes out a core company infrastructure component will get promoted from “nice to have improvement” to “oh shit, we need to fix this” in short order, I suspect.


We are lucky that our customers care deeply about our availability, so feature work and improving the infrastructure are the same thing most of the time.

As for fostering the culture, part of it is that we perform post-mortems on all outages and measure the customer impact. Many of our problems do not have textbook answers, so the post-mortems become a cave diving exercise to find the answer. We have to build our own answers, and sometimes, like this one, build our own workarounds.



