Fortunately, the root cause was simple to correct. Unfortunately, we had only one layer of monitoring in place, and that monitoring ran on the machine itself, so it relied on the machine being available. Since the machine itself was affected, those alerts did not go out, and we found out about the problem much later than we would have preferred.
We now have an external monitoring service that will alert us immediately whenever there is any downtime, so that future outages can be kept to an absolute minimum. While it's almost impossible to maintain 100% uptime, we don't see last night's outage as acceptable. We are incredibly sorry about the inconvenience, and we don't take it lightly.
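For anyone curious what "external" means in practice, here is a minimal sketch of the idea in Python. It is not the service we use; the names `http_probe` and `ExternalMonitor`, the failure threshold, and the alert callback are all hypothetical and only there for illustration. The point is that the check runs somewhere other than the machine being watched, so it can still raise an alert when that machine can't.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


class ExternalMonitor:
    """Hypothetical watcher meant to run on a host separate from the one it checks."""

    def __init__(
        self,
        probe: Callable[[], bool],
        alert: Callable[[str], None],
        failures_before_alert: int = 2,  # assumed threshold to avoid alerting on one blip
    ) -> None:
        self.probe = probe
        self.alert = alert
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0
        self.alerted = False

    def tick(self) -> bool:
        """Run one check; alert once per outage after enough consecutive failures."""
        ok = self.probe()
        if ok:
            # Recovery resets the state so the next outage alerts again.
            self.consecutive_failures = 0
            self.alerted = False
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_before_alert and not self.alerted:
            self.alert(f"site down: {self.consecutive_failures} failed checks in a row")
            self.alerted = True
        return False
```

In a sketch like this, something such as `ExternalMonitor(lambda: http_probe("https://example.com/"), print).tick()` would be called on a schedule from a different machine or network, with `print` swapped for whatever actually pages a person.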
We built a simple bug and issue tracker named Sifter and we blog about it when we're not working on it. We think it’s a great way to get feedback and keep everyone updated on our status.