06 Mar

Brent Ozar – RAID 0 SATA with 2 Drives: It’s Web Scale!

Before I start with this sordid tale of low scalability, I want to thank the guys at Phusion for openly discussing the challenges they’re having with Union Station. They deserve applause and hugs for being transparent with their users.

Today, they wrote about their scaling issues. That article deserves a good read, but I’m going to cherry-pick a few sentences out for closer examination.

via Brent Ozar – RAID 0 SATA with 2 Drives: It’s Web Scale!. Good discussion on scaling your database servers.

06 Dec

Tumblr Staff – Downtime

Yesterday afternoon, during planned maintenance that was not intended to interrupt service, an issue arose that took down a critical database cluster. This brought down our entire network while our engineers worked feverishly to restore these databases and bring your blogs back online.

via Tumblr Staff – Downtime. Tumblr went down for about 22 hours, and this is their response. Frankly, Tumblr has had performance issues for a good long time. I thought about switching to Tumblr for hosting my site, but I’m glad I pay for my own server. At least when something breaks, it’s typically my fault, or I can relatively easily switch to a different provider (like I did recently after a spate of performance issues and extended downtime). Possibly the worst part, though, is how Tumblr kept quiet the whole time. There was hardly a peep from the team, and companies that use Tumblr as a full-time replacement have to be looking hard at switching after this.

05 Dec

High Scalability – Strategy: Google Sends Canary Requests into the Data Mine

A query can cause a program to fail because of bugs or various other issues. This means that a single query can take down an entire cluster of machines, which is not good for availability and response times, as it takes quite a while for thousands of machines to recover. Thus the Query of Death. New queries are always coming into the system and when you are always rolling out new software, it’s impossible to completely get rid of the problem.

via High Scalability – Strategy: Google Sends Canary Requests into the Data Mine. Google, of course, has a really solid solution to the Query of Death issue.
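
The canary idea in the article's title can be sketched roughly like this: try a new query on one backend first, and only fan it out to the rest if that one survives. This is a minimal Python sketch and not Google's actual implementation; the function name `fan_out_with_canary`, the `CanaryFailed` exception, and the model of backends as plain callables are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


class CanaryFailed(Exception):
    """Raised when a canary backend fails on a query (hypothetical name)."""


def fan_out_with_canary(
    query: str,
    backends: Sequence[Callable[[str], str]],
    canary_count: int = 1,
) -> List[str]:
    """Send `query` to a few canary backends first, then to all the others.

    If any canary raises, the query is never sent to the remaining
    backends, so a "Query of Death" costs a handful of machines at most.
    """
    if not backends:
        return []

    canary_count = max(1, canary_count)
    canaries = backends[:canary_count]
    rest = backends[canary_count:]

    results: List[str] = []
    for backend in canaries:
        try:
            results.append(backend(query))
        except Exception as exc:
            # Stop here: the rest of the cluster never sees this query.
            raise CanaryFailed(f"canary failed on query {query!r}") from exc

    # The canaries survived, so the full fan-out is considered safe.
    results.extend(backend(query) for backend in rest)
    return results
```

The trade-off is a little extra latency on every query, since the canary has to answer before the full fan-out starts, in exchange for not losing the whole cluster to one bad request.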