Before I start with this sordid tale of low scalability, I want to thank the guys at Phusion for openly discussing the challenges they’re having with Union Station. They deserve applause and hugs for being transparent with their users.
Today, they wrote about their scaling issues. That article deserves a good read, but I’m going to cherry-pick a few sentences out for closer examination.
Despite this, they do not really seem to be acknowledging the scale of what happened. They still try to put some blame back on users, suggesting that if they had a weak password they might be compromised. Well, that really does not make much of a difference when you expose the entire database table and have way too much faith in the 34-year-old encryption algorithm reported to be used to safeguard the data. In truth, they had over a month to find this problem but diagnosed the early warning signs in November improperly, were very obviously breached (and told they were breached by others) on Saturday, and it still took until Monday afternoon to say anything to their user base. And in the meantime their representatives were releasing statements via Twitter up until Saturday evening that were either partially or totally incorrect.
via Forbes – The Real Lessons Of Gawker’s Security Mess. Basically, whatever bad or stupid thing Gawker could have done, they did, including ignoring the problem. Perhaps their lowest moment came when their users’ accounts were posted on an internet forum and the response was, in effect, who cares, it’s “just the peasants”.
In perhaps a good example of “don’t write it if you wouldn’t want someone to read it”, this screenshot from the attackers showed up on thenextweb.com, detailing a conversation from July 22nd between Gawker employees noting that usernames and passwords for Gawker users had shown up on 4chan. In the chat, Gawker’s Hamilton Nolan, after hearing that it is just Gawker users who have been compromised, remarks “oh, well. unimportant”. Gawker’s Richard Lawson wants to know if the breach is limited to “just the peasants?”
Hopefully this is another in the long list of reminders to use secure, safe passwords. Perhaps more importantly, use a tool like 1Password to generate a random password for every site you log into, so that a breach at one site does not expose your accounts elsewhere.
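To make “a random password for every site” concrete, here is a minimal sketch using Python’s standard `secrets` module. It is not how 1Password works internally; the function name, the default length, the alphabet, and the site names are all assumptions picked for illustration.

```python
import secrets
import string

# Assumed alphabet: letters, digits and punctuation. Adjust to each site's rules.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from ALPHABET with a secure RNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # secrets.choice uses the operating system's cryptographic randomness.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# One independent password per site (the site names are placeholders).
passwords = {site: generate_password() for site in ("example-blog", "example-forum")}
```

Because each password is generated independently, a leak of one tells an attacker nothing about the others.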
Yesterday afternoon, during planned maintenance that was not intended to interrupt service, an issue arose that took down a critical database cluster. This brought down our entire network while our engineers worked feverishly to restore these databases and bring your blogs back online.
via Tumblr Staff – Downtime. Tumblr went down for about 22 hours, and this is their response. Frankly, Tumblr has had performance issues for a good long time. I thought about switching to Tumblr for hosting my site, and I’m glad I pay for my own server instead. At least when something breaks it’s typically my fault, or I can relatively easily switch to a different provider (like I did recently after a spate of performance issues and extended downtime). Possibly the worst part, though, is how quiet Tumblr kept the whole time. There was hardly a peep from the team, and companies that use Tumblr as a full-time replacement have to be looking hard at switching after this.
A query can cause a program to fail because of bugs or various other issues. This means that a single query can take down an entire cluster of machines, which is not good for availability and response times, as it takes quite a while for thousands of machines to recover. Thus the Query of Death. New queries are always coming into the system and when you are always rolling out new software, it’s impossible to completely get rid of the problem.
Convert to local time on display (local being defined by the user looking at the data)
When storing a timezone, you need the name, the timestamp and the offset. This is required because governments sometimes change the meanings of their timezones (e.g., the US government changed DST dates), and your application needs to handle things gracefully. For example: the exact timestamps when episodes of LOST aired, both before and after the DST rules changed.
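A minimal sketch of that advice in Python, using only the standard library. The `StoredMoment` class, its field names, and the example instant are hypothetical; the point is only that a UTC timestamp, a zone name, and the offset in effect at recording time are kept together, and that conversion to a viewer’s local time happens at display.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredMoment:
    utc_timestamp: float  # seconds since the Unix epoch, always UTC
    zone_name: str        # human-readable zone name, e.g. an IANA identifier
    offset_minutes: int   # UTC offset that was in effect when recorded

    def wall_clock(self) -> datetime:
        """Wall-clock time as it read where and when the moment was recorded."""
        tz = timezone(timedelta(minutes=self.offset_minutes), self.zone_name)
        return datetime.fromtimestamp(self.utc_timestamp, tz)

    def display_for(self, viewer_tz: timezone) -> datetime:
        """Convert to the viewer's zone only at display time."""
        return datetime.fromtimestamp(self.utc_timestamp, viewer_tz)


# Hypothetical instant: 18:30 UTC, recorded in a zone that was then at UTC-5.
moment = StoredMoment(
    utc_timestamp=datetime(2010, 12, 7, 18, 30, tzinfo=timezone.utc).timestamp(),
    zone_name="America/New_York",
    offset_minutes=-300,
)

print(moment.wall_clock().isoformat())  # 2010-12-07T13:30:00-05:00
print(moment.display_for(timezone(timedelta(hours=1))).isoformat())  # 2010-12-07T19:30:00+01:00
```

Since the offset is stored with the record, the original wall-clock time can still be reproduced even if the zone’s rules change later.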
It took an exorbitant amount of time to configure, and it still didn’t work. The application ran awfully slow, and disk I/O was through the roof. Worse, as a result of the smaller drives and 25% usage requirement, the available disk space was quickly filling up.
via The Certified DBA – The Daily WTF. An extremely amusing article that details several things about being an “expert”: it doesn’t make you right on details marginally connected to your field of study, doesn’t imply that you will practice good judgment, doesn’t mean you should throw common sense out the window, and finally doesn’t mean you shouldn’t re-think your basic assumptions of “how it works” when questioned, especially when presented with evidence contrary to your logic.
SQL is a domain specific language (a language designed to do one thing and knock the socks off at doing it) that just accesses databases.
Okay, but what is a database? Well, a database is essentially a collection of data. Yeah, I know that doesn’t help much.
To clarify, we’ll play a little mental exercise. Imagine you have a store and want a piece of software to track purchases of your t-shirts. Well, what are you tracking? T-shirts (let’s call them products), the sales that you make, and your customers. The products, customers and sales are each a separate collection of data that you want to store. Each product is a t-shirt with some information tied to that particular shirt, which you want to store separately from the customers, who in turn are stored separately from the sales. For instance, a product could have a size, SKU, price, etc., while a customer could have a name, a number, the sales they are associated with, and possibly whether or not they owe you money. Getting the idea?
So a database is essentially a programmer’s tool for storing all this data in a way that’s relatively fast and easy to build and maintain, and that helps ensure the data remains accurate.
There are tons of different types of databases; the most common one currently is the relational database, run by a Relational Database Management System (RDBMS). The idea behind this is that there are definable types stored in the database that can relate to other types in the database. So, going back to our example, we would have products, customers and sales as the definable types, and customers and sales would relate to each other like we talked about. An RDBMS has a database that stores tables that contain these definable types (the products, customers, etc.). Each of these tables has entries that contain the information you have provided.
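To show what “relate” means in practice, here is a minimal sketch using Python’s built-in `sqlite3` module as a stand-in for a full RDBMS. The table layout, the column names, and the customer and sale values are made up for illustration.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE sales (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    """
)

# One made-up customer with two made-up sales.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Example Customer')")
conn.executemany(
    "INSERT INTO sales (customer_id, total_cents) VALUES (?, ?)",
    [(1, 1500), (1, 2000)],
)

# The relation: each sale points at a customer through customer_id.
rows = conn.execute(
    """
    SELECT customers.name, COUNT(sales.id), SUM(sales.total_cents)
    FROM customers
    JOIN sales ON sales.customer_id = customers.id
    GROUP BY customers.id
    """
).fetchall()
print(rows)  # [('Example Customer', 2, 3500)]
```

The `customer_id` column is what ties a sale back to its customer, so the customer’s details are stored once rather than copied into every sale.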
So in our example a product has a SKU, price and size. Each of these little bits of information associated with the definable type (the product in this case) is a column. So each database has tables (the types), which have columns (the information associated with a type). The last piece of the puzzle is the actual data stored in the database. After all, what good is a database if we have no information? The entries in our table are called rows.
So here is a sample database entry and what it would look like:
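A minimal sketch of such an entry, using Python’s built-in `sqlite3` module as a stand-in for whichever database you end up using. The SKU, size, and price are made-up values, and storing the price in cents is just one assumed convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table is the type (products); its columns are sku, size and price_cents.
conn.execute(
    "CREATE TABLE products (sku TEXT PRIMARY KEY, size TEXT, price_cents INTEGER)"
)

# A row is one actual entry: a single made-up t-shirt.
conn.execute("INSERT INTO products VALUES (?, ?, ?)", ("TSHIRT-001", "M", 1500))

cursor = conn.execute("SELECT sku, size, price_cents FROM products")
columns = [description[0] for description in cursor.description]
row = cursor.fetchone()

print(columns)  # ['sku', 'size', 'price_cents']
print(row)      # ('TSHIRT-001', 'M', 1500)
```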
Pretty simple once it’s explained. Next I’ll start going through MySQL in particular and some more relating to the language SQL.