Extreme Makeover: Database or MySQL@YouTube

Arguably one of the most interesting keynotes (and technical to boot!). Paul Tuckfield not only entertained us during his 40-minute keynote, he kept doing so outside after the keynotes ended.

Just the DBA at PayPal, just the DBA at YouTube. Only 3 DBAs at YouTube make it all happen. He has only been a MySQLer for ~8 months (Oracle for ~15 years), so I guess PayPal is an Oracle shop.

MySQL is one (important) piece of the scalability picture.

Technologies: Python, Memcache, MySQL replication. He praises Python a lot (it's much quicker than C++ for implementing goodness).

Click tracking lives on a separate MyISAM site, but read/write traffic is on InnoDB, using replication. There are far more reads than writes at YouTube.
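To make the "far more reads than writes, using replication" point concrete, here is a minimal sketch of read/write splitting in Python. It is not how YouTube does it (the talk did not show code); the class name, the server names and the list of write verbs are all my own assumptions for illustration.

```python
import itertools


class ReplicatedRouter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    # Assumed set of statements that must hit the master.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(list(replicas) or [master])

    def pick(self, sql):
        """Return the server name that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        # Everything else is treated as a read and spread round-robin.
        return next(self._read_cycle)


router = ReplicatedRouter("master-db", ["replica-1", "replica-2"])
```

With many more reads than writes, adding replicas scales the read side, while the single master still takes every write. Note that replicas can lag, so a read right after a write may need to go to the master.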

4 x 2GHz Opteron cores, 16GB RAM, 12 x 10K RPM SCSI disks: constantly crashing, and replication saved them.

5.0 “mystery cache hits”: when you export and import (mysqldump and load back into 5.0), you get a performance boost that you do not get from upgrading in place, because the reloaded tables use the compact row format. They moved from 4.1 to 5.0.

Cache is king. Writes should be cached by the RAID controller rather than the OS. Only the DB should cache reads (not the RAID controller, not the Linux buffer cache).

Software striping atop hardware array.
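For anyone unfamiliar with striping, a rough sketch of the address arithmetic, assuming plain RAID-0-style striping across several hardware arrays. The function name, the stripe size and the number of arrays are hypothetical; the talk gave no such figures.

```python
def locate_block(logical_block, num_arrays, stripe_blocks):
    """Map a logical block to (array index, block offset inside that array).

    Assumes RAID-0-style striping: consecutive stripes of `stripe_blocks`
    blocks are laid out round-robin across `num_arrays` hardware arrays.
    """
    stripe_index, within_stripe = divmod(logical_block, stripe_blocks)
    array = stripe_index % num_arrays   # which hardware array holds the stripe
    row = stripe_index // num_arrays    # how many stripes precede it on that array
    return array, row * stripe_blocks + within_stripe
```

The point of the layering is that sequential and random I/O both get spread over all the spindles behind every hardware array.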

The oracle caching algorithm: something from academia. Not something I’ve heard much about, and I definitely need to look into it further.
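My working assumption is that "oracle" here means a clairvoyant cache, one that knows future accesses and evicts whatever will be needed furthest in the future. In textbooks this is usually called Belady's optimal (or MIN) replacement policy. A minimal sketch under that assumption, with a function name of my own choosing:

```python
def belady_misses(requests, capacity):
    """Count cache misses for a clairvoyant (oracle) replacement policy.

    On a miss with a full cache, evict the entry whose next use lies
    furthest in the future (or that is never used again).
    """
    if capacity <= 0:
        return len(requests)  # nothing can be cached, every request misses

    cache = set()
    misses = 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(candidate):
                # Index of the next request for this entry, infinity if none.
                for j in range(i + 1, len(requests)):
                    if requests[j] == candidate:
                        return j
                return float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses
```

A real system cannot see the future, so this mostly serves as a lower bound to compare practical policies such as LRU against, unless some part of the workload is known ahead of time.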

The talk was too long for a keynote slot, but it would make a most interesting read, and would suit an actual presentation rather than a keynote. I hope his presentation makes it online sometime soon.

Note-to-entrepreneurs: If you are building a web business and want to be acquired by Google, it's quite possible that their due diligence includes “python” compatibility. Most of their released tools are Python-related or Python-based. Oh, and make sure you use commodity hardware (in fact, do that if you want to get VC funded, even).

Update: A little note on the oracle algorithm. If anyone has papers or more credible links, please do drop me a line.
