
Federation at Flickr: A tour of the Flickr Architecture

I’ve always been a big Flickr fan and user, and I love them even more now that their 2GB limit is gone. So this was a most interesting talk, and I think photographers and Flickr users alike will find it interesting to see what’s behind Flickr. Dathan also has a very interesting blog.

Federation at Flickr: Doing Billions of Queries Per Day, Dathan Pattishall

Database guy – query, performance, troubleshooting, application building, etc. Previously worked at Friendster in 2003.

Master-slave lag – replication was unable to keep up with the demand Flickr was seeing. They had multiple masters, and they had multiple single points of failure (SPOFs).

Everything had to be real time. The workload is write-intensive, so more than one master is needed. No more slave lag. Serve pages fast, with many queries. Get rid of the SPOFs and be redundant.

They need the ability to do live maintenance, repair data, and so forth, without taking the site down.

Federation Key Components:

Shards:

Global Ring:

Owner_id -> SHARD_ID
Photo_id -> Owner_id
Group_id -> SHARD_ID

It makes sense to hash by user, but then there is the notion of a group: a group can be owned by many users, plus many users contribute to the group.

This is cached in memcache for about half an hour.
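
As a rough illustration, a lookup against the global ring with a memcache layer might look something like the sketch below – the client, table, and helper names are my own assumptions, not Flickr’s code:

```python
# Minimal sketch of a global-ring lookup with memcache caching, as described
# above. The schema, helper names and memcached client shown here are
# assumptions for illustration, not Flickr's actual code.
import memcache  # python-memcached client

RING_CACHE_TTL = 30 * 60  # results cached for roughly half an hour
mc = memcache.Client(["127.0.0.1:11211"])

def shard_for_owner(owner_id, global_db):
    """Resolve owner_id -> shard_id, hitting the global ring DB on a cache miss."""
    key = "shard:owner:%d" % owner_id
    shard_id = mc.get(key)
    if shard_id is None:
        # Hypothetical global-ring table holding the owner_id -> shard_id mapping.
        row = global_db.query_one(
            "SELECT shard_id FROM owner_to_shard WHERE owner_id = %s", (owner_id,))
        shard_id = row["shard_id"]
        mc.set(key, shard_id, time=RING_CACHE_TTL)
    return shard_id
```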

Ticket servers – MySQL 5, low-grade machines, circular replication. This is the ID generator: it hands out globally unique IDs, and a lot of unique IDs are associated with each row.
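
The commonly described way to build this kind of MySQL ticket server is a single auto-increment row bumped with REPLACE INTO; the sketch below assumes that pattern and an illustrative tickets64 table, which the talk did not spell out:

```python
# Sketch of a MySQL ticket server, assuming the single-row REPLACE INTO
# pattern; the table name (tickets64) and column (stub) are illustrative,
# not taken from the talk.
import MySQLdb  # MySQLdb / mysqlclient, assumed available

def next_ticket(conn):
    """Return a new globally unique 64-bit ID from one ticket server."""
    cur = conn.cursor()
    # REPLACE deletes and re-inserts the single row, advancing AUTO_INCREMENT.
    cur.execute("REPLACE INTO tickets64 (stub) VALUES ('a')")
    cur.execute("SELECT LAST_INSERT_ID()")
    (ticket_id,) = cur.fetchone()
    conn.commit()
    return ticket_id
```

With more than one such server, the usual way to keep the IDs from colliding is to give each box a different auto_increment_offset with auto_increment_increment set to the number of servers.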

Clicking a Favourite:

They can ask the question from either shard and recover the data. It’s absolutely redundant.
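
The talk only states that the favourite ends up redundantly on both shards; a minimal sketch of that write path, assuming a hypothetical shard_conn() helper, illustrative table names, and the shard_for_owner() lookup from the earlier sketch, might be:

```python
# Sketch of the "clicking a favourite" write path: the favourite is written to
# both the favouriter's shard and the photo owner's shard, so either shard can
# answer. shard_conn() and the table names are hypothetical.

def add_favourite(me_id, photo_id, owner_id, global_db):
    my_shard = shard_for_owner(me_id, global_db)
    owner_shard = shard_for_owner(owner_id, global_db)

    # My shard answers "what are my favourites?"
    shard_conn(my_shard).execute(
        "INSERT INTO favourites (user_id, photo_id) VALUES (%s, %s)",
        (me_id, photo_id))
    # The owner's shard answers "who favourited this photo?"
    shard_conn(owner_shard).execute(
        "INSERT INTO photo_favourites (photo_id, user_id) VALUES (%s, %s)",
        (photo_id, me_id))
```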

To get rid of replication lag…

Every user’s reads and writes are kept on one shard, so the notion of replication lag is gone.

Each server in a shard runs at 50% load, so they can shut down half the servers in each shard; one server in the shard can take the full load if the other server in that shard is down or in maintenance mode.

During traffic spikes they break the 50% rule, though: they do something like 6,000-7,000 queries per second, while the design target is at most 4,000 queries per second to keep it at 50% load.
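
A quick back-of-the-envelope reading of those numbers – the full-load ceiling below is inferred from the 4,000 qps / 50% target, not stated in the talk:

```python
# Back-of-the-envelope check on the 50% rule. The full-load ceiling is an
# inference from "4,000 qps keeps it at 50% load"; it was not stated directly.
TARGET_QPS = 4000                      # design target per server (50% load)
FULL_LOAD_QPS = TARGET_QPS / 0.5       # implied ceiling: ~8,000 qps
SPIKE_QPS = 7000                       # observed during traffic spikes

print(TARGET_QPS / FULL_LOAD_QPS)      # 0.5   -> a server can absorb its partner's load
print(SPIKE_QPS / FULL_LOAD_QPS)       # 0.875 -> spikes break the 50% rule
```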

The average page needs 27-35 SQL statements. Favourites counts are real time. API access to the database is all real time. They achieved the real-time requirements without any disadvantages.

Over 36,000 queries per second – running within the capacity threshold. During bursts of traffic that doubles, beyond the 36K qps baseline. More than 4 billion queries per day.

Each shard holds 400K+ users’ data.

“So you’re a single point of failure” — a member of the audience. He’s the main/only DBA at the moment.

Search:

Hardware: EM64T boxes with RHEL4, 16GB RAM, 6-disk 15K RPM RAID-10. The data size is 12TB of user metadata (these are not photos, just the InnoDB ibdata files – the photos are a lot larger). 2U boxes. Each shard has ~120GB of data.

“Able to serve the traffic: Priceless”

Things MySQL could do better: multi-master replication, the InnoDB thread bug in 4.1 (the fix is already in 5.0), optimization for OR queries, bitwise operations, sets.

MySQL 5.0 is used in shards that have aux data, like logging, and also for generating the ticket IDs on the ticket servers (no replication between them, a single row per server). The IDs are 64-bit, but when exposing them to the public they use a 32-bit ID to keep the URLs short.

Backup procedure: ibbackup on a cron job that runs across the various shards at different times. Hot backup to a spare. Snapshots are taken every night across the entire cluster of databases.

Aggregating the data: very fast, because it’s a process per shard. Stick the results into a table, or recover the data from another copy on other users’ shards.
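
A minimal sketch of that per-shard fan-out, assuming a hypothetical shard_query() helper, a hypothetical load_into_summary() step, and an illustrative aggregation query (the shard count is also made up):

```python
# Sketch of per-shard aggregation: one worker process per shard, results
# collected into a single summary table. shard_query(), load_into_summary()
# and the query itself are hypothetical illustrations.
from multiprocessing import Pool

NUM_SHARDS = 100  # illustrative; the talk gives data sizes, not the shard count

def aggregate_one_shard(shard_id):
    # Runs inside its own process, so the shards are aggregated in parallel.
    return shard_query(shard_id,
        "SELECT user_id, COUNT(*) AS photo_count FROM photos GROUP BY user_id")

if __name__ == "__main__":
    with Pool(processes=NUM_SHARDS) as pool:   # a process per shard, as described
        for rows in pool.map(aggregate_one_shard, range(1, NUM_SHARDS + 1)):
            load_into_summary(rows)            # stick it into a summary table
```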

max_connections = 400 connections per shard, or 800 connections per server and shard. There is plenty of capacity and connections. The thread cache is set to 45, because they don’t have more than 45 users with simultaneous activity.

Schema updates? Shut down half the servers, force the traffic to the other half, push the update, then repeat.
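
A sketch of that rolling procedure, reading “half the servers” as one server from each shard’s pair (the helper names and the rotation mechanism are assumptions):

```python
# Sketch of a rolling schema update: pull half of a shard's servers out of
# rotation (the partner absorbs the load, per the 50% rule), apply the DDL,
# put them back, then do the other half. All helpers here are hypothetical.

def rolling_schema_update(shards, ddl):
    for half in (0, 1):
        for shard in shards:
            for i, server in enumerate(shard.servers):
                if i % 2 != half:
                    continue
                remove_from_rotation(server)   # traffic falls over to the partner
                run_ddl(server, ddl)           # push the schema change
                add_to_rotation(server)        # bring it back before the next half
```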

32 concurrent InnoDB threads on each shard. 5% of 12TB in memory for the pool, so that’s about 13GB.

Linux allocates memory per thread, so you can get an OS ERROR saying you’ve run out of memory.

Photos are stored on the filer. Upon upload, Flickr processes the photos and generates the different sizes, then it’s complete. The metadata, and pointers to the filers, are stored in the database.
