How Facebook serves pictures

I caught Facebook – Needle in a Haystack: Efficient Storage of Billions of Photos on Flowgram. First up, I’m not a big fan of Flowgrams: the format, slides plus voice, is sensible and works well, but delivery in a web browser isn’t optimal… make downloadable videos!

The talk, however, was excellent. Do watch it, and learn a bit more about Facebook’s infrastructure. Anyway, here are some notes I took from the talk:

  • “We’re one of the largest MySQL installations in the world”
  • They use memcache – “We have memcache because databases aren’t fast” (said later on, during the questions)
  • Separate team focusing on APE (Apache, PHP and Extensions that they work on)
  • 6.5 billion total images, with 4-5 sizes stored for each, so roughly 30 billion files totalling about 540TB. At peak, 475,000 images are served per second, and the collection is growing by 100 million uploads per week
  • Images are usually pulled from a Content Delivery Network (CDN), so it reduces the request rate on their servers
  • They use NetApp storage; basically, their upload servers speak NFS to write to NetApp
  • Cachr (evhttp based) and File Handle Cache use memcache as a backing store… FHC is based on lighttpd!
  • They make use of a “haystack”: a user-level abstraction that stores a separate index file with more efficient metadata, to reduce disk seeks (one disk seek or less for any workload). The talk goes pretty deep into the haystack server architecture, which is also evhttp-based
  • MySQL use? Very few transactions, very few joins
  • Video is a very different beast, and the design is a little different
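
To make the haystack idea a bit more concrete, here is a minimal sketch of my own, not Facebook’s implementation: many photos are appended to one large file, and a small index maps each photo ID to an (offset, size) pair, so a read is a single seek plus a single read. The class name `HaystackSketch`, its methods, and keeping the index in a Python dict (rather than in the separate index file the talk describes) are all assumptions made for illustration.

```python
import os
import tempfile


class HaystackSketch:
    """Hypothetical toy store: one append-only blob file plus an index.

    The index maps photo_id -> (offset, size). Here it lives in memory;
    the talk describes a separate index file, which this sketch omits.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}
        # Create the file if it does not exist, then keep one handle
        # open for both appends and reads.
        open(path, "ab").close()
        self._fh = open(path, "r+b")

    def put(self, photo_id, data):
        # Append the bytes at the end of the file and record where they went.
        self._fh.seek(0, os.SEEK_END)
        offset = self._fh.tell()
        self._fh.write(data)
        self._fh.flush()
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # Index lookup, then one seek and one read on the big file.
        offset, size = self.index[photo_id]
        self._fh.seek(offset)
        return self._fh.read(size)

    def close(self):
        self._fh.close()


# Usage: store two "photos" in a temporary file and read one back.
demo_path = os.path.join(tempfile.mkdtemp(), "haystack.dat")
demo = HaystackSketch(demo_path)
demo.put("photo-1", b"first image bytes")
demo.put("photo-2", b"second image bytes")
assert demo.get("photo-2") == b"second image bytes"
demo.close()
```

The point of the design is that the per-photo metadata is tiny and looked up without touching the filesystem’s own directory and inode structures, which is where the extra disk seeks would otherwise come from when every photo is its own file.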

If you’re into information about photo storage sites, don’t hesitate to also read my previous notes on Flickr.

  • http://blogs.smugmug.com/don/ Don MacAskill

    I’m getting an XML-based access-denied error to the Flowgram URL. Do you have a different one?

  • http://blog.musmo.com KwangErn

    Works fine for me.

    Thanks for the summary! :)

  • Sam

    Where did you get the stat on the 100 million image uploads per week? Thanks!

