Posts Tagged ‘memcached’

Frank Mashraqi on Hadoop, memcached, and why the MySQL Conference is cool

Today I spoke with Farhan “Frank” Mashraqi, former Fotolog DBA, now working at a startup, NetEdge, working on social analytics. He’s talking about the two sessions he’s giving next week at the MySQL Conference & Expo 2009, as well as the benefits of being at the MySQL Conference & Expo.

He’s giving two talks:

  1. Hadoop and MySQL: Friends with Benefits in where he will tell you about how you can combine data sets and queries, some of which run on Hadoop, and others which run on MySQL, but eventually probably end up in MySQL (he works on this cool stuff at NetEdge, the startup he’s currently attached to).
  2. Advanced memcached use cases in where you get best practice information on leveraging memcache, a software package that all the big boys use.

Frank also says, “If you’re not coming to the MySQL Conference, you’re losing out”. He’s right. You should be there. Look at all the amazing sessions, all the amazing networking opportunities, and more? He clearly specifies that the tutorial selection this year is pretty incredible, so make sure you’re signed up for Monday! Plan your sessions ahead, otherwise you might miss out some of the important things.

The MySQL Conference & Expo 2009 runs from April 20-23, 2009, at the Santa Clara Convention Centre. Don’t hesitate to register as there are plenty of interesting sessions there, next week.

Memcached and MySQL: webminar from a Web 2.0 company

At OSCON, Brian and Dormando gave their ever famous talk, Memcached and MySQL: Everything You Need To Know. I didn’t attend the tutorial, but they assured me it was similar to what was given at the MySQL Conference 2008 (everything, but the very nice buttons dormando was giving out with the memcached logo!). Great, because not only is memcached hot, but I have notes from their talk: Memcached and MySQL tutorial.

Interestingly enough (and this didn’t happen at OSCON), was at the MySQL Conference, Patrick Galbraith jumped on stage to speak about his experience with memcached at Grazr. Why not now, spend an hour listening to Patrick talk about Grazr, memcached, and MySQL?

There’s a webminar, titled: Grazr: Lessons Learned using MySQL and Memcached in Web 2.0 Applications. Its on Thursday, August 14, 2008, and you don’t even need to dress up to listen to Pat talk – the beauty of a webminar :)

P/S: If you don’t already know, subscribing to MySQL Enterprise entitles you to 24/7 production support for memcached. Neat, right?

Memcached and MySQL tutorial

Memcached by Brian Aker, Alan Kasindorf (dormando). Here are some quick, somewhat sparse notes. Follow the slides, it will help.


Memcached was actually created for LiveJournal. It has evolved a bit over time. Chaos to user based clustering, and then Brad implemented memcached. LiveJournal has about 30GB of cache available between 8-12 machines. The DB reads were down like 10x the moment they started using memcached (its much better now).

Its not only for simple objects (not just a single row)- you can use it for complex queries, and the result can be stored in memcached., Patrick Lenz, is also the guy. He put memcached on the same machine as the MySQL database server (he has 32-bit machines, and MySQL can only use a certain amount of RAM, so the rest was for memcached). This is definitely not the recommended way. Have separate memcached servers.

PatG comes up to talk about Grazr, which is more of a write-through cache. Refer to Page 8 of the slides. Now, the thought is that maybe Pat should’ve used gearman, rather than writing their own software. Memcached has allowed them to do it asynchronously. They’re using bulk inserts now as well.

DownUnder GeoSolutions uses lustre, which is a clustering filesystem. They’re not a web-based solution. They extract data off lustre, store it in memcached. Processing happens on the memcached RRU.

memcached by itself does very little. There’s a simple daemon, and it responds to gets/sets/add/replace. It sits on top of a very simple slab allocator. Everytime you called it, it ran malloc() and it would free() it when done, during the early days. So, now, it makes one slab allocator for different types of objects.

memcached is event based. libevent is a generic wrapper around epoll/kqueue, and its very scalable for network connections. 10,000 connections to a memcached, is ok – it only cares for how many of them are “active”.

The protocol is very simple. Everyone hates it, but everyone uses it. You can even fire up telnet to talk to memcached. Its very easy to write to protocol.txt and to talk to it.

memcached? A big stupid hash table. In a grid, its a distributed hash table. memcached is 2 hash tables – from client, and one in the server. 30 memcached’s don’t need to know about each other – they’re blind from each other. There is no cross traffic. You just add more servers, to scale up.

Clients hash keys to the server list. Take a single key (250 bytes max), the client hashes it. You have a value, you want to access it, here’s a key. There is multiple hashing going on, as some clients do things like compressing data.

How do I dump data? You don’t. Its a cache.

How is it redundant? Its not. The server itself doesn’t know about other servers around it! PECL and the next version of libmemcached will understand replication. The redundancy happens in clients.

How does it handle failover? It doesn’t. If it dies, it dies. A client can of course, handle it.

How does it authenticate? It doesn’t at all. Don’t stick one of this, open faced, to the Internet – when you connect to it, you have full access to any commands in the server and all contents in the server. You don’t want folk just typing flush in the server ;)

A very simple service, very simple server.

Details on the Server? Page 14, is pretty much all the commands you can use in memcached. You can run this from telnet, even
– set operation throws data inside memcached (it doesn’t care if there’s other data in it)
– add is lightly atomic – it won’t add data that is already there
– stats can give you particular pieces of information, or give you a full dump. Hit ratio, cache efficiency, and lots more, can come out of this

All drivers you are seeing, are just basically extending all these commands. cas (compare and swap atomic!) today is pretty limited

memcached can even run on FreeBSD 4. Most people run memcached on Linux. No one has deployed memcahced on OSX in the audience.

There’s MySQL integration. Most users grab object from database,
store object to memcached. The UDF memcached functions are probably the most successful UDF in MySQL’s history :)

There’s pgmemcache() for Postgresql, but not much is known about it

Apache – mod_memcached, has CAS operations exposed. Different to the lighttpd implementation.

There are limitations (page 23). If you wanted to change things, you can recompile memcached, but you might not want to do that. Largest slab class in the system, is 1 megabyte. So data size is under 1 megabyte. Beware if you’re running on a 32-bit system (going over 4GB and you will segfault). A 64-bit system should be fine, in general.

memcached supports threads, thanks largely to Facebook. You probably don’t need this, unless you are Facebook. Memcached’s CPU footprint is tiny.

If you gave memcached 16GB, you will not get your memory back, even if you run flush. The memory is permanently allocated from the OS (much like how Vista does things?). There is mlockall() support, so you can guarantee there will be no paging. Or just disable swap.

jallspaw: memcached1: 22:02:00 up 992 days, 11:57, 0 users, load average: 0.35, 0.37, 0.37

(posted on IRC at #mysqlconf). memcached hardly every crashes.

You can disable the LRU if you want (there’s a command line option for this).

Hashing comes in 2 flavours – normal and consistent hashing. All drivers support CRC today.

A consistent hash means, that instead of doing a modular divide, you can interlace among many servers across the network. When you have a 100 servers running and add a server into the network, you want to add a server, and not lose the entire cache network at once.

libmemcached can do replicas, so it can take data from servers, and apply it to the ring. So if a server is taken out of the network, it can be found elsewhere on the ring. You can keep these networks up and running, and easily growing, with new servers, without losing cache coherency.

Don’t only look at the return value, look at the fact that zero may actually be a credible value, even. An actual value of zero, versus a “we didn’t find anything” is very different.

Slide 35, the ghetto locking implementation for memcache-client. Creates a pseudo-lock around a process. You’re the only process thats processing this area, so you add a key lock, where you ensure you test for nil, not zero (you’re testing for the existence of the lock). If your process dies, someone else will try in 30 seconds (lock expire). Add will only work if there’s no key existing at that point (remember, an add is not a set).

PHP is probably the best supported language, for memcached. PECL memcached library is C backed, standard, and works fine. libmemcached will probably take over most of its features, eventually, but its not there yet now.

Default, if you call increment by a key, it bumps by one. You can also step it up instead of 1, say like 500 or something. Refer to slide 41. Just like you can increment a key, you can decrement also.

C/libmemcached. C driver, there’s a C++ wrapper. Sync and async cached keys. It supports replication through the network. Has read through cache support.

You can not only store a value, but you can also store flags. Flags to keep track of generations, keep track of MIME type internally (so not only store object type, but MIME type). This is unique for libmemcached. Most other drivers use this flags value to see if its compressed or not (the flag = 1 for compression, 0 for no).

Multiget is 7-9x faster than just a get. Look at Page 48 for an example.

Memcached for MySQL? Uses the UDF API. You can now incorporate most of the memcached stuff, in the SQL server, so you can do deletions and get operations easily.

What do you think about persistent connections? Use them. libevent supports them.

Spaces to watch: MogileFS. HyperTable. HBase. People have stopped talking about POSIX filesystems, and are more talking about object filesystems. Its what all the cool kids are doing.

Technorati Tags: , , , , , , , , , ,

Memcache, keeping data in the handiest place: memory

While I ducked out of Giuseppe’s miniconf talk, on MySQL Proxy (a great session, might I add – it takes up 2 slots right up until lunch), I went over to the LinuxChix miniconf, to attend a talk about memcache, by Brenda Wallace. Brenda, works at Catalyst IT, in New Zealand – they use a lot of memcache, in the telco business.

Memcache: volatile cache for keeping data in. Its a daemon. The code, can connect to memcache, put values in, read values, delete values. An example of how to use memcache, is given in PHP5.

A killer feature, is the setting of expiry. You can tell it to cache for 30 seconds, and then forget about it, no worries there.

What do you store? Database, generated content (front page of a website, just like a blog even), web service lookups (useful in telco, or say, if you’re playing with the Flickr API), LDAP, things that are far away (from the other side of the pond, etc.).

Wikipedia made memcache famous, Twitter uses it a lot, and there are probably heaps more.

Many APIs, and there’s a postgres client too. There’s a memcache storage engine for MySQL as well.

Code should be written such that if its in memcache, use that, otherwise, get it from the database and put it in memcache.

Another nifty feature, is incrementing a value – increment functions $memc->inc(‘name’);. You can also read stats, to see a cache hit or miss.

Memcache doesn’t have locks. Memcache is not atomic. There are other libraries out there to do locking, there is a known Perl library for this.

What not to store? Remember, its completely volatile. Don’t store anything you’d be sad to lose, and make sure the real copy is safe elsewhere. There is no method to get list of keys in store. There is a 1MB limit per item, so if data is larger than that, you’re in trouble.

Where do you run it? Remember, it is memory hungry, but CPU lite. Running memcache on the webserver is the recommended method, so beware of the security.

There is no authentication. You just connect (no username, no password). So, when running on the web server, you probably want a firewall. In shared hosting, everyone on that host can read/write to your memcache instance.

No check for validity. No referential integrity, its not a database.

There is transparent failover! So if one fails, the client just automatically connects it to another.

Usage ideas? Communicate between layers (talk to a PHP app, from Java). Instead of squid, you can store stuff in memcache, if you want.

Some competing technologies: Tugela – same concept, but its saved to disk, so it will survive a reboot. This is the Wikipedia fork, of memcached. Couch DB is mention, but its not really a competitor, seeing that its a document database. Lucene is another competitor, but remember, its a fast indexer, and its non-volatile.

I haven’t looked into memcached much, but its quite clear, its a great technology to look at. Now the fact that you can use the MySQL storage engine, it might actually be really, interesting.

Technorati Tags: , , ,