Archive for 28/1/2008

Memcache, keeping data in the handiest place: memory

While I ducked out of Giuseppe’s miniconf talk, on MySQL Proxy (a great session, might I add – it takes up 2 slots right up until lunch), I went over to the LinuxChix miniconf, to attend a talk about memcache, by Brenda Wallace. Brenda, works at Catalyst IT, in New Zealand – they use a lot of memcache, in the telco business.

Memcache: volatile cache for keeping data in. Its a daemon. The code, can connect to memcache, put values in, read values, delete values. An example of how to use memcache, is given in PHP5.

A killer feature, is the setting of expiry. You can tell it to cache for 30 seconds, and then forget about it, no worries there.

What do you store? Database, generated content (front page of a website, just like a blog even), web service lookups (useful in telco, or say, if you’re playing with the Flickr API), LDAP, things that are far away (from the other side of the pond, etc.).

Wikipedia made memcache famous, Twitter uses it a lot, and there are probably heaps more.

Many APIs, and there’s a postgres client too. There’s a memcache storage engine for MySQL as well.

Code should be written such that if its in memcache, use that, otherwise, get it from the database and put it in memcache.

Another nifty feature, is incrementing a value – increment functions $memc->inc(‘name’);. You can also read stats, to see a cache hit or miss.

Memcache doesn’t have locks. Memcache is not atomic. There are other libraries out there to do locking, there is a known Perl library for this.

What not to store? Remember, its completely volatile. Don’t store anything you’d be sad to lose, and make sure the real copy is safe elsewhere. There is no method to get list of keys in store. There is a 1MB limit per item, so if data is larger than that, you’re in trouble.

Where do you run it? Remember, it is memory hungry, but CPU lite. Running memcache on the webserver is the recommended method, so beware of the security.

There is no authentication. You just connect (no username, no password). So, when running on the web server, you probably want a firewall. In shared hosting, everyone on that host can read/write to your memcache instance.

No check for validity. No referential integrity, its not a database.

There is transparent failover! So if one fails, the client just automatically connects it to another.

Usage ideas? Communicate between layers (talk to a PHP app, from Java). Instead of squid, you can store stuff in memcache, if you want.

Some competing technologies: Tugela – same concept, but its saved to disk, so it will survive a reboot. This is the Wikipedia fork, of memcached. Couch DB is mention, but its not really a competitor, seeing that its a document database. Lucene is another competitor, but remember, its a fast indexer, and its non-volatile.

I haven’t looked into memcached much, but its quite clear, its a great technology to look at. Now the fact that you can use the MySQL storage engine, it might actually be really, interesting.

Technorati Tags: , , ,

Morning sessions at MySQL MiniConf

Upcoming MySQL Features – Stewart Smith

Stewart’s talk on Upcoming MySQL Features was sort of a roadmap of what one might expect to see in MySQL 5.1 and above – he touched on Falcon, online backup, batched key access, Maria, Proxy, Workbench, and some cluster changes. When he shares his slides, it might be great to link to Worklog items, and Forge pages about these new features and previews (because, believe me, the stuff thats coming in future, is clearly very exciting).

MySQL Indexing Methods – Jonathon Coombes

Sitting in Jonathon Coombes session on MySQL Indexing Methods now – he’s going through covering indexes, the B+-tree index, hash index, full text indexing.

Some select points, that aren’t in the slides (otherwise, the slides themselves are very verbose, and when they make it online, it will provide some great reading material):

  • InnoDB uses a B+-tree, and a secondary hash index.
  • MyISAM has R-Tree index support, so it can be used quite usefully for GIS applications. MyISAM is not the only engine that supports spatial indexing, but its the only one that uses R-Tree indexes.
  • T-Tree index is used in MySQL Cluster
  • Touched on the Lucene search engine, with at least about 8 hands going up, as to people using it in production. It does allow live indexing, does proximity searches
  • Sphinx, has a high indexing speed, and by default it sits outside the database, however you can set it to be a storage engine as well. Sphinx is distributed, so its similar to the way memcached is architectured. Distributed indexing. Full text fields.
  • Bitmap indexes, not available yet, but its something people are waiting for

Bit weird seeing Jonathon now showing off OpenQuery, as opposed to Cybersite!

MySQL Optimisation by Design – Arjen Lentz
Its a training course that Arjen is working on, a one day event. He has no slides, per se, so he’s going to use the whiteboard instead. So this is something you’ve got to hope to get a video of, at least.

If you use MySQL, quite often you stick a database abstraction layer in between. However, you never usually ever port to another database, and the abstraction layer always leaks.

Arjen now walks us thru engines available, by firing up the MySQL monitor, and going through the output of show storage engines;.

Arjen then goes through a demonstration, on how many rows per second you can INSERT. These are 60-byte rows, from data acquisition styled information (IP addresses from a router, web clicks, etc.). Using MyISAM. No indexes. Sustained inserts of about 150,000 rows per second (40 million rows, in about 5 minutes, till his disk got full), on his Mac OS X Leopard laptop (and he can’t max anything out). This demo, is of course, for Arjen’s new gizmo, that’s written using the MySQL C API, that currently is in development. The application makes use of a multi-row insert.

Using Blackhole, it holds at about 245,000 rows per second, for INSERTs. So, we’re seeing the slowdown, in really, disk I/O.

Codename: lossylogger.

Its called lossy, because its allowed to loose some data. If you’re tracking web clicks, does it really make a difference, if you’ve lost some clicks, in the grand scheme of things? Not really. It makes use of UDP, so you’re not opening connections either. So, occasionally, you get a buffer overflow, and occasionally you lose data, but its no big deal, in the grand scheme of things.

Most interestingly, at the tail-end of Arjen’s talk, I see the MySQL Australia account manager (sales), Ralph, walking in. Yes, we have sales in Australia/New Zealand now!

Technorati Tags: , , ,

MySQL Miniconf Starts!

Arjen and Stewart are on stage, and there’s an introduction session going on now. We’re now, introducing the ex-MySQLers (Arjen), and MySQLers (Stewart, Giuseppe, me). Trent has just walked in, so that makes all the MySQLers that are around at linux.conf.au.

Highlights of some of the attendees:

  • A user from LG, who has been using MySQL for about 3 years now
  • An technology manager in defense, interested in MySQL as an education exercise
  • A MySQL user for over 8 years
  • Systems administrator who’s been heavily using MySQL for 5 years, however with a total of about 8 years of use
  • Systems administrator at IBM, using MySQL for a long time
  • Travel startup in the Gold Coast, doing lots of MySQL, replication, proxy use
  • A software engineer at HP, in China, and they use MySQL for benchmarking on HP hardware
  • A Connector/J user
  • realestate.com.au, is a Perl+Apache+MySQL shop
  • A IBM guy who works in performance tuning
  • Canon Research Labs in Sydney, but now working in a new small-ish firm, using a lot of MySQL
  • MySQL for a VoIP company. They’ve tried mcluster, and it sucked ;) (and now, they want Cluster)
  • Bolt on’s, to old accounting packages. Using MySQL with Delphi; fiddling with Mondrian, Pentaho now and also looking at JBOSS
  • PHP developer using MySQL for CMS systems, using it for about 6+ years
  • m5 Networks, VoIP company, using MySQL, needs to scale, looking at Cluster
  • Lonely Planet, MySQL DBA, who needs more tips on scalability and high availability
  • Moodle host, also interested in high availability
  • Stronghold CMS, they’d like MySQL to support sequences on a transaction, and they don’t want it in a stored procedure. They work for the government, to some extent (afaik).
  • Using MySQL in education for research/study
  • Asterisk, and MySQL
  • Drupal, and optimising MySQL, so it can scale
  • vquence, who need to store over 100 million videos, and they’d like MySQL to scale for them

The trend is there are a lot of VoIP companies, and a lot of folk wanting high availability, and scaling to amazing lengths. Very interesting. OK, Stewart is going to tell us what’s new in MySQL now… He’s got a bottle of liquor available, in traditional Monty tradition :)

Technorati Tags: , , ,


i