Colin Charles Agenda

groonga – fulltext search library for cloud & web

This is an incomplete fragment from 2011. Figure its worth publishing this now, considering MariaDB is likely to get groonga in the near future. The groonga team have released MariaDB 10.0.6 binaries as well. This is all part of the mroonga project.

These were my quick notes from the groonga talk at the O’Reilly MySQL Conference & Expo 2011. I haven’t tried it yet (and don’t know if it really is faster than Sphinx), but its something I definitely want to play with. Maybe even get a MariaDB tree going.

groonga is a fulltext search library for cloud & web.

groonga is easy to embed & is scalable. It is written in C.

Highly precise search for any language. Fast searching and indexing in realtime.

PostgreSQL bindings are also available. Can be used with Spider storage engine. CPU scalable. There is also a Ruby binding.

“100x faster than Sphinx in practical use cases”

groonga components:

Spider can be used for data sharding on top of it. It is not a component of the groonga product, but works well with it to make it a distributed search engine.

Works for unsegmented languages (like CJK). No whitespaces in CJK.

groonga supports full inverted index (for unsegmented languages). Highly compressed index (no stop words are needed). They use Patricia TRIE lexicon (partial string match on lexicon). Inverted index is designed to reduce disk I/O.

Web is growing and searching & indexing must be performed simultaneously.

Tritonn – patched mysql, myisam and groonga

http://www.twistimage.com/

Problems with it?

  1. MyISAM based – table lock (when updating table, read accesses are blocked)
  2. Patch based – patch maintenance and building patched MySQL is messy

New solution? Groonga storage engine. Uses the new column store instead of MyISM. And it’s no patch any longer — it’s a pluggable storage engine

https://github.com/mroonga/mroonga

Advantages?

Includes some optimisations:

The groonga storage engine has fast phrase search, fast index update (realtime), inserting records doesn’t block reading records

Spider is a storage engine for database sharding transparently.

Benefits of Spider + Groonga:

groonga.org – they are all based on mysql 5.5 (packages available)

Contact Team Groonga: bit.ly/fSs5vx