groonga – fulltext search library for cloud & web

This is an incomplete fragment from 2011. Figure its worth publishing this now, considering MariaDB is likely to get groonga in the near future. The groonga team have released MariaDB 10.0.6 binaries as well. This is all part of the mroonga project.

These were my quick notes from the groonga talk at the O’Reilly MySQL Conference & Expo 2011. I haven’t tried it yet (and don’t know if it really is faster than Sphinx), but its something I definitely want to play with. Maybe even get a MariaDB tree going.

groonga is a fulltext search library for cloud & web.

groonga is easy to embed & is scalable. It is written in C.

Highly precise search for any language. Fast searching and indexing in realtime.

PostgreSQL bindings are also available. Can be used with Spider storage engine. CPU scalable. There is also a Ruby binding.

“100x faster than Sphinx in practical use cases”

groonga components:

  • groonga core – embedded search engine
  • groonga column store – data store, strings, numeric values, geographic values. None of the existing engines were good enough for typical search engine queries. Typical queries hits large number of records, filtered by multiple conditions (liker range queries) and then you group by sepcific conditions, order by a dynamic condition, and sometimes output limited number of records.
  • groonga storage engine – pluggable storage engine to mysql

Spider can be used for data sharding on top of it. It is not a component of the groonga product, but works well with it to make it a distributed search engine.

Works for unsegmented languages (like CJK). No whitespaces in CJK.

groonga supports full inverted index (for unsegmented languages). Highly compressed index (no stop words are needed). They use Patricia TRIE lexicon (partial string match on lexicon). Inverted index is designed to reduce disk I/O.

Web is growing and searching & indexing must be performed simultaneously.

Tritonn – patched mysql, myisam and groonga

http://www.twistimage.com/

Problems with it?

  1. MyISAM based – table lock (when updating table, read accesses are blocked)
  2. Patch based – patch maintenance and building patched MySQL is messy

New solution? Groonga storage engine. Uses the new column store instead of MyISM. And it’s no patch any longer — it’s a pluggable storage engine

https://github.com/mroonga/mroonga

Advantages?

  • table lock free – column store is lock free
  • only accesses columns required – not row-based
  • easy to build now

Includes some optimisations:

  • count(*) optimzation for queries like SELECT COUNT(*) FROM table where MATCH(col) against (‘query’);
  • Works also with ORDER BY score and LIMIT optimisation

The groonga storage engine has fast phrase search, fast index update (realtime), inserting records doesn’t block reading records

Spider is a storage engine for database sharding transparently.

Benefits of Spider + Groonga:

  • optimisation of fts with sorting by score
  • optimisation for the sorting by range partition key column
  • optimisation fts with filtering by partition key column

groonga.org – they are all based on mysql 5.5 (packages available)

Contact Team Groonga: bit.ly/fSs5vx

 


i