Posts Tagged ‘mysqlce2011’

groonga – fulltext search library for cloud & web

This is an incomplete fragment from 2011. I figure it’s worth publishing this now, considering MariaDB is likely to get groonga in the near future. The groonga team have released MariaDB 10.0.6 binaries as well. This is all part of the mroonga project.

These were my quick notes from the groonga talk at the O’Reilly MySQL Conference & Expo 2011. I haven’t tried it yet (and don’t know if it really is faster than Sphinx), but it’s something I definitely want to play with. Maybe even get a MariaDB tree going.

groonga is a fulltext search library for cloud & web.

groonga is easy to embed & is scalable. It is written in C.

Highly precise search for any language. Fast searching and indexing in realtime.

PostgreSQL bindings are also available. Can be used with Spider storage engine. CPU scalable. There is also a Ruby binding.

“100x faster than Sphinx in practical use cases”

groonga components:

  • groonga core – embedded search engine
  • groonga column store – data store for strings, numeric values and geographic values. None of the existing engines were good enough for typical search engine queries. A typical query hits a large number of records, filters them by multiple conditions (like range queries), groups by specific conditions, orders by a dynamic condition, and sometimes outputs a limited number of records.
  • groonga storage engine – pluggable storage engine for MySQL
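To make the "typical search engine query" above concrete, here is a minimal Python sketch of the filter, group, order and limit steps over a column-oriented layout. The column names, the data and the `search` function are all made up for illustration; this is not groonga's API, just the shape of the workload the column store is said to target.

```python
from collections import Counter

# Hypothetical column store: one list per column, aligned by record id.
columns = {
    "category": ["db", "db", "web", "db", "web", "cloud"],
    "year":     [2009, 2010, 2010, 2011, 2011, 2011],
    "score":    [0.4, 0.9, 0.7, 0.8, 0.2, 0.6],
}

def search(columns, year_from, year_to, min_score, limit):
    # Filter by multiple conditions (a range plus a threshold),
    # reading only the two columns the conditions need.
    hits = [
        rid
        for rid, (year, score) in enumerate(zip(columns["year"], columns["score"]))
        if year_from <= year <= year_to and score >= min_score
    ]
    # Group the hits by another column and count each group.
    groups = Counter(columns["category"][rid] for rid in hits)
    # Order by a condition chosen at query time (here: score, descending),
    # then output only a limited number of records.
    top = sorted(hits, key=lambda rid: columns["score"][rid], reverse=True)[:limit]
    return groups, top

groups, top = search(columns, 2010, 2011, 0.5, 2)
print(groups)  # per-category counts of the matching records
print(top)     # record ids of the best-scoring matches
```

Note that the "category" column is only read for records that passed the filter, which is the point of not being row-based.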

Spider can be used for data sharding on top of it. It is not a component of the groonga product, but works well with it to make it a distributed search engine.

Works for unsegmented languages such as CJK, where text has no whitespace between words.

groonga supports a full inverted index (for unsegmented languages). The index is highly compressed (no stop words are needed). They use a Patricia trie lexicon (partial string match on the lexicon). The inverted index is designed to reduce disk I/O.
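A rough Python sketch of how a positional ("full") inverted index can serve text that has no word boundaries. It assumes overlapping two-character tokens (bigrams); the talk notes don't say which tokenizer groonga uses, and the function names and sample documents here are invented for illustration. A plain dict stands in for the Patricia trie lexicon, and compression is not shown.

```python
from collections import defaultdict

def bigrams(text):
    # Overlapping two-character tokens: no whitespace or dictionary needed.
    return [text[i:i + 2] for i in range(len(text) - 1)]

def build_index(docs):
    # token -> {doc_id: [positions]}; keeping positions is what allows
    # phrase matching later.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(bigrams(text)):
            index[token][doc_id].append(pos)
    return index

def phrase_search(index, query):
    tokens = bigrams(query)
    if not tokens:
        return set()  # queries shorter than two characters are not handled here
    results = set()
    # Start from documents containing the first token, then require every
    # following token at the next consecutive position.
    for doc_id, positions in index.get(tokens[0], {}).items():
        for start in positions:
            if all(start + k in index.get(tok, {}).get(doc_id, ())
                   for k, tok in enumerate(tokens)):
                results.add(doc_id)
                break
    return results

docs = {1: "東京都に住む", 2: "京都に行く", 3: "東京タワー"}
index = build_index(docs)
print(phrase_search(index, "東京"))    # documents containing 東京
print(phrase_search(index, "東京タ"))  # a longer phrase spanning two bigrams
```

Because every bigram is indexed, no stop-word list is involved, which matches the note above.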

The web is growing, so searching & indexing must be performed simultaneously.

Tritonn – patched MySQL, MyISAM and groonga

http://www.twistimage.com/

Problems with it?

  1. MyISAM based – table lock (when updating table, read accesses are blocked)
  2. Patch based – patch maintenance and building patched MySQL is messy

New solution? The groonga storage engine. It uses the new column store instead of MyISAM. And it is no longer a patch; it’s a pluggable storage engine.

https://github.com/mroonga/mroonga

Advantages?

  • table lock free – column store is lock free
  • only accesses columns required – not row-based
  • easy to build now

Includes some optimisations:

  • COUNT(*) optimisation for queries like SELECT COUNT(*) FROM table WHERE MATCH(col) AGAINST ('query');
  • ORDER BY score with LIMIT is also optimised
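The notes don't say how the engine implements these two optimisations, so the following Python sketch is only one plausible reading: a full-text index can answer COUNT(*) from the size of the match set without fetching rows, and ORDER BY score with LIMIT can keep just the best k matches instead of sorting everything. The posting list and function names are hypothetical.

```python
import heapq

# Hypothetical result of a full-text match: (record_id, score) pairs
# as they might come out of the index, before any row is fetched.
postings = [(1, 0.2), (4, 0.9), (7, 0.5), (9, 0.7), (12, 0.1)]

def count_matches(postings):
    # COUNT(*) ... WHERE MATCH ... AGAINST: the answer is the number of
    # matches in the index, so no table rows need to be read.
    return len(postings)

def top_k(postings, k):
    # ORDER BY score DESC LIMIT k: select the k best matches directly
    # rather than sorting the whole match set.
    return heapq.nlargest(k, postings, key=lambda p: p[1])

print(count_matches(postings))
print(top_k(postings, 2))
```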

The groonga storage engine has fast phrase search and fast (realtime) index updates, and inserting records doesn’t block reading records.

Spider is a storage engine for transparent database sharding.

Benefits of Spider + Groonga:

  • optimisation of full-text search with sorting by score
  • optimisation of sorting by the range partition key column
  • optimisation of full-text search with filtering by the partition key column
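A toy Python model of why the combination helps, assuming range partitioning on an integer key. None of this is Spider's actual implementation; the shard layout, bounds and function names are invented. It shows two of the ideas listed above: shards outside the requested key range are never queried, and each shard returns only its own top k by score before the results are merged.

```python
import heapq
from bisect import bisect_right

# Hypothetical shards, range-partitioned on an integer key.
# bounds[i] is the exclusive upper bound of shard i.
bounds = [100, 200, 300]
shards = [
    [(10, 0.3), (55, 0.8)],     # keys below 100
    [(120, 0.9), (180, 0.4)],   # keys 100..199
    [(210, 0.6), (299, 0.7)],   # keys 200..299
]

def shard_for(key):
    # Index of the shard whose range contains this key.
    return bisect_right(bounds, key)

def search(k, key_min=None, key_max=None):
    # Partition pruning: only visit shards that can hold keys in the range.
    first = shard_for(key_min) if key_min is not None else 0
    last = shard_for(key_max) if key_max is not None else len(shards) - 1
    last = min(last, len(shards) - 1)
    visited = list(range(first, last + 1))
    partials = []
    for i in visited:
        rows = [
            (key, score) for key, score in shards[i]
            if (key_min is None or key >= key_min)
            and (key_max is None or key <= key_max)
        ]
        # Each shard sends back at most k rows, already the best by score.
        partials.append(heapq.nlargest(k, rows, key=lambda r: r[1]))
    # Merge the per-shard results into the global top k.
    merged = heapq.nlargest(
        k, (row for part in partials for row in part), key=lambda r: r[1]
    )
    return merged, visited

print(search(2))                            # all shards are visited
print(search(2, key_min=150, key_max=250))  # the first shard is skipped
```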

groonga.org – packages are available, all based on MySQL 5.5

Contact Team Groonga: bit.ly/fSs5vx

 

The SkySQL Reference Architecture

I have a bunch of notes from the O’Reilly MySQL Conference & Expo 2011, and I figure it’s about time I started blogging them. These are notes from the panel on the SkySQL Reference Architecture, led by Kaj Arnö and Ivan Zoratti. The notes are raw (read their FAQ for more), and I talk a little bit about the SkySQL Configurator at the end (a tool I immediately used, and submitted some bugs/improvements for – 7 at last count, which I hear got fixed in the 0.02 release, which got pushed last night!).

There were 7 panelists. The MySQL world needs:

  • technical support
  • monitoring & administration tools
  • simplified interfaces
  • development & user tools
  • consulting & training

Services & consulting generally are difficult to scale.

The most comprehensive architecture around MySQL: scalable, adaptable and cloud ready.

Implementation:

  • select and test specific components
  • integrate components
  • provision the components in a simple interface
  • simplify monitoring & administration
  • technical services & support
  • validate solutions
  • improvements and new releases can be done
  • knowledge sharing related to the reference architecture

Technologies selected from Webyog, Sphinx, Drizzle, Monty Program, Calpont, Tokutek, ScaleDB, Schooner, Linbit, Zimory, Canonical.

SkySQL Provisioning tools:

  • SkySQL Manager – control and administer the SkySQL/MySQL environment
  • SkySQL Configurator – configure and update SkySQL reference architecture modules
  • SkySQL Tuner – analyse the configuration and prepare the packages

I did a test, and it seemed like I got binaries built in under 5 minutes. Custom configurations with a stock build. You get a 70MB binary. Hosted at http://www.enovance.com/. A lot of people never configure their my.cnf, so I think having a GUI on the web might be a good idea to help people have sensible defaults.

lovegood:skysql byte$ ls
total 143352
drwxr-xr-x    3 byte  staff       102 14 Apr 06:13 ./
drwx------@ 598 byte  staff     20332 14 Apr 06:13 ../
-rw-r--r--@   1 byte  staff  73395132 14 Apr 06:12 SkySQL-mariadb-poboffcfrm5bi054559q8iea74.tar.gz

lovegood:skysql byte$ tar -zxvpf SkySQL-mariadb-poboffcfrm5bi054559q8iea74.tar.gz
x etc/
x etc/my.cnf
x install
x packages/
x packages/xtrabackup-1.4-74.rhel5.x86_64.rpm
x packages/MySQL-client-5.5.10-1.rhel5.x86_64.rpm
x packages/MySQL-server-5.5.10-1.rhel5.x86_64.rpm

SkySQL is also going to have a customer advisory board, and they are starting it this week. (I don’t know any further details about this as of yet.)

The SkySQL Configurator can only get better. I expect it will do custom packages including things like Sphinx/SphinxSE, Drizzle, and other things in due time.

MySQL Conference Early Bird ends 31/03/2011

If you’ve been busy and haven’t registered yet, remember that early-bird pricing ends on 31/03/2011. From April 1-10, you’ll have to pay USD$100 more. A discount code for use (I think you save 20-25%): mys11fsd.

We’re full up in terms of the schedule. People are still asking for an opportunity to speak, and there are still opportunities in the Products & Services track. Please contact Yvonne Romaine at yromaine@oreilly.com for more information on this.

Might I also suggest that if you want to speak and there’s no longer an opportunity, you submit a five-minute talk for the Ignite MySQL event. Even though submissions are now closed, contact Brian Aker — he’ll try and help make some magic happen for you.

Don’t forget you can also lead a Birds of a Feather (BoF) session. While it is not a talk, you can still gather like-minded folk and talk about things over pizza & beer (which has always been a popular combination in previous years).

If you’re looking for a new job, don’t forget the Career Zone. There are some great companies participating, so that’s another good reason to come.

Conferences are all about networking. While not enabled by default, I suggest you manually go and turn on access to the Attendee Directory, so you can write messages to people you want to meet, have chats with, and so on.

Some keynote updates about The O’Reilly MySQL Conference & Expo 2011

A quick update on a few keynotes that the O’Reilly MySQL Conference & Expo 2011 managed to recently close:

  • The opening keynote, The State of the Dolphin, given by none other than Tomas Ulin, who is currently the VP of the MySQL Engineering team at Oracle. I am told that this is not just a “what’s new” and “what’s coming up”, as there will also be a Q&A session with an analyst, customer, and Tomas. You must not miss this on Tuesday morning at 9am, 12th April 2011.
  • On Thursday at 9.30am, we have The Next Decade in Data Management, a keynote given by Mike Olson, CEO of Cloudera. More and more I see people using Hadoop/HBase alongside their MySQL installs, so I think this talk is a must-see.

Early bird registration ends March 15 2011. What are you waiting for? Procrastination will cost you!

Don’t forget to follow the conference via social media: Facebook, Twitter.

O’Reilly MySQL Conference & Expo 2011 – register now to save!

It’s that time of year again. The O’Reilly MySQL Conference & Expo 2011 is happening April 11-14 2011, in Santa Clara, California. As co-chair this year with Brian Aker, I’m pretty excited at the content available. It is certainly more diverse, and if you thought you knew everything about MySQL, remember you also want to learn about the ecosystem surrounding it.

No one just deploys MySQL standalone these days. There’s an ecosystem. Heck, even in the MySQL world, there is an ecosystem building out. Look at the schedule grid, and see how diverse things are. Yes, there are talks on CouchDB, PostgreSQL, Cassandra, Eucalyptus, OpenStack, “NoSQL” and more.

Looking at the theme, the ecosystem and beyond, it will give attendees a pretty good idea on how to create a good reference architecture; their own reference architecture. Learn from all the talks, and the experiences of the people in the trenches.

So what are you waiting for? Register already! Best price registration ends soon (26/01), so don’t wait; save some cash for the drink-fuelled night chats at all the birds-of-a-feather sessions that spill over to the bar ;-)

