{"id":2832,"date":"2013-11-30T03:52:57","date_gmt":"2013-11-30T08:52:57","guid":{"rendered":"http:\/\/www.bytebot.net\/blog\/?p=2832"},"modified":"2013-11-30T03:53:02","modified_gmt":"2013-11-30T08:53:02","slug":"groonga-fulltext-search-library-for-cloud-web","status":"publish","type":"post","link":"http:\/\/www.bytebot.net\/blog\/archives\/2013\/11\/30\/groonga-fulltext-search-library-for-cloud-web","title":{"rendered":"groonga &#8211; fulltext search library for cloud &#038; web"},"content":{"rendered":"<p><em>This is an incomplete fragment from 2011. I figure it&#8217;s worth publishing this now, considering MariaDB is likely to get groonga in the near future. The groonga team have <a href=\"http:\/\/mroonga.org\/en\/blog\/2013\/11\/29\/release.html\">released MariaDB 10.0.6 binaries<\/a> as well. This is all part of the <a href=\"http:\/\/mroonga.org\/\"><strong>mroonga<\/strong><\/a>\u00a0project.<\/em><\/p>\n<p>These were my quick notes from the groonga talk at the <a href=\"http:\/\/en.oreilly.com\/mysql2011\">O&#8217;Reilly MySQL Conference &amp; Expo 2011<\/a>. I haven&#8217;t tried it yet (and don&#8217;t know if it really is faster than Sphinx), but it&#8217;s something I definitely want to play with. Maybe even get a MariaDB tree going.<\/p>\n<p><a href=\"http:\/\/groonga.org\/\">groonga<\/a> is a fulltext search library for cloud &amp; web.<\/p>\n<p>groonga is easy to embed &amp; is scalable. It is written in C.<\/p>\n<p>Highly precise search for any language. Fast searching and indexing in realtime.<\/p>\n<p>PostgreSQL bindings are also available. Can be used with the Spider storage engine. CPU scalable. There is also a Ruby binding.<\/p>\n<p>&#8220;100x faster than Sphinx in practical use cases&#8221;<\/p>\n<p>groonga components:<\/p>\n<ul>\n<li>groonga core &#8211; embedded search engine<\/li>\n<li>groonga column store &#8211; data store, strings, numeric values, geographic values. None of the existing engines were good enough for typical search engine queries. 
Typical queries hit a large number of records, filter them by multiple conditions (like range queries), then group by specific conditions, order by a dynamic condition, and sometimes output a limited number of records.<\/li>\n<li>groonga storage engine &#8211; pluggable storage engine for MySQL<\/li>\n<\/ul>\n<p>Spider can be used for data sharding on top of it. It is not a component of the groonga product, but works well with it to make it a distributed search engine.<\/p>\n<p>Works for unsegmented languages (like CJK). No whitespace in CJK.<\/p>\n<p>groonga supports a full inverted index (for unsegmented languages). Highly compressed index (no stop words are needed). They use a Patricia trie lexicon (partial string match on the lexicon). The inverted index is designed to reduce disk I\/O.<\/p>\n<p>The web is growing, and searching &amp; indexing must be performed simultaneously.<\/p>\n<p>Tritonn &#8211; patched MySQL, MyISAM and groonga<\/p>\n<p>http:\/\/www.twistimage.com\/<\/p>\n<p>Problems with it?<\/p>\n<ol>\n<li>MyISAM based &#8211; table lock (when updating a table, read accesses are blocked)<\/li>\n<li>Patch based &#8211; patch maintenance and building a patched MySQL is messy<\/li>\n<\/ol>\n<p>New solution? Groonga storage engine. Uses the new column store instead of MyISAM. 
And it is no longer a patch: it&#8217;s a pluggable storage engine.<\/p>\n<p>https:\/\/github.com\/mroonga\/mroonga<\/p>\n<p>Advantages?<\/p>\n<ul>\n<li>table lock free &#8211; column store is lock free<\/li>\n<li>only accesses columns required &#8211; not row-based<\/li>\n<li>easy to build now<\/li>\n<\/ul>\n<p>Includes some optimisations:<\/p>\n<ul>\n<li>COUNT(*) optimisation for queries like SELECT COUNT(*) FROM table WHERE MATCH(col) AGAINST (&#8216;query&#8217;);<\/li>\n<li>Also works with the ORDER BY score and LIMIT optimisation<\/li>\n<\/ul>\n<p>The groonga storage engine has fast phrase search and fast index updates (realtime); inserting records doesn&#8217;t block reading records.<\/p>\n<p>Spider is a storage engine for transparent database sharding.<\/p>\n<p>Benefits of Spider + Groonga:<\/p>\n<ul>\n<li>optimisation of fts with sorting by score<\/li>\n<li>optimisation for sorting by the range partition key column<\/li>\n<li>optimisation of fts with filtering by the partition key column<\/li>\n<\/ul>\n<p>groonga.org &#8211; they are all based on MySQL 5.5 (packages available)<\/p>\n<p>Contact Team Groonga: bit.ly\/fSs5vx<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is an incomplete fragment from 2011. I figure it&#8217;s worth publishing this now, considering MariaDB is likely to get groonga in the near future. The groonga team have released MariaDB 10.0.6 binaries as well. This is all part of the mroonga\u00a0project. 
These were my quick notes from the groonga talk at the O&#8217;Reilly MySQL Conference [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_options":[]},"categories":[1064,23],"tags":[1620,1618,1619,1230,621],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4vJD-JG","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":2250,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui","url_meta":{"origin":2832,"position":0},"title":"Sphinx user stories by St\u00e9phane Varoqui","date":"5\/2\/2012","format":false,"excerpt":"Stephane Varoqui, Field Services SkySQL, Vlad Fedorkov, Director of PS, Sphinx Inc, Christophe Gesche, LAMP Expert, Delcampe, Herve Seignole, Web Architect, Groupe Pierre & Vacances Center Parcs - this is a big talk! Pros: Filtering takes place on attributes in separate tables. Rely on the optimizer choice. HASH JOIN can\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2938,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2014\/06\/11\/rhel7-now-with-mariadb","url_meta":{"origin":2832,"position":1},"title":"RHEL7 now with MariaDB","date":"11\/6\/2014","format":false,"excerpt":"Congratulations to the entire team at Red Hat, for the release of Red Hat Enterprise Linux 7 (RHEL7). 
The release notes have something important, under Web Servers & Services: MariaDB 5.5 MariaDB is the default implementation of MySQL in Red Hat Enterprise Linux 7. MariaDB is a community-developed fork of\u2026","rel":"","context":"In &quot;Distributions&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3182,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2016\/02\/29\/amazon-rds-updates-february-2016","url_meta":{"origin":2832,"position":2},"title":"Amazon RDS updates February 2016","date":"29\/2\/2016","format":false,"excerpt":"I think one of the big announcements that came out from the Amazon Web Services world in October 2015 was the fact that you could spin up instances of MariaDB Server on it. You would get MariaDB Server 10.0.17. As of this writing, you are still getting that (the MySQL\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2693,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2013\/04\/12\/upcoming-talks-in-santa-clara","url_meta":{"origin":2832,"position":3},"title":"Upcoming talks in Santa Clara","date":"12\/4\/2013","format":false,"excerpt":"I'm planning my calendar and thought I'd share what talks I'd be giving in Santa Clara in a couple of weeks for the Percona Live MySQL Conference & Expo 2013 and the\u00a0MySQL & Cloud Database Solutions Day 2013. Its going to be a busy April 22-26 2013. 
MariaDB Cassandra Interoperability\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2348,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/03\/25\/more-mariadb-after-percona-live-santa-clara","url_meta":{"origin":2832,"position":4},"title":"More MariaDB after Percona Live Santa Clara","date":"25\/3\/2012","format":false,"excerpt":"Right after Percona Live Santa Clara (which MariaDB is quite present for), its worth noting there are a few more events happening on Friday, April 13, 2012 at the Hyatt Regency Santa Clara. MariaDB will be present at 2\/3 of those events. SkySQL & MariaDB Solutions Day - go ahead\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3129,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2015\/11\/09\/rackspace-cloud-high-availability-databases-for-mariadb-mysql-percona-server","url_meta":{"origin":2832,"position":5},"title":"Rackspace Cloud High Availability Databases for MariaDB, MySQL, Percona Server","date":"9\/11\/2015","format":false,"excerpt":"Continuing on with the cloud theme, I think its worth noting that since mid-2014, Rackspace has offered MariaDB (as well as MySQL and\u00a0Percona Server) in the cloud, as part of their Cloud Databases offering. It\u2019s powered by OpenStack. 
Now there is an additional \u201cHigh Availability instance\u201d being offered \u2013 this\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"amp_enabled":true,"_links":{"self":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/2832"}],"collection":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/comments?post=2832"}],"version-history":[{"count":1,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/2832\/revisions"}],"predecessor-version":[{"id":2833,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/2832\/revisions\/2833"}],"wp:attachment":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/media?parent=2832"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/categories?post=2832"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/tags?post=2832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}