{"id":791,"date":"2008-04-15T14:36:52","date_gmt":"2008-04-15T19:36:52","guid":{"rendered":"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin"},"modified":"2008-04-15T14:38:59","modified_gmt":"2008-04-15T19:38:59","slug":"mysql-full-text-search-by-alex-rubin","status":"publish","type":"post","link":"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin","title":{"rendered":"MySQL Full Text Search by Alex Rubin"},"content":{"rendered":"<p>Download the PDF: <a href=\"http:\/\/www.mysqlfulltextsearch.com\/full_text.pdf\">http:\/\/www.mysqlfulltextsearch.com\/full_text.pdf<\/a><\/p>\n<p>Default search by relevance, default sort is by relevance<\/p>\n<p>Boolean search is also popular. cats AND dogs. No default sorting, so you need to order the results yourself<\/p>\n<p>Phrase search<\/p>\n<p>MySQL Full Text Index, only available with MyISAM, and it supports natural language and Boolean search. ft_min_word_len &#8211; 4 characters per word by default is indexed. Frequency based ranking, doesn&#8217;t count distance between words<\/p>\n<p>SELECT * FROM articles WHERE MATCH (title,body) AGAINST (&#8216;database&#8217; IN NATURAL LANGUAGE MODE);<\/p>\n<p>For Boolean, you use AGAINST (&#8216;cat AND dog&#8217; IN BOOLEAN MODE).<\/p>\n<p>n-gram fulltext plugin for CJK languages are available as plugins<\/p>\n<p>DRBD and MySQL FullText search? DRBD requires InnoDB, when there is a failover, DRBD needs to perform a reovery. Fulltext only works for MyISAM. So ou create a &#8220;FullText&#8221; slave MyISAM table with FullText indexes. The slide (diagram) is most useful for this, naturally.<\/p>\n<p>Speed up FT search? Fit the index into memory. key_buffer = total size of full text index (max=4GB). You can preload FT indexes into buffer.<\/p>\n<p>You can manually partition. Partitioning decreases index and table size, so search is faster. Application needs changing of course. MySQL 5.1 partitioning features, do not support FTs.<\/p>\n<p>Order by\/Group by is a performance killer. Using order by date, is much slower than with no order by.<\/p>\n<p><b>Real World Performance Killer<\/b><br \/>SELECT &#8230; FROM `ft` WHERE MATCH `album` AGAINST (&#8216;the way i am&#8217;)<br \/>The above query, is very slow! It took like 13 seconds or so.<\/p>\n<p>Note the stopword list and ft_min_word_len. I is not a stopword, but &#8220;the&#8221;, &#8220;way&#8221;, and &#8220;am&#8221; are stopwords.<\/p>\n<p>ft_min_word_len = 1 will mean that all words except &#8220;i&#8221; will be filtered out with the standard stoplist. &#8220;i&#8221; is contained in lots of text!<\/p>\n<p>Search with error correction? Use soundex() MySQL function (sounds similar). select soundex(&#8220;Dilane&#8221;) should equate to Dylan. You can sort it either by popularity or Levenstein distance (either by a stored procedure or a UDF).<\/p>\n<p>Sphinx &#8211; nice, open source, can be faster than MySQL full text index on a large dataset, supports multi-node clustering out of the box. It is however an external solution that isn&#8217;t built-in, and needs to be integrated.<\/p>\n<p>MySQL 5.0: need to patch source code. MySQL 5.1: copy Sphinx plugin to the plugin_dir.<\/p>\n<p>You can set Sphinx to be MySQL&#8217;s storage engine if you like.<\/p>\n<p><b>Resources<\/b><\/p>\n<ul>\n<li><a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/fulltext-search.html\">Full-Text Search Functions<\/a><\/li>\n<li><a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/fulltext-fine-tuning.html\">Fine-Tuning MySQL Full-Text Search<\/a><\/li>\n<li><a href=\"http:\/\/www.mysqlfulltextsearch.com\/\">MySQL Full Text Search<\/a> site, including links to worklog entries, etc.<\/li>\n<\/ul>\n<p>Technorati Tags: <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/mysql\" rel=\"tag\">mysql<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/mysqluc08\" rel=\"tag\">mysqluc08<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/mysqluc2008\" rel=\"tag\">mysqluc2008<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/full%20text%20search\" rel=\"tag\">full text search<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/alex%20rubin\" rel=\"tag\">alex rubin<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/boolean\" rel=\"tag\">boolean<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/performance\" rel=\"tag\">performance<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/partitioning\" rel=\"tag\">partitioning<\/a>, <a class=\"performancingtags\" href=\"http:\/\/technorati.com\/tag\/sphinx\" rel=\"tag\">sphinx<\/a><\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li class=\"share-email\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-email sd-button share-icon\" href=\"mailto:?subject=%5BShared%20Post%5D%20MySQL%20Full%20Text%20Search%20by%20Alex%20Rubin&body=http%3A%2F%2Fwww.bytebot.net%2Fblog%2Farchives%2F2008%2F04%2F15%2Fmysql-full-text-search-by-alex-rubin&share=email\" target=\"_blank\" title=\"Click to email a link to a friend\" data-email-share-error-title=\"Do you have email set up?\" data-email-share-error-text=\"If you&#039;re having problems sharing via email, you might not have email set up for your browser. You may need to create a new email yourself.\" data-email-share-nonce=\"fdc1f0ead8\" data-email-share-track-url=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=email\"><span>Email<\/span><\/a><\/li><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-791\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-791\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-791\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>Download the PDF: http:\/\/www.mysqlfulltextsearch.com\/full_text.pdf Default search by relevance, default sort is by relevance Boolean search is also popular. cats AND dogs. No default sorting, so you need to order the results yourself Phrase search MySQL Full Text Index, only available with MyISAM, and it supports natural language and Boolean search. ft_min_word_len &#8211; 4 characters per [&hellip;]<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li class=\"share-email\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-email sd-button share-icon\" href=\"mailto:?subject=%5BShared%20Post%5D%20MySQL%20Full%20Text%20Search%20by%20Alex%20Rubin&body=http%3A%2F%2Fwww.bytebot.net%2Fblog%2Farchives%2F2008%2F04%2F15%2Fmysql-full-text-search-by-alex-rubin&share=email\" target=\"_blank\" title=\"Click to email a link to a friend\" data-email-share-error-title=\"Do you have email set up?\" data-email-share-error-text=\"If you&#039;re having problems sharing via email, you might not have email set up for your browser. You may need to create a new email yourself.\" data-email-share-nonce=\"fdc1f0ead8\" data-email-share-track-url=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=email\"><span>Email<\/span><\/a><\/li><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-791\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-791\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-791\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/mysql-full-text-search-by-alex-rubin?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_options":[]},"categories":[23],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4vJD-cL","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":2832,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2013\/11\/30\/groonga-fulltext-search-library-for-cloud-web","url_meta":{"origin":791,"position":0},"title":"groonga &#8211; fulltext search library for cloud &#038; web","date":"30\/11\/2013","format":false,"excerpt":"This is an incomplete fragment from 2011. Figure its worth publishing this now, considering MariaDB is likely to get groonga in the near future. The groonga team have released MariaDB 10.0.6 binaries as well. This is all part of the mroonga\u00a0project. These were my quick notes from the groonga talk\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":564,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2007\/06\/07\/mysql-resources-across-the-interwebs","url_meta":{"origin":791,"position":1},"title":"MySQL Resources across the Interwebs","date":"7\/6\/2007","format":false,"excerpt":"In an effort to reduce the number of open tabs I have in Firefox, I have some rather interesting MySQL-related resources that I think the rest of Planet MySQL will quite enjoy.Linbit (the makers of DRBD) and MySQL are in a partnership now, and Irwan Jamaluddin, a systems adminstrator at\u2026","rel":"","context":"In &quot;MySQL&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2259,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/how-to-offload-mysql-server-with-sphinx-by-vladimir-fedorkov","url_meta":{"origin":791,"position":2},"title":"How to offload MySQL server with Sphinx by Vladimir Fedorkov","date":"5\/2\/2012","format":false,"excerpt":"Vladimir Fedorkov of Sphinx. Presentation started out with a very nice presentation of candies to all the audience members. What is Sphinx? Another (C++) daemon on your boxes. Can be queried via API (PHP, Python, etc.) or MySQL-compatible protocol and SQL queries (SphinxQL). Some query examples are in the slides,\u2026","rel":"","context":"In &quot;MySQL&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":706,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2008\/01\/28\/morning-sessions-at-mysql-miniconf","url_meta":{"origin":791,"position":3},"title":"Morning sessions at MySQL MiniConf","date":"28\/1\/2008","format":false,"excerpt":"Upcoming MySQL Features - Stewart SmithStewart's talk on Upcoming MySQL Features was sort of a roadmap of what one might expect to see in MySQL 5.1 and above - he touched on Falcon, online backup, batched key access, Maria, Proxy, Workbench, and some cluster changes. When he shares his slides,\u2026","rel":"","context":"In &quot;MySQL&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1332,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2009\/02\/17\/drbd-management-console","url_meta":{"origin":791,"position":4},"title":"DRBD Management Console","date":"17\/2\/2009","format":false,"excerpt":"Wow, check out what just came out from Linbit: The DRBD Management Console. Written in Java (so it runs anywhere), completely open source (GPLv3), and allows you to manage DRBD and Heartbeat based clusters. You can install, configure, see your systems graphically, and a lot more. I'm interested to try\u2026","rel":"","context":"In &quot;Databases&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2247,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/new-mysql-5-6-features-by-oli-sennhauser","url_meta":{"origin":791,"position":5},"title":"New MySQL 5.6 Features by Oli Sennhauser","date":"5\/2\/2012","format":false,"excerpt":"First talk at FOSDEM MySQL Devroom by Oli Sennhauser, of FromDual. Quick notes\/liveblog of the talk, plus links from a quick search. New Release Model: starts with at least in beta quality, milestone releases are RC quality (every 3-6 months), between milestones new features are allowed, GA releases every 12-18\u2026","rel":"","context":"In &quot;MySQL&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"amp_enabled":true,"_links":{"self":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/791"}],"collection":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/comments?post=791"}],"version-history":[{"count":0,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/791\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/media?parent=791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/categories?post=791"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/tags?post=791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}