{"id":2250,"date":"2012-02-05T04:01:25","date_gmt":"2012-02-05T09:01:25","guid":{"rendered":"http:\/\/www.bytebot.net\/blog\/?p=2250"},"modified":"2012-02-05T06:31:38","modified_gmt":"2012-02-05T11:31:38","slug":"sphinx-user-stories-by-stephane-varoqui","status":"publish","type":"post","link":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui","title":{"rendered":"Sphinx user stories by St\u00c3\u00a9phane Varoqui"},"content":{"rendered":"<p><a href=\"http:\/\/varokism.blogspot.com\/\">Stephane Varoqui<\/a>, Field Services <a href=\"http:\/\/www.skysql.com\/\">SkySQL<\/a>, Vlad Fedorkov, Director of PS, Sphinx Inc, Christophe Gesche, LAMP Expert, Delcampe, Herve Seignole, Web Architect, Groupe Pierre &amp; Vacances Center Parcs &#8211; this is a big talk!<\/p>\n<p>Pros: Filtering takes place on attributes in separate tables. Rely on the optimizer choice. HASH JOIN can help (MariaDB 5.3). Table elimination can help (MariaDB 5.2). ICP Index Condition Pushdown can help (MariaDB 5.3\/MySQL 5.6). Max 80M documents at Pixmania, all queries come in less than 1s using 128GB of RAM (MariaDB 5.2). At PAP.fr, there is 16GB RAM with MariaDB 5.2.<\/p>\n<p>Cons: CPU intensive (replication with many slaves). Need covering indexes to cover various !filter !order. Join &amp; sorting cost on lazy filtering.<\/p>\n<p>The more indexes you have in the system, the more you need to increase the main memory of the server. Keep the Btree&#8217;s in memory.<\/p>\n<p>What about denormalized schemas? Not really CPU intensive, just IO. Can go to disk, full partition scan with filtering taking place on record order using covering index. Can shard but not that easy. Use the <a href=\"http:\/\/spiderformysql.com\/\">spider storage engine<\/a> or <a href=\"http:\/\/code.google.com\/p\/shard-query\/\">shard-query<\/a>. Can use memory engine for isolation. There are cons like duplicate data, duplicate indexes, missing material views, merge index cost, impact on write performance, and can consume a lot of memory with many indexes.<\/p>\n<p>MySQL can push hardware, so read less\/do less\/read serialized\/map reduce to get better latency. Chose data type wisely, replace string with numeric, vertical &amp; horizontal sharing, snowflake compression (combination of attributes, build a table of the combination and replace it with an ID). If you are lazy, just use Sphinx!<\/p>\n<p>Sphinx is just another daemon that can serve queries. Its easy to setup, easy to scale, storage engine makes it accessible to current MySQL users, API in core MariaDB (SphinxSE), SphinxQL, SphinxSE is transparent to the application layer of MySQL protocol.<\/p>\n<p>Demo done using the <a href=\"http:\/\/dev.mysql.com\/doc\/employee\/en\/employee.html\">Employees DB<\/a>.<\/p>\n<p>Pierre &amp; Vacances &#8211; Centerparcs. Free text search, they use MariaDB using Levenshtein UDF implementation. Went live 01\/2011. First implementation of Sphinx (12 indexes). Its grown, they use PHP API. The new goal is to never send an empty result. 1 index per website\/market, with a total of 15 million docs. Index built on standalone server. Using internal gearman job schedule to generate index before cache generated. Current monitoring is via Nagios &amp; perl, but the next step is to use Monyog &amp; MariaDB INFORMATION_SCHEMA plugin.<\/p>\n<p>Delcampe is an auction website with 45M &#8216;active&#8217; items. Its dedicated for collectors. 3 string fields, and 15 attributes. 40-120K new items daily. Started with mysql fulltext in 2007, moved to Sphinx in 2008. There was a need to have more filters. Now they have 5 sphinx servers + 1 MySQL server. HAproxy to load balance.<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li class=\"share-email\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-email sd-button share-icon\" href=\"mailto:?subject=%5BShared%20Post%5D%20Sphinx%20user%20stories%20by%20St%C3%83%C2%A9phane%20Varoqui&body=http%3A%2F%2Fwww.bytebot.net%2Fblog%2Farchives%2F2012%2F02%2F05%2Fsphinx-user-stories-by-stephane-varoqui&share=email\" target=\"_blank\" title=\"Click to email a link to a friend\" data-email-share-error-title=\"Do you have email set up?\" data-email-share-error-text=\"If you&#039;re having problems sharing via email, you might not have email set up for your browser. You may need to create a new email yourself.\" data-email-share-nonce=\"43c4191edd\" data-email-share-track-url=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=email\"><span>Email<\/span><\/a><\/li><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-2250\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-2250\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-2250\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>Stephane Varoqui, Field Services SkySQL, Vlad Fedorkov, Director of PS, Sphinx Inc, Christophe Gesche, LAMP Expert, Delcampe, Herve Seignole, Web Architect, Groupe Pierre &amp; Vacances Center Parcs &#8211; this is a big talk! Pros: Filtering takes place on attributes in separate tables. Rely on the optimizer choice. HASH JOIN can help (MariaDB 5.3). Table elimination [&hellip;]<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li class=\"share-email\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-email sd-button share-icon\" href=\"mailto:?subject=%5BShared%20Post%5D%20Sphinx%20user%20stories%20by%20St%C3%83%C2%A9phane%20Varoqui&body=http%3A%2F%2Fwww.bytebot.net%2Fblog%2Farchives%2F2012%2F02%2F05%2Fsphinx-user-stories-by-stephane-varoqui&share=email\" target=\"_blank\" title=\"Click to email a link to a friend\" data-email-share-error-title=\"Do you have email set up?\" data-email-share-error-text=\"If you&#039;re having problems sharing via email, you might not have email set up for your browser. You may need to create a new email yourself.\" data-email-share-nonce=\"43c4191edd\" data-email-share-track-url=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=email\"><span>Email<\/span><\/a><\/li><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-2250\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-2250\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-2250\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/sphinx-user-stories-by-stephane-varoqui?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_options":[]},"categories":[1064,23],"tags":[1332,1052,1335,1336],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4vJD-Ai","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":2289,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/02\/05\/optimizing-your-innodb-buffer-pool-usage-by-steve-hardy","url_meta":{"origin":2250,"position":0},"title":"Optimizing your InnoDB buffer pool usage by Steve Hardy","date":"5\/2\/2012","format":false,"excerpt":"Steve Hardy of Zarafa. Work that has been done to make Zarafa better. Why do you optimise your buffer pool? To decrease your I\/O load. How can you do it? Buy more RAM, page compression, less (smaller) data, rearrange data. MariaDB or Percona Server allows you to inspect your buffer\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2173,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2011\/08\/07\/book-mariadb-crash-course","url_meta":{"origin":2250,"position":1},"title":"Book: MariaDB Crash Course","date":"7\/8\/2011","format":false,"excerpt":"Exciting news - MariaDB gets its first book! Many years ago I read Ben Forta's MySQL Crash Course . It is a book targeted at beginners of MySQL. Ben has now written another book, titled: MariaDB Crash Course. Its still targeted at beginners, and covers many of the new features\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2380,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/05\/25\/mariadb-5-5-has-deprecated-pbxt","url_meta":{"origin":2250,"position":2},"title":"MariaDB 5.5 has deprecated PBXT","date":"25\/5\/2012","format":false,"excerpt":"One of the things we (Team MariaDB) talked quite a bit about since we released was PBXT. It was a feature differentiation to MySQL as we shipped another storage engine. It was included in MariaDB 5.1, 5.2, and 5.3; however with our release of MariaDB 5.5, PBXT (docs in the\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3140,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2015\/12\/12\/mariadb-server-gas-supported-for-5-years","url_meta":{"origin":2250,"position":3},"title":"MariaDB Server GA&#8217;s supported for 5 years","date":"12\/12\/2015","format":false,"excerpt":"There was some discussion a while back to maybe make MariaDB Server\u00a0follow the Ubuntu release model, i.e. having a Long Term Release (LTS) and then having a few regular fast releases with a shorter support cycle. However its good to note that the decision now going forward is to support\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2348,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2012\/03\/25\/more-mariadb-after-percona-live-santa-clara","url_meta":{"origin":2250,"position":4},"title":"More MariaDB after Percona Live Santa Clara","date":"25\/3\/2012","format":false,"excerpt":"Right after Percona Live Santa Clara (which MariaDB is quite present for), its worth noting there are a few more events happening on Friday, April 13, 2012 at the Hyatt Regency Santa Clara. MariaDB will be present at 2\/3 of those events. SkySQL & MariaDB Solutions Day - go ahead\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3048,"url":"http:\/\/www.bytebot.net\/blog\/archives\/2015\/02\/01\/mariadb-turns-5","url_meta":{"origin":2250,"position":5},"title":"MariaDB turns 5!","date":"1\/2\/2015","format":false,"excerpt":"I stopped working on MySQL at Sun Microsystems in late 2009 (after a lengthy period of garden leave), to join Monty Program Ab, and was greatly anticipating a MariaDB release that we could take to market. The first GA release of MariaDB came out February 1 2010 - MariaDB 5.1.42.\u2026","rel":"","context":"In &quot;MariaDB&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"amp_enabled":true,"_links":{"self":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/2250"}],"collection":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/comments?post=2250"}],"version-history":[{"count":4,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/2250\/revisions"}],"predecessor-version":[{"id":2279,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/2250\/revisions\/2279"}],"wp:attachment":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/media?parent=2250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/categories?post=2250"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/tags?post=2250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}