Posts Tagged ‘FOSDEM’

Building simple & complex replication clusters with Tungsten Replicator by Giuseppe Maxia

Giuseppe Maxia of Continuent.

MySQL replication is single threaded. Multi-master replication is complex with MySQL. Circular replication works but is very fragile. Once you’ve achieved the feat, how do you avoid conflicts? The lack of global transaction ID today also means you may have slaves that may not be synchronized fully. Finally, some people like to replicate to PostgreSQL, Oracle and MongoDB.

This is where Tungsten Replicator comes into place. Opensource. 100% GPLv2. You can do easy failover (no need to synchronize the slaves manually when a master dies), have multiple masters, multiple sources to a single slave, conflict prevention, parallel replication, and replicate to Oracle/PostgreSQL database (heterogeneous replication – Oracle->MySQL is not opensource, but MySQL->Oracle is).

Parallel replication: ability to replicate with multiple thread at once. Sharded by database (all big sites already use this naturally). Good choice for slave lag problems. Bad choice for single database projects. In their tests to measure slave catch up time between standard MySQL & Tungsten slave with 1hr of sysbench and some 130GB of data, MySQL replication takes 4.5h to catchup, and Tungsten takes less than 1 hour to catchup. This comparison was not made with MySQL 5.6 (the same parallel replication feature at this moment in 5.6 is quite buggy). No need to install Tungsten on the master (Tungsten needs to be on slave). Replication can revert to native salve with 2 commands, but the failover is an issue — so install Tungsten on both!

Conflict prevention. When you have multiple masters you are tempted to do many things that you shouldn’t. Tungsten provides the multi-master topology. Tungsten can help you avoid conflicts. Decide you want to shard your information by database – Tungsten can enforce such rules. Define the rules, applied to either the master or the slave and you can then state to make replication fail or drop silently or drop with a warning.

You can create clusters very quickly with Tungsten. You can install from a centralized point to many servers. You need Java, Ruby, SSH access and a MySQL user with all privileges (used for replication).

How to offload MySQL server with Sphinx by Vladimir Fedorkov

Vladimir Fedorkov of Sphinx.

Presentation started out with a very nice presentation of candies to all the audience members.

What is Sphinx? Another (C++) daemon on your boxes. Can be queried via API (PHP, Python, etc.) or MySQL-compatible protocol and SQL queries (SphinxQL). Some query examples are in the slides, here’s one about SphinxSE in the KB.

MyISAM FTS is good but becomes slow with half a million documents. InnoDB has FTS now but he’s not tried it (and neither has anyone in the audience to see it compare with MyISAM FTS).

Geographical distance is the distance measuring the surface of the earth (two pairs of float values – latitude, longitude). In Sphinx, there is support for GEODIST(Lat,Long,Lat2,Long2) in Sphinx.

Segments are good for price ranges on a site, date ranges, etc. Use INTERVAL(field, x0, x1, …, xN).

Keep huge text collections out of the database. sql_field = path_to_file_text. Tell Sphinx to index text not from MySQL but out in the filesystem. Keep the metadata inside the database but keep the actual data outside of the database. max_file_field_buffer needs to be set properly.

You can do proximity search with Sphinx — find the words “hello world” within a ten word block, for example.

Resources: the documentation, a book by O’Reilly: Introduction to Search with Sphinx: From installation to relevance tuning (sold out at the FOSDEM O’Reilly booth!), and their community page including wiki, forum, etc.

MariaDB 5.3 query optimizer by Sergey Petrunia

Sergey Petrunia of the MariaDB project.

What exactly is not working in MySQL? MySQL is poor at decision support/analytics. With large datasets you need special disk access strategies. Complex queries like insufficient subquery support and big joins are common int he MySQL world.

DBT-3 is used, scale=30, with a 75GB database and run a query “average price of item between a range of dates”. Query time took some 45 minutes to execute. Why? Run iostat -x to see what is going on. See that the CPU is mostly idle, so its an IO-bound load. Next you run SHOW ENGINE INNODB STATUS and you’ll see how many reads per second is happening. Possible solution is to get more RAM or get an SSD (good to speedup OLTP workloads, but analytics over data is probably not viable since SSDs are small and not cheap).

The MySQL/MariaDB solution to the above problem is improved disk access strategies: multi-range read (MRR) and batched key access (BKA). In MariaDB, MRR/BKA need to be enabled (they are not turned on by default). The query time only took 3 minutes 48 seconds, which is some 11.8x faster than the previous 45 minutes.If you look at EXPLAIN output, its almost as same as before, expect the Extra filed. iostat -x will now show some CPU load, svctm down as well (so its not random disk seeks anymore — some 8ms seek time on a regular 7,200rpm disk), SHOW ENGINE INNODB STATUS will show some 10,000 reads per second rather than the previous 200.

If you are on Fedora, check out the Systemtap feature to look at I/O patterns. stap deviceseeks.stp -c “sleep 60”.

Subqueries handling in MariaDB 5.3: check out the Subquery Optimizations Map. Only about 10% of the audience use optimizer hints in MySQL.

MySQL HA reloaded by Ivan Zoratti

MySQL HA reloaded – old tricks and cool new tools to guarantee high availability to your MySQL Servers by Ivan Zoratti of SkySQL. This talk is a little longer, so check out: HA Reloaded – many ways to provide High Availability. The slides are already online.

Questions to ask: which level of high availability do I need? Do I require no loss of data? Do I need failover or is switchover enough? Can I provide a reasonable service when a component is down? Remember, five nine’s high availability also means a lot of infrastructure costs.

Other things to clarify: availability vs scalability. HA costs. HA for your entire architecture, not just for your database servers. Review your SLAs.

The best high availability solution today is combined solutions.

MySQL replication – asynchronous & semi-synchronous (lots of people use MySQL 5.5, about 4 people in room were on Percona Server – question asked due to semi-sync replication only being available in 5.5 & greater), there are pros & cons of row based replication vs statement based replication.

MySQL Replication via Multi-Master replication Manager (MMM). Features such as monitoring, automatic failover, data backup & resync. Unfortunately, it has some problems with the stability & automatic failover. The project is not improved anymore, so there are other solutions that you should consider today.

MySQL Replication with MHA is a preferred solution. Something to consider: –read-only=1 and log-bin on slaves. Master IP failover. FIltering rules. Multi-tier replication.

Tungsten Replicator – open source, heterogenous replication. Truly multi-master and fan-in with Global ID. Per-schema multi-thread. You can also use it to replicate to Postgresql, Oracle and other databases. There is also Tungsten Enterprise.

Synchronous replication with DRBD is typical for active/standby environments. People don’t really like this because they feel that there is a server doing nothing. You can always do it active/passive. It works with InnoDB only.

Synchronous replication with Galera works for InnoDB. Its multi-master with no SPOF. Its new/young technology so you may find some issues with it. Application failover must be managed, but the conflict resolution is quite tricky (when you commit a transaction you might be fine, but you may have transactions that are removed due to conflicts).

There is a commercial SchoonerSQL that provides synchronous master-slave replication for InnoDB. Its defined explicitly as a master-slave solution.

Active/Passive clusters using Shared Storage. Points to consider are the fact that redundancy & replication must be guaranteed by the shared storage. InnoDB only. What about filesystems?

Virtualized environments – data storage, high availability & load balancing are provided and managed by the virtualized software. The faults are handled by the software, not the database.

There is also geographical replication for disaster recovery, having a master-master asynchronous replication is used to update the backup data centre. There is also storage snapshots for disaster recovery (not-specific to MySQL, its storage systems based, use only InnoDB).

There is also MySQL Cluster but there is another presentation about this later at FOSDEM. Very nice closing slide, “The absolutely necessary comparison chart” which some may disagree, but Ivan thinks is the best way forward.

Sphinx user stories by Stéphane Varoqui

Stephane Varoqui, Field Services SkySQL, Vlad Fedorkov, Director of PS, Sphinx Inc, Christophe Gesche, LAMP Expert, Delcampe, Herve Seignole, Web Architect, Groupe Pierre & Vacances Center Parcs – this is a big talk!

Pros: Filtering takes place on attributes in separate tables. Rely on the optimizer choice. HASH JOIN can help (MariaDB 5.3). Table elimination can help (MariaDB 5.2). ICP Index Condition Pushdown can help (MariaDB 5.3/MySQL 5.6). Max 80M documents at Pixmania, all queries come in less than 1s using 128GB of RAM (MariaDB 5.2). At, there is 16GB RAM with MariaDB 5.2.

Cons: CPU intensive (replication with many slaves). Need covering indexes to cover various !filter !order. Join & sorting cost on lazy filtering.

The more indexes you have in the system, the more you need to increase the main memory of the server. Keep the Btree’s in memory.

What about denormalized schemas? Not really CPU intensive, just IO. Can go to disk, full partition scan with filtering taking place on record order using covering index. Can shard but not that easy. Use the spider storage engine or shard-query. Can use memory engine for isolation. There are cons like duplicate data, duplicate indexes, missing material views, merge index cost, impact on write performance, and can consume a lot of memory with many indexes.

MySQL can push hardware, so read less/do less/read serialized/map reduce to get better latency. Chose data type wisely, replace string with numeric, vertical & horizontal sharing, snowflake compression (combination of attributes, build a table of the combination and replace it with an ID). If you are lazy, just use Sphinx!

Sphinx is just another daemon that can serve queries. Its easy to setup, easy to scale, storage engine makes it accessible to current MySQL users, API in core MariaDB (SphinxSE), SphinxQL, SphinxSE is transparent to the application layer of MySQL protocol.

Demo done using the Employees DB.

Pierre & Vacances – Centerparcs. Free text search, they use MariaDB using Levenshtein UDF implementation. Went live 01/2011. First implementation of Sphinx (12 indexes). Its grown, they use PHP API. The new goal is to never send an empty result. 1 index per website/market, with a total of 15 million docs. Index built on standalone server. Using internal gearman job schedule to generate index before cache generated. Current monitoring is via Nagios & perl, but the next step is to use Monyog & MariaDB INFORMATION_SCHEMA plugin.

Delcampe is an auction website with 45M ‘active’ items. Its dedicated for collectors. 3 string fields, and 15 attributes. 40-120K new items daily. Started with mysql fulltext in 2007, moved to Sphinx in 2008. There was a need to have more filters. Now they have 5 sphinx servers + 1 MySQL server. HAproxy to load balance.

New MySQL 5.6 Features by Oli Sennhauser

First talk at FOSDEM MySQL Devroom by Oli Sennhauser, of FromDual. Quick notes/liveblog of the talk, plus links from a quick search.

New Release Model: starts with at least in beta quality, milestone releases are RC quality (every 3-6 months), between milestones new features are allowed, GA releases every 12-18 months, no more than 2 releases in active support. There is also MySQL Labs, in where features can make it into a release model, but not necessarily.

Oli’s guess: MySQL 5.6 GA in April 2012 (Collaborate) or June 2012. As a consequence, MySQL 5.0 and MySQL 5.1 will be EOL probably by April this year.

New improvements in partitioning: explicit partition selection, exchanging partitions (good for ETL jobs).

New improvements in InnoDB: InnoDB INFORMATION_SCHEMA has got some new entries. Buffer (INNODB_BUFFER), Fulltext (INNODB_FT), Metrics (INNODB_METRICS), Data Dictionary (INNODB_SYS). The InnoDB Monitor is now obsolete. innodb_purge_threads can be set > 1. Kernel mutex split which should lead to improved concurrency. Persistent optimizer statistics (since InnoDB used to do random dives, so when you do a mysqldump, things can be different) – you can do SET GLOBAL innodb_analyze_is_persistent = 1; (read more: InnoDB Persistent Statistics at last, InnoDB Persistent Statistics Save the Day).

New improvements in optimizer: ORDER by on non-indexed columns, Multi Range Read (MRR), Index Condition Pushdown (ICP), query execution plan for DML statements, Batched Key Access (BKA). There is also optimizer tracing.

There is now some new instrumentations in Performance Schema (introduced in MySQL 5.5). There is also improved replication features, but that will be covered in a later talk.

Fractional seconds (microseconds) is introduced in MySQL 5.6. GET DIAGNOSTICS for stored procedures. Pluggable authentication (socket). memcached/InnoDB still in labs, hasn’t made it to a milestone release. Some 400+ bugs have been fixed. Lots of cleanup’s (old unused variables, commands removed).

Pay attention to MySQL 5.6 incompatible changes when you plan to upgrade.

Probably some good resources: What’s New in MySQL 5.6, MySQL 5.6 (Early Access Features) – InnoDB & Replication.