Archive for the ‘MySQL’ Category

Twitter, Facebook MySQL trees online – pushing MySQL forward

Just yesterday, I’m sure many saw Twitter opensourcing their MySQL implementation. It is based on MySQL 5.5 and the code is on Github.

For reference, the database team at Facebook has always been actively blogging, and keeping up their code available on Launchpad. Its worth noting that the implementation there is based on MySQL 5.0.84 and 5.1.

At Twitter, most of everything persistent is stored in MySQL – interest graphs, timelines, user data and those precious tweets themselves! At Facebook, its pretty similar – all user interactions like likes, shares, status updates, requests, etc. are all stored in MySQL (ref).

The media has picked up on it too. A fairly misinformed piece on GigaOm (MySQL has problems focused on Stonebrakers fate worst than death? Pfft. Facebook wants to move its code to github? Read the reasoning — its spam handling on LP.), and a shorter piece on CNET.

Both Twitter and Facebook code trees mention that its what they use in their environments, but it’s not supported in any way, shape or form. Facebook recommends Percona Server or MariaDB. Facebook also has tools like online schema change in the repository, amongst others like prefetching tools written in Python.

I haven’t had the chance to play with the Twitter release yet, but it looks like this can only push Percona Server and MariaDB forward. Based on 5.5, some of these BSD-licensed features can make it in, and some have already made it in I’m sure. And what pushes these servers, will push MySQL forward (see lots of new features in MySQL 5.6).

On a personal note, it is amazing to see some MySQL-alumni push this forward. At Twitter, there’s Jeremy Cole and Davi Arnaut. At Facebook, the team includes Domas Mituzas, Harrison Fisk, Yoshinori Matsunobu, Lachlan Mulcahy. Nothing would be complete without mentioning Mark Callaghan (though not-MySQL alumni, active MySQL community member) who led a MySQL team at Google, and now at Facebook.

Paybox Services and seeing MariaDB in use

paybox servicesWhen I was at MySQL, I took for granted that pretty much every website I used had software at the back of it that was basically MySQL. It was a nice feeling. MariaDB is a lot younger, so when I was in Paris and had to make a payment for the taxi I was sitting in, I smiled a little when I saw that Paybox Services was processing my transaction. Some might recall that Paybox Services deployed MariaDB, since the 5.2 release. It was a wonderful feeling that somewhere in that transaction, MariaDB was behind it!

Paybox wanted some features inside of MySQL 5.5 and have been holding out for MariaDB 5.5. Its kind of nice to see that today, MariaDB 5.5.22 has been released as a release candidate. It is only a matter of time before Paybox can benefit from things they’ve wanted like semi-sync replication.

More MariaDB after Percona Live Santa Clara

Right after Percona Live Santa Clara (which MariaDB is quite present for), its worth noting there are a few more events happening on Friday, April 13, 2012 at the Hyatt Regency Santa Clara. MariaDB will be present at 2/3 of those events.

The one event MariaDB won’t be present at is the Drizzle Day. That doesn’t mean you can’t attend talks across all these events though — the schedules are synced so you can move freely across events. Friday in itself is like a mini-conf, because it seems like there will be four simultaneous tracks happening.

MariaDB at Percona Live Santa Clara

I for one can say that I’m truly excited that MariaDB will be part of Percona Live Santa Clara. The MariaDB session list includes:

  • A tutorial: Improving MySQL/MariaDB query performance through optimizer tuning by Timour Katchaounov and Sergey Petrunia. You can benefit from this even as a stock MySQL user naturally.
  • MySQL Plugins – why should I bother? by Sergei Golubchik is again something that isn’t only MariaDB specific since MySQL also has a plugin architecture. I don’t know if Sergei will talk about authentication plugins as well, but I can imagine this is on the mind of many people.
  • MariaDB: The 2012 edition by Colin Charles – well, this is me. You might wonder what we’ve been doing since mid-2009 having only made our first release in early-2010. We’ve achieved quite a lot in the project, so come find out why you might consider MariaDB to your alternative to MySQL. We may even have a surprise in the way of some new announcements here.
  • MySQL Optimizer Standoff MySQL 5.6 and MariaDB 5.3 by Peter Zaitsev is focused on the optimizer features between MySQL and MariaDB. MariaDB 5.3 as you might know features a lot of optimizer improvements.

The BoFs schedule has just been released, and on Wednesday evening from 9pm onwards, you can come visit Ballroom A to discuss MariaDB, help with the future roadmap as well as be filled with lots of salmiakkikossu.

We have already confirmed our acceptance of having a DotOrg booth and can’t wait till that information makes it up on the Internet.

All in all, I’m happy to be attending and speaking in Santa Clara yet again in the April timeframe. This has become somewhat of a yearly ritual. Make sure you register, and don’t forget to do a little searching to find some promo codes!

Managing MySQL with Percona Toolkit by Frédéric Descamps

Frédéric Descamps of Percona.

Percona Toolkit is Maatkit & Aspersa combined. Opensource and the tools are very useful for a DBA.

You need Perl, DBI, DBD::mysql, Term::ReadKey. Most tools are written in Perl, and whatever is in Bash is being re-written in Perl. There is also a tarball or RPM or DEB packages.

Know your environment. The hardware & OS are crucial for you to know. How much memory/CPU do you use? Do you use swap? Is this a physical/virtual machine? Do you have free space? What kind of RAID controller? Volumes? Disk? What about the network interfaces? What IO schedulers are used? Which filesystem is the data stored on? To answer all that, just use pt-summary.

Know your MySQL environment. Version? Build? How many databases? Where is the data directory? What about replication? What are key InnoDB settings? Storage engine in use? Index type? Foreign keys? Full text indexes? To answer all this and more use pt-mysql-summary.

pt-slave-find shows you the topology and replication hierarchy of your MySQL replication instances. An inventory of replicas!

Where is my disk I/O going? Use pt-diskstats which is an improved iostat. There is pt-ioprofile but it can be dangerous in production.

Now its time to get more intimate with your database. Let’s try to find the answer to these questions: how are the indexes used? Are there duplicate keys? Which queries are eating most of the resources? You can use pt-duplicate-key-checker to check for duplicate/redundant indexes or foreign keys. pt-index-usage can tell you which indexes are unused. If you think you have bad SQL, check out pt-query-advisor.

You can use pt-query-digest to analyze the slow query log and show a profile of the workload. You mostly use this with slow query logs & tcpdump’s. Be careful when you have dropped packets — results may tend to be fake then!

After all this, its time to maintain your environment.

pt-deadlock-logger checks InnoDB status to log MySQL deadlock information. It needs to run continually to capture things.

pt-fk-error-logger extracts and logs MySQL foreign key errors.

pt-online-schema-change to alter tables. It makes a “shadow copy” and swaps them. Extremely useful for large, long-running ALTER. Facebook uses the same technique.

Validate your upgrades as upgrades are the leading cause of downtime. Are queries using different indexes? Is query execution plan different? New errors? See pt-upgrade for this. Best to run this on a third machine (i.e. the old machine and a new machine to see how it goes).

Verify replication integrity – pt-table-checksum. Perform an online replication consistency check or checksum MySQL tables efficiently on one or many servers. Use it routinely (mandatory for 95% of MySQL users). Put it in a weekly crontab. Repair differences with pt-table-sync.

Repair out-of-sync replicas – pt-table-sync

Measure delay acfurately – pt-heartbeat

Deliberately delay replication – pt-slave-delay

Watch & restart MySQL replication after errors – pt-slave-restart

When there are problems, get the symptoms when it hurts. Look at pt-stalk (wait for a condition to occur them begin collecting data – eg. everytime the threads go over 2,000 you have a problem, so it collects stuff – it calls pt-collect), pt-collect (collect information from a server for some period of time), and pt-sift.

pt-mext looks at many samples of MySQL SHOW GLOBAL STATUS side-by-side. Default STATUS shows counter since the MySQL instances started. It is very helpful to see a delta of recent activity.

The future: pt-query-digest will do query reviews; pt-stalk will do “magical fault detection algorithm”. Its all opensource and its all on Launchpad at lp:percona-toolkit.

Replication features of 2011 by Sergey Petrunia

Sergey Petrunia of the MariaDB project & Monty Program.

MySQL 5.5 GA at the end of 2010. MariaDB 5.3 RC towards the end of 2011 (beta in June 2011).

MySQL 5.5 is merged to Percona Server 5.5 which included semi-sync replication, slave fsync options, atuomatic relay log recovery, RBR slave type conversions (question if this is useful or not), individual log flushing (very useful, but not many using), replication heartbeat, SHOW RELAYLOG EVENTS. About 2/3rds of the audience use MySQL 5.5 in production, with only 2 people using semi-sync replication.

MariaDB 5.3 brings replication features brings group commit in the binary log, which is merged into Percona Server 5.5. Checksums for binlog events which is merged from MySQL 5.6. Sergey goes in-depth about the group commit for the binary log. To find out a little more about MariaDB replication changes, see Replication in the Knowledgebase.

There are several implementations of group commit. Facebook started it, followed by MariaDB & Oracle. Percona 5.5 is GA so the feature is there, its not in MySQL 5.6 (yet?), and MariaDB 5.3 is where its at. Seems like the MariaDB implementation is the best so far – refer to the Facebook benchmark performed by Mark Callaghan.

Annotated RBR poses a compatibility problem. MariaDB 5.3 has annotate_rows, while MySQL 5.6 has rows_query event. They are different events. So you cannot have a MariaDB 5.3 master and a MySQL 5.6 slave at this moment. So MySQL 5.6 will have a flag to mark “ignorable” binlog events which will be merged into MariaDB and this will make binary logs compatible again.

There is now also optimized RBR for tables with no primary key.

MySQL 5.6 also has crash-safe slave (replication information stored in tables). Crash-safe master (binary log recovery if the server starts & sees the binary log is corrupted). Parallel event execution is something that is new in MySQL 5.6 which is the most important feature for Sergey.

Pre-heating: There is mk-slave-prefetch (famous quote: “Please don’t use mk-slave-prefetch on #MySQL unless you are Facebook.”). There is replication booster by Yoshinori Matsunobu. There is a Python version of mk-slave-prefetch that Facebook uses.


i