Posts Tagged ‘rocksdb’

Tab Sweep – MySQL ecosystem edition

Time for some tab housekeeping, but I also realise that people seem to have missed announcements, developments, etc. that have happened in the last couple of months (and boy have they been exciting). I think we definitely need something like the now-defunct MySQL Newsletter (and no, DB Weekly or NoSQL Weekly just don’t seem to cut it for me!).


During @scale (August 31), Yoshinori Matsunobu mentioned that MyRocks has been deployed in one region for 5% of its production workload at Facebook.

By October 4 at the Percona Live Amsterdam 2016 event, Percona CEO Peter Zaitsev said that MyRocks is coming to Percona Server (blog). On October 6, it was also announced that MyRocks is coming to MariaDB Server 10.2 (note I created MDEV-9658 back in February 2016, and that’s a great place to follow Sergei Petrunia’s progress!).

Rick Pizzi talks about MyRocks: migrating a large MySQL dataset from InnoDB to RocksDB to reduce footprint. His blog also has other thoughts on MyRocks and InnoDB.

Of course, check out the new site for all things MyRocks! It has a getting started guide amongst other things.

Proxies: MariaDB MaxScale, ProxySQL

With MariaDB MaxScale 2.0 being relicensed under the Business Source License (from GPLv2), almost immediately there was a GPLScale fork; however, I think the more interesting/sustainable fork comes in the form of Airbnb MaxScale (GPLv2 licensed). You can read more about it in their introductory post, Unlocking Horizontal Scalability in Our Web Serving Tier.

ProxySQL has a new webpage, a pretty active mailing list, and it’s the GPLv2 solution by DBAs for DBAs.


Vitess 2.0 has been out for a bit, and a good guide is the talk at Percona Live Amsterdam 2016, Launching Vitess: How to run YouTube’s MySQL sharding engine. It is still insanely easy to get going (if you have a credit card), at their site.

FOSDEM 2016 notes

While I was on the committee for the FOSDEM MySQL & friends devroom, I didn’t speak at that devroom (instead I spoke at the distributions devroom). But when I had time to pop in, I did take some notes on sessions that were interesting to me, so here are the notes. I really did enjoy Yoshinori Matsunobu’s session (out of the devroom) on RocksDB and MyRocks, and I highly recommend that you watch the video, as the notes can’t be very complete without the great explanation available in the slide deck. Anyway, there are videos from the MySQL and friends devroom.

MySQL & Friends Devroom

MySQL Group Replication or how good theory gets into better practice – Tiago Jorge

  • Multi-master update everywhere with built-in automatic distributed recovery, conflict detection and group membership
  • Group replication added 3 PERFORMANCE_SCHEMA tables
  • If a server leaves the group (either via a crash or if you execute STOP GROUP_REPLICATION), the others will be automatically informed
  • Cloud friendly, and it is self-healing. Integrated with server core via a well-defined API. GTIDs, row-based replication, PERFORMANCE_SCHEMA. Works with MySQL Router as well.
  • Multi-master update everywhere. Conflicts will be detected and dealt with, via the first committer wins rule. Any 2 transactions on different servers can write to the same tuple.
  • Q: When a node leaves a group, will it still accept writes? A: If you leave voluntarily, it can still accept writes as a regular MySQL server (this needs to be checked)
  • Online DDL is not supported
  • Check out the video
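
The first committer wins rule from the notes above can be illustrated with a toy certifier. This is only a sketch and not how Group Replication is implemented: the `Certifier` class, its `certify` method and the per-row version counters are hypothetical names invented for illustration. It assumes that every transaction records the version of each row it wants to write, as seen when it started, and that all servers certify transactions in the same agreed order.

```python
class Certifier:
    """Toy first-committer-wins certification (hypothetical, for illustration)."""

    def __init__(self):
        # Latest committed version number for each row key.
        self.row_versions = {}

    def certify(self, write_set):
        """write_set maps row key -> row version the transaction saw at start.

        Returns True (commit) if none of the rows changed since then,
        otherwise False (the transaction must roll back).
        """
        for key, seen_version in write_set.items():
            if self.row_versions.get(key, 0) != seen_version:
                return False
        # No conflict: commit and bump the version of every written row.
        for key in write_set:
            self.row_versions[key] = self.row_versions.get(key, 0) + 1
        return True


certifier = Certifier()
# Two servers update the same row concurrently; both saw version 0.
t1_ok = certifier.certify({"accounts:42": 0})  # first in the agreed order
t2_ok = certifier.certify({"accounts:42": 0})  # second: stale, conflicts
t3_ok = certifier.certify({"accounts:42": 1})  # saw the committed version
```

Here `t1_ok` and `t3_ok` are `True` while `t2_ok` is `False`: both concurrent writers were allowed to touch the same tuple, and the one ordered second is the one that rolls back.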

ANALYZE for statements – Sergei Petrunia

  • a lot like EXPLAIN ANALYZE (in PostgreSQL) or PLAN_STATISTICS (in Oracle)
  • Looks like explain output with execution statistics
  • slides and video

Preparse Query Rewrite Plugins – Sveta Smirnova / Martin Hansson

  • Query rewriting with a proxy might be too complex, so they thought of doing it inside the server. There is a pre-parse (string-to-string) and a post-parse (parse tree) API. Pre-parse: low overhead, but no structure. Post-parse: retains structure, but requires re-parsing (no destructive editing), needs to traverse the parse tree, and will only work on SELECT statements
  • Query rewrite API builds on top of the Audit API, and then you’ve got the pre-parse/post-parse APIs on the top that call out to the plugins
  • video
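
To make the pre-parse (string-to-string) idea concrete, here is a small sketch in Python. It does not use the server's plugin API (real plugins are compiled against the server); `RULES` and `pre_parse_rewrite` are hypothetical names, and the `orders` table and the `LIMIT 1000` rule are made up for illustration. The point is only that a pre-parse rewriter sees raw query text and nothing else.

```python
import re

# Hypothetical rewrite rules: (pattern over the raw query text, replacement).
RULES = [
    (re.compile(r"^\s*SELECT\s+\*\s+FROM\s+orders\s*$", re.IGNORECASE),
     "SELECT * FROM orders LIMIT 1000"),
]


def pre_parse_rewrite(query):
    """Return the rewritten query text, or the original if no rule matches."""
    for pattern, replacement in RULES:
        if pattern.match(query):
            return replacement
    return query


rewritten = pre_parse_rewrite("select * from orders")
untouched = pre_parse_rewrite("SELECT id FROM orders WHERE id = 1")
fragile = pre_parse_rewrite("select * from orders -- nightly report")
```

The last call shows the "no structure" trade-off: a trailing comment is enough to stop the textual rule from matching, which is the kind of case a post-parse (parse tree) rewriter is meant to handle.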

Fedora by the Numbers – Remy DeCausemaker

MyRocks: RocksDB Storage Engine for MySQL (LSM Databases at Facebook) – Yoshinori Matsunobu

  • SSD/Flash is getting affordable but MLC Flash is still expensive. HDD has large capacity but limited IOPS (reducing rw IOPS is very important and reducing write is harder). SSD/Flash has great read iops but limited space and write endurance (reducing space here is higher priority)
  • Punch hole compression in 5.7, it is aligned to the sector size of your device. Flash device is basically 4KB. Not 512 bytes. So you’re basically wasting a lot of space and the compression is inefficient
  • LSM tends to have a read penalty compared to a B-Tree (as used by InnoDB). A good way to reduce the read penalty is to use a Bloom filter (check whether a key may exist without reading data, and skip the read I/O if it definitely does not exist)
  • Another penalty is for deletes, which are written as tombstones. The workaround for this is called SingleDelete.
  • LSMs are ideal for write heavy applications
  • Similar features to InnoDB. Transactions: atomicity, MVCC/non-locking consistent read, read committed and repeatable read (PostgreSQL-style), crash-safe slave and master. It also has online backup (logical backup by mysqldump and binary backup by myrocks_hotbackup).
  • Much smaller space and write amplification compared to InnoDB
  • Reverse order index (Reverse Column Family). SingleDelete. Prefix bloom filter. Mem-comparable keys when using case sensitive collations. Optimizer statistics for diving into pages.
  • RocksDB is great for scanning forward but ORDER BY DESC queries are slow, hence they use reverse column families to make descending scan a lot faster
  • watch the video
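
The Bloom filter point in the notes above can be sketched as follows. This is a toy illustration and not RocksDB's implementation: `BloomFilter`, `SortedFile`, the bit-array size and the SHA-256 based hashing are all assumptions made for the example. It shows how a lookup can skip the read when the filter says a key is definitely absent.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class SortedFile:
    """Stand-in for one on-disk sorted file in an LSM tree (hypothetical)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated read I/O

    def get(self, key):
        if not self.bloom.may_contain(key):
            return None  # definitely absent: no read I/O needed
        self.disk_reads += 1  # may exist: pay for the read
        return self.rows.get(key)


sst = SortedFile({"user:1": "alice", "user:2": "bob"})
hit = sst.get("user:1")
miss = sst.get("user:999")
```

A key that was added always passes the filter, so `hit` costs one simulated read. For the missing key the filter will almost always answer "definitely not here" and the read is skipped; a rare false positive only costs one wasted read and never a wrong answer.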

(tweet) Summary of Percona Live 2015

The problem with Twitter is that we talk about something and before you know it, people forget. (e.g. does WebScaleSQL have an async client library?) How many blog posts are there about Percona Live Santa Clara 2015? This time (2016), I’m going to endeavour to write more than to just tweet – I want to remember this stuff, and search archives (and also note the changes that happen in this ecosystem). And maybe you do too. So look forward to more blogs from Percona Live Data Performance Conference 2016. In the meantime, here are the tweets in chronological order from my Twitter search.

  • crowd filling up the keynote room for #perconalive
  • beginning shortly, we’ll see @peterzaitsev at #perconalive doing his keynote
  • #perconalive has over 1,200 attendees – oracle has 20 folk, with 22 folk from facebook
  • #perconalive is going to be in Amsterdam sept 21-22 2015 (not in London this year). And in 2015, April 18-21 2016!
  • We have @PeterZaitsev on stage now at #perconalive
  • 5 of the 5 top websites are powered by MySQL – an Oracle ad – alexa rankings? #perconalive
  • now we have Harrison Fisk on polyglot persistence at facebook #perconalive
  • make it work / make it fast / make it efficient – the facebook hacker way #perconalive
  • a lot of FB innovation goes into having large data sizes with short query time response #perconalive
  • “small data” to facebook? 10’s of petabytes with <5ms response times. and yes, this all sits in mysql #perconalive
  • messages eventually lands in hbase for long term storage for disk #perconalive they like it for LSM
  • Harrison introduces @RocksDB to be fast for memory/flash/disk, and its also LSM based. Goto choice for 100’s of services @ FB #perconalive
  • Facebook Newsfeed is pulled from RocksDB. 9 billion QPS at peak! #perconalive
  • Presto works all in memory on a streaming basis, whereas Hive uses map/reduce. Queries are much faster in Presto #perconalive
  • Scuba isn’t opensource – real time analysis tool to debug/understand whats going on @ FB. … #perconalive
  • InnoDB as a read-optimized store and RocksDB as a write-optimized store — so RocksDB as storage engine for MySQL #perconalive
  • Presto + MySQL shards is something else FB is focused on – in production @ FB #perconalive
  • loving the woz keynote @ #perconalive – wondering if like apple keynotes, we’ll see a “one more thing” after this ;)
  • “i’m only a genius at one thing: that’s making people think i’m a genius” — steve wozniak #perconalive
  • Happiness = Smiles – Frowns (H=S-F) & Happiness = Food, Fun, Friends (H=F³) Woz’s philosophy on being happy + having fun daily #perconalive
  • .@Percona has acquired @Tokutek in a move that provides some consolidation in the MySQL database market and takes..
  • MySQL Percona snaps up Tokutek to move onto MongoDB and NoSQL turf by @wolpe
  • One more thing – congrats @percona @peterzaitsev #perconalive Percona has acquired Tokutek with storage engines for MySQL & MongoDB – @PeterZaitsev #perconalive
  • Percona is now a player in the MongoDB space with TokuMX! #perconalive
  • The tokumx mongodb logo is a mongoose… #perconalive Percona will continue to support TokuDB/TokuMX to customers + new investments in it
  • @Percona “the company driving MySQL today” and “the brains behind MySQL”. New marketing angle? …
  • We have Steaphan Greene from @facebook talk about @WebScaleSQL at #perconalive
  • what is @webscalesql? its a collaboration between Alibaba, Facebook, Google, LinkedIn, and Twitter to hack on mysql #perconalive
  • close collaboration with @mariadb @mysql @percona teams on @webscalesql. today? upstream 5.6.24 today #perconalive
  • whats new in @WebScaleSQL ? asynchronous mysql client, with support from within HHVM, from FB & LinkedIn #perconalive
  • smaller @webscalesql change (w/big difference) – lower innodb buffer pool memory footprint from FB & Google #perconalive
  • reduce double-write mode while still preserving safety. query throttling, server side statement timeouts, threadpooling #perconalive
  • logical readahead to make full table scans as much as 10x fast. @WebScaleSQL #perconalive
  • whats coming to @WebScaleSQL – online innodb defragmentation, DocStore (JSON style document database using mysql) #perconalive
  • MySQL & RocksDB coming to @WebScaleSQL thanks to facebook & @MariaDB #perconalive
  • So, @webscalesql will skip 5.7 – they will backport interesting features into the 5.6 branch! #perconalive
  • likely what will be next to @webscalesql ? will be mysql-5.8, but can’t push major changes upstream. so might not be an option #perconalive
  • Why only minor changes from @WebScaleSQL to @MySQL upstream? #perconalive
  • Only thing not solved with @webscalesql & upstream @mysql – the Contributor license agreement #perconalive
  • All @WebScaleSQL features under Apache CCLA if oracle can accept it. Same with @MariaDB @percona #perconalive
  • Steaphan Greene says tell Oracle you want @webscalesql features in @mysql. Pressure in public to use the Apache CLA! #perconalive
  • We now have Patrik Sallner CEO from @MariaDB doing the #perconalive keynote ==> 1+1 > 2 (the power of collaboration)
  • “contributors make mariadb” – patrik sallner #perconalive
  • Patrik Sallner tells the story about the CONNECT storage engine and how the retired Olivier Bertrand writes it #perconalive
  • Google contributes table/tablespace encryption to @MariaDB 10.1 #perconalive
  • Patrik talks about the threadpool – how #MariaDB made it, #Percona improved it, and all benefit from opensource development #perconalive
  • and now we have Tomas Ulin from @mysql @oracle for his #perconalive keynote
  • 20 years of MySQL. 10 years of Oracle stewardship of InnoDB. 5 years of Oracle stewardship of @MySQL #perconalive
  • Tomas Ulin on the @mysql 5.7 release candidate. It’s gonna be a great release. Congrats Team #MySQL #perconalive
  • MySQL 5.7 has new optimizer hint frameworks. New cost based optimiser. Generated (virtual) columns. EXPLAIN for running thread #perconalive
  • MySQL 5.7 comes with the query rewrite plugin (pre/post parse). Good for ORMs. “Eliminates many legacy use cases for proxies” #perconalive
  • MySQL 5.7 – native JSON datatypes, built-in JSON functions, JSON comparator, indexing of documents using generated columns #perconalive
  • InnoDB has native full-text search including full CJK support. Does anyone know how FTS compares to MyISAM in speed? #perconalive
  • MySQL 5.7 group replication is unlikely to make it into 5.7 GA. Designed as a plugin #perconalive
  • Robert Hodges believes more enterprises will use MySQL thanks to the encryption features (great news for @mariadb) #perconalive
  • Domas on FB Messenger powered by MySQL. Goals: response time, reliability, and consistency for mobile messaging #perconalive
  • FB Messenger: Iris (in-memory pub-sub service – like a queue with cache semantics). And MySQL as persistence layer #perconalive
  • FB focuses on tiered storage: minutes (in memory), days (flash) and longterm (on disks). #perconalive
  • Gotta keep I/O devices for 4-5 years, so don’t waste endurance capacity of device (so you don’t write as fast as a benchmark) #perconalive
  • Why MySQL+InnoDB? B-Tree: cheap overwrites, I/O has high perf on flash, its also quick and proven @ FB #perconalive
  • What did FB face as issues to address with MySQL? Write throughput. Asynchronous replication. and Failover time. #perconalive
  • HA at Facebook: <30s failover, <1s switchover, > 99.999% query success rate
  • Learning a lot about LSM databases at Facebook from Yoshinori Matsunobu – check out @rocksdb + MyRocks …
  • The #mysqlawards 2015 winners #PerconaLive
  • Percona has a Customer Advisory Board now – Rob Young #perconalive
  • craigslist: mysql for active, mongodb for archives. online alter took long. that’s why @mariadb has … #perconalive
  • can’t quite believe @percona is using db-engines rankings in a keynote… le sigh #perconalive
  • “Innovation distinguishes between a leader and a follower” – Steve Jobs #perconalive
  • Percona TokuDB: “only alternative to MySQL + InnoDB” #perconalive
  • “Now that we have the rights to TokuDB, we can add all the cool features ontop of Percona XtraDB Cluster (PXC)” – Rob Young #perconalive
  • New Percona Cloud Tools. Try it out. Helps remote DBA/support too. Wonder what the folk at VividCortex are thinking about now #perconalive
  • So @MariaDB isn’t production ready FOSS? I guess 3/6 top sites on Alexa rank must disagree #perconalive
  • Enjoying Encrypting MySQL data at Google by @jeremycole & Jonas — you can try this in @mariadb 10.1.4 … #perconalive
  • google encryption: mariadb uses the api to have a plugin to store the keys locally; but you really need a key management server #perconalive
  • Google encryption: temporary tables during query execution for the Aria storage engine in #MariaDB #perconalive
  • find out more about google mysql encryption — or just use it at 10.1.4! #perconalive
  • Encrypting MySQL data at Google – Percona Live 2015 #perconalive
  • The @WebScaleSQL goals are still just to provide access to the code, as opposed to supporting it or making releases #perconalive
  • There is a reason DocStore & Oracle/MySQL JSON 5.7 – they were designed together. But @WebScaleSQL goes forward with DocStore #perconalive
  • So @WebScaleSQL will skip 5.7, and backport things like live resize of the InnoDB buffer pool #perconalive
  • How to view @WebScaleSQL? Default GitHub branch is the active one. Ignore -clean branches, just reference for rebase #perconalive
  • All info you need should be in the commit messages @WebScaleSQL #perconalive
  • Phabricator is what @WebScaleSQL uses as a code review system. All diffs are public, anyone can follow reviews #perconalive
  • automated testing with jenkins/phabricator for @WebScaleSQL – run mtr on every commit, proposed diffs, & every night #perconalive
  • There is feature documentation, and its a work in progress for @WebScaleSQL. Tells you where its included, etc. #perconalive
  • Checked out the new ANALYZE statement feature in #MariaDB to analyze JOINs? Sergei Petrunia tells all #perconalive …