Posts Tagged ‘storage engine’

MariaDB 5.5 has deprecated PBXT

One of the things we (Team MariaDB) have talked about quite a bit since our first release is PBXT. It set MariaDB apart from MySQL, since we shipped an additional storage engine. It was included in MariaDB 5.1, 5.2, and 5.3; however, with the release of MariaDB 5.5, PBXT (docs in the Knowledgebase) has been deprecated and is no longer built by default.

The reason behind it is clear: PBXT is currently not under active development. We still include it in the source releases, so if you would like to use it, you just have to build it yourself. If and when development resumes, with bugs being fixed and the engine being pushed forward, I’m sure we’ll start building it again. In the meantime, many thanks to Paul McCullagh for developing a great transactional engine.

Abusing MySQL (& thoughts on NoSQL)

The NoSQL/relational database debate has been going on for quite some time. MariaDB, like MySQL, is relational. And if you read this series of blog posts, you’ll realise that if you use MySQL correctly, you can achieve quite a lot.

  1. It all starts with Kellan Elliott-McCrea’s introductory post on Using, Abusing and Scaling MySQL at Flickr. Follow the entire series.
  2. He starts off the series with Ticket Servers: Distributed Unique Primary Keys on the Cheap. Flickr scales using shards, and ticket servers hand out unique integers to serve as primary keys.
  3. Richard Crowley talks about OpenDNS MySQL abuses. Nothing too out of the ordinary, but it shows MySQL getting the job done.
  4. Mikhail Panchenko talks about using the Federated engine in his installment of the series.
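The ticket-server idea is simple enough to model in a few lines. Flickr’s post describes running two MySQL ticket servers whose auto-increment settings are staggered (auto_increment_increment = 2, with auto_increment_offset = 1 on one and 2 on the other), so one hands out odd IDs and the other even IDs. The sketch below only simulates that numbering in plain Python; `TicketServer` and `new_primary_key` are hypothetical names, and the round-robin client is an assumption for illustration, not Flickr’s code.

```python
from itertools import cycle


class TicketServer:
    """Simulates a MySQL auto_increment column configured with
    auto_increment_offset=offset and auto_increment_increment=increment."""

    def __init__(self, offset: int, increment: int) -> None:
        self._next = offset
        self._increment = increment

    def next_ticket(self) -> int:
        ticket = self._next
        self._next += self._increment
        return ticket


# Two hypothetical servers: one hands out odd IDs, the other even IDs,
# so their IDs can never collide.
servers = [TicketServer(offset=1, increment=2), TicketServer(offset=2, increment=2)]
_round_robin = cycle(servers)


def new_primary_key() -> int:
    """Ask the next ticket server in turn for a globally unique ID.
    (A real client would also fail over if one server were down.)"""
    return next(_round_robin).next_ticket()
```

Calling `new_primary_key()` six times yields 1, 2, 3, 4, 5, 6. Strict alternation is only a convenience here; even if one server handled every request, the odd/even split would keep IDs unique across shards.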

If you’re using the Federated engine, know that MySQL disables FEDERATED by default. In MariaDB 5.1.42, you get FederatedX, a maintained fork of FEDERATED by the original author himself! Bugs get fixed and it is a supported engine, so if you’re using FEDERATED, it might be wise to try out FederatedX.

I’d also like to draw attention to an interesting essay by Dennis Forbes: Getting Real about NoSQL and the SQL-Isn’t-Scalable Lie. Monty says: “NoSQL is for very smart people who need a very sharp knife. People who are not capable of mastering SQL should not even attempt to try out NoSQL.”

Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon)

These are notes from Sharding for the masses: Introducing the SPIDER storage engine, a talk by Giuseppe Maxia given at OpenSQLCamp at FrOSCon in August 2009. The notes were taken more or less live, and the slides are available too.

Why sharding? Scaling, of course. The MySQL way to solve this is replication (even Yahoo! and Google use it).

When the master doesn’t have enough resources to cope with your workload (e.g. large data sets), replication chokes.

You can use proxies for sharding. Options available these days include MySQL Proxy (programmable in the Lua scripting language), HSCALE (built on top of MySQL Proxy), and SpockProxy (a fork of MySQL Proxy without Lua scripting, specialised for sharding). However, the proxy becomes a single point of failure, since everything has to pass through it.

Enter SPIDER, a MySQL storage engine built on top of the partitioning engine. It associates each partition with a remote server, transparently to the user. It is developed by Kentoku Shiba.
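To make “a partition associated with a remote server” concrete, here is a toy model of the routing decision for RANGE partitioning. The partition boundaries and host names are made up (the key name emp_no is borrowed from the employees test database used later), and this is not SPIDER’s code; it only shows how a row’s key picks a partition, and the partition picks a backend.

```python
from bisect import bisect_right

# Hypothetical layout: PARTITION BY RANGE (emp_no), one backend per partition.
# Each bound is exclusive, like MySQL's "VALUES LESS THAN (n)".
PARTITION_UPPER_BOUNDS = [100_000, 200_000, 300_000]
PARTITION_HOSTS = ["backend1", "backend2", "backend3"]  # made-up host names


def host_for(emp_no: int) -> str:
    """Return the backend that owns the partition containing emp_no."""
    # bisect_right counts the bounds <= emp_no, which is exactly the index
    # of the first partition whose exclusive upper bound exceeds emp_no.
    idx = bisect_right(PARTITION_UPPER_BOUNDS, emp_no)
    if idx == len(PARTITION_HOSTS):
        raise ValueError(f"emp_no {emp_no} is beyond the last partition")
    return PARTITION_HOSTS[idx]
```

For example, `host_for(10001)` routes to `backend1` and `host_for(100000)` to `backend2`. Since this lookup happens inside the storage engine rather than in a separate proxy process, there is no extra hop for every query to pass through.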

Installation: get the MySQL 5.1.37 sources, then the source code for Spider 1.0, and then the patch for condition pushdown.

Why the condition pushdown patch? Because the remote server receives the condition, it does less work and sends back only the matching rows. The SPIDER engine is still fast without the patch, but it can be more than 10x faster with condition pushdown.

Condition pushdown is documented at http://dev.mysql.com/doc/refman/5.1/en/condition-pushdown-optimization.html (works with NDBCLUSTER) and http://dev.mysql.com/doc/refman/5.4/en/condition-pushdown-optimization.html (works with MyISAM). Kentoku’s patch adds cond_push and cond_pop to ha_partition, so every storage engine that uses table partitioning can now get condition pushdown through ha_partition.
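Why pushdown helps can be seen by counting rows shipped from a backend. The sketch below is a toy model, not ha_partition’s cond_push/cond_pop code: `RemoteServer`, `query_without_pushdown`, and `query_with_pushdown` are hypothetical names, and a Python callable stands in for the WHERE condition that would be sent to the remote MySQL server.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class RemoteServer:
    """Stand-in for a SPIDER backend; counts rows sent back over the 'network'."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)
        self.rows_sent = 0

    def scan(self, condition: Optional[Predicate] = None) -> List[Row]:
        # If a condition was pushed down, the backend filters before sending.
        result = [r for r in self.rows if condition is None or condition(r)]
        self.rows_sent += len(result)
        return result


def query_without_pushdown(server: RemoteServer, condition: Predicate) -> List[Row]:
    # Fetch every row from the backend, then filter on the local side.
    return [r for r in server.scan() if condition(r)]


def query_with_pushdown(server: RemoteServer, condition: Predicate) -> List[Row]:
    # Ship the condition to the backend; only matching rows come back.
    return server.scan(condition)
```

Both functions return the same rows, but with 1,000 rows on the backend and a condition matching 10 of them, the first ships all 1,000 rows while the second ships 10. That transfer saving is the effect the patch is after.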

You need to set up the engine first: http://datacharmer.org/downloads/spider_setup.sql (the SQL is also available in the DOCS).

spider_remote_employees.sql – use this in conjunction with http://launchpad.net/test-db/ – a good example of how to use the SPIDER storage engine.

RethinkDB all the rage today

RethinkDB is all the rage today, as it’s a Y Combinator-funded startup that also launched a developer pre-alpha today. So what is RethinkDB, you ask? Yet another MySQL storage engine, that’s what. But this time, it’s tuned for solid-state drives (SSDs), which also happen to be all the rage these days.

Check them out: the materials currently available say they’re using append-only algorithms, which allow for live schema changes and hot backups, with instantaneous recovery from power failure. Those are just some of the exciting bits.
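Here is why append-only storage makes hot backups and crash recovery cheap. Existing bytes are never overwritten, so any prefix of the log is a consistent snapshot, and recovery only needs to replay records up to the first torn one. The sketch below is a generic illustration under that assumption. RethinkDB’s actual format isn’t public (no source), so the record layout (length + CRC32 header) and the names `AppendOnlyLog` and `recover` are hypothetical.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC32 of the payload.
HEADER = struct.Struct("<II")


class AppendOnlyLog:
    """Records are only ever appended; existing bytes are never modified."""

    def __init__(self) -> None:
        self.data = bytearray()

    def append(self, payload: bytes) -> None:
        self.data += HEADER.pack(len(payload), zlib.crc32(payload))
        self.data += payload


def recover(data: bytes) -> list:
    """Replay the log from the start, stopping at the first incomplete or
    corrupt record (e.g. one torn by a power failure mid-write)."""
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(data):
            break  # record header written, payload cut short
        payload = bytes(data[start:end])
        if zlib.crc32(payload) != crc:
            break  # payload damaged
        records.append(payload)
        pos = end
    return records
```

A hot backup is then just `bytes(log.data)` taken at any moment. Later appends don’t change it, and `recover()` on the copy returns every record that was complete when it was taken.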

What excited me less was that you only get 32-bit or 64-bit Linux binaries, built against MySQL 5.1.31, which you install with INSTALL PLUGIN. But they are trying to grow some semblance of a community, with their getting involved page (filled with some papers) and a support mailing list (I see Mark Callaghan is already busy asking them questions). And of course you can follow them on their blog or on Twitter. All this without source ;-)

One of the developers also confirmed that they’re adding “features required by WordPress so we could eat our own dogfood”. They haven’t started profiling (or not much yet?), and they probably have a way to go on performance. It seems like “getting it working for WordPress” is slowly becoming a good testing ground; Jeff Waugh did the same for WordPress and Drizzle.

Anyway, it seems like it’s time to get some SSDs, as we start seeing things like this pop up. RethinkDB will also face another obstacle to mass adoption: how many hosting providers are using SSDs? Probably not many (if any).

Have you tried RethinkDB? Your thoughts?

It’s a storage engine world, after all…

While Zack covered the storage engine and appliances sessions pretty well, I feel he missed a few important new engines (or engine-related talks):

Lots of storage engine talks, no? Well, there are even related tutorials, so if storage engines catch your fancy, check out the storage engine talks in the schedule.

A few other picks: Monty talking about Maria will definitely be a crowd puller, as will Kevin Lewis talking about the Falcon storage engine. You might also find the architecture of ScaleDB interesting. And don’t forget the myriad InnoDB-related talks. Read: People are Talkin’ … about InnoDB, Talk, Talk, Talk: Innobase Speaks, and … and Who Could Forget Mark Callaghan?

Posterous and FriendFeed talk infrastructure

A couple of interesting things are coming out of startup land.

For one, Posterous has a little writeup on Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way. Good takeaways include using Sphinx/Solr for search, but the really important one for the MySQL crowd is that the storage engine matters, and you should probably use InnoDB. If you’re writing an application, know your storage engines. There are also bits on using query_viewer and New Relic to help you fix database bottlenecks, using memcached later, and more. It’s a great read.

Next up, there’s How FriendFeed uses MySQL to store schema-less data. I hope Bret from FriendFeed writes more about their infrastructure over time. It’s interesting that they considered going the CouchDB route, but never saw it as “proven” technology (in comparison to MySQL).
