Posts Tagged ‘storage engine’

Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon)

These are somewhat live notes from Sharding for the masses: Introducing the SPIDER storage engine, a talk by Giuseppe Maxia given at OpenSQLCamp, at FrOSCon, in August 2009. The slides are available too.

Why sharding? Scaling, of course. The MySQL way to solve this is replication (even Yahoo! and Google use this).

When the master doesn’t have enough resources to cope with what you do (for example, large data sets), replication chokes.

You can use proxies for sharding. The options available these days are MySQL Proxy (programmable in a scripting language, Lua), HSCALE (built on top of MySQL Proxy), and SpockProxy (a fork of MySQL Proxy without Lua scripting, specialised for sharding). The proxy, however, is a single point of failure: everything has to pass through it.

Enter SPIDER – a MySQL storage engine, built on top of the partition engine. It associates a partition with a remote server, and is transparent to the user. It’s developed by Kentoku Shiba.
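To make the "partition maps to a remote server" idea concrete, here is a minimal Python sketch of range-based routing. It is not SPIDER's code or configuration: the host names, the ranges and the `backend_for` helper are all invented for illustration, and the only thing borrowed from MySQL is the "VALUES LESS THAN" style of range partitioning.

```python
from bisect import bisect_right

# Hypothetical layout: (exclusive upper bound, backend host).
# Ranges and host names are assumptions made up for this sketch.
PARTITIONS = [
    (100_000, "shard1.example.com"),
    (200_000, "shard2.example.com"),
    (float("inf"), "shard3.example.com"),  # catch-all, like MAXVALUE
]

# Upper bounds only, kept sorted so we can binary-search them.
BOUNDS = [upper for upper, _ in PARTITIONS]


def backend_for(emp_no: int) -> str:
    """Return the host owning emp_no (first partition whose bound is greater)."""
    return PARTITIONS[bisect_right(BOUNDS, emp_no)][1]


if __name__ == "__main__":
    for key in (42, 100_000, 250_000):
        print(key, "->", backend_for(key))
```

The application never calls anything like `backend_for` itself; the point of the engine is that this lookup happens underneath an ordinary table, which is what "transparent to the user" means here.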

Installation: get the MySQL 5.1.37 sources, then the source code for Spider 1.0, and then the patch for condition pushdown.

Why the condition pushdown patch? The remote server works less when it receives the condition. The SPIDER engine without the condition pushdown patch is still fast, but it can be more than 10x faster with condition pushdown (works with NDBCLUSTER; works with MyISAM). The patch by Kentoku adds cond_push and cond_pop to ha_partition, so now every storage engine that uses table partitioning can get condition pushdown through ha_partition.
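One way to picture what pushdown buys you is the toy Python sketch below. It is a model only, not the patch or the handler API: `RemoteTable`, `rows_sent` and the two query functions are hypothetical names, and the "remote server" is just an in-memory list. It counts how many rows cross the (imaginary) network with and without the condition being sent along.

```python
class RemoteTable:
    """Stand-in for a table on a remote server (hypothetical, in-memory)."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_sent = 0  # rows shipped back to the caller so far

    def scan(self, condition=None):
        # With a condition, filtering happens "remotely", before shipping.
        matched = [r for r in self.rows if condition is None or condition(r)]
        self.rows_sent += len(matched)
        return matched


def query_without_pushdown(remote, condition):
    # Fetch every row, then filter on the local side.
    return [r for r in remote.scan() if condition(r)]


def query_with_pushdown(remote, condition):
    # Hand the condition to the remote side; only matches come back.
    return remote.scan(condition)


if __name__ == "__main__":
    rows = [{"emp_no": i} for i in range(1000)]
    wanted = lambda r: r["emp_no"] < 10
    plain, pushed = RemoteTable(rows), RemoteTable(rows)
    query_without_pushdown(plain, wanted)
    query_with_pushdown(pushed, wanted)
    print(plain.rows_sent, pushed.rows_sent)
```

Both functions return the same rows; the difference is only in how much data the remote side has to hand over.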

You need to set up the engine first (the SQL is also available in the docs).

spider_remote_employees.sql – use this in conjunction with – a good example of how to use the SPIDER storage engine.

RethinkDB all the rage today

RethinkDB is all the rage today, as it’s a Y Combinator funded startup, which also launched a developer pre-alpha today. So what is RethinkDB, you ask? Yet-another-MySQL-storage-engine, that’s what. But this time, it’s tuned for solid-state drives (SSDs), which also happen to be all the rage these days.

Anyway, check them out. The materials currently tell me that they’re using append-only algorithms, which allow for live schema changes and hot backups, with instantaneous recovery from power failure. Those are just some of the exciting bits.
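As a rough illustration of why append-only storage makes crash recovery cheap, here is a generic Python sketch. It is not RethinkDB's design (no source is available to check): the record format, the `append` and `recover` helpers and the use of JSON are all assumptions of this example. Each record carries its length and a CRC32, nothing is ever overwritten, and recovery simply replays records until it hits one that is incomplete or corrupt.

```python
import json
import struct
import zlib

# Record header: payload length and CRC32 of the payload (little-endian).
HEADER = struct.Struct("<II")


def append(log: bytearray, key, value) -> None:
    """Append one key/value record; existing bytes are never modified."""
    payload = json.dumps([key, value]).encode("utf-8")
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def recover(log) -> dict:
    """Replay the log, stopping at the first torn or corrupt record."""
    state, pos = {}, 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # e.g. power failed halfway through the last write
        key, value = json.loads(payload)
        state[key] = value  # later records win over earlier ones
        pos = start + length
    return state


if __name__ == "__main__":
    log = bytearray()
    append(log, "a", 1)
    append(log, "b", 2)
    append(log, "a", 3)
    print(recover(log))       # full log
    print(recover(log[:-2]))  # last record torn by a simulated crash
```

Because old records are immutable, a torn final write costs at most the last record, and a backup can copy the log up to a known offset while writes continue.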

What didn’t excite me so much was that you only get 32-bit or 64-bit Linux binaries, built against MySQL 5.1.31, which you install via the INSTALL PLUGIN option. But they are trying to get some semblance of a community growing, with their getting involved page, filled with some papers, as well as a support mailing list (I see Mark Callaghan is already busy asking them questions). And of course you can follow them on their blog, or on Twitter. All this without source ;-)

One of the developers also confirmed that they’re adding “features required by WordPress so we could eat our own dogfood”. They haven’t started profiling (much yet?), and they’ve probably got a way to go on performance. It seems like “getting it working for WordPress” is slowly becoming a good testing ground – Jeff Waugh did so for WordPress and Drizzle, too.

Anyway, it seems like it’s time to get some SSDs, as we start seeing things like this pop up. RethinkDB will also face another problem for mass adoption – how many hosting providers are using SSDs? Probably not many (if any at all).

Have you tried RethinkDB? Your thoughts?

It’s a storage engine world, after all…

While Zack covered the storage engine and appliances sessions pretty well, I feel he’s missed out on a few important new engines (or engine-related talks):

Lots of storage engine talks, no? Well, there are even related tutorials, so if storage engines catch your fancy, check out the storage engine talks in the schedule.

A few other picks: Monty talking about Maria will definitely be a crowd puller, as will Kevin Lewis talking about the Falcon storage engine. You might also find the architecture of ScaleDB interesting. And don’t forget the myriad of talks that are InnoDB related. Read: People are Talkin’ … about InnoDB, Talk, Talk, Talk: Innobase Speaks, and … and Who Could Forget Mark Callaghan?

Posterous and FriendFeed talk infrastructure

A couple of interesting things coming out of startup land.

For one, Posterous has a little writeup on Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way. Good things to take away include using Sphinx/Solr for search, but the really important takeaway for the MySQL crowd is that the storage engine matters, and you should probably use InnoDB. If you’re writing an application, know your storage engines. There are also bits to tell you how to use query_viewer and New Relic to help you fix database bottlenecks, use memcached later, and more. It’s a great read.

Next up, there’s How FriendFeed uses MySQL to store schema-less data. I hope Bret from FriendFeed writes more on their infrastructure over time. It’s interesting to see that they thought of going the CouchDB route, but never saw it as “proven” technology (in comparison to MySQL).

Sun Tech Days Hyderabad

I had the pleasure of addressing a crowd of over 1,000 people yesterday, at the Sun Tech Days event in Hyderabad. I think this might well be the biggest number of attendees at a talk that I’ve given. I spoke on MySQL: The Database for Web 2.0, and the notes for this talk are largely indexed at MySQL for Developers. It’s more or less the standard deck for the Tech Days events these days.

The best part? The questions. I had intelligent questions, and they lasted well over twenty minutes, and there was even more chatter afterwards. Twenty minutes might not seem like a lot, but this is Asia, and in some audiences, you’d be hard pressed to get even a single question! MySQL is hot in India. Really, really hot.

I’m glad to see that most people are using MySQL 5 and 5.1. I’m not so glad to see that most people don’t know about storage engines – most are using MyISAM without even knowing it, and they don’t know that other engines exist. This is what I notice every time I talk about storage engines, though. For the astute MySQL developer, the DevZone is known (thanks to the documentation, mainly), but the Forge is almost unheard of. Planet MySQL seems to be more popular, actually.

Arun Gupta has some nice pictures and videos of the event in general. For me, I was jet-lagged after a massive delay in my flight leaving Kuala Lumpur (plane was unserviceable), and I only mustered under three hours of sleep before addressing the large crowd of folk.

The Tech Days events for the (financial) year are winding down, and for the next (financial) year, we (MySQL/The Database Group, in general) need to plan to be first class citizens at the event. Not only in terms of talks, but we need booth space (we’re about the only Sun project lacking a booth). After all, we have interesting things to talk about: MySQL, Drizzle, MySQL Enterprise Tools/Merlin, Workbench, Proxy, Query Analyser/Quan, Cluster, Replication, DTrace, Virtualisation and the database (VirtualBox? xVM?), etc. This list is probably never ending, so we should bring some cool demos, lots of fact sheets, maybe even USB sticks of goodies (2GB sticks are dirt cheap, and loading them up with information not only makes people want to get a stick, but makes them learn more – hopefully before they format it! :P).

Tokyo Cabinet in MySQL?

I read Tokyo Cabinet: Beyond Key-Value Store today on one of the news sites, and it reminded me of Brian’s hack on Tokyo Cabinet, TokyoEngine. Looking at TokyoEngine in Brian’s Mercurial repository, there have been no updates in over a year. Is anyone planning on taking up development of this? Tokyo Cabinet looks really interesting, and Brian has already started the work of making it a MySQL engine.