Morning sessions at MySQL MiniConf

Upcoming MySQL Features – Stewart Smith

Stewart’s talk on Upcoming MySQL Features was sort of a roadmap of what one might expect to see in MySQL 5.1 and above – he touched on Falcon, online backup, batched key access, Maria, Proxy, Workbench, and some cluster changes. When he shares his slides, it might be great to link to Worklog items, and Forge pages about these new features and previews (because, believe me, the stuff thats coming in future, is clearly very exciting).

MySQL Indexing Methods – Jonathon Coombes

Sitting in Jonathon Coombes session on MySQL Indexing Methods now – he’s going through covering indexes, the B+-tree index, hash index, full text indexing.

Some select points, that aren’t in the slides (otherwise, the slides themselves are very verbose, and when they make it online, it will provide some great reading material):

  • InnoDB uses a B+-tree, and a secondary hash index.
  • MyISAM has R-Tree index support, so it can be used quite usefully for GIS applications. MyISAM is not the only engine that supports spatial indexing, but its the only one that uses R-Tree indexes.
  • T-Tree index is used in MySQL Cluster
  • Touched on the Lucene search engine, with at least about 8 hands going up, as to people using it in production. It does allow live indexing, does proximity searches
  • Sphinx, has a high indexing speed, and by default it sits outside the database, however you can set it to be a storage engine as well. Sphinx is distributed, so its similar to the way memcached is architectured. Distributed indexing. Full text fields.
  • Bitmap indexes, not available yet, but its something people are waiting for

Bit weird seeing Jonathon now showing off OpenQuery, as opposed to Cybersite!

MySQL Optimisation by Design – Arjen Lentz
Its a training course that Arjen is working on, a one day event. He has no slides, per se, so he’s going to use the whiteboard instead. So this is something you’ve got to hope to get a video of, at least.

If you use MySQL, quite often you stick a database abstraction layer in between. However, you never usually ever port to another database, and the abstraction layer always leaks.

Arjen now walks us thru engines available, by firing up the MySQL monitor, and going through the output of show storage engines;.

Arjen then goes through a demonstration, on how many rows per second you can INSERT. These are 60-byte rows, from data acquisition styled information (IP addresses from a router, web clicks, etc.). Using MyISAM. No indexes. Sustained inserts of about 150,000 rows per second (40 million rows, in about 5 minutes, till his disk got full), on his Mac OS X Leopard laptop (and he can’t max anything out). This demo, is of course, for Arjen’s new gizmo, that’s written using the MySQL C API, that currently is in development. The application makes use of a multi-row insert.

Using Blackhole, it holds at about 245,000 rows per second, for INSERTs. So, we’re seeing the slowdown, in really, disk I/O.

Codename: lossylogger.

Its called lossy, because its allowed to loose some data. If you’re tracking web clicks, does it really make a difference, if you’ve lost some clicks, in the grand scheme of things? Not really. It makes use of UDP, so you’re not opening connections either. So, occasionally, you get a buffer overflow, and occasionally you lose data, but its no big deal, in the grand scheme of things.

Most interestingly, at the tail-end of Arjen’s talk, I see the MySQL Australia account manager (sales), Ralph, walking in. Yes, we have sales in Australia/New Zealand now!

Technorati Tags: , , ,

5 Comments

  1. Hi Colin,

    Thanks for the feedback on the talks, but you seem to have things a little skewed here regarding what was missing from my slides?

    > * InnoDB uses a B+-tree, and a secondary hash index.
    > * MyISAM has R-Tree index support, so it can be used quite usefully for GIS applications. MyISAM is not the only engine that supports spatial indexing, but its the only one that uses R-Tree indexes.
    > * T-Tree index is used in MySQL Cluster
    > * Touched on the Lucene search engine, with at least about 8 hands going up, as to people using it in production. It does allow live indexing, does proximity searches
    > * Sphinx, has a high indexing speed, and by default it sits outside the database, however you can set it to be a storage engine as well. Sphinx is distributed, so its similar to the way memcached is architectured. Distributed indexing. Full text fields.
    > * Bitmap indexes, not available yet, but its something people are waiting for

    Many of these points are mentioned briefly in the slides and emphasized via the talk, but I miss how some are not considered mentioned e.g. bitmap indexes has a whole slide explaining them?

  2. byte says:

    Hi Jonathon,

    As we spoke earlier, the whole slides mentioning them, don’t contain say, your verbal cues – bitmap indexes for example, don’t mention the fact that they aren’t currently available yet. I didn’t bother taking notes for what what was already in the slides, because I don’t believe in repetition.

    However, if everything is mentioned in the slides, some people will fail to listen to you, and just focus on reading your slides later, and its always good speaker skill, to present points in slide, and not be too verbose.

    Notice, I didn’t write notes for your other talk? I felt the slides were verbose enough, and your use of the whiteboard, disabled me from wanting to write notes anyway…

  3. […] and around 1 hour in, you’ll get to see a demo of Arjen’s tool, lossylogger at work. I blogged about these […]

  4. Arjen Lentz says:

    Hi Colin,

    Thanks for your live reports…

    I don’t quite agree with you on the “were the slides verbose enough” discussion with Jonathon. Slides illustrate, they are not keycards or the presentation itself.
    From this you can conclude that they won’t contain the whole story, and reviewing slides afterwards is merely of interest to people who *have* attended the actual talk.
    Now, the fact that many conference presentations do contain the whole story, and that the presenter often looks back at the projector screen to then read the exact lines presented there, merely shows that those presentations suck and need serious help.

  5. byte says:

    Hi Arjen,

    Of course slides are cues, and if one reads from line-to-line, you’d be wasting the attendees time (and yes, you probably need serious help).

    But in todays world, you have conferences that are recorded as video (which sometimes doesn’t get the slides in properly), or conferences that are recorded as audio (which you will want to refer to slides later), or at worse, conferences where you get some video, you can’t see the slides, AND the useful/verbose slides WITH diagrams, get LOST.

    Whole story slides, like what was available for the Youtube talk at the MySQL Conference & Expo 2007, was excellent. It was a pity that Paul Tuckfield had not enough time, and too much technical detail in the slides, that it ended pretty abruptly. The real pity however is that he had lost his set of slides! So now, all we have to live with are a video of him, talking about things, with however no slides (even visible from the video) :(

    You want to give sparse slides? The only way you’re going to get away with it, is if you’re keynoting. A deck of 30 slides containing just random photos, in a technical talk, will make you lose your audience, real quick…

    Oh, and a lot of people read slides from talks they don’t attend, in the interest of learning something new. Not everyone can attend expensive conferences, but everyone wants to learn.

    Non-verbose slides (or not having distributed slides, video, etc.) mean that people don’t learn, as well as they should.

    I’m sure you’re not one for reducing the knowledge of the world?


i