Archive for the ‘Search’ Category

How I now drive a Hyundai Accent, thanks to a Google ad

About a month ago, I was surfing the Net, reading my mail on GMail, when I spotted a smart ad by Kah Bintang. In just a few words (in the sponsored links at the top of GMail, or it might have been a sidebar link), it told me that the new 2008 Hyundai Accent was a 1.6L car with a very reasonable price tag.

Normally, I am blind to ads, but the message itself was very captivating, so I bit, and clicked the link. I arrived at the 2008 Accent Home, quickly jumped to its specifications, was impressed by its price tag (compared to the Toyota Vios S that I was driving, this car beats it in many ways), and brought it up in conversation.

Conversation, you ask? Yes, conversation with my parents. I was telling them it might be a nice car to have, it comes with leather seats, etc. Within a month, without my realising it, they had ordered one. The car arrived early last week, and they handed the keys over to me as an early birthday present. Nifty. Thanks!

But that’s not the point. I would never even have heard of this car had it not been for the Google ad. To whoever at Kah Bintang is in charge of marketing, sales and the like: know that your Google ad definitely works. In fact, I think the ROI is greater. Imagine paying a blogger to write a review, versus actually running Google ads?

If you know the person from Kah Bintang responsible for this, don’t hesitate to have them call me. I’d love to interview them about their forward-thinking nature. And I wish I had taken a screenshot of the ad itself. I can’t seem to replicate it now!

The Proton Exora


MIX fm :: lots of proton ads eh?

In other news, today I was listening to mix.fm. I heard them present a fun fact and then tie it in with an advert for the Proton Exora. Smart. I’ve seen them do this with Harvey Norman ads before, but those are usually just about discounts. With the Exora, they made some effort to expand my knowledge first and then led me back in, which did seem interesting.

Of course, going to mix.fm’s website, I was a tad disappointed. There has got to be a better way to display ads, no?

Facebook Lexicon, the flu, and data mining

I recently found out about the Facebook Lexicon. There’s a FAQ, but in a nutshell, the Lexicon tracks and counts occurrences of words and phrases on Facebook Walls (profile, group, or even event Walls) over time. It doesn’t seem like status messages count, though maybe the new Lexicon might in due time.
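To make the idea concrete, here is a minimal sketch of the kind of counting the Lexicon appears to do: tallying how often a phrase shows up per day. The function name and the sample posts are made up for illustration, and this says nothing about how Facebook actually implements it.

```python
from collections import Counter


def phrase_counts_by_day(posts, phrase):
    """Count occurrences of `phrase` per day.

    `posts` is an iterable of (date_string, text) pairs; the format is an
    assumption made for this sketch, not Facebook's data model.
    """
    needle = phrase.lower()
    counts = Counter()
    for day, text in posts:
        # Case-insensitive substring count; days with no hits still get a 0.
        counts[day] += text.lower().count(needle)
    return dict(sorted(counts.items()))


# Hypothetical wall posts, purely for illustration.
sample_posts = [
    ("2009-02-01", "I have the flu"),
    ("2009-02-01", "The flu, the flu!"),
    ("2009-02-02", "Feeling fine today"),
]
print(phrase_counts_by_day(sample_posts, "the flu"))
```

Feeding the resulting per-day counts into any charting tool would give a Lexicon-style trend line.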

I searched for “the flu”, only because I wanted to compare it with what you’d get over at Google Flu Trends. Facebook doesn’t have the limitation that it has to be US only; it’s worldwide.

Then I thought about Twitter search, since lots of people post updates on their lives, their feelings, and so on. Look at the results there for the flu. Look at the mashup the New York Times built for the Superbowl on Twitter. Are there graphing tools that track keywords? It might actually be cool.

Lots of new ways to data mine, it seems. Google shares some semblance of raw data. Facebook doesn’t. Twitter has whatever is available, limited by its API (what, some 3,200 entries?).

Imagine all this being used to predict flu clusters, or something closer to home, dengue clusters. Or voter turnout (a status saying “voted”, even).

Keeping the (content on the) Internet relevant

The Internet is a great tool, but the problem with the Internet is outdated information. I was looking to find the famous Foh San Restaurant in SS2, and while the Internet suggested it did exist, Foh San closed down in SS2 sometime in 2007. The only Foh San Restaurant that exists now is in Ipoh (not SS2), and from what I hear, they plan to open another one in SS2 or the surrounding areas sometime in 2009.

Now, back to the outdated information on the Internet. Look at dineMalaysia. It looks like it was last updated in 2004. A lot can change with regard to restaurants and bars in a period of four years. Their database is also shared with some Expat eatery site. Another catalogue site lists it, but I wonder how many restaurants on that list don’t exist anymore.

Catalogue websites are only valuable if they are constantly updated. That has to be spurred by someone (maybe the tourism ministry?), and the site has to be cool enough to have a community built around it. The way I see it, it should be That’s Melbourne! with a community.

The only clue I got that Foh San in SS2 had closed was from this blog entry – “… the new Korean BBQ shop (formerly Foh San Restaurant branch)”.

This, however, didn’t help me, as I had already spent time looking for it. Searching by relevancy, which can also surface dated content, doesn’t help when there is a lack of information, does it? I see a book about Google’s search algorithms in the bookstore, but I’ve yet to pick it up. I’m just curious how catalogue information can:

  1. stay updated, constantly
  2. be relevant

(1) is easy to solve… It has to involve a community. I guess that will fix (2) too… so how do you get a community involved in catalogue information? Shouldn’t be too hard considering it’s food and beverage related. The bottom line is, there needs to be traction built around it…

Where I used to live (or how I played with Google Street View)

Where I used to live - Google Street View

This is interesting. Google’s Street View. Yes, I’ve seen a lot about it on the blogosphere, but I decided to finally try it out. The photo is of the house, where I used to live. Zooming in, now I can tell you that to the left of that, is where my dodgy landlord still lives ;)

Actually, more to the point. These pictures were definitely taken this year. I know this because I had the room in front, upstairs, and there were things sticking out between the shutters and the window. This picture is too serene, so must’ve been after November 2007.

I see good potential in Street View. Think about mashups with a site that focuses on you finding rental properties. Now people can comment on the property, look at the surrounding neighbourhood, and basically help you make a better choice at renting.

The real estate industry has moved online (in Australia, I can think of Ray White and LJ Hooker, off the top of my head), but it’s not really been disrupted. No, domain.com.au isn’t disruption. Look who owns it.

I was mildly surprised to find out about HomeSpace.sg from the e27 unconference I attended a few weeks back. Its focus currently is only for homes that are for sale, but they focus on the important aspects – like is it near an MRT, what kind of shopping malls are nearby, if you’re buying a property and have kids in mind, what zone to head to and so on.

They’re mashing it up with Google Maps. Pity there isn’t Street View in Singapore, huh?

Street View does 360° views as well. Nifty, if you ask me. See the surrounds. Does anyone know of a real estate disruptor in Australia, yet? Otherwise, there’s definitely room to start coding one…

MySQL Full Text Search by Alex Rubin

Download the PDF: http://www.mysqlfulltextsearch.com/full_text.pdf

Natural language search is the default, and its results are sorted by relevance by default

Boolean search is also popular, e.g. cats AND dogs. There is no default sorting, so you need to order the results yourself

Phrase search

MySQL full-text indexes are only available with MyISAM, and they support natural language and Boolean search. ft_min_word_len: by default, only words of at least 4 characters are indexed. Ranking is frequency based and doesn’t take the distance between words into account

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE);

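For Boolean, you use AGAINST ('+cat +dog' IN BOOLEAN MODE). Note that MySQL’s Boolean mode expresses “cat AND dog” with a leading + on each required word (and a leading - for excluded words), not with the AND keyword.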
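MySQL’s Boolean mode marks required words with a leading + and excluded words with a leading -, so a user-facing query like “cats AND dogs” has to be translated before it goes into AGAINST. Here is a minimal sketch of that translation. The helper name is hypothetical, and it only handles AND and NOT; it is an illustration, not part of the talk.

```python
def to_boolean_mode(expr):
    """Translate 'a AND b AND NOT c' into MySQL Boolean-mode syntax.

    Hypothetical helper for illustration. Every word becomes required (+)
    unless it is preceded by NOT, in which case it is excluded (-).
    """
    terms = []
    negate = False
    for token in expr.split():
        keyword = token.upper()
        if keyword == "AND":
            continue
        if keyword == "NOT":
            negate = True
            continue
        if keyword == "OR":
            # OR is expressed by leaving words unprefixed; out of scope here.
            raise ValueError("OR is not handled by this sketch")
        terms.append(("-" if negate else "+") + token)
        negate = False
    return " ".join(terms)


print(to_boolean_mode("cats AND dogs"))      # +cats +dogs
print(to_boolean_mode("cats AND NOT dogs"))  # +cats -dogs
```

The resulting string should be passed to the query as a bound parameter rather than pasted into the SQL text.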

n-gram full-text parsers for CJK languages are available as plugins

DRBD and MySQL full-text search? DRBD requires InnoDB: when there is a failover, a recovery needs to be performed. Full-text indexes only work with MyISAM. So you create a “FullText” slave with a MyISAM table that carries the full-text indexes. The slide (diagram) is most useful for this, naturally.

Speed up FT search? Fit the index into memory. key_buffer (key_buffer_size) = total size of the full-text index (max=4GB). You can preload FT indexes into the buffer with LOAD INDEX INTO CACHE.

You can partition manually. Partitioning decreases index and table size, so search is faster. The application needs changing, of course. The MySQL 5.1 partitioning features do not support FT indexes.

ORDER BY/GROUP BY is a performance killer. A full-text query with ORDER BY date is much slower than the same query with no ORDER BY.
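As a rough illustration of what “the application needs changing” could mean, here is a minimal sketch of application-side routing across manually partitioned tables. Everything in it is an assumption for the sake of the example: the table naming (ft_0, ft_1, …), the id and album columns, and the modulo scheme. None of it comes from the slides.

```python
def partition_table(album_id, partitions=4, base="ft"):
    """Pick the table that stores a given row.

    Hypothetical scheme: rows are spread over ft_0 .. ft_3 by id modulo.
    """
    return f"{base}_{album_id % partitions}"


def search_queries(partitions=4, base="ft"):
    """Build one parameterised full-text query per partition table.

    The application runs each query and merges the results itself, since
    no single full-text index covers all the rows any more.
    """
    return [
        f"SELECT id, album FROM {base}_{n} WHERE MATCH (album) AGAINST (%s)"
        for n in range(partitions)
    ]


print(partition_table(10))
for query in search_queries(2):
    print(query)
```

If the partitioning key matches something the search already filters on (a category, say), the application can query just one table instead of all of them.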

Real World Performance Killer
SELECT … FROM `ft` WHERE MATCH (`album`) AGAINST ('the way i am')
The above query is very slow! It took about 13 seconds.

Note the stopword list and ft_min_word_len. “i” is not a stopword, but “the”, “way”, and “am” are.

With ft_min_word_len = 1, the standard stoplist filters out every word in that query except “i”, so the search effectively becomes a search for “i”, and “i” is contained in lots of text!
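The interaction between the stoplist and ft_min_word_len can be sketched in a few lines. This is a simplified model, not MySQL’s actual parser: the stopword set contains only the three words named in these notes, and tokenising is a plain whitespace split.

```python
# Only the stopwords mentioned in the notes; MySQL's built-in list is far longer.
STOPWORDS = {"the", "way", "am"}


def searchable_terms(query, min_word_len=4, stopwords=STOPWORDS):
    """Return the query words that survive length and stopword filtering.

    Simplified model for illustration; min_word_len plays the role of
    ft_min_word_len.
    """
    return [
        word
        for word in query.lower().split()
        if len(word) >= min_word_len and word not in stopwords
    ]


print(searchable_terms("the way i am", min_word_len=1))  # ['i']
print(searchable_terms("the way i am"))                  # []
```

With the minimum length lowered to 1, the only surviving term is “i”, which is why the query ends up matching a huge number of rows.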

Search with error correction? Use MySQL’s soundex() function (matches words that sound similar). select soundex('Dilane') should return the same code as soundex('Dylan'). You can then sort the candidates either by popularity or by Levenshtein distance (implemented as a stored procedure or a UDF).
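The two ingredients above, a phonetic code to find candidates and an edit distance to rank them, can be sketched outside the database as follows. This is an illustration under assumptions: the soundex below is the classic four-character variant (MySQL’s SOUNDEX() can return longer strings), and suggest() is a made-up helper name.

```python
# Classic Soundex consonant groups.
_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Classic 4-character Soundex (a simplification of MySQL's SOUNDEX())."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (out + "000")[:4]


def levenshtein(a, b):
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(term, candidates):
    """Keep candidates that sound like `term`, closest spelling first."""
    alike = [c for c in candidates if soundex(c) == soundex(term)]
    return sorted(alike, key=lambda c: levenshtein(term.lower(), c.lower()))


print(soundex("Dilane"), soundex("Dylan"))
print(suggest("Dilane", ["Madonna", "Dylan", "Dilan"]))
```

Sorting by popularity instead would only mean swapping the sort key for a stored hit count.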

Sphinx: nice, open source, can be faster than MySQL’s full-text index on a large dataset, and supports multi-node clustering out of the box. It is, however, an external solution that isn’t built in and needs to be integrated.

MySQL 5.0: need to patch source code. MySQL 5.1: copy Sphinx plugin to the plugin_dir.

You can also use Sphinx as a MySQL storage engine (SphinxSE) if you like.

