Archive for the ‘Languages’ Category

An introduction to ANTLR (sparse notes)

Monday, February 18th, 2008

I attended Clinton Roy’s excellent session titled An introduction to ANTLR: A parser toolkit for problems large and small. Now that the slides and the video (1, 2) are online, I’m not sure my notes are of much use (I made them while the tutorial was in progress), but they’re sitting on my desktop and really should just get published. You can get the files referenced below from the Antlr Tutorial Preparation wiki page.

Why use Antlr?
To parse configuration files, and for syntax highlighting, Domain Specific Languages (DSLs), interpreters, and translation/transformation.

Antlr is a combined lexer and parser generator. It generates easy-to-follow code and uses the LL(*) parsing algorithm. Bison is more powerful than Antlr.

fun (int a, char b); <-- with LL parsing, until you hit the “;”, you have no idea whether you’re dealing with a function definition or a declaration. Of course, there are LL parsers with look-ahead too, but even an LL(3) parser, which can see 3 tokens ahead, can’t see far enough ahead to reach the “;”. This is why LL(*) exists: it picks the smallest look-ahead your grammar needs.
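To make the look-ahead problem concrete, here is a minimal Python sketch of a decision that no fixed-k parser can make for arbitrarily long parameter lists. It is not ANTLR’s LL(*) implementation; it just scans forward, as far as needed, until it meets “;” (declaration) or “{” (definition) outside the parentheses. The function name classify and the hand-made token lists are hypothetical.

```python
def classify(tokens):
    """Return ("declaration" | "definition", tokens_examined).

    Scans ahead as far as needed (arbitrary look-ahead, the idea behind
    LL(*)) instead of stopping after a fixed k tokens like LL(k).
    """
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and tok == ";":
            return "declaration", i + 1
        elif depth == 0 and tok == "{":
            return "definition", i + 1
    raise SyntaxError("no ';' or '{' found after the parameter list")


decl = ["fun", "(", "int", "a", ",", "char", "b", ")", ";"]
defn = ["fun", "(", "int", "a", ",", "char", "b", ")", "{", "}"]
print(classify(decl))  # ('declaration', 9)
print(classify(defn))  # ('definition', 9)
```

Both token lists share their first eight tokens, so a parser limited to three tokens of look-ahead cannot tell them apart; the decision needs the ninth token, and a longer parameter list pushes it further out.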

Antlr will help you get away from using regular expressions.

Island grammars (one language inside another, like HTML inside PHP, or Doxygen inside C) are supported by Antlr.

The Antlr Wiki is good, but it’s hard to find things in it. The mailing list is great. The book by Terence Parr is good but out-dated, so go ahead and get the online PDF version. A new cookbook/recipe list is coming out soon.

Using AntlrWorks: java -jar antlrworks.jar

conffile.g parses a = 1, b = foo.

IDENT   :       ('_'|'a'..'z'|'A'..'Z')('_'|'a'..'z'|'A'..'Z'|'0'..'9')*;
NUMBER  :       ('0'..'9')+;
WS      :       ('\r' | '\n' | ' ') {$channel=HIDDEN;};

The above are lexer rules. Reading them from the bottom up: WS is whitespace, NUMBER is one or more digits (0-9), and IDENT will match foo, _foo and foo1, but not 1foo (identifiers don’t start with numbers).

{$channel=HIDDEN;} <-- IDENTs and NUMBERs go out on the default channel and get through to the parser. Whitespace tokens are still produced by the lexer, but they are put on the hidden channel, so the parser ignores them (i.e. they are hidden).
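For anyone without AntlrWorks handy, here is a hand-written Python sketch of what these three lexer rules do. It is not ANTLR-generated code. The “=” and “,” tokens are an assumption about what conffile.g’s parser rules use, and the "default"/"hidden" channel names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    text: str
    channel: str = "default"   # "hidden" tokens never reach the parser


def is_letter(c):
    return c == "_" or "a" <= c <= "z" or "A" <= c <= "Z"


def is_digit(c):
    return "0" <= c <= "9"


def lex(src):
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        j = i + 1
        if is_letter(c):                       # IDENT: letter/_ then letters/_/digits
            while j < len(src) and (is_letter(src[j]) or is_digit(src[j])):
                j += 1
            tokens.append(Token("IDENT", src[i:j]))
        elif is_digit(c):                      # NUMBER: one or more digits
            while j < len(src) and is_digit(src[j]):
                j += 1
            tokens.append(Token("NUMBER", src[i:j]))
        elif c in "\r\n ":                     # WS: the {$channel=HIDDEN;} part
            tokens.append(Token("WS", c, "hidden"))
        elif c in "=,":                        # assumed literals used by the parser rules
            tokens.append(Token(c, c))
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
        i = j
    return tokens


def parser_view(tokens):
    """Only default-channel tokens, i.e. what the parser gets to see."""
    return [(t.type, t.text) for t in tokens if t.channel == "default"]


print(parser_view(lex("a = 1, b = foo")))
# [('IDENT', 'a'), ('=', '='), ('NUMBER', '1'), (',', ','),
#  ('IDENT', 'b'), ('=', '='), ('IDENT', 'foo')]
```

Feeding it 1foo yields a NUMBER followed by an IDENT rather than a single IDENT, which is the “identifiers don’t start with numbers” behaviour described above.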

[-(byte@hermione)-(pts/6)-(11:21am:31/01/2008)-]
[-(/tmp/antlrworks)> l
total 248
drwxrwxr-x  2 byte byte   4096 2008-01-31 11:20 ./
drwxrwxrwt 44 root root 192512 2008-01-31 11:20 ../
-rw-rw-r--  1 byte byte    340 2008-01-31 11:20 conffile__.g
-rw-rw-r--  1 byte byte   8492 2008-01-31 11:20 conffileLexer.java
-rw-rw-r--  1 byte byte   4180 2008-01-31 11:20 conffileParser.java
-rw-rw-r--  1 byte byte     28 2008-01-31 11:20 conffile.tokens

conffile__.g - the lexer grammar
conffile.tokens - the token definitions

CMinus.g takes a C program as input. Go to the interpreter, and you can see the entire parse tree. Very impressive!


Workbench beta adventure on Linux with Mono/WINE

Tuesday, November 27th, 2007

MySQL Workbench has a beta out! No idea why it’s version 5.0.9, but it’s highly exciting. This software existed before, but this release is quite unlike its predecessor. One snag for me is that it is Windows-only at the moment, with Linux and OS X versions to follow.

However, in my excitement, I decided to try running it on Linux anyway.

Seeing that it is a .NET application, I thought I’d pass it through MoMA (the Mono Migration Analyzer). Everything passed, so I got excited. Running mono MySQLWorkbench.exe, however, led to a failure:
** ERROR **: Method ‘<Module>:<CrtImplementationDetails>.DoDllLanguageSupportValidation ()’ in assembly ‘/home/byte/Downloads/MySQL Workbench 5.0.9 OSS Beta/wb.wr.dll’ contains native code and mono can’t run it. The assembly was probably created by Managed C++.

So I hopped onto #workbench on Freenode, where the MySQL Workbench crew hang out, and spoke with Mike Zinner (team lead for this software). He mentioned that it probably wouldn’t work, as some 3rd-party FOSS controls rely on Win32 API calls. My immediate thought was WINE.

Running it under WINE, I got an error basically telling me I need Mono for Windows:
fixme:actctx:parse_manifest_buffer root element is L"asmv1:assembly", not <assembly>
install the Windows version of Mono to run .NET executables


Workbench fails on me in WINE

I downloaded mono-1.2.5.2-gtksharp-2.10.2-win32-0.exe, installed it via WINE, and made another attempt at running Workbench, only to see a similar failure, this time in GUI form.

A little disappointed, I think the next option is to run Workbench in a virtualized Windows environment. KVM immediately came to mind, with only one minor snag: while it provides full hardware-assisted CPU virtualization, it doesn’t virtualize the graphics layer (it just emulates a graphics card, as it does for pretty much every device). Windows will see an ancient Cirrus Logic card. This means no OpenGL support, which Workbench really needs (otherwise it drops down to software rendering and becomes much slower).

However, there is hope. Check out VMGL, which is OpenGL Hardware 3D Acceleration for Virtual Machines. This should work with Xen and KVM, so I’ll give it a twirl, and see how it goes.

If you’re on a Mac, I am told that VMware Fusion does not do OpenGL, so you’re out of luck there. However, Parallels does, so let that be your virtualization option of choice if you’re on an Intel Mac.

Next stop, to go out and buy Windows Vista - wish me luck!


What languages (and connectors) do you primarily use for MySQL?

Wednesday, October 17th, 2007

I’m not a big fan of polls, but I do think this one’s fairly significant: What is your primary programming language for developing MySQL applications?

Why? Because the results tell us which connectors to give more love to, and what kind of articles to write for the DevZone. While we might think Ruby is the next big thing since sliced bread, you folk might tell us that Delphi is larger than Ruby and we should be applying appropriate love there (I’m sure this statement is untrue, but for argument’s sake, mmmkay?).

To vote, you actually have to click the Community link to get to the DevZone, then scroll down a bit to the MySQL Quickpoll. Why can’t I link directly to the quickpoll? Why can’t you vote on the results page? That is left as an exercise to the reader…


Tagging differentiation

Tuesday, September 25th, 2007

Standardisation is important.

Tagging in Uploadr involves writing tags in this format:
    australia victoria melbourne "notting hill" clayton

Tagging in ScribeFire involves writing tags that are parsed in a different way (for Technorati):
    australia, victoria, melbourne, notting hill, clayton

Notice the commas (",")? Without them, your tags are all lumped together. I’m wondering if I should change Uploadr to behave like ScribeFire (or vice versa). What do other applications do for tagging in a single field?
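Converting between the two conventions is mostly a parsing question. Here is a minimal Python sketch, assuming Uploadr-style fields wrap multi-word tags in double quotes (the standard library’s shlex handles that quoting). The function names are hypothetical and neither is taken from Uploadr or ScribeFire.

```python
import shlex


def parse_uploadr_tags(field):
    """Space-separated tags; multi-word tags wrapped in double quotes."""
    return shlex.split(field)


def parse_comma_tags(field):
    """Comma-separated tags (the ScribeFire/Technorati style)."""
    return [tag.strip() for tag in field.split(",") if tag.strip()]


def to_comma_style(field):
    """Convert an Uploadr-style field into a comma-separated one."""
    return ", ".join(parse_uploadr_tags(field))


print(to_comma_style('australia victoria melbourne "notting hill" clayton'))
# australia, victoria, melbourne, notting hill, clayton
```

One caveat: shlex also treats apostrophes as quote characters, so a tag like o'connor raises ValueError; a real patch would need its own rule for that case.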

It should be trivial to make this change; the question is whether my patch will be accepted upstream. I’m already using a patched version of Uploadr while I wait for the author to merge my earlier patch (which adds a description field, something the Flickr API supports). Incidentally, PyGTK is pretty easy to get around in, and its superb documentation makes it easy for anyone to get on the bandwagon. More on PyGTK programming later…


Best way to learn Mandarin in GNU/Linux or OS X?

Tuesday, September 18th, 2007

What’s a good, quick way, to learn a new language with the help of Linux?

In particular, I’m interested in learning conversational Mandarin. Basic reading is a bonus, but hey, I’m not that fussed. I’d like to not pay for my software, if possible, and since I tote a Linux laptop most of the time (this might change to an OS X based one that actually works - rant on this soon), if it runs on Linux, all the better. The Popagandhi tells me I need to go to a good class - do these exist in Melbourne?

Some useful links I’ve found, so far:

  • QQ for Linux - QQ is the Chinese version of ICQ, and pretty much everyone there uses it. Though MSN seems to be a lot more popular these days (compared to, what, 2.5-3 odd years ago)
  • ChinesePod - podcasts to help? Well, maybe here’s a reason to use an iPod again…
  • I saw this thread on the Ubuntu Forums, but it doesn’t really address my requirements


Ruby Gems, Mono System.Windows.Forms on Ubuntu

Friday, August 10th, 2007

I’ve recently started doing more development locally on my Ubuntu (Feisty Fawn) laptop (as opposed to being logged in via ssh to various machines, generally running Fedora), and have noticed some quick snags.

Ruby Gems
Gems are currently installed in /var/lib/gems/1.8, and their executables end up in /var/lib/gems/1.8/bin, which is not in your PATH. So if, for example, you use cheat, your shell won’t find it. Fix it by adding /var/lib/gems/1.8/bin to your PATH (my .bashrc has: PATH=$PATH:$HOME/bin:/var/lib/gems/1.8/bin)

Mono, and System.Windows.Forms
I have no problems with Mono and .NET related applications, normally. I run Tomboy (which I like, a lot), I can fire up f-spot, and when I need it, Beagle runs fine too. But of late, I’ve had to run an application that required System.Windows.Forms, aka WinForms. Little did I know I’d need to install the WinForms packages separately; a sudo apt-get install libmono-winforms* fixed this for me.

This still hasn’t made my required application run properly, but I’m now a step closer to finding out compatibility with Windows-based .NET applications and Mono. All thanks to the useful Mono Migration Analyzer (MoMA). Hat tip to Ditesh for pointing me to MoMA.


Pimping my friends: an ODF e-Note and haze.net

Thursday, August 2nd, 2007

A couple of my good friends have had some recent achievements that I clearly should help them blow their trumpets for.

First up, we have Ditesh, an active proponent of ODF, who has had a little e-Note on Electronic Document Standards published. I got to read it back when it was an ODF document (*grin*), and not much has changed since all the comments were pushed. Do read it, and consider giving it to upper management to read as well. It’s a very well thought-out document, and should be making its rounds on the Internet soon enough. Ditesh welcomes comments via email or his blog entry.

Incidentally, this is also one of the first notes that the UNDP/APDIP have published that carry a disclaimer - “The views expressed in this APDIP e-Note are those of the author and do not necessarily represent those of the United Nations, including UNDP, or their Member States.” I thought that was a little soft-cock, but this is the power of lobbying I guess.

Next up, we have Aizat creating haze.net.my, aka the Malaysian Air Pollution Index. Yes, do laugh out loud - Malaysia is very well polluted, the API readings are usually pretty high, and the government of the day always insists it’s still safe. Aizat built it using Ruby on Rails, and there’s some active scraping of data (via hpricot), which then all mashes up with Google Maps. The site’s well designed (i.e. it’s simple), there’s an RSS feed if you’re inclined to read details that way, and if you’re just interested in a certain area (say, Kuala Lumpur), you can dig deeper and look at the graphs (via Gruff Graphs) of when it started becoming unhealthy and so on. Exporting to CSV works too, in case you were using it for a project/paper on the haze.

All in all, a good side-project, very informative for those living in Malaysia or visiting Malaysia. Don’t see a good income stream (ads? pfft.), but definitely very informative. Maybe sell it to a ministry :-)


Scaling Twitter: “Is Twitter UDP or TCP? It’s definitely UDP.”

Monday, April 23rd, 2007

Presented by Blaine Cook, a developer from Odeo, now probably CTO of Twitter (spawned by Obvious Corp, I think). There’s a video and slides (yes, you need evil Flash, so I haven’t viewed them myself). Then there are my notes… possibly with some thoughts attached to them. No, they’re not organized; I’m too busy and tired…

Rails scales, but not out of the box; left as-is, it will cause Twitter to stop working very quickly.

600 requests/second, 180 rails instances (mongrel), 1 DB server (MySQL) + 1 slave (read only slave, for statistics purposes), 30-odd processes for misc. jobs, 8 Sun X4100s.

Uncached requests are served in under 200ms most of the time.

steps:
1. realize your site is slow
2. optimize the database
3. “Cache the hell out of everything”
4. scale messaging
5. deal with abuse
6. profit.

Have stats (something Twitter didn’t have before): munin, nagios, awstats/google analytics (the latter obviously doesn’t work if your site itself doesn’t load), exception notifier/logger (exception logger is what they use at Twitter, so you don’t get lots of email :P). You need reporting to track problems.

Benchmarks - they don’t do profiling, they just rely on their users! What torture for the poor users…

“The next application I build is going to be easily partitionable.” - Stewart Butterfield
Dealing with abusers…
Inverse spamming - The Italians - receiving SMS gives you free call credits!
9,000 friends in 24 hours doesn’t scale!
Just be ruthless, delete said users. This is where you thank the reporting tools, to allow you to detect abusers.

They’ve looked at Magic Multi Connections, it looks great, but it wouldn’t work for Twitter.

The main bottleneck is really in DRb and template generation. The template optimizer that Stefan Kaes wrote doesn’t work for them.

Twitter was first built by 2 people, and even now there are just 3 developers.

When mongrels hit swap, they become useless. So turn swap off.

Twitter themselves don’t seem to want to give out details of how many users, etc. they have. Shifty, beyond the fact that they claim it’s “a lot of users”.

Twitter is not built for partitioning. Social applications should be designed to be easily partitionable. WordPress, and anything 37signals builds, tends to be partitionable. Things start becoming hairy when you have 3,000+ friends!

Index everything: Rails won’t do it for you, so you need to add an index for any column that appears in a WHERE clause.
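Twitter runs MySQL, where you would check this with EXPLAIN; to keep the example self-contained, this sketch uses Python’s built-in sqlite3 module as a stand-in. The statuses table and the idx_statuses_user_id index are hypothetical. It shows the plan for a WHERE user_id = ... lookup switching from a full table scan to an index search once the index exists (the exact wording of the plan varies between SQLite versions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statuses (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)")


def plan(sql):
    """Join the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT text FROM statuses WHERE user_id = 42"

before = plan(query)   # no index yet: a full table SCAN
conn.execute("CREATE INDEX idx_statuses_user_id ON statuses (user_id)")
after = plan(query)    # now a SEARCH using idx_statuses_user_id

print(before)
print(after)
```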

Denormalize a lot. Heresy in the Rails book? He hopes not. This, single-handedly, is what saved Twitter.

They use InnoDB. Don’t do status.count() when there are millions of rows… it’ll stop working. MyISAM will be faster (it keeps an exact row count, so an unfiltered COUNT(*) is cheap), but still, don’t.

Queries like email LIKE “$#!$” (i.e. search) are expensive. Twitter has disabled search for now… which makes their database enjoy life.

Average DB time is 50ms (to at most 100ms)

They’re not hurting on the DB. The master DB machine is at a quarter CPU usage. So they don’t see the need to partition at this point.

Twitter does a lot of caching, using memcache. If you really need status.count(), keep it in memcache.
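Here is a minimal cache-aside sketch of that advice in Python. A plain dict stands in for a memcache client and a list stands in for the statuses table; the key name and helper functions are hypothetical, not Twitter’s code. The expensive count runs once, and later writes bump the cached number instead of recounting.

```python
cache = {}        # stand-in for a memcache client (assumption)
statuses = []     # stand-in for the statuses table
db_counts = 0     # how many times the "expensive" count actually ran

COUNT_KEY = "statuses:count"   # hypothetical cache key


def count_statuses():
    global db_counts
    value = cache.get(COUNT_KEY)
    if value is None:             # miss: count once and remember the result
        db_counts += 1
        value = len(statuses)     # imagine SELECT COUNT(*) over millions of rows
        cache[COUNT_KEY] = value
    return value


def add_status(text):
    statuses.append(text)
    if COUNT_KEY in cache:        # keep the cached count in step with writes,
        cache[COUNT_KEY] += 1     # much like memcache's incr on an existing key


add_status("hello")
add_status("world")
print(count_statuses())   # 2
add_status("again")
print(count_statuses())   # 3
print(db_counts)          # 1
```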

The query for your friends’ statuses on your Twitter homepage is a complicated one, using a lot of JOINs. They use ActiveRecord, keep the statuses in memory, and don’t touch the DB. They plan to use memcache for the statuses too in the future.

ActiveRecord objects are huge (which is why they aren’t stored in memcache yet). They’re looking at implementing “ActiveRecord nano” or something similar: a smaller object that stores only the critical attributes in the cache, and uses method_missing to fall back when you don’t find what you’re looking for.
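The “nano” idea can be sketched in Python, where __getattr__ is called only when normal attribute lookup fails, which makes it a close analogue of Ruby’s method_missing. This is a hypothetical design for illustration: the class name, the load_full callback and the attribute names are all assumptions, not anything Twitter published.

```python
class NanoRecord:
    """Small cacheable stand-in for a full record (hypothetical design).

    Only a few critical attributes are kept; anything else triggers a load
    of the full record. __getattr__ plays the role of Ruby's method_missing:
    it only runs when normal attribute lookup fails.
    """

    def __init__(self, record_id, critical_attrs, load_full):
        self.id = record_id
        self._cached = dict(critical_attrs)
        self._load_full = load_full    # e.g. a DB lookup; assumed, not Twitter's code
        self._full = None

    def __getattr__(self, name):
        if name.startswith("_"):       # avoid recursion on internal names
            raise AttributeError(name)
        if name in self._cached:
            return self._cached[name]
        if self._full is None:         # first miss: fetch the whole record once
            self._full = self._load_full(self.id)
        try:
            return self._full[name]
        except KeyError:
            raise AttributeError(name) from None


loads = []


def load_full(record_id):
    loads.append(record_id)
    return {"text": "hello", "user_id": 7, "source": "web"}


status = NanoRecord(1, {"text": "hello", "user_id": 7}, load_full)
print(status.text, loads)     # hello []   (served from the cached attributes)
print(status.source, loads)   # web [1]    (fell back to the full record)
```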

90% of Twitter’s requests are API requests. So cache them. No fragment or page caching on the front-end, but for API requests, lots of caching.

Producer(s) -> Message Queue -> Consumer(s)
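The producer/consumer shape is easy to sketch in-process. This minimal Python example uses the standard library’s queue.Queue and two threads; it only illustrates the pattern and is not what Twitter ran (the talk mentions DRb, then Rinda, then Starling in the middle). Uppercasing the message stands in for real work.

```python
import queue
import threading


def producer(q, messages):
    for message in messages:
        q.put(message)
    q.put(None)                          # sentinel: nothing more to send


def consumer(q, results):
    while True:
        message = q.get()
        if message is None:              # producer is done
            break
        results.append(message.upper())  # stand-in for real work, e.g. delivering an update


q = queue.Queue()
results = []
producer_thread = threading.Thread(target=producer, args=(q, ["hello", "twitter"]))
consumer_thread = threading.Thread(target=consumer, args=(q, results))
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
print(results)   # ['HELLO', 'TWITTER']
```

The producer never calls the consumer directly; it only talks to the queue, which is the point of putting a queue in the middle.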

DRb: zero redundancy, tightly coupled.

They use ejabberd for their Jabber server.

When the Jabber client went down, everything went down. So they moved to using Rinda. It’s O(N) for take(), so if the queue has 70,000 messages, you just shut it down, restart it, and lose those 70,000 messages. Sigh.

“Someone asked if Twitter is UDP or TCP. It’s definitely UDP.” - Blaine Cook

LiveJournal has a horizontally scaled MySQL, which is just MySQL + Lightweight Locking. RabbitMQ (Erlang) is something they’re clearly looking at, but it looks ugly, and they’d rather not implement it.

So they wrote Starling. It’s in Ruby, and will be ported to something faster. It does 4,000 transactional messages/second, will have multiple queues (like a cache invalidation one), speaks memcache (set, get), and writes everything to disk. The first pass was written in 4 hours, and it’s been working fine for the last few days (i.e. since Wednesday). Twitter died on Tuesday at the Web 2.0 conference! Starling will probably be open source.
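Starling’s source isn’t part of these notes, so this is only a hypothetical in-process Python sketch of the semantics described: memcache verbs reused as queue operations (set enqueues, get dequeues), several named queues, and an append-only journal on disk. It does not speak the memcache network protocol, and the class name and journal format are invented for illustration.

```python
import os
import tempfile
from collections import defaultdict, deque


class TinyQueueServer:
    """In-process sketch: memcache-style verbs with queue semantics.

    set(name, value) appends to the named queue; get(name) pops the oldest
    item, or returns None when empty (like a memcache miss). Every change is
    appended to a journal file; replaying it after a crash is left out here.
    Values are assumed not to contain newlines.
    """

    def __init__(self, journal_path):
        self.queues = defaultdict(deque)
        self.journal = open(journal_path, "a", encoding="utf-8")

    def _log(self, line):
        self.journal.write(line + "\n")
        self.journal.flush()

    def set(self, name, value):
        self._log(f"SET {name} {value}")
        self.queues[name].append(value)

    def get(self, name):
        q = self.queues.get(name)
        if not q:
            return None
        self._log(f"GET {name}")
        return q.popleft()

    def close(self):
        self.journal.close()


path = os.path.join(tempfile.mkdtemp(), "queues.journal")
server = TinyQueueServer(path)
server.set("updates", "status 1")
server.set("updates", "status 2")
server.set("cache_invalidation", "user:42")   # a second, independent queue
print(server.get("updates"))   # status 1
print(server.get("updates"))   # status 2
print(server.get("updates"))   # None
server.close()
```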

Use messages to invalidate your cache.
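A minimal Python sketch of message-driven invalidation: the code that writes a new status doesn’t touch the cache itself; it announces which key went stale, and a consumer deletes it. A dict stands in for memcache and a deque for a dedicated invalidation queue; the key naming scheme is hypothetical.

```python
from collections import deque

cache = {"user:42:timeline": ["old status"], "user:7:timeline": ["hi"]}
invalidations = deque()          # stand-in for a dedicated invalidation queue


def on_new_status(user_id):
    # The writer only announces the change; it does not edit the cache.
    invalidations.append(f"user:{user_id}:timeline")


def drain_invalidations():
    # A consumer (possibly on another machine) deletes the stale keys.
    while invalidations:
        cache.pop(invalidations.popleft(), None)


on_new_status(42)
drain_invalidations()
print(sorted(cache))   # ['user:7:timeline']
```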
