Archive for the ‘Infrastructure’ Category

EveryDNS

Tuesday, October 21st, 2008

I have always recommended OpenDNS, and of late I’ve decided to stop dabbling with DNS (in general) and just use EveryDNS. For the uninitiated, they both have David Ulevitch in common.

There’s a web interface; it’s not as simple as tinydns, but it gets the job done. For free, you can host 20 domains and 200 records (CNAME, MX, etc.). I don’t think I’ll be exceeding that limit anytime soon, but I think it’s time to make a donation.

Nothing but satisfaction in the last few weeks of use. One less service to maintain. Does anyone else use EveryDNS, and are you happy with the service?

maybank2u slow for the masses

Thursday, October 16th, 2008

When I blogged about maybank2u 2.0, I also mentioned that loading the site was an issue.

                 before    after     % increase
Page size        148KB     196KB     32%
HTTP requests    41        61        49%
Load time        5.47s     10.68s    95%
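
The percentage column is simply (after - before) / before. A quick sketch in Python to check the figures; the row labels are my own shorthand for what each row measures, not anything Maybank published:

```python
# Before/after measurements from the table above.
# The row labels are assumed shorthand, not official names.
measurements = {
    "page size (KB)": (148, 196),
    "HTTP requests": (41, 61),
    "load time (s)": (5.47, 10.68),
}


def percent_increase(before, after):
    """Relative growth from before to after, as a percentage."""
    return (after - before) / before * 100


# Round to whole percentages, as the table does.
increases = {
    name: round(percent_increase(before, after))
    for name, (before, after) in measurements.items()
}

for name, pct in increases.items():
    print(f"{name}: {pct}% increase")
```

So the page grew by about a third, but the load time nearly doubled.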

It seems they launched the site, and it’s now slow as.


[Screenshot: Maybank2u.com slowarse. The page reads: “Our service is currently unavailable. Please try again later”]

More HTTP requests… longer load times… it’s almost impossible to log in. And when you do, you see the above.

It’s a better design. The UI rocks. But you’ve got to scale the site, dear. With all those increases, you’ve got to increase capacity. Let’s hope these teething problems disappear (I don’t know when the site got launched; I’ve been away these past few days).

Funnily enough, I was awakened by my dad, who’s pretty tech-savvy, to tell me that the website had changed, and he couldn’t find the login button. I wonder how many people were so used to the old UI, which said “login” on the left-hand sidebar, that they now have to hunt for it in the top right-hand corner. Little UI changes… can potentially cause big panics!

Update: Classic Maybank2u is still available in the meantime. It’s still Web 1.0-ish, but it’s familiar. Good luck to the M2U team in scaling M2U 2.0!

Avoiding the fail whale

Tuesday, October 7th, 2008

Catchy title? It’s a webinar hosted by Robert Scoble, with panel members like Matt Mullenweg (WordPress: their extensive use of PHP, MySQL and more, scalable even for wordpress.com), Paul Buchheit (FriendFeed, creator of Gmail) and Nat Brown (iLike, a pretty popular Facebook application). You’d be silly to miss it.

It’s all about building a scalable server environment that grows with your traffic (virtually overnight, in some cases). I hope it’s all fairly generic and not Rackspace-specific… we should learn to have these “fun” panel webinars.

learn2scale - what’s up with Malaysian news sites? Will the cloud work for them?

Tuesday, September 16th, 2008

Seriously kids, what’s with the lack of scalability? I’ve never seen CNN or the NYTimes go down on “trimmed” versions.

Is it a question of bandwidth? Is it lack of hardware?


[Screenshot: Malaysiakini - learn2scale]

Take, for example, Malaysiakini (the first alternative news source in Malaysia, with a subscription model built around it). It runs FreeBSD, uses PostgreSQL, and has a CMS on top of it (so almost a LAMP stack right there). There’s even Squid for caching. Yet load balancing seems to be lacking? This is where the cloud can come into play when there’s high traffic.


[Screenshot: The Malaysian Insider - learn2scale]

Next up, The Malaysian Insider. They’re the new kid on the block. It’s probably Linux and Joomla; MySQL is confirmed. No caching (hello, memcached at some stage?). Looks like a one-server operation. Again, if you want to start lean, scale to the cloud…

Of course, what takes the cake is one of the most famous dailies, The Star. The .asp tells me they’re on some kind of Microsoft platform, and I don’t know how scalable that is (maybe with their live.com/livemesh goo). But for a major newspaper (à la the NYTimes equivalent in Malaysia), I’m surprised they’re too busy to serve us content.


[Screenshot: The Star Online - learn2scale]

Is it the fault of the applications?
Is the next wave getting open source applications to act in a scalable fashion? How ready is a CMS like Drupal or Joomla for instant scaling? After all, EC2 has persistent storage (I don’t know whether Sun’s network.com offers this).

It seems like there are a lot of OpenSolaris images for EC2 and web stuff, at OpenSolaris on Amazon EC2. I see a Joomla AMI, for example. How easy is this to plug in for something like The Malaysian Insider? How easy will it be for them to scale up their services, i.e. start more instances? Will Joomla load balance? What considerations must they make if they go this route? Similar question for the Drupal AMI.

I’m thinking I need to spend some time playing with “the cloud” in due time… Any thoughts or pointers on this are gratefully appreciated.

Project Kenai

Thursday, September 11th, 2008

Sun is a huge company. So it comes as no surprise that I’m finding out about Project Kenai via Tim Bray, instead of some internal mailing list (believe me, there must be thousands).

Tim’s got a Q&A with Nick Sieger, who’s one of the chieftains behind Kenai. I find it amusing that the comparison is made against Google Code and GitHub - has SourceForge hit irrelevancy? I’m surprised Launchpad isn’t mentioned.

[Screenshot: Project Kenai -- We're More Than Just a Forge - Coverflow style]
Very Cover Flow like UI, with slider, etc. That’s Elliot Murphy, ex-Dolphin, current Ubuntero in the pic above

Nick goes on to say “We need a place to nurture and grow our open source communities that we ourselves can control” - can control. Control is a loaded word, no? Especially in the land of open source.

The architecture is such that they’re on Sun servers (SPARC based), using GlassFish, Apache, Memcache and a single MySQL 5.0.45 database server (I’m guessing there’s a maximum storage of 146GB because they’re using SAS disks - they will implement replication soon). It seems they’re currently on 32-bit MySQL - they’re getting less than 10% CPU usage, and the query cache is working well for them (98% hit ratio). If graphs, et al turn you on, look at the slides from Fernando Castano, Achieving High Throughput and Scalability with JRuby on Rails.

It’s interesting to see the mix of software offered: Mercurial and Subversion (for project hosting, so there be choice, unlike the other services out there), Sympa (as opposed to the more common Mailman), and Bugzilla as the bug tracker. Oh, it’s built on Rails, so it will be an interesting experiment nonetheless, to see how Rails scales.

Why does Kenai interest me? Because for every project, you have a forum, a separate wiki, access to source code, mailing lists, and a bug tracker. Why should Kenai interest the MySQL community? Because maybe down the line, there will be integration with the Forge. Today, the Forge does not offer hosting (we have got the bits built in, technically, but Launchpad seemed like a better bet for us in the long run; the Forge is not in the storage business, it’s more a catalogue of information), mailing lists, forums, or a bug tracker.

After all, the tagline is “We’re More Than Just a Forge”. There look to be some social networking aspects to Kenai as well; maybe some ohloh-like features will make their way in, in due time? Maybe a Facebook application created using Zembly will even mash things up. Who’s to say what the future of Kenai can bring?

How Facebook serves pictures

Wednesday, June 25th, 2008

I caught Facebook - Needle in a Haystack: Efficient Storage of Billions of Photos on Flowgram. First up, I’m not a big fan of Flowgrams: the format (slides and voice) is sensible, even excellent, but the delivery in a web browser isn’t optimal… make downloadable videos!

The talk however, was excellent. Do watch it, and learn a bit more about Facebook’s infrastructure. Anyway, some notes I took from the talk:

  • “We’re one of the largest MySQL installations in the world”
  • Use memcache - “We have memcache because databases aren’t fast” (later on in the questions)
  • Separate team focusing on APE (Apache, PHP and Extensions that they work on)
  • 6.5 billion total images, 4-5 sizes stored for each, so 30 billion files, of about 540TB total… During peak? 475,000 images served per second, and growing by 100 million uploads per week
  • Images are usually pulled from a Content Delivery Network (CDN), so it reduces the request rate on their servers
  • They use NetApp Storage, but basically their upload servers speak NFS to write to NetApp.
  • Cachr (evhttp based) and File Handle Cache use memcache as a backing store… FHC is based on lighttpd!
  • Makes use of a “haystack” - user-level abstraction, storing a separate index file that has more efficient metadata (to reduce disk seeks - 1 disk seek or less for any workload). Pretty deep in the discussion of the haystack server architecture, also evhttp-based
  • MySQL use? Very few transactions, very few joins
  • Video is a very different beast, and the design is a little different
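
To make the haystack idea from the notes a bit more concrete, here is a toy sketch in Python: photos are appended to one big file, and a small in-memory index maps each photo ID to its offset and size, so a read costs a single seek. This is not Facebook’s code; the class and method names are made up for illustration, and the real system certainly has a richer on-disk format than this.

```python
import io


class ToyHaystack:
    """Append-only blob store with an in-memory index (illustrative only)."""

    def __init__(self, store=None):
        # Any seekable binary file works; BytesIO keeps the example self-contained.
        self.store = store if store is not None else io.BytesIO()
        self.index = {}  # photo_id -> (offset, size)

    def put(self, photo_id, data):
        # Always append at the end of the store and remember where it landed.
        self.store.seek(0, io.SEEK_END)
        offset = self.store.tell()
        self.store.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]  # index lookup, no disk access
        self.store.seek(offset)              # the single seek
        return self.store.read(size)


haystack = ToyHaystack()
haystack.put("cat.jpg", b"\xff\xd8cat-bytes")
haystack.put("dog.jpg", b"\xff\xd8dog-bytes")
print(haystack.get("dog.jpg"))
```

The point of the design is that per-photo metadata lives in the compact index rather than in filesystem inodes, which is what keeps the seek count down.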

If you’re into information about photo storage sites, don’t hesitate to also read my previous notes on Flickr.

Spacewalk, and what we can learn about naming

Monday, June 23rd, 2008

Red Hat releases Spacewalk. It is described as: “the upstream community project from which the Red Hat Network Satellite product is derived”. Congratulations to all who have worked on it, especially my friends who toiled endlessly over it in the past.

Red Hat is sticking true to its promise of open sourcing everything they make. Best of all, they recognise Fedora (they always did, since say, Fedora Core 2 or 3), CentOS (a direct “competitor”/rebuild of RHEL), and Scientific Linux (I know of a certain university’s sysadmin who will be blessing Spacewalk, as her life will now be a lot easier).

There have been a few blogs about it… Matt Asay asks about a community (Red Hat traditionally wasn’t good at this, but with Fedora, I believe they’ve learned, and I’m happy to say I think I helped in the education process). No one, however, focused on the technical aspects around Spacewalk/RHN.

Case in point: Oracle is at the heart of it. RHN was designed almost seven years ago, and I’ve heard amazing stories from Gafton, Greg, and Peter. How Gafton found hidden “secrets” inside Oracle to boost performance, and a whole bunch of interesting things, best to talk about over a beer (the irony? When I first met these folk, I couldn’t even legally drink a beer in the US…)

Read the Developer Documentation, and note that they use Perl, Python and Java in the current code base (but only Perl and Java are the way forward). There’s a DB Schema available… and I wonder when someone will port this to MySQL?

The Spacewalk FAQ mentions the lack of resources in the past to add an open source database, but says they would want to do so soon. There’s even help on getting Oracle XE running. The glimmer of hope that there is to be an open source database behind Spacewalk is what tells me that those in the MySQL community who would benefit from such a tool (so you’re a DBA and a sysadmin at a fairly largeish installation) should port this to run on MySQL.

What else can we take away from Spacewalk? The excellent positioning: a community project from which the RHN product is derived. This is similar to how Fedora is positioned: “Another striking difference of Fedora is our goal to empower others to pursue their vision of what a free operating system should be like. Fedora now forms the basis for derivative distributions such as Red Hat Enterprise Linux, the One Laptop Per Child XO and Creative Commons’ Live Content DVDs.”

Distinctive naming helps avoid confusion (at the price of a ubiquitous name? Sure, you just have two ubiquitous names now). MySQL Enterprise vs. MySQL Community: they’re both MySQL (don’t even get started on the odd/even numbering scheme…). I dream of the day when we have MySQL Enterprise and Sakila (formerly known as MySQL Community).

Services Oriented Architecture with PHP and MySQL

Tuesday, April 15th, 2008

Joe Stump, Lead Architect, Digg. Slides should make their way to Joe’s website soon enough.

Mainly works on the backend and makes sure it’s scalable (can all the Digg buttons be served, et al.).

Application layer is loosely coupled from your data. Whole point of SOA? You can put a service in front of the DB, and move between DBs if required.

They do use MySQL, but it’s pretty vanilla.

Old habits die hard
- Data requests are sequential (I need foo, bar, bleh, ecky)
- Data requests are blocking (When you need foo, nothing else is happening)
- Tightly coupled (mysql_query, and if you’re using DB abstraction layer even, you’re still using SQL… you then can’t use CouchDB for instance)
- Scaling is not abstracted (a lot of caching is in the front-end code. It’s a problem when you start scaling your teams out). They use memcached from what I gather.

SOA
- Data is requested from a service (via HTTP, custom, etc.)
- Data requests are run in parallel (over non-blocking sockets. Say there are 10 data requests in 1 webpage, and each request takes 10ms. Instead of 100ms, it might now take only 70ms or so. Generally 1.5-2.5x faster for blocking parallel requests)
- Data requests are asynchronous (non-blocking parallel requests)
- Data layer is loosely coupled
- Scalability is abstracted (can find engineers anywhere, that can parse JSON or XML :P)
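
A rough illustration of the sequential vs. parallel point, in Python rather than PHP. The fetch function is a made-up stand-in that just sleeps for 10ms, so this is an idealised sketch: real services burn CPU and share resources, which is presumably why the speedup quoted in the talk was far more modest than what this prints.

```python
import asyncio
import time


async def fetch(name, delay=0.01):
    """Hypothetical stand-in for one data request taking about 10ms."""
    await asyncio.sleep(delay)
    return f"{name}-data"


async def sequential(names):
    # Old habit: each request waits for the previous one to finish.
    return [await fetch(name) for name in names]


async def parallel(names):
    # SOA style: fire all requests at once, then collect the results in order.
    return await asyncio.gather(*(fetch(name) for name in names))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = [f"req{i}" for i in range(10)]
seq_result, seq_time = timed(sequential(names))
par_result, par_time = timed(parallel(names))
print(f"sequential: {seq_time * 1000:.0f}ms, parallel: {par_time * 1000:.0f}ms")
```

Both variants return the same data; only the wall clock time differs.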

Options?
- Run requests over HTTP (Google (Java), Amazon (Java), etc.)
- New York Times’ DBSlayer (small little HTTP server that runs and provides parallel and async requests to mysql)
- Danga’s Gearman (binary protocol, has worked, it’s kind of a queuing system)
- Remember the wall clock time goes down, but the CPU time is still being spent; it’s still the same

HTTP w/PHP
1. Group requests for data at the top
2. Open a socket for each request
   - Sockets must be non-blocking
   - Make sure to use TCP_NODELAY
3. Use __get() to block for results
4. See Services_Digg_Request

Use a PEAR package called Services_Digg for the above example. Note Digg’s API documentation as well.
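
I don’t have the Services_Digg code in front of me, so here is a sketch of the same recipe in Python instead. Everything in it is an assumption for illustration: open_request_socket shows the socket options from step 2 (it is defined but not called here, to keep the example offline), LazyResult plays the role of PHP’s __get() by blocking only when a field is first read, and fake_service_call stands in for a real HTTP request.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def open_request_socket(host, port):
    """Step 2: a non-blocking TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.setblocking(False)
    sock.connect_ex((host, port))  # returns at once; the connect finishes later
    return sock


class LazyResult:
    """Blocks only when a response field is first read (like PHP's __get())."""

    def __init__(self, future):
        self._future = future

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. response fields.
        response = self._future.result()  # blocks until the request is done
        try:
            return response[name]
        except KeyError:
            raise AttributeError(name) from None


def fake_service_call(endpoint):
    """Hypothetical stand-in for an HTTP call to a backend service."""
    time.sleep(0.01)
    return {"endpoint": endpoint, "status": 200}


pool = ThreadPoolExecutor(max_workers=4)

# Step 1: group the requests at the top of the page; both start immediately.
user = LazyResult(pool.submit(fake_service_call, "/user/42"))
stories = LazyResult(pool.submit(fake_service_call, "/stories/popular"))

# Step 3: reading a field blocks until that particular response has arrived.
print(user.status, stories.endpoint)
pool.shutdown()
```

The nice property is that page code further down never has to know whether a result is ready yet; it just reads the field.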

HTTP is widely supported in all languages. It’s very easy to get up and running, with lots of options for servers/tuning. The downside: the protocol carries a lot of overhead, and Apache itself has a lot of overhead.

DBSlayer
- small HTTP daemon written in C. You post JSON to it for communications
- connection pooling (benchmark mysql connection, and there’s a whole bunch of overhead in the mysql authentication; mysql proxy does this too)
- load balancing and failover (like mysql proxy)
- tightly coupled to MySQL (no migration)
- tightly coupled to SQL (no CouchDB)
- no intelligence

Gearman
- highly scalable queuing system (worker bees, like PHP scripts. Sockets open, client comes to gearman server to do foo, and it says it has n number of workers, and gearman gets ‘em to work. So it works linearly. Jobs can return results back, run in parallel on many gearman servers and many CPUs)
- simple and efficient binary protocol
- sets of jobs are run in parallel
- queue can scale linearly
- php, perl, python, ruby, c clients
- poorly documented (“I think poorly documented is giving them too much credit… All Danga stuff has next to no documentation”)
- LiveJournal uses this instead of running over HTTP
- it’s not very “robust” (it scales, and they at Digg don’t see massive numbers of failing jobs. The queue isn’t persistent though: when pushing stuff and Gearman gets restarted, the queue goes away. There is a workaround for this, an undocumented feature, so ask Joe)
- digg uses it in the submission process for crawling
- Chris at Yahoo! uses Gearman requests to run multiple memcached GETs (if you’re not using multi-get, check it out).
- Check out Net_Gearman, which is a PEAR package
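
The worker-bee pattern itself is easy to sketch. The following is plain Python with threads and an in-memory queue; it is not Gearman’s protocol or the Net_Gearman API, and the job function is a made-up placeholder. Like the queue described above, this one lives only in memory, so a restart loses whatever is still queued.

```python
import queue
import threading

jobs = queue.Queue()  # in-memory only: contents are lost on restart
results = {}
results_lock = threading.Lock()


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        job_id, func, arg = job
        outcome = func(arg)
        with results_lock:
            results[job_id] = outcome
        jobs.task_done()


def fake_crawl(url):
    """Hypothetical job; a real one might fetch and parse the URL."""
    return url.upper()


workers = [threading.Thread(target=worker) for _ in range(3)]
for thread in workers:
    thread.start()

# The client side: push a set of jobs, which the workers run in parallel.
urls = ["http://a.example", "http://b.example", "http://c.example"]
for job_id, url in enumerate(urls):
    jobs.put((job_id, fake_crawl, url))

jobs.join()  # wait until every queued job has been processed
for _ in workers:
    jobs.put(None)
for thread in workers:
    thread.join()

print(results)
```

Adding capacity means starting more workers; the client code that pushes jobs does not change.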

DIY option?
- not recommended, unless you have a highly customised solution, i.e. what Flickr does
- they ran into a problem where uploading an image, and then getting the image resized, for large images, was a problem. So they use a custom binary protocol that is much more efficient for the datasets (think, an SLR has files that are 7MB in size or something)
- this requires more resources (humans, engineers!)

What goes in the Services layer?
- smart caching strategies
- data mapping and distribution
- intelligent grouping of data results
- partitioning logic

Remember to intelligently group data into endpoints, and version them! This will help you improve your software.

Consider bundling and grouping requests (bulk loading).
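
A minimal sketch of what such a service endpoint could look like, assuming a made-up StoryService with a version tag, a cache and a bulk-loading method. None of these names come from Digg; the backend here is a fake function standing in for whatever data store sits behind the service.

```python
class StoryService:
    """Hypothetical v1 endpoint that hides caching and storage from callers."""

    VERSION = "v1"

    def __init__(self, backend):
        self._backend = backend  # any callable: list of ids -> {id: record}
        self._cache = {}
        self.backend_calls = 0   # counted only to make the effect visible

    def get_many(self, story_ids):
        """Bulk load: serve hits from cache, fetch all misses in one call."""
        missing = [sid for sid in story_ids if sid not in self._cache]
        if missing:
            self.backend_calls += 1
            self._cache.update(self._backend(missing))
        return {sid: self._cache[sid] for sid in story_ids if sid in self._cache}


def fake_database(ids):
    """Stand-in for the real data store; swap it without touching callers."""
    return {sid: {"id": sid, "title": f"Story {sid}"} for sid in ids}


service = StoryService(fake_database)
first = service.get_many([1, 2, 3])   # one backend call for all three
second = service.get_many([2, 3, 4])  # one backend call, for story 4 only
print(service.backend_calls)
```

Because callers only ever see get_many, the caching strategy and the database behind it can change without the front-end code noticing.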

EPIC FAIL!
- sending SQL over for translation? Pfft. DBSlayer does this, but it tightly couples you
- hundreds of teeny tiny endpoints (prefer cohesive endpoints that return a decent amount of data)
- running SOA requests sequentially! You then get no benefits from an SOA architecture, at all. Parallel requests are good.
