{"id":793,"date":"2008-04-15T20:04:08","date_gmt":"2008-04-16T01:04:08","guid":{"rendered":"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/services-oriented-architecture-with-php-and-mysql"},"modified":"2008-04-15T20:07:43","modified_gmt":"2008-04-16T01:07:43","slug":"services-oriented-architecture-with-php-and-mysql","status":"publish","type":"post","link":"http:\/\/www.bytebot.net\/blog\/archives\/2008\/04\/15\/services-oriented-architecture-with-php-and-mysql","title":{"rendered":"Services Oriented Architecture with PHP and MySQL"},"content":{"rendered":"<p><a href=\"http:\/\/www.joestump.net\/\">Joe Stump<\/a>, Lead Architect, Digg. Slides should make their way to Joe&#8217;s website soon enough.<\/p>\n<p>He mainly works on the backend, making sure it&#8217;s scalable (can all the Digg buttons be served, et al.).<\/p>\n<p>The application layer is decoupled from your data; that&#8217;s the whole point of SOA. You can put a service in front of the DB, and move between DBs if required.<\/p>\n<p>They do use MySQL, but it&#8217;s pretty vanilla.<\/p>\n<p><strong>Old habits die hard<\/strong><br \/>\n&#8211; Data requests are sequential (I need foo, then bar, then bleh, then ecky)<br \/>\n&#8211; Data requests are blocking (while you wait for foo, nothing else is happening)<br \/>\n&#8211; Tightly coupled (mysql_query; even if you&#8217;re using a DB abstraction layer, you&#8217;re still writing SQL&#8230; so you can&#8217;t use CouchDB, for instance)<br \/>\n&#8211; Scaling is not abstracted (a lot of the caching lives in the front-end code, which becomes a problem when you start scaling your teams out). From what I gather, they use memcached.<\/p>\n<p><strong>SOA<\/strong><br \/>\n&#8211; Data is requested from a service (via HTTP, a custom protocol, etc.)<br \/>\n&#8211; Data requests are run in parallel (over non-blocking sockets. Say a page makes 10 data requests and each takes 10ms: sequentially that&#8217;s 100ms, while in parallel it might take only about 70ms. Generally 1.5-2.5x faster, for blocking parallel requests)<br \/>\n&#8211; Data requests are asynchronous (non-blocking parallel requests)<br \/>\n&#8211; Data layer is loosely coupled<br \/>\n&#8211; Scalability is abstracted (you can find engineers anywhere who can parse JSON or XML :P)<\/p>\n<p><strong>Options?<\/strong><br \/>\n&#8211; Run requests over HTTP (Google (Java), Amazon (Java), etc.)<br \/>\n&#8211; New York Times&#8217; DBSlayer (a small HTTP server that provides parallel and async requests to MySQL)<br \/>\n&#8211; Danga&#8217;s Gearman (binary protocol, proven to work, it&#8217;s a kind of queuing system)<br \/>\n&#8211; Remember that the wall-clock time goes down, but the CPU time is still spent; it&#8217;s still the same<\/p>\n<p><strong>HTTP w\/PHP<\/strong><br \/>\n1. Group requests for data at the top<br \/>\n2. Open a socket for each request<br \/>\n&#8211; Sockets must be non-blocking<br \/>\n&#8211; Make sure to use TCP_NODELAY<br \/>\n3. Use __get() to block for results<br \/>\n4. See Services_Digg_Request<\/p>\n<p>See the PEAR package <a href=\"http:\/\/pear.php.net\/package\/Services_Digg\">Services_Digg<\/a> for an example of the above. Note Digg&#8217;s <a href=\"http:\/\/apidoc.digg.com\/ToolkitsServicesDigg\">API<\/a> documentation as well.<\/p>\n<p>HTTP is widely supported in all languages. It&#8217;s very easy to get up and running, with lots of options for servers\/tuning. However, the protocol overhead is significant, and Apache itself adds a lot of overhead.<\/p>\n<p><strong>DBSlayer<\/strong><br \/>\n&#8211; small HTTP daemon written in C. 
You post JSON to it to communicate<br \/>\n&#8211; connection pooling (benchmark a MySQL connection and you&#8217;ll see a whole bunch of overhead in MySQL authentication; MySQL Proxy does this too)<br \/>\n&#8211; load balancing and failover (like MySQL Proxy)<br \/>\n&#8211; tightly coupled to MySQL (no migration)<br \/>\n&#8211; tightly coupled to SQL (no CouchDB)<br \/>\n&#8211; no intelligence<\/p>\n<p><strong>Gearman<\/strong><br \/>\n&#8211; highly scalable queuing system (worker bees, like PHP scripts, hold sockets open. A client asks the Gearman server to do foo; the server knows it has n workers, and Gearman gets &#8217;em to work, so it scales linearly. Jobs can return results, and can run in parallel on many Gearman servers and many CPUs)<br \/>\n&#8211; simple and efficient binary protocol<br \/>\n&#8211; sets of jobs are run in parallel<br \/>\n&#8211; queue can scale linearly<br \/>\n&#8211; PHP, Perl, Python, Ruby and C clients<br \/>\n&#8211; poorly documented (&#8220;I think poorly documented is giving them too much credit&#8230; All Danga stuff has next to no documentation&#8221;)<br \/>\n&#8211; LiveJournal uses it instead of running requests over HTTP<br \/>\n&#8211; it&#8217;s not very &#8220;robust&#8221; (it scales, and at Digg they don&#8217;t see a massive number of failing jobs. The queue isn&#8217;t persistent though: if Gearman gets restarted while you&#8217;re pushing jobs, the queue goes away. There is a workaround for this, an undocumented feature, so ask Joe)<br \/>\n&#8211; Digg uses it for crawling in the submission process<br \/>\n&#8211; Chris at Yahoo! uses Gearman requests to run multiple memcached GETs (if you&#8217;re not using multi-get, check it out)<br \/>\n&#8211; Check out <a href=\"http:\/\/code.google.com\/p\/netgearman\/\">Net_Gearman<\/a>, which is a PEAR package<\/p>\n<p><strong>DIY option?<\/strong><br \/>\n&#8211; not recommended unless you need a highly customised solution, e.g. 
what Flickr does<br \/>\n&#8211; they ran into a problem where uploading large images and then getting them resized was slow. So they use a custom binary protocol that is much more efficient for their datasets (think of an SLR, whose files are 7MB or so)<br \/>\n&#8211; this requires more resources (humans, engineers!)<\/p>\n<p><strong>What goes in the Services layer?<\/strong><br \/>\n&#8211; smart caching strategies<br \/>\n&#8211; data mapping and distribution<br \/>\n&#8211; intelligent grouping of data results<br \/>\n&#8211; partitioning logic<\/p>\n<p>Remember to intelligently group data into endpoints, and version them! Versioning makes it easier to improve your software over time.<\/p>\n<p>Consider bundling and grouping requests (bulk loading).<\/p>\n<p><strong>EPIC FAIL!<\/strong><br \/>\n&#8211; sending SQL over for translation? Pfft. DBSlayer does this, but it tightly couples you<br \/>\n&#8211; hundreds of teeny tiny endpoints (build cohesive endpoints that return a decent amount of data instead)<br \/>\n&#8211; running SOA requests sequentially! You then get no benefit from SOA at all. 
Parallel requests are good.<\/p>\n<p>Technorati Tags: <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/mysql\">mysql<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/mysqluc08\">mysqluc08<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/mysqluc2008\">mysqluc2008<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/mysqlconf\">mysqlconf<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/joe%20stump\">joe stump<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/digg\">digg<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/services%20oriented%20architecture\">services oriented architecture<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/soa\">soa<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/dbslayer\">dbslayer<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/gearman\">gearman<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/services_digg\">services_digg<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/net_gearman\">net_gearman<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/pear\">pear<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/sql\">sql<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/couchdb\">couchdb<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/flickr\">flickr<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/amazon\">amazon<\/a>, <a class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/scalability\">scalability<\/a>, <a 
class=\"performancingtags\" rel=\"tag\" href=\"http:\/\/technorati.com\/tag\/performance\">performance<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Joe Stump, Lead Architect, Digg. Slides should make its way at Joe&#8217;s website soon enough. Mainly works on the backend, makes sure its scalable, can all the Digg buttons be served, et al. Application layer is loosely coupled from your data. Whole point of SOA? You can put a service in front of the DB, [&hellip;]<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_options":[]},"categories":[23],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4vJD-cN","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/793"}],"collection":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/comments?post=793"}],"version-history":[{"count":0,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/posts\/793\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/media?parent=793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/categories?post=793"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.bytebot.net\/blog\/wp-json\/wp\/v2\/tags?post=793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}