John Herren’s Blog

Yahoo Pipes adds support for serialized PHP

April 3, 2008 · 18 Comments

A few days ago I sent an email to Chad Dickerson, who I’ve met at Yahoo! and had a chance to hang out with at Mashup Camp in Dublin.

Chad,

From what I can tell, if you create a Pipe and add additional fields (Shortcuts, Term Extraction), the only way to get to them in an API-like way is to use the JSON renderer. The RSS renderer removes those extra fields to follow the RSS spec. PHP supports JSON decoding, but you need a PEAR library or a quite recent version of PHP. If Yahoo supported serialized php with Pipes like you do with the other common API’s, it would be a lot easier for folks on shared hosting to work with Pipe data on the server side. I imagine with the new badge stuff you released that there’s a push to keep things client side, but there’s a huge advantage to rendering server-side to keep things nice and spiderable.

Short Version:

Expose Pipe results as serialized PHP. Pretty please.

Chad sends this along to the Pipes team, and less than three days later:
Pipes Blog » Blog Archive » New Yahoo Pipes PHP serialized output renderer

kick.
ass.

Two points to be made: first, I’m damn impressed that one of the largest sites on the ‘net would roll out a feature request from an outside developer in less than three days. Second, developers should never resist the urge to ask for help from an API provider. If a company is taking the time to support an API, chances are very good that they will listen to developers and react. I can personally say I’ve gotten immediate results from Technorati, Dapper, and now Yahoo!. So blow off the idea that a big website would never listen to little ol’ developer you. With that negative attitude it’s guaranteed you’ll never get it. Ask, believe, receive, right?

So props to Chad, Jonathan Trevor, Paul Donnelly, and the rest of the Pipes team!

The Details

I’m a big fan of Yahoo Pipes. It’s an incredibly useful tool for putting together quick aggregators and filters for mashups. To integrate a Pipe on a webpage, you have a few options. You can go the cut-and-paste route and use a Badge, which works client side, or you can roll your own code to integrate a Pipe.

Put this in your pipe..

After you run a Pipe, you’re given a list of output formats. Copy the link location of these to get the URL of the output and tweak the parameters.

Until yesterday, the output formats useful for mashups were JSON and RSS. JSON is great for client-side mashups, but as you know, search engines will not index client-side content, so you lose any SEO love you might get. RSS is easy to consume server side, but Pipes will normalize the output to conform to the RSS spec. That means if you’re adding term extraction, Shortcuts, or any other metadata to your Pipe, you’ll lose it with RSS output unless you put that data into one of the RSS fields (title, description, etc.). So that leaves us with hacking JSON on the server side. The JSON output format retains all that sweet metadata. In PHP, the best options are a JSON PEAR module or, if you’re rocking PHP 5.2 or above, the handy json_decode() function.
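For comparison, here is roughly what the JSON route looks like on the server side. This is only a sketch: the JSON string is a made-up sample that mimics the value / items / tags / content shape used in this post, not real Pipes output.

```php
<?php
// Hypothetical sample data, shaped like the Pipe output used in this post.
$json = '{"value":{"items":['
      . '{"title":"Story one","tags":[{"content":"php"},{"content":"yahoo pipes"}]},'
      . '{"title":"Story two","tags":[{"content":"json"}]}'
      . ']}}';

// Passing true as the second argument returns associative arrays, not objects.
$response = json_decode($json, true);
if (!is_array($response)) {
    throw new RuntimeException('Could not decode JSON');
}

// Collect every extracted term from every item.
$tags = array();
foreach ($response['value']['items'] as $item) {
    foreach ($item['tags'] as $itemTag) {
        $tags[] = $itemTag['content'];
    }
}
print_r($tags);
```

In a real script the string would come from fetching the Pipe's JSON output URL instead of a literal.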

Now that Yahoo supports serialized PHP, using Pipe output just got a lot easier. I made a Pipe to add Term Extraction info from any RSS feed. Basically, we’re automatically tagging all the posts in the feed. To retrieve the tags in your own script, all it takes is:

<?php

// Pipe endpoint; _render=php asks for the serialized PHP renderer
$pipeURL = 'http://pipes.yahoo.com/pipes/pipe.run?_id=Zli1l6UB3RG_l7ZvX0sBXw&_render=php&rssurl=';
$feedURL = 'http://rss.news.yahoo.com/rss/topstories';

$tags = array();
// Fetch the Pipe output and turn it back into a PHP array
$response = unserialize(file_get_contents($pipeURL . rawurlencode($feedURL)));
foreach ($response['value']['items'] as $item) {
    // Each item carries the terms extracted for that post
    foreach ($item['tags'] as $itemTags) {
        $tags[] = $itemTags['content'];
    }
}
var_dump($tags);

At this point $tags is an array of all of the terms from the feed. Now what could be done with that data?
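One simple answer, as a sketch: count how often each term shows up and keep the most frequent ones, which is the raw material for a tag cloud. The sample $tags array below is hypothetical; in practice it would be the array built by the script above.

```php
<?php
// Hypothetical sample; normally $tags comes from the Pipe output.
$tags = array('iraq', 'election', 'iraq', 'economy', 'election', 'iraq');

// Map each term to the number of times it appears.
$counts = array_count_values($tags);

// Sort by count, highest first, keeping the term as the key.
arsort($counts);

// Keep the two most frequent terms.
$top = array_slice($counts, 0, 2, true);
print_r($top);
```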

Serialized PHP or JSON?

If you have json_decode() available in your PHP install, is there any advantage to using JSON over serialized PHP? Let’s find out.

File Size

Saving the output directly to disk gave me

JSON – 51192 bytes
Serialized PHP – 56885 bytes

Because serialized PHP spells out the type of every value and the length of every string and array, it comes out about 11% larger than JSON here. This ratio will increase as the number of elements in your output increases.
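To see where the extra bytes come from, it helps to encode the same small array both ways and look at the output. This is a sketch with a hypothetical item, loosely shaped like the Pipe output in this post; the exact sizes will differ for real data.

```php
<?php
// Hypothetical item, not real Pipes output.
$data = array(
    'title' => 'Top story',
    'tags'  => array(array('content' => 'php'), array('content' => 'json')),
    'count' => 2,
);

$asJson = json_encode($data);
$asPhp  = serialize($data);

// Printing both makes the type and length prefixes in the serialized form visible.
echo $asJson, "\n";
echo $asPhp, "\n";
printf("JSON: %d bytes, serialized PHP: %d bytes\n", strlen($asJson), strlen($asPhp));
```

The serialized string carries markers such as s:5: before each string and a:3: before each array, which JSON does not need.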

Decoding Speed

How long does it take to slurp these formats into PHP variables? My tests decode each 100 times.

JSON
real    0m0.269s
user    0m0.264s
sys     0m0.004s

Serialized PHP
real    0m0.088s
user    0m0.088s
sys     0m0.000s

It’s clear that unwinding serialized PHP is faster than JSON, so it’s a better choice performance-wise despite being slightly bigger over the wire.
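If you want to repeat the comparison on your own setup, something like the following would do it. This is a sketch, not the script used for the numbers above: it builds a hypothetical payload instead of using saved Pipe output, times the loops with microtime() rather than the shell's time command, and uses a closure, so it assumes PHP 5.3 or later. Results will vary with PHP version and payload.

```php
<?php
// Hypothetical payload; the original test used saved Pipe output instead.
$data = array('value' => array('items' => array()));
for ($i = 0; $i < 200; $i++) {
    $data['value']['items'][] = array(
        'title' => "Item $i",
        'tags'  => array(array('content' => "tag$i")),
    );
}
$json = json_encode($data);
$php  = serialize($data);

// Run a decoder $runs times; return elapsed seconds and the last decoded value.
function timeDecode($decoder, $payload, $runs = 100) {
    $decoded = null;
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $decoded = call_user_func($decoder, $payload);
    }
    return array(microtime(true) - $start, $decoded);
}

list($jsonSecs, $fromJson) = timeDecode(function ($s) { return json_decode($s, true); }, $json);
list($phpSecs, $fromPhp)   = timeDecode('unserialize', $php);

printf("json_decode: %.4fs, unserialize: %.4fs\n", $jsonSecs, $phpSecs);
```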

Categories: PHP · mashups

18 responses so far ↓

  • Danielle // April 3, 2008 at 3:22 pm

    NIIIIIIIIICE one Johnny!! :)

  • jtbarker // April 3, 2008 at 10:11 pm

    Nice BRO I love it keep the tech news coming I eat it up as soon as it comes out. You should check out my tech section. Let me know what you think. I would like for you to be in my contest.

  • Glen // April 4, 2008 at 12:11 am

    A great story, congrats to Yahoo! and the Pipes team for such a fast turnaround.

  • Mag // April 4, 2008 at 3:07 am

    My experience is different. I asked them for a simple feature: randomize a list. I want to be able to choose n random items from a list, not the first or last n items.

    More than a year later, still no such feature. It would be really useful, but maybe they think it breaks their ads or something… I dunno.

  • Cristian George Strat // April 4, 2008 at 6:16 am

    There are some security issues to consider when unserialize()-ing data from 3rd parties.

    For one thing, try unserializing `a:10000000000:{}`. It will either hit the PHP memory limit in an instant, crashing the script, or it will hang indefinitely until it crashes.
    Also, it is rather easy to produce a fatal error. Try unserializing a PDO object with `O:3:"PDO":0:{}` or an object of an unknown class.

    Second, a 3rd party could trigger any __wakeup() method or __autoload() mechanism you may have. This is not a security flaw per se but becomes one if, for instance, your __wakeup() methods use up considerable resources.
    Let’s say you do 3 database queries during Widget::__wakeup() for integrity checking. Your 3rd party could easily make you run __wakeup() for a considerable number of times.

    There’s also the issue of input validation. Say you wanna check that Yahoo! Pipes sends a certain kind of nested array every time. Try writing the validation code before actually doing anything with the data. Most probably, it will be almost the same as decoding the data from a neutral format like JSON, XML or YAML.

    serialize() and unserialize() are really nice and convenient sometimes. Depending on the nature of your application and on how much you’re willing to trust Yahoo! or any other 3rd party, this may be the way to go.

  • John // April 4, 2008 at 8:47 am

    Does this mean that you will revive TagCloud?

  • John Herren // April 4, 2008 at 11:03 am

    @Cristian: That’s an excellent summary of the security concerns with the serialization functions. Remember, safety first kids :)

  • fumiNET // April 4, 2008 at 1:44 pm

    Finally! Thanks for kicking this into motion.

  • taylordavis.com » Blog Archive » links for 2008-04-04 // April 4, 2008 at 6:46 pm

    [...] Yahoo Pipes adds support for serialized PHP « John Herren’s Blog (tags: mashups programming) [...]

  • Yahoo! Cool thing of the Day » Blog Archive » More Pipe Faucets // April 7, 2008 at 7:02 pm

    [...] if you’re on a shared service that doesn’t have JSON, don’t feel like dealing with XML, or just want to have a rather nice speed boost out of the deal, it’s now easier than [...]

  • tecosystems » links for 2008-04-08 // April 8, 2008 at 12:31 am

    [...] Yahoo Pipes adds support for serialized PHP « John Herren’s Blog good on Yahoo for their behavior here (tags: johnherren php pipes yahoo developers community) [...]

  • Community News: New Yahoo! Pipes PHP serialized output renderer | Development Blog With Code Updates : Developercast.com // April 8, 2008 at 8:56 am

    [...] Zend Developer Zone and by John Herren, Yahoo! has added a new feature to its Pipes functionality – serialized PHP results. Until now JSON output has been the only way to obtain all the data flowing through a Pipe. [...]

  • Cory Comer’s Personal Blog » Blog Archive » Yahoo Pipes // April 10, 2008 at 10:49 pm

    [...] I was browsing through a few blogs yesterday and I came across a post by John Herren about Yahoo [...]

  • Internet Alchemy » links for 2008-05-21 // May 21, 2008 at 6:47 pm

    [...] Yahoo Pipes adds support for serialized PHP « John Herren’s Blog A good comment on the pitfalls of unserialising PHP from a web service (tags: php security) [...]

  • Pipes Blog » Blog Archive » Pipes badges in the wild and cool blog posts // May 30, 2008 at 11:55 am

    [...] special thanks to John Herren for an awesome post on our newly added support for serialized php output. In his post he shows how to use Pipes [...]

  • Niks // October 6, 2008 at 8:04 am

    hii, i like the article you wrote. i also wrote an article on Serialization here : http://kaniks.blogspot.com
    feel free to post your comments

    thanks
    cheers

  • CELLBAN // February 4, 2009 at 12:01 am

    I’ve been using Yahoo Pipes with WordPress syndication. You can actually take a mashup of several different feeds from Yahoo Pipes and have them post to your site. You can even have it randomize the RSS so the feeds get mixed in with each other.

  • CELLBAN // February 4, 2009 at 3:34 pm

    These are the instructions for setting up an automatic RSS post to your self-hosted WordPress blog (see wordpress.org for details).
    You need to get the extension/plugin (FeedWordPress) from wordpress.org…
    Run your RSS feed through Yahoo Pipes and get the URL.
    In your WordPress admin, go to the syndication settings. In the add new source feed box, paste the Pipes URL. Press the syndicate button. Now you will see the feed on your list at the bottom of the page. Click on the edit button below the feed. Choose the settings for automatic update. Select the category to post to, save it, and you’re done. You will now get posts from Yahoo Pipes and they will appear as posts on your blog.
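Following up on Cristian George Strat's comment above about unserialize()-ing data from third parties, here is a minimal sketch of a more defensive decode. It assumes PHP 7.0 or later, where unserialize() accepts an options array; decodePipeResponse() and its size cap are hypothetical names and values chosen for illustration, not part of Pipes or PHP.

```php
<?php
// Hypothetical helper: returns the items array, or null if anything looks off.
function decodePipeResponse($raw, $maxBytes = 1048576) {
    // Refuse non-strings and oversized payloads before decoding anything.
    if (!is_string($raw) || strlen($raw) > $maxBytes) {
        return null;
    }

    // With allowed_classes => false (PHP 7.0+), serialized objects are not
    // instantiated as their real class, so their __wakeup() never runs.
    // The @ hides the notice or warning PHP raises for malformed input.
    $data = @unserialize($raw, array('allowed_classes' => false));

    // Check the shape this post relies on: value -> items -> list of items.
    if (!is_array($data)
        || !isset($data['value']['items'])
        || !is_array($data['value']['items'])) {
        return null;
    }
    return $data['value']['items'];
}
```

This only checks the top-level shape. Each item and its tags would still need the same kind of checking before use, which is the commenter's point: the validation ends up looking a lot like decoding a neutral format.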
