A few days ago I sent an email to Chad Dickerson, who I’ve met at Yahoo! and had a chance to hang out with at Mashup Camp in Dublin.
Chad,
From what I can tell, if you create a Pipe and add additional fields (Shortcuts, Term Extraction), the only way to get to them in an API-like way is to use the JSON renderer. The RSS renderer removes those extra fields to follow the RSS spec. PHP supports JSON decoding, but you need a PEAR library or a quite recent version of PHP. If Yahoo supported serialized PHP with Pipes like you do with the other common APIs, it would be a lot easier for folks on shared hosting to work with Pipe data on the server side. I imagine with the new badge stuff you released that there’s a push to keep things client side, but there’s a huge advantage to rendering server-side to keep things nice and spiderable.
Short Version:
Expose Pipe results as serialized PHP. Pretty please.
Chad sends this along to the Pipes team, and less than three days later:
Pipes Blog » Blog Archive » New Yahoo Pipes PHP serialized output renderer
kick.
ass.
Two points to be made: first, I’m damn impressed that one of the largest sites on the ‘net would roll a feature request from an outside developer in less than three days. Second, developers should never resist the urge to ask for help from an API provider. If a company is taking the time to support an API, chances are very good that they will listen to developers and react. I can personally say I’ve gotten immediate results from Technorati, Dapper, and now Yahoo!. So blow off the idea that a big website would never listen to little ol’ developer you. With that negative attitude it’s guaranteed you’ll never get it. Ask, believe, receive, right?
So props to Chad, Jonathan Trevor, Paul Donnelly, and the rest of the Pipes team!
The Details
I’m a big fan of Yahoo Pipes. It’s an incredibly useful tool for putting together quick aggregators and filters for mashups. To integrate a Pipe on a webpage, you have a few options. You can go the cut-and-paste route and use a Badge, which works client side, or you can roll your own code to integrate a Pipe.
After you run a Pipe, you’re given a list of output formats. Copy the link location of these to get the URL of the output and tweak the parameters.
Until yesterday, the output formats useful for mashups were JSON and RSS. JSON is great for client-side mashups, but as you know, search engines will not index client-side content, so you lose any SEO love you might get. RSS is easy to consume server side, but Pipes will normalize the output to conform to the RSS spec. That means if you’re adding term extraction, Shortcuts, or any other metadata in your Pipe, you’ll lose it with RSS output unless you put that data into one of the RSS fields (title, description, etc.). So that leaves us with hacking JSON on the server side. The JSON output format retains all that sweet metadata. In PHP, the best options are a JSON PEAR module or, if you’re rocking 5.2 or above, the handy json_decode() function.
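For comparison, the JSON route on PHP 5.2 or later looks roughly like the sketch below. The sample payload is hand-written and only assumes the value/items/tags/content shape that the serialized example in this post reads; it is not captured Pipes output.

```php
<?php
// Hand-written sample standing in for a Pipe's JSON output (assumed shape).
$json = '{"value":{"items":[{"title":"Story one","tags":[{"content":"php"},{"content":"yahoo"}]}]}}';

// Passing true as the second argument returns associative arrays
// instead of stdClass objects.
$response = json_decode($json, true);

if (!is_array($response)) {
    // json_decode() returns null when the input is not valid JSON.
    die('Could not decode JSON');
}

$tags = array();
foreach ($response['value']['items'] as $item) {
    foreach ($item['tags'] as $tag) {
        $tags[] = $tag['content'];
    }
}
print_r($tags);
```

On hosts without json_decode(), the same loop would run over whatever array a PEAR JSON decoder returns.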
Now that Yahoo supports serialized PHP, using Pipe output just got a lot easier. I made a Pipe to add Term Extraction info from any RSS feed. Basically, it automatically tags all the posts in the feed. To retrieve the tags in your own script, all it takes is:
<?php
$pipeURL = 'http://pipes.yahoo.com/pipes/pipe.run?_id=Zli1l6UB3RG_l7ZvX0sBXw&_render=php&rssurl=';
$feedURL = 'http://rss.news.yahoo.com/rss/topstories';
$tags = array();
// Fetch the Pipe output for the feed and decode the serialized PHP.
$response = unserialize(file_get_contents($pipeURL . rawurlencode($feedURL)));
foreach ($response['value']['items'] as $item) {
    // Each item carries its extracted terms under 'tags'.
    foreach ($item['tags'] as $itemTags) {
        $tags[] = $itemTags['content'];
    }
}
var_dump($tags);
At this point $tags is an array of all of the terms from the feed. Now what could be done with that data?
Serialized PHP or JSON?
If you have json_decode() available in your PHP install, is there any advantage to using JSON over serialized PHP? Let’s find out.
File Size
Saving the output directly to disk gave me
JSON – 51192 bytes
Serialized PHP – 56885 bytes
Because of syntax and PHP’s type specification, serialized PHP is about 11% larger than JSON. This ratio will increase as the number of elements in your output increases.
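The 11% figure is simply (56885 - 51192) / 51192. To check the overhead on your own data, something like the following sketch should do. The array here is made up to resemble a list of feed items, so the percentage it prints will not match the numbers above.

```php
<?php
// Made-up data shaped like a list of feed items; not real Pipes output.
$items = array();
for ($i = 0; $i < 50; $i++) {
    $items[] = array(
        'title' => "Story $i",
        'tags'  => array(array('content' => 'php'), array('content' => 'yahoo')),
    );
}
$data = array('value' => array('items' => $items));

// Encode the same structure both ways and compare byte counts.
$jsonBytes = strlen(json_encode($data));
$phpBytes  = strlen(serialize($data));

printf("JSON: %d bytes\n", $jsonBytes);
printf("Serialized PHP: %d bytes\n", $phpBytes);
printf("Serialized PHP is %.1f%% larger\n", ($phpBytes - $jsonBytes) / $jsonBytes * 100);
```

The extra bytes come from the type and length prefixes that serialize() writes for every string and array.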
Decoding Speed
How long does it take to slurp these formats into PHP variables? My tests decode each 100 times.
JSON
real 0m0.269s
user 0m0.264s
sys 0m0.004s
Serialized PHP
real 0m0.088s
user 0m0.088s
sys 0m0.000s
It’s clear that unwinding serialized PHP is faster than JSON, so it’s a better choice performance-wise despite being slightly bigger over the wire.
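A rough way to repeat this kind of measurement inside PHP, rather than with the shell's time command, is sketched below. The payload is made up and the timeDecode() helper is hypothetical, written just for this illustration; results will vary with the PHP version and the size of the data.

```php
<?php
// Made-up payload; swap in saved Pipe output to reproduce the comparison.
$data = array('value' => array('items' => array_fill(0, 200, array(
    'title' => 'Example story',
    'tags'  => array(array('content' => 'php'), array('content' => 'yahoo')),
))));
$json = json_encode($data);
$php  = serialize($data);

// Hypothetical helper: call $decoder on $input $runs times, return elapsed seconds.
function timeDecode($decoder, $input, $runs = 100) {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($decoder, $input);
    }
    return microtime(true) - $start;
}

// The closure needs PHP 5.3 or later.
$jsonSeconds = timeDecode(function ($s) { return json_decode($s, true); }, $json);
$phpSeconds  = timeDecode('unserialize', $php);

printf("json_decode: %.4fs\n", $jsonSeconds);
printf("unserialize: %.4fs\n", $phpSeconds);
```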
April 3rd, 2008 at 3:22 pm
NIIIIIIIIICE one Johnny!! 🙂
April 3rd, 2008 at 10:11 pm
Nice BRO I love it, keep the tech news coming, I eat it up as soon as it comes out. You should check out my tech section. Let me know what you think. I would like for you to be in my contest.
April 4th, 2008 at 12:11 am
A great story, congrats to Yahoo! and the Pipes team for such a fast turnaround.
April 4th, 2008 at 3:07 am
My experience is different. I asked them for a simple feature: randomize a list. I want to be able to choose n random items from a list, not the first or last n items.
More than a year later, still no such feature. It would be really useful, but maybe they think it breaks their ads or something… I dunno.
April 4th, 2008 at 6:16 am
There are some security issues to consider when unserialize()-ing data from 3rd parties.
For one thing, try unserializing `a:10000000000:{}`. It will either hit the PHP memory limit in an instant, crashing the script, or it will hang indefinitely until it crashes.
Also, it is rather easy to produce a fatal error. Try unserializing a PDO object with `O:3:"PDO":0:{}` or an object of an unknown class.
Second, a 3rd party could trigger any __wakeup() method or __autoload() mechanism you may have. This is not a security flaw per se but becomes one if, for instance, your __wakeup() methods use up considerable resources.
Let’s say you do 3 database queries during Widget::__wakeup() for integrity checking. Your 3rd party could easily make you run __wakeup() for a considerable number of times.
There’s also the issue of input validation. Say you wanna check that Yahoo! Pipes sends a certain kind of nested array every time. Try writing the validation code before actually doing anything with the data. Most probably, it will be almost the same as decoding the data from a neutral format like JSON, XML or YAML.
serialize() and unserialize() are really nice and convenient sometimes. Depending on the nature of your application and on how much you’re willing to trust Yahoo! or any other 3rd party, this may be the way to go.
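One way to act on these points is sketched below. It assumes PHP 7.0 or later, where unserialize() accepts an allowed_classes option that keeps it from instantiating real objects (so no __wakeup() or autoloading is triggered), followed by a shape check before the data is used. The decodePipeResponse() name and the value/items shape it checks are assumptions for illustration. This does not address oversized payloads.

```php
<?php
// Decode third-party serialized data without instantiating any objects,
// then confirm it has the shape we expect before using it.
// Requires PHP 7.0+ for the allowed_classes option.
function decodePipeResponse($payload) {
    // @ suppresses the notice/warning unserialize() raises on malformed input.
    $data = @unserialize($payload, array('allowed_classes' => false));

    // Anything that is not an array with value => items => array is rejected.
    if (!is_array($data)
        || !isset($data['value']['items'])
        || !is_array($data['value']['items'])) {
        return null;
    }
    return $data;
}

$good = serialize(array('value' => array('items' => array())));
$bad  = 'O:3:"PDO":0:{}'; // serialized object from the comment above

var_dump(decodePipeResponse($good) !== null);
var_dump(decodePipeResponse($bad) !== null);
```

With allowed_classes set to false, the object payload is turned into a __PHP_Incomplete_Class instance rather than a PDO object, and the array check then rejects it.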
April 4th, 2008 at 8:47 am
Does this mean that you will revive TagCloud?
April 4th, 2008 at 11:03 am
@Christian: That’s an excellent summary of the security concerns with the serialization functions. Remember, safety first kids 🙂
April 4th, 2008 at 1:44 pm
Finally! Thanks for kicking this into motion.
October 6th, 2008 at 8:04 am
hii, i like the article you wrote. i also wrote an article on Serialization here : http://kaniks.blogspot.com
feel free to post your comments
thanks
cheers
February 4th, 2009 at 12:01 am
I’ve been using Yahoo Pipes with WordPress syndication. You can actually take a mashup of several different feeds from Yahoo Pipes and have them post to your site. You can even have it randomize the RSS so the feeds get mixed in with each other.
February 4th, 2009 at 3:34 pm
These are the instructions for setting up an automatic rss post to your self hosted wordpress blog. (wordpress.org for details)
you need to get the extension/plugin (FeedWordPress) from wordpress.org…
run your rss feed with yahoo pipes and get the url.
In your WordPress admin, go to the syndication settings. In the "add new source feed" field, paste the Pipes URL and press the syndicate button. Now you will see the feed on your list at the bottom of the page. Click on the edit button below the feed. Choose the settings for automatic update, select the category to post to, save it, and you’re done. You will now get posts from Yahoo Pipes and they will appear in your blog.
April 15th, 2010 at 3:26 pm
I tried the above, and I can’t get it to work for me…
Your code, typed exactly as you typed above, works great as a test php page on my server. But any time I change the URLs to my pipe, it doesn’t work!
What exactly do I put in the first two lines?
I tried :
$pipeURL = 'http://pipes.yahoo.com/pipes/pipe.run?_id=406d1664961fc2cce8f2d324fd4497a8&_render=php&rssurl=';
$feedURL = 'http://pipes.yahoo.com/pipes/pipe.run?_id=406d1664961fc2cce8f2d324fd4497a8&_render=rss';
But I get the following error message:
Warning: Invalid argument supplied for foreach() in ../testing.php on line 16
(line 16 was this:
foreach ($item['tags'] as $itemTags){
)
I don’t really understand the technical aspect of this, I just want to get my feeds brought in correctly. Can anyone help? What have I done wrong?
April 15th, 2010 at 3:26 pm
I posted a comment to this, which is being held for moderation (probably because of the links I included). Hopefully someone can retrieve it, and answer my question 🙂
April 15th, 2010 at 3:41 pm
Luise, $pipeURL is the address of the Yahoo pipe I created. You don’t want to change that one. $feedURL is the address of the feed you want to analyze. In my example, I’m using Yahoo’s Top Stories feed. You can replace that with any address you like. You’ll definitely want to add some error checking to this code.. it’s just for demonstration purposes. Good luck!
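As a minimal sketch of that error checking: the "Invalid argument supplied for foreach()" warning appears when an item has no 'tags' array, which is what happens when the Pipe being called does not add Term Extraction data. The extractTags() helper below is hypothetical and runs on hand-written sample data rather than a live Pipe.

```php
<?php
// Collect tag text from a decoded Pipe response, skipping anything malformed.
function extractTags($response) {
    $tags = array();
    if (!is_array($response)
        || !isset($response['value']['items'])
        || !is_array($response['value']['items'])) {
        return $tags; // fetch or decode failed, or the shape is unexpected
    }
    foreach ($response['value']['items'] as $item) {
        // Items without a 'tags' list would otherwise trigger
        // "Invalid argument supplied for foreach()".
        if (!isset($item['tags']) || !is_array($item['tags'])) {
            continue;
        }
        foreach ($item['tags'] as $tag) {
            if (isset($tag['content'])) {
                $tags[] = $tag['content'];
            }
        }
    }
    return $tags;
}

// Hand-written sample: the second item has no tags at all.
$response = array('value' => array('items' => array(
    array('title' => 'One', 'tags' => array(array('content' => 'php'))),
    array('title' => 'Two'),
)));
print_r(extractTags($response));
print_r(extractTags(false)); // false is what unserialize() returns on failure
```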
April 16th, 2010 at 1:30 pm
Now you’ve totally confused me. If I’ve created my own Yahoo Pipe, then why wouldn’t I use my pipe in the $pipeURL feed? Maybe I’m misunderstanding the purpose of your code. I thought that it was to retrieve the data from the pipe so that I can manipulate it, include it in a web page, style it the way I want and make it look pretty. Is this not its purpose? I’m trying to retrieve the title, the author, the pubDate, the description, etc. from the data that my Yahoo pipe spits out.
April 16th, 2010 at 1:34 pm
FYI – my pipe is at: http://pipes.yahoo.com/bridgeblogging/mainfeed. (See http://www.bridgeblogging.com for its intended purpose).
I want to try to include this pipe in my web page using php. I know how to include it with Javascript, but I want to make use of SEO and include it in the page itself. It needs to be fast too, as it will be on the main page of the website.
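For server-side rendering of a Pipe's own items, a sketch along these lines may help. It assumes the items have already been fetched with _render=php and decoded into an array, and that each item carries 'title', 'link' and 'pubDate' keys. Those key names, the renderItems() helper and the sample data are all assumptions, so inspect the real output with var_dump() first.

```php
<?php
// Build an HTML list from decoded Pipe items. The 'title', 'link' and
// 'pubDate' keys are assumptions about the item shape, not confirmed fields.
function renderItems(array $items) {
    $html = "<ul>\n";
    foreach ($items as $item) {
        if (!is_array($item) || !isset($item['title'], $item['link'])) {
            continue; // skip items missing the fields we need
        }
        // Escape everything that came from the feed before putting it in HTML.
        $title = htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8');
        $link  = htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8');
        $date  = isset($item['pubDate'])
            ? htmlspecialchars($item['pubDate'], ENT_QUOTES, 'UTF-8')
            : '';
        $html .= "  <li><a href=\"$link\">$title</a> $date</li>\n";
    }
    return $html . "</ul>\n";
}

// Hand-written sample items standing in for $response['value']['items'].
$items = array(
    array('title' => 'Bridge & Blogging', 'link' => 'http://example.com/post', 'pubDate' => 'Fri, 16 Apr 2010'),
    array('description' => 'no title or link, so this one is skipped'),
);
echo renderItems($items);
```

Because the markup is produced on the server, it is part of the page that search engines fetch. For a busy front page, saving the rendered HTML to a file and refreshing it periodically would avoid calling Pipes on every request.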