Darwinian Web
Adam Green's thoughts on the evolution of the Internet

Posts tagged as: aggregator

Is there an optimal size for a reading list?

Posted on Friday, February 17, 2006 at 9:23 AM

I was reading Jim Moore's blog for the first time, and I came across this recommendation in his instructions for new users of OPML reading lists:

"Q: How many feeds should my reading list optimally have?

A: 5-6 seems like the optimal number. Do try to keep your list under or at 10. "
I've seen this idea before, and it has always puzzled me. Amy Bellinger makes the same point:
"I've subscribed to a couple of reading lists that are way way too big -- either too many feeds, or feeds having such a large number of posts that the effect in my aggregator is to make me sigh and want them out of there. It's going to give people a bad impression of reading lists if they try the concept and end up feeling inundated. I think list makers might want to show that they've done a lot of work to put their lists together so they go for volume. Maybe they'd be wiser to put their brainpower into making the most judicious choices, or in drawing up groups of lists with 4-6 feeds each."
And so does Dave Winer:
"Too many feeds in a reading list makes for an overwhelming user experience. Before you publish a reading list you should try using a few to get an idea what it's like. In most situations, ten feeds is a lot of feeds for a reading list. "
One issue may be the technical problem of an aggregator that can't load too many feeds at once. The solution to that problem is to try a different aggregator. I just tested Anne Zelenka's BlogHer reading list in BlogBridge, and it took under 60 seconds to open 129 blogs and read 1,525 posts. Waiting a minute doesn't sound excessive, and new posts kept appearing on screen the whole time, so it was clear that the program was still working. I don't think that experience would scare any user away.

I'm sure Jim, Amy and Dave have the user's best interests in mind, but saying people will be intimidated by more than a dozen feeds is rather patronizing. It's like telling students "now don't read too many books on a subject, or you'll get overwhelmed." That is a very individual preference. What is too many books to buy when you want to learn about all the new Web 2.0 technologies? I don't know, but I can count over 50 on my bookshelves. How many books should I buy to learn about philosophy? 100? 200? I'm sure I bought more than that while I was studying history of science. (You probably thought I was going to drop the H-bomb again, didn't you?)

Maybe a better solution would be to tell users what to expect before they open an OPML file. I support the tradition of listing the file size next to links to big files, like videos or .mp3s. It could be a good idea if links to reading lists were followed with something like "(100 feeds)." Then users could decide for themselves whether 10 feeds is optimal.
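Producing that "(100 feeds)" label is easy to automate. Here is a minimal Python sketch using only the standard library. It assumes each feed in the reading list is an outline element carrying an xmlUrl attribute, which is the usual OPML convention for subscription lists. The function names and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_feeds(opml_text: str) -> int:
    """Count feed subscriptions in an OPML document.

    Assumes each feed is an <outline> element with an xmlUrl attribute;
    folder outlines without xmlUrl are not counted.
    """
    root = ET.fromstring(opml_text)
    # iter() also visits nested outlines, so feeds inside folders count.
    return sum(1 for node in root.iter("outline") if node.get("xmlUrl"))


def link_label(title: str, opml_text: str) -> str:
    """Build a link label that tells readers how big a reading list is."""
    n = count_feeds(opml_text)
    return f"{title} ({n} {'feed' if n == 1 else 'feeds'})"


SAMPLE = """<opml version="1.1">
  <head><title>Sample reading list</title></head>
  <body>
    <outline text="Tech">
      <outline text="Blog A" type="rss" xmlUrl="http://example.com/a.xml"/>
      <outline text="Blog B" type="rss" xmlUrl="http://example.com/b.xml"/>
    </outline>
    <outline text="Blog C" type="rss" xmlUrl="http://example.com/c.xml"/>
  </body>
</opml>"""

print(link_label("Sample reading list", SAMPLE))  # Sample reading list (3 feeds)
```

A site that generates its reading lists could compute this label at publish time, so the count always matches the current file.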

Update: Amyloo clarifies her comments: "it's not my aggregator that stumbles when it gets fed a list yielding 1,500 posts; it's my brain." Her aggregator may not choke on that many feeds, but it clearly isn't giving her a presentation that augments her brain. I still think this is a problem that can be solved with software, perhaps with a good feed grazing tool once one becomes available. The goal should be to let users comprehend and manipulate more information, not to restrict the available information to our inherent limits. I couldn't write papers of any serious length during college, so I switched my major from English to Chemistry. After college there were word processors, and I started writing books.

Dynamic reading lists just got easier to read

Posted on Wednesday, February 8, 2006 at 6:34 PM

BlogBridge version 2.13 has just been released, making it a lot easier to view dynamic reading lists in OPML. It is free and supports Windows, Mac, and Linux. Plenty of RSS aggregators let you import OPML files as a quick way of subscribing to a large number of feeds, but that is basically a static form of subscription. BlogBridge, on the other hand, is able to stay in synch with the original OPML. If you subscribe to an OPML file on a server and the contents of the file change, then the set of feeds that shows up in BlogBridge also changes. This is what separates a reading list from a regular OPML file. A reading list's internal format is no different; what turns an OPML file into a dynamic reading list is the fact that its contents change over time.
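Staying in synch comes down to a set comparison each time the OPML file is re-fetched. This is not BlogBridge's code. It is a minimal Python sketch that assumes feeds are identified by the xmlUrl attribute of each outline. Fetching the file over HTTP is left out, and the names and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_urls(opml_text: str) -> set[str]:
    """Return the set of feed URLs (xmlUrl attributes) in an OPML document."""
    root = ET.fromstring(opml_text)
    return {node.get("xmlUrl") for node in root.iter("outline") if node.get("xmlUrl")}


def resync(subscribed: set[str], latest_opml: str) -> tuple[set[str], set[str]]:
    """Compare current subscriptions with a freshly fetched reading list.

    Returns (to_add, to_remove): feeds the list's author has added since
    the last check, and feeds the author has dropped.
    """
    latest = feed_urls(latest_opml)
    return latest - subscribed, subscribed - latest


current = {"http://example.com/a.xml", "http://example.com/b.xml"}
updated = """<opml version="1.1"><body>
  <outline text="Blog B" xmlUrl="http://example.com/b.xml"/>
  <outline text="Blog C" xmlUrl="http://example.com/c.xml"/>
</body></opml>"""

to_add, to_remove = resync(current, updated)
print("add:", sorted(to_add))        # add: ['http://example.com/c.xml']
print("remove:", sorted(to_remove))  # remove: ['http://example.com/a.xml']
```

A static import runs feed_urls once and never again; a reading list reruns resync on every check and applies both sets.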

It isn't completely obvious how to open an OPML file as a reading list in BlogBridge, as opposed to just subscribing to all of the file's feeds, so here are the basic steps:

  1. Select 'Add Guide' from the Guides menu.
  2. Enter a title for the new Guide.
  3. Select the 'Reading List' tab.
  4. Click the '+' button.
  5. Enter the URL of a reading list. If you can't find one to try, you can get started with the one I have created based on Tech Memeorandum.
  6. Click 'Check and Add'.
  7. Click 'Add'.
  8. When the new Guide appears, all of the feeds listed in the reading list will be read, and the feed items will then appear.
By default, BlogBridge checks a reading list for changes only when the program starts. This is fine if you tend to start the program, read some feeds, and then close it. If you keep the program open, as I do, you will probably want to tell it to recheck the OPML regularly and resynch to match. This is done by:
  1. Selecting 'Preferences' from the Tools menu.
  2. Clicking the 'Reading Lists' tab.
  3. Changing the 'Check for changed Reading Lists' setting to 'Once per Day' or 'Once per Hour.'
My enthusiasm on this subject has prompted some emails asking if I have a financial interest in promoting BlogBridge and reading lists, and the answer is no, although I am friends with Pito Salas, BlogBridge's project leader. I am actively looking for other aggregators that support OPML reading lists, so if you know of one, let me know about it and I'll be glad to write it up. I've said before that I believe RSS is a key component of the Web's future growth, and OPML reading lists are a great way of delivering RSS.

I've got a fever, and the only prescription is more feeds!

Posted on Tuesday, February 7, 2006 at 7:11 AM

I've been accused of being obsessive, but I can't help it. I gotta have more feeds. This whole subject of real-time feed aggregation, or feed grazing as it's now being called, has really caught my imagination. So I've been looking for something that will cure my fever. I haven't found an aggregator that will satisfy my craving completely, but there are a number of websites that demonstrate the type of interface I need. I'll list them in the hope that someone will build an OPML-capable aggregator with this type of presentation:

  • AliveNews has a cool real-time display, but it seems to be a proof of concept rather than a real site, because it doesn't have any options for expanding the list of predefined feeds. Still, the fade-in of feed excerpts is sweet.
  • Digg spy is really compelling for the ADD set, but it isn't a true feed aggregator, and it makes me twitch if I watch it for too long.
  • LiveMarks by Alex Bosworth applies a different take on this type of presentation to Delicious bookmarks. (via ProgrammableWeb)

I finally grok OPML reading lists

Posted on Sunday, February 5, 2006 at 8:52 AM

The best way for me to understand a new software technology is to start writing code that supports it. I finally did that with OPML reading lists, and as Marc Canter would say, it is coolio! You can find all the details on my mashup blog. The short version of the story is that an OPML file based on all of the blogs cited on Tech Memeorandum is generated every hour and placed here, where you can grab it and use it as a reading list. I poke fun at Dave Winer from time to time, but I can see that OPML reading lists really do take RSS to the next level. Good work, Dave, and this time I'm around to see that your role doesn't get erased from history.
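Generating such a file is mostly a matter of writing one outline element per feed. Here is a minimal Python sketch. It assumes you already have (name, feed URL) pairs for the cited blogs; extracting them from Tech Memeorandum is not shown. build_reading_list is a hypothetical name, and this is an illustration rather than the script described above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_reading_list(title: str, feeds: list[tuple[str, str]]) -> str:
    """Build an OPML document with one <outline> per (name, feed_url) pair."""
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    # OPML dates use RFC 822-style date-times, which format_datetime produces.
    ET.SubElement(head, "dateModified").text = format_datetime(datetime.now(timezone.utc))
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        ET.SubElement(body, "outline", text=name, type="rss", xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


cited = [  # placeholder blogs; a real run would use the sites cited that hour
    ("Blog A", "http://example.com/a.xml"),
    ("Blog B", "http://example.com/b.xml"),
]
print(build_reading_list("Blogs cited on Tech Memeorandum", cited))
```

Rewriting the file at the same public URL on a schedule (hourly, from cron for example) is what makes it a dynamic reading list rather than a one-off export.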

IE7's aggregator isn't impressive, but it is good enough

Posted on Thursday, February 2, 2006 at 12:37 PM

Trying to recharacterize a quote once it is loose in the blogosphere can be a tricky business. In my initial thoughts on IE7 I wrote that it would likely kill many RSS aggregators that did little more than let you read feeds. Richard MacManus linked to this and wrote "Adam Green thinks IE7 will kill a lot of independent RSS Aggregator products, due to IE7's impressive RSS integration features." The first clause was mine, but the second clause wasn't. I don't fault Richard. He was using me as an example to prove his point, but I don't want to leave the impression that I think IE7's use of RSS is impressive. In fact, it is just the opposite. IE7 is a very weak aggregator, but it will still drive out the other independent aggregators, because it will be part of IE.

Microsoft long ago mastered the trick of calculating exactly the minimal feature set needed to suck the air out of a market it wants to enter. They implement about half of that set in the first version and eventually reach the full minimal set by about the third. Then they stop completely. This is the thing I hate most about Microsoft's monopoly over the software market. Take a look at Excel and Word. They are basically frozen with a feature set that is 10 to 15 years old. Microsoft knows that people aren't willing to go through the bother of switching products if most of their needs are met. More features beyond the minimal set mean more bugs, so Microsoft has nothing to gain once a market is theirs. The result is a stifling of innovation. It was just this stifling that led so many in the software industry to flee to the Internet in the mid-Nineties.

My favorite example of the Microsoft effect is the graphing in Excel. It absolutely sucks. I have been using it for years, and I still have no idea how to create what I want. Each time I use it I just keep whacking away at it until I get close to what I want, and then I stop. Once when my son was creating some complex graphs for a science project, I went to some download sites and got a few shareware graphing packages. He was amazed by their power and ease of use. He asked why Microsoft didn't do graphs this well, and my answer was "Because they don't have to." I then explained my theory of the Microsoft effect. (Yes, having me as a father can be a bit tedious. My kids usually know better than to ask my opinion on software. My wife won't even stay in the room when software comes up.)

So does this mean that we are doomed to a life of mediocre aggregators when IE7 wins? I am afraid so, but I hope not. What I really hope is that Scott Karp's vision will be realized: "The New Media revolution will come when content is completely atomized and fully tagged, so that it can be remixed into perfectly tailored packages to suit every taste, i.e. truly what I want (when I want it)." But the aggregator publishers have to move fast. Once IE7 is cleaned up enough to release, it will shut down many of the opportunities to find new users. That doesn't mean that the average user is lazy or stupid. It means that they have a life, and seeking out the ultimate aggregator won't be a high priority for them.

News aggregation is the next battleground

Posted on Wednesday, February 1, 2006 at 11:34 AM

The blogosphere has come to accept the idea of online feed aggregators, as long as only excerpts of posts are republished. Now that Google has become everyone's favorite target, the subject of news aggregation looks to be the next area of dispute. The World Association of Newspapers is making the latest "they're stealing our content" accusations. They object to search engines' use of excerpts, arguing that it goes beyond fair use, and the group's president, Gavin O'Reilly, has adopted the catchy phrase "Napsterization" to describe the process. Remember the old line about never picking a fight with someone who buys ink by the barrel? I guess we will find out if the people who buy ink and paper can take on one of the world's biggest purchasers of networked PCs and bandwidth. (via Susan Mernit)

Initial thoughts on IE7 and RSS

Posted on Wednesday, February 1, 2006 at 7:43 AM

There has already been plenty of discussion of the new preview release of IE7, so I won't try to list everything new. Besides, I'm too busy to dig deeply into features that are likely to change before it is released. What I would like to do is list a few clear effects the final release of IE7 will have on RSS and aggregation, most of which are illustrated by this screenshot.


  1. "RSS feed" will be contracted to just "feed" in common usage. IE7 uses the term "feed" throughout its interface without mentioning RSS once, as far as I can tell. This makes sense, since "RSS feed" is as redundant as saying "HTML web page." It also means that the public won't have to be aware of the many feed formats, such as Atom, or RDF versus non-RDF RSS.
  2. The new feed icon will replace the many variations on RSS and XML icons. IE7 uses it throughout its interface, so it will rapidly become synonymous with the term "feed" in the public's mind.
  3. Categories will finally be used. IE7 lists all of the categories in the currently displayed feed and allows easy selection of posts by category. I've done a good amount of research into the use of the category tag in feeds, and it is currently used by surprisingly few blogs.
  4. Feed serving bandwidth will go through the roof. IE7 allows automatic updates of feeds, and you are reminded of this with every feed you read. Any Windows user knows what it's like when Microsoft decides to remind you of something. Let's just say that only the truly anti-establishment will be able to ignore the continual requests to turn on automatic synchronization, and those people will be using Firefox anyway. It seems that turning this feature on automatically sets it for all subscribed feeds. As with any Microsoft software setting, once you turn on synchronization, you have to work really hard to find a way to turn it off. As the screenshot shows, synchronization will continue even when IE7 isn't running. From what I can tell, the default interval is 60 minutes, but this can be changed to a shorter period. I'd tell you how short the interval can be, but I can no longer figure out how to reach this setting in the program. The combination of these factors means that virtually all IE7 users will turn on synchronization of all their feeds and then leave it running whenever their computer is on. Get ready to start paying some serious hosting bills.
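To see why the bills add up, multiply it out. Here is a back-of-the-envelope Python sketch. The subscriber count and feed size below are made-up illustrative numbers, not measurements. The worst case assumes every poll downloads the whole feed and every subscriber's computer is on all day.

```python
def daily_feed_traffic(subscribers: int, feed_bytes: int, interval_minutes: int = 60) -> int:
    """Worst-case bytes served per day when every subscriber polls on a schedule.

    Assumes each poll downloads the whole feed (no caching) and that every
    subscriber's computer stays on around the clock.
    """
    polls_per_day = (24 * 60) // interval_minutes
    return subscribers * polls_per_day * feed_bytes


# Hypothetical inputs: 1,000 IE7 subscribers to a 50 KB feed.
hourly = daily_feed_traffic(1_000, 50_000)              # 24 polls a day each
quarter_hour = daily_feed_traffic(1_000, 50_000, 15)    # 96 polls a day each
print(f"{hourly / 1e9:.1f} GB/day at 60 minutes")        # 1.2 GB/day at 60 minutes
print(f"{quarter_hour / 1e9:.1f} GB/day at 15 minutes")  # 4.8 GB/day at 15 minutes
```

Traffic scales linearly with subscribers and inversely with the polling interval, so a shorter interval setting multiplies the bill directly.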
Will IE7 kill all the independent aggregation products? The simple answer is yes for any aggregator that just collects feeds and allows you to read posts as they are found in the feed. This is sad, but it also means that aggregator publishers will be forced to innovate at a much greater speed. After all, it's not as if they couldn't see this coming.

John Palfrey explains the issues of RSS copyright

Posted on Tuesday, January 17, 2006 at 7:02 PM

John Palfrey has responded to the questioning of Top10Sources' use of RSS feeds exactly the way a law professor should, by turning it into an opportunity to educate the blogosphere on the finer points of copyright law in relation to RSS and blogs. As the Executive Director of the Berkman Center for Internet and Society, he explores the potential risks to the Internet from overly restrictive limits on the use of RSS feeds by aggregators. The Berkman Center currently holds the copyright for the RSS 2.0 specification, and Palfrey handles this responsibility by explaining the best way for RSS to fulfill its potential. Finally, as a founding partner of the RSS Investors LP venture fund and the founder of Top10Sources, Palfrey protects his investment by skillfully deflecting the criticisms levelled against the new aggregation site. He sure is in the middle of RSS, isn't he? Let's take a look at some of his arguments, since they are a blueprint of where RSS and copyright law intersect.

Palfrey contends that aggregators like Top10Sources are not violating copyright law, but acknowledges that this is still an unresolved issue. What I find interesting is the way he casts the opponents of his view:

"The strong form of the pro-copyright argument runs like this: the creator of the RSS feed retains, automatically, all copyrights in the content in the feed and retains all rights in its republication, use as a derivative work, and so forth. Given that those rights have been retained fully by the creator of the site, the argument goes, it is unlawful for someone -- presumably in a commercial context -- to republish that copyrighted context without license to do so. This is the Web 2.0 variant of the argument that is litigated frequently in the context of web-based content, with plaintiffs like the RIAA and the MPAA (in the p2p context), the publishers (like McGraw-Hill, or Perfect 10) who are suing Google, and the like."
I can't judge the legal argument, but I respect his tactics. I don't think there is a single blogger who wants to be on the same side as the RIAA or MPAA.

He warns his readers of the consequences of the "strong form" of copyright being applied to RSS:
"Is the blogosphere arguing itself right into a trainwreck of the sort that has played out over music and movies? Consider the world that A (prominent) VC envisions, here and here, wherein content is micro-chunked and syndicated. This world cannot emerge if every plausible copyright claim is asserted and litigated."
Palfrey's most valuable recommendation is that bloggers should add a copyright statement to their feeds.
"Creative Commons licenses, as I've argued on this blog, are the way to go -- to embed them into the RSS feeds when they go out, with clear instructions for your intent. If you want people to run your feed in private aggregators, but not in public aggregators that are for-profit, to re-offer your content just as you've offered it, and to attibute authorship to you, why not add to your feed a BY-NC-SA license?"
I agree. When examining feeds for inclusion in my aggregator, I was surprised to find that none of them contained a copyright notice. My own feed did have one, but I've now updated it to match my site's Creative Commons license, which spells out exactly what a republisher is permitted to do.
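For an RSS 2.0 feed, the specification's optional copyright element on the channel is a natural place for such a statement. Here is a minimal Python sketch that adds one when it is missing. The function name and notice text are hypothetical, and a feed generator would normally write the element directly rather than patching the XML afterwards.

```python
import xml.etree.ElementTree as ET


def ensure_copyright(rss_text: str, notice: str) -> str:
    """Add an RSS 2.0 <copyright> element to the channel if it has none."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    if channel.find("copyright") is None:
        element = ET.Element("copyright")
        element.text = notice
        children = list(channel)
        items = channel.findall("item")
        # Place it with the other channel metadata, ahead of the first item.
        position = children.index(items[0]) if items else len(children)
        channel.insert(position, element)
    return ET.tostring(root, encoding="unicode")


feed = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title></item>
</channel></rss>"""
print(ensure_copyright(feed, "Licensed under Creative Commons BY-NC-SA."))
```

The function leaves an existing notice alone, so running it twice does not produce duplicate elements.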

How does Top10Sources carry out Palfrey's less restrictive view of RSS copyright?
"As the editor compiles the site, the editor sends out an e-mail to the person who appears to be responsible for the site, or, sometimes, posts a comment to say that the site has been chosen. The site renders a list of those sites offering the feeds as directlinks to the page. The site also subscribes to those feeds and renders them all together on a single page."
So the site has adopted an opt-out model for aggregation. Top10Sources notifies the feed owner, and the owner has the responsibility of requesting that a feed be removed. As a practical matter, this is the only way to run an aggregator. As I've mentioned in other posts, my attempts to gain permission from feed owners in advance of launching my RubyRiver aggregator were met with an almost complete lack of response. RSS was built to promote syndication, and an aggregator is a valuable part of that model. Requiring an opt-in model would limit the potential of RSS, and stifle an important avenue for Internet communication. As Palfrey says, "fundamentally, RSS is ads" for the blog and aggregators are a vital channel for these ads.

One question left unanswered by Palfrey's response is the amount of a feed that should be republished, especially in light of the site's opt-out model. He admits that this is an evolving area:
"I expect to take up this issue again with the management team once again. I don't think there's anything being done wrong from the perspective of the law. But we should take up for discussion some of the ethical issues that Mike Rundle and Om Malik raise and suggestions that Adam Green makes about how much of a given feed that the site republishes -- maybe a truncated version of the feeds is the right thing to render."
This debate over aggregation will certainly continue, but for now I find it fascinating to watch Palfrey navigate the current controversy. From a PR perspective I give him an A. I attended his Harvard Extension School class on cyberlaw a few years ago (which probably accounts for the academic tone I find myself adopting here), and frankly, he is a lot more interesting now that he has to apply his legal theories to a company in which he holds an important stake. I wish all Harvard profs had this real world opportunity. I hope people like Om Malik continue to hold his feet to the fire. The blogosphere will benefit from his involvement.
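Palfrey's suggestion that "maybe a truncated version of the feeds is the right thing to render" is simple to implement. Here is a minimal Python sketch. It assumes the item's description holds HTML and that a fixed word count (50 here, an arbitrary choice) is an acceptable cut-off. excerpt is a hypothetical name, not something Top10Sources is known to use.

```python
import html
import re


def excerpt(description_html: str, max_words: int = 50) -> str:
    """Turn a full post into a short plain-text excerpt.

    Strips tags with a simple regex (fine for a sketch, not for hostile
    HTML), unescapes entities, and keeps at most max_words words.
    """
    text = re.sub(r"<[^>]+>", " ", description_html)
    words = html.unescape(text).split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


print(excerpt("<p>Hello <b>world</b> &amp; friends</p>", max_words=2))  # Hello world ...
```

Cutting on words rather than characters avoids splitting a word in half, and stripping tags first avoids leaving an unclosed element in the excerpt.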

Problem with RSS categories

Posted on Sunday, January 15, 2006 at 8:04 AM

The first version of my RubyRiver aggregator displays all the items in a feed without any filtering, which allows items unrelated to Ruby to appear. This morning I decided to explore filtering based on the category tag. The RSS 2.0 specification states that "You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain." So there should have been no problem. All I had to do was extract the category tags and select items that contained "Ruby" in one of them.

Take the blog Eric's Ponderings, which contains some useful Ruby programming posts but also switches to football whenever the University of Texas Longhorns win a big game. Here is a portion of one of his RSS feed items about Ruby and Java:

<category>Software Development</category>
<category>Ruby</category>
<category>Java</category>
<category>Ruby On Rails</category>
<category>java</category>
<category>ruby</category>
<category>build</category>
<category>maven</category>
I don't know why he includes Ruby twice, but that shouldn't get in the way of my code, as long as there is at least one category I can match.
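The filter itself only needs one matching category per item. Here is a minimal Python sketch, assuming RSS 2.0 items with plain category elements. The function name and sample feed are hypothetical. Comparing case-insensitively also makes the duplicate "Ruby"/"ruby" pair above harmless.

```python
import xml.etree.ElementTree as ET


def items_in_category(rss_text: str, topic: str) -> list[str]:
    """Return titles of RSS items that carry a <category> equal to topic.

    The comparison ignores case and surrounding spaces, so "Ruby" and
    "ruby" both match; "Ruby On Rails" alone would not.
    """
    wanted = topic.strip().casefold()
    titles = []
    for item in ET.fromstring(rss_text).iter("item"):
        categories = {(c.text or "").strip().casefold() for c in item.findall("category")}
        if wanted in categories:
            titles.append(item.findtext("title", default=""))
    return titles


feed = """<rss version="2.0"><channel><title>Sample</title>
  <item><title>Ruby and Java builds</title>
    <category>Software Development</category>
    <category>Ruby</category><category>java</category></item>
  <item><title>Longhorns win</title><category>Football</category></item>
</channel></rss>"""
print(items_in_category(feed, "ruby"))  # ['Ruby and Java builds']
```

Exact matching is a deliberate choice: substring matching would also catch "Ruby On Rails", but it would equally catch unrelated words that happen to contain the topic.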

Checking the rest of my feeds, however, revealed a problem with the way some blogs use categories. For example, the O'Reilly Ruby blog is entirely about Ruby, so the authors don't feel the need to include Ruby in the categories; that is assumed from the context. Instead the categories are terms like Opinion, News, and Articles. This makes sense within the blog, but doesn't help when the feed is aggregated with many others.

I can solve the problem in my own code by marking each feed in my .opml feed list as either Ruby-specific or multi-topic. This will allow my code to use the category tag only when parsing multi-topic feeds. Unfortunately, this requires extra effort when adding new feeds, which in turn means that any user of my code will have to understand this issue as well. General-purpose aggregators aren't likely to use this solution, which means that filtering for Ruby categories in a generic aggregator will filter out some of the blog posts that the user would want.
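One way to record that distinction is an extra attribute on each outline in the feed list. Here is a minimal Python sketch. The scope="single" attribute is a made-up convention for this example, not part of OPML, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET


def single_topic_feeds(opml_text: str) -> set[str]:
    """Return xmlUrls of outlines marked with the made-up attribute scope="single"."""
    root = ET.fromstring(opml_text)
    return {
        node.get("xmlUrl")
        for node in root.iter("outline")
        if node.get("xmlUrl") and node.get("scope") == "single"
    }


def keep_item(feed_url: str, categories: list[str], single: set[str], topic: str = "ruby") -> bool:
    """Keep everything from single-topic feeds; filter the rest by category."""
    if feed_url in single:
        return True
    return topic.casefold() in {c.strip().casefold() for c in categories}


feed_list = """<opml version="1.1"><body>
  <outline text="O'Reilly Ruby" xmlUrl="http://example.com/oreilly-ruby.xml" scope="single"/>
  <outline text="Eric's Ponderings" xmlUrl="http://example.com/eric.xml"/>
</body></opml>"""
single = single_topic_feeds(feed_list)
print(keep_item("http://example.com/oreilly-ruby.xml", ["Opinion"], single))  # True
print(keep_item("http://example.com/eric.xml", ["Football"], single))  # False
```

The extra effort mentioned above is exactly that one attribute: forget it on a single-topic feed, and its uncategorized posts get filtered out.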

This type of inconsistency in applying tags to blog posts illustrates the hurdles that still must be overcome before RSS can fulfill its promise. The specification is willing, but the patterns of usage are weak.

Ruby RSS aggregator is running

Posted on Wednesday, January 11, 2006 at 8:40 AM

My first RSS aggregator written in Ruby is now up at RubyRiver.org. It's still primitive, but it seems to be working OK. My goal in building this is to create a tutorial for new Ruby programmers. I've been publishing the code on my Ruby blog as I've been writing it. The complete code will be available for free download in a day or two, and the tutorial will be posted on RubyRiver as it is developed.

Running multiple blogs has given me an interesting insight into one of the problems of having people read a site through RSS. I've spoken to a number of people who read this blog, and when I mention my interest in Ruby, they usually say "You should start a Ruby blog." When I say that I already write one, they ask how they can find it. That's when I realize that they read the RSS feed and have never seen the link to the Ruby site on my navbar. So how do I solve this? Should I plug everything I do in every blog post? That was why navbars got added to websites.

Is a planet a splog?

Posted on Friday, December 16, 2005 at 9:43 AM

I continue to be puzzled by the ethics surrounding RSS aggregators. I have been planning on building an RSS 'river of news' aggregator for Ruby, and my research has brought up the aggregators called planets, which aggregate full feeds from a large number of blogs. I've looked at many of these planet sites, and none of them have a description of the relationship between the aggregator and the aggregatees. Did they all choose to be included or did the aggregator simply add them to a list? Are these planets really splogs? They don't appear to be, because they aren't plastered with ads.

Beware the dark side of the force

Posted on Monday, December 5, 2005 at 6:47 AM

Russell Beattie reveals that he is being tempted by the dark side of the force. I understand his dilemma. I'm building a personal, web-based aggregator for my Really Simple Blog project, which will allow me to add pages full of posts on specific subjects. So, am I a splogger? Aren't I just providing a convenient service for my readers? Where is the line? Am I still clean if I only allow partial posts? Aren't blog search engines just giant splogs?