Darwinian Web
Adam Green's thoughts on the evolution of the Internet


The fine line between plagiarism and aggregation

Posted on Monday, January 16, 2006 at 5:13 PM

As the publisher of a brand new RSS aggregator, I'm sensitive to this issue. I fretted publicly over whether to publish excerpts or full feeds. In the end I decided to err on the side of caution and only publish the first few sentences of each RSS item, and to strip out all HTML tags to make the post less functional. My reasoning was that this would force users to visit the original site if they were interested in a post. I was also afraid of getting caught up in the current blog/splog controversy.

Om Malik has been on the warpath about this issue since he discovered sites that were blatantly copying his feed and claiming it as their own. Now he has set his sights on the new aggregator TopTenSources. In the interests of full disclosure, I should say that I know some of the people involved in this site, including one of the investors, John Palfrey. It is precisely because I know them that I find it hard to believe that they are knowingly engaging in anything disreputable, let alone illegal. Palfrey is about as upright as they come, and along with being the director of the Berkman Center, he is a Harvard Law professor specializing in cyber law. So I'm going to assume that TopTenSources is fully compliant with the law. What remains to be determined is whether they have stepped over the bounds of accepted aggregator behavior.

The first thing to look at is whether they are republishing the RSS feeds as their own content. That was the thing that set Om off in the first place, and that drove others, like John Battelle, into his camp. TopTenSources clearly states their role as an aggregator on the home page:

"Top 10 Sources is a directory of sites that bring you the freshest, most relevant content on the Web. We know it's impossible for anyone to keep track of the 20 million+ online sources of information. So our editors search Web 2.0 -- blogs, podcasts, wikis, news sites, and every kind of syndicated sources online -- by hand."

The pages within the site don't include this statement, but each blog's feed items start with the name of the original blog. This blog name is not a link back to the owner's site, which is something I would change. Each item's headline is a link back to the original post. Overall, I'd say there is no attempt on the part of TopTenSources to blur the true owner of the feed items.

The more troublesome issue is TopTenSources' use of complete feeds, including images and links. In some cases the republished items are quite long. Again, I assume this use is legal, but it does appear to step over the line of common behavior. I think they would be much safer reprinting only the first paragraph, especially in the current climate.
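
For anyone curious what the cautious, excerpt-only approach looks like in practice, here is a minimal Python sketch of stripping the HTML from a feed item and keeping only its first few sentences. It is not the code behind any aggregator mentioned in this post. The names `strip_tags` and `make_excerpt`, the three-sentence default, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def make_excerpt(html, max_sentences=3):
    """Keep only the first few sentences of an item, as plain text."""
    text = strip_tags(html)
    # Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


item = ('<p>First <b>point</b>. Second point! '
        '<a href="http://example.com">Third</a> one? Fourth.</p>')
print(make_excerpt(item, max_sentences=2))
```

The splitter will stumble on abbreviations such as "Dr." or "e.g.", so a real aggregator would want something sturdier. The point is only that the excerpt carries no images, links, or formatting, which leaves the reader a reason to click through to the original post.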

I hope this doesn't turn into another Tech Memeorandum firestorm, because that would make things harder for any of us who want to work in the area of online aggregation.