Darwinian Web
Adam Green's thoughts on the evolution of the Internet

Posts tagged as: berkman

OPML Camp will be held at Harvard Law School on May 20-21

Posted on Tuesday, March 14, 2006 at 2:08 PM (permalink)

The original plan was to hold OPML Camp at my house in late April, which would have limited attendance to a maximum of 30 people. After two weeks we already had 17 people signed up, so it looked like we would fill up well before the event. The Berkman Center for Internet & Society has now agreed to be a partner in organizing OPML Camp, and is providing space at Harvard Law School in Cambridge, Mass. We'll have use of a lecture hall that can hold 75-100 people, and some smaller classrooms for breakout sessions. Due to scheduling requirements at the Law School, we have to move the date to the weekend of May 20-21, but that will give us time to fill the extra space. The weather should be a lot better also. Many thanks to John Palfrey, Berkman's Executive Director, and Colin Maclay and Catherine Bracy of the Berkman staff for making this possible.

You can find complete details and a sign up page on the OPML Camp blog. OPML Camp is inspired by events like Bar Camp and Mashup Camp, and is completely free and open to anyone interested in RSS and OPML.

For those of you who were looking forward to the camp sessions scheduled for my hot tub, don't worry. We'll hold a barbeque the first night of the camp at my house, and the pool will also be open and heated by then.

John Palfrey explains the issues of RSS copyright

Posted on Tuesday, January 17, 2006 at 7:02 PM (permalink)

John Palfrey has responded to the questioning of Top10Sources' use of RSS feeds exactly the way a law professor should, by turning it into an opportunity to educate the blogosphere on the finer points of copyright law in relation to RSS and blogs. As the Executive Director of the Berkman Center for Internet and Society, he explores the potential risks to the Internet from overly restrictive limits on the use of RSS feeds by aggregators. The Berkman Center currently holds the copyright for the RSS 2.0 specification, and Palfrey handles this responsibility by explaining the best way for RSS to fulfill its potential. Finally, as a founding partner of the RSS Investors LP venture fund and the founder of Top10Sources, Palfrey protects his investment by skillfully deflecting the criticisms levelled against the new aggregation site. He sure is in the middle of RSS, isn't he? Let's take a look at some of his arguments, since they are a blueprint of where RSS and copyright law intersect.

Palfrey contends that aggregators like Top10Sources are not violating copyright law, but acknowledges that this is still an unresolved issue. What I find interesting is the way he casts the opponents of his view:

"The strong form of the pro-copyright argument runs like this: the creator of the RSS feed retains, automatically, all copyrights in the content in the feed and retains all rights in its republication, use as a derivative work, and so forth. Given that those rights have been retained fully by the creator of the site, the argument goes, it is unlawful for someone -- presumably in a commercial context -- to republish that copyrighted context without license to do so. This is the Web 2.0 variant of the argument that is litigated frequently in the context of web-based content, with plaintiffs like the RIAA and the MPAA (in the p2p context), the publishers (like McGraw-Hill, or Perfect 10) who are suing Google, and the like."
I can't judge the legal argument, but I respect his tactics. I don't think there is a single blogger who wants to be on the same side as the RIAA or MPAA.

He warns his readers of the consequences of the "strong form" of copyright being applied to RSS:
"Is the blogosphere arguing itself right into a trainwreck of the sort that has played out over music and movies? Consider the world that A (prominent) VC envisions, here and here, wherein content is micro-chunked and syndicated. This world cannot emerge if every plausible copyright claim is asserted and litigated."
Palfrey's most valuable recommendation is that bloggers should add a copyright statement to their feeds.
"Creative Commons licenses, as I've argued on this blog, are the way to go -- to embed them into the RSS feeds when they go out, with clear instructions for your intent. If you want people to run your feed in private aggregators, but not in public aggregators that are for-profit, to re-offer your content just as you've offered it, and to attibute authorship to you, why not add to your feed a BY-NC-SA license?"
I agree. When examining feeds for inclusion in my aggregator, I was surprised to find that none of them contained a copyright notice. My feed had one, but I've now updated it to match my site's Creative Commons license, which spells out exactly what a republisher is permitted to do.
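For anyone who wants to try this, here is a minimal sketch of a feed that carries both a copyright notice and a Creative Commons license. The `copyright` element is a standard optional channel element in RSS 2.0. The namespace URI for the creativeCommons module and the specific BY-NC-SA license URL are my assumptions, so check them against the module's documentation and your own license before using them. The function name `build_feed` is just for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the creativeCommons RSS module (assumed; verify against the module's spec).
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
# Assumed license choice: a BY-NC-SA license like the one Palfrey suggests.
LICENSE_URL = "http://creativecommons.org/licenses/by-nc-sa/2.5/"


def build_feed(title, link, description, owner):
    """Return RSS 2.0 XML text with a copyright notice and a CC license."""
    ET.register_namespace("creativeCommons", CC_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # Human-readable notice in the standard RSS 2.0 channel element.
    ET.SubElement(channel, "copyright").text = f"Copyright {owner}. Some rights reserved."
    # Machine-readable license pointer from the namespaced module.
    ET.SubElement(channel, f"{{{CC_NS}}}license").text = LICENSE_URL
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed("Example Blog", "http://example.com/", "A demo feed", "Jane Doe"))
```

An aggregator that wants to respect the owner's intent could look for the license element before deciding whether to republish an item in full.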

How does Top10Sources carry out Palfrey's less restrictive view of RSS copyright?
"As the editor compiles the site, the editor sends out an e-mail to the person who appears to be responsible for the site, or, sometimes, posts a comment to say that the site has been chosen. The site renders a list of those sites offering the feeds as direct links to the page. The site also subscribes to those feeds and renders them all together on a single page."
So the site has adopted an opt-out model for aggregation. Top10Sources notifies the feed owner, and the owner has the responsibility of requesting that a feed be removed. As a practical matter, this is the only way to run an aggregator. As I've mentioned in other posts, my attempts to gain permission from feed owners in advance of launching my RubyRiver aggregator were met with an almost complete lack of response. RSS was built to promote syndication, and an aggregator is a valuable part of that model. Requiring an opt-in model would limit the potential of RSS, and stifle an important avenue for Internet communication. As Palfrey says, "fundamentally, RSS is ads" for the blog, and aggregators are a vital channel for these ads.

One question left unanswered by Palfrey's response is the amount of a feed that should be republished, especially in light of the site's opt-out model. He admits that this is an evolving area:
"I expect to take up this issue again with the management team once again. I don't think there's anything being done wrong from the perspective of the law. But we should take up for discussion some of the ethical issues that Mike Rundle and Om Malik raise and suggestions that Adam Green makes about how much of a given feed that the site republishes -- maybe a truncated version of the feeds is the right thing to render."
This debate over aggregation will certainly continue, but for now I find it fascinating to watch Palfrey navigate the current controversy. From a PR perspective I give him an A. I attended his Harvard Extension School class on cyberlaw a few years ago (which probably accounts for the academic tone I find myself adopting here), and frankly, he is a lot more interesting now that he has to apply his legal theories to a company in which he holds an important stake. I wish all Harvard profs had this real world opportunity. I hope people like Om Malik continue to hold his feet to the fire. The blogosphere will benefit from his involvement.

The fine line between plagiarism and aggregation

Posted on Monday, January 16, 2006 at 5:13 PM (permalink)

As the publisher of a brand new RSS aggregator I'm sensitive to this issue. I fretted publicly over the issues of publishing excerpts versus full feeds. In the end I decided to err on the side of caution and only publish the first few sentences of each RSS item, and to strip out all HTML tags to make the post less functional. My reasoning was that this would force the users to visit the original site if they were interested in a post. I was also afraid of getting caught up in the current blog/splog controversy. Om Malik has been on the warpath about this issue since he discovered sites that were blatantly copying his feed and claiming it as their own. Now he has set his sights on the new aggregator TopTenSources. In the interests of full disclosure, I should say that I know some of the people involved in this site, including one of the investors, John Palfrey. It is precisely because I know them that I find it hard to believe that they are knowingly engaging in anything disreputable, let alone illegal. Palfrey is about as upright as they come, and along with being the director of the Berkman Center, he is a Harvard Law professor specializing in cyber law. So I'm going to assume that TopTenSources is fully compliant with the law. What remains to be determined is if they have stepped over the bounds of accepted aggregator behavior.
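The cautious approach I described at the start of this post (keep the first few sentences, strip every tag) can be sketched roughly as follows. This is an illustration rather than the code my aggregator runs: the function name `excerpt`, the three-sentence default, and the naive sentence splitter are all assumptions, and it makes no special case for script or style content.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and discards every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def excerpt(item_html, max_sentences=3):
    """Strip markup from an RSS item body and keep its first few sentences."""
    parser = _TextOnly()
    parser.feed(item_html)
    parser.close()
    # Join the text nodes and collapse runs of whitespace into single spaces.
    text = " ".join("".join(parser.parts).split())
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


if __name__ == "__main__":
    body = '<p>One. <a href="http://example.com/">Two</a> is here! Three? Four.</p>'
    print(excerpt(body, max_sentences=2))
```

Because links and images are dropped along with the tags, the excerpt is only a teaser, and the reader has to follow the headline link to get the full post.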

The first thing to look at is whether they are republishing the RSS feeds as their own content. That was the thing that set Om off in the first place, and that drove others, like John Battelle, into his camp. TopTenSources clearly states their role as an aggregator on the home page:

Top 10 Sources is a directory of sites that bring you the freshest, most relevant content on the Web. We know it's impossible for anyone to keep track of the 20 million+ online sources of information. So our editors search Web 2.0 -- blogs, podcasts, wikis, news sites, and every kind of syndicated sources online -- by hand.
The pages within the site don't include this statement, but each blog's feed items start with the name of the original blog. This blog name is not a link back to the owner's site, which is something I would change. Each item's headline is a link back to the original post. Overall, I'd say there is no attempt to blur the true owner of the feed items on the part of TopTenSources.

The more troublesome issue is TopTenSources' use of complete feeds, including images and links. In some cases the items republished are quite long. Again, I assume the legality of this use, but it does appear to step over the line of common behavior. I think they would be much safer only reprinting the first paragraph, especially in the current climate.

I hope this doesn't turn into another Tech Memeorandum firestorm, because that will make it harder for any of us who want to work in the area of online aggregation.

David Berlind at Harvard

Posted on Wednesday, January 11, 2006 at 7:39 AM (permalink)

David Berlind's luncheon talk at Harvard's Berkman Center was interesting, but we have different approaches to predicting the future. David takes the traditional journalist's perspective of discovering who is doing what to effect change, while I try to figure out why people will adopt that change and whether they have ever made a similar decision in the past. For example, David recounted the various software, standards making, and political activities taking place in an effort to replace Microsoft Office with an open standard. I, on the other hand, tried to find any sign of people looking for an alternative to Office. Of course Word is buggy and cumbersome, but I can't remember any time in the last few years when a real end-user has told me that they wish they could switch from Word to something else. Are they too dumb to realize they would be better off with a replacement for Word? No, I think they are too smart to waste their time looking for alternatives when they just want to get their work done. They aren't lazy, they're busy. If you want to beat Word's monopoly, you have to give people a powerful reason to switch; you can't just point out a better alternative. In marketing terms you can't just present a better mouse trap; you have to convince people that they have mice and that not getting rid of them will cause serious consequences. This isn't meant as a criticism of David. What he does serves a valuable purpose, and he certainly knows his stuff when it comes to the issue of alternatives to Microsoft. He also needs to incorporate a more market driven and historical perspective into his analysis.

I made my pitch to get David to use the term copy protection instead of DRM. He was sympathetic to the issue, but not too interested in making the change. He understands the issues of DRM so much better than I do, including the legal aspects, that he sees the problem of DRM as much greater than just not being able to move content to a new computer. He is right, but you aren't going to get people upset enough to demand change by providing more details. You need a simple hook. Maybe copy protection isn't scary enough. I'm not great at coming up with really compelling marketing terms, but I do know how to recognize them. When I find a way to describe DRM that makes people's eyes go wide, I'll know I've hit the target. If you have any suggestions for a really repulsive name for DRM, please let me know. I'll do my best to promote it.

Podcast of Joshua Schachter talk

Posted on Saturday, October 29, 2005 at 4:17 PM (permalink)

The Berkman Center has made a podcast of the luncheon talk by Joshua Schachter available.

Joshua Schachter, part 2

Posted on Tuesday, October 25, 2005 at 10:05 PM (permalink)

I went back to Harvard tonight for Joshua's second session at the Berkman Center. What I found most interesting was his philosophy towards users. This shows up in his handling of both user tagging and spam attacks. He said that he is constantly getting demands from the more control-oriented users to restrict the use of tags, either by establishing stylistic rules, such as capitalization and the use of special characters, or by constraining the tags that users can create. He refuses to do this, and always strives to give the users as much freedom as possible, even if it breeds confusion and inconsistencies. As a control freak I find this troubling, but I think he is right in this case. Tagging is so new that adding constraints now will cut off the more interesting behaviors before they have a chance to emerge.

Another example of this philosophy is the way he handles spam attacks. He has found that the site gets spammed every couple of days, with such common tactics as entering thousands of copies of the same URL, or entering thousands of URLs with the same tag. He seems to have good back-end systems to monitor this, but instead of returning an error message, as most programmers (including me) would do, he just has the system ignore these entries and hide them from the public. He says that eventually the spammer gets the idea and just gives up. He says that if he were to present an active defense, the spammers would just use this to find new methods of attack. So instead of trying to hit back, he just wraps the abuser in cotton and waits until they get bored. Very bright.
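To make the idea concrete, here is a rough sketch of the "accept everything, quietly hide the junk" approach. This is my own guess at how it could be done, not a description of the Del.icio.us back end: the class name `BookmarkStore`, the per-user counters, and the `max_repeats` threshold are all hypothetical.

```python
from collections import Counter


class BookmarkStore:
    """Accepts every submission but hides suspected spam from public views."""

    def __init__(self, max_repeats=1000):
        self.max_repeats = max_repeats  # hypothetical threshold per user
        self.entries = []               # dicts with user, url, tag, hidden
        self._seen = Counter()          # per-user counts of each URL and each tag

    def submit(self, user, url, tag):
        """Store the entry and always report success, even when it is hidden."""
        self._seen[(user, "url", url)] += 1
        self._seen[(user, "tag", tag)] += 1
        hidden = (
            self._seen[(user, "url", url)] > self.max_repeats
            or self._seen[(user, "tag", tag)] > self.max_repeats
        )
        self.entries.append({"user": user, "url": url, "tag": tag, "hidden": hidden})
        # Same response either way, so the submitter gets no signal to probe.
        return "ok"

    def public_entries(self):
        """Entries that everyone else is allowed to see."""
        return [entry for entry in self.entries if not entry["hidden"]]


if __name__ == "__main__":
    store = BookmarkStore(max_repeats=2)
    for i in range(5):
        store.submit("spammer", "http://spam.example/", f"tag{i}")
    print(len(store.entries), len(store.public_entries()))
```

The point of the design is in `submit`: it never returns an error, so a spammer cannot use the response to work out where the limits are.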

Joshua Schachter of Del.icio.us visits Harvard

Posted on Tuesday, October 25, 2005 at 3:31 PM (permalink)

I attended a luncheon discussion today with Joshua Schachter, the founder of Del.icio.us, sponsored by Harvard's Berkman Center. He said that Del.icio.us was the first public tagging site, a technology that is now attracting lots of interest from the major search engines. Basically, it allows users to store their browser bookmarks online, and to assign text tags to them. Anyone can search for URLs using plain text or the assigned tags, or view all the bookmarks created by any user. You can also subscribe to RSS feeds based on a combination of users and tags. The Web 2.0 term for this type of socially created category scheme is a folksonomy. I have only used the system for a few searches and haven't registered to create my own tags yet, so I don't have a strong impression of its usefulness as a search engine, but the two dozen or so attendees at today's session were pretty rabid.
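For readers who haven't seen a tagging system, the data model behind the description above can be sketched in a few lines. This is only an illustration under my own assumptions (the class name `TagIndex`, in-memory dictionaries, exact-match tags with no normalization), and says nothing about how Del.icio.us actually stores its data.

```python
from collections import defaultdict


class TagIndex:
    """Toy model of social bookmarking: users attach free-form tags to URLs."""

    def __init__(self):
        self._by_tag = defaultdict(set)   # tag -> {(user, url)}
        self._by_user = defaultdict(set)  # user -> {url}

    def add(self, user, url, tags):
        """Record that a user bookmarked a URL with the given tags."""
        self._by_user[user].add(url)
        for tag in tags:
            # Tags are stored exactly as typed; no rules are imposed on them.
            self._by_tag[tag].add((user, url))

    def urls_for(self, tags=(), user=None):
        """URLs carrying all given tags, optionally limited to one user."""
        if tags:
            pair_sets = [self._by_tag.get(tag, set()) for tag in tags]
            pairs = set.intersection(*pair_sets)
        else:
            pairs = {(u, url) for u, urls in self._by_user.items() for url in urls}
        if user is not None:
            pairs = {pair for pair in pairs if pair[0] == user}
        return sorted({url for _, url in pairs})


if __name__ == "__main__":
    index = TagIndex()
    index.add("alice", "http://a.example/", ["rss", "opml"])
    index.add("bob", "http://b.example/", ["rss"])
    print(index.urls_for(["rss"]))
    print(index.urls_for(["rss"], user="bob"))
```

A feed for a combination of user and tags would simply be the result of `urls_for` rendered as RSS items.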

A blogpulse search shows relatively high interest in both tagging and Del.icio.us, but surprisingly folksonomy doesn't register very high.



Here are some of my notes from the meeting:

  • Joshua is a pretty low key, humble guy who built the system without any real expectation of how well it would catch on. He seems to be following his instincts and user requests, rather than trying to fulfill some grand plan.

  • I asked him about scaling both the technology and his role in the company, and he is taking on problems as they arise. The system is built with mod_perl and a homegrown set of database tools. I know from my work with Slashdot that this can be scaled pretty far, but the hardware gets expensive. He has 8 employees, 6 of whom are programmers. Right now he is chief coder, architect, marketing head and CEO, which can't work for long. He said that he had just hired a President and a business development guy, but I wasn't sure if they counted within the current 8.

  • 50% of his traffic is from robots. He doesn't know the cause of this, but I bet his many upcoming competitors are trying to suck his tagging database out of the system.

  • Speaking of competitors, he applied for jobs with Amazon, Google, Microsoft and Yahoo, and got turned down by all of them. Now they are all adding tagging to their systems, and Amazon is an investor. Del.icio.us is also being courted by companies like Comcast and Nokia.

  • One of the most insightful things he said was that he refuses to post rankings of top sites, because that will just encourage people to find a way to spam the ranking system.
David Weinberger, a Berkman fellow, posted an interview with Joshua this afternoon that has more info. David is holding another session with Joshua this evening, which I plan on attending, so I'll post again about this tonight.