Adam Green's thoughts on the evolution of the Internet
Posts tagged as: rss
Feeling the urge to code
Posted on Monday, April 17, 2006
at 4:20 PM (permalink)
I haven't been blogging much lately, because I've been spending my time reading Python books. That's one thing that separates Python from Ruby that people don't seem to mention when comparing the languages. There are tons of Python books, while there is basically just one good Ruby book. I'm going to spend a day or two more of just reading, and then I'll start doing some basic RSS coding to test out the Python libraries. Once I feel comfortable with the language, I'll get back to exploring the various forms of microcontent and their potential relationship to OPML. Don't expect much blogging here for a week or so, but I will report on my initial RSS library experiments on my code blog.
Posted on Saturday, April 15, 2006
at 8:18 PM (permalink)
I spent the day reading Python books, and now I'm starting to look at the many libraries that are available for reading and writing RSS in Python. There are certainly more libraries available than in Ruby, and that makes Python seem like a place I should spend some time, but all of the libraries seem to have the same perspective, which is to treat RSS as a specific variety of XML. What if I don't care about DOM and SAX and stepping through a tree of XML?
I keep thinking about what happened when dBASE first appeared in 1981. At that time the standard way to do database programming was to use CBasic or Microsoft Basic to construct a record in memory using string variables. It was up to the programmer to do what was called record blocking, where the strings were padded out to fixed lengths and then written out to a text file. If you wanted an index as well, you had to find a separate library and handle the insertion and maintenance of nodes in a B-tree yourself. Then dBASE came along, and all you had to do was say APPEND or EDIT in a program. dBASE wasn't about database details, it was about application building. If you wanted an index, you just said "INDEX ON <field>," and the index was created and automatically maintained. Instead of being a database programmer, you could be an inventory programmer, or a doctor's office programmer. In fact, doctors could become programmers and create their own applications.
That is what I want with RSS and OPML. I don't care whether a feed is RSS or RDF, or what the specific XML tags are. I want to be able to say "Give me the posts from this list of feeds that are less than 1 week old. Now combine them all into a new feed." That should be 3 or 4 lines of code. There has to be a reason that RSS programming is still about creating generic aggregators instead of RSS applications. I think it is a combination of tools and coders that treat RSS as streams of XML data, rather than as application-specific data that should be delivered through vertical applications that have nothing to do with reading blogs.
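To make the "3 or 4 lines of code" idea concrete, here is a rough sketch of what such a helper layer might look like in Python. The function names recent_items and combine are made up for illustration and are not from any existing library. The sketch assumes plain RSS 2.0 feeds that have already been downloaded into strings and that carry a pubDate on each item. It does not handle RDF or Atom, which a real library would have to.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def recent_items(feed_xml_list, max_age, now=None):
    """Return (date, title, link) tuples newer than max_age, newest first."""
    now = now or datetime.now(timezone.utc)
    items = []
    for feed_xml in feed_xml_list:
        for item in ET.fromstring(feed_xml).iter("item"):
            published = item.findtext("pubDate")
            if not published:
                continue  # no date, so we cannot tell how old the post is
            when = parsedate_to_datetime(published)
            if when.tzinfo is None:
                # Assumption: treat dates without a time zone as UTC.
                when = when.replace(tzinfo=timezone.utc)
            if now - when <= max_age:
                items.append(
                    (when, item.findtext("title", ""), item.findtext("link", ""))
                )
    return sorted(items, reverse=True)


def combine(items, title):
    """Build a new RSS 2.0 document (as a string) from the given items."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for when, item_title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")
```

With helpers like these, the application code shrinks to something like combine(recent_items(feeds, timedelta(weeks=1)), "This week"), which is the level I would like to be working at.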
Exploring the tacit knowledge between RSS and the Semantic Web
Posted on Friday, April 14, 2006
at 8:48 AM (permalink)
I started reading about the Semantic Web again last week, and my immediate reaction was the same as the first time I tried a few months ago. This is such a perfectly specified, intellectually rigorous collection of standards and practices that it seems almost impossible to find an entry point. If it is so hard to get started, then how does anyone work with it? The answer is that the people who understand it now are the same people who helped to build it. Each of the many sub-standards and protocols was introduced in reaction to a specific problem discovered during the creation of some other portion of this edifice. An analogy is trying to understand how an immense cathedral could possibly have been built by walking around the finished building. Once the scaffolding and the masses of workers are long gone, it seems like every part fits seamlessly into every other, and the thousands of decisions that were made during its construction are erased.
I decided to pull back a step and look at the areas of namespaces in RSS and the many competing standards for structured microcontent on the Web. This is much messier and clearly a work in progress, but once again as with the Semantic Web, the same individuals keep popping up in these many projects. The problem with the social nature of the construction of microformats, structured blogging, RSS, Atom, etc., is the unspoken, or at least underdocumented, aspects of the decision process. Why are there two competing sets of blog microcontent formats? Why are there apparently dozens of overlapping collections of RSS namespaces? The answers are lost in the maze of blog posts and standards announcements made over the last few years. Why isn't everyone involved with this area terminally confused? Because they lived through the process and understand the political, social, and commercial aspects of each of these multi-body collisions.
What we now have is a continuum from the ultra-simplistic, underspecified formats of RSS and OPML to the ultra-rigid, crystalline perfection of the Semantic Web. In between is a rabbit warren of partially completed, interconnected attempts to add more structure and functionality to RSS and HTML.
So what is the solution? I'm not conceited enough to believe that I can unravel the current mess lying between RSS and the Semantic Web, and I'm also not smart enough to try to storm the castle of the Semantic Web by brute intellectual force. What my past history has shown me is that I am capable of helping people build tools and writing documentation that can help bridge this gap. The process I'm going to follow is to start studying and coding with the RSS namespaces and microcontent formats until they gradually make sense, and then try to get tools built by others that will provide a more accessible conceptual model. In other words, I'm going to live there until I grok the neighborhood.
I went through the same process when I moved to Boston. The classic line when trying to explain how to navigate within Boston is "I can't tell you how to get there, but I can take you once and show you." This is a perfect example of tacit knowledge. It is something you and your community know, but which can't be explained in words. It may be an urban legend, but there are many stories of truck drivers paying taxis to lead them through Boston's streets to a specific location. The only way to deal with Boston's streets is to carry a map for the first few weeks until your brain somehow builds the tacit knowledge you need to feel comfortable.
There has to be a better way to do XML programming
Posted on Tuesday, April 11, 2006
at 8:26 AM (permalink)
Two weeks ago I decided to do some programming for analysis of link patterns among bloggers. I had just given up on Ruby out of frustration over its poor XML libraries, so I decided to try out PHP. I haven't used PHP since the late Nineties, but it is a simple enough language, so browsing a few books showed me what I needed to know. I was able to write the code to parse out the links from Tech Memeorandum and then autodiscover the RSS feeds on these pages without much trouble at all. I'm not a great coder, but I can pick things up fast, and can generally force my way through most programming issues. Then I ran into the XML libraries in PHP and came to a dead halt again. I need to read through the RSS feeds of each blog I find on Tech Memeorandum to find the links to other blogs, and that means parsing the XML of these feeds. I've been beating on this problem on and off for the past week and a half, and am about to give up again. Giving up on a programming problem is not something I do lightly. The whole point of being a programmer is never letting the machine beat you. I also have enough confidence to think that if I'm having so many problems, lots of other people are dealing with the same thing.
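For readers who haven't seen feed autodiscovery, the step that did work is roughly the following. This is a minimal sketch in Python rather than my PHP code, and the names FeedLinkFinder and discover_feeds are invented for the example. It assumes the page advertises its feed with a link tag of the form rel="alternate" plus an RSS or Atom MIME type, which is the common convention but not something every blog follows.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed in a <link> tag.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised by <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page's own URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return every feed URL advertised in the given HTML page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

The HTML side is forgiving, because the parser simply skips over tags it doesn't care about. It is the next step, parsing the feeds themselves as XML, where things got painful for me.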
What I've decided to do in response is work with one of my favorite programmers from my Andover.net days to try and build a better language solution for XML processing, with an emphasis on RSS and OPML. A few weeks ago John Casey emailed me after I posted my frustration with Ruby, and asked why I don't just write my own language for this type of work. We've been talking about this ever since, and now I'm ready to go ahead. I'm not capable of writing my own XML parser, at least not one that isn't a horrible hack, but I do know a lot about language design, especially about making programming languages easy to use. John, however, is a great coder, and if he thinks he can write a clean, fast parser, I believe him.
The idea at first will be to create a library of functions that are real smart about RSS and OPML. We're not sure what language this will be working with, but since the library will be written in C, it should be possible to add it to all of the standard Web languages, like Perl, Python, PHP, etc. I'm interested in having the library handle all the standard tasks you would need when working with RSS and OPML, so it should be possible to read multiple feeds and combine them in interesting ways in just a few lines of code. Once this library is built, we can see about possibly extending it into more of a mini-language.
The working title for this library/language is OPML Script, but that name may change as its functionality expands to more general XML tasks. This will be released under an Open Source license of some type, so it will be available for no charge. John and I will share the ownership of the copyright, although there doesn't seem to be any likelihood of ever making money from it. I've said in the past that I didn't want to get directly involved with any startups for at least a year, but this is something that I need for my own work, so I don't have any choice. If I want something that will let me program in an easy manner, I'm going to have to help build it. We don't have any delivery schedule yet, but we hope to have something we can demonstrate by OPML Camp on May 20th.
Posted on Saturday, April 1, 2006
at 11:20 AM (permalink)
A couple of days ago I had breakfast with a former Chief Technology Officer of a REALLY big telco. He had attended the RSS Alley Geek Dinner the night before, and I could tell that even though he was one generation ahead of me, we had a similar take on software and computer technology. He was in Boston to have meetings with various people as a way of learning more about Web 2.0, so I volunteered to get together with him the next day to share my definition from a fellow CTO's perspective. I won't give his real name, because I didn't ask his permission, and this post isn't really about him. It is more about what any CTO needs to consider when trying to run a software development effort in the current Internet environment. For the purpose of this essay, I'll call him Jack.
The funny thing is that Jack's previous company had about 4,000 times more employees and sales than my company, yet we had exactly the same concerns about the new philosophy of development and business surrounding Web products. The insane thing is that Jack's company was valued at only 100 times that of my company when we got acquired, but that was the craziness of February, 2000.
I talked to Jack about four broad areas of change that any CTO needed to think about, but they all came down to one basic issue, a lack of control. It isn't that CTOs have to be control freaks, although they should be. It is a CTO's job to think ahead to what can go wrong, and try to make sure those blocks don't interfere with whatever technology tasks the company needs to accomplish. In a way, a CTO is like the lawyer for a company's technology, always looking for pitfalls well before they are reached. Web 2.0 forces a company to adopt the one thing any good CTO should loathe, dependencies. You have to allow your company to be dependent on other people's code, their voices, their data, and their personal motivations that can't necessarily be overridden by money. Let me go through each of these dependencies:
Open Source. While much of Web 1.0 was built using Linux, Apache, Sendmail, and languages such as Perl and PHP, the philosophy of Open Source didn't become pervasive until the turn of the century. There are now Open Source components throughout a typical Web 2.0 application. For example, collective voting has applications in many areas beyond the traditional uses in sites like Digg.com or Reddit.com, and is now available through the Pligg software, which is Open Source. Other common Open Source components are found in blogging tools and wikis. Companies also have to consider the desire of their programmers to release their work for the company as Open Source. While this has obvious implications for intellectual property, it also creates a labor force of more productive programmers, because they can bring portions of their code with them when they change jobs.
Jack was understandably concerned about quality control when using code that isn't delivered and supported by a commercial vendor, but the benefits of a larger and more open community of users can deliver a more robust solution than one used by a few hundred or even thousands of commercial customers. Building with Open Source code also means faster development cycles, so instead of working for years and trying to deliver a perfectly specified and tested system, a more incremental approach based on existing components allows you to work towards a solution in an evolutionary fashion. The reality is that a project that takes several years to reach "perfection" has so much invested in it that it may be impossible to stop and rebuild when problems are discovered, so they are just built over with ever increasing layers of patches. In the long run, a CTO using Open Source code does have to reject the traditional Not Invented Here syndrome, and accept a greater dependence on other people's code. The trade off in shorter development cycles is worth it in my opinion.
Blogs. Web 2.0 also brings about a shift in the way a company's technology efforts are communicated to the outside world. Instead of thinking in terms of versions that are announced at long intervals through a traditional PR campaign, the use of corporate blogs helps customers stay much closer to the development process. This also means a cluster of independent bloggers interested in an area of technology can form around the companies working in this space. These tech bloggers have replaced the traditional trade press. It means that a CTO is dependent on voices that are not as tightly controlled as in the past, but these bloggers can also act as an important buffer when problems arise by explaining to the wider circle of users that the company is indeed working on solutions.
XML. The most common form of XML currently in use is RSS, but OPML is on the rise, and RDF based standards, such as Atom, are also gaining ground. In the long run, some form of global database resembling the Semantic Web will materialize. The key to all of this use of XML is the availability of a company's data outside the corporate database. While much is made of the emergence of APIs, it is the XML data that is available from these APIs that will cause the real changes in technological architectures. Just as Web 1.0 was built on loosely joined websites connected through HTTP and HTML, Web 2.0 will be built on loosely joined data structures based on data produced by many sources. So instead of a CTO building an application on a tightly controlled proprietary database schema, it will be necessary to plan for dependencies on data over which there is no control.
As a long-time database guy, Jack found that disturbing. I share his concern, but what must be understood is that users will demand this type of cross application sharing of data, because it is their data that is being combined from multiple sources. Sure, there is a greater possibility of failure, and a CTO must plan for it so that the result is a soft failure instead of a hard crash. The one great fallacy that the XML proponents adhere to is the perfectibility of XML data. Their motivation in building a Semantic Web is the goal of a Web that isn't filled with invalid data. I don't think that will ever happen, so a CTO should plan for badly formed XML, as is already the case in the RSS world.
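Here is a small sketch of what I mean by a soft failure, written in Python. The function name item_titles and the logger name are just examples. A strict XML parser will reject a badly formed feed outright, so the code catches that one error, logs it, and carries on with an empty result instead of taking the whole application down.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("feeds")


def item_titles(feed_xml, source="<unknown>"):
    """Return the item titles in a feed, or [] if the XML cannot be parsed."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError as err:
        # Soft failure: record the problem and keep going without this feed.
        log.warning("skipping malformed feed %s: %s", source, err)
        return []
    return [item.findtext("title", "") for item in root.iter("item")]
```

This only degrades gracefully. It makes no attempt to repair the broken feed, which is a much harder problem and one that a more forgiving parser would have to take on.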
Fear of excessive valuation. The traditional way to motivate developers, especially in a start-up situation, has been to offer them stock options. While that is still useful, the arithmetic has changed, because programmers who went through the Dotbomb have a deep fear of hype. A business journalist who was a former Dotcom employee recently told me that she still suffered from post-traumatic stress disorder that prevented her from considering a start-up job. In the Web 1.0 period, there was an expectation of an IPO that would yield valuations in the hundreds of millions of dollars. If a Web 2.0 company gets acquired for $10 - $20 million, that may be great for the founders, but it doesn't do much for a coder with a few thousand options. It is not just that the value of software companies has dropped. There is now deep suspicion of any claims of higher valuations in the future. Without the promise of getting rich, it is harder to persuade developers to put in the 18-20 hour days that helped build Web 1.0. This means that the CTO is more dependent on an employee's personal motivations, such as being able to build code that can earn them greater fame in the Open Source world.
Notice that I haven't mentioned any of the popular themes of Web 2.0, such as social bookmarking and tagging. These have their place, but I'm skeptical that there really will be a mass market for meta-meta-bookmarking sites. I don't think that the real contribution of Web 2.0 will be these specific areas of functionality. I do believe, however, that the tools and techniques I have described here will be used to build the next generation of products and sites, and that these will be what are used by the generation of users who are entering college now, and will be entering the workforce 4 to 5 years from now.
Posted on Monday, March 27, 2006
at 7:15 AM (permalink)
I've been watching Danny Ayers' attempts to have Semantic Web people consider outputting RSS and OPML data or using OPML tools to visualize Semantic Web data. I respect and applaud his efforts, but I wasn't surprised by the universally negative reactions. I know that users of RDF based formats have tremendous disdain for RSS and OPML as being poorly defined, which they admittedly are. What I was shocked by was the tone and terms used in the responses. There is an almost religious sense of RSS and OPML as evil, and a possible source of spiritual contamination. Now Semantic Web people are extremely intelligent, as they'll be quick to admit, so what could have happened to them to cause such an adverse reaction to what is simply a set of formats for text files? It is easy to point to the creator of RSS and OPML as the root of this negative feeling, and he certainly is mentioned often in the responses to Danny's pleas. But that is just scapegoating. I think the visceral emotion exhibited, almost a form of terror, at the idea of having to co-exist with RSS and OPML, has a deeper cause that fits into the religious fervor with which it is voiced.
When Tim Berners-Lee first gave mankind the Web, he made a tragic mistake. He granted us free will to use less than perfect HTML. His tools, and the tools of those to follow him, allowed users to develop sinful habits based on ignorance and sloth. The result was a Web of corrupt data, in which misformed tags abounded. This great fall from grace by the users of the Web prevented it from ever attaining the state of perfection desired by all computer scientists, a completely machine readable database. So the disciples of Berners-Lee, with his blessing, developed XML as a way of wiping the Web clean of the sinful and broken HTML, and replacing it with perfectly specified and implemented data. Now, just as the second coming of the Web is in sight in the form of the Semantic Web (well, it's been in sight for years, but we'll put that aside), here comes a poorly specified corruption of XML, what Danny jokingly calls "quasi-XML", that threatens to again lead mankind astray. Is it any wonder that Semantic Web devotees are reacting as if RSS and OPML are the work of Satan?
Do you find all of this over the top? Good. That is the point of satire. I find the reactions of Semantic Web people over the top as well. It's just data. Converting from one format to another is so trivial that even I can write the code to do it. Surely anyone who can code for RDF could import or export RSS and OPML. Why should anyone do it? As Danny keeps pointing out, there are millions of RSS users. In time many of them are likely to use OPML as a container for RSS. There is no reason why OPML can't be viewed as a bridge between these two sides of the Web. But then if I was in league with the devil, I would say something like that, wouldn't I? After all, my namesake was led astray by the devil once before.
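To back up the claim that conversion is trivial, here is a rough Python sketch that turns the items of an RSS 2.0 feed into an OPML outline. The function name rss_to_opml is invented for the example, and the choice to represent each post as an outline with type "link" and a url attribute is my assumption about a reasonable mapping, not something either format mandates. Going to or from RDF would take more code than this, but the shape of the job is the same.

```python
import xml.etree.ElementTree as ET


def rss_to_opml(feed_xml):
    """Convert an RSS 2.0 document into an OPML outline of its items."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = channel.findtext("title", "")
    body = ET.SubElement(opml, "body")
    for item in channel.iter("item"):
        # Assumed mapping: one outline node per post, pointing at its link.
        ET.SubElement(
            body,
            "outline",
            {
                "text": item.findtext("title", ""),
                "type": "link",
                "url": item.findtext("link", ""),
            },
        )
    return ET.tostring(opml, encoding="unicode")
```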
Posted on Friday, March 24, 2006
at 5:15 AM (permalink)
"OK," the answer comes back, "we can now see what you are doing with OPML, but why bother? OPML is poorly specified, it isn't nearly as complete as an RDF based standard like the Semantic Web, and it's inevitably going to be the center of political firestorms because of who created it." Let me present the basic arguments that persuaded me to spend so much time supporting the format:
My first blog post to get any links pointed out that RSS was helping to explode the Web's architecture. What I meant by that is the growing trend to make all Web content available via RSS. This effectively everts the traditional website, putting the content outside in a machine readable form. If you accept that RSS will be a major architectural component of the future Web, whether or not the users know they are using technology based on RSS, then OPML as a container for multiple RSS text streams deserves attention. OPML allows us to easily create and consume reading lists of multiple RSS feeds, pushing back the limits of infoglut by at least an order of magnitude. If you can read 10-20 blogs on their websites, and 100-200 RSS feeds in an aggregator, then reading lists each containing 100-200 feeds allow us to juggle over 1,000 feeds. Not easily, but it is at least possible. It isn't the final solution, but the fallacy is believing there is an ultimate solution in technology. It is a journey, not a destination. RSS and OPML are just steps. RDF is another step, admittedly a big one. The Semantic Web, if we reach it, will just be a temporary resting place.
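Consuming a reading list is about as simple as OPML gets. The sketch below, in Python, pulls the feed addresses out of an OPML file that has already been loaded into a string. The function name feed_urls is made up for the example. It relies on the convention that each subscribed feed is an outline element carrying an xmlUrl attribute, and it walks nested outlines so that feeds grouped into folders are found too.

```python
import xml.etree.ElementTree as ET


def feed_urls(opml_xml):
    """Return the xmlUrl of every outline in an OPML reading list."""
    root = ET.fromstring(opml_xml)
    # iter() visits nested outlines as well, so folders are handled.
    return [
        outline.get("xmlUrl")
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]
```

Run this over ten reading lists of 100 feeds each and you have the 1,000 feeds mentioned above, ready to be fetched and filtered.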
OPML is not just a container for RSS. It is a general purpose outline structure (hence the name Outline Processor Markup Language), which allows the construction of hierarchies based on any type of XML data. The new OPML 2.0 specification will make that possible through the use of namespaces. This means that any XML formatted data can be incorporated into an OPML outline. There are two big areas of Web data that fit into this model: microcontent and API results. If microcontent, meaning individual molecules of data floating free in the bloodstream of the web, is to become a viable delivery mechanism for information, then a structural equivalent of a protein is necessary to package these molecules in a consumable form. OPML is a first step towards that structure. As for API results, I've already performed a simple experiment demonstrating the use of OPML as a container for this type of data. I plan on doing a lot more work in this area. OPML data combined with a good viewer makes the construction and delivery of mashup data a trivial task.
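As an illustration of the namespace idea, here is a hedged Python sketch that packages some API-style results into an OPML outline, with the extra data carried in a namespaced attribute. Everything specific here is invented for the example: the function name api_results_to_opml, the "book" prefix, the example.com namespace address, and the price attribute. It is not the code from my earlier experiment, only a picture of how extension data could sit alongside the standard text attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the extension data; not part of any real spec.
BOOK_NS = "http://example.com/ns/books"
ET.register_namespace("book", BOOK_NS)


def api_results_to_opml(title, results):
    """Wrap a list of {'name': ..., 'price': ...} dicts in an OPML outline."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    group = ET.SubElement(body, "outline", {"text": title})
    for result in results:
        ET.SubElement(
            group,
            "outline",
            {
                "text": result["name"],
                # Namespaced attribute: ignored by viewers that don't know it.
                "{%s}price" % BOOK_NS: result["price"],
            },
        )
    return ET.tostring(opml, encoding="unicode")
```

A plain OPML viewer would still show the outline of names, while a tool that understands the extra namespace could do something more with the prices.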
If we need a building block for the next generation of the Web, why even stop at OPML? Why not go to something perfect like RDF, asks the RDF crowd. OK, maybe I'm mischaracterizing some of them, maybe they just think it is many orders of magnitude better. I tried reading about RDF and the Semantic Web, and I had to stop because I was afraid I was coming down with narcolepsy. I don't think I'm smart enough to grok the Semantic Web yet, and anyone who reads this blog knows I think I'm pretty smart. I need to work my way up to that level of complexity, and the way I do that is by blogging, and writing code, and helping to design tools with a simpler, more accessible format like OPML. I'm conceited enough to believe that I'm at least as smart as the average computer user, so if I need to work myself up to the Semantic Web one step at a time, they probably do also.
Does this mean that all of the work on OPML will eventually be wasted? Will it all have to be thrown away? First of all, after working in the software industry for 26 years, I know that all software is eventually thrown away. When I moved out of my last house, my wife made me throw away an entire dumpster full of software packages. At least now it can all be done by just wiping a hard disk. But that doesn't mean the present OPML development work is a waste. OPML tools are built to work with XML data, and despite its flaws, that is what OPML is inside. Converting from OPML 2.0 to OPML with namespace extensions to RDF is an evolutionary process, which is the way I believe that all software is created in practice. As long as I've slipped into Darwin territory, let me repeat one of his favorite mottos: "Natura non facit saltum." Nature does nothing in jumps. I believe that since software is a product of human nature, it also moves in slow, often inefficient, and gradual steps. I am fully convinced that the virtual product line I am helping to construct around OPML will make the transition to a fully XML based Web more smoothly and with more users than waiting several years until the computer scientists perfect the Semantic Web.
Finally, we get to the political issue. Sure there are firestorms around RSS and OPML. Are they more vicious than the ones around RDF? I have no idea, but if RDF is being created by humans, then there are fights, and cliques, and petty jealousies in the RDF world also. If you want to see how someone is able to overcome the name calling surrounding the battle between OPML and RDF, read Danny Ayers' blog. He's been doing an amazing job of trying to get RDF people to output OPML and OPML people to see how there is a better world on his side of this debate. I am learning a lot from Danny, and if I can't work out a way to get him to Boston for OPML Camp, I plan on flying over to Italy to see if he can teach me about the Semantic Web without me falling asleep.
Posted on Wednesday, March 22, 2006
at 7:49 AM (permalink)
I talked to Scott Matthews of Bitty yesterday by phone, and learned a lot about what he wants to accomplish with the product. Some bloggers have been describing Bitty as an OPML viewer, like Optimal or Grazr, but it really does more than that. Bitty is a full web browser that can display HTML web pages as well as RSS and OPML files. Scott's goal is to allow people to create a "picture in picture" experience within their web pages. For example, I have added a Bitty browser below to display the contents of my mashup blog:
Admittedly this is a silly example, but there are many great uses for Bitty. Imagine a site for a fan of a sports team using multiple Bitty browsers on the page. There could be a window for browsing the official team site, and other windows showing the sites of the top players. Bitty would also have been useful when people were demonstrating the differences between the US version of Google and their censored China search site. Both Google sites could be put side by side on the same web page. The key to this idea is that the user of a page with a Bitty browser can navigate a different site or even the entire web without leaving the original page. I know lots of people who dislike reading blogs, because they keep getting sent off to other pages to follow the blogger's train of thought. Within a few links, they lose track of where they started. Creating a browsable container within the constant context of a web page would be very useful as a way of solving this problem.
Bitty can also be used as a viewer for OPML and RSS files. Here is an example with this blog's feed:
Whether you prefer Bitty's RSS display over Grazr or Optimal is a matter of personal taste. I like Optimal's expand and collapse outline, and Grazr's compactness. On the other hand, when the RSS item contains a link to a web page, Bitty can display the page without launching a new window. One thing to be aware of when displaying RSS or OPML with Bitty is the fact that it adds a set of Yahoo advertisements at the end of the RSS or OPML content. Bitty is the only one of these products to display ads. User reaction to advertising within one of these web page widgets is still something that has to be worked out in the marketplace.
One feature that Grazr should definitely adopt from Bitty is the ability to launch itself into a separate, smaller window. This is done by clicking the small launch button on the top right corner of the title bar. This window can then remain open after you have left the page where you found the Bitty browser. I'm sure Scott can also get ideas for Bitty by studying Grazr and Optimal as well.
In a field that is this new and this active, we are sure to see lots of cross pollination, which is great for users. I'm working on getting Scott and Optimal's Dan Mactough to come up to Boston for OPML Camp, so they can sit down with Grazr's Mike Kowalchik. I love it when things are at a stage where authors can just talk to other authors, instead of company reps talking to each other.
Posted on Sunday, March 19, 2006
at 8:44 AM (permalink)
The response to the public alpha test of Grazr has been tremendous, but one thing that seems to have escaped most people's notice is the fact that it can display the contents of an RSS feed, even if there is no OPML file. Here is an example of Grazr displaying the RSS file for this blog. Note that the URL it is using is my RSS feed, not an OPML file. One great application for this is a listing on every blog's page of its most recent posts. The visitor can read past blog posts without leaving the current page. You can use this capability to display the result of searching for tags on sites like Del.icio.us, or Digg. The alpha version of Grazr still needs work on the display format for posts, but that will be cleared up soon. Since Grazr can display photos without having to open a new browser window, you can also use it to display images from RSS feeds delivered by Flickr, or NASA.
Posted on Wednesday, March 15, 2006
at 8:50 AM (permalink)
I've been hinting about Grazr for a while now, but I didn't post anything explicit because it wasn't open for public testing. Mike Kowalchik, Grazr's author, has now opened up his website to allow anyone to try an Alpha version of the product with any OPML file, and to put a copy on their own web site. Here is an example of Grazr with a sample OPML file:
Grazr uses Javascript, so if you don't see a really cool OPML viewer here, your web browser or aggregator isn't displaying Javascript output. You may have to change your browser preferences. You can also experiment with different OPML files on the Grazr site, as well as create a script to run your own copy of Grazr. If you want to see the raw OPML that is being displayed above, it is here.
The dimensions of Grazr are flexible, and you can shrink the text size, so you can also put one on your side navbar. James Corbett has been using it on his Eirepreneur blog for a while to display his Open Irish Directory. This form factor makes a great reading list widget, because your visitors can actually do feed grazing right on your web page. The term grazing may still be foreign to many people, but basically Grazr is a very capable RSS aggregator. There are still a number of cosmetic issues to improve, but that is why this is still an Alpha version. If you want to make suggestions for improvements, I'm sure Mike will be glad to get the feedback. Here's an example of Grazr with my Tech.Memeorandum dynamic reading list:
I've been helping Mike with the design of Grazr for a while now, but I have no financial relationship with the product or any of Mike's efforts. This is true of all the OPML products I'm working with, and actually all other products and websites. I don't believe in blogging as an independent observer and taking money or equity positions at the same time. There is nothing wrong with writing a company blog, however, such as Mike's own blog, or the blog of any other company employee, as long as that is the clear position of the blogger. For now, I'd rather remain independent, so I'll avoid any financial relationships. When I decide to join a startup or become an investor in one, I'll probably stop this blog and start a new one. At the very least, I'll make my new situation known and change the focus of my writing.
Any product based on the idea of feed grazing owes a debt of gratitude to James Corbett for inventing the term and promoting the concept. James has been a big supporter of Grazr and the rest of the OPML community.
Posted on Saturday, March 11, 2006
at 3:28 PM (permalink)
Ever since I did my last burst of Ruby programming a week ago, I've been trying to figure out where I want to go with Ruby and Web programming in general. The truth is that I'm hitting a few roadblocks that are rather frustrating. I still find the Ruby syntax cleaner than any language I've used since dBASE. On the other hand, the available Ruby libraries are not as robust or mature as those for Python or Perl. This is a particular problem with XML parsing. I didn't expect this when I started programming again last fall, but since almost all of my programming has been with RSS, OPML, and API calls, I'm extremely dependent on REXML, the XML parser built into Ruby. I've been hitting bugs and weirdnesses with REXML, and after emailing the author 3 times over the last 2 months, I've still not been able to get a response. On the other hand, I'm now getting to know a lot of Python programmers. There is even a Python programmer in Seattle who keeps emailing me, offering to build RSS and OPML libraries to do whatever I want.
I knew Ruby had fewer libraries than older languages when I started using it, but I assumed I would just write my own libraries to do whatever I wanted. That's what I've always done in the past. The other thing I've discovered in the last few months is that I'm going to have less time to code than I originally expected, so building libraries for things like XML parsing is not likely to happen. Actually my whole approach to programming is going to be different than I had expected. Instead of getting more and more proficient in one language and tackling all kinds of tasks, I now see that I will be focusing on RSS, OPML and API calls for possibly a few years. It might make more sense to see how this is done with many of the available languages, such as Perl, Python, PHP, and Javascript. Instead of becoming a language expert, I'd now rather become a domain expert.
This decision is also dependent on ideas I might have for writing books. I really enjoy that process, and know that a good book can be a valuable aid to developing my new career as a tech blogger. Instead of writing a Ruby book someday, it now seems more likely that I'll want to write an RSS and OPML book, which means I may want to be able to discuss techniques in multiple languages.
I haven't made up my mind yet. I'm going to spend a few days playing with Python and looking at other XML solutions for Ruby. Who knows, maybe someone will tell the author of REXML to answer my email and tell me how to keep it from choking on ampersands.
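For anyone hitting the same wall, here is a minimal Python sketch of one common cause and workaround. It assumes the trouble comes from feeds that contain a bare "&" that was never escaped as "&amp;", which makes the XML not well-formed, so a strict parser is entitled to reject it. I can't say that this is the REXML problem described above. The function names are made up for illustration, and the sketch uses Python's standard ElementTree parser rather than REXML.

```python
import re
import xml.etree.ElementTree as ET

# Match an ampersand that does not already begin an entity such as &amp;
# or a character reference such as &#38; or &#x26;.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(xml_text):
    """Replace stray '&' with '&amp;' and leave existing entities alone."""
    return BARE_AMP.sub("&amp;", xml_text)

def parse_leniently(xml_text):
    """Try a strict parse first; on failure, retry after escaping stray ampersands."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return ET.fromstring(escape_bare_ampersands(xml_text))

# The first ampersand is bare (invalid XML); the second is escaped correctly.
sample = "<item><title>Q&A with Tom &amp; Jerry</title></item>"
item = parse_leniently(sample)
print(item.findtext("title"))
```

This only repairs stray ampersands. Named entities that XML doesn't define, such as &nbsp;, would still need their own handling.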
Kosso has an interesting idea about extending RSS and OPML
Posted on Saturday, March 11, 2006
at 12:02 PM (permalink)
Kosso sends his regrets for having to work last night, and not being able to join the OPML podcast, but he also drops an interesting hint on what he has been working on:
"Also, I noticed that Alex did a great job at annotating the podcast in the description. Now, this is perfect data to get organised in OPML with a 'time' attribute."
One way of implementing this would be to use a namespace to add an audio timestamp to RSS, and then create an RSS file with one item for each timestamped portion of a .mp3 file. A playlist linking to multiple annotated RSS files could then be constructed in OPML. Apple already has an iTunes namespace for RSS 2.0, but this doesn't allow for timestamping sections of the .mp3. Something tells me that namespaces for RSS and OPML are going to explode this year. Things may get messier than some people would like.
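To make the idea concrete, here is a rough Python sketch of an RSS file with one item per timestamped portion of an .mp3. The namespace URI, the "audio" prefix, and the "start" element are all hypothetical; no such namespace is defined in Kosso's note or anywhere else that I know of. The URLs and segment titles are placeholders too.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for audio timestamps; invented for this sketch.
AUDIO_NS = "http://example.com/ns/audio-time"
ET.register_namespace("audio", AUDIO_NS)

def build_annotated_feed(title, mp3_url, segments):
    """Build an RSS 2.0 tree with one item per (start_time, note) segment."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = mp3_url
    ET.SubElement(channel, "description").text = "Timestamped notes for " + mp3_url
    for start, note in segments:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = note
        ET.SubElement(item, "link").text = mp3_url
        # The namespaced element carries the offset into the audio file.
        ET.SubElement(item, "{%s}start" % AUDIO_NS).text = start
    return rss

feed = build_annotated_feed(
    "OPML podcast (annotated)",
    "http://example.com/opml-podcast.mp3",
    [("00:00:00", "Introductions"), ("00:12:30", "Namespaces in OPML")],
)
print(ET.tostring(feed, encoding="unicode"))
```

An OPML playlist would then just be a list of outline nodes, each pointing at one of these annotated feeds.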
Posted on Wednesday, March 8, 2006
at 7:26 PM (permalink)
You have to give Rogers Cadenhead credit. After enduring several weeks of blistering attacks by Dave Winer over attempts to create a new version of RSS 2.0, he is once again trying to improve RSS. This time, Rogers and his fellow members of the RSS Advisory Board are proposing a new RSS namespace called XRSS. He evidently came to the same decision as I did with OPML. If the only way Dave will allow RSS or OPML to be extended is by the use of a new namespace, then that is what has to be done.
The funny thing is that I just tried to write an explanation of the difference between using a namespace and changing the core format, and I can't find a way to do it that shows why one is better than the other. In the end, an application that uses these extensions still has to know what to do with them. The only benefit seems to be that a namespace gives the application permission to ignore the new tag. I can hear the crickets chirping right now. This is not a subject that will hold an audience. I'm going to start working with namespaces in OPML, and perhaps then I'll find a way of explaining them that won't make normal humans run screaming out of the room.
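The "permission to ignore" point may be easier to see in code than in prose. Below is a small Python sketch of a reader that keeps core RSS elements and anything from a namespace it recognizes, and silently skips elements from namespaces it has never heard of. Both namespace URIs and the "rating" and "mood" elements are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces this imaginary application understands (hypothetical URI).
KNOWN_NAMESPACES = {"http://example.com/ns/known"}

def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag

def read_item(item):
    """Collect core and known-namespace fields; skip unknown namespaces."""
    fields = {}
    for child in item:
        namespace, local = split_tag(child.tag)
        if namespace and namespace not in KNOWN_NAMESPACES:
            continue  # an extension we don't understand: safe to ignore
        fields[local] = child.text
    return fields

SAMPLE = (
    '<rss version="2.0" xmlns:k="http://example.com/ns/known" '
    'xmlns:x="http://example.com/ns/unknown"><channel><item>'
    "<title>Hello</title><k:rating>5</k:rating><x:mood>cheerful</x:mood>"
    "</item></channel></rss>"
)
item = ET.fromstring(SAMPLE).find("channel/item")
print(read_item(item))
```

If the new tag had been added to the core format instead, the reader would have no marker telling it which elements are extensions.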
Apple is trying to patent RSS autodiscovery, aggregation, and reading lists
Posted on Wednesday, March 8, 2006
at 6:25 AM (permalink)
I'm not a lawyer, but these two patents by Apple sure look like they are claiming ownership of RSS autodiscovery, aggregating multiple RSS feeds into one, and automatically updated reading lists. Time for some real lawyers to step in and comment on this. John Palfrey, as the representative of Harvard's RSS copyright, what do you think?
Nativetext: Delivering RSS feeds to the rest of the world
Posted on Tuesday, March 7, 2006
at 8:30 PM (permalink)
Yesterday I had the strange experience of finding a link to one of my posts on a blog that was part Dutch and part English. My quote was in the original English, but the commentary by the blog's author, Fred Zelders, was in Dutch. What could this mean? If a reader of this blog could understand my quote, why not write the whole blog in English? Today I had lunch with James Cann, founder of Nativetext, who explained that this phenomenon is exactly what he wants to harness as the engine to translate the blogosphere from English into many other languages. James told me that there are bloggers like Zelders all over the world who see it as their mission to bring English blogs across the language barrier to their native country. Zelders probably speaks fluent English, but prefers to use his native language. His readers are the same. Beyond this boundary of multi-lingual bloggers, however, are many more readers whose English isn't good enough to read my blog without having it translated.
What James hopes to do with Nativetext is provide a set of web based tools that will make it easy for a blogger like Zelders to grab an RSS feed for a blog he is interested in and produce a translated version. In effect, Nativetext would be a multi-lingual Feedburner which takes English RSS feeds and serves copies of these feeds in many other languages. The interesting part of James' idea is that he doesn't plan on paying either the author of the original English feed or the translator who produces the new version. He'll also allow owners of the original feed to take the translated versions and serve them on their own sites free of charge. In fact, James is honest enough to admit that he doesn't even know how he will get paid.
I can hear the snarks warming up now. Yet another kumbaya singing, don't worry about money, Web 2.0 flight of fancy. Personally, I think James is right in not worrying about a business model. He is trying to solve a big problem, and if he succeeds, people will be throwing money at him. He can come up with a business model then, whether it is charging corporations, magazines, or newspapers to translate their content and passing a percentage on to the translators, or selling the whole thing to Rupert Murdoch. The important thing is creating an infrastructure that will attract a community of users to do the translation and provide quality control. Why should they do this work for free? Because they want to bring the blogosphere to citizens of their country, and maybe gain some recognition for doing so. Why should bloggers allow their blogs to be translated for free? Because they are already giving away their content in an RSS feed, so this just means more subscribers.
Now an interesting question emerges. At what point in reading this did you think about using Nativetext to go the other way and translate Arabic, Hebrew, or Japanese blogs into English? Did you even consider it? I actually didn't until I was halfway through writing this. I guess I'm just an ugly American, but now that I have thought about it, it is pretty exciting. Do Chinese tech bloggers think Google is out of control? What are French political bloggers saying about George Bush? I can't wait to find out.
Posted on Monday, February 27, 2006
at 7:06 PM (permalink)
This weekend I reached my 4 month anniversary as a blogger, so it seems like a good time to pause and take stock. First a few numbers:
I am now maintaining 4 blogs: Darwinianweb.com, Mashup.darwinianweb.com, Ruby.darwinianweb.com, and OPMLcamp.com. One Ruby project turned into RubyRiver.org, which is an online RSS aggregator that runs automatically. So I have a total of 5 active sites.
I have written 400 posts across these blogs, with the majority here on Darwinianweb.com.
The total traffic for all of these sites now averages 1,500 visitors a day. This includes RSS subscriptions, but not the people who read RSS feeds at online aggregators.
Technorati.com ranks the Darwinianweb.com domain at 13,506, and seems to lump all the subdomains together. That puts it in the top 0.05% of the 29 million blogs that Technorati tracks. I guess that's pretty good for 4 months, but it also demonstrates how little traffic the average blog out of those 29 million actually gets.
I wasn't sure what I wanted to focus on when I started, but felt that the general area of Web 2.0 would be most interesting. I've looked at a lot of Web technologies, but my focus has clearly narrowed down to RSS, OPML, and mashups over the last two months. I've also found myself drawn to the social and political forces controlling the Web and the blogosphere. This may seem surprising for a technologist, but it matches fairly closely the way I ended up spending my time as a dBASE guru in the Eighties and early Nineties. I love software, but software is built by people, and you can't understand why they build what they do if you don't understand their motivations.
One reason why I became a blogger was the hope that it would allow me to meet people who are doing cool things with software and the Web. That has certainly worked out. I've gotten to know dozens of new people in the tech world, many of whom are doing exciting development work or writing interesting things about technology. I also wanted to meet younger developers who I might be able to advise on their products. I get a real thrill from helping to design desktop software and online apps, and I'm now working closely with a few startups with real promise. I don't want to get involved with investing in any new companies yet, so my relationships are purely mentoring ones.
Over the next 4 months I'd like to do a lot more work with APIs, help a few products get launched, and see if the OPML Camp idea can go anywhere. I'm finding the Camp phenomenon fascinating. I had to get my capitalist head into the anti-capitalist Open Source space in the late Nineties. Now I have to adapt my control freak experience of running seminars and conferences to the idea of an anarchic model of running an event.
One thing I must try not to do in the coming months is start any more blogs. I have more than enough of those.
Posted on Saturday, February 18, 2006
at 7:16 AM (permalink)
"Syndication politics are every bit as twisted as any soap opera you'll see on daytime television. Only without the sex. And with a bunch of bearded fat guys in place of the pretty models." (Dave Walker)
As a bearded person of size who is involved with RSS, I guess I have a right to chime in here. I first realized that something was up with the RSS Advisory Board when I saw this post from Dave Winer. Then I found John Palfrey's statement backing up Dave. It was clear that another struggle over control of RSS was getting ready to explode. Sure enough, Rogers Cadenhead, Chairman of the advisory board, posted his response a few hours later.
I'm not going to try to unravel the past machinations of Dave versus the RSS community, but I'd like to point out a few things that have changed from past struggles on this issue. When RSS emerged in the late Nineties, it was a simple format that only a few bloggers used to make their personal musings more accessible to their friends. By the time the battle over syndication formats really started to rage, blogging had become an important part of the Internet, but the infrastructure was still in the hands of a few people who did it mostly for love instead of money. Dave was able to maintain control over the RSS specification by sheer force of will, even though RDF and Atom did emerge as competitors.
This time things are different. RSS has now become a critical component of the next wave of software innovation, for which I've given Dave full credit. Unfortunately for Dave, however, "nagging until defacto" won't cut it this time. There is too much money and momentum behind making RSS as useful as possible. We have moved beyond aggregator publishers who just nod and smile when yelled at, because they can't afford to have Dave as an enemy. RSS is now a delivery protocol for many types of information beyond blogs. And the simple fact is that RSS sucks. Anyone who works with it knows that there are huge holes and weaknesses in the spec and the current implementations. Many people, including myself, are ready to put a great deal of time into building tools on top of RSS and its related format OPML, and we care a lot more about improving functionality than we worry about Dave getting mad at us.
So go ahead Rogers and the rest of the RSS Advisory Board. If Dave wants to put his considerable intellect into improving RSS, his suggestions will be given a great deal of respect, but this time around change is coming whether he approves or not.
Update: Sam Ruby: "In the long run, the success of the work currently under the working title of RSS 2.0.2 depends little on what Harvard thinks, but instead depends very much on what people like Nick [Bradbury] and companies like Microsoft actually do."
Update: Steve Gillmor raises the temperature of the debate by declaring Sam Ruby's post pure bullshit.
Posted on Thursday, February 16, 2006
at 10:38 AM (permalink)
There is so much activity based on RSS and OPML in the Boston area that it seems appropriate to start calling it RSS Alley. I've even created an RSS Alley map. Along with locations for companies and bloggers, there are many points of historical interest, including the site where Dave Winer lived while working on RSS and blogs at Harvard. Please email me (adam AT darwinianweb DOT com) if you have any additional locations for the map.
Update: The details on how this map was constructed are available on my mashup blog.
Posted on Wednesday, February 8, 2006
at 6:34 PM (permalink)
BlogBridge version 2.13 has just been released, making it a lot easier to view dynamic reading lists in OPML. It is free and supports Windows, Mac, and Linux. There are plenty of RSS aggregators that allow you to import OPML files as a quick way of subscribing to a large number of feeds, but these are basically a static form of subscription. BlogBridge, on the other hand, is able to stay in synch with the original OPML. If you subscribe to an OPML file on a server, and the contents of the file change, then the set of feeds that show up in BlogBridge also changes. This is what separates a reading list from a regular OPML file. The internal format isn't different in a reading list; it is the fact that the contents of the file change over time that makes an OPML file into a dynamic reading list.
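Staying in synch boils down to comparing two sets of feed URLs. Here is a minimal Python sketch of that comparison. It is not how BlogBridge does it internally, which I haven't seen; the function names and sample URLs are made up, and fetching the OPML from a server is left out so the example stands alone.

```python
import xml.etree.ElementTree as ET

def feeds_in_opml(opml_text):
    """Return the set of feed URLs (xmlUrl attributes) found in an OPML document."""
    root = ET.fromstring(opml_text)
    return {o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")}

def resync(subscribed, opml_text):
    """Compare current subscriptions with the reading list; return (to_add, to_drop)."""
    listed = feeds_in_opml(opml_text)
    return listed - subscribed, subscribed - listed

# Two snapshots of the same reading list, taken at different times.
OLD_LIST = """<opml version="1.1"><head><title>Sample</title></head><body>
<outline text="Blog A" type="rss" xmlUrl="http://example.com/a.xml"/>
<outline text="Blog B" type="rss" xmlUrl="http://example.com/b.xml"/>
</body></opml>"""
NEW_LIST = """<opml version="1.1"><head><title>Sample</title></head><body>
<outline text="Blog B" type="rss" xmlUrl="http://example.com/b.xml"/>
<outline text="Blog C" type="rss" xmlUrl="http://example.com/c.xml"/>
</body></opml>"""

subscribed = feeds_in_opml(OLD_LIST)
to_add, to_drop = resync(subscribed, NEW_LIST)
print("subscribe to:", sorted(to_add))
print("unsubscribe from:", sorted(to_drop))
```

A static OPML import runs the first function once and stops. A reading list subscription keeps rerunning the second one.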
It isn't completely obvious how to open an OPML file as a reading list in BlogBridge, as opposed to just subscribing to all of the file's feeds, so here are the basic steps:
Select 'Add Guide' from the Guides menu.
Enter a title for the new Guide.
Select the 'Reading List' tab.
Click the '+' button.
Enter the URL of a reading list. If you can't find one to try, you can get started with the one I have created based on Tech Memeorandum.
Click 'Check and Add'.
Click 'Add'.
When the new Guide appears, all of the feeds listed in the reading list will be read, and the feed items will then appear.
By default, BlogBridge only checks for new contents in the reading list when it is first run. This is fine if you tend to start the program, read some feeds, and then close it. If you keep the program open, as I do, you will probably want to tell it to recheck the contents of the OPML regularly and resynch to match. This is done by:
Selecting 'Preferences' from the Tools menu.
Clicking the 'Reading Lists' tab.
Changing the 'Check for changed Reading Lists' setting to 'Once per Day' or 'Once per Hour.'
My enthusiasm on this subject has prompted some emails asking if I have a financial interest in promoting BlogBridge and reading lists, and the answer is no, although I am friends with Pito Salas, BlogBridge's project leader. I am actively looking for other aggregators that support OPML reading lists, so if you know of one, let me know about it and I'll be glad to write it up. I've said before that I believe RSS is a key component of the Web's future growth, and OPML reading lists are a great way of delivering RSS.
Danny Ayers builds a reading list about reading lists
Posted on Wednesday, February 8, 2006
at 8:01 AM (permalink)
Danny has created an automatically updating reading list based on the Del.icio.us tags "readinglist+tech". I wonder if any of the reading list posts in this reading list reference reading lists? If automatic inclusion gets turned on for these lists dangerous things could happen. Of course, he has to make the obligatory "RSS sucks but I'll use it anyway" comment. Yes, RSS sucks and OPML sucks even more, but reading lists are cool. I'm glad Danny doesn't let the suckage of the spec get in the way of the coolness of the app.
Meanwhile, James Corbett has thrown out a little snark bait by saying that feed grazing is really Web 3.0. Come on now, Jimmy boy, that's just asking for trouble.
I've got a fever, and the only prescription is more feeds!
Posted on Tuesday, February 7, 2006
at 7:11 AM (permalink)
I've been accused of being obsessive, but I can't help it. I gotta have more feeds. This whole subject of real-time feed aggregation, or feed grazing as it's now being called, has really caught my imagination. So I've been looking for something that will cure my fever. I haven't found an aggregator that will satisfy my craving completely, but there are a number of websites that demonstrate the type of interface I need. I'll list them in the hopes that someone will build an OPML capable aggregator with this type of presentation:
AliveNews has a cool realtime display, but it must be a proof of concept rather than a real site, because it doesn't have any options for expanding the list of pre-defined feeds. Still, the fade-in of feed excerpts is sweet.
Digg spy is really compelling for the ADD set, but it isn't a true feed aggregator, and it makes me twitch if I watch it for too long.
Posted on Sunday, February 5, 2006
at 8:52 AM (permalink)
The best way for me to understand a new software technology is to start writing code that supports it. I finally did that with OPML reading lists, and as Marc Canter would say, it is coolio! You can find all the details on my mashup blog. The short version of the story is that an OPML file based on all of the blogs cited on Tech Memeorandum is generated every hour and placed here where you can grab it and use it as a reading list. I poke fun at Dave Winer from time to time, but I can see that OPML reading lists really do take RSS to the next level. Good work Dave, and this time I'm around to see that your role doesn't get erased from history.
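The generating side of a reading list is short. This Python sketch writes an OPML file from a list of (title, feed URL) pairs. It is not the code described on the mashup blog; the step that collects the cited blogs from Tech Memeorandum is omitted, the two feeds are placeholders, and the hourly schedule would come from something like cron.

```python
import xml.etree.ElementTree as ET

def build_reading_list(title, feeds):
    """Return OPML text with one outline node per (name, feed_url) pair."""
    opml = ET.Element("opml", {"version": "1.1"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        ET.SubElement(body, "outline", {"text": name, "type": "rss", "xmlUrl": url})
    return ET.tostring(opml, encoding="unicode")

# Placeholder data standing in for the blogs cited during the last hour.
cited = [
    ("Example Blog One", "http://example.com/one/rss.xml"),
    ("Example Blog Two", "http://example.com/two/rss.xml"),
]
reading_list = build_reading_list("Blogs cited this hour", cited)
print(reading_list)
```

Overwriting the same file at the same URL each hour is what lets a subscribing aggregator notice that the list has changed.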
Posted on Friday, February 3, 2006
at 1:43 PM (permalink)
Since RSS seems to be an underlying theme beneath much of the conversation and innovation happening on the Web right now, it is interesting that the RSS Advisory Board is finally getting active. New members have been appointed and they are discussing revisions to the RSS 2.0 spec. You can follow this process with their feed. The odd thing is that the feed is in Atom format. Just kidding.
IE7's aggregator isn't impressive, but it is good enough
Posted on Thursday, February 2, 2006
at 12:37 PM (permalink)
Trying to recharacterize a quote once it is loose in the blogosphere can be a tricky business. In my initial thoughts on IE7 I wrote that it would likely kill many RSS aggregators that did little more than let you read feeds. Richard MacManus linked to this and wrote "Adam Green thinks IE7 will kill a lot of independent RSS Aggregator products, due to IE7's impressive RSS integration features." The first clause was mine, but the second clause wasn't. I don't fault Richard. He was using me as an example to prove his point, but I don't want to leave the impression that I think IE7's use of RSS is impressive. In fact, it is just the opposite. IE7 is a very weak aggregator, but it will still drive out the other independent aggregators, because it will be part of IE.
Microsoft long ago mastered the trick of calculating exactly the minimal feature set needed to suck the air out of a market it wants to enter. They do about half of this the first time around, and eventually reach the minimal set by about the third version. Then they stop completely. This is the thing I hate the most about Microsoft's monopoly over the software market. Take a look at Excel and Word. They are basically frozen with a feature set that is 10 to 15 years old. Microsoft knows that people aren't willing to go through the bother of switching products if most of their needs are met. More features beyond the minimal set mean more bugs, so Microsoft has nothing to gain once a market is theirs. The result is a stifling of innovation. It was just this stifling that led so many in the software industry to flee to the Internet in the mid-Nineties.
My favorite example of the Microsoft effect is the graphing in Excel. It absolutely sucks. I have been using it for years, and I still have no idea how to create what I want. Each time I use it I just keep whacking away at it until I get close to what I want, and then I stop. Once when my son was creating some complex graphs for a science project, I went to some download sites and got a few shareware graphing packages. He was amazed by their power and ease of use. He asked why Microsoft didn't do graphs this well, and my answer was "Because they don't have to." I then explained my theory of the Microsoft effect. (Yes, having me as a father can be a bit tedious. My kids usually know better than to ask my opinion on software. My wife won't even stay in the room when software comes up.)
So does this mean that we are doomed to a life of mediocre aggregators when IE7 wins? I am afraid so, but I hope not. What I really hope is that Scott Karp's vision will be realized: "The New Media revolution will come when content is completely atomized and fully tagged, so that it can be remixed into perfectly tailored packages to suit every taste, i.e. truly what I want (when I want it)." But the aggregator publishers have to move fast. Once IE7 is cleaned up enough to release, it will shut down many of the opportunities to find new users. That doesn't mean that the average user is lazy or stupid. It means that they have a life, and seeking out the ultimate aggregator won't be a high priority for them.
Posted on Wednesday, February 1, 2006
at 7:43 AM (permalink)
There has already been plenty of discussion of the new preview release of IE 7, so I won't try to list everything new. Besides, I'm too busy to dig deeply into features that are likely to change before it is released. What I would like to do is list a few clear effects the final release of IE7 will have on RSS and aggregation, most of which are illustrated by this screenshot.
"RSS feed" will be contracted to just "feed" in common usage. IE7 uses the term "feed" throughout its interface without mentioning RSS once, as far as I can tell. This makes sense, since "RSS feed" is as redundant as saying "HTML web page." It also means that the public won't have to be aware of the many feed formats, such as Atom, or RDF versus non-RDF RSS.
The icon will replace the many variations on RSS and XML icons. IE7 uses the former throughout its interface, so this will rapidly become synonymous with the term feed in the public's mind.
Categories will finally be utilized. IE7 lists all of the categories in the currently displayed feed, and allows easy selection of posts via a category. I've done a good amount of research into the use of the category tag in feeds, and it is currently used by surprisingly few blogs.
Feed serving bandwidth will go through the roof. IE7 allows automatic updates of feeds and you are reminded of this with every feed you read. Any Windows user knows what it's like when Microsoft decides to remind you of something. Let's just say that only the truly anti-establishment will be able to ignore the continual requests to turn on automatic synchronization, and those people will be using Firefox anyway. It seems that turning this feature on automatically sets it for all subscribed feeds. As with any Microsoft software setting, once you turn on synchronization, you have to work real hard to find a way to turn it off. As the screenshot shows, synchronization will continue even when IE7 isn't running. From what I can tell, the default interval is 60 minutes, but this can be changed to a shorter period. I'd tell you how short the interval can be, but I can no longer figure out how to reach this setting in the program. The combination of these factors means that virtually all IE7 users will turn on synchronization of all their feeds and then leave this running whenever their computer is on. Get ready to start paying some serious hosting bills.
Will IE7 kill all the independent aggregation products? The simple answer is yes for any aggregator that just collects feeds and allows you to read posts as they are found in the feed. This is sad, but it also means that aggregator publishers will be forced to innovate at a much greater speed. After all, it's not as if they couldn't see this coming.
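The category selection mentioned in the list above is easy to picture in code. This Python sketch groups the posts in an RSS 2.0 feed by their category tags. It makes no claim about how IE7 does it, and the sample feed is invented. Note that the third post has no category at all, which matches what I found in most real feeds.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def posts_by_category(rss_text):
    """Map each category found in the feed to the titles of the posts that use it."""
    root = ET.fromstring(rss_text)
    index = defaultdict(list)
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        for category in item.findall("category"):
            if category.text:
                index[category.text.strip()].append(title)
    return dict(index)

SAMPLE = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Grazing feeds</title><category>RSS</category><category>OPML</category></item>
<item><title>IE7 preview</title><category>RSS</category></item>
<item><title>No tags here</title></item>
</channel></rss>"""

categories = posts_by_category(SAMPLE)
for name in sorted(categories):
    print(name, "->", categories[name])
```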
Posted on Sunday, January 29, 2006
at 5:21 PM (permalink)
If you are a regular reader of the MIT Advertising Lab blog, you already know the answer. They are both locations for innovative advertising models. Most blogs tend to follow the crowd and the general buzz, but I never know what to expect when a new post from this unique blog appears in my RSS reader. My favorite post from the recent past was a plan to introduce heat activated urinal billboards or HAUB as they're known in the bathroom advertising industry. To return to the Internet theme of this blog, there have also been great posts on ads in RSS readers and Google Maps.
Posted on Friday, January 27, 2006
at 9:55 AM (permalink)
Now that I've started looking at OPML, I'm discovering a lot more activity all around me. When talking to Pito this morning about the next Geek Dinner, he pointed out that he's just added the ability to publish OPML reading lists to his BlogBridge aggregator. I've been using BlogBridge as my RSS aggregator for a couple of weeks, and it's great for managing lots of feeds. I'll have to try out his new OPML features and report back. By the time we get to February 15th, we may have to call it an OPML Dinner.
Posted on Thursday, January 19, 2006
at 5:47 AM (permalink)
The Tech Memeorandum archive reveals an interesting progression from Tuesday night to Wednesday. There was a shift in tone from "You're stealing my content"/"No, I'm not", to "That's not the right way to use my content"/"OK, maybe there is a better way" 24 hours later. One reason why much of the heat has dissipated, and the battle has morphed into a search for a middle ground may be that women have entered the discussion. While this started with the men riding out to shoot up the cattle rustlers, the womenfolk are now asking questions and looking for answers.
As long as I've got myself down in this gender-biased pit, I also find it amusing to see how the two "sides?" "teams?" (OK boy, stop digging) reacted to the H-Bomb. Om folded the minute Palfrey arrived:
"John Palfrey has a post up, in which he justifies everything. Good points, they are emailing all new inclusions. That's all that was needed. Issue closed."
But Shelley brushed Harvard aside and got straight to the heart of the matter:
"No, 'gentlemen of the court' niceties; no A-list deference; no but it's Harvard obfuscation; no Web 2.0 bullshit. As clearly and precisely as possible: am I right, or am I wrong?"
Posted on Tuesday, January 17, 2006
at 7:02 PM (permalink)
John Palfrey has responded to the questioning of Top10Sources' use of RSS feeds exactly the way a law professor should, by turning it into an opportunity to educate the blogosphere on the finer points of copyright law in relation to RSS and blogs. As the Executive Director of the Berkman Center for Internet and Society, he explores the potential risks to the Internet from overly restrictive limits on the use of RSS feeds by aggregators. The Berkman Center currently holds the copyright for the RSS 2.0 specification, and Palfrey handles this responsibility by explaining the best way for RSS to fulfill its potential. Finally, as a founding partner of the RSS Investors LP venture fund and the founder of Top10Sources, Palfrey protects his investment by skillfully deflecting the criticisms levelled against the new aggregation site. He sure is in the middle of RSS, isn't he? Let's take a look at some of his arguments, since they are a blueprint of where RSS and copyright law intersect.
Palfrey contends that aggregators like Top10Sources are not violating copyright law, but acknowledges that this is still an unresolved issue. What I find interesting is the way he casts the opponents of his view:
"The strong form of the pro-copyright argument runs like this: the creator of the RSS feed retains, automatically, all copyrights in the content in the feed and retains all rights in its republication, use as a derivative work, and so forth. Given that those rights have been retained fully by the creator of the site, the argument goes, it is unlawful for someone -- presumably in a commercial context -- to republish that copyrighted context without license to do so. This is the Web 2.0 variant of the argument that is litigated frequently in the context of web-based content, with plaintiffs like the RIAA and the MPAA (in the p2p context), the publishers (like McGraw-Hill, or Perfect 10) who are suing Google, and the like."
I can't judge the legal argument, but I respect his tactics. I don't think there is a single blogger who wants to be on the same side as the RIAA or MPAA.
He warns his readers of the consequences of the "strong form" of copyright being applied to RSS:
"Is the blogosphere arguing itself right into a trainwreck of the sort that has played out over music and movies? Consider the world that A (prominent) VC envisions, here and here, wherein content is micro-chunked and syndicated. This world cannot emerge if every plausible copyright claim is asserted and litigated."
Palfrey's most valuable recommendation is that bloggers should add a copyright statement to their feeds.
"Creative Commons licenses, as I've argued on this blog, are the way to go -- to embed them into the RSS feeds when they go out, with clear instructions for your intent. If you want people to run your feed in private aggregators, but not in public aggregators that are for-profit, to re-offer your content just as you've offered it, and to attibute authorship to you, why not add to your feed a BY-NC-SA license?"
I agree. When examining feeds for inclusion in my aggregator, I was surprised to find that none of them contained a copyright notice. My feed had one, but I've now updated it to match my site's Creative Commons license, which spells out exactly what a republisher is permitted to do.
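As a rough illustration of what adding a license to a feed can look like, here is a minimal Python sketch. It assumes an RSS 2.0 feed and uses the optional channel-level copyright element plus the creativeCommons RSS module's license element. The function name add_license, the notice text, and the BY-NC-SA URL are illustrative choices, not details taken from any feed discussed here.

```python
import xml.etree.ElementTree as ET

# Namespace of the creativeCommons RSS module.
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
# Illustrative license URL (an assumption); substitute your site's actual license.
LICENSE_URL = "http://creativecommons.org/licenses/by-nc-sa/2.5/"

# Emit the conventional "creativeCommons" prefix when serializing.
ET.register_namespace("creativeCommons", CC_NS)


def add_license(feed_xml, license_url, notice):
    """Return feed_xml with a <copyright> notice and a CC license on the channel."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")

    # RSS 2.0 defines an optional channel-level <copyright> element.
    copyright_el = channel.find("copyright")
    if copyright_el is None:
        copyright_el = ET.SubElement(channel, "copyright")
    copyright_el.text = notice

    # <creativeCommons:license> carries the URL of the license itself.
    tag = "{%s}license" % CC_NS
    license_el = channel.find(tag)
    if license_el is None:
        license_el = ET.SubElement(channel, tag)
    license_el.text = license_url

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = '<rss version="2.0"><channel><title>Example</title></channel></rss>'
    print(add_license(feed, LICENSE_URL, "Copyright 2006 Example Author"))
```

The sketch only sets a channel-wide license. Whether a given aggregator reads or honors that element is a separate question.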
How does Top10Sources carry out Palfrey's less restrictive view of RSS copyright?
"As the editor compiles the site, the editor sends out an e-mail to the person who appears to be responsible for the site, or, sometimes, posts a comment to say that the site has been chosen. The site renders a list of those sites offering the feeds as directlinks to the page. The site also subscribes to those feeds and renders them all together on a single page."
So the site has adopted an opt-out model for aggregation. Top10Sources notifies the feed owner, and the owner has the responsibility of requesting that a feed be removed. As a practical matter, this is the only way to run an aggregator. As I've mentioned in other posts, my attempts to gain permission from feed owners in advance of launching my RubyRiver aggregator were met with an almost complete lack of response. RSS was built to promote syndication, and an aggregator is a valuable part of that model. Requiring an opt-in model would limit the potential of RSS, and stifle an important avenue for Internet communication. As Palfrey says, "fundamentally, RSS is ads" for the blog and aggregators are a vital channel for these ads.
One question left unanswered by Palfrey's response is the amount of a feed that should be republished, especially in light of the site's opt-out model. He admits that this is an evolving area:
"I expect to take up this issue again with the management team once again. I don't think there's anything being done wrong from the perspective of the law. But we should take up for discussion some of the ethical issues that Mike Rundle and Om Malik raise and suggestions that Adam Green makes about how much of a given feed that the site republishes -- maybe a truncated version of the feeds is the right thing to render."
This debate over aggregation will certainly continue, but for now I find it fascinating to watch Palfrey navigate the current controversy. From a PR perspective I give him an A. I attended his Harvard Extension School class on cyberlaw a few years ago (which probably accounts for the academic tone I find myself adopting here), and frankly, he is a lot more interesting now that he has to apply his legal theories to a company in which he holds an important stake. I wish all Harvard profs had this real world opportunity. I hope people like Om Malik continue to hold his feet to the fire. The blogosphere will benefit from his involvement.
Posted on Sunday, January 15, 2006
at 8:04 AM (permalink)
The first version of my RubyRiver aggregator displays all the items in a feed without any filtering, which allows items unrelated to Ruby to appear. This morning I decided to explore filtering based on the category tag. The RSS 2.0 specification states that "You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain." So there should be no problem. All I had to do was extract the category tag and select items that contained "Ruby" in that tag.
Here is an example from the blog Eric's Ponderings, which contains some useful Ruby programming posts, but also switches to football whenever the University of Texas Longhorns win a big game. Here is a portion of one of his RSS feed items about Ruby and Java:
I don't know why he includes Ruby twice, but that shouldn't get in the way of my code, as long as there is at least one category I can match.
Checking the rest of my feeds, however, brought out a problem with the way some blogs use categories. For example, the O'Reilly Ruby blog is entirely about Ruby, so the authors don't feel the need to include Ruby in the categories. This is apparently assumed from the context. Instead the categories are terms like Opinion, News, and Articles. This makes sense within the blog, but doesn't help when the feed is aggregated with many others.
I can solve the problem in my own code by identifying feeds within my .opml feed list as Ruby specific or multi-topic. This will allow my code to use the category tag only when parsing multi-topic feeds. Unfortunately this requires me to go to extra effort when adding new feeds, which in turn means that any user of my code will have to understand this issue as well. General purpose aggregators aren't likely to use this solution, which means that filtering for Ruby categories in a generic aggregator will filter out some of the blog posts that the user would want.
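The idea of flagging each feed as Ruby-specific or multi-topic can be sketched in a few lines. This is a Python illustration rather than the RubyRiver code, and everything in it is assumed for the example: the sample feed, its titles and categories, and the single_topic flag, which stands in for whatever marker the .opml list would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; titles and categories are invented for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>Calling Java from Ruby</title><category>Ruby</category><category>Java</category></item>
<item><title>Longhorns win</title><category>Football</category></item>
<item><title>Uncategorized note</title></item>
</channel></rss>"""


def filter_items(feed_xml, topic, single_topic=False):
    """Return the titles of the items that should appear in the aggregator.

    single_topic=True marks a feed that is entirely about the topic, so every
    item is kept and its category tags are ignored. Otherwise an item is kept
    only if at least one of its categories contains the topic word.
    """
    channel = ET.fromstring(feed_xml).find("channel")
    wanted = topic.lower()
    kept = []
    for item in channel.findall("item"):
        categories = [(c.text or "").lower() for c in item.findall("category")]
        if single_topic or any(wanted in c for c in categories):
            kept.append(item.findtext("title"))
    return kept


if __name__ == "__main__":
    # Multi-topic feed: only items tagged with the topic survive.
    print(filter_items(SAMPLE_FEED, "Ruby"))
    # Feed flagged as single-topic: categories are not consulted at all.
    print(filter_items(SAMPLE_FEED, "Ruby", single_topic=True))
```

The cost shows up in the second call: someone has to decide, per feed, whether the flag should be set, which is exactly the extra effort described above.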
This type of inconsistency in applying tags to blog posts illustrates the hurdles that still must be overcome before RSS can fulfill its promise. The specification is willing, but the patterns of usage are weak.
Posted on Wednesday, January 11, 2006
at 8:40 AM (permalink)
My first RSS aggregator written in Ruby is now up at RubyRiver.org. It's still primitive, but it seems to be working OK. My goal in building this is to create a tutorial for new Ruby programmers. I've been publishing the code on my Ruby blog as I've been writing it. The complete code will be available for free download in a day or two, and the tutorial will be posted on RubyRiver as it is developed.
Running multiple blogs has given me an interesting insight into one of the problems of having people read a site through RSS. I've spoken to a number of people who read this blog, and when I mention my interest in Ruby, they usually say "You should start a Ruby blog." When I say that I already write one, they ask how they can find it. That's when I realize that they read the RSS feed and have never seen the link to the Ruby site on my navbar. So how do I solve this? Should I plug everything I do in every blog post? That was why navbars got added to websites.
Posted on Saturday, December 24, 2005
at 12:45 PM (permalink)
I've adopted the new subscription icon available from Feed Icons for my RSS feed. Getting rid of RSS and XML icons would be a big advance. Other than Dave's obsessive need to preserve his legacy, I can't see who benefits from pushing the internal format's name in people's faces. "Sell the solution, not the technology" should be the guiding principle. When people change their pages to use this new icon they should also drop the 4 or 5 links for the different RSS and Atom formats. Why is that necessary? I've yet to try an aggregator that doesn't support the common variants.
Posted on Friday, December 16, 2005
at 9:43 AM (permalink)
I continue to be puzzled by the ethics surrounding RSS aggregators. I have been planning on building an RSS 'river of news' aggregator for Ruby, and my research has brought up the aggregators called planets, which aggregate full feeds from a large number of blogs. I've looked at many of these planet sites, and none of them have a description of the relationship between the aggregator and the aggregatees. Did they all choose to be included or did the aggregator simply add them to a list? Are these planets really splogs? They don't appear to be, because they aren't plastered with ads.
Posted on Monday, December 12, 2005
at 7:15 AM (permalink)
Imagine a world where people leave all their possessions outside their door and expect strangers to use these things. It would require a gifted science-fiction writer to construct a viable social contract around such a culture. RSS is creating just such a world in the blogosphere, and there are signs it isn't all going well. Steve Rubel's complaint that splogs are stealing his content, and the associated comments, are a graphic example of the fuzzy boundaries that now exist in the area of RSS republishing. What is the proper definition of a splogger? Is Mark Cuban right in claiming that "Aggregation is not value add"? What about Tech.Memeorandum or Technorati? Where is the boundary between a search engine for RSS items and an aggregator? If a splogger hires an A-list blogger to select the RSS items that are posted, is this still splogging? What if a splogger uses Mechanical Turkers to select the posts for publication? What if the public is used to select posts? Trying to stuff all those billions of little RSS feed items back into their original bottles is going to be impossible.
Posted on Sunday, December 11, 2005
at 7:04 AM (permalink)
I didn't get a chance to keep up with my feeds yesterday, so this morning I sat down with a bulging RSS reader. Yummy! I didn't get past the As before I hit what everyone would have to accept as a perfect example of a Web 2.0 application, Amazon's Mechanical Turk. I decided to list every blog post I read in my morning review that points to such an exemplar:
Mechanical Turk, this site provides an economic model for a global, microcontent focused workforce.
Technorati Icon, Dave adds an automated vanity search to his navbar. Well, I guess that is more timeless than Web 2.0.
Writely, a web-based Word competitor with "import and export into Word format, embedded images, a wysiwyg editor, drag and drop functionality, sharing with others, and tagging of documents."
Posted on Saturday, December 10, 2005
at 11:33 AM (permalink)
Here is an analogy that my namesake would have been proud of. Darwin spent an inordinate amount of time studying worms and the accumulation of their waste. He showed that the remains of ancient civilizations didn't sink into the earth; rather, the worm "castings," which they produced as they burrowed through the earth, actually raised the soil level and drowned the ancient buildings. I've been thinking about worms as I plan an RSS aggregator for a Ruby tutorial I want to write. What RSS based websites really resemble are worms burrowing through a rich substrate of feeds, excreting modified versions of their input as RSS feeds in turn. The functional analogy to worms is clear; the metaphorical warning for society is open for discussion.
Posted on Thursday, December 8, 2005
at 5:03 AM (permalink)
Today I started producing blog pages using a model based more completely on microcontent. What this means in practice is that each post now has its own page. I originally wrote the code to produce this blog with a daily organization of posts: the home page held seven days of posts grouped by date, and there was an archive page for each day with all the posts for that day. I realized after a month or so that I don't really write in patterns that fit a day, perhaps because I'm minimally autobiographical. I tend to write 2-3 multi-paragraph posts on different subjects each day. So it makes more sense to treat each post as a separate bead on the string, and move from one post to the next, instead of one day to the next. The internal advantage is that I can now track the readers' collective attention more easily. Incoming links will be directly to the post, rather than the date, so my readership stats will reflect what is read on a post-by-post basis.
Posted on Wednesday, December 7, 2005
at 2:40 PM (permalink)
I'm not sure when "stack" came to mean a list of languages/technical standards used to build an app, but it is a useful description. It helps convey the logical architecture within a multi-layered development environment. The best example of a useful stack is LAMP (Linux, Apache, MySQL, Perl or Python or PHP), which summed up what most of us used to build Web 1.0. I've spent the last few months reading and skimming as many new technology books as possible, and I've narrowed down the list of things I need to become proficient in to understand how Web 2.0 works. What I still need is a catchy acronym. Here's the list:
XHTML. This is basically HTML with some really prissy rules, like case sensitivity, and needing to close all tags. There are said to be tools that will make this conversion for you, but I haven't tried any.
CSS. Once you understand the basic rules, CSS is a fun way to design a site, especially if you start with a pre-written stylesheet, so you can just change things like colors and spacing.
XML. While XML itself can be understood in minutes, the many, many ancillary standards and protocols make it tough to find a real-world entry point. I've found RSS programming to be a good starting place.
Ruby. I've been programming with Ruby for a month, and I'm getting to like it more and more. I think it may have the same level of ease and productivity that made the dBASE language so popular in its time.
SQL. Yes, it's still here, and it's still the same, which is the problem. The issue will be fitting the object-oriented data structures of XML into the tables of SQL. The consultants will be paying their mortgages on this one for years.
Javascript. I could say Ajax instead to assure a higher rating on the Web 2.0 Validator, but Ajax really means Javascript that maintains contact with a server without reloading a page.
Frankly, it's not as much as I expected when I started researching Web 2.0 this summer. The good part is that it all fits together easily, and none of the parts are particularly challenging. That's when I am most productive. By the time a language gets as richly and complexly supported as Java, for example, I get bored and confused and move on.
Posted on Tuesday, November 29, 2005
at 12:54 PM (permalink)
Sometimes you realize that two or three different terms actually refer to the same idea, and suddenly the world seems a little clearer. That just happened for me with "microcontent." I had filed that away as one of those buzzwords I would have to decipher eventually. I was much more interested in the idea of individual chunks of data, such as blog posts, floating freely through the datasphere. I was reading a post on Joshua Porter's Bokardo, which led me to a great essay by Terry Heaton. I saw that Terry's idea of "unbundled media," and Googlebase entries, and RSS items are all examples of microcontent. Now I feel better. A lot of walls have collapsed into a large common area.
Posted on Wednesday, November 23, 2005
at 6:50 AM (permalink)
Man, I'm getting co-opted before I have any opt to co. I knew this Web 2.0 stuff was fun and cool, but it looks like the powers that be get it a little too quickly. The Washington Post now has its own tag cloud connected to an RSS aggregator style feed of their own RSS feeds. They are sitting outside their own site and reading and displaying the content in new ways. That is so Web 2.0.
And in a related story, Amazon now has a product Wiki. So in the past 2 weeks Amazon has added discussion fora, user tagging, and wikis to their product pages. Could they be a little crazed about collecting user content?
Posted on Tuesday, November 22, 2005
at 2:17 PM (permalink)
I've gotten way ahead of what I really know about. Before I start building an API based on XML and compatible with RSS and Atom, I'd better spend some time reading about all of these protocols. Besides, it's a rainy November afternoon in Boston.
Posted on Tuesday, November 22, 2005
at 2:09 PM (permalink)
I finished the coding for tags on this and the Ruby site. I even have a simple tag cloud in the navbar. These tags are still only entered by me, but I'll have user tags eventually. I keep coming back to Joshua Schachter's comment that tags are about memory more than categorization. I'm trying to lose that rigid relational database kind of thinking. Once I have a full Ruby based version of this site I'll be able to tie into other tag based sites. For now these pages are still static html that is recreated and uploaded to this server every time I make a new post. I'll watch the stats and see if anyone actually uses the tag pages.
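For readers wondering what a simple tag cloud involves, here is a minimal Python sketch of one common approach: count how often each tag is used and scale the font size of its link accordingly. It is not the code behind this site. The post data, the /tags/NAME.html URL scheme, and the 80 to 200 percent size range are all assumptions made for the example.

```python
from collections import Counter
from html import escape
from urllib.parse import quote


def tag_cloud(posts, min_size=80, max_size=200):
    """Return static HTML links whose font size (in percent) grows with tag use.

    posts maps a post identifier to the list of tags entered for that post.
    """
    counts = Counter(tag for tags in posts.values() for tag in tags)
    if not counts:
        return ""
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    links = []
    for tag in sorted(counts):
        # Linear scaling between min_size and max_size.
        size = min_size + (counts[tag] - low) * (max_size - min_size) // span
        links.append(
            '<a href="/tags/%s.html" style="font-size:%d%%">%s</a>'
            % (quote(tag), size, escape(tag))
        )
    return " ".join(links)


if __name__ == "__main__":
    # Hypothetical posts and tags, invented for illustration.
    sample_posts = {
        "post-1": ["rss", "ruby"],
        "post-2": ["rss"],
        "post-3": ["rss", "opml"],
    }
    print(tag_cloud(sample_posts))
```

Because the output is a plain string of links, it fits a static site: the cloud can be regenerated along with the rest of the pages each time a post is added.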
Posted on Monday, November 21, 2005
at 9:03 PM (permalink)
I haven't worked with the issue of synchronization between clients or between multiple servers, so I can't evaluate the merits of Microsoft's new Simple Sharing Extensions to RSS. Alex Barnett has a good roundup of SSE posts. There is a lot of cynicism so far, but it will be hard for the Open Source advocates to ever accept anything from Microsoft. I just like the idea of the folks at Google having to lose sleep catching up on anyone else's announcements. There is a huge amount of FUD flying in all directions, but the mud is all landing in the right general directions. Eventually this will all turn into products.
Posted on Monday, November 21, 2005
at 2:39 PM (permalink)
Ever since I became aware of the Web 2.0 meme I've been telling people that Dave Winer was one of the pivotal forces behind this new wave, maybe the central force. Everyone would have to admit that with GoogleBase turning out to be the world's biggest RSS database, and Ray Ozzie announcing Microsoft's synchronization and replication protocol based on RSS, Dave Winer is having the best week ever! Ozzie's announcement letter can only be described as effusive in his praise of Winer's role:
What we really longed for was "the RSS of synchronization" ... something simple that would catch on very quickly.
Using RSS itself as-is for synchronization wasn't really an option. That is, RSS is primarily about syndication - unidirectional publishing - while in order to accomplish the "mesh" sharing scenarios, we'd need bi-directional (actually, multi-directional) synchronization of items.
But RSS is compelling because of the power inherent in its simplicity.
Can SSE be used with Atom? This version of SSE does not define extensions to Atom. Nevertheless, in principle these extensions could be used in Atom.
In essence, by connecting these dots between what we'd done to extend RSS and his vision for OPML, Dave's catalyzing a new form of decentralized collaborative outlining.
Microsoft and Google are being maneuvered into a massive game of chicken. I'll show everyone my Office data if you'll show your search data, and Dave is instigating it. My question is, what comes next, Dave? What are you working on for the wave after this, because I think this one is going to be pretty condensed.
Lest anyone reading this get the wrong idea, I should also make it clear that Dave and I haven't spoken in a couple of years and I'm hardly a sycophant, but that doesn't diminish my estimate of his influence on where the computer industry is headed. For right or wrong, we're riding the RSS train now.
Posted on Sunday, November 20, 2005
at 8:33 PM (permalink)
I've been thinking about rebuilding the architecture and some of the design of this site to adapt to tags and XML. I'm starting to see the site as a large feed reader for my own content. The intriguing part is that if I rebuild this site to work directly off of my RSS feed then it will work on anyone's feed. The site becomes simply a database app for a standard type of data. I've always thought of websites as the result of database programs, but the more I grok RSS as a delivery and storage mechanism the more opportunities I see for working with it as the core architectural structure rather than an export or import protocol. Hopefully these ideas will become more clear as I build the next iteration of this site.
Posted on Sunday, November 20, 2005
at 8:20 PM (permalink)
How do we know the origin of a blog posting after it leaves its author's website? If combining a collection of posts, remixing them based on an algorithm or community assessment, and distributing them as a new RSS feed is the model we are ready to embrace, how do we detect and prevent the inevitable fraud? For example, how do I know a map mashup that claims to be delivering an unbiased search engine's results combined with a map isn't actually eliminating selected matches for profit, censorship, or malice? How do I know if a news item with the URL of a famous news source is real or a press release if I don't get it from the news source's site? Of course we've had feeds for years, but after several layers of remixing, the purity of the stream is going to be questionable.