Darwinian Web
Adam Green's thoughts on the evolution of the Internet

Posts tagged as: semanticweb

A CTO's guide to Web 2.0

Posted on Saturday, April 1, 2006 at 11:20 AM (permalink)

A couple of days ago I had breakfast with a former Chief Technology Officer of a REALLY big telco. He had attended the RSS Alley Geek Dinner the night before, and I could tell that even though he was one generation ahead of me, we had a similar take on software and computer technology. He was in Boston to have meetings with various people as a way of learning more about Web 2.0, so I volunteered to get together with him the next day to share my definition from a fellow CTO's perspective. I won't give his real name, because I didn't ask his permission, and this post isn't really about him. It is more about what any CTO needs to consider when trying to run a software development effort in the current Internet environment. For the purpose of this essay, I'll call him Jack.

The funny thing is that Jack's previous company had about 4,000 times more employees and sales than my company, yet we had exactly the same concerns about the new philosophy of development and business surrounding Web products. The insane thing is that Jack's company was valued at only 100 times that of my company when we got acquired, but that was the craziness of February, 2000.

I talked to Jack about four broad areas of change that any CTO needs to think about, but they all came down to one basic issue: a lack of control. It isn't that CTOs have to be control freaks, although they should be. It is a CTO's job to think ahead to what can go wrong, and try to make sure those blocks don't interfere with whatever technology tasks the company needs to accomplish. In a way, a CTO is like the lawyer for a company's technology, always looking for pitfalls well before they are reached. Web 2.0 forces a company to adopt the one thing any good CTO should loathe: dependencies. You have to allow your company to be dependent on other people's code, their voices, their data, and their personal motivations that can't necessarily be overridden by money. Let me go through each of these dependencies:

  • Open Source. While much of Web 1.0 was built using Linux, Apache, Sendmail, and languages such as Perl and PHP, the philosophy of Open Source didn't become pervasive until the turn of the century. There are now Open Source components throughout a typical Web 2.0 application. For example, collective voting has applications in many areas beyond the traditional uses in sites like Digg.com or Reddit.com, and is now available through the Pligg software, which is Open Source. Other common Open Source components are found in blogging tools and wikis. Companies also have to consider the desire of their programmers to release their work for the company as Open Source. While this has obvious implications for intellectual property, it also creates a labor force of more productive programmers, because they can bring portions of their code with them when they change jobs.

    Jack was understandably concerned about quality control when using code that isn't delivered and supported by a commercial vendor, but a larger and more open community of users can deliver a more robust solution than one used by a few hundred or even a few thousand commercial customers. Building with Open Source code also means faster development cycles, so instead of working for years and trying to deliver a perfectly specified and tested system, a more incremental approach based on existing components allows you to work towards a solution in an evolutionary fashion. The reality is that a project that takes several years to reach "perfection" has so much invested in it that it may be impossible to stop and rebuild when problems are discovered, so they are just built over with ever increasing layers of patches. In the long run, a CTO using Open Source code does have to reject the traditional Not Invented Here syndrome, and accept a greater dependence on other people's code. The trade-off in shorter development cycles is worth it in my opinion.
  • Blogs. Web 2.0 also brings about a shift in the way a company's technology efforts are communicated to the outside world. Instead of thinking in terms of versions that are announced at long intervals through a traditional PR campaign, the use of corporate blogs helps customers stay much closer to the development process. This also means a cluster of independent bloggers interested in an area of technology can form around the companies working in this space. These tech bloggers have replaced the traditional trade press. It means that a CTO is dependent on voices that are not as tightly controlled as in the past, but these bloggers can also act as an important buffer when problems arise by explaining to the wider circle of users that the company is indeed working on solutions.
  • XML. The most common form of XML currently in use is RSS, but OPML is on the rise, and newer formats such as Atom, along with RDF based ones such as RSS 1.0, are also gaining ground. In the long run, some form of global database resembling the Semantic Web will materialize. The key to all of this use of XML is the availability of a company's data outside the corporate database. While much is made of the emergence of APIs, it is the XML data that is available from these APIs that will cause the real changes in technological architectures. Just as Web 1.0 was built on loosely joined websites connected through HTTP and HTML, Web 2.0 will be built on loosely joined data structures based on data produced by many sources. So instead of a CTO building an application on a tightly controlled proprietary database schema, it will be necessary to plan for dependencies on data over which there is no control.

    As a long-time database guy, Jack found that disturbing. I share his concern, but what must be understood is that users will demand this type of cross application sharing of data, because it is their data that is being combined from multiple sources. Sure, there is a greater possibility of failure, and a CTO must handle this by designing for soft failures instead of hard crashes. The one great fallacy that the XML proponents adhere to is the perfectibility of XML data. Their motivation in building a Semantic Web is the goal of a Web that isn't filled with invalid data. I don't think that will ever happen, so a CTO should plan for badly formed XML, as is already the case in the RSS world.
  • Fear of excessive valuation. The traditional way to motivate developers, especially in a start-up situation, has been to offer them stock options. While that is still useful, the arithmetic has changed, because programmers who went through the Dotbomb have a deep fear of hype. A business journalist who was a former Dotcom employee recently told me that she still suffered from post-traumatic stress disorder that prevented her from considering a start-up job. In the Web 1.0 period, there was an expectation of an IPO that would yield valuations in the hundreds of millions of dollars. If a Web 2.0 company gets acquired for $10 - $20 million, that may be great for the founders, but it doesn't do much for a coder with a few thousand options. It is not just that the value of software companies has dropped. There is now deep suspicion of any claims of higher valuations in the future. Without the promise of getting rich, it is harder to persuade developers to put in the 18-20 hour days that helped build Web 1.0. This means that the CTO is more dependent on an employee's personal motivations, such as being able to build code that can earn them greater fame in the Open Source world.
Notice that I haven't mentioned any of the popular themes of Web 2.0, such as social bookmarking and tagging. These have their place, but I'm skeptical that there really will be a mass market for meta-meta-bookmarking sites. I don't think that the real contribution of Web 2.0 will be these specific areas of functionality. I do believe, however, that the tools and techniques I have described here will be used to build the next generation of products and sites, and that these will be what are used by the generation of users who are entering college now, and will be entering the workforce 4 to 5 years from now.
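
The advice in the XML bullet above, to plan for badly formed XML and to fail softly rather than crash, can be made concrete. What follows is only a minimal sketch in Python using the standard library parser. The function name feed_titles and both sample feeds are invented for illustration, and a real aggregator would likely go further, for example by trying to recover partial content from a broken feed.

```python
import xml.etree.ElementTree as ET

def feed_titles(xml_text):
    """Return item titles from an RSS 2.0 document, or [] if it will not parse."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Soft failure: one broken feed yields an empty result, not a crash.
        return []
    # Tolerate items without a title instead of assuming a perfect feed.
    return [item.findtext("title", default="(untitled)")
            for item in root.iter("item")]

# Hypothetical sample data: one well-formed feed, one with a tag left unclosed.
good = ("<rss version='2.0'><channel>"
        "<item><title>Hello</title></item><item/>"
        "</channel></rss>")
bad = "<rss><channel><item><title>Unclosed</channel></rss>"

print(feed_titles(good))
print(feed_titles(bad))
```

The design choice is that a bad feed costs the application one source of data for one cycle, and nothing else. Whether an empty list or a logged warning is the right soft failure depends on the product.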

The second coming of the Web

Posted on Monday, March 27, 2006 at 7:15 AM (permalink)

I've been watching Danny Ayers' attempts to have Semantic Web people consider outputting RSS and OPML data or using OPML tools to visualize Semantic Web data. I respect and applaud his efforts, but I wasn't surprised by the universally negative reactions. I know that users of RDF based formats have tremendous disdain for RSS and OPML as being poorly defined, which they admittedly are. What I was shocked by was the tone and terms used in the responses. There is an almost religious sense of RSS and OPML as evil, and a possible source of spiritual contamination. Now Semantic Web people are extremely intelligent, as they'll be quick to admit, so what could have happened to them to cause such an adverse reaction to what is simply a set of formats for text files? It is easy to point to the creator of RSS and OPML as the root of this negative feeling; he certainly is mentioned often in the responses to Danny's pleas. But that is just scapegoating. I think the visceral emotion exhibited, almost a form of terror, at the idea of having to co-exist with RSS and OPML, has a deeper cause that fits into the religious fervor with which it is voiced.

When Tim Berners-Lee first gave mankind the Web, he made a tragic mistake. He granted us free will to use less than perfect HTML. His tools, and the tools of those to follow him, allowed users to develop sinful habits based on ignorance and sloth. The result was a Web of corrupt data, in which malformed tags abounded. This great fall from grace by the users of the Web prevented it from ever attaining the state of perfection desired by all computer scientists, a completely machine readable database. So the disciples of Berners-Lee, with his blessing, developed XML as a way of wiping the Web clean of the sinful and broken HTML, and replacing it with perfectly specified and implemented data. Now, just as the second coming of the Web is in sight in the form of the Semantic Web (well, it's been in sight for years, but we'll put that aside), here comes a poorly specified corruption of XML, what Danny jokingly calls "quasi-XML", that threatens to again lead mankind astray. Is it any wonder that Semantic Web devotees are reacting as if RSS and OPML are the work of Satan?

Do you find all of this over the top? Good. That is the point of satire. I find the reactions of Semantic Web people over the top as well. It's just data. Converting from one format to another is so trivial that even I can write the code to do it. Surely anyone who can code for RDF could import or export RSS and OPML. Why should anyone do it? As Danny keeps pointing out, there are millions of RSS users. In time many of them are likely to use OPML as a container for RSS. There is no reason why OPML can't be viewed as a bridge between these two sides of the Web. But then if I was in league with the devil, I would say something like that, wouldn't I? After all, my namesake was led astray by the devil once before.
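
To give a sense of how small such a conversion can be, here is a rough sketch in Python that turns the items of an RSS 2.0 feed into a flat OPML outline. The function name rss_to_opml, the sample feed, and the example.com URL are all placeholders. The sketch assumes a feed with a single channel element and uses the OPML 2.0 "link" outline type with a url attribute. It is an illustration, not a complete or validated converter.

```python
import xml.etree.ElementTree as ET

def rss_to_opml(rss_text):
    """Convert RSS 2.0 items into a flat OPML outline, one <outline> per item."""
    channel = ET.fromstring(rss_text).find("channel")
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = channel.findtext("title", default="")
    body = ET.SubElement(opml, "body")
    for item in channel.iter("item"):
        # Each RSS item becomes a link-type outline node.
        ET.SubElement(body, "outline", {
            "text": item.findtext("title", default=""),
            "type": "link",
            "url": item.findtext("link", default=""),
        })
    return ET.tostring(opml, encoding="unicode")

# Hypothetical sample feed; the URL is a placeholder.
sample_rss = ("<rss version='2.0'><channel><title>Darwinian Web</title>"
              "<item><title>Why OPML?</title>"
              "<link>http://example.com/why-opml</link></item>"
              "</channel></rss>")

print(rss_to_opml(sample_rss))
```

Going the other direction, or to and from an RDF serialization, involves more decisions about what to keep, but the mechanics are the same: parse one tree, build another.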

Why OPML?

Posted on Friday, March 24, 2006 at 5:15 AM (permalink)

"OK," the answer comes back, "we can now see what you are doing with OPML, but why bother? OPML is poorly specified, it isn't nearly as complete as an RDF based standard like the Semantic Web, and it's inevitably going to be the center of political firestorms because of who created it." Let me present the basic arguments that persuaded me to spend so much time supporting the format:

  • My first blog post to get any links pointed out that RSS was helping to explode the Web's architecture. What I meant by that is the growing trend to make all Web content available via RSS. This effectively everts the traditional website, putting the content outside in a machine readable form. If you accept that RSS will be a major architectural component of the future Web, whether or not the users know they are using technology based on RSS, then OPML as a container for multiple RSS text streams deserves attention. OPML allows us to easily create and consume reading lists of multiple RSS feeds, pushing back the limits of infoglut by at least an order of magnitude. If you can read 10-20 blogs on their websites, and 100-200 RSS feeds in an aggregator, then reading lists each containing 100-200 feeds allow us to juggle over 1,000 feeds. Not easily, but it is at least possible. It isn't the final solution, but the fallacy is believing there is an ultimate solution in technology. It is a journey, not a destination. RSS and OPML are just steps. RDF is another step, admittedly a big one. The Semantic Web, if we reach it, will just be a temporary resting place.
  • OPML is not just a container for RSS. It is a general purpose outline structure (hence the name Outline Processor Markup Language), which allows the construction of hierarchies based on any type of XML data. The new OPML 2.0 specification will make that possible through the use of namespaces. This means that any XML formatted data can be incorporated into an OPML outline. There are two big areas of Web data that fit into this model: microcontent and API results. If microcontent, meaning individual molecules of data floating free in the bloodstream of the web, is to become a viable delivery mechanism for information, then a structural equivalent of a protein is necessary to package these molecules in a consumable form. OPML is a first step towards that structure. As for API results, I've already performed a simple experiment demonstrating the use of OPML as a container for this type of data. I plan on doing a lot more work in this area. OPML data combined with a good viewer makes the construction and delivery of mashup data a trivial task.
  • If we need a building block for the next generation of the Web, why even stop at OPML? Why not go to something perfect like RDF, asks the RDF crowd. OK, maybe I'm mischaracterizing some of them, maybe they just think it is many orders of magnitude better. I tried reading about RDF and the Semantic Web, and I had to stop because I was afraid I was coming down with narcolepsy. I don't think I'm smart enough to grok the Semantic Web yet, and anyone who reads this blog knows I think I'm pretty smart. I need to work my way up to that level of complexity, and the way I do that is by blogging, and writing code, and helping to design tools with a simpler, more accessible format like OPML. I'm conceited enough to believe that I'm at least as smart as the average computer user, so if I need to work myself up to the Semantic Web one step at a time, they probably do also.
  • Does this mean that all of the work on OPML will eventually be wasted? Will it all have to be thrown away? First of all, after working in the software industry for 26 years, I know that all software is eventually thrown away. When I moved out of my last house, my wife made me throw away an entire dumpster full of software packages. At least now it can all be done by just wiping a hard disk. But that doesn't mean the present OPML development work is a waste. OPML tools are built to work with XML data, and despite its flaws, that is what OPML is inside. Converting from OPML 2.0 to OPML with namespace extensions to RDF is an evolutionary process, which is the way I believe that all software is created in practice. As long as I've slipped into Darwin territory, let me repeat one of his favorite mottos: "Natura non facit saltum." Nature does nothing in jumps. I believe that since software is a product of human nature, it also moves in slow, often inefficient, and gradual steps. I am fully convinced that the virtual product line I am helping to construct around OPML will make the transition to a fully XML based Web more smoothly and with more users than waiting several years until the computer scientists perfect the Semantic Web.
  • Finally, we get to the political issue. Sure, there are firestorms around RSS and OPML. Are they more vicious than the ones around RDF? I have no idea, but if RDF is being created by humans, then there are fights, and cliques, and petty jealousies in the RDF world also. If you want to see how someone is able to overcome the name calling surrounding the battle between OPML and RDF, read Danny Ayers' blog. He's been doing an amazing job of trying to get RDF people to output OPML and OPML people to see how there is a better world on his side of this debate. I am learning a lot from Danny, and if I can't work out a way to get him to Boston for OPML Camp, I plan on flying over to Italy to see if he can teach me about the Semantic Web without me falling asleep.
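
The first argument above treats OPML as a container for many RSS feeds. As a small illustration of what consuming such a reading list involves, the Python sketch below collects every feed URL from an OPML document, however deeply the outlines are nested. The function name feed_urls, the sample list, and the example.com URLs are made up for the example. It assumes the common convention that a feed is an outline node carrying an xmlUrl attribute.

```python
import xml.etree.ElementTree as ET

def feed_urls(opml_text):
    """Collect every feed URL in an OPML reading list, at any nesting depth."""
    root = ET.fromstring(opml_text)
    # iter() walks the whole tree in document order, so folders of feeds
    # and top-level feeds are handled the same way.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

# Hypothetical reading list: one folder holding two feeds, plus a loose feed.
reading_list = """<opml version="2.0">
  <head><title>Sample reading list</title></head>
  <body>
    <outline text="Semantic Web">
      <outline text="Feed A" type="rss" xmlUrl="http://example.com/a.xml"/>
      <outline text="Feed B" type="rss" xmlUrl="http://example.com/b.xml"/>
    </outline>
    <outline text="Feed C" type="rss" xmlUrl="http://example.com/c.xml"/>
  </body>
</opml>"""

for url in feed_urls(reading_list):
    print(url)
```

Folder nodes without an xmlUrl are skipped, which is what lets the same outline serve both as a human readable hierarchy and as a machine readable subscription list.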

Green's Law: A text format of sufficient complexity might as well be binary.

Posted on Saturday, January 14, 2006 at 3:05 PM (permalink)

I've been trying to work my way through this book, but I keep falling asleep. I finally started reading it in bed so I could slip into unconsciousness more comfortably. I don't blame the author. Well, not completely. How could anyone make the following material interesting:

To make a statement about another statement, for instance, you have to create a statement-type resource that collects three other statements: one saying that the target statement has a certain resource as its subject, one that the target statement has a certain other resource as its predicate, and so on. Only then can you make assertions about this new statement-type resource.
The funny (or sad) thing is that the Amazon reviews describe this book as "a breath of fresh air." The complexity of the Semantic Web's conception of the future can be seen from this standard illustration by Tim Berners-Lee.

Trying to implement this model results in a collection of text formats that are virtually indecipherable, hence Green's Law. I'm not going to give up on this book or on learning more about the Semantic Web. It makes a fascinating contrast with the organic development of the Web that is going on around tags and other aspects of social computing.
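
The mechanism in the quoted passage is what RDF calls reification. A rough sketch in Python, using plain tuples rather than any RDF library, shows how much scaffolding one statement about a statement requires. The rdf:type, rdf:subject, rdf:predicate, and rdf:object terms come from the standard RDF vocabulary. The function name reify and every ex: name are hypothetical and exist only for this illustration.

```python
# Namespace of the standard RDF vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement_id, triple):
    """Return the triples that describe `triple` as a resource named `statement_id`."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

# Hypothetical target statement: "this post has this author".
target = ("ex:post", "ex:author", "ex:adam")
triples = reify("ex:stmt1", target)

# Only now can something be asserted about the statement itself.
triples.append(("ex:stmt1", "ex:claimedBy", "ex:reviewer"))

for t in triples:
    print(t)
```

One simple claim about one simple statement comes out as five triples, each with a long URI in it once the namespaces are expanded, which is the kind of growth Green's Law is pointing at.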

Book Note: Explorer's Guide to the Semantic Web

Posted on Monday, January 9, 2006 at 5:53 PM (permalink)

My kids often accuse me of over-analogizing, but sometimes the analogy is just staring you in the face. I was reading this book on the Semantic Web this morning while in my mechanic's waiting area. When I got to the section on Resource Description Framework (RDF) allowing software agents to automatically explore websites to solve all types of sophisticated queries, I thought to myself "this will never happen." At that moment I looked up and saw the mechanic sitting in my car's driver's seat reading the owner's manual. How did he know where to find it? It was in the glove compartment, of course, where everyone keeps it. How did he know there would be an owner's manual? All cars come with one. The hope of Semantic Web supporters is that it will emerge in the same way, through a gradual growth of standard resources and behaviors. The real question is how long that will take. We probably have to think in terms of human lives, not Internet time. The mechanic was probably in his late thirties, and the tradition of user manuals in glove compartments is at least as old as he is, maybe older. When viewed with this time frame, the ideas in this book may actually come to pass. I'll have more to say about the Semantic Web and this book when I'm done reading it.