Darwinian Web
Adam Green's thoughts on the evolution of the Internet

Posts tagged as: architecture

Is a planet a splog?

Posted on Friday, December 16, 2005 at 9:43 AM

I continue to be puzzled by the ethics surrounding RSS aggregators. I have been planning on building an RSS 'river of news' aggregator for Ruby, and my research has brought up the aggregators called planets, which aggregate full feeds from a large number of blogs. I've looked at many of these planet sites, and none of them have a description of the relationship between the aggregator and the aggregatees. Did they all choose to be included or did the aggregator simply add them to a list? Are these planets really splogs? They don't appear to be, because they aren't plastered with ads.

Moving from it to them

Posted on Saturday, December 3, 2005 at 11:00 AM

I just had an interesting conversation with my 22-year-old daughter, who is having problems buying some tickets for Jerry Seinfeld online. At first the theater's entire website was down. Now it is working, but the link for buying tickets is not responding. She doesn't understand why it isn't working when the page loads fine. I tried to explain that you can't think about "it" being broken, but rather "one of them". The theater is probably on a different server from the ticketing system, so one can be working while the other is down. She has been using the Web for 10 years, so she certainly understands that sites can go down, but when distributed portions of a site fail, this is often hidden by the page design. I know this is going to be a major source of confusion for users when sites become dependent on external web services. The dirty little secret of Web 2.0 is that a loosely joined architecture is also more error-prone.

I remember that early users of the Web in about 1996-97 were often confused about which site they had actually purchased things from. I ran a software downloading site called DaveCentral, and people were constantly emailing us thinking that they had bought software from our site, when they had actually followed a link to the product's publisher. They were also confused about whether they were still on Yahoo if they followed a link from there. This may sound silly to today's experienced Web users, but I'm sure that the type of aggregated application that is the basis of much of Web 2.0 will be the cause of lots of user confusion. This isn't a fatal flaw, but it will be an obstacle to overcome. I can imagine the first time a major portion of Google, like the maps, glitches and applications all over the world crash.

When I added the code to this blog to notify blog ping servers of new posts, I found that about 40% of the servers failed for one reason or another. I didn't bother debugging each one, since a lot of pings got through, so I was assured of good search engine coverage anyway. This type of "percent yield" on RSS feeds and APIs should be part of the design, user interface, and user expectations of Web 2.0.
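To make the "percent yield" idea concrete, here is a minimal Python sketch. It is not the code this blog actually runs. The names ping_all and meets_target are hypothetical, and so is the send callable, which stands in for whatever really delivers a ping and is assumed to raise an exception when a server fails. The point is only that each failure is recorded and counted rather than treated as fatal.

```python
from typing import Callable, Iterable


def ping_all(servers: Iterable[str], send: Callable[[str], None]) -> dict:
    """Ping every server, recording failures instead of stopping at the first one.

    `send` is a hypothetical callable that delivers one ping and raises on failure.
    """
    delivered = []
    failed = {}
    for url in servers:
        try:
            send(url)
            delivered.append(url)
        except Exception as exc:  # any failure simply lowers the yield
            failed[url] = str(exc)
    total = len(delivered) + len(failed)
    percent_yield = 100.0 * len(delivered) / total if total else 0.0
    return {
        "delivered": delivered,
        "failed": failed,
        "percent_yield": percent_yield,
    }


def meets_target(result: dict, minimum_percent: float = 50.0) -> bool:
    """Decide whether enough pings got through to call the run a success."""
    return result["percent_yield"] >= minimum_percent
```

A caller would pass in its list of ping server URLs plus a real send function, then show or log the yield instead of reporting a single pass/fail. The 50% default threshold is an arbitrary assumption, chosen only for illustration.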

The urge to scale

Posted on Saturday, November 19, 2005 at 8:26 AM

I guess being a dot-com CTO is in my blood. I like to think through various architectures for managing groups of websites. You need to lock down a model for scaling early or you face big problems if you ever need to handle large amounts of traffic. The real key is a logical architecture for domain names. For example, if I thought I was going to serve a lot of podcasts, I would create something like data.darwinianweb.com or podcasts.darwinianweb.com. That would allow me to move that part of my content where it could be best and most cheaply served.

Right now I have darwinianweb.com to handle this main blog where I plan on covering general issues on the changing form of the Internet. I also have ruby.darwinianweb.com, which is a blog that allows me to go into as much depth as I want about learning the Ruby programming language.

I don't want to have too many subdomains. Categorization can be handled more easily and on a larger scale with tags, which I am working on adding. At the same time, a separate domain creates more of a distinct place or channel of thought for the user. People automatically switch contexts when they change to a new site, just like a new TV channel.

I plan on having only a few more content subdomains, such as ajax.darwinianweb.com and xml.darwinianweb.com. Programming languages or standards like XML are so broad and have so many supplementary tools and resources that they work better in their own site or subsite.

I'll also be creating separate domains for exchanging data with other servers. I don't know what will happen with my API experiment, or whether it will become a target for abuse, so I'll also create api.darwinianweb.com to serve API calls. It isn't a matter of large amounts of traffic. I want to be able to shut down the API server easily. Of course, that brings up the issue of dependency on critical servers in the distributed environment that Web 2.0 calls for.

One solution, which also comes easily in an XML/RSS-based communication model, is to cache the most recent messages as text files, so the most recent result of an API call can be reused instead of calling the API again.
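Here is a rough Python sketch of that caching idea, under a few assumptions of my own. The function name cached_call, the call_api callable, and the five-minute default max_age_seconds are all hypothetical. The sketch reuses the saved text file while it is fresh, calls the API when it is stale, and falls back to the stale copy if the API is down.

```python
import os
import tempfile
import time
from typing import Callable


def _read(path: str) -> str:
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def _write_atomically(path: str, body: str) -> None:
    # Write to a temp file in the same directory, then swap it into place,
    # so a reader never sees a half-written cache file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(body)
    os.replace(tmp_path, path)


def cached_call(call_api: Callable[[], str], cache_path: str,
                max_age_seconds: float = 300.0) -> str:
    """Return the API response, preferring a recent copy saved as a text file.

    `call_api` is a hypothetical callable that returns the response body
    (for example an RSS or XML document) and raises if the server is down.
    """
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            return _read(cache_path)  # fresh enough: skip the API entirely
    try:
        body = call_api()
    except Exception:
        if os.path.exists(cache_path):
            return _read(cache_path)  # API is down: reuse the stale copy
        raise  # nothing cached yet, so the failure has to surface
    _write_atomically(cache_path, body)
    return body
```

With something like this in front of every outgoing call, shutting down an API server degrades its consumers to slightly old data rather than breaking them outright, at least until the cached copy is too stale to be useful.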

These issues will be played out on a much larger scale throughout the web. Chains of API dependencies will play interesting roles in the future.