Darwinian Web
Adam Green's thoughts on the evolution of the Internet

Posts tagged as: api

scrAPI for OPML

Posted on Wednesday, March 22, 2006 at 6:34 AM (permalink)

Now there's a title to conjure with. John Musser has an interesting ProgrammableWeb post on the use of screenscraping as a poor man's API. The idea is to use a script to parse a web page, and then return some specific set of data in an XML format. John credits the idea to Thor Muller, who provides some excellent details on the pros and cons of a scrAPI vs. an official API. Thor in turn recognizes Paul Bausch for coining the term SCRAPI in 2002. John notes that the result of the scrAPI can be returned "in some cleaner XML format." Yeah, like ... uh ... OPML?

A scrAPI is clearly your only alternative for a site that doesn't offer an API, but surely once an API is available you wouldn't need to adopt the scrAPI approach. Except when the API provider has a limit on the number of times you can call the API each day, and refuses to respond to email requests on how to get beyond the limit. I ran into exactly this problem with Technorati when I built my Tech Memeorandum - Technorati mashup. I hit an undocumented limit on daily Technorati API calls, and despite repeated emails to the company, including directly to Dave Sifry, I never got a response. I've been reading up on PHP the last few days, and this sounds like exactly the type of example project I should try out. Yes, it is a terrible kludge, but then what isn't on the Web? I think scrAPIs that return OPML can become a useful way of building a mashup, and I'll experiment with it until a better coder can implement the idea more cleanly.
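
A minimal sketch of the scrAPI idea, written in Python rather than the PHP mentioned above. It parses a fragment of HTML, keeps only the links, and returns them as an OPML outline. The sample HTML, the scrape_to_opml name, and the use of type="link" with a url attribute are assumptions made for illustration; a real scrAPI would fetch the page over HTTP and need parsing rules specific to the site being scraped.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (text, href) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_to_opml(html, title):
    """Turn the links found in an HTML string into an OPML document."""
    collector = LinkCollector()
    collector.feed(html)
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, href in collector.links:
        ET.SubElement(body, "outline", text=text, type="link", url=href)
    return ET.tostring(opml, encoding="unicode")


# Hypothetical page fragment standing in for a fetched web page.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/a">Blog A</a></li>
  <li><a href="http://example.com/b">Blog B</a></li>
</ul>
"""

print(scrape_to_opml(SAMPLE_HTML, "Scraped links"))
```

The weak point of any such script is the parsing step: it breaks whenever the site changes its markup, which is the main argument for an official API when one is usable.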

Update: I guess it isn't too surprising that Dave Sifry would have a vanity feed at Technorati. At least I assume he has, because soon after this post pinged Technorati, I got an email response from him to my earlier messages. If they actually do give me a larger daily allotment of API calls, I'll be able to start posting some Technorati API scripts on my programming blog. If not, I'll just have to start scraping. Getting something done because of a blog post is fun, but it encourages bad behavior. This is probably how Scoble turned into the spoiled kid he can sometimes be. I love it when he starts yelling on his blog, "This blog is broken, and somebody better fix it right now!" I wonder if his wife lets him get away with that stuff at home, "I'm hungry, and someone better feed me right now!"

Update: Sean O'Hagan emailed me to point out the scrape command he wrote for YubNub. I guess this lazyweb thing really works.

Technorati - Memeorandum mashup

Posted on Sunday, March 5, 2006 at 4:10 PM (permalink)

I finally got a chance to explore the idea of storing API results in an OPML file. The window below shows the results in the Optimal Browser. You can also retrieve the raw OPML file here. To create this OPML file I took the list of blogs that my Ruby script has been collecting from Tech Memeorandum and searched the Technorati API with each blog's URL. I then combined the blog's rank, number of blogs linking in, and the most recent blog posts linking to the URL into a single OPML outline. I still have a number of programming details to work out, so I won't be publishing the source code for this for another few days, but I will start describing all the details of the API calls and how to construct the OPML file on my mashup blog. One problem I discovered is that Technorati apparently limits the number of API calls per day, a fact that doesn't seem to be mentioned anywhere on their website. Until I can get someone there to raise this limit, I will have to leave this OPML file as it is. With a higher limit I hope to have it refreshing every hour.
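
The combining step described above might look roughly like the following Python sketch. It is not the Ruby script mentioned in the post, and it makes no Technorati calls: the BLOGS data, the field names, and the outline layout are all invented for illustration. It only shows how a rank, an inbound-blog count, and a list of recent linking posts can be nested under one outline node per blog.

```python
import xml.etree.ElementTree as ET


def add_blog_outline(parent, blog):
    """Add one blog, its stats, and its recent inbound posts under parent."""
    node = ET.SubElement(parent, "outline", text=blog["name"], url=blog["url"])
    ET.SubElement(node, "outline", text="Rank: %d" % blog["rank"])
    ET.SubElement(node, "outline", text="Inbound blogs: %d" % blog["inbound_blogs"])
    posts = ET.SubElement(node, "outline", text="Recent posts linking here")
    for post in blog["recent_posts"]:
        ET.SubElement(posts, "outline", text=post["title"], type="link", url=post["url"])
    return node


def build_opml(title, blogs):
    """Build a complete OPML tree with one top-level outline per blog."""
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for blog in blogs:
        add_blog_outline(body, blog)
    return opml


# Made-up values standing in for real API results.
BLOGS = [
    {
        "name": "Example Blog",
        "url": "http://example.com/",
        "rank": 1234,
        "inbound_blogs": 56,
        "recent_posts": [
            {"title": "A post that links in", "url": "http://example.org/post-1"},
            {"title": "Another linking post", "url": "http://example.net/post-2"},
        ],
    },
]

print(ET.tostring(build_opml("API results", BLOGS), encoding="unicode"))
```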

Four month anniversary

Posted on Monday, February 27, 2006 at 7:06 PM (permalink)

This weekend I reached my 4 month anniversary as a blogger, so it seems like a good time to pause and take stock. First a few numbers:

  • I am now maintaining 4 blogs: Darwinianweb.com, Mashup.darwinianweb.com, Ruby.darwinianweb.com, and OPMLcamp.com. One Ruby project turned into RubyRiver.org, which is an online RSS aggregator that runs automatically. So I have a total of 5 active sites.
  • I have written 400 posts across these blogs, with the majority here on Darwinianweb.com.
  • The total traffic for all of these sites now averages 1,500 visitors a day. This includes RSS subscriptions, but not the people who read RSS feeds at online aggregators.
  • Technorati.com ranks the Darwinianweb.com domain at 13,506, and seems to lump all the subdomains together. That puts it in the top 0.05% of the 29 million blogs that Technorati tracks. I guess that's pretty good for 4 months, but it also demonstrates how little traffic the average blog out of those 29 million actually gets.
I wasn't sure what I wanted to focus on when I started, but felt that the general area of Web 2.0 would be most interesting. I've looked at a lot of Web technologies, but my focus has clearly narrowed down to RSS, OPML, and mashups over the last two months. I've also found myself drawn to the social and political forces controlling the Web and the blogosphere. This may seem surprising for a technologist, but it matches fairly closely the way I ended up spending my time as a dBASE guru in the Eighties and early Nineties. I love software, but software is built by people, and you can't understand why they build what they do if you don't understand their motivations.

One reason why I became a blogger was the hope that it would allow me to meet people who are doing cool things with software and the Web. That has certainly worked out. I've gotten to know dozens of new people in the tech world, many of whom are doing exciting development work or writing interesting things about technology. I also wanted to meet younger developers who I might be able to advise on their products. I get a real thrill from helping to design desktop software and online apps, and I'm now working closely with a few startups with real promise. I don't want to get involved with investing in any new companies yet, so my relationships are purely taking a mentoring role.

Over the next 4 months I'd like to do a lot more work with APIs, help a few products get launched, and see if the OPML Camp idea can go anywhere. I'm finding the Camp phenomenon fascinating. I had to get my capitalist head into the anti-capitalist Open Source space in the late Nineties. Now I have to adapt my control freak experience of running seminars and conferences to the idea of an anarchic model of running an event.

One thing I must try not to do in the coming months is start any more blogs. I have more than enough of those.

Saving API mashup data in OPML files

Posted on Sunday, February 26, 2006 at 8:47 AM (permalink)

Now that I can use the Optimal browser to display OPML files easily on a web page, I can go ahead with an idea I've had for creating mashups of API results. I wrote about this on my mashup blog 2 weeks ago, so you can find the details there. I'm going to start coding now, and as usual the details will appear on the mashup blog as I work them out, and the source code will appear on my Ruby blog. I'll report back here when anything interesting is ready for viewing.

API programming is this week's priority

Posted on Monday, February 13, 2006 at 2:00 PM (permalink)

I let myself get a little distracted with reading lists and blogosphere politics over the last week, but now I have to get to some serious coding to prepare for Mashup Camp next week. That means blogging will be light over here. You can follow my progress on my mashup blog, and I'll post the source for anything I write in Ruby on my Ruby blog. My focus will be on using my Tech Memeorandum XML and OPML files as sources for calling various APIs.

Extending feed grazing beyond simple reading lists

Posted on Friday, February 10, 2006 at 7:03 PM (permalink)

This evening I decided to spend some more time on my Tech Memeorandum mashup. The original goal of this project was to use the list of people and blogs cited by TM as the starting point for experimenting with multiple APIs, such as Technorati and Del.icio.us. I was sidetracked for a while following the path of dynamic OPML reading lists, but now I was again ready to tackle the API issues. I had been stuck on the problem of creating a data structure that could hold the TM blog citations and all of the API results that were based on this list. Only this time I was approaching the problem with a better understanding of OPML, and an appreciation of James Corbett's thinking on this issue.

James coined the term feed grazing, and has been blogging for a while about the ways OPML could be used to tie together disparate information. I decided to reread his blog on this subject, and discovered that he had already solved my problem in a post he wrote today:

"Better again, imagine if Adam's script could generate a multi-level OPML hierarchy with the feeds for the original story in the top level nodes and the referencing blogs leading off those as sub-nodes. Now that would be getting close to the "evanescent, biotic OPML hierarchies" I spoke of yesterday."
This was my solution. Instead of some complex data structure, I could create a simple collection of inter-related OPML files. The result of each API query could go into its own OPML file, and I could modify my current TM reading list to point to all of these lower branches on the tree. I knew that James had written a post based on the awful pun of OakPML that addressed this idea, so I went back and found this rather colorful, but highly useful metaphor:
"You might think of new RSS feed items as the acorns at the extremities of the tree, popping in and out of view like strobe lights flashing as the timelapse runs through the decades. Now picture a little squirrel on the ground looking up, eying those acorns with envy. He is of course a Feed Grazer. Starting out at the root he makes his way along multiple levels in the hierarchy, from sub-node to sub-node until he reaches the acorn (latest feed item). He plucks (grazes) it and then scuttles back down the tree (up the hierarchy) until he comes to another interesting looking fork. Then back towards the extremity to fetch another acorn. And so on, and so on. That, in a... ahem... nutshell is Feed Grazing."
Once I had finished groaning, I could see that feed grazing was more than just a way of reading a list of feeds, it was a metaphor for traversing a potentially complex outline based on OPML. The squirrel doesn't understand the whole tree, he just knows how to find his next acorn. I still have to create a user interface for this collection of OPML files, but I'll leave that for the future. Hopefully, James will come up with the answer when I need it.
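
The squirrel's walk can be sketched as a depth-first traversal that only ever looks at the current node's children. The Python below is a simplified illustration, not code from James or from the mashup: it assumes the whole hierarchy sits in one OPML string, whereas the design described above spreads it across several linked OPML files. The sample outline and the graze function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level hierarchy: a story, the blogs that reference it,
# and each blog's latest item as a leaf.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Grazing example</title></head>
  <body>
    <outline text="Original story">
      <outline text="Referencing blog 1">
        <outline text="Latest post from blog 1"/>
      </outline>
      <outline text="Referencing blog 2">
        <outline text="Latest post from blog 2"/>
      </outline>
    </outline>
  </body>
</opml>"""


def graze(node, path=()):
    """Yield the path of outline texts leading to every leaf (the acorns)."""
    here = path + (node.get("text"),)
    children = node.findall("outline")
    if not children:
        yield here
        return
    for child in children:
        yield from graze(child, here)


def graze_opml(opml_text):
    """Walk every top-level outline in an OPML document."""
    body = ET.fromstring(opml_text).find("body")
    for top in body.findall("outline"):
        yield from graze(top)


for path in graze_opml(SAMPLE_OPML):
    print(" > ".join(path))
```

Because graze never needs the whole tree in view, the same loop would still work if a node pointed at another OPML file that was fetched only when the walk reached it.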

Giving James Corbett proper credit

Posted on Tuesday, February 7, 2006 at 10:08 PM (permalink)

I have a feeling that 'feed grazing' is going to catch on, both the term and the underlying idea of reading feeds based on continually changing OPML files. Since the origin of common phrases is often a source of controversy, I wanted to nail this one down before everyone is using it. Danny Ayers reported it as appearing in a comment on his blog from James Corbett on February 6th, which is true, but the better and earlier source is a post James made on his own EirePreneur blog on January 31st. I also discovered that James coined the term "river of feeds" on his blog two days before I used it on my mashup blog. Damn. I was feeling so good about coming up with that. Anyway, it is good to set the record straight. Now everyone will know where to look when the Wikipedia article is written.

As long as I'm pointing to James' blog, let me link to one more great post in which he compares Steve Gillmor to James Joyce.

Starting a new blog

Posted on Thursday, January 12, 2006 at 9:55 AM (permalink)

Now that I have my Ruby projects planned out for at least a year, I'd like to figure out my next blog. My principal reason for writing these blogs is to learn as much as I can about new Internet technologies. Ruby will likely be my core language for exploring much of this on the server side, but I also need to learn AJAX to handle client-side programming. I had thought about starting an AJAX blog, but that is too limiting. I also want to learn about lots of other stuff:

  • APIs
  • Map programming and related geocoding techniques. As O'Reilly would say in his incredibly pithy manner, everything having to do with Where.
  • Mashing up APIs and maps to create new hybrid apps.
  • The various interop technologies, such as REST, XML-RPC, and SOAP.
This next blog will probably be the last one I start for at least a year, so I want it to be inclusive enough to handle all of these issues. I'm thinking that Mashups combine them in a convenient package, so my most likely decision will be to start Mashup.Darwinianweb.com as the new blog. That would leave me with three blogs:
  • DarwinianWeb.com: Focusing on general issues of the evolution of the Internet and software.
  • Ruby.DarwinianWeb.com: A central clearing house for all my Ruby programming work.
  • Mashup.DarwinianWeb.com: A common location for my explorations of the rest of these new technologies.
My wife keeps telling me to just write one blog and combine everything there, but I think that the separate Ruby blog has been a success, because it allows me to cover details about the language and source code listings that would surely chase away any general reader who wasn't programming in Ruby. I think the mashup work requires the same type of separation. I'll make a decision on this in the next day or so. Luckily the code I've written for these blogs makes it easy to start a new one with just an hour or so of effort.

Structured Blogging is a key step toward a de facto SAPI

Posted on Wednesday, December 14, 2005 at 10:08 AM (permalink)

At Syndicate today Marc Canter announced a set of XML data standards for encoding various types of microcontent, such as movie reviews, that he is calling Structured Blogging. This is clearly needed for the growth of features around RSS and the standardization of the XML returned by a SAPI.

The coming SAPI war

Posted on Wednesday, December 14, 2005 at 9:10 AM (permalink)

If the Web, at least the interesting part of it, is going to look like a huge collection of search engine items, then everyone is going to start building search engines. It's easy to predict a two-tier business model in the future, with major search engines offering API access to their code and data, and a second layer of application developers building cool mashups, remixes, aggregates, whatever, on top of this worldwide database. A major choke point is going to be the Search API (SAPI) used to access this data. It is far too early to tell which API will win, but it is in the adoption of a de facto standard SAPI that the war will be fought.

There is a tradition within the computer trade press to describe such competitive situations as wars. The wide range of military metaphors this provides makes it an obvious choice. Headline writers alone are immensely grateful for its use. We have had spreadsheet wars, and OS wars, and browser wars. Now we can have a search engine war with SAPI as the ammunition.

Search engines have long been tools of individual habit and taste. I use Google, my wife uses Yahoo!. There are toolbar schemes to lock people into one search engine, but users are still able to migrate or use multiple engines of their choosing. If there is a viable business model for an application layer on top of search engines, something still to be proven, then the battle for SAPI lock-in will become brutal, because it will make customer migration or multiple use more difficult. Users won't know, or care, what search engine is running under the hood. To be Web 1.0 about it, SAPI will become the superglue of search engine stickiness.

Map mashups are so tempting

Posted on Monday, December 5, 2005 at 7:20 AM (permalink)

Programmable Web has a great roundup of mashup tutorials. I have to find time to start working with maps.

Exploring the world of ping servers

Posted on Thursday, December 1, 2005 at 7:55 PM (permalink)

Now that I have an archive of 6 weeks of posts I'm ready to start actively attracting new readers. My next step will be to add ping servers to this blog's code. I know in general that ping servers accept blog updates through some API, but I haven't ever worked with one. I think I'll try building a solid list of ping server links at Del.icio.us and then report back on what I find. You can watch this list as I search.
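
For orientation, here is a hedged Python sketch of what a ping can look like. It assumes the ping server accepts the widely used weblogUpdates.ping XML-RPC call, which takes a blog name and a blog URL; the post above does not say which protocol any particular server uses, so check each server's documentation. The server URL passed to send_ping is a placeholder supplied by the caller, and the blog URL is a guess based on the domain name.

```python
import xmlrpc.client


def build_ping_request(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_ping(server_url, blog_name, blog_url):
    """Send the ping. Needs network access and a real ping server URL."""
    server = xmlrpc.client.ServerProxy(server_url)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Print the request body only; nothing is sent over the network here.
print(build_ping_request("Darwinian Web", "http://darwinianweb.com/"))
```

Building the request separately from sending it makes it easy to inspect the XML before pointing the script at a live server.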

What do you mean it's free?

Posted on Wednesday, November 30, 2005 at 8:00 AM (permalink)

I had dinner last night with an old friend from the software business and once again had one of those conversations where we try to come to grips with a new Internet economic model. In 1995 I was telling my software friends to drop everything and start publishing web sites. "But what is the business model?" they asked. Why should they give content away on public web pages when they could publish with AOL or CompuServe? In 1999 I was telling them to read The Cathedral and the Bazaar and try to wrap their heads around free software. "But how can we give away software and still make money?" they cried. Last night I explained what I knew about microcontent, and said that in their rush for customers the major content holders and search engines would provide unlimited APIs and RSS feeds for all of their content.

The next wave of freely available intellectual property will once again distort the Internet economy, but that won't prevent it from happening. I don't think that there is some inevitable progression to all IP being free. Each set of changes took place for different reasons and in different times, but it is clear that massive change can occur before there is an economic justification. The Internet doesn't care if anyone makes money or loses money; the Internet serves the crowd.

The "Berlin Wall" of this next burst of data freedom will be when Google unlocks the limits on its search engine API. I say to you Brin and Page, tear down that wall!

Time to do some reading

Posted on Tuesday, November 22, 2005 at 2:17 PM (permalink)

I've gotten way ahead of what I really know about. Before I start building an API based on XML and compatible with RSS and Atom, I better spend some time reading about all of these protocols. Besides, it's a rainy November afternoon in Boston.

The urge to scale

Posted on Saturday, November 19, 2005 at 8:26 AM (permalink)

I guess being a dot-com CTO is in my blood. I like to think through various architectures for managing groups of websites. You need to lock down a model for scaling early or you face big problems if you ever need to handle large amounts of traffic. The real key is a logical architecture for domain names. For example, if I thought I was going to serve a lot of podcasts, I would create something like data.darwinianweb.com or podcasts.darwinianweb.com. That would allow me to move that part of my content where it could be best and most cheaply served.

Right now I have darwinianweb.com to handle this main blog where I plan on covering general issues on the changing form of the Internet. I also have ruby.darwinianweb.com, which is a blog that allows me to go into as much depth as I want about learning the Ruby programming language.

I don't want to have too many subdomains. Categorization can be handled more easily and on a larger scale with tags, which I am working on adding. At the same time, a separate domain creates more of a distinct place or channel of thought for the user. People automatically switch contexts when they change to a new site, just like a new TV channel.

I plan on having only a few more content subdomains, such as ajax.darwinianweb.com, and xml.darwinianweb.com. Programming languages or standards like XML are so broad and have so many supplementary tools and resources that they work better in their own site or subsite.

I'll also be creating separate domains for exchanging data with other servers. I don't know what will happen with my API experiment, or if that will become a target for abuse, so I'll also create api.darwinianweb.com to serve API calls. It isn't a matter of large amounts of traffic. I want to be able to shut down the API server easily. Of course, that brings up the issue of dependency on critical servers in a distributed environment called for by Web 2.0.

One solution, which also comes easily in an XML/RSS based communication model, is to cache the most recent messages as text files, so the most recent result of an API call can be reused instead of calling the API again.
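
A minimal Python sketch of that caching idea follows. The cached_call name, the one-hour freshness window, and the fake_api stand-in are assumptions for illustration, not part of any existing code for this site. The function reuses the saved text while it is fresh, refreshes it otherwise, and falls back to a stale copy if the API call fails.

```python
import os
import tempfile
import time


def read_text(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


def cached_call(cache_path, max_age_seconds, fetch):
    """Return cached text if it is fresh enough, else call fetch() and save it."""
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            return read_text(cache_path)
    try:
        text = fetch()
    except Exception:
        # The API server is unreachable: reuse a stale copy if one exists.
        if os.path.exists(cache_path):
            return read_text(cache_path)
        raise
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


calls = []


def fake_api():
    """Stand-in for a real API request; records each call it receives."""
    calls.append(1)
    return "<rss><channel><title>demo</title></channel></rss>"


cache_file = os.path.join(tempfile.mkdtemp(), "api_result.xml")
first = cached_call(cache_file, 3600, fake_api)
second = cached_call(cache_file, 3600, fake_api)
print(len(calls))  # the second request is served from the file
```

The stale fallback is what addresses the dependency problem: if the API server is shut down, pages built from the cache keep working with the last known result.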

These issues will be played out on a much larger scale throughout the web. Chains of API dependencies will play interesting roles in the future.

Architecture for tags

Posted on Thursday, November 17, 2005 at 9:31 AM (permalink)

I've been thinking about adding tags to this site, which stimulated some thinking about site architecture. I wrote my own blogging code to manage this site, so I can have maximum flexibility in areas like this. I've decided to walk the walk by building out this site with Web 2.0 architectures. That means I'm going to create my own API that returns XML as either RSS or OPML, and then have other parts of the site deliver page content based on this API. I'll then use that functionality to build a tag viewing interface similar to Delicious for my own posts here.

It sounds like overkill, but look at it this way. The content of this site is in a MySQL database on a server, which may not always be on the same physical machine as the site's Apache web server. As long as I have to adopt a client-server architecture, I can just as easily go around the outside through API calls over HTTP. It may be slower than making database calls directly to MySQL, but it will be a relative issue. If the performance slows down, I can just speed up the hardware or get someone to optimize the code. It is a totally scalable architecture. Of course, I won't try to deliver the entire site this way. The vast majority of the content is generated as static HTML files. Just the controlling bits and the results of searches have to pass through the API/XML processing.
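
To make the shape of such an API concrete, here is a rough Python sketch under stated assumptions: the POSTS records, their URLs, and the api_posts function are invented, and a real version would read from the database and sit behind an HTTP endpoint. It shows one query (posts for a tag) rendered as either RSS or OPML.

```python
import xml.etree.ElementTree as ET

# Made-up records standing in for rows from the blog's database.
POSTS = [
    {"title": "Ebay drops API limits", "url": "http://example.com/posts/1",
     "tags": ["api", "ebay", "google"]},
    {"title": "Architecture for tags", "url": "http://example.com/posts/2",
     "tags": ["tags"]},
]


def posts_as_rss(posts, site_title, site_url):
    """Render posts as a minimal RSS 2.0 channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Posts from " + site_title
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
    return ET.tostring(rss, encoding="unicode")


def posts_as_opml(posts, site_title):
    """Render the same posts as a flat OPML outline."""
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = site_title
    body = ET.SubElement(opml, "body")
    for post in posts:
        ET.SubElement(body, "outline", text=post["title"], type="link", url=post["url"])
    return ET.tostring(opml, encoding="unicode")


def api_posts(posts, tag=None, fmt="rss"):
    """Hypothetical API entry point: filter by tag, then pick a format."""
    selected = [p for p in posts if tag is None or tag in p["tags"]]
    if fmt == "rss":
        return posts_as_rss(selected, "Darwinian Web", "http://example.com/")
    if fmt == "opml":
        return posts_as_opml(selected, "Darwinian Web")
    raise ValueError("unknown format: %s" % fmt)


print(api_posts(POSTS, tag="api", fmt="opml"))
```

A tag viewing page would then call api_posts (over HTTP in the real design) and turn the returned XML into HTML, so the page and any outside client use the same interface.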

I'll write about the coding details on the Ruby site and post here when I have something you can try out.

Everyone has their own Googlebase

Posted on Wednesday, November 16, 2005 at 8:30 AM (permalink)

The initial reaction in the blogosphere is very different from mine. Most people are reacting to the mere fact that Google has a "database," and filtering it through their personal view of Google, Ebay, Microsoft, etc. Few are actually looking at how it works, and the ones who do often say that they are intimidated. Dude! Have you ever SEEN a database?

One problem, which Bosworth may have realized more than I, is that today's end-users may actually be less application savvy than even the average user in 1985. Many of them, especially bloggers, even A-list bloggers, use the computer purely as a communication and publishing device. They are extremely adept at IM, blogging, email, IRC, and even the new areas like tags, but they probably have no reason to use Excel, and I wonder how many users under the age of 25 have ever seen Access or any other database of similar complexity.

I'm not proclaiming the dumbing down of the average computer user, just the shifting of their experience to text oriented, social interactions. So are they ready for the type of database I want? Will they ever be? Surely if we have to start with tags and work our way back up to even flat files, which are still way beyond the capability of Googlebase, this is going to be a long education process. But there are still going to be application developers. Will they be solely professionals? Will we not see a new wave of user-developers emerge on the web as we saw during the PC revolution?

I still think Googlebase has to become much more, and that they realize it. For example, to drive the Google maps API, which is red-hot right now, you need a list of locations. Surely they understand that that list should be stored in Googlebase. So they must be planning to deliver the ability to store and manage lists. Which means they must deliver the capability of a flat file database at a minimum.

Ebay drops API limits

Posted on Wednesday, November 16, 2005 at 7:15 AM (permalink)

We'll see how long Google holds out against the competitive pressure to open up their search API limits.

Tags: api ebay google

I was right, Google maps are easy

Posted on Tuesday, November 15, 2005 at 6:48 AM (permalink)

I got a chance to read over the Google Maps API documentation last night, and it looks like my guess that it would be easy is correct. I can see why so many map mashups have appeared. I should have my first one working within a few hours of starting to code, which will hopefully be tonight after I get back from the SSA.

Google maps API comes next

Posted on Monday, November 14, 2005 at 11:43 AM (permalink)

Based on the speed at which Google Maps mashups are appearing I get the feeling it must be pretty easy to do. I'm supposed to be at this conference later today and tomorrow, but I'm going to see if I can fit in a quick check of the Maps API late tonight.

My first Amazon API program

Posted on Sunday, November 13, 2005 at 8:01 PM (permalink)

I now have a very simple program that queries the Amazon API for books on Ruby programming and displays the results as a list of titles linked to their product pages. The most interesting part wasn't the coding, which is pretty simple, but the incredible depth of Amazon's API. They make it possible to build a complete e-commerce site on their engine. I now understand why Jeff Bezos was quoted on a financial program as saying that Amazon may eventually become an e-commerce systems provider instead of a retailer. You can learn a lot about a company's plans by studying the functionality they surface in their API. Even if I don't end up building a real product with anyone's API, I will get a better understanding of their strategy.

Reading Amazon RSS with Ruby

Posted on Saturday, November 12, 2005 at 4:21 PM (permalink)

I'm having a ball with my Ruby project. I had the usual start-up issues, but once I found a webhost that knew what it was doing, and ironed out a few bugs in the database library for MySQL, things have gone great. I now have a simple script that will read the current Amazon RSS feed for computer books, and publish it as a list of linked titles on a web page. Now I'm ready to sign up for an Amazon API account and get started with web services programming.
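
The feed-to-page step can be sketched in a few lines. This is a Python illustration, not the Ruby script described above, and the sample feed is invented rather than taken from Amazon. It parses RSS 2.0 items and emits an HTML list of linked titles, escaping the text so a stray ampersand in a title does not break the page.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented feed standing in for a downloaded RSS document.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Computer books</title>
    <item>
      <title>Ruby &amp; Rails Basics</title>
      <link>http://example.com/books/1</link>
    </item>
    <item>
      <title>Learning XML</title>
      <link>http://example.com/books/2</link>
    </item>
  </channel>
</rss>"""


def rss_to_html_list(rss_text):
    """Convert the items of an RSS 2.0 feed into an HTML list of links."""
    channel = ET.fromstring(rss_text).find("channel")
    lines = ["<ul>"]
    for item in channel.findall("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        lines.append('  <li><a href="%s">%s</a></li>'
                     % (escape(link, quote=True), escape(title)))
    lines.append("</ul>")
    return "\n".join(lines)


print(rss_to_html_list(SAMPLE_RSS))
```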

I agree with Dave Winer about the Google API

Posted on Wednesday, November 2, 2005 at 9:12 PM (permalink)

Dave has put forth the proposal that the Google search API be made a common standard, and more importantly that Microsoft implement it without limits. I couldn't agree more. I haven't written about the Google API yet, but in the past when I read over Google's API terms of use they made my blood boil with their massive arrogance. Here's the part of their FAQ that really gets to me:

"What happens if I go over my limit of 1,000 queries? If you make more than 1,000 queries in a day, our server will respond with a SOAP Fault stating that you exceeded your daily query total. You might want to get some sleep and start querying again tomorrow."

In other words if you want to build a business on this API, you can just take a flying leap.

It would serve Google right if their API became a standard, and others allowed it to actually be used by everyone to make money, not just Google.
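
For anyone who has to live within a daily cap like the one quoted above, the client can at least count its own calls and stop before the server starts returning faults. The Python sketch below is an assumption-laden illustration: the DailyQuota class is invented, it keeps its count in memory only, and it has no knowledge of how any real API server tracks usage or when its day resets.

```python
import datetime


class DailyQuota:
    """Counts calls per calendar day and refuses once the limit is reached."""

    def __init__(self, limit):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_acquire(self, today=None):
        """Return True and count the call if budget remains for today."""
        if today is None:
            today = datetime.date.today()
        if today != self.day:
            # A new day has started, so the counter resets.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


quota = DailyQuota(limit=1000)
if quota.try_acquire():
    print("OK to call the API")
else:
    print("Daily budget used up; serve cached results instead")
```

A script that refreshes hourly could divide the daily limit by 24 and use that smaller number as its per-run budget, so the last runs of the day are not starved.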

Lots of balls in the air

Posted on Wednesday, November 2, 2005 at 7:40 PM (permalink)

I've got so many project ideas that I think I should list them, if only to remind me of where I want to go:

  • CSS: Redesign Darwinianweb.com site to use style sheets.
  • Ruby: Amazon API based app to determine the best book on a given subject.
  • Ruby on Rails: Rebuild the CMS for this blog from the current FoxPro code.
  • Ajax: Google API based app using Google Maps.
  • Ajax: A stripped down version of TiddlyWiki as a form of self-modifying page.

Does the word "Base" give you a hint?

Posted on Wednesday, October 26, 2005 at 11:44 AM (permalink)

Am I the only one who sees GoogleBase as an online database? You know, like Access only on the web. The blogosphere and the MSM are rushing to describe it as competition for Ebay and Craigslist in offering a free site for classified ads, and that probably is part of Google's plan. But I've known Adam Bosworth for more than 20 years, and he is a database guy through and through. It can't be a coincidence that he started working for Google almost a year before GoogleBase made its first appearance. The only blogger I've seen who has recognized the possibility of GoogleBase becoming a standard end-user database has been Om Malik, and he assumes that since Quickbase wasn't a success GoogleBase will also fail to attract an audience. He failed to mention that Quickbase sucks. The performance is unacceptable and the functionality is weak. I can't believe Bosworth would make those mistakes. The acceptance of online apps is also much greater now than when Quickbase appeared. We'll have to wait until it appears, but I think a fast end-user database with large amounts of storage and a good API could become a significant part of the web services infrastructure.