scrAPI for OPML
Now there's a title to conjure with. John Musser has an interesting ProgrammableWeb post on the use of screenscraping as a poor man's API. The idea is to use a script to parse a web page, and then return some specific set of data in an XML format. John credits the idea to Thor Muller, who provides some excellent details on the pros and cons of a scrAPI vs. an official API. Thor in turn recognizes Paul Bausch for coining the term SCRAPI in 2002. John notes that the result of the scrAPI can be returned "in some cleaner XML format." Yeah, like ... uh ... OPML?
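As a rough illustration of the idea, here is a minimal Python sketch of a scrAPI that parses a page and hands back OPML. It is only one way this could look, not anything John, Thor, or Paul published: the names `LinkCollector` and `html_to_opml` are made up for this example, and it assumes the data you want is simply every link on the page. A real scrAPI would target whatever markup the site actually uses.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (text, href) for each <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def html_to_opml(html, title):
    """Scrape the links out of an HTML string and return an OPML document."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, href in parser.links:
        # One <outline> per scraped link; type="link" with a url attribute
        ET.SubElement(body, "outline", text=text, type="link", url=href)
    return ET.tostring(opml, encoding="unicode")
```

To point this at a live site you would fetch the page first (for example with `urllib.request.urlopen`) and pass the decoded HTML to `html_to_opml`. The catch with any scrAPI still applies: the output breaks as soon as the site changes its markup.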
A scrAPI is clearly your only alternative for a site that doesn't offer an API, but surely once an API is available you wouldn't need to adopt the scrAPI approach. Except when the API provider has a limit on the number of times you can call the API each day, and refuses to respond to email requests on how to get beyond the limit. I ran into exactly this problem with Technorati when I built my Tech Memeorandum - Technorati mashup. I hit an undocumented limit on daily Technorati API calls, and despite repeated emails to the company, including directly to Dave Sifry, I never got a response. I've been reading up on PHP the last few days, and this sounds like exactly the type of example project I should try out. Yes, it is a terrible kludge, but then what isn't on the Web? I think scrAPIs that return OPML can become a useful way of building a mashup, and I'll experiment with it until a better coder can implement the idea more cleanly.
Update: I guess it isn't too surprising that Dave Sifry would have a vanity feed at Technorati. At least I assume he does, because soon after this post pinged Technorati, I got an email response from him to my earlier messages. If they actually do give me a larger daily allotment of API calls, I'll be able to start posting some Technorati API scripts on my programming blog. If not, I'll just have to start scraping. Getting something done because of a blog post is fun, but it encourages bad behavior. This is probably how Scoble turned into the spoiled kid he can sometimes be. I love it when he starts yelling on his blog, "This blog is broken, and somebody better fix it right now!" I wonder if his wife lets him get away with that stuff at home: "I'm hungry, and someone better feed me right now!"
Update: Sean O'Hagan emailed me to point out the scrape command he wrote for YubNub. I guess this lazyweb thing really works.


