Posts tagged as: search
Members of Congress don't want their search history made public either
Posted on Thursday, February 9, 2006
at 2:33 PM (permalink)
I knew that the DOJ-Google lawsuit would have beneficial repercussions. CNet is reporting that there is a bill in Congress to force websites to 'delete information about visitors, including e-mail addresses, if the data is no longer required for a "legitimate" business purpose.' It makes you wonder how many members of Congress have been using Google to find out if the Hilton in Paris allows you to play videos. They may not all have iPods, but I bet they all use Google.
Google privacy fallout
Posted on Tuesday, January 24, 2006
at 8:14 AM (permalink)
A few days ago I wrote that the DOJ-Google lawsuit was a good thing, because it raised the public consciousness about the lack of privacy when using search engines. This morning Squawkbox on CNBC is running a segment on anonymous surfing tools. What's next, Jon Stewart telling jokes about losing his PGP key?
The Google subpoena is a good thing
Posted on Saturday, January 21, 2006
at 2:05 PM (permalink)
This morning NPR teased a segment on the DOJ-Google lawsuit with the line "You may not be aware that when you use Google, the sites you visit are recorded." If this one fact enters the public consciousness, then the lawsuit will have done a great service. A few months back I started to write a post saying that I found myself restricting searches that I didn't want made public (Hey, I'm a heterosexual male), but I thought it sounded too paranoid. Now this is becoming common knowledge. This is an important moment for the Web. Next we can start arguing about the propriety of search engines like Google storing user search history in the first place. It's about time that debate was started.
Maybe I can take up psychopharmacology again
Posted on Thursday, January 19, 2006
at 4:10 PM (permalink)
Whenever I tell people I majored in psychopharmacology in college they laugh and say "Yeah, me too." Only in my case it's true. My undergraduate major was in Organic Chemistry with a concentration in psychopharmacology. My first job out of school was synthesizing morphine derivatives. The person at the next bench used pure THC as her starting material. The company I worked for kept a huge jar of it in a bank vault, and took it out to draw samples. It looked sort of like honey. Aahh, the good old days...
Where was I? Oh, yes, psychopharmacology. So when I read about the chemistry search engine called Chmoogle (via SiliconBeat), I just had to take a look. What is really cool is that you can enter a search query using a Java applet that lets you draw the molecular structure. I don't remember the structures of the molecules I used to work with, so I went to Wikipedia and looked up nicotine. With all the current fuss about the Feds grabbing Google's records, there's no way I'm going to put the structure of something fun, er, restricted on my blog. Anyway, I used Chmoogle's drawing tool, which is better than anything people dreamed about when I was in school, and then had Chmoogle search for it. I had the option of an exact match, or using this structure as a portion of a larger molecule. The substructure search is great if you are using a specific molecule as a starting material for synthesizing something larger, as I used to do with morphine. The exact match allows you to retrieve all types of useful information about that molecule.
So why should you care? It's not as if you're going to start a personal meth lab. (Jesus, this post is going to come back to haunt me.) What is important to all of us is that Chmoogle shows what can be done beyond the same old text-style search among general-purpose websites. This type of domain-specific search holds tremendous promise for all types of applications. You could almost say it is an example of Web 2.0.
I hope I never have to say "paper" books
Posted on Thursday, January 5, 2006
at 3:14 PM (permalink)
I spent this afternoon in the stacks at Harvard's Widener Library. For some reason I felt the need to be surrounded by old books. It was probably caused by writing about ebooks the other day. I love books, and Widener is one of the world's great libraries, although finding things can be a challenge. At Widener they laughingly refer to the Dewey decimal system as the "new classification scheme." But then Harvard people like to remind you that the university was there before America was a country.
My favorite part of Widener is D West, which is the farthest section in the library's sub-sub-sub-basement, not a place for the claustrophobic. The smell of old books is something I've loved since my father started taking me to used-book stores in New York and Philadelphia as a child, and this smell seems strongest in D West. Perhaps it is because this is where they keep the old magazines, like Punch and Harper's Weekly. Selecting a shelf at random I found myself in front of the first edition of the Encyclopedia Britannica dated 1771. They have stuff like this sitting on the shelves for anyone to read. Last year I had a bet going with the other grad students to see who could take out the oldest book. The best I ever did was a book on comparative anatomy by Cuvier dated 1796. The Britannica was for in-library reference only, so I sat down and browsed through it.
I soon realized that this was exactly what I was yearning for. In this Google age we have come to believe that it is possible to find the "best" answer to a search. Sitting with this centuries-old book drove home the idea that there is no right answer, there are just answers that fit the context of their time. For example, the entry for America was only one paragraph long, and described it as "one of the four continents" with an indigenous population of "copper-coloured" natives. My favorite entry was for buccaneer, which described them in the present tense and seemed to have a sense of national pride (Britannica was published in Scotland) at the way they harassed the Spanish navy. Google, on the other hand, thinks they are a football team from Tampa.
This doesn't mean that Google is wrong. I'm sure most people looking for Buccaneer today do want the football team. The problem is that search engines in general make it impossible to recognize the changing context of information. I can't ask the Web "what did people mean by a particular term in the 18th century, or the 19th?" Soon even the 20th century will be overlaid by a new set of best answers. Once Google indexes all the world's books, will their algorithms determine the best answer to every question?
For some reason this makes me incredibly sad. Was my generation the last one to get most of its education from books? Even worse, will I live to see the time when the qualifier "paper" books will be necessary, just like the snide use of "snail mail" to differentiate it from email?
The coming SAPI war
Posted on Wednesday, December 14, 2005
at 9:10 AM (permalink)
If the Web, at least the interesting part of it, is going to look like a huge collection of search engine items, then everyone is going to start building search engines. It's easy to predict a two-tier business model in the future, with major search engines offering API access to their code and data, and a second layer of application developers building cool mashups, remixes, aggregates, whatever, on top of this worldwide database. A major choke point is going to be the Search API (SAPI) used to access this data. It is far too early to tell which API will win, but it is in the adoption of a de facto standard SAPI that the war will be fought.
There is a tradition within the computer trade press to describe such competitive situations as wars. The wide range of military metaphors this provides makes it an obvious choice. Headline writers alone are immensely grateful for its use. We have had spreadsheet wars, and OS wars, and browser wars. Now we can have a search engine war with SAPI as the ammunition.
Search engines have long been tools of individual habit and taste. I use Google, my wife uses Yahoo!. There are toolbar schemes to lock people into one search engine, but users are still able to migrate or use multiple engines of their choosing. If there is a viable business model for an application layer on top of search engines, something still to be proven, then the battle for SAPI lock-in will become brutal, because it will make customer migration or multiple use more difficult. Users won't know, or care, what search engine is running under the hood. To be Web 1.0 about it, SAPI will become the superglue of search engine stickiness.
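To make the lock-in argument concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `SearchAPI`, `EngineA`, `EngineB`, and `mashup` are names made up for illustration, not any real search engine's API. It assumes a world where both engines happen to expose the same `search` call, which is exactly the de facto standard that does not exist yet.

```python
from __future__ import annotations

from typing import Protocol


class SearchAPI(Protocol):
    """Hypothetical common SAPI: one call that returns a list of result titles."""

    def search(self, query: str) -> list[str]: ...


class EngineA:
    """Stand-in for one search engine (illustrative, returns canned results)."""

    def search(self, query: str) -> list[str]:
        return [f"A: {query} overview", f"A: {query} news"]


class EngineB:
    """Stand-in for a competing engine that happens to share the same call."""

    def search(self, query: str) -> list[str]:
        return [f"B: best of {query}"]


def mashup(engine: SearchAPI, query: str) -> list[str]:
    """Application-layer code: it only knows the SAPI, not the engine behind it."""
    return [title.upper() for title in engine.search(query)]


# Swapping engines is a one-argument change as long as the SAPI is shared.
for engine in (EngineA(), EngineB()):
    print(mashup(engine, "tagging"))
```

The end user of `mashup` never sees which engine ran the query. If each engine ships its own incompatible call instead, every application has to be rewritten to migrate, and that is where the stickiness comes from.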
Now Joshua can buy some servers
Posted on Saturday, December 10, 2005
at 11:59 AM (permalink)
Just as I was about to write about my desire to replace Delicious due to its lousy performance, I read about its acquisition by Yahoo!. At least Joshua Schachter will now be able to buy enough servers to handle the load. As for the long-term impact on the search engine balance of power, I don't think the addition of Delicious will help Yahoo! as much as their future commitment to tagging. The move that really matters is a decision by Yahoo! to fully integrate tagging into their search engine, not just My Web. The principal reason to buy Delicious now is obvious: to remove it from the table before it either gets too expensive or is bought by a competitor. No, this isn't bubble mentality; it is a good business decision in a rising market.
Sometimes it gets a little scary
Posted on Sunday, November 27, 2005
at 10:06 AM (permalink)
I may have felt amused and a little titillated by the possibilities of the Amazon A9 street cams, but then I realized that Amazon also has Turkers working for them. Now I feel a little creeped out. So there is an autonomous economic model in place that says if enough people ask Big Search to see something, eventually people will be paid microcents somewhere in the world to take a picture of it and feed it back to Big Search. ... So what happens if Big Search starts asking people to "do things" to retrieve this information? Talk about a global Heisenberg principle. We will be changing the world just by looking at it.
Big Search needs to be fed
Posted on Sunday, November 27, 2005
at 9:31 AM (permalink)
Orwell was about 20 years too late and too focused on politics instead of economics. The drive of Big Search is to "always be growing content," so we can expect more complete coverage, and coverage closer to real time, in the future. Amazon's maps now have street-by-street photographic coverage of many cities. Will I use it? Hell, yeah, it is damn useful. I can point you to Border Cafe in Harvard Square, which just happens to be one of the best college student dives in the country. Unfortunately, the image defaults to the side of the street with Starbucks. You'll have to click the film strip for the opposite side of Church Street. (via Robert Scoble)
If you love tags, set them free
Posted on Tuesday, November 15, 2005
at 6:51 PM (permalink)
It's clear from this morning's SSA session that control is going to be the key friction point between users and providers of a tagging system. I was struck by the way the knowledgebase and database guys immediately hit on the looseness of tags as a weakness of a folksonomy. Well, yeah, of course they're loose, that's where the "folks" come from. It is this looseness that creates a tag cloud around a concept. That cluster of words can be just as useful as (or even more useful than) the web objects they point to.
Google has convinced advertisers and the financial markets that a large text stream, such as Google search, is extremely valuable, so it would follow that a tag stream would be more valuable, because tags are more concentrated. A freely growing tag cloud can help advertisers find the magic words that users associate with their world. I was impressed by the wisdom of Joshua Schachter's comments that he wouldn't put constraints on the way users named tags. Now that I've thought about it for a few weeks, I agree. As soon as you try manipulating the tag stream you change it in unknown ways. I think there needs to be more education of the public on the difference between categorizing databases and tagging systems. We can agree to allow both to exist for different purposes.
Besides, allowing variation within a collection of tags makes evolution of the language possible. As users get better at tagging there can be an open breeding ground of new tags and compound tags. Losing this rich potential is not justified in exchange for more consistency.
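For anyone who wants to see what a "freely growing" tag cloud looks like in practice, here is a minimal sketch in Python. The `bookmarks` data and the `tag_cloud` function are made up for illustration and are not how Delicious or any real tagging service works. The point is that nothing is normalized: "folksonomy" and "folksonomies" stay separate entries, and the variation itself is part of the data.

```python
from collections import Counter

# Hypothetical data: the free-form tags four different users gave the same page.
bookmarks = [
    ["folksonomy", "tagging", "web2.0"],
    ["tags", "folksonomy", "search"],
    ["tagging", "folksonomies", "social"],
    ["folksonomy", "tagging"],
]


def tag_cloud(bookmarks):
    """Count how many bookmarks use each tag, exactly as the users typed it."""
    counts = Counter()
    for tags in bookmarks:
        counts.update(set(tags))  # a tag counts once per bookmark
    return counts


cloud = tag_cloud(bookmarks)
for tag, count in cloud.most_common():
    print(f"{tag}: {count}")
```

A database-style cleanup would merge "tags" and "tagging" or the singular and plural forms into one category. Leaving them alone keeps the cluster of words people really use, which is the stream an advertiser or a later tagger would want to mine.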
Google academic papers
Posted on Sunday, November 13, 2005
at 9:02 PM (permalink)
The footnotes to Chapter 2 of "The Search" cited a 1998 paper, "The anatomy of a large-scale hypertextual Web search engine," by Sergey Brin and Larry Page. It described the early Google architecture and their plans for it. It's pretty readable, and provides an interesting glimpse of their views before they dreamed of controlling the world from their own 767 continually circling the globe at 40,000 feet.
A little research revealed a set of additional papers by Brin and Page, or in some cases one of them along with other co-authors. Someday these papers may provide a historian of science with some valuable source material. I wonder if they saved their early emails? I also came across Sergey's home page from Stanford circa 1998, where I found this adorable picture of him.
Book Note: The Search
Posted on Sunday, November 13, 2005
at 8:30 PM (permalink)
After a long day of Ruby coding it's time for a little bedtime reading. Here's another installment of the abridged version of "The Search." Much of this chapter is a recounting of lost opportunities with search engines that preceded Google, which I'll spare you, but there are some fascinating factoids about search and some interesting insights.
Chapter 2. Who, What, Where, Why, When, and How (Much)
At the end of the day, the holy grail of all search engines is to decipher your true intent--what you are looking for, and in what context. ... When you type in a one-word query for "York," for example, do you want results for "New York"? Most likely the answer is no. (p. 23)
[Indexing the Web] is no small task: by most accounts Google alone has more than 750,000 computers dedicated to the job. (p. 24)
Pew estimates that on any given day in the United States, 38 million people are using a search engine. All those searches add up to nearly 4 billion queries each month. (p. 25)
Piper Jaffray estimates that the world conducted about 550 million searches each day in 2003. (p. 26)
From its inception as a business in the late 1990s to 2004, paid search as an industry grew from a base in the low millions to $4 billion in revenue, and it is estimated to hit $23 billion by 2010, according to Piper Jaffray. (p. 234)
Google alone boasts more than 225,000 unique advertiser relationships. (p. 35)
According to a report from Dieringer Research Group, nearly 100 million people made purchases after doing online research in 2003, and nearly 115 million searched for product information. (p. 36)
Book Note: The Search
Posted on Wednesday, November 9, 2005
at 2:15 PM (permalink)
I finished this book a month ago, but I was so impressed that I decided to go back and see if I could collect the key insights from each chapter to create an abbreviated guide. Hopefully this will encourage you to buy and read the whole thing.
Chapter 1. The Database of Intentions
By the fall of 2001, the Internet industry was in full retreat. Hundreds of once promising start-ups--mine among them--lay smoldering in bankruptcy. (p. 1)
[Google] Zeitgeist had more than its finger on the pulse of our culture, it was directly jacked into the culture's nervous system. This was my first glimpse into what I came to call the Database of Intentions--a living artifact of immense power. (p. 2)
Google was a technology business, he [Eric Schmidt] told me. (p. 3)
A year later I met with Eric again. Among his first words: "Isn't the media business great?" (p. 4)
Much as the Windows interface defined our interactions with the personal computer, search defines our interactions with the Internet. (p. 4)
The Database of Intentions is simply this: the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result. (p. 6)
This structure will provide the seedbed for scores of new cultural phenomena over the next decade. (p. 7)
Companies like Overture and Google made their first profits in the darkest hours of the dot-com collapse. (p. 8)
In essence we have taken much of our once-ephemeral and quotidian lives--our daily habits of whom we talk to, what we look for, what we buy--and made those actions eternal. (p. 10)
Search drives clickstreams, and clickstreams drive profits. To profit in the Internet space, corporations need access to clickstreams. And this, more than any other reason, is why clickstreams are becoming eternal. (p. 12)
If Google and companies like it know what the world wants, powerful organizations become quite interested in them, and vulnerable individuals see them as a threat. (p. 13)
As a Google executive noted to me when I brought this up: "We're one bad story away from being seen as Big Brother." (p. 14)
But imagine the disorientation you might feel if search becomes self-aware--capable of watching you as you interact with it. (p. 15)
"My problem is not finding something," says Danny Hillis, a MacArthur Foundation genius and computer scientist who now runs a consulting company. "My problem is understanding something." (p. 16)