by when Bing will begin providing Yahoo’s search results (though some testing has already started). Combined, Microsoft and Yahoo! provide about 30% of the search results in the United States, but only roughly 10% of the search results overall; Google, at 63% U.S. and 85% overall, pretty much owns search.
Google’s dominance is one of the reasons many people get excited about alternative search engines. Choice is important, especially in something as important as access to Web-based information, and so is competition, which often leads to innovation. There’s often excitement leading up to the introduction of well-funded and reputedly innovative search engines, such as Powerset (quickly acquired by Microsoft) and Cuil, both of which debuted in 2008, and Blekko, which is currently in closed private beta but has earned a positive review from Michael Arrington at the influential TechCrunch. Innovation in search is a good thing for many reasons, not least of which is the issue Paul Ford recently called “the Barnes & Noble problem”:
Until I was about 26 almost everything I wanted to read was in Barnes & Noble. Eventually they had less and less of what I wanted. Now B&N’s a place I go before a movie, and I get my books anywhere else. I’m increasingly having B&N moments with full text search ala Google. It’s just not doing the job; you have to search, then search, then search again, often within the sites themselves. The web is just too big, and Google really only can handle a small part of it. It’s not anybody’s fault. It’s a hard, hard problem.
It’s possible that many ways exist to avoid the Barnes & Noble problem in Web search, but the two ways most companies seem to be trying at the moment are represented exceptionally well by Endeca and DuckDuckGo. Endeca, which provides search for Borders, Walmart, Home Depot, and many other large corporations and institutions (as well as North Carolina State University Libraries), will “guide users through asking and answering any question;” DuckDuckGo tries to out-google Google by adding features people want, removing annoyances, and finding out what’s working by engaging its users in a fun, ongoing conversation about their interests.
Both of your companies provide search for specialized collections. Do you believe that people want a single, universal interface that will work everywhere or do they want an interface that’s been built to suit the collection they’re using?
Gabriel: I think that vertical search engines can work if they are compelling enough, e.g. Kayak, which aggregates prices on airline tickets, hotel rooms, car rentals, and helps people find good deals on travel. However, there are only so many verticals where they can be compelling due to business model, i.e. high transaction value.
In general, I believe people want the “single, universal interface that will work everywhere.” At DuckDuckGo, I have a longer term goal to help people navigate towards vertical engines that may be better for them. I’m doing this currently on a completely self-selected basis via !bang syntax.
If you look at each vertical, there is usually a search engine out there that produces better results than Google for that vertical. But no one is going to go to each of these hundreds of sites in specific situations.
Pete: Good experiences are always designed around tasks—around specific users searching for specific content. And I’m using the word “search” to mean much more than the search box—I’m talking about all the navigation, visualizations, and content that helps people find what they need. Now, if you ask people what they want, they’ll say they just want a Google box. But if you test that against a task-built experience—say, image search at Jupiter Images—they’ll overwhelmingly pick the latter. Marti Hearst tested a great example of this as part of her Flamenco Search Interface Project on faceted search User Interfaces (UIs).
Is it important for search interfaces to match the way people think or will people adjust their thinking to suit search interfaces?
Pete: There’s a difference between zero-training and easy-to-use. Zero-training means it has to match the way people think, and for any popular public-facing website, it has to be fluid. On the other hand, there can be easy-to-use sites that take a few minutes to learn. They’d better become fluid after those few minutes though. For example, we’ve built some search applications for manufacturers that give their design engineers thousands of facets. They’re willing to spend a couple of minutes to orient themselves to get power-user features. First time I switched from a PC to a Mac, I was surprised that there was still a learning period, but it faded fast.
Gabriel: If you want fast, low-cost adoption, I believe the interface should be as fluid and simple as possible. However, sites like Amazon have proved that you can push through User Experience (UX) with enough money. By which I mean basically what Pete said, in that if you are allowed to train people for a few minutes then you can end up with a better UX overall. Amazon has done it essentially via brute force, i.e. push through by simply being around long enough that people end up spending those few minutes over time.
How do you weigh precision versus recall? Has your thinking changed along the way?
Gabriel: I’ve been pretty much about precision from the beginning, in part because I rely on external APIs for the long-tail; that is, for less popular searches, I rely mostly on the raw search results I get from Bing API 2.0 and Yahoo! “Build Your Own Search Service”. I think my value-add for those types of queries is in added precision. But more generally, there are just so many Web pages out there and people don’t look at many of them (they choose from just the top few results), so precision is most important for general search. For specialized search I think it can reverse depending on the vertical.
An example of a vertical in this context would be searching for bug reports. There are usually very few pages out there that have the exact output of your bug report, and if they exist, you want to find them. For things like that, we rely on Yahoo & Microsoft to have crawled those pages. For less specific queries we layer on top of those APIs some Natural Language Processing (NLP) stuff that, among other things, tries to extract the concepts/entities in the query and gives you pages more associated with them. For other queries where we know a vertical engine will give you better info, e.g. weather or complicated math, we will automatically query an API and display the better results—I think this is another form of recall.
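For readers less familiar with the two terms, here is a minimal sketch of how precision and recall are usually computed for a single query. The function name and the document IDs are hypothetical and exist only for illustration; they do not come from DuckDuckGo or Endeca.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four pages actually answer the query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow result list: everything shown is relevant, but half is missed.
narrow = ["d1", "d2"]

# A broad result list: nothing is missed, but half of it is noise.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

The narrow list corresponds to the general-search case Gabriel describes, where people only look at the top few results; the broad list is closer to the bug-report case, where missing the one matching page is the costly error.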
Pete: You can cheat the precision vs. recall trade-off. At Endeca, we’ve become disciples of the Human Computer Information Retrieval school, and all that Gary Marchionini and Daniel Tunkelang have done to popularize the HCIR model.
When we started, it was orthodoxy that there was a trade-off between precision and recall. That assumes people make a query into a black box, get back a ranked list of results, and then either accept one of those top results or recompose their query. It’s the TREC evaluation model. But ranking is dubious—it conflates many dimensions of relevancy into a single score.
With HCIR, there is no strict trade-off between recall and relevancy. Instead, you engage the user in a multi-step “conversation” with the data, as in a faceted search. You start with a probe query that returns a set of results. And then the system characterizes the set—it tells you the attributes and facets associated with that set. That helps you refine to a subset, then lather, rinse, repeat. The trick is to treat search as a set retrieval problem instead of a ranked list retrieval problem.
For example, if your task were to find a photo of dogs with kids to illustrate a book jacket, and all you had was a classic search box, you’d probably maximize for recall with some searches like “dogs kids jpg” or “dogs children photos” and then eyeball the results. But with HCIR, the system has a chance to teach you about the results. Back to Jupiter Image search, we could search for “dogs,” and then discover facets about ages, concepts, and image technique, and use those to whittle down. You’re returning a set of results, and then learning about subsets. The effect is that you get unexpected results that you could never hope to discover with keywords.
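The probe, characterize, refine loop Pete describes can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the facet names ("age", "technique"), and the function names are made up, and nothing here reflects how Endeca or Jupiter Images actually implement faceted search.

```python
from collections import Counter

# Hypothetical image records; facet names and values are invented.
IMAGES = [
    {"id": 1, "keywords": {"dogs", "children"}, "age": "child", "technique": "black and white"},
    {"id": 2, "keywords": {"dogs"}, "age": "adult", "technique": "color"},
    {"id": 3, "keywords": {"dogs", "park"}, "age": "child", "technique": "color"},
    {"id": 4, "keywords": {"cats"}, "age": "child", "technique": "color"},
]


def probe(records, term):
    """Initial probe query: return the whole matching set, not a ranked list."""
    return [r for r in records if term in r["keywords"]]


def facet_counts(results, facets=("age", "technique")):
    """Characterize the result set: count the values seen for each facet."""
    return {f: Counter(r[f] for r in results) for f in facets}


def refine(results, facet, value):
    """Narrow the current set to the subset with the chosen facet value."""
    return [r for r in results if r[facet] == value]


results = probe(IMAGES, "dogs")
print(facet_counts(results))        # what the system can "teach" the user
results = refine(results, "age", "child")
print([r["id"] for r in results])   # [1, 3]
```

Note that image 3 is found even though its keywords never mention children or kids, which is the kind of result a keyword-only query such as “dogs kids jpg” would miss.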
What usability testing methods do you find most informative?
Pete: Agile testing is best. Make mistakes often and learn from them quickly. I’m with Jared Spool—you can learn a lot, inexpensively, by testing a small set of people and iterating.
Gabriel: I find natural feedback coming through the site to be most informative. Often this kind of feedback comes from users who have put in a lot of thought. I’ve also found Reddit comments from ads to be particularly informative, especially for first impressions. Finally, I’ve gotten use out of PickFu. I have plans to investigate usertesting.com and feedbackarmy.com as well, but haven’t done so yet.
Can you expand on “natural feedback”? And how you’ve used Reddit and PickFu?
Gabriel: By natural feedback I mean feedback that flows from real users using your site. On DuckDuckGo, there is a feedback button on every search result page (in the lower right corner). Most of our feedback comes through there and is in a “natural” context of searching for something particular.
I posted a PickFu review on my blog. Basically, it is a good way to get quick opinions on two choices. People vote which one they like better, but more importantly they give you their take on why, which provides some insight into what people were thinking.
Reddit is more straight advertising, but with each ad there is also a comment thread. Reddit users are known to actually check out things and report back in comments, and they luckily do this for Reddit’s ads as well. But that’s not all, because you can actually engage with Reddit users as well, and have conversations about your product. All in all, it is a great feedback experience.
Gabriel: On a feature level, our about page attempts to answer this question directly. But at a higher level, I’m trying to make DuckDuckGo results pages more readable and understandable. A lot of the features are in this vein. For example, I put Zero-click Info on top, which provides readable topic summaries (sometimes full paragraphs) from crowd-sourced sources like Wikipedia and Crunchbase. Other examples are labeled official sites, human-edited link titles and descriptions (also from crowd-sourced sources), disambiguation pages, and fewer useless sites in our results pages. Another angle is discovery. I provide related topics (as opposed to related searches) and category pages, which are groupings of topics of a similar theme.
How closely do you think profitability aligns with quality? In evaluating your competition, do you get the sense that it’s the better engineered search products or the better run businesses that are succeeding?
Pete: Just to set the context for Endeca, in our market, our customers want to customize a search experience for their specific users and content. There’s a healthy market for one-size-fits-all sites generated by inexpensive appliances, but that’s not our market. NCSU, WalMart.com, and ESPN have different experiences from each other. We call these search applications.
There are a few ways to go about that. You could invest many, many, many millions on in-house developers, like Amazon and eBay did. But our customers choose the platform route—they’re buying Endeca’s “Legos,” and partnering with our services team to design their site.
Now, that’s a complex project. It brings together teams from two companies that haven’t worked together before. And it mixes a lot of specialties—user experience, application development, information architecture—that might not understand a lot about each other. My friend Joseph Busch does high-end taxonomy and document management projects, and he likes to joke that he’s 5% a library scientist, 95% a social worker.
People tend to focus on technology when they’re planning a new site. But with projects like these, business process, user experience, support, professional services, education, and so on all matter, too. So to answer your question, in the search applications market, technology is part of it, but execution matters just as much.
Gabriel: I think it is product for the most part, at least for general search and with a few caveats. Google’s share just kept climbing and climbing, and I think that is largely due to its product. Recently, Bing canceled their cashback program after spending tons of money because it presumably didn’t yield new customers. That’s more evidence of product dominance.
The first caveat is distribution deals. A lot of people use what is in front of them, and sometimes have no choice. It’s very hard (if not impossible) for a startup to capture those distribution deals since Microsoft and Google have so much money behind them.
The second caveat is, without distribution it is very hard to get people to switch search engines. All the recently well-funded search startups who failed are evidence of this fact. I think they didn’t wow people enough in the product, however. But the bar is pretty high.
The third caveat is brand. Google did a study comparing its results with its competitors’ and found a huge implicit trust from using the Google logo at the top. They earned that, but that is additionally hard to overcome for a startup (or even for Microsoft).
What are your thoughts on expert search features, such as specialized syntax or regular expressions?
Gabriel: I’ve been trying to “walk the line” in this arena, by offering specialized syntax that I think could get mainstream support from power users. I think regular expressions are a bit out there for the normal user, although I did already incorporate them in some capacity (though probably not what you meant): http://duckduckgo.com/?q=regexp+/(.*%3F)+(.*%3F)+(.*)/+duck+duck+go.
Something that I think walks that line better is the !bang syntax I created, where you input !amazon x in the search box and it searches for x on Amazon. I think that’s easy to grasp and it is useful. Additionally, I think it can help market to specific groups of users, e.g. I also added hex color codes and Unicode query responses.
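To make the !bang idea concrete, here is a rough sketch of how a query like !amazon x could be split into a target and a search term and then routed. It is not DuckDuckGo’s implementation: the function names, the `BANG_TARGETS` table, and the URL templates are all assumptions for illustration, and this version only recognizes a bang at the start of the query.

```python
from urllib.parse import quote_plus

# Hypothetical routing table; the URL templates are illustrative assumptions.
BANG_TARGETS = {
    "amazon": "https://www.amazon.com/s?k={query}",
    "wikipedia": "https://en.wikipedia.org/wiki/Special:Search?search={query}",
}


def parse_bang(raw):
    """Split '!name rest of query' into (name, query).

    Returns (None, query) when the input does not start with a bang.
    """
    text = raw.strip()
    if text.startswith("!"):
        name, _, rest = text[1:].partition(" ")
        if name:
            return name.lower(), rest.strip()
    return None, text


def route(raw):
    """Return the vertical engine's search URL, or None for a normal search."""
    name, query = parse_bang(raw)
    template = BANG_TARGETS.get(name)
    if template is None:
        return None  # no bang or unknown bang: fall back to regular results
    return template.format(query=quote_plus(query))


print(route("!amazon rubber duck"))
print(route("rubber duck"))  # None
```

Returning None for an unknown bang keeps the feature opt-in, which matches the self-selected spirit Gabriel describes: ordinary queries are never redirected.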
Pete: You know the rule of thumb that 90-odd percent of users never change the defaults. Whatever the number is, it’s increasing. That said, it’s not fair to round down to zero and say that the few people that do use expert features don’t count. They tend to be some of the most valuable users. We’ve got extensive XQuery hooks into our engine that make it possible to build up some great queries.
What do you think of Wolfram Alpha?
Gabriel: As a collection of cool data that gets aggregated usefully in response to queries, I love it! As a standalone product, however, I worry that it will die for lack of a business model. I think a lot of what they’ve done would be great in a search engine, and I’ve tried to integrate it as much as possible into Duck Duck Go (see Duck Duck Goodies).
Pete: There’s a continuum of search tasks that range from fact finding on one end to discovery on the other. (Fact finding: Who wrote Ulysses? Discovery: Which Irish writer should I read on the beach this afternoon?) Wolfram Alpha is really cool for fact finding, and lousy for discovery. You can’t have discovery without human input—HCIR.
What do you think of WorldCat.org?
Gabriel: I had not heard of it until this moment, so this is a first impression. I’m not the target customer since I haven’t checked out something from a library since college :). But I imagine this could be really useful for people who do check stuff out from libraries, i.e. students, researchers, etc. The implementation seems a bit cluttered and I’m not sure how big that market is. I suppose the business model is clicking through to Amazon or whatever; it’s an empirical question on how much that actually converts.
Pete: I enjoy WorldCat. They’ve done an impressive job on their primary mission. That’s sincere—I’m not damning them with faint praise. But if you want me to focus on search and give constructive criticism, there’s a lot more they could do.
If you hold up some great sites as the bar, you’ll see ideas WorldCat should adopt on user experience, relevancy, text mining, and visualizations. Just to name a couple of sites, IEEE Xplore and Food Network both have ideas that could improve WorldCat.
And if you expect OCLC to take a leadership role, they should push the bar on searching digital collections: full text, images, multi-media. We’ve been working with the JFK Presidential Archive on their next generation site to search their digital archives. That’s given me a real appreciation for how big the challenges are on searching digital collections. There’s a lot of work to do, and it would be good to see OCLC start experimenting.
If you’re interested in hearing more from Pete Bell, I recommend his always interesting contributions to Endeca’s excellent Search Facets blog as well as a very good interview with him conducted by Steve Arnold. For more on Gabriel Weinberg, I recommend his superb blog, book (still a work in progress, but we get to follow its development online), and the DuckDuckGo community for educators and librarians.
Thanks to Pete Bell and Gabriel Weinberg for participating in the interview, to Andrew Nagy for his question and his assistance with the article, and to my Lead Pipe colleague, Ellie Collier, for her comments.
- Andrew Nagy, an open source evangelist and library technologist, joined Serials Solutions in late 2008 where he has been an evangelist for Discovery services and seminal in the development of [Summon](http://www.serialssolutions.com/summon/). Prior to joining, he was the Technology Development Specialist for the Falvey Memorial Library at Villanova University where he was responsible for developing many innovations, including VuFind, an internationally adopted open-source Library Discovery solution.