Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Google   


Softpanorama Search Engines

We all know more or less how Google works: Links act as votes, and the more votes a page has, the higher its PageRank(tm). This system allows Google to do a decent job of locating the authoritative source for any particular topic. Links are the key to this system: they determine relevance. In theory one can exploit this system by creating "fake" or "pointer" sites that serve only to drive up the rankings for another site. Such a practice is known as Google Bombing, and while possible, it requires a good deal of coordination among a large group of people, or a single person with lots of free time. There has been some debate about the degree to which weblogs can affect Google's rankings by making it easy for large numbers of people to link simultaneously to the latest and greatest trends, fads, memes, or news bites on the internet.

This past Thursday, Google unleashed the Google API, a SOAP interface that allows developers to query Google and retrieve results without having to use the normal HTML form interface. This is unleashing a new form of Google bombing.

For general information please visit HOWTO search the WEB. For information on other search engines, see Search Engine Watch.


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some grammar and spelling errors should be expected.
  • The site contains some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.
Google Search
Open directory

Research Index

News

January 29, 2007  The Top 100 Alternative Search Engines by  Charles Knight


Written by Charles S. Knight, SEO, and edited by Richard MacManus. The Top 100 is listed at the end of the analysis.

Ask anyone which search engine they use to find information on the Internet and they will almost certainly reply: "Google." Look a little further, and market research shows that people actually use four main search engines for 99.99% of their searches: Google, Yahoo!, MSN, and Ask.com (in that order). But in my travels as a Search Engine Optimizer (SEO), I have discovered that in that .01% lies a vast multitude of the most innovative and creative search engines you have never seen. So many, in fact, that I have had to limit my list of the very best ones to a mere 100.

But it's not just the sheer number of them that makes them worthy of attention; each one of these search engines has that standard "About Us" link at the bottom of the homepage. I call it the "why we're better than Google" page. And after reading dozens and dozens of these pages, I have come to the conclusion that, taken as a whole, they are right!

The Search Homepage

In order to address their claims systematically, it helps to group them into categories and then compare them to their Google counterparts. For example, let's look at the first thing that almost everyone sees when they go to search the Internet - the ubiquitous Google homepage. That famously sparse, clean sheet of paper with the colorful Google logo is the most popular Web page in the entire World Wide Web. For millions and millions of Internet users, that Spartan white page IS the Internet.

Google has successfully made their site the front door through which everyone passes in order to access the Internet. But staring at an almost blank sheet of paper has become, well, boring. Take Ms. Dewey for example. While some may object to her sultry demeanor, it's pretty hard to deny that interfacing with her is far more visually appealing than with an inert white screen.

A second example comes from Simply Google. Instead of squeezing through the keyhole in order to reach Google's 37 search options, Simply Google places all of those choices, and many more, on the very first page, neatly arranged in columns.

Artificial Intelligence

A second arena is sometimes referred to as Natural Language Processing (NLP), or Artificial Intelligence (AI). It is the desire we all have of wanting to ask a search engine questions in everyday sentences, and receive a human-like answer (remember "Good Morning, HAL"?). Many of us remember Ask Jeeves, the famous butler, which was an early attempt in this direction - that unfortunately failed.

Google's approach, Google Answers, was to enlist a cadre of "experts." The concept was that you would pose a question to one of these experts, negotiate a price for an answer, and then pay up when it was found and delivered. It was such a failure, Google had to cancel the whole program. Enter ChaCha. With ChaCha, you can pose any question that you wish, click on the "Search With Guide" button, and a ChaCha Guide appears in a Chat box and dialogues with you until you find what you are looking for. There's no time limit, and no fee.

Clustering Engines

Perhaps Google's most glaring and egregious shortcoming is their insistence on displaying the outcome of a search in an impossibly long, one-dimensional list of results. We all intuitively know that the World Wide Web is just that, a three-dimensional (or "3-D") web of interconnected Web pages. Several search engines, known as clustering engines, routinely present their search results on a two-dimensional map that one can navigate through in search of the best answer. Search engines like KartOO and Quintura are excellent examples.

Recommendation Search Engines

Another promising category is the recommendation search engines. While Google essentially helps you to find what you already know (you just can't find it), recommendation engines show you a whole world of things that you didn't even know existed. Check out What to Rent, Music Map, or the stunning Live Plasma display. When you input a favorite movie, book, or artist, they recommend to you a world of titles or similar artists that you may never have heard of, but would most likely enjoy.

Metasearch Engines

Next we come to the metasearch engines. When you perform a search on Google, the results that you get are all from, well, Google! But metasearch engines have been around for years. They allow you to search not only Google, but a variety of other search engines too - in one fell swoop. Many search engines can do this. Dogpile, for instance, searches all of the "big four" mentioned above (Google, Yahoo!, MSN, and Ask) simultaneously. You could also try Zuula or PlanetSearch - which plows through 16 search engines at a time for you. A very interesting site to watch is GoshMe. Instead of searching an incredible number of Web pages, like conventional search engines, GoshMe searches for search engines (or databases) that each tap into an incredible number of Web pages. As I perceive it, GoshMe is a meta-metasearch engine (still in Beta)!
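
The metasearch idea described above (send one query to several engines at once, then merge what comes back) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three engine functions are hypothetical stubs that return fixed URL lists rather than calling any real service, and the reciprocal-rank merging rule is one simple choice, not a description of how Dogpile or any other metasearch site actually combines results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real engines: each maps a query to a ranked URL list.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2", "http://example.com/3"]

def engine_b(query):
    return ["http://example.com/2", "http://example.com/4"]

def engine_c(query):
    return ["http://example.com/2", "http://example.com/1"]

def metasearch(query, engines):
    """Query every engine concurrently and merge the ranked lists."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    scores = defaultdict(float)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] += 1.0 / position  # reciprocal-rank vote (assumed rule)
    # Highest combined score first; ties broken by URL for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = metasearch("three blind mice", [engine_a, engine_b, engine_c])
print(merged)
```

A URL that several engines rank highly rises to the top of the merged list, while a URL reported by only one engine far down its list sinks to the bottom.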

Other Alt Search Engines

And so it goes, feature after feature after feature. TheFind is a better shopping experience than Google's Froogle, IMHO. Like is a true visual search engine, unlike Google's Images, which just matches your keywords into images that have been tagged with those same keywords. Coming soon is Mobot (see the Demo at www.mobot.com). Google Mobile does let you perform a search on your mobile phone, but check out the Slifter Mobile Demo when you get a chance!

Finally, almost prophetically, Google is silent. Silent! At least Speeglebot talks to you, and Nayio listens! But of course, why should Google worry about these upstarts (all 100 of them)? Aren't they just like flies buzzing around an elephant? Can't Google just ignore them, as their share of the search market continues to creep upwards towards 100%, or perhaps just buy them? Perhaps.

The Last Question

Isaac Asimov, the preeminent science fiction writer of our time, once said that his favorite story, by far, was The Last Question. The question, for those who have not read it, is "Can Entropy Be Reversed?" That is, can the ultimate running down of all things, the burning out of all stars (or their collapse) be stopped - or is it hopelessly inevitable?

The question for this age, I submit, is… "Can Google Be Defeated"? Or is Google's mission "to organize the world's information and make it universally accessible and useful" a fait accompli?

Perhaps the place to start is by reading (or re-reading) Asimov's "The Last Question." I won't give it away, but it does suggest The Answer….

Charles Knight is the Principal of Charles Knight SEO, a Search Engine Optimization company in Charlottesville, VA.

The Top 100

For an Excel spreadsheet of the entire Top 100 Alternative Search Engines, go to: http://charlesknightseo.com/list.aspx or email the author at Charles@CharlesKnightSEO.com.

This list is in alphabetical order. Feel free to share this list, but please retain Charles' name and email.

Update: Thanks Sanjeev Narang for providing a hyperlinked version of the list.

Update, 5 February 2007: Charles Knight has left a detailed comment (#94) in response to all the great feedback in the comments to this post. He also notes:

"...while it looks like a very simple, almost crude list of 100 names, it has taken countless hours to try and do it properly and fairly. The list will be updated all year long, and the Top 100 can only get better and better until the Best of 2007 are announced on 12/31/07."

Charles, keep up the good work! I plan to showcase a new user interface for our visual search (www.quintura.com) at the Future of Web Applications (FOWA) event in London on February 20 - 22. Stay tuned to our developments! PS I have a question, though. Why is Quintura for Kids not on the list and only a runner-up? :) The service has started being used in some elementary schools after only one month since a beta release.

Posted by: Yakov | January 29, 2007 6:01 AM

Interesting list. I guess this list is the top 100 AFTER Google, Ask, MSN and Yahoo.
I would have to list Vivisimo above some of the others you have listed here.
I too don't have the time to go through all of them but I'd like to know if any of your top 100 are vertically focussed. I've been following one vertical in particular, health, and I don't see any of the ones I found to be useful.

Slashdot News for nerds, stuff that matters

"The Thomson Gale publishing group has put together a comprehensive review of Google Scholar, and they find it highly lacking compared with similar offerings from Highwire Press, Scopus, and The Web of Science. Will Google's overhyped offerings drive these superior services out of the market?"

Harvest A Distributed Search System

Noted-L (1 of 3) [Noted] Google Adds Wildcards to Phrases

http://www.researchbuzz.com/news/2002/jan03jan0902.html#googleadds

--<cut>--
If you're a regular ResearchBuzz reader you already know how to
search for phrases in Google using "wildcard words" -- you just use
the word "the," which Google always considers a stopword. So, search
for "three the mice" in Google and you'll find three green mice,
three blue mice, three blind mice, etc.

Google has made using "the" unnecessary by adding a word-sized
asterisk to its search syntax. What is a word-sized asterisk? It's
an asterisk you can use in place of a word; "three * mice" will find
three green mice, three blind mice, etc. This asterisk CANNOT be
used for part of a word. If you try to search for "three bl* mice"
you'll get no results. Thanks to Gary Price for this tip.
--<cut>--

--
J C Lawrence
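
The word-sized asterisk described in the excerpt above can be imitated with a regular expression. The sketch below is an assumption-laden illustration, not Google's implementation: the helper name phrase_to_regex is made up for this example, a standalone * is taken to mean exactly one word, and an asterisk glued to part of a word is treated as a literal character, so it finds nothing in ordinary text.

```python
import re

def phrase_to_regex(phrase):
    """Compile a phrase query where a standalone * stands for exactly one word."""
    parts = []
    for token in phrase.split():
        if token == "*":
            parts.append(r"\w+")            # whole-word wildcard
        else:
            parts.append(re.escape(token))  # literal word; "bl*" stays literal
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = phrase_to_regex("three * mice")
for text in ["three blind mice", "three green mice", "three mice"]:
    print(text, "->", bool(pattern.search(text)))
```

Under these assumptions "three * mice" matches "three blind mice" and "three green mice" but not "three mice", and "three bl* mice" does not match "three blind mice".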
              

HOWTO search the WEB

Isearch 
Fortunato - May 20th 1998, 07:05 EST

Homepage: http://www.etymon.com/Isearch/ 

Isearch is software for indexing and searching text documents. It supports full text and field based search, relevance ranked results, Boolean queries, and heterogeneous databases. Isearch can parse many kinds of documents "out of the box," including HTML, mail folders, list digests, SGML-style tagged data, and USMARC. It can be extended to support other formats by creating descendant classes in C++ that define the document structure. It is pretty easy to customize in this way, provided that you know some C++ (and you will need to ftp the source code). A CGI interface is also included for web based searching.
ftp://ftp.redhat.com/pub/contrib/hurricane/i386/Isearch-1.41-1.i386.rpm

[July 8, 1999] Story Search Stinks! But You Don't Have to Take it Anymore

Anyone who's ever used a search engine knows about broken links. Lousy interfaces. 10,000 returns with no meaningful results. Missing pages.

Essentially, a search engine is a type of software that creates indexes of databases or Web sites based on content. When you submit a search term, it goes out and "reads" its indexes and returns applicable results. Excite, HotBot and Lycos are popular examples.
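
The index-then-lookup cycle described above is usually built around an inverted index: a table from each word to the documents that contain it. Here is a minimal Python sketch. The sample pages and the function names are made up for illustration, and real engines such as Excite, HotBot or Lycos add crawling, ranking and much more on top of this.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip('.,!?";:')].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical sample pages, for illustration only.
pages = {
    "a.html": "Search engines index the Web.",
    "b.html": "Metasearch sites query multiple search engines.",
    "c.html": "Directories recommend sites by topic.",
}
index = build_index(pages)
print(sorted(search(index, "search engines")))
```

The query never touches the pages themselves. It only reads the index, which is why a stale index produces the broken links and missing pages complained about above.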

A recent report by Nielsen/NetRatings Inc. shows search engine popularity is slipping. And a study conducted at the NEC Research Institute confirms search engines can't keep up with the Web's rapid growth. NECRI discovered:

METASEARCHES
Metasearch sites are a timesaver because they query multiple engines simultaneously. Examples include:

DIRECTORIES
These hubs weed through the information glut with topic guides featuring recommended sites. They'll also recommend sites based on your search terms. Theoretically, Yahoo is a directory. But I prefer sites that put an emphasis on quality rather than quantity. Examples:

TOOLS
The right utility does wonders for streamlining searching. Here are three of my favorites, and you'll find more at the ZDNet Software Library.

Alexa got my Natural Born Killer's nod. This free browser add-on acts as a helpful backseat driver when you surf, offering stats on sites, including ratings by Alexa's thousands of users.

BullsEye lets you search and store info. The searches are highly customizable. For instance, it will search for industry-specific business news.

Copernic has an easy learning curve for the novice (or frustrated searcher). It searches multiple sites (as narrowly or widely as you'd like), validates links, and features good custom sorting and multi-threading options.


Papers

**** Scientific American Feature Article Hypersearching the Web June 1999

Your Keyword Density

Use Keywords to Improve Your Search Engine Placement


Seven Sisters

www.snap.com -- Snap is a human-compiled directory of web sites, supplemented by search results from Inktomi. Snap launched in late 1997 and is backed by Cnet and NBC. Competitor to Yahoo...

www.google.com -- probably the most promising

Northern Light Search Softpanorama -- many broken links. Looks like Alta Vista based

Looksmart -- old references. 

Ask Jeeves!

Alta Vista -- Many broken links. Troubles at Alta Vista led to a deterioration in quality

Yahoo -- below average quality of results; no broken links. Proprietary + Inktomi matches; not bad, but nothing special...


Google

The PageRank Citation Ranking: Bringing Order to the Web - Page, Brin, Motwani, Winograd (ResearchIndex)

kuro5hin.org Comments Google and Recursion

The "Google Boxes" you mention won't allow you to increase your site's ranking, unless it's already up there in the top ten. Very different from Google bombing, which can bring an unknown site to the top.

As you pointed out and can be read about originally here, each site A which links to a site B increases B's ranking by a small amount, but doesn't affect A directly.

A's own ranking can only be affected positively if there are loops of the form A->B->some other sites->A, and the smaller the loop, the higher the effect on A. That's because all sites within the loop are affected, with decreasing benefits. The most affected site will be B, followed by B's successor, followed by B's successor's successor, etc, up to A, who gets a very small boost if the loop is large.

The total increase in A's ranking is a result of adding the small increases for all possible loops.
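
The loop argument above can be checked numerically with a small PageRank sketch. This is an illustration under assumptions, not Google's production algorithm: it uses the commonly published damped power-iteration form, the damping value 0.85 is an assumed default, and the three-page graphs (an unknown site U and two established sites T1 and T2) are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# U links to T1 in both graphs; only in the second does T1 link back to U.
one_way = pagerank({"U": ["T1"], "T1": ["T2"], "T2": ["T1"]})
looped = pagerank({"U": ["T1"], "T1": ["T2", "U"], "T2": ["T1"]})
print(one_way["U"], looped["U"])
```

In the one-way graph U stays at the baseline score every page receives, however many outgoing links it adds. Only when T1 links back, closing the loop U->T1->U, does U's score rise above that baseline.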

Now suppose you're an unknown site. Nobody links to you, but you decide to link to the top ten. Since these don't link back to you, or do so only in a very roundabout way, you'll get zero benefit from your Google Box.

Now suppose you're a top ten site, and half the top ten sites link back to you directly. If you link to each of them, you'll get a relatively large boo