History of Search Engines
The basic need of a person looking for something on the web is to find information easily, in an orderly and relevant manner. The people working in the Internet field knew that the Internet would be utterly useless if information retrieval were not efficient. For this purpose, there was a need for a directory of the pages available on the Internet, which could be used to archive the available data and make it accessible to people. One can immediately relate this to the Yellow Pages, which were a boon to people looking for something specific the city had to offer. The only difference is that the Internet has a lot more to offer.
Before search engines made their mark, the only way users came to know about web pages and websites was by word of mouth, or if they were actually sent an e-mail by the creator or a well-wisher. Thus a system for searching the files available on the web became an urgent necessity, and that is when the initial attempts at developing search engines began.
The Internet itself was born in 1969, when the Advanced Research Projects Agency (ARPA) of the USA made the first ARPANET connections between computers as a means of keeping control over the country's strategic assets. Almost 20 years later, Internet resources were used to create the World Wide Web (WWW) for sharing information globally.
There could be no better explanation of the difference between the Internet and the web than that given by Tim Berners-Lee, creator of the World Wide Web. According to him:
"The Internet [Net] is a network of networks. Basically it is made from computers and cables .... The (World Wide) Web is an abstract imaginary space of information. On the Net, you find computers - on the Web, you find documents, sounds, videos, ... information. On the Net, the connections are cables between computers; on the Web, connections are hypertext links. The Web exists because of programs, which communicate between computers on the Net. The Web could not be without the Net. The Web made the Net useful because people are really interested in information and don't really want to have to know about computers and cables. "
It might sound strange that the first search engine was named Archie. The name had no connection to the famous comic books; it was short for "archives". That is what the search engine did: collate information and archive it logically and systematically for easy retrieval. Alan Emtage, a student at McGill University in Montreal, created Archie in 1990. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol, a protocol used for uploading and downloading content between computers) sites, thereby creating a searchable database of filenames.
In 1991, Mark McCahill of the University of Minnesota created Gopher. The name had no significance beyond being the school's mascot. People were hardly imaginative in those days; no wonder techies are called geeks. Anyway, in contrast to Archie, which indexed computer files, Gopher indexed plain text documents.
In the same year, two programs were created that allowed search and retrieval of information from the Gopher index: Veronica and Jughead. By now you are probably convinced that the folks working on these engines were hardcore fans of the Archies (of Riverdale fame). And maybe they were, because JUGHEAD turns out to be an abbreviation for Jonzy's Universal Gopher Hierarchy Excavation And Display, and VERONICA for Very Easy Rodent-Oriented Net-wide Index to Computerised Archives. Jughead was a tool for obtaining information from Gopher servers, while Veronica provided a keyword-based search of the Gopher index.
Later, in 1993, Matthew Gray of MIT designed software to measure the growth of the World Wide Web by counting the number of active servers on the Internet. He called the program, no, not Betty or Reggie, as he might have been tempted to, but the World Wide Web Wanderer. Subsequently, the Wanderer was upgraded to retrieve and store URLs (Uniform Resource Locators, the addresses of pages or sites), thereby becoming the first database of websites, called Wandex. The World Wide Web Wanderer was the first program to proactively traverse the web and collect information about websites, hence the name. Such programs later came to be known by various names: web crawler, robot, bot or spider. The Wanderer had a major problem: because it would access the same pages multiple times, it placed a heavy load on website servers.
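The Wanderer's flaw is easy to see in a toy crawler. The sketch below (purely illustrative; Gray's actual code is not being reproduced here, and the pages and link graph are made up) shows the standard fix: a "visited" set so each page is fetched at most once.

```python
from collections import deque

# A toy link graph standing in for real HTTP fetches (hypothetical pages).
LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["a.html", "c.html"],   # links back to a.html
    "c.html": ["a.html"],
}

def crawl(start):
    """Breadth-first crawl that fetches each page at most once."""
    visited = set()
    queue = deque([start])
    order = []                 # pages in the order they were fetched
    while queue:
        page = queue.popleft()
        if page in visited:
            continue           # without this check, a crawler re-fetches
        visited.add(page)      # the same pages and hammers the servers
        order.append(page)
        queue.extend(LINKS.get(page, []))
    return order

print(crawl("a.html"))  # ['a.html', 'b.html', 'c.html']
```

Even though every page here links back to a.html, it is fetched only once, which is exactly the discipline the early Wanderer lacked.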
In the same year, Martijn Koster created ALIWEB (Archie Like Indexing for the Web), which allowed users to submit their websites for indexing. He also developed the robots exclusion standard, so that website owners could block robots from indexing a site, or even individual pages of a website.
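Koster's standard survives today as the robots.txt file served at a site's root. As a small illustration (the rules and URLs below are invented), Python's standard-library urllib.robotparser can check whether a crawler is allowed to fetch a page:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt of the kind Koster's standard introduced;
# the paths are hypothetical.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/page.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "http://example.com/index.html"))      # True
print(parser.can_fetch("*", "http://example.com/private/a.html"))  # False
```

A well-behaved robot consults these rules before fetching, which is precisely the control Koster's standard gave website owners.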
By now it was clear that the search business could be profitable, although no one knew exactly how. Investments started pouring in, giving a huge fillip to development efforts, and many individuals as well as companies threw their hats into the ring. The end of 1993 saw the development of three full-fledged search engines. First came JumpStation, which gathered information about the titles and headers of web pages and retrieved them using a simple linear search; unfortunately, as the web grew bigger, JumpStation ceased to function. Second was the World Wide Web Worm, which indexed titles and URLs. Both had a basic flaw: they presented results on a first-come, first-served basis. The third was the Repository-Based Software Engineering (RBSE) spider, which implemented a ranking system.
Also in 1993, six Stanford University students, Mark Van Haren, Ryan McIntyre, Ben Lutch, Joe Kraus, Graham Spencer and Martin Reinfried, introduced Excite, which used statistical analysis of word relationships to make searching more efficient.
This marked the time that directories came into the picture. The difference was that, while search engines were automated, directories were mostly indices of websites compiled and edited manually by humans. The first directory was the VLib, or Virtual Library, created by none other than Tim Berners-Lee (the creator of the WWW). It was followed in 1994 by the EINet Galaxy, or simply Galaxy, a directory with a search feature.
In the same year, Jerry Yang and David Filo created the Yahoo directory. It began simply as a collection of the developers' favourite websites, but as the collection grew bigger, it necessitated reorganisation and the inclusion of a search feature. Yahoo then acquired some of the better search companies and used their products to power Yahoo Search until 2004, when the company began powering its own search. Yahoo grew phenomenally and soon started charging enterprises for inclusion in the directory.
This led to the creation of DMOZ (the Open Directory Project) by Rich Skrenta and his friends. Today, DMOZ is the largest directory on the web that can be used by anyone. However, the trend of large general directories has given way to smaller directories specialising in verticals.
Coming back to search engines: in 1994, WebCrawler was developed by Brian Pinkerton of the University of Washington. WebCrawler was unique, the first search engine to index entire pages. It became so popular that it was impossible to use during the day due to heavy traffic. In a way, WebCrawler showed the way for the robot-based, full-text search engines that followed.
Then came Lycos, developed by Dr. Michael Mauldin of Carnegie Mellon University. Lycos is short for Lycosidae, the family of wolf spiders that hunt their prey. The unique feature of Lycos was that, in addition to relevance ranking of search results, it provided prefix matching and word-proximity features. Lycos grew to be the largest repository of indexed documents by 1996.
Infoseek was developed in 1995; its addition to the existing features of search engines was the facility for webmasters to submit their pages to the search engine in real time.
Around the same time came AltaVista (meaning "high view"), which brought with it many more features. For starters, it allowed natural-language (the way we speak or write) queries and advanced search techniques. It was fast and also had huge bandwidth. In addition to letting users add or delete their URLs within a short time, it also allowed them to check inbound links. AltaVista also started the trend of educating users, giving usage tips to make their search experience easier. The many innovations introduced by AltaVista soon helped it take the top spot in popularity away from Lycos.
In 1996, two Berkeley students launched HotBot, with "concept induction" technology and a "new links" marketing strategy. They claimed to index the web much faster and more regularly than any other website. They failed, however, to sustain a profitable business model and were eventually bought out by Yahoo.
Ask Jeeves was launched in 1997 as a natural-language search engine that used human editors to match search queries. It tried to rank websites based on popularity, but it soon became evident that the engine was prone to spamming. In 2001, Ask Jeeves bought and migrated to Teoma, a new search engine that used link popularity and indexed websites by subject. Teoma (meaning "expert") analysed links with reference to subject, so a website on a particular subject linking to another on an entirely different subject would render the link utterly useless. Ask Jeeves is now known as Ask.com.
In 1996, Sergey Brin and Larry Page, two Stanford University students, launched BackRub as part of their research. BackRub was a search engine that analysed the quality of the links pointing to a website and ranked it accordingly. The following year, BackRub was renamed Google, a term derived from the mathematical term googol, the digit one followed by a hundred zeros. With its simple front end and highly accurate results, Google soon overtook the other search engines in the market. Google is at present working on a new project, code-named Caffeine, which promises even more accurate and faster results.
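The link-analysis idea behind BackRub was later published as PageRank: a page is important if important pages link to it. The toy sketch below is not Google's implementation, merely a minimal power-iteration version over an invented three-page web, but it captures the intuition that a page endorsed by many ranked pages rises to the top.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over a dict of page -> outbound links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}   # start with equal rank
    for _ in range(iterations):
        # every page keeps a small baseline rank (the "random surfer" jump)
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            if not outs:
                continue
            share = damping * rank[page] / len(outs)
            for out in outs:                      # pass rank along each link
                new[out] += share
        rank = new
    return rank

# Hypothetical web: both "a" and "c" link to "b", so "b" ranks highest.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["b"]})
```

The ordering depends only on link structure, not on page content, which is why this scheme was harder to spam than the popularity ranking Ask Jeeves had tried.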
In 1998, MSN launched its own search engine, powered by Inktomi. The company upgraded the engine and in 2006 renamed it Windows Live Search. In 2009 the brand was renamed Bing, which refers to the sound made at the moment of discovery and decision-making. Does Bingo ring a bell?
At present the major search engines, that is, Google, Bing and Yahoo, are competing, apart from general search, in search verticals such as news, videos, answering services, books and research. Others, such as AOL, Yandex and Baidu, enjoy regional popularity.
There were many other search engines, in addition to those mentioned above, that could never become big or stable enough to sustain their identity. All the companies and people, including those who faded away after a bright start, made significant contributions to the development of search engines as we see them today. The journey was fraught with takeovers, acquisitions, mergers and collaborations. Those interested in a chronological history of search engines can visit http://www.seoconsultants.com/searchengines/history/#ResourcesReferences.