What is a search engine?

In the context of computers and the internet, a search engine is a program that helps locate information. The user provides a query to the search engine, which then processes it and displays the results.

As far as the Internet is concerned, a web search engine helps you find information on the web based on your query. The results of your query can be web pages, text documents (PDF or Word files), videos, images, etc., or a combination of these. Google, Yahoo!, MSN and AOL are some of the famous web search engines - for more, refer to the major search engines and directories list.

Without search engines, it would be nearly impossible to find something on the web. I know it's a doggone cliché but it would be like searching for a needle in a haystack... a haystack that's miles and miles across.

So what is a web search engine?
Is a web search engine just one program? NO! There are five main ingredients to it:
          1. an interface through which you enter your search query
          2. a database of the indexed information (web pages and their contents)
          3. a "bot" or an automated program that continuously scours the web for new and changed information
          4. a program that indexes the information (contents of online web pages)
          5. the actual search engine program

We shall now look at each to understand web search engines better.

Web search engine interface
Most people think of the interface as the actual search engine. It is not. For instance, what you see on www.google.com is the interface - a text field in which you type your search query, a couple of buttons and a few links. The real Google search engine works behind the scenes - it wakes up when a query is entered in the text field and the search button is clicked.

A typical web search engine interface has a text field, in which the surfer needs to enter their query, and a submit button that passes the query to the actual search engine program. This interface is either presented on a web page or may be a part of another program such as the web browser or the add-on browser toolbar.

By far the simplest web page search engine interface is that of Google, and I guess that's one of the reasons it became so famous. Google has been very particular and "careful" about its interface. Ever since it was launched, the Google homepage has been simplicity personified - a logo, a text field and search buttons. Distracting elements are altogether absent, which is why the interface loads very fast even on slow internet connections.

The database of indexed web pages

When you run a query on a web search engine, the program doesn't run off to each web site on the internet hunting for the required information. This would be practically impossible and would take an immense amount of time - by the time it finished, you would have aged a few years. So what happens, and how are the search results displayed so quickly?

Each web search engine keeps a repository of web pages. This collection is stored in a database. Furthermore, the web pages in this database are indexed (or organized, if you don't like the fancy word) based on the information (text, images...) they contain. This indexing is very important: it is what lets the search engine find the required web pages rapidly based on your query.
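To make the idea of indexing a bit more concrete, here is a minimal sketch of an "inverted index", a common textbook way of organizing pages by the words they contain. I am not claiming any real search engine stores its data like this; the page URLs and text below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical mini-repository: URL -> page text (made up for illustration).
PAGES = {
    "example.com/hay": "a needle in a haystack",
    "example.com/sew": "needle and thread",
    "example.com/farm": "hay and a haystack on the farm",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index(PAGES)

# Looking up a word is now a single dictionary access,
# not a scan over every stored page.
print(sorted(index["needle"]))  # ['example.com/hay', 'example.com/sew']
```

The work of reading every page is done once, up front. After that, answering "which pages mention this word?" no longer depends on walking through the whole repository.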

The automated search engine bots
The main job of a search engine bot is to go around the web, hunt for new information and add to or update the database. The bot follows links from web pages quite like you do when you click on one. However, it moves from one web site to another like a spider, without human involvement. On finding a new web page, it sends the information back, which is then stored in the database. The same goes when it finds a web page that has been changed or deleted.
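The link-following behaviour described above can be sketched in a few lines. This is only a toy: instead of fetching real pages over the network, it walks a small in-memory dictionary (FAKE_WEB, a name and data set I made up) so that it runs anywhere. A real bot would also have to deal with politeness rules, revisits, errors and much more.

```python
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
FAKE_WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("about us", ["a.example", "d.example"]),
    "c.example": ("products", []),
    "d.example": ("contact", ["a.example"]),
}

def crawl(start_url, web):
    """Follow links breadth-first and return {url: text} for each page reached."""
    database = {}
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, so no page is visited twice
    while queue:
        url = queue.popleft()
        if url not in web:      # dead link: nothing to fetch
            continue
        text, links = web[url]
        database[url] = text    # "send" the page to the database
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database

database = crawl("a.example", FAKE_WEB)
print(list(database))  # ['a.example', 'b.example', 'c.example', 'd.example']
```

Note that d.example is found only because b.example links to it. A page that nothing links to would never be reached this way.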

Because of the dynamic nature of the internet (with information being changed and added each second), it is virtually impossible for a bot to keep the database fully up to date with the web. There simply cannot be a current snapshot of the web. This, as you would have gathered, means the results you get for your query will not include web pages that were added a few seconds ago (sometimes even hours or days ago).

The search engine indexing program
At the heart of it all is the search engine indexing program. This program is in charge of organizing and segregating the information that the bot gathers and stores in the database. It's also responsible for getting you relevant results based on your query.

The indexing of online information (web pages and their contents) involves complicated algorithms and processes. These are closely guarded secrets because the success of the search engine depends on them. Google, Yahoo!, MSN and AOL all have different algorithms for indexing web pages, which is apparent from the different search results they display for the same query.
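Since the real algorithms are secret, the best I can offer is a deliberately naive stand-in: score each page by how often the query words appear in it, and list the highest score first. The function names and page data here are my own invention, and no real search engine ranks pages this simply. It only shows how a different scoring rule would lead to a different order of results.

```python
from collections import Counter

# Hypothetical pages, made up for illustration.
PAGES = {
    "example.com/one": "search engines index the web",
    "example.com/two": "a web search engine searches a web index of web pages",
}

def score(query, text):
    """Count how many times the query words occur in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[word] for word in query.lower().split())

def rank(query, pages):
    """Return URLs with a non-zero score, highest score first."""
    scored = [(score(query, text), url) for url, text in pages.items()]
    return [url for s, url in sorted(scored, reverse=True) if s > 0]

print(rank("web index", PAGES))  # ['example.com/two', 'example.com/one']
```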

The actual search engine program
The web search engine program takes the query provided by the user, runs it through the indexed database and provides the results. Note: the relevancy of the results depends on how the information has been indexed - the actual search program simply goes through the index and presents the results.
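Here is a rough sketch of that last step, assuming the index is a simple mapping from each word to the set of pages containing it (my assumption, not something any particular search engine documents). The search program itself does very little: it looks up each query word and keeps the pages that appear for all of them.

```python
# Hypothetical pre-built index: word -> set of URLs containing that word.
INDEX = {
    "needle": {"example.com/hay", "example.com/sew"},
    "haystack": {"example.com/hay", "example.com/farm"},
    "thread": {"example.com/sew"},
}

def search(query, index):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Copy the first word's set so the index itself is never modified.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

print(search("needle haystack", INDEX))  # ['example.com/hay']
```

Requiring every word to match is just one possible choice; an engine could equally return pages that match any of the words and leave the ordering to its ranking.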

Most web search engines follow the same basic format in presenting the results - web pages are listed one after another, 10 or 20 at a time. More results (if found) are displayed on additional pages.
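Splitting results into pages is the easy part. A minimal sketch, assuming the results are already in a list and pages are numbered from 1 (the paginate function and the 25 sample URLs are made up for illustration):

```python
def paginate(results, page, per_page=10):
    """Return the slice of results shown on a 1-based page number."""
    start = (page - 1) * per_page
    return results[start:start + per_page]

# Hypothetical result list of 25 URLs.
results = [f"example.com/page{i}" for i in range(1, 26)]

print(len(paginate(results, 1)))  # 10
print(paginate(results, 3))       # the last 5 results, page21 to page25
```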

Do you now understand the complexities of a web search engine? The simple interface which you see on the web page of Google.com is just the tip of the iceberg. There is so much more the web search companies do to make it easy for all of us. I hope you've found this article entertaining and useful; if so, drop me a comment.
