History of the Internet
Search Engines - their history and how they operate
How to Use Search Engines
Future Developments - the direction search engines are going. Big Brother?
Tim Berners-Lee developed the HTML code behind webpages as an easy platform for
exchanging information among fellow researchers at the CERN physics laboratory
near Geneva, whereas the technical backbone required for the internet was mostly
developed in the US from the 1950s onwards. So what started as a simple means
for scientists to exchange information on a relatively small network of
computers has ballooned into what it is today: a network of networks. After all, that is
what "internet" basically means. "Intranet" refers to a network of computers within
an organisation, which are all usually hooked up through a central server
(computer), which is used to bridge the communication between them all. The
internet, which can be thought of as an "interaction" of many such "networks",
is like a spider web (and hence the expression "world wide web" or www) connecting all
these networks and individual computers. Individual computers can be hooked up
to the internet through a simple dial-up connection, and hence become part of the
entire system whenever they are "online". One can connect to the internet through many
means, including a wireless connection, such as wifi, through a mobile phone, or
via satellite. We can consider these wireless connections as invisible wires and
part of the same web. When one hooks up to the internet through a dial-up
connection, for example, their computer is hooked up to a modem through one of
the ports at the back of their computer (traditionally the com, or
communication, port). The modem is then hooked up to a telephone line and acts
as a bridge between that computer and the internet. The modem dials a particular
telephone number, connects to another modem (itself hooked up to another
computer), and the two modems begin to communicate (called a handshake) so that
they can establish a connection. The handshake, this initial establishment of
communication, is the high-pitched whistling sound you hear when the volume of your
modem is set above zero. A friend of mine was always excited to hear it
when visiting my house in the early days of my business, because he felt he was
in an important global hub where things were really happening. Anyway, this
establishment of communication is basically the same with any other
type of connection to the internet, be it DSL, wifi, or the
secret satellite system used by the KGB.
Once the connection is established, the computer communicates with the internet through the portal it hooked up to. This portal could be a local internet "provider", a company which itself has a very fast internet connection, for example via satellite or optical cable. Suppose its connection is a thousand MB per second (a megabyte is a thousand kilobytes, and a kilobyte is a thousand bytes, where one byte is roughly equivalent to one letter of the alphabet, although accented and non-Latin letters used in languages other than English may take two or three bytes to "describe"). If the company has a thousand customers, it might allocate to each of them a connection speed limit of one MB per second. This limit can be set by software which the company has installed on its computers. But what often happens is that the company will try to increase its profits and sign up 10,000 customers instead, assuming that not all of them will be hooked up to the internet at the same time, and that those who are will not all be using their full capacity (in this case, one MB per second) at once. Inevitably, however, there will be periods when so many of its 10,000 customers are online that they cannot all get their full one MB per second. The 1,000 MB per second is the maximum the company can use, and this is divided among all those who are online, according to how they are using their connection. It is estimated that only one in five internet providers actually provide the bandwidth (in our case 1 MB/sec) they promise to their customers. And with the increasing popularity of such services as youtube, which is a place where people can upload and view videos, the battle for bandwidth is on the increase.
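To put some numbers on this sharing of bandwidth, here is a minimal sketch in Python. The function name is made up for illustration, and it assumes the simplest possible case, where every customer who is online pulls as much as they can and the provider splits its capacity evenly among them. Real providers use more elaborate traffic shaping.

```python
def per_customer_mb_per_sec(total_mb_per_sec, customers_online, promised_mb_per_sec):
    """Return the speed each online customer can actually get, in MB per second."""
    if customers_online == 0:
        return promised_mb_per_sec
    fair_share = total_mb_per_sec / customers_online
    # Nobody gets more than the limit the provider promised them.
    return min(promised_mb_per_sec, fair_share)


TOTAL = 1000      # the provider's own connection, MB per second
PROMISED = 1      # the limit promised to each customer, MB per second

for online in (500, 1000, 4000, 10000):
    speed = per_customer_mb_per_sec(TOTAL, online, PROMISED)
    print(f"{online:>6} customers online -> {speed:.2f} MB/sec each")
```

With a thousand or fewer customers online, everyone gets the promised 1 MB per second. Once all 10,000 are online and busy, each is down to a tenth of that.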
In the old days, most people might have been online typing emails. While typing an email online, you are not using any bandwidth at all, except when you press Send, which then sends your data into the internet. Since each character or letter of the alphabet is 1 byte, and your average email is about 10 kilobytes in size, or 1% of 1 MB, the traffic demands in this case are very small. If the user is "surfing" the internet, they will be downloading the text of the webpages they visit (usually about 30 to 40 kilobytes, if we include the script or html code behind the text they see, meaning the programming commands which make the text pretty, explain where the pictures are located, etc.) and any pictures on that page (background colours are not pictures but rather commands to your computer, which does the rest of the work, meaning it does not have to download those colours). More serious users might be downloading programs, games or music, and finally we have the increasing demands of video streaming (which is basically temporarily downloading the video to your computer so that you can view it). In fact, experts are saying that youtube and the videos now embedded in many webpages are starting to strain the limits of the entire internet network, which did not anticipate such demands when it was first created.
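If you want to check the byte counts yourself, the following Python sketch measures how much space a piece of text takes up. It assumes the text is stored in UTF-8, which is a common encoding but not the only one, and the sample email is invented purely to reach a round size.

```python
def size_in_bytes(text):
    """Number of bytes needed to store the text in UTF-8."""
    return len(text.encode("utf-8"))


print(size_in_bytes("a"))   # a plain English letter
print(size_in_bytes("č"))   # a Czech letter with a diacritic
print(size_in_bytes("€"))   # the euro sign

# An invented email: one 40-character sentence repeated 250 times.
email = "Hello, see you at the meeting tomorrow. " * 250
kilobytes = size_in_bytes(email) / 1000
print(f"email of {len(email)} characters = {kilobytes:.0f} kB")
```

The plain letter takes one byte, the Czech letter two and the euro sign three, while the sample email of 10,000 plain characters comes to 10 kilobytes.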
This brings us to how the company, your service provider, is itself connected to the internet. Well, it is connected in much the same way as you are, to yet another computer, or hub. An entire country is often connected to the internet in the same way you or the company are. Each country is assigned a country domain, such as .cz for the Czech Republic. The country pays for a particular super fast connection, which it then divides among its population as it sees fit. Some countries, such as China and Iran, go as far as censoring the information that flows through their hubs, so that their populations cannot gain full access to the world of information (although my people behind these "curtains" tell me they can use a "proxy connection" to get around this censorship). The country may decide to give its armed forces, universities and government institutions priority connections, meaning that little users like you and me will suffer more during outages or when the system is overburdened. This reminds me of my trip to Bulgaria. Several times I was at an internet cafe where everyone was busy typing away on their email accounts, but I could not gain access to my Yahoo or Hotmail accounts. I asked the person at the counter, who phoned around a bit and apologetically informed me that, "once again", the country's connection to the outside world was severed. Everyone else was using some Bulgarian email provider, and hence could happily type away and communicate with each other, but I was cut off from the outside world, and hence could not access my own account.
Which brings us to
Everyone hooks up to the internet in some way, but what are they really hooking up to? Well, they are hooking up to webpages, built according to the system invented by Berners-Lee, who is mentioned at the top of this page. Each page has its own specific address, much like a postal address.
When someone's computer is hooked up to the internet, it is usually assigned a "dynamic IP address". The IP address is like a postal address. But if someone wants to host a webpage on the internet, they need a "fixed IP address". They can get this from their internet provider, and usually have to pay a little extra for it. Their fixed IP address then becomes like a postal address on the internet. Often these fixed IP addresses can be assigned a domain, like an alias. So instead of punching in something like http://192.0.2.59, one could instead punch in cnn.com. Much easier to remember, isn't it? Furthermore, most web browsers will automatically add the http://, or http://www. in front (if necessary). This alias or domain was invented to make it easier to get to the fixed IP address. Of course, if you have your own fixed IP address, and a domain or alias that comes with it, then for someone to hook up to your address and hence your webpages, your computer must be turned on and connected to the internet. If not, the person trying to access your page will get a message "page does not exist" or something like that. Much like if you are not home when the postman comes for a visit, or your mailbox is welded shut.
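The idea of a domain as an alias for a numeric address can be sketched in a few lines of Python. The table below is entirely hypothetical, with made-up domains and addresses taken from a range reserved for documentation. On the real internet this lookup is done by DNS servers rather than a small table. The second function simply checks that an address is well formed, meaning four numbers from 0 to 255.

```python
import ipaddress

# Hypothetical alias table; a real lookup is done by DNS servers.
# The addresses come from a range reserved for documentation.
ALIASES = {
    "example.com": "192.0.2.59",
    "www.example.com": "192.0.2.59",
}


def resolve(domain):
    """Return the fixed IP address registered for a domain, or None."""
    return ALIASES.get(domain.lower())


def is_valid_ipv4(text):
    """True if text is a well-formed IPv4 address (four numbers from 0 to 255)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


print(resolve("Example.com"))
print(resolve("nowhere.invalid"))
print(is_valid_ipv4("192.0.2.59"))
print(is_valid_ipv4("300.1.2.3"))
```

A domain missing from the table returns nothing at all, which is roughly the situation behind a "page does not exist" message.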
So now we have all these computers hooked up to the internet: some used to look for webpages, and others hosting the webpages.
For those computers hosting webpages, their owners can create their own sub-addresses, such as cnn.com/information/web/how-to-surf-the-internet.html, where each "/" in this address is essentially a subfolder on their computer. When viewing these folders on the computer itself, a Windows machine uses the character "\" instead (i.e. \information\web\how-to-surf-the-internet.html). In this case, cnn has a folder "information" in its main webroot directory or folder, a subfolder "web" inside the information folder, and within the "web" subfolder, the file how-to-surf-the-internet.html. So one can think of this as a flat on a particular floor of a building, where the building's postal address is http://www.cnn.com, and the individual page's address is the flat within the building.
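Here is a small Python sketch of how a web address might be turned into a folder path on the hosting computer. The webroot folder C:\webroot is an assumption made up for the example, and a real web server does much more checking than this before it hands over a file.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse


def url_to_local_path(url, webroot):
    """Map the path part of a web address onto a folder under the webroot."""
    path = urlparse(url).path                    # "/information/web/..."
    parts = [p for p in path.split("/") if p]    # drop the empty pieces
    return PureWindowsPath(webroot, *parts)


url = "http://www.cnn.com/information/web/how-to-surf-the-internet.html"
print(url_to_local_path(url, r"C:\webroot"))
```

Each "/" in the address becomes a "\" in the printed path, one folder per level.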
Now what if you wanted to find this address, but did not know the exact postal address of the building it was in, or even which city, or the individual flat where that page was located? Well, my friends, this is where search engines come in.
Although it may be hard for mere humans like us to
fathom, these search engines are actually computers which compile and store all
the webpages which exist on the web. Every single one of them - at least those
which have been "indexed" (spotted by the search engines and stored in their
memory banks). To get indexed, these search engines use robots or spiders, which
are basically little programs which surf the internet on their own and collect
data, sending this data back to the big mother base, which is to the search
engine super computer. For example, google now also stores the previous state of
each webpage, which it calls the cache. It does this so that it can examine any
changes made to the page. It keeps this older copy because of
search engine optimization, as explained below. If you do a search for the word
"translation", you might get hundreds of millions of results. After all, there
are billions upon billions of webpages out there. An unbelievable amount. But if
you received hundreds of millions of pages in your search, which one are you
supposed to choose? Well, most people just look at the top ten. And placing a
webpage into this top ten is quite a tricky matter, which is why
companies employ search engine optimization (seo) experts to tweak and write up
their pages for the purpose of getting them to the top ten results. It is like
paying for advertising. If you do not pay to advertise your product on TV, no
one will know about you and you will not be able to sell anything. In this same
manner, if you do not get into the top 10 search results, most people will not
bother scrolling down to the million and tenth result to visit your page. And because the
seo experts endlessly fiddle with their pages to keep them in the top ten, the
search engines have begun to store their changes or older pages so that they can
examine them. It is an endless "war" between the search engines and the seo
experts. The seo experts are employing all the nasty tricks they can to get
their pages near the top, while the search engines are employing all their
resources to keep their results honest. If the seo expert cannot get their
employer's pages near the top, their employer will be angry because they are not
selling products. On the other hand, if the search engines do not produce honest
results, the users of their search engine will be angry, because they are
wasting their time looking for something and being led to pages which do not
interest them. There are four main search engines out there, each competing for
your attention: google, yahoo, msn.com, and most recently ask.com. There were
and are many other search engines out there too, such as excite.com and altavista,
but many of them have been bought up and incorporated into the
larger ones, or they use the larger ones when devising their own search results.
And what is a search result? Well, let us go back to google and how it stores an old and new copy of each webpage it has "indexed". The search engine sends out its robots, like little army personnel, who travel the internet, visiting each page and sending back to the mother ship a reconnaissance photograph of what they have found. The mother ship (the search engine) stores these photographs in its archives. If, while on a reconnaissance flight, a robot finds a weblink on the page it is visiting, it follows that link and sends a photograph of the new page back to the ship as well. But if there is something wrong with the link, or the page takes too long to load, the robot goes back to the first page, while the mother ship logs that the link was not reliable. After all, it has plenty of work to do surfing all those billions and billions of pages, doesn't it? So the search engines devise ways to make this easier for themselves. If a link to a webpage sometimes does not work (because the computer on which the webpage is hosted is turned off, crashes, or loses its internet connection), this can be catastrophic for that page's web rankings, because the search engine will store information indicating that that link (IP address) is unreliable, dropping the page down drastically in its ranking (how high it ends up on the search results).
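To make the robots a little less mysterious, here is a toy version in Python. Everything in it is invented for illustration: the "web" is just a small table of made-up addresses and the links found on each page, and a page marked None stands for one whose host is switched off or too slow. A real robot fetches pages over the network and is far more careful, so treat this only as a sketch of the idea of following links and logging the bad ones.

```python
from collections import deque

# A tiny made-up "web": each address maps to the links found on that page.
# None marks a page whose host is switched off or too slow to answer.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/shop"],
    "a.example/about": ["a.example/home"],
    "b.example/shop": ["c.example/dead"],
    "c.example/dead": None,
}


def crawl(start):
    """Follow links from start; return (indexed pages, unreliable links)."""
    indexed, unreliable = [], []
    queue, seen = deque([start]), {start}
    while queue:
        address = queue.popleft()
        links = TOY_WEB.get(address)
        if links is None:
            unreliable.append(address)   # the mother ship logs the bad link
            continue
        indexed.append(address)          # the "photograph" sent back
        for link in links:
            if link not in seen:         # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return indexed, unreliable


print(crawl("a.example/home"))
```

The robot brings back three pages and reports one address as unreliable.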
For a long time Yahoo was the dominant search engine. It had a very large "directory" of webpages, each screened by a human. To get your pages into Yahoo's directory was very important if you wanted good rankings (to end up high on search results, such as in the top ten links on the first page). Your page did not HAVE to be in the directory, but being in the directory helped very much with your rankings.
And how are these rankings or placement achieved? Well, this is the entire secret of how the search engines operate.
To repeat, the robots and army reconnaissance personnel have been surfing and running around the internet, collecting as much information as they can and sending this all back to the mother ship and search engine. The search engine, which is basically a super computer (or many computers, as is the case with google), stores all these pages in its memory banks. Now, when you the user type google.com into your web browser, you are punching in google's alias name, which diverts you to its particular fixed IP address. By typing in google, you are being taken to its special postal address (which, actually, can be any of several, since it is such a large service. It has several addresses in different countries, and the search engine can tell which country you are located in, because it can see how you have connected to the internet, and your service provider, and how your service provider has hooked up to the internet. Scary eh?). Once you are taken to google's address, you are basically logged into its computer, and subject to its rules. You are basically looking at its own webpage, which in this case is dynamic (changing). You have logged into its webpage, where there is a little box/field where you can type in your search. For example, if you type in the word "translation" and press Search (or Enter on your keyboard), the magic begins. The search engine takes this word "translation" and searches all the pages it has stored in its memory banks (retrieved by its robots), and produces for you a nice and neat list of links connecting you directly to these pages. Totally amazing, eh? Much like if you were to go to the Start menu on your own computer, pressed Search, and performed a search on your own computer for a particular file. It may take a while to generate the "results", but your computer isn't a supercomputer, is it? 
Therefore, as hard as it may be for you to fathom, your little search on google is basically a command to search google's memory banks, with the billions upon billions of pages stored there, and generate a list of all those pages which have the word "translation" in them.
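A search through these memory banks can be sketched with what programmers call an inverted index, a table which lists, for every word, the pages containing it. The Python below is only an illustration under that assumption. The three pages are made up, and how any real search engine organises its storage is not something I can show you.

```python
from collections import defaultdict

# Made-up pages standing in for what the robots brought back.
PAGES = {
    "agency.example": "translation agency offering czech translation",
    "fences.example": "metal fence supplier in london",
    "blog.example": "my trip to bulgaria lost in translation",
}


def build_index(pages):
    """Map every word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index


def search(index, word):
    """Return the addresses containing the word, in alphabetical order."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(PAGES)
print(search(INDEX, "translation"))
print(search(INDEX, "pizza"))
```

The index is built once, when the pages come in, so that each search is a quick lookup rather than a fresh read through every page.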
But how is the order of results (rankings) decided? Well, these search engine companies have hired the best mathematicians in the world to help them with this. Recently, google offered more money to one of msn's most important mathematicians, and you can imagine that Microsoft was not happy! Not to mention that the mathematician might have revealed some important secrets to their new employer. Basically these search engines hire these super mathematicians because they want the search results they produce to be accurate. If their results were not accurate, you would not be happy, and would not use their service anymore, would you? Google is the most used search engine because its search results are more accurate and honest. At one point they came up with a better algorithm (mathematical formula) than Yahoo's and used it to generate their search results. People started catching on, and started using google more than Yahoo. The dynamics of how the various search engines calculate where to place/rank the individual webpages among the search results are quite complicated (you can read about my own success in search engine optimization), but to give you a quick insight: if there are 244 million pages out there containing the word "translation" at least once, the webpage which mentions it only once, while talking about a completely different subject, might end up in last place, in position 244 million. However, anyone who designs a page with only the word "translation", repeated a million times, certainly will not end up in first place. This is because this tactic has been tried countless times, especially in the early days of the internet, when a porno company made all sorts of these pages in the hope of generating as much traffic to its pages as possible. Poor little grandmas were looking for a translation company, but were instead taken to horrible porno pages, because the search engines were deceived in this way.
Putting the word "translation" a million times on a page is called "over-compliance". It has breached a certain threshold, where the search engine has said to itself: "Aha! This page smells fishy. Let's penalise it and purposefully put it in position 244 million, below the page which talks about something else but has the word 'translation' only once in it." And this is basically how the search engines work, battling it out against all the seo experts, constantly tweaking their algorithms with the intention of producing as accurate and honest results as possible, to satisfy you, their "customer", so that you can find what you are looking for, and to help the poor little grandma find her translation company. Whenever the search engines tweak and change their algorithm (mathematical formula used to produce and rank in a particular order your search engine results), all the seo experts study and guess about the changes, and hurry and scurry to change their own pages, to keep them near the top. It is an insane war, but the search engines are the ones hiring the mathematical geniuses, and generally successful in producing honest and accurate results.
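The threshold idea can be illustrated with a toy scoring function in Python. The real ranking formulas are secret and far more complicated, so everything here is an assumption for the sake of the example: the score is simply the share of words on the page which match the keyword, and the cut-off of 0.3 is a number I picked out of thin air.

```python
def score(text, keyword, max_density=0.3):
    """Toy relevance score: keyword density, or zero if suspiciously high."""
    words = text.lower().split()
    if not words:
        return 0.0
    density = words.count(keyword.lower()) / len(words)
    # Past the threshold the page "smells fishy" and is penalised.
    return 0.0 if density > max_density else density


# Made-up pages: an honest one, an off-topic one and a stuffed one.
PAGES = {
    "honest.example": "we are a translation agency and translation is all we do",
    "offtopic.example": "recipes for soup and one word about translation here",
    "stuffed.example": "translation " * 1000,
}

ranking = sorted(PAGES, key=lambda address: score(PAGES[address], "translation"),
                 reverse=True)
print(ranking)
```

The stuffed page ends up last, below even the page which mentions the word only once while talking about soup.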
But as an seo expert myself (trust me, the
translation industry, with its hundreds of millions of pages, is a very
competitive one, and if I can get close to the top ten in many keywords in this
field, I consider myself an expert), one of my jobs is to view how you, the user,
have stumbled on my many pages. And I must admit that I have to chuckle many
times. One of the problems experienced by search engines is that the average
user does not even know how to search well. Which is why I created this page -
to help you with this. There are actually people out there who are hired to surf
the internet. To collect useful information for someone else. And like any
skill, the better you are at it and the more tricks you know, the faster you are
able to collect more useful information, and hence the better you are paid for
it. I understand how the search engines work, and how computers work in general,
so I always manage to find what I am looking for rather quickly. I also know how
people put together webpages, and the wording they might use, so I take all this
into consideration when performing my own searches.
But when I see the keywords people have used to find my own pages, I have to chuckle, which is why I wrote this page (well, yes, I am a charitable and caring person, but I also realise that such useful and interesting pages as this one help all my other pages and bring me valued business). Many times people write a question, like: "What is time difference between Greece and New York in GMT?", which may take them to my pages explaining time zones. But this is not a good way to perform a search. A computer is not an intelligent machine which can interpret what you are asking it and guide you to the best option. It is a simple machine which follows simple commands. Your brain is made up of electrical circuits, just like a computer. But each of your wires might be hooked up to a hundred thousand other wires, while each wire in a computer is hooked up to only two or three other wires, where each of these may represent "yes", "no", or "neutral/stay the same". Compare that to your hundred thousand wires, where one may represent "If my husband's right eye twitches in a certain way when I ask him that question, I'm going to slap him in the face". The human mind is much more dynamic and complex than a computer's brain, so asking a computer a question in the same way you would confront your spouse about why they came home late last night really does not work. The mathematical geniuses hired by the search engines are trying to create the perfect mathematical formula while keeping in mind the silly way you may perform your searches, and the millions of seo experts out there who employ nasty tricks to get their pages to the top. The search engines will improve over time, and perhaps, one day, a search engine like ask.com will create an algorithm which can successfully interpret silly searches which are posed like questions. But even so, a sound and logical search will always produce better results for you, the user.
Let me try to explain how.
I once discovered the following among Yahoo's Advanced Search options. You can look for the same on other search engines by reading their help files, but the options explained on Yahoo's pages are basically the same as those that apply to any search on a computer. Even though I rarely employ these advanced options, they are still good to know.
Here are some basic and important principles:
Enclosing words in quotation marks will yield an exact phrase search
For example, the search
fox jumped over the fence
will yield different results than
"fox jumped over the fence"
because the first search will yield pages which have the word "fox", and/or "jumped", and/or "over", and/or "fence" (in this order of importance), while the second search will yield only those pages which contain the exact phrase "fox jumped over the fence". Search engines usually ignore common words like "the" and "and", so it doesn't really make sense to even use them, unless you use them in a phrase search, as per our second example above using the quotation marks. But it won't hurt to use these common words - the search engines will simply ignore them.
Doing a phrase search like this can be much more effective, if used properly. Perhaps you are certain of an exact quotation and hope to find the book which quotes it online.
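The difference between the two kinds of search can be shown with a short Python sketch over three made-up pages. It is a simplification: unlike the real search engines, it does not ignore common words like "the", and it does not rank anything, it only decides which pages qualify.

```python
# Three made-up pages, already in lower case.
PAGES = {
    "story.example": "the quick fox jumped over the fence and ran away",
    "garden.example": "build a fence the fox cannot get over",
    "zoo.example": "our fox enclosure is open daily",
}


def any_word_search(pages, query):
    """Pages containing at least one of the query words."""
    wanted = set(query.lower().split())
    return sorted(a for a, text in pages.items() if wanted & set(text.split()))


def phrase_search(pages, phrase):
    """Pages containing the words of the phrase consecutively, in order."""
    target = phrase.lower().split()
    hits = []
    for address, text in pages.items():
        words = text.split()
        if any(words[i:i + len(target)] == target for i in range(len(words))):
            hits.append(address)
    return sorted(hits)


print(any_word_search(PAGES, "fox jumped over the fence"))
print(phrase_search(PAGES, "fox jumped over the fence"))
```

The first search returns all three pages, since each contains at least one of the words, while the phrase search returns only the story page.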
The order of words is important
For our first example above, the one without the quotation marks, search engines would usually place greater emphasis on the first word "fox", subsequently followed by the other words. So if the word fox is more important in your search, make sure to place it first. But if "fence" plays a more important role, and you want to find pages which focus on fences, it would be better to place the word fence first. Do not think like a human brain, which in English is accustomed to the word fox coming before the word fence (although, in other languages, this order can certainly be reversed!), but think like a computer, and how the search engine algorithms work. If a fence page is what you are after, definitely put that word fence first. Think in terms of simple logic.
The choice of words is important
Now let's say you have a problem with foxes jumping over your fence and eating your rabbits. Is the word 'fox' really important in your searches? Isn't the strength and height of your fence more important? By thinking in human terms and including the word fox in your search, you are unnecessarily diverting the focus of your search. Think about the points you need, and how the average person might make a webpage explaining their service, and bridge the gap. Don't just ask: "Can I get a good fence to stop those blasted foxes from jumping over my fence and eating my beloved rabbit?" This will totally pollute and dilute your search with words that do not relate properly to what you are trying to accomplish. Perhaps they express your feelings, but they will not achieve what you are after.
We have decided that the word fence is important, so let's put that as the first word in our keyword search.
Are you specifically interested in a metal fence? So the foxes cannot chew their way through it? I suppose you want to look at and touch it first before purchasing it, or you do not want to pay for high transportation costs, in which case you would prefer a locally based company. If you live in London England, you might try a search like
fence metal London
or for a search for UK based companies, you can try
fence metal .uk
The space in between each word/section separates them. Putting in too many unnecessary words could dilute the search in unwanted directions.
In our case above, .uk is the domain used by the United Kingdom, and hence will be included in the web address of most British sites. But keep in mind that UK companies might host their web pages on a US based server, or simply register an international domain, meaning that their addresses will end in .com or .net etc.
Also, although the word fence is most important, the algorithms also consider phrases and collections of words as a package, so you might try using 'metal fence' instead, or even better, in full quotation marks ("metal fence").
But then you could ask yourself, who would make a page about metal fences? Maybe a search for a hardware store would be better. Obviously, in this case, it would be better to ask around locally, or pick up the Yellow Pages. I am just trying to offer you an example. Actually, picking up the Yellow Pages is much like working with a search engine. The point is, you need something, and you should have a clear strategy how to best attain it. This is the whole secret about effective searches: bridging the gap to the person who makes the webpage and using the search engine to your advantage. Not just asking questions and looking at everything from your own perspective.
I don't use these tricks very often, and I'm not sure on which search engines they work, but it could be good to know about them anyway.
By putting a plus sign before the word, you are stating that you definitely want that word to be on the pages in the search results.
fence metal +London
means that you DEFINITELY want the word London to be on the page. Without the plus sign, London does NOT necessarily need to be on the page. When the search engines produce your search results for you, they take your keywords, in the order that you provided them, and sort/rank the pages for you according to their complex algorithm.
fence metal -Manchester
Alternatively, you can put a minus symbol before a word if you do NOT want it on the pages generated by the search results. Perhaps you've had a bad experience buying something in Manchester and want to make sure you do not get any pages with Manchester on it.
Of course, this assumes that a company based in Manchester would have put their postal address or city on their webpage, which does not have to be the case. You should therefore always think about the text which different people might use when making their webpages.
You can also combine the various elements above, such as
fence metal "alkaline resistant" -Manchester
The phrase "alkaline resistant" would be treated as a single word. You can probably precede it by a plus sign if you for certain wanted this quality in your fence (or at least on the webpages in the search result). By using the quotation marks in this manner, you would refine your search to leave out those pages which would include only the words 'alkaline' or 'resistant'.
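For the curious, here is how such a combined query might be picked apart in Python. This is only a sketch of the general idea, not how any particular search engine does it. The function names are my own invention, and the matching is a crude "is this text somewhere on the page" check. It uses Python's shlex module to keep quoted phrases together as a single term.

```python
import shlex


def parse_query(query):
    """Split a query into plain, required (+) and excluded (-) terms."""
    plain, required, excluded = [], [], []
    for term in shlex.split(query):      # keeps "quoted phrases" together
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:].lower())
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:].lower())
        else:
            plain.append(term.lower())
    return plain, required, excluded


def matches(text, query):
    """True if text has every required term and none of the excluded ones.

    Plain terms are not checked here; they would only influence ranking.
    """
    _plain, required, excluded = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    return all(term in text for term in required)


print(parse_query('fence metal "alkaline resistant" -Manchester'))
print(matches("Metal fence specialists, London", "fence metal +London"))
print(matches("Metal fence specialists, Manchester", "fence metal -Manchester"))
```

The quoted phrase comes out as one term, the London page passes the plus test, and the Manchester page is thrown out by the minus test. A stray apostrophe in the query would trip up shlex, so a real parser would need more care.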
Refining your search
By following the above tricks and thinking of a good strategy, you should be able to find what you are looking for within the top ten or twenty results. If you feel that your results are a bit scattered and not focused, not really generating the results you are after, after quickly checking out the top 10 to 20 links (not necessarily following them), you can think about and tweak your search. Always think about the text or words which a company or person might use to describe their service on a webpage.
Another option is to "Search within your results", which you can do, for example, on google. You will find a link for this option on all the results pages, which will take you to a new page where you can input more keywords. These will only be searched for within your previously attained results, refining them further to a smaller list. You can keep doing this in several steps, if you choose.
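Searching within your results amounts to filtering a list you already have, as this small Python sketch shows. The pages and the function name are made up, and each step simply keeps those earlier results which also contain the new keyword.

```python
def search_within(results, pages, extra_keyword):
    """Keep only the earlier results whose text also has the extra keyword."""
    keyword = extra_keyword.lower()
    return [a for a in results if keyword in pages[a].lower().split()]


# Made-up pages for the example.
PAGES = {
    "one.example": "metal fence london delivery",
    "two.example": "metal fence manchester",
    "three.example": "wooden fence london",
}

first = [a for a in PAGES if "fence" in PAGES[a].split()]
second = search_within(first, PAGES, "metal")
third = search_within(second, PAGES, "London")
print(first)
print(second)
print(third)
```

Three results shrink to two and then to one, each step searching only what the previous step left behind.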
By thinking about your searches and experimenting and tweaking in this way, you should quickly learn how to best choose your keywords in future searches and create more accurate and useful results.
Ever noticed how many search engines offer a free
toolbar to help you with your searches? The toolbar embedded directly in your
browser? Do you think they are being altruistic and kind by offering you such a
nifty tool? Well, if you read the fine print when installing the program, you
will discover that by installing such a program/toolbar on your computer, you
will be sending information to that search engine of your search and surfing
habits. Many people might feel threatened by this as an invasion of their
privacy, while others may appreciate that providing such information will help
produce for them better results. Basically, by compiling information about your
search and surfing habits, the search engine is creating a profile of you, as a
consumer of goods. The search engine can learn how you performed searches in the
past, and use this information to produce future results which better suit your
tastes. As mentioned above, the search engines often know where you are hooking
up to the internet from. As I often travel around Europe, when I cross into a new
country and call up a google search, I find I am taken to a different subsidiary
of google, such as away from google.com to google.yu, which is the domain for
Yugoslavia. This can be particularly annoying when the default language also
changes, and I often have to change the preferences back to English.
Another front in this consumer database compilation is taking place in emails. Google now offers google mail, with a robust 2GB of space, and if you read the fine print when setting up an account with them, you will notice that, in exchange for this generous amount of space, google will be monitoring all your outgoing and incoming emails. This has scary implications of Big Brother, but the strategy behind these measures is to help compile consumer information about you. You might be writing to a friend of yours that you are interested in buying a satellite dish. And when surfing the internet or performing a search, you might notice a sponsored ad on the side of the page which reads: "Hey Bob, check out the great prices on satellite dishes just around the corner from where you live!" This is what advertisers have been dreaming about since the earliest days of marketing. Instead of having to pay for millions of fliers or envelopes delivered to the same number of postal mailboxes, in the hope of some of them landing in the hands of an interested consumer, now they can directly target their potential customers, and even address them by their names! Your name is stored on your computer, or you might fill it in when installing your search toolbar. The engines know where you have hooked up to the internet and therefore where you are located (unless you use a program like anonymizer), they might draw information from your emails (MSN/Microsoft bought Hotmail for a reason, and Yahoo has its own email service), and they learn from your previous searches to know exactly what you are looking for, providing you with perfectly targeted advertising. Which is why all these search engines have recently invested heavily in buying advertising companies.
Does this invoke fears of Big Brother, or is it just better and more targeted searches, with lots of other useful information offered on the search results page? With video images embedded in an increasing number of webpages and ever faster internet connections allowing regular video streaming, the computer monitor is increasingly replacing the television. In Japan, they have long sold widescreen TVs which can also be used to surf the internet: a movie playing on one half, the right hand side of the screen used to order a pizza online, while showing advertising directly targeted to your needs. A potential invasion of your privacy. On the other hand, when you do perform your searches, wouldn't it be nice just to ask a silly question and get exactly what you are looking for?
Copyright © KENAX, by Karel Kosman - All Rights Reserved Worldwide.