by Aaron Bowen, MLIS Day
We have more than 60 languages; we have support for pretty much everything you’ll find on the Web. One of the most interesting statistics about the Web is that the growth of the Web outside the United States is much faster than it is in the United States, and the growth rate of non-English languages is much faster than the growth rate of English. Of course, English has a head start. So the right thing is that eventually languages like Chinese should be the dominant languages because there are so many more people who speak these languages in the world. Today, we’re working on that.
- Eric Schmidt, Google CEO, March 27, 2002,
International Herald Tribune
As many websites have done, Google has developed a host of international versions of its search technology, and it continues to develop its index and taxonomies to incorporate many of the world’s languages. This raises several questions: whether the international versions of Google return different results from google.com, and, if so, whether those differences stem from the language in which the search string is typed, the language of the particular Google portal, or both. The answer seems to be both. However, how these factors interact, and whether one of them carries more weight in ordering the results, cannot be answered without knowledge of Google’s search algorithm.
A while back, a friend asked me to find some information for her on the University of Marie Curie. My search strategy was to go to Google.fr and search for “l’Université de Marie Curie.” Running the search again for this article gave me the French version of the University of Pierre and Marie Curie (UPMC) site as the first hit. One more click took me to the English version of the website (my friend doesn’t speak French). The two versions differ: the English version offers a lot of general information about the university, whereas the French version links to current university news and events. The sidebars and toolbars are identical.
Searching Google.com for “University of Marie Curie” gives me the English version of the website as the first hit and the French version as the fifth. The remaining results include one other French-language site relating to the university, a link to Maria Curie-Skłodowska University in Poland (the third hit), and English-language pages about Marie Curie herself and about scientific awards granted in her name. Searching Google.com for “l’Université de Marie Curie” brings up the French UPMC site again.
I repeated the same experiment with the Georgetown website as my target. Besides establishing that Georgetown has not translated its homepage into French, which is hardly surprising, I observed the same principle at work. Searching either portal in English gave me the Georgetown homepage as the first hit. Searching either portal in French gave me several pages deep within the Georgetown website, some in French and others in English, along with a variety of French and English pages relating to Georgetown, but not the homepage itself. Furthermore, the results varied depending on which portal I used. Thus, searching for either university’s website in that university’s own language returns its homepage, but changing the language, the portal, or both changes the results.
The French version of the Google toolbar (which I use) displays results in English or French depending on the language I search in. I also asked a Russian woman I know how she searches for information in Russian. She said she simply types Russian characters into google.com and gets results in Russian. There is currently no Russian Google portal; the site www.google.ru is not affiliated with Google. All of this suggests that the language of a search is the predominant factor in determining search results, but my tests above show otherwise: some searches in one language returned results in the other. Searching Google.fr for “University of Marie Curie” returned links to the French sites, most plausibly because I used the French portal.
There is also the problem of noise from languages I didn’t expect, as with the Polish site I found. Google’s option to restrict results to a single specified language is one answer to this, although it did not always reliably exclude hits in other languages.
Searching Google.com in English for UPMC, without restricting the language of the results, gave me the English version of the UPMC website as the first hit, followed by a range of other English-language sites offering general information on Marie Curie and two UPMC links in French. Restricting the results to English produced the same list. By contrast, running the same English search on Google.fr, first with no language restriction and then limiting the results to pages in French, consistently gave me the French UPMC homepage as the first hit. If the language of the search string were the only operative factor, an English search on Google.fr should not return a French page first. This suggests that, in addition to the language of the search string, the portal also plays a role in weighting the results.
Realistically, it makes sense that the portal used for a search would affect the results; if it did not, there would be little reason to maintain multiple international portals. The logical conclusion, then, is that both variables affect the results, even though exactly how they do so, and how much weight each carries, remain open questions.