
Is Google sufficient for effective information retrieval?

The World Wide Web has made effective information retrieval a ubiquitous concern for the majority
of people. For the most part, we want quality results, delivered as quickly as possible, over a multitude
of devices. However, the long history of information retrieval pre-dates the internet and the
day-to-day use of search engines we know today. The integration of search engines into desktop and
mobile operating systems has been pervasive for the last 15 years; however, computer-like searching
systems can be found as early as the 1940s, and information retrieval systems were in use in
commercial intelligence applications as long ago as the 1960s.

Early examples of computer based searching systems were inspired by the scientific innovation of
fin-de-siècle and early 20th-century society. With increasing processor speeds and storage capacity,
the progression away from library-based methods of acquiring, searching and indexing had begun.
Today, the Google search engine is the predominant way people retrieve information. So much so that
'to Google' is commonly used as a de facto verb, meaning to search for and retrieve a piece of
information on any commonly used, text-based search engine.

Google's popularity and ubiquity are the result of the described progression towards automated,
non-library-based information retrieval. However, one can argue that Google's popularity reflects
the majority's preference for simplicity of interface and for quantity and speed of results over the
higher-quality results that could be gained through library search engines, which search the deep web.

Google has become the one-stop destination for information retrieval, the preferred resource and
not just for the casual browser. Increasingly, Google Search and Google Scholar are being used as an
alternative to library-based search engines by academics, which, it has been argued, is concerning and
even dangerous (Lawrence & Miller, 2000).

Bell (2004) reaffirms the concern that there is a generation of academics who choose simplicity of
interface over quality of results: academics who place information retrieval through Google at the
centre of their research, making it the symbol of competition to the academic library (Bell,
2004). Bell argues that the academic now suffers from 'infobesity', referring to an unhealthy
disregard for critical thought, or evaluation, when it comes to the results of our information
retrieval queries. The fear is not that Google is being used as an information retrieval resource,
but that it is being used as the only resource.

In this essay, I intend to explore whether Google is a sufficient method of information retrieval. To
set the parameters for this essay, I intend to approach Google from the perspective of an academic
researcher. I will first explore statistical evidence that Google is being used and indeed, misused, in
the manner described. I will assess the effectiveness of the results Google can provide for the
academic and contrast these with the results that alternative web-based methods of information
retrieval can provide. I then intend to draw on the in-depth research that has been conducted in the
area, modern academic arguments surrounding the academic use of Google, and my own experiences as an
academic using Google as a means of retrieving information. Finally, using my findings, I will
conclude as to the sufficiency of Google as a means of information retrieval.

As mentioned previously, there is a growing concern amongst thinkers in the information retrieval
community that Google is replacing the library as a means of academic information retrieval.
Gardener & Inger's (2013) research reveals that library resources are certainly not the first choice
for a student looking to retrieve information, and quite often they are not even the second. Their
research backs up the notion that academics are increasingly using easier, more openly accessible
sources rather than specialised search engines or library resources, which are harder to find and
access. Traditionally, the academic would have treated the library as central to their study; this,
however, was at a time when information was far less accessible and abundant than it is today.

Gardener & Inger (2013) produced statistical evidence that academic students and researchers use
Google more than professional information managers do. Their study found that student academics used
Google Scholar slightly more than basic Google Search, while academic researchers favoured Google
Search. Raven's (2012) research into the same area found that professors deemed Google to be an
appropriate academic research tool for less than 20% of research material, whereas first-year
students used Google to retrieve 50%-100% of their information. This research underlines the
importance of asking the question 'Is Google enough for the academic?' when it is being so heavily
relied on.

As metadata distribution is maximised and users are able to choose more freely their preferred routes
to content, many of the advanced features that users require seem to be migrating to their chosen
discovery platforms leaving the publisher site ever more as a content silo [...]
(Gardener & Inger, 2013)

In addressing this question, we will now look at Google the corporation, and at how Google's
commercial concerns affect the effectiveness of Google Search as a method of information
retrieval. Google is a commercial company, and making money is its primary concern: Google is a
business. Knowledge of the history of the business will help us set a context for critically
assessing Google Search, Google Scholar and Google Books as academic resources.

Russell & Cohn (2012: 11-20) outline the history and evolution of Google as the world's preferred
search engine in their study 'Is Google Making Us Stupid?'. Google Inc. is a multinational
corporation, based in the United States of America, whose interests go beyond information retrieval.
Google specialises in a number of internet-related services, including cloud computing and storage,
online advertising, email services and even mobile phones. Most of Google's profits, however, are
made from advertising, derived from the corporation's AdWords system.

While Google Inc. is a massive commercial venture today, Google Search was initially a research
project conducted by Larry Page and Sergey Brin while they were PhD students at Stanford
University. Google is rooted in academia, and was incorporated as a privately owned company in
September 1998. Today, however, Google is a publicly traded company, with Page and Brin owning
roughly a combined 16% of the shares. Google is the most used method of information retrieval on the
web: Google Search handles more than three billion searches each day (Russell & Cohn, 2012: 11-20).

Lewandowski (2008:2-4) outlines that Google Search utilises simple text-based queries, breaking up
the user's text into a series of search terms. Google Search does have an advanced search option,
which can be used to qualify searches by attaching set criteria to a query. These advanced queries
are transformed into simple queries when entered. Google Search makes use of Boolean operators,
seeking text in publicly accessible documents made available via the World Wide Web.
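
The idea of breaking a query into terms and combining them with Boolean operators can be sketched
in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus
`DOCS`, the helper names `build_index`, `boolean_and` and `boolean_or`, and the whitespace
tokenisation are all invented here, and nothing about Google's actual implementation is implied.

```python
from collections import defaultdict

# Hypothetical toy corpus; the document ids and texts are invented for illustration.
DOCS = {
    1: "google search uses simple text queries",
    2: "library catalogues support advanced search",
    3: "google scholar indexes academic text",
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Split the query into terms and return ids of documents containing all of them."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def boolean_or(index, query):
    """Split the query into terms and return ids of documents containing any of them."""
    result = set()
    for term in query.lower().split():
        result |= index.get(term, set())
    return result


INDEX = build_index(DOCS)
print(sorted(boolean_and(INDEX, "google text")))      # documents containing both terms
print(sorted(boolean_or(INDEX, "scholar library")))   # documents containing either term
```

An AND query narrows the result set with every extra term, while an OR query widens it, which is
why the choice of operator matters so much to the number of results returned.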

Google Search is a series of localised websites, the largest of those being Google.com. While
academics may also make use of Google Books and Google Scholar, Google.com is the most visited
website in the world (Russell & Cohn, 2012:15). Additional features Google Search provides are
definition links for searches that include dictionary words; the number of results returned by
your search; proposed alternative searches; and proposed spelling corrections for a word the search
engine determines you have misspelt. These are all displayed on the Search Engine Results Page
(SERP), and additional features are added with great frequency.

Google applies query expansion to each submitted query, transforming a query into what will actually
be used to retrieve information. Google is secretive about this process (Russell & Cohn, 2012:14);
however, Garson (2003) highlights changes we can be certain the search engine is making to our
query. Firstly, Google Search will reorder our terms, invisibly to the user, in order to reduce the
work involved in achieving relevant results. The ordering of the results is then based in part on the
query. Secondly, Google Search utilises stemming to increase search quality by also matching small
syntactic variants of search terms. Thirdly, there is spelling correction, which is limited and,
initially, invisible to the user.
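
The three kinds of query rewriting described above (reordering, stemming and spelling correction)
can be illustrated with a small Python sketch. Everything in it is an assumption made for the sake
of the example: the vocabulary, the three-suffix "stemmer", the similarity cutoff and the
alphabetical reordering are hypothetical stand-ins, since Google does not publish how it actually
expands queries.

```python
import difflib

# Hypothetical vocabulary standing in for the terms a search engine has indexed.
VOCABULARY = ["search", "engine", "library", "retrieval", "information"]

# Crude suffix list, far simpler than a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")


def stem(term):
    """Strip one common suffix so that small variants share a stem."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def correct(term):
    """Replace a term with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else term


def expand_query(query):
    """Return the rewritten terms: stemmed, spell-corrected, deduplicated and sorted."""
    terms = [correct(stem(t)) for t in query.lower().split()]
    # Alphabetical order is an assumed canonical ordering, chosen only for illustration.
    return sorted(set(terms))


print(expand_query("searching libary retrieval"))
```

Here the user types one query, but the engine actually searches for a different list of terms,
which is the sense in which the rewriting is invisible to the user.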

The results Google Search provides are based partially on a priority rank, called PageRank. This is
an algorithm which orders results by counting the number and quality of links to a page. The links
are gathered by automated crawlers known as Googlebots. Backlinks to a page are counted to determine
an estimated importance for the page in question. The more important a website is, the more often
and the higher up it will appear on the SERP (Höchstötter & Koch, 2008).

Google's method of information retrieval and page ranking was crucial to the search engine's rise to
predominance. Previous methods of keyword-based ranking would rank pages by how often the search
term appeared on the page, or by how strongly associated the search terms were within a page.
Google uses the PageRank algorithm to analyse human-generated links, assuming that web pages linked
to from important pages are likely to be important themselves. The PageRank algorithm computes and
assigns a recursive score for pages, based on the sum of the PageRank of the pages linking to them.
Google is secretive about the algorithms and criteria by which results are ranked, with a rumoured
250 factors involved (Russell & Cohn, 2012).
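
The recursive score described above can be approximated by repeatedly passing rank along links
until the values settle. The Python sketch below follows the commonly published formulation of
PageRank rather than anything stated in the sources cited here: the damping factor of 0.85, the
even sharing of a page's rank among its outgoing links, the handling of pages with no outgoing
links and the four-page example web are all assumptions, and Google's production ranking is not
public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared evenly among all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes an equal share of its rank to every page it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, invented for illustration.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy web, page "c" is linked to by three other pages and so receives the highest score,
while page "d", which nothing links to, receives the lowest. Keyword frequency plays no part in
the calculation, which is the contrast with earlier ranking methods drawn above.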

Findings include that search engines use quite different approaches to results pages composition
and therefore, the user gets to see quite different results sets depending on the search engine and
search query used. Organic results still play the major role in the results pages, but different
shortcuts are of some importance, too. Regarding the frequency of certain host within the results
sets, we find that all search engines show Wikipedia results quite often, while other hosts shown
depend on the search engine used. Both Google and Yahoo prefer results from their own offerings
(such as YouTube or Yahoo Answers). (Höchstötter & Koch, 2008)

While PageRank is a very sophisticated means of ordering a SERP, Google is often criticised for
returning a high quantity of low-quality results. Since the rise of the World Wide Web as a means of
retrieving information, the sheer mass of available resources on almost every topic imaginable has
meant that results often come from extremely unreliable sources (Höchstötter & Koch, 2008). While
Google can produce results that will aid academics a great deal, the sheer number of results and the
lack of quality control mean that extremely bad results can cancel out the good (Russell & Cohn, 2012).

The cost of using the largest database in the world, with a search engine that is available over a
multitude of platforms, is that at times one is presented with amorphous, low-quality knowledge. As
mentioned, Google is a commercially run business and ranks pages with a view to making money;
this too could affect the validity of results for academic information retrieval (Höchstötter & Koch,
2008).

A further hindrance to the academic is Google's failure to search non-indexable knowledge, of which
there is a considerable amount. Information that is accessible through unique queries rather than
links is in many ways invisible to Google; this is known as the 'Deep' or 'Hidden' web.
Bergman (2001:8) cites library catalogues, official legislative documents of governments and phone
books amongst other databases and documents that Google, unlike dedicated library search engines,
cannot search.

This information is crucial for sufficient research and in-depth knowledge of a subject or collection.
Bergman (2001) puts it to us that:
[...]searching on the Internet today can be compared to dragging a net across the surface of
the ocean: a great deal may be caught in the net, but there is a wealth of information that is
deep and therefore missed.

In fact, while Google provides a very broad overview of the internet's academic resources, there is a
larger amount of information which is not part of the Google-friendly 'Surface Web': information
which cannot be indexed by a standard search engine.

A great many academic resources are virtually buried on dynamically generated sites; a traditional
search engine such as Google would not return this information as part of its SERP or rank it
with the PageRank system.

Bespoke, expert-curated systems such as WordNet and the Cyc project seek to provide a narrower
but more accurate means of information retrieval, from more relevant and reputable sources. They
rely on knowledgeable experts inputting high-quality content, and can retrieve information from
collections which are not indexed in a manner that Google can identify.

Google is superior for coverage and accessibility. Library systems are superior for quality of
results. Precision is similar for both systems. Good coverage requires use of both, as both have
many unique items. Improving the skills of the searcher is likely to give better results from the
library systems, but not from Google.
(Brophy & Bawden, 2005:498)

From my own experience of academic information retrieval, I have found the results provided
by alternatives to Google such as JSTOR, RefSeek, EBSCO and Project MUSE preferable to those
provided by Google. This is because the search engines I have referenced retrieve information from
specialised libraries which are effectively invisible to Google. The results I have gained from these
websites have been of a higher quality and from more trusted sources. I have found a site such
as JSTOR, with its reputation for high standards of peer review and high-quality input, invaluable
and a more useful means of information retrieval than Google.


There are, however, drawbacks to using these methods of information retrieval. While they can
produce more accurate and reputable results, they are often less attractive in the way the information
is displayed. Library search engines can at times be more difficult to use at first. Alternatives
to Google are also often less well integrated across platforms such as mobile phones and tablets.
Increasingly, academics are researching on devices other than the standard desktop or laptop,
platforms seldom supported by academic library database search engines (Raven, 2012: 12).

In contrast to this, research has been conducted that contradicts the notion that Google provides
low-quality, high-quantity results. One study, concerning the use of Google Scholar as an
alternative to PubMed, Cochrane and other medical research search engines, found that Google
Scholar's coverage was 100% (Shultz, 2007:442). Shultz cites a study of 29 systematic reviews which
found that, had the authors used Google Scholar alone in their research, no reference would have been
missed (Shultz, 2007:442).

These studies conclude that Google Scholar's coverage was much higher than first thought for high
quality studies and that the standard of information retrieved meant that Google Scholar could be
used alone in some cases (Shultz, 2007:442).

This, however, is in contrast to other studies of Google Scholar, which reveal it to be
an inconsistent means of information retrieval, too focused on citation count and lacking sufficient
coverage across a multitude of disciplines (Brophy & Bawden, 2005).

In conclusion, drawing on my own personal experiences of using Google as a means of academic
information retrieval and on the research presented in this essay, I have found Google to be an
adequate starting point for the academic. Google has access to a wide range of academic information,
of which it provides a sufficient overview. However, for the academic, the results Google provides
must be critically assessed and analysed. As a result, Google is not sufficient to be used in
isolation for information retrieval, but it can act as a guide to direct research and study.
If the academic uses Google as a starting point, an indicator as to the direction of a piece of
research which would then be conducted in greater depth with a dedicated library search engine, then
Google is sufficient as a means of information retrieval.

Echoing Bell (2004), there is a danger of Google becoming the only source of academic information
retrieval, and as academics we should be sufficiently critical in our analysis of sources not to
allow this to be the case.

While library search engines are initially less accessible and less user-friendly in their search
methods and results, we cannot, as academics, ignore the information they provide us with access to:
databases of information which cannot be retrieved by Google, and a higher quality of information
from more reputable sources, open to academic scrutiny and peer review; information of a high
academic standard.

Accessibility is likely (rightly or wrongly) to be favoured over quality as a determinant of choice
by the student users considered here. Lack of comprehensiveness in retrieval is unlikely to be a
strong motivator for these users to use any retrieval systems in addition to an internet search
engine. Nor is the prospect of undertaking extra training to make better use of library databases
likely to be attractive, when this is not useful for Google
(Brophy & Bawden, 2005: 512)

Google can be argued to be sufficient for providing context and parameters of study. It can be used
to provide common themes and arguments from which to start study and to point an academic
researcher in a certain direction, after which a more dedicated, in-depth method of search is required
to retrieve higher quality information.

Bibliography
Bell, S. (2004), "The infodiet: how libraries can offer an appetizing alternative to Google", The
Chronicle of Higher Education, 50 (24) 15-21. [E-Journal accessed on 10/12/13
http://chronicle.com/prm/weekly/v50/i24/24b01501.html]

Bergman, M. (2001). The deep web: Surfacing hidden value. Journal of electronic publishing. 7 (1),
7-21. [E-journal accessed on 7/12/13
http://grids.ucs.indiana.edu/courses/xinformatics/searchindik/deepwebwhitepaper.pdf]

Brophy, J., & Bawden, D. (2005, December). Is Google enough? Comparison of an internet search
engine with academic library resources. Aslib Proceedings 57, (6). 498-512. Emerald Group
Publishing Limited.

Chen, X. (2013). Cross-Examining Google Scholar. Reference & User Services Quarterly. 52 (4),
52-279.

Cilibrasi, R. (2006). "Automatic extraction of meaning from the web.". Information Theory, 2006
IEEE International Symposium on. 32 (3), 2309-2313.

Russell, J & Cohn, R (2012). Is Google Making us Stupid?. London: Book on Demand. 10-150.

Fry, H (2008). A handbook for teaching and learning in higher education. Hershey, PA: Idea
Publishing.

Garson, G (2003). Public Information Technology: Policy and Management Issues. London:
Routledge. [Ebook Available from: eBook Collection (EBSCOhost), Ipswich, MA. Accessed
December 11, 2013]

Gardener, T & Inger, S (2013). How Readers Discover Content in Scholarly Journals . Abingdon:
Renew Training. 1-28. [E-Book accessed on 9/12/13
http://www.renewtraining.com/How-Readers-Discover-Content-in-Scholarly-Journals-summary-
edition.pdf]

Herring, M. (2001). "10 Reasons the Internet is no substitute for a library.". American libraries. 32
(4), 76-79.

Höchstötter, M & Koch, N. (2009). Standard parameters for searching behaviour in search engines
and their empirical evaluation. Journal of Information Science. 35 (1), 45-65. [E-Journal accessed on
10/12/13
http://eprints.rclis.org/16081/1/What_users_see_preprint.pdf]

Lawrence, H., Miller, W. (2000), Academic Research on the Internet: Options for Scholars and
Libraries, Haworth Information Press, New York

Lewandowski, D. (2005). "Web searching, search engines and Information Retrieval.". Information
Services and Use. 25 (3), 137-147.

Ponte, J. (1998). "A language modeling approach to information retrieval.". Proceedings of the 21st
annual international ACM SIGIR conference on Research and development in information retrieval.
11 (3), 275-281.

Raven, M. (2012). Bridging the Gap: Understanding the Differing Research Expectations of First-
Year Students and Professors. Evidence Based Library and Information Practice . 7 (3), 12. [E-
Journal accessed on 8/12/13
http://ejournals.library.ualberta.ca/index.php/EBLIP/article/view/17172]

Rowlands, I et al. (2008). "The Google generation: the information behaviour of the researcher of
the future.". Aslib Proceedings. 60 (4), 30-43. Emerald Group Publishing Limited.

Shultz, M (2007). Comparing test searches in PubMed and Google Scholar. Journal of the Medical
Library Association : JMLA. 95(4) 442-445
