
Citation Indexing

Nitish Mathew

Thanks to
Dr. C. Lee Giles
Dr. Paul Cohen
Outline
Introduction to Citation Indexing
What is Citation Indexing
Concept
Web of Science
Bias
Autonomous Citation Indexing
Future Application
Technology Forecasting
Summary
Why do a literature search?


Avoid unwitting duplication of research

Wasted time, effort & funds

Plagiarism issues


Concept of Citations

Citations symbolize the conceptual association of scientific
ideas as recognized by publishing research authors.

By the references they cite in their papers, authors make explicit
linkages between their current research and prior work in the
archive of scientific literature.

Distinction between "citation" and "reference"
"If Paper R contains a bibliographic footnote using and
describing Paper C, then
R contains a reference to C,
C has a citation from R.
The number of references a paper has is measured by the
number of items in its bibliography as endnotes, footnotes, etc.
The number of citations a paper has is found by looking it up [in
a] citation index and seeing how many other papers mention
it."
Source: Price D. J. D. Little science, big science...and beyond. New York: Columbia University Press,
1986.
[6] The concept of citation indexing: A unique
and innovative tool for navigating the research
literature. Current Contents, January 3, 1994.
Figure: excerpt from the Current Contents essay [6] quoting Price's definition of "citation" and "reference", with a diagram labeled Paper R and Paper C.
R contains a reference to C,
C has a citation from R.
Citation Index
Paper C
1) Paper X
2) Paper Y
3) Paper R
4) Paper Q
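
The reference/citation distinction maps onto a simple data structure: each paper stores its own reference list, and a citation index is the inverse mapping. The following is a minimal sketch, not how any production index is built. The paper names come from the diagram above, but the reference lists themselves are made up for illustration, and `build_citation_index` is a hypothetical helper name.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing paper: [cited papers]} into {cited paper: [citing papers]}."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            # Count each citing paper once, even if it repeats a reference.
            if citing not in index[cited]:
                index[cited].append(citing)
    return dict(index)


# Hypothetical reference lists (assumed for illustration only).
references = {
    "Paper X": ["Paper C"],
    "Paper Y": ["Paper C", "Paper X"],
    "Paper R": ["Paper C"],
    "Paper Q": ["Paper C", "Paper R"],
}

citation_index = build_citation_index(references)

# References: how many items a paper lists in its own bibliography.
print(len(references["Paper Q"]))
# Citations: which other papers mention a given paper.
print(citation_index["Paper C"])
```

Looking a paper up in `references` moves backward in the literature (to earlier work), while looking it up in `citation_index` moves forward (to later work that cites it).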
Citation Indexing

A citation index records the citations each article makes, linking the
article with the works it cites.
Originally designed mainly as a literature search tool that lets researchers
find subsequent articles that cite a given article.

Invented by Dr. Eugene Garfield
Example of a citation indexing firm: Institute for Scientific Information (ISI)

Institute for Scientific Information (ISI)

Index the linkages by listing both the cited and citing works.
The ISI databases
Science Citation Index (SCI)
Social Sciences Citation Index (SSCI)
Arts & Humanities Citation Index (A&HCI)

Multidisciplinary. They cover virtually all disciplines whereas
traditional indexing and abstracting services are limited to a
single field.


Web of Knowledge
ISI Web of Knowledge: a dynamic, integrated, Web-based environment

ISI Web of Science provides access to
Science Citation Index (over 3,200 journals)
Social Sciences Citation Index (1,400 journals)
Arts & Humanities Citation Index
Updated weekly.
Journals from 1986 onward are available to Penn State users.
Earlier years of each index are available in print at the
Libraries.
Web of Science

Search current and retrospective multidisciplinary information
from nearly 8,500 research journals worldwide.

Users can navigate forward, backward, and through the literature,
searching all disciplines and time spans to uncover
information relevant to their research.


Advantages

Compared to traditional indexing:
no subjective judgments to be made about relevant descriptors
faster
no limit to index terms - all cited references are indexed.

Problems with ISI Databases

Require manual effort during indexing
Expensive
Bias issues
One possible solution: Autonomous Citation Indexing
Adapted from Citation Indexing - Its Theory and Application in Science, Technology, and
Humanities by Eugene Garfield
Bias in Citation Databases
Bibliometric indicators do not represent all publishing. Although
these databases have international coverage, they have a
certain amount of bias:
They contain more minor US journals than minor European journals
Non-English language journals are not as comprehensively indexed
From the perspective of the non-English-speaking world, bibliometric indicators
represent only international-level, predominantly English-language, higher-impact,
peer-reviewed, publicly available research output.

Source: Bibliometric Indicators and the Social Sciences, prepared for ESRC, J. Sylvan Katz SPRU,
University of Sussex UK, December 1999
Bias in Citation Databases
One recurrent criticism is that journal selection is biased by
the internal management decisions of ISI.
Only journals are indexed; monographs are left out.
A lack of correlation between the most highly cited authors
based on the journal sample and those based on the monograph
sample suggests that there may be two distinct populations of
highly cited authors.




Source: Blaise Cronin and Herbert W. Snyder. Comparative citation rankings of authors in
monographic and journal literature: a study of sociology. Journal of Documentation, 53(3):263-273,
1997.
ResearchIndex/CiteSeer
ResearchIndex: A scientific literature digital library that
incorporates
Autonomous citation indexing
Citation context
Full-text indexing
Related document identification
Query sensitive summaries
Awareness and tracking
Citation graph analysis
http://citeseer.nj.nec.com/cs

Source: Presentation on Searching the World Wide Web General and Scientific Information
Access, Steve Lawrence
CiteSeer: How does it work?
1) Download papers from the Web
2) Convert to text and parse
3) Obtain citations and do full-text indexing
4) Store them in a database
5) Query by citations or keywords
Source: CiteSeer: An Automatic Citation Indexing System (1998), C. Lee Giles, Kurt D. Bollacker,
Steve Lawrence, Digital Libraries 98 - The Third ACM Conference on Digital Libraries
CiteSeer - Document Acquisition
Web search engines are used for crawling
Heuristics are used to locate papers
(e.g., pages containing the words "publications", "papers", "postscript", etc.)
Locates and downloads PostScript files identified by .ps, .ps.Z, or
.ps.gz extensions.
URLs and PostScript files that duplicate those already found are
detected and skipped.
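
The acquisition heuristics above can be sketched in a few lines. This is an illustrative sketch only, not CiteSeer's actual crawler: the hint words and file extensions come from the slide, while the function names, the `Deduplicator` class, and the use of a SHA-1 content digest to detect duplicate files are assumptions made for the example.

```python
import hashlib

# Extensions and hint words listed on the slide.
POSTSCRIPT_SUFFIXES = (".ps", ".ps.Z", ".ps.gz")
HINT_WORDS = ("publications", "papers", "postscript")


def looks_like_paper_page(page_text):
    """Heuristic: does the page mention any of the hint words?"""
    text = page_text.lower()
    return any(word in text for word in HINT_WORDS)


def is_postscript_url(url):
    """True if the URL ends with one of the PostScript extensions."""
    return url.endswith(POSTSCRIPT_SUFFIXES)


class Deduplicator:
    """Skips URLs and file contents that were already seen (assumed approach)."""

    def __init__(self):
        self.seen_urls = set()
        self.seen_digests = set()

    def is_new(self, url, content):
        digest = hashlib.sha1(content).hexdigest()
        if url in self.seen_urls or digest in self.seen_digests:
            return False
        self.seen_urls.add(url)
        self.seen_digests.add(digest)
        return True


dedup = Deduplicator()
print(is_postscript_url("http://example.org/thesis.ps.gz"))
print(dedup.is_new("http://example.org/thesis.ps.gz", b"%!PS-Adobe-3.0 ..."))
print(dedup.is_new("http://mirror.example.org/copy.ps", b"%!PS-Adobe-3.0 ..."))
```

A byte-level digest only catches exact copies; near-duplicates (for example, the same paper recompressed) would need a looser comparison.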



Document Parsing
The downloaded PostScript files are first converted into text
Information extracted includes: URL, header, abstract,
introduction, citations, citation context, and full text
Issues in citation parsing include:
Natural language citations
Variant citations to the same article (these affect citation statistics)
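
To see why variant citations matter for statistics, consider grouping citation strings that refer to the same article. The sketch below uses word-set overlap (Jaccard similarity) with an arbitrary threshold. This is a stand-in technique chosen for brevity, not the matching algorithm CiteSeer uses, and the helper names and the 0.6 threshold are assumptions.

```python
import re


def normalize(citation):
    """Lowercase, drop punctuation, keep the set of words longer than 2 characters."""
    words = re.findall(r"[a-z0-9]+", citation.lower())
    return frozenset(w for w in words if len(w) > 2)


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_citations(citations, threshold=0.6):
    """Greedily put each citation into the first group whose first member it resembles."""
    groups = []  # each group is a list of (citation, word set) pairs
    for citation in citations:
        words = normalize(citation)
        for group in groups:
            if jaccard(words, group[0][1]) >= threshold:
                group.append((citation, words))
                break
        else:
            groups.append([(citation, words)])
    return [[c for c, _ in group] for group in groups]


citations = [
    "Giles, C.L., Bollacker, K., Lawrence, S. CiteSeer: An automatic "
    "citation indexing system. Digital Libraries 98, 1998.",
    "C. Lee Giles, Kurt D. Bollacker, and Steve Lawrence. CiteSeer: an "
    "Automatic Citation Indexing System. In Digital Libraries 98, 1998.",
    "Price, D. J. D. Little science, big science...and beyond. "
    "Columbia University Press, 1986.",
]

for group in group_citations(citations):
    print(len(group), group[0][:40])
```

Without this grouping step, the two Giles et al. variants would be counted as citations to two different articles.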


Querying and Browsing
First query: a keyword search returns a list of citations
matching the query, or a list of articles.
Finds related documents: a combination of weighted similarity
measures is used
http://citeseer.nj.nec.com/cs
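
A combination of weighted similarity measures can be illustrated as a weighted sum of per-measure scores. The sketch below combines word overlap with shared-citation overlap. Both measures, the equal weights, and the document fields (`title`, `words`, `cites`) are assumptions made for the example and are not taken from CiteSeer itself.

```python
def jaccard(a, b):
    """Overlap between two collections, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(doc_a, doc_b, text_weight=0.5, citation_weight=0.5):
    """Weighted sum of text similarity and shared-citation similarity."""
    text_sim = jaccard(doc_a["words"], doc_b["words"])
    cite_sim = jaccard(doc_a["cites"], doc_b["cites"])
    return text_weight * text_sim + citation_weight * cite_sim


def related_documents(query_doc, corpus, top_k=3):
    """Rank the other documents in the corpus by combined similarity to query_doc."""
    scored = [
        (combined_similarity(query_doc, doc), doc["title"])
        for doc in corpus
        if doc is not query_doc
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical documents (assumed for illustration only).
doc_a = {"title": "A", "words": {"citation", "indexing", "web"}, "cites": {"c1", "c2"}}
doc_b = {"title": "B", "words": {"citation", "indexing", "library"}, "cites": {"c1", "c2"}}
doc_c = {"title": "C", "words": {"polymer", "chemistry"}, "cites": {"c9"}}
corpus = [doc_a, doc_b, doc_c]

print(related_documents(doc_a, corpus))
```

Changing the weights shifts the ranking toward documents with similar wording or toward documents that cite the same prior work.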

Advantages of CiteSeer
Completely autonomous: cheaper and more widely available
More up-to-date databases: not limited to a pre-selected set of
journals or held back by publication delays
Literature search based on the context of citations
Ability to recognize variant forms of citations
No bias from subjective selection of journals
Not restricted to journal papers: preprints, technical reports, and
conference proceedings are also indexed.
User feedback on each article
Source: Autonomous Citation Matching (1999) Steve Lawrence, C. Lee Giles, Kurt Bollacker
Proceedings of the Third International Conference on Autonomous Agents
Areas of Improvement
1. Does not cover the significant journals comprehensively.
(might be less of a disadvantage over time as more journals become available
online)

2. Cannot distinguish subfields as accurately
(e.g. CiteSeer will not disambiguate two authors with the same name.)

3. Similar document retrieval system could be enhanced and improved.

4. Heuristics used to locate articles could be improved
Future prospects
Technology Forecasting
DIVA (for Database Information Visualization and Analysis
system) - bibliometric analysis of collections of scientific
literature and patents for technology forecasting.
Documents, drawn from the technological field of interest, are
visualized as clusters on a two dimensional map, permitting
exploration of the relationships among the documents and
document clusters
Can yield insight into trends in the technological field of interest.

Source: DIVA: A Visualization System for Exploring Document Databases For Technology
Forecasting by Steven Morris, Zheng Wu, Camille DeYong, Sinan Salman, Dagmawi Yemenu
Computers and Industrial Engineering, Vol. 43, No. 4
Clustering of documents

Document Maps
Document timelines
Polymers cluster report showing a plot of links to all other clusters by year.
Polymers cluster report showing a plot of links to each other cluster by year.

A comment on bibliometric analysis
Bibliometric analysis has been compared to a drunk who is looking for his
keys under a street lamp. When a passer-by asks why he is looking
there, he replies: "This is where the lamp is."

Critics say that publications (and citations) just provide easy data and that
the assessment of real quality needs more qualitative considerations.


Summary

Citation indexing is more than 40 years old.
A simple concept with far-reaching influence and applications
Many possibilities for
Improvement of existing systems
Developing new uses in the networked world
