
Bringing the ART of Manuscript Cataloging to the Computer

World Digital Library Arab Peninsula Regional Group meeting

Magdy Nagi
The Wellcome Arabic Manuscript Cataloging Partnership
Wellcome Trust Arabic Manuscript Digitization Partnership
• Creating a unique online resource of about 500 Arabic
and Islamic manuscripts related to classical medicine,
with full-text search over incipits, chapter headings,
explicits, etc.

• A partnership between the Wellcome Library, KCL, and
the BA to create online discovery and dissemination
tools that make the manuscripts and their metadata
freely available on the web.
Main features of the application
• Manuscript facsimiles immediately available
• Zooming shows images at higher quality
• Associating images with metadata field values
• Entering non-standard characters
– A virtual keyboard allows entering non-standard characters
• Configurable workflow between BA and
Wellcome Trust
• Audit trail of all changes to metadata records
• TEI P5 compliant output
TEI P5
• The TEI P5 standard allows entering extensive metadata
about manuscripts
– “This module defines a special purpose element which can be used to
provide detailed descriptive information about handwritten primary
sources.”
• Its vast range of possibilities makes it powerful yet
difficult to use
– <persName> (personal name) contains a proper noun or proper-noun
phrase referring to a person, possibly including any or all of the person's
forenames, surnames, honorifics, added names, etc.

Can the data model harness its power without getting
out of control?
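
To make the TEI P5 vocabulary concrete, here is a minimal sketch of a manuscript description and how a program might read a few fields from it. The element names (msDesc, msIdentifier, msItem, persName, incipit, explicit) come from TEI P5; the record content, the `SAMPLE` string, and the `summarize` helper are illustrative assumptions, not project data or project code.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative record only: every value below is a placeholder.
SAMPLE = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier>
    <repository>Example Library</repository>
    <idno>MS Example 1</idno>
  </msIdentifier>
  <msContents>
    <msItem>
      <author><persName>Example Author</persName></author>
      <title>Example Title</title>
      <incipit>Opening words of the text</incipit>
      <explicit>Closing words of the text</explicit>
    </msItem>
  </msContents>
</msDesc>
"""


def summarize(xml_text):
    """Pull a handful of catalog fields out of one msDesc record."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "idno": text_of("tei:msIdentifier/tei:idno"),
        "author": text_of(".//tei:author/tei:persName"),
        "incipit": text_of(".//tei:incipit"),
        "explicit": text_of(".//tei:explicit"),
    }


record = summarize(SAMPLE)
print(record)
```

Even this small record uses four levels of nesting, which hints at why an unconstrained TEI P5 entry form would be hard for catalogers to use.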
Data Model
• Provisioning in advance every data field the catalogers
will need is not possible, because the features of the
collection are not known until it has been cataloged.
– We discovered the need for msPart because some
manuscripts are made of parts bound together
• Creating a field for everything possible puts us back in
the same dilemma as TEI P5's excess of possibilities

The answer lies in a flexible data model, based on
TEI P5 so that it remains comprehensible.
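
As an illustration of the composite-manuscript case, the sketch below describes one bound volume with two msPart elements and lists them. The msPart element is part of TEI P5; the sample identifiers, titles, and the `list_parts` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative composite manuscript: two texts bound into one volume.
COMPOSITE = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier><idno>MS Example 2</idno></msIdentifier>
  <msPart>
    <msIdentifier><idno>Part A</idno></msIdentifier>
    <msContents><msItem><title>First bound text</title></msItem></msContents>
  </msPart>
  <msPart>
    <msIdentifier><idno>Part B</idno></msIdentifier>
    <msContents><msItem><title>Second bound text</title></msItem></msContents>
  </msPart>
</msDesc>
"""


def list_parts(xml_text):
    """Return (identifier, titles) for each part directly under msDesc."""
    root = ET.fromstring(xml_text)
    parts = []
    for part in root.findall("tei:msPart", NS):
        idno = part.findtext("tei:msIdentifier/tei:idno", default="", namespaces=NS)
        titles = [t.text for t in part.findall(".//tei:title", NS)]
        parts.append((idno, titles))
    return parts


parts = list_parts(COMPOSITE)
for idno, titles in parts:
    print(idno, titles)
```

A fixed relational schema designed before cataloging began would have had no place for this second level of description, which is the situation the slide describes.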
Flexible Data Model
• TEI P5 is an XML vocabulary, and XML is a
flexible and structured way of storing data.

• The challenge is that years of development against
relational databases (RDBs) have produced many ways to
easily build data entry applications for RDBs, and
almost nothing for XML.

A library called XML Skeleton Annotations (XSA)
was created just for that, and will soon be publicly
available.
XML Skeleton Annotations (XSA)
• Takes a single configuration file as input, describing the
data model and the corresponding website structure.

• Generates a user interface (UI) that is bound directly to
the loaded XML document.

• Gives users control over the look and feel of the
generated UI.

• Access roles, indexing, and authority lists are also
defined in the configuration file.
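
The idea of a configuration-driven UI bound to an XML document can be sketched in a few lines. The slides do not show XSA's configuration format or API, so everything below is hypothetical: the `FIELDS` structure, the helper names, and the simplified un-namespaced record are assumptions chosen only to illustrate the general technique.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: each entry binds a UI label to a path in the
# record. This is NOT the XSA configuration format, which is not shown in
# the slides; it only illustrates the idea.
FIELDS = [
    {"label": "Title", "path": "msContents/msItem/title"},
    {"label": "Incipit", "path": "msContents/msItem/incipit"},
]


def ensure_path(root, path):
    """Return the element at `path`, creating missing elements on the way."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def build_form(root, fields):
    """Describe one input per configured field, pre-filled from the XML."""
    return [
        {
            "label": field["label"],
            "path": field["path"],
            "value": root.findtext(field["path"], default=""),
        }
        for field in fields
    ]


def apply_input(root, path, value):
    """Write a value entered in the form straight back into the document."""
    ensure_path(root, path).text = value


# Namespaces are omitted here to keep the sketch short.
record = ET.fromstring(
    "<msDesc><msContents><msItem><title>Example Title</title>"
    "</msItem></msContents></msDesc>"
)
form = build_form(record, FIELDS)
apply_input(record, "msContents/msItem/incipit", "Opening words")
print(ET.tostring(record, encoding="unicode"))
```

In this sketch, adding a field means adding one entry to `FIELDS`; no form code or export code changes, because the XML document itself is the stored record.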
The outcome of using XSA
• A user friendly system for entering metadata that
follows a very flexible model.
• Adding a new field to the data model is straightforward:
just a few lines in the configuration file, and
no coding at all.
• Standards-compliant records: no XML export code is
needed, and the library is XML Schema driven.
• The hierarchy of the data can be changed with XSLT.
• Highly reusable, and easy to learn.
Other parts of the system
• Manageability?
Using XML as the data storage format raises concerns about
manageability, but good solutions exist and more are
emerging.

• The XML collection is made searchable by submitting parts
of each XML record to a Lucene index. Lucene only needs to
index these parts and store the document ID.

• SVN is used to orchestrate and track access to the
collection. Concurrent editing of a single XML document,
however, still needs a good XML merge tool before it can
be safely enabled.
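
The indexing approach in the Lucene bullet above (index selected parts of each record, keep only the document ID) can be illustrated with a small in-memory inverted index. This is a stand-in written with the Python standard library, not Lucene and not the project's code; the `TinyIndex` class, the choice of indexed tags, and the sample records are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed choice of which parts of a record are submitted to the index.
INDEXED_TAGS = ("title", "incipit", "explicit")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Maps each token to the IDs of the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)
        for tag in INDEXED_TAGS:
            for node in root.iter(tag):
                for token in tokenize(node.text or ""):
                    # Only the document ID is kept, never the text itself.
                    self.postings[token].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


index = TinyIndex()
index.add("ms-1", "<msItem><title>Example Title</title>"
                  "<incipit>Opening words</incipit></msItem>")
index.add("ms-2", "<msItem><title>Another Title</title>"
                  "<incipit>Different opening</incipit></msItem>")
hits = index.search("opening words")
print(hits)
```

A search returns document IDs only; the application then loads the matching XML records from the collection, so the XML files remain the single source of the data.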
System Architecture
Wellcome Arabic Manuscript Catalog…

Coming soon
Thank You
