
Chapter 5

Threats to Digital Preservation and Possible Solutions

Keep constantly in mind in how many things you yourself have witnessed changes already. The universe is change, life is understanding. (Marcus Aurelius)

We indicated in the Introduction some of the things we need to be worried about. In this chapter we look at these in more detail, supported by information about what others worry about.

There are some obvious threats to the preservation of digitally encoded information. One is what one might call bit rot, i.e. the deterioration in our ability to read the bits in which the information is encoded. While this is fundamental, there are nevertheless an increasing number of ways to overcome this problem, the simplest of which is replication of the bits, i.e. making multiple copies.

One way to think about this is to consider what one might be able to rely on in the long term. Within a single organisation, with a continuous supply of adequate funding, the job of digital preservation is at least feasible. However no-one can be sure of continued funding, and examples of such continued, and generous, funding are hard if not impossible to find. Instead the preservation of any piece of digitally encoded information will almost certainly rely on its being passed from one organisation to another. Thus it depends on a chain of preservation which is only as strong as its weakest link.

In the following sub-sections we discuss some of the major potential points of failure in these chains and some of the ways in which these points might be addressed. Subsequent sections provide more details of the concepts needed to support these solutions.
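The replication approach to bit rot mentioned above can be illustrated with a small sketch. It is not taken from the chapter: the function names, the choice of SHA-256 and the strict-majority rule are all assumptions made for illustration. The idea is simply that, given several copies of the same bits, a damaged copy can be detected and identified by comparing digests across the copies.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return a hex digest of the bits (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: list[bytes]) -> tuple[bytes, list[int]]:
    """Pick the content held by a strict majority of replicas.

    Returns the majority content and the indices of replicas that disagree
    with it. Raises ValueError if no strict majority exists, because in that
    case this simple scheme cannot tell which copy is the intact one.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    counts = Counter(digest(r) for r in replicas)
    winner, votes = counts.most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no strict majority among replicas")
    good = next(r for r in replicas if digest(r) == winner)
    damaged = [i for i, r in enumerate(replicas) if digest(r) != winner]
    return good, damaged


# Usage: three copies, one of which has suffered a single flipped bit.
original = b"digitally encoded information"
rotted = bytearray(original)
rotted[3] ^= 0x01
good, damaged = repair_replicas([original, bytes(rotted), original])
```

A scheme like this only addresses the readability of the bits. It says nothing about whether the information they encode remains understandable, which is the concern of the rest of the chapter.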

D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_5, © Springer-Verlag Berlin Heidelberg 2011


Potential point of failure: Failure of any chain of preservation may be imagined as involving changes in, or non-maintainability of, essential hardware, software or support environment. Additionally the human methodology established for preservation may not be followed (sudden changes of a whole team of people, etc.).
Potential solution: One of the recognised techniques of isolating dependencies on hardware, software and environment is virtualisation. By this is meant the technique of identifying important, abstract, interfaces/processes which can be implemented on top of concrete implementations which are available at any particular time in the future.

Potential point of failure: OAIS stresses the importance of taking into account the changes in the knowledge base of the designated community. This may not be done adequately.
Potential solution: Changes in Knowledge Base can only be truly solved by the community itself, but procedures can be proposed which help to ensure that gaps in understandability are at least recognised and the information requested from the community before it is entirely lost.

Potential point of failure: Additionally one may have a loss in the chain of evidence and lack of certainty of provenance or authenticity.
Potential solution: Provenance and authenticity are, in part at least, dependent on social and information policy concerns, process documentation, and other aspects which cannot have a purely technical solution. However some tools can be made available to ameliorate the risks of security breaches. Systems security and data integrity are only two aspects of provenance and authenticity, and we should be careful not to assume that tools for these problems will provide solutions to larger problems.

Potential point of failure: Encodings used to establish lack of tampering, and currently considered unbreakable, may eventually be broken using increasingly powerful processors or more sophisticated attacks.
Potential solution: Constant vigilance about security of encodings and a preparedness to apply more secure encoding.

Potential point of failure: The custodian of the data, an organisation or project, no matter how well established, may, at some point in the future, cease to exist.
Potential solution: Custodianship should always be regarded as a temporary trust and techniques are needed to allow a smooth handing over of holdings from one link in the chain of preservation to the next.

Potential point of failure: Even if the organisation exists, the mechanisms to identify the location of data, for example a DNS entry pointing to a host machine, may no longer be resolvable.
Potential solution: The provision of a definitive system of persistent actionable identifiers which spreads the risk of the deterioration of identifier systems must be proposed.

Potential point of failure: Mandating the continued use of specific systems or formats is one possible way to try to ensure preservation. For example we might try to mandate all images to be JPEG, all documents to be PDF/A, and all science data to be kept as XML files, or demand that a specific ontology be adopted. Even if we were to be successful for a limited time, the one thing we can be sure of is that things would change and the mandates would fail.
Potential solution: Given the constantly changing world we need a system which does not force a specific way of doing things but instead allows anything to be accommodated. For example we cannot mandate a particular way of producing representation information or provenance. While it might have some advantages in terms of interoperability in the short term, in the long term we would be locked into a dead-end. However this should not prevent us from advising on best practice.
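The "preparedness to apply more secure encoding" listed among the solutions above can be sketched as follows. This is an illustration under stated assumptions, not a method described in the chapter: the fixity record layout (a plain mapping from algorithm name to hex digest) and the function names are hypothetical, and the algorithms are whatever Python's standard hashlib module provides. The point is that a stronger digest is added only while the existing one still verifies, so the new digest inherits whatever trust the old record carried.

```python
import hashlib


def compute_digests(data: bytes, algorithms: list[str]) -> dict[str, str]:
    """Build a fixity record: one hex digest per named hashlib algorithm."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def verify(data: bytes, record: dict[str, str]) -> bool:
    """True only if the record is non-empty and every digest matches the data."""
    return bool(record) and all(
        hashlib.new(name, data).hexdigest() == expected
        for name, expected in record.items()
    )


def strengthen(data: bytes, record: dict[str, str], new_algorithm: str) -> dict[str, str]:
    """Return a copy of the record with a digest from a newer algorithm added.

    The data must still match the existing record; otherwise the new digest
    would vouch for bits that can no longer be tied to the original.
    """
    if not verify(data, record):
        raise ValueError("data no longer matches the existing fixity record")
    updated = dict(record)
    updated[new_algorithm] = hashlib.new(new_algorithm, data).hexdigest()
    return updated


# Usage: a holding first recorded with SHA-1 is later given a SHA3-256 digest.
holding = b"holdings passed along the chain of preservation"
record = compute_digests(holding, ["sha1"])
record = strengthen(holding, record, "sha3_256")
```

As the text cautions, data integrity checks of this kind cover only one aspect of provenance and authenticity; they do not establish who produced the data or how it has been handled.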


5.1 What Can Be Relied on in the Long-Term?


While we cannot provide rigorous proofs, it is worth, at this point, listing those things which we might credibly argue would be available in the long term, in order to clarify the basis of our approach. We should be able to trace back our preservation plans to these assumptions. Were we able to undertake a rigorous mathematical proof, these would form the basis of the axioms for our theorems.

- Words on paper (or Silicon Carbide sheets) that people can read. ISO standards are an example of this. Over the long term there may be an issue of language and character shape. Carvings in stone and books have proven track records of preserving information over hundreds of years.
- The information, such as some fundamental Representation Information, which is collected. This is a somewhat recursive assumption; however, it is difficult to make progress without it. This Representation Information includes both digital and physical (e.g. books) objects.
- Some kind of remote access. Network access is the natural assumption but in principle other methods of obtaining information from a given address/location would suffice, for example fax or horse-back rider.
- Some kind of computers. Perhaps not strictly necessary, but this seems a sensible assumption given the amount of calculation needed to do some of the most trivial operations, such as displaying anything beyond simple ASCII text, or extracting information from large datasets.
- People? Organisations? Clearly neither the originators of the digital objects nor the initial host organisations can be relied on to continue to exist. However if no people and no organisations exist at all then perhaps digital preservation becomes a moot topic.
- Identifiers? Some kind of identifier system will be needed, as discussed in Sect. 10.3.2, but clearly we cannot assume that any given URL, for example, will remain valid.

With these in mind we are almost ready to move on to some general considerations about future-proofing digitally encoded information.
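The point about identifiers, that no single URL or identifier system can be assumed to remain valid, can be illustrated with a small sketch of resolution that spreads the risk across several independent resolvers. Everything here is hypothetical: the identifier strings, the example.org locations and the in-memory lookup tables stand in for real identifier systems and are not drawn from the chapter or from Sect. 10.3.2.

```python
from typing import Callable, Optional

# A resolver maps an identifier to a location, returns None if it has no
# entry, or raises LookupError if the resolver itself is unavailable.
Resolver = Callable[[str], Optional[str]]


def table_resolver(table: dict[str, str]) -> Resolver:
    """Wrap an in-memory lookup table as a resolver (a stand-in for a real system)."""
    def lookup(identifier: str) -> Optional[str]:
        return table.get(identifier)
    return lookup


def offline_resolver(identifier: str) -> Optional[str]:
    """Simulate an identifier system that has ceased to operate."""
    raise LookupError("resolver offline")


def resolve(identifier: str, resolvers: list[Resolver]) -> Optional[str]:
    """Try each resolver in turn and return the first location found."""
    for resolver in resolvers:
        try:
            location = resolver(identifier)
        except LookupError:
            continue  # this system has failed; fall through to the next one
        if location is not None:
            return location
    return None


# Usage: the first system is gone, the second knows only one object,
# and the third holds entries for both.
resolvers = [
    offline_resolver,
    table_resolver({"example:obj-1": "https://repo-a.example.org/obj-1"}),
    table_resolver({
        "example:obj-1": "https://repo-b.example.org/obj-1",
        "example:obj-2": "https://repo-b.example.org/obj-2",
    }),
]
location = resolve("example:obj-2", resolvers)
```

The design choice worth noting is that the identifier is kept separate from any one location, so that the loss of a single resolver or host does not by itself make the object unfindable.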


5.2 What Others Think About Major Threats to Digital Preservation


A major survey carried out by the PARSE.Insight project [1], with several thousand responses from around the world, across disciplines and across stakeholders, has shown that the majority of researchers thought that there were a number of threats to the preservation of digital objects which were either very important or important. There are a number of general threats as shown in Fig. 5.1. It is interesting to see that human error, natural disasters and political instability are included in the list, in addition to concerns about funding and continuity. There were also some more specific threats which are summarised in Table 5.1. These were regarded by a clear majority across disciplines, countries and roles as either important or very important.

Fig. 5.1 General threats to digital preservation, n = 1,190

Table 5.1 Threats to digital preservation

Outline threat: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved.
Examples: Things which used to be tacit knowledge are no longer known. For example particular terminology may fall out of use; whole languages may die; paradigms of ways to analyse problems may disappear.

Outline threat: Non-maintainability of essential hardware, software or support environment may make the information inaccessible.
Examples: Hardware on which one currently depends, for example Intel x86 CPUs or tape readers, or whole operating systems on which software relies, may no longer function through lack of support. Open source software may be available but its developers may drift away.

Outline threat: The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity.
Examples: Someone may claim that a digital object is something of significance, for example a diary of a famous person or a piece of missing scientific data, but one may have doubts about its origin and whether it has been surreptitiously altered.

Outline threat: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future.
Examples: A piece of software may refuse to work after a certain date because of a time limit on the licence; it may not be possible to back up a digital object because it would not be legal; your own data, which you had submitted to a repository, may be used without your permission even though you explicitly stated that it should be kept for 30 years without anyone else accessing it.

Outline threat: Loss of ability to identify the location of data.
Examples: An XML schema may reference other schema, but the location suggested for that other schema cannot be found. A Web page contains a link to an image but the URL does not work; in fact the DNS may say there is no such address registered.

Outline threat: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
Examples: The organisation that is charged with looking after the digital object may lose its funding.

Outline threat: The ones we trust to look after the digital holdings may let us down.
Examples: The people we entrust with our digital objects may make preservation decisions which in the long run mean that the digital objects are not usable.

5.3 Summary
In order to preserve digitally encoded information we must have some understanding of the types of threats that must be guarded against. This chapter should have provided the reader with the requisite background knowledge to be aware of the wide variety of threats which must be countered.
