
Chapter 19

STFC Science Testbed

Background

For the STFC testbeds a methodology was developed in response to the challenge of digital preservation. This challenge lies in the need to preserve not only the dataset itself but also its ability to deliver knowledge to a future user community. The preservation objective is defined by the knowledge that a dataset is capable of imparting to any future designated user community, and this has a profound impact on the preservation actions an archive must carry out. We sought to incorporate a number of analysis techniques, tools and methods into an overall process capable of producing an actionable preservation plan for scientific data archives. The resulting implementation plans are described in the scenarios below.

19.1 Dataset Selection


Several datasets are used in four scenarios in order to illustrate a number of important points. The datasets come from archives located at STFC but acquired from instruments at other locations, as illustrated in Fig. 19.1; for this study they are data from the MST radar in Wales (Fig. 19.2) and Ionosonde data from many stations around the world.

19.2 Challenges Addressed


The challenges addressed are that the physical phenomena about which the data is collected are complex, and specialist knowledge is needed to use the data. Moreover the data is in specialised formats and needs specialised software in order to access it. The risks to the preservation of this data therefore include:


Fig. 19.1 Examples of acquiring scientific data

The MST Radar at Capel Dewi near Aberystwyth is the UK's most powerful and versatile wind-profiling instrument. Data can currently be accessed via the British Atmospheric Data Centre. It is a 46.5 MHz pulsed Doppler radar ideally suited to studies of atmospheric winds, waves and turbulence. It is run predominantly in the ST mode (approximately 2–20 km altitude), for which MST radars are unique in their ability to give continuous measurements of the three-dimensional wind vector at high resolution (typically 2–3 min in time and 300 m in altitude).

Fig. 19.2 MST radar site


– the risk to the continued ability of users to understand and use the data, especially since intimate knowledge of the instruments is needed and, as we will see, this knowledge is not widespread; much of it is contained in Web sites, which are probably ephemeral;
– the likelihood that the software currently used to access and analyse the data will not be supported in the long term;
– the fact that the provenance of the data depends on what is in the head of the repository manager;
– the funding of the archives is by no means guaranteed and yet, because much knowledge is linked to key personnel, there is a risk that it will not be possible to hand over the data/information holdings fully to another archive.

19.3 Preservation Aims


After discussion with the archive managers and scientists it was agreed that the preservation aim should be to preserve the ability of users to extract a number of key measurements from the data and to understand them in sufficient detail to use them in scientific analyses. The knowledge base of the Designated Community will be somewhat lower than that of the experts, but will still include a broad disciplinary understanding of the subject. In order to be in a position to hand over its holdings, some example AIPs must be constructed. Note that we do not attempt to construct AIPs for the whole archive; nevertheless the Representation Information and PDI we capture are applicable to most of the individual datasets. With the ability to create AIPs, the archive would be in a position to hand over its holdings to the next in the chain of preservation if and when this becomes necessary.

19.4 Preservation Analysis


We structure the analysis of the detailed work around constructing the AIP. A number of strategies were considered. Of those eliminated, it is worth mentioning that emulation was not regarded as useful by the archive scientists because it restricted the ways in which they could use the data. Similarly, transformation of the data might be an option in the future, but only when other options have become too difficult. In order to understand this, a preservation risk analysis was conducted which allows the archive managers to assess when this point is likely to arrive.

19.5 MST RADAR Scenarios


Four scenarios are detailed here, for two different instruments. In the interests of brevity we list the actions carried out in each scenario, including where appropriate the use of the Key Components and toolkits.


19.5.1 STFC1: Simple Scenario MST


A user from a future designated user community should be able to extract the following information from the data for a given altitude and time:

– Horizontal wind speed and direction
– Wind shear
– Signal Velocity
– Signal Power
– Aspect Correlated Spectral Width

MST1.1 An example of dataset-specific plotting and analysis programs for the MST is the MST GNU plot software. This software plots Cartesian products of wind profiles from NetCDF data files. It was developed by the project scientist because of specialised visualization requirements, where finer definition of colour and font was needed than that provided by generic tools. Preservation risks are due to the following user skill requirements and technical dependencies:

– UNIX http://www.unix.org/ or a Linux distribution
– The user must be able to install Python http://www.python.org/ with the python-dev module, the numpy array package and pycdf
– GNU plot must be installed http://www.gnuplot.info/docs/gnuplot.html and the user must be able to set environment variables
– The ability to run the required Python scripts through a UNIX command line
– The GNU plot template file used to format plot output

A number of preservation strategies presented themselves.

Emulation Strategy: One solution is preserving the software through emulation, for example with Dioscuri http://dioscuri.sourceforge.net/faq.html. Current work within the PLANETS project http://www.planets-project.eu/news/?id=1190708180 will make Dioscuri capable of running operating systems such as Ubuntu Linux, which should satisfy the platform dependencies. With the capture of the specified software packages/libraries and the provision of all necessary user instructions this would become a viable strategy.

Conversion Strategy: It is additionally possible to convert NetCDF files to another compatible format such as NASA AMES http://badc.nerc.ac.uk/help/formats/NASA-Ames/. We were able to achieve this conversion using the community-developed software Nappy http://home.badc.rl.ac.uk/astephens/software/nappy/, CDAT http://www2-pcmdi.llnl.gov/cdat and Python.


NASA AMES is a compatible self-describing ASCII format, so the information should still be accessible and easily understood as long as ASCII-encoded text can still be read. There would, however, be reluctance to do this, as NASA AMES files are not as easily manipulated, making it more cumbersome to analyse the data in the desired manner.

Preservation by Addition of Representation Information Strategy: An alternative strategy is to gather documentation relating to the NetCDF file format which contains adequate information for future users to extract the required parameters from the NetCDF files. Currently this information can be found in the BADC support pages on NetCDF http://badc.nerc.ac.uk/help/formats/NetCDF/, which can be archived using the HTTrack tool or adequately referenced. These pages suggest some useful generic software a future user may wish to utilize. If these pages are no longer available or the software is unusable, a user can consult the NetCDF documentation and libraries from Unidata http://www.unidata.ucar.edu/software/NetCDF/docs/. This means that if the future user community still has skills in FORTRAN, C, C++, Python or Java they will be able to write software to access the required parameters with little difficulty. The BADC decided to opt for the following strategies:

– Referencing BADC support
– Referencing Unidata support
– Crystallising out RepInfo from the UNIDATA documentation library to allow developers to write or extend their own software in the following languages: Java, C++, FORTRAN 77, Python

A sketch of the kind of access code a developer could write is given below.
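To make this concrete, the following is a minimal sketch using the generic netCDF4 Python library; it is not the MST GNU plot software, and the variable names (horizontal_wind_speed, horizontal_wind_direction, altitude) are assumptions that would need to be checked against the actual file headers.

```python
# Minimal sketch: extract MST wind parameters from a NetCDF file.
# Uses the generic netCDF4 library; variable names are illustrative
# and must be checked against the real file (e.g. with ncdump -h).
from netCDF4 import Dataset
import numpy as np

def wind_at(filename, target_alt_m, time_index=0,
            speed_var="horizontal_wind_speed",
            direction_var="horizontal_wind_direction",
            alt_var="altitude"):
    """Return (speed, direction) at the altitude gate nearest target_alt_m."""
    with Dataset(filename) as nc:
        altitudes = nc.variables[alt_var][:]            # 1-D altitude axis
        gate = int(np.argmin(np.abs(altitudes - target_alt_m)))
        speed = nc.variables[speed_var][time_index, gate]
        direction = nc.variables[direction_var][time_index, gate]
    return float(speed), float(direction)

if __name__ == "__main__":
    s, d = wind_at("mst_cartesian_example.nc", target_alt_m=10000.0)
    print(f"wind speed {s:.1f} m/s, direction {d:.0f} deg")
```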

MST1.2 The GAP manager can be used to identify the NetCDF file format as being at risk when BADC or UNIDATA support goes away, for any of a variety of technical or organisational reasons. The Representation Information can then be replaced with other RepInfo from the registry/repository, which we take from the NetCDF document library at UCAR, whose longevity is not guaranteed http://www.unidata.ucar.edu/software/netcdf/docs/. We use this documentation and the real-life BADC user survey to create different Designated Community profiles with the GAP manager. This shows how we can satisfy the needs of different communities of C++, Fortran, Python and Java programmers who wish to use the data.


MST1.3 We explored the benefits of NetCDF standardisation and showed that CASPAR supports it by archiving the CF standard name list, monitoring it and using POM to send notification of changes, thereby supporting the semantic integrity of the data.

NetCDF (network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of that interface. NetCDF is used extensively in the atmospheric and oceanic science communities. It is a preferred file format of the British Atmospheric Data Centre, which currently provides access to the data. The NetCDF software was developed at the Unidata Program Center in Boulder, Colorado, USA http://www.unidata.ucar.edu/. NetCDF facilitates preservation for the following reasons:

– NetCDF is a portable, self-describing binary data format and so is well suited to the capture of provenance, descriptive and semantic information.
– NetCDF is network-transparent, meaning that it can be accessed by computers that store integers, characters and floating-point numbers in different ways. This provides some protection against technology obsolescence.
– NetCDF datasets can be read and written in a number of languages, including C, C++, FORTRAN, IDL, Python, Perl and Java. The spread of languages capable of reading these datasets ensures greater longevity of access, because as one language becomes obsolete the community can move to another.
– The different language implementations are freely available from the UNIDATA Center, and NetCDF is completely and methodically documented in UNIDATA's NetCDF User's Guide, making capture of the necessary Representation Information a relatively easy, low-cost option.

Several groups have defined conventions for NetCDF files to enable the exchange of data. BADC has adopted the Climate and Forecasting (CF) conventions for NetCDF data and has created standard names. CF conventions are guidelines and recommendations as to where to put information within a NetCDF file, and they provide advice as to what type of information you might want to include. CF conventions allow the creator of the dataset to include Representation Information and Preservation Description Information in a structured way. Global attributes describe the general properties and origins of the dataset, capturing vital provenance and descriptive information, while local attributes describe the individual variables (see the sketch below).

MST1.5 Archive the MST support website, carrying out an assessment of its constituent elements, and use the Registry/Repository to add basic Representation Information on HTML, Word, PDF, JPEG, PNG and PostScript to facilitate preservation of a simple static website. Much additional valuable provenance information has also been recorded in the MST radar support website. Selected pages or the entire site could be archived as Preservation Description Information.
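Returning to the CF conventions, the following minimal sketch (again using the netCDF4 Python library) shows the distinction between global attributes and local, per-variable attributes; the attribute values and variable names are invented for illustration and are not taken from the MST archive.

```python
# Minimal sketch: writing a small NetCDF file with CF-style metadata.
# Attribute values are illustrative only, not the real MST conventions.
from netCDF4 import Dataset
import numpy as np

with Dataset("cf_example.nc", "w") as nc:
    # Global attributes: general properties and origin (provenance/context)
    nc.Conventions = "CF-1.0"
    nc.title = "Example wind profile"
    nc.institution = "Example radar facility"
    nc.history = "2011-01-01: created by example script"

    nc.createDimension("altitude", 3)
    alt = nc.createVariable("altitude", "f4", ("altitude",))
    alt.standard_name = "altitude"          # CF standard name
    alt.units = "m"
    alt[:] = [2000.0, 2300.0, 2600.0]

    wind = nc.createVariable("wind_speed", "f4", ("altitude",))
    wind.standard_name = "wind_speed"       # local (per-variable) attributes
    wind.units = "m s-1"
    wind[:] = np.array([12.5, 13.1, 14.0])
```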


Fig. 19.3 STFC MST website

The MST website is currently located at http://mst.nerc.ac.uk (Fig. 19.3). Due to the site's simple structure, which consists of a set of static pages and common file types, it would be a relatively simple operation to run a web archiving tool such as HTTrack (http://www.httrack.com/) to copy the website and add additional RepInfo on HTML, PDF, MS Word and JPEG from the DCC Registry/Repository of Representation Information (RRORI). HTTrack is only one of a range of web archiving tools which are freely available and require minimal skill to operate. However, it is worth noting that it is only by virtue of the technical simplicity of the site that it is so relatively easy to archive and preserve.

MST1.6 The PACK component was used to create the AIP and add checksums, maintaining the existing directory structure of the data files.

MST1.7 The current directory structure is logical and well thought out, and should be maintained in the AIP package. Details of the archiving conventions are recorded in the MST website http://mst.nerc.ac.uk/archiving_conventions.html; these will need to be altered by the removal of the BADC from the top of the directory hierarchy to avoid confusion:

/badc/dataset-name/data/data-type-name/YYYY/MM/DD/

19.5.1.1 Preservation Information Network Model for MST Simple Solution

A preservation information network model (Fig. 19.4) is a representation of the digital objects, operations and relationships which allow a preservation objective to be met for a future designated community. The model provides a sharable, stable and organized structure for digital objects and their associated requirements.

Fig. 19.4 Preservation information network model for MST-simple solution

The model also directs the capture and description of digital objects which need to be packaged and stored within an OAIS-compliant Archival Information Package.

19.5.1.2 Components of a Preservation Network Model

Preservation network modelling has many similarities to classic conceptual modelling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources. A preservation network model consists of two components: the digital objects and the relationships between them. Objects are uniquely identified digital entities capable of an independent existence which possess the following attributes:

– Information: a description of the key information contained by the digital object. This information should have been identified during preservation analysis as the information required to satisfy the preservation objective for the designated user community.
– Location information: the information required by the end user to physically locate and retrieve the object. AIPs may be logical in construction, with key digital objects being distributed and managed within different information systems. This tends to be the case when data is in active use, with resources evolving in a dynamic environment.


– Physical State: describes the form of the digital object. It should contain sufficient information relating to the version, variant, instance and dependencies.
– Risks: most digital solutions will have inherent risks and a finite lifespan, such as the interpretability of information, technical dependencies or loss of designated community skills. Risks should be recorded against the appropriate object so that they can be monitored and the implications of their being realised assessed.
– Termination of the network occurs when a user requires no additional information or assistance to achieve the defined preservation objective, given that the accepted risks will not be imminently realised.

A relationship captures how two objects are related to one another in order to fulfil the specified preservation objective whilst being utilized by a member of the designated user community. Relationships have the following attributes:

– Function: in order to satisfy the preservation objective a digital object will perform a specific function, for example the delivery of textual information or the extraction and graphical visualisation of specific parameters.
– Tolerance: not every function is critical for the fulfilment of the preservation objective, with some digital objects included because they enhance the quality of the solution or its ease of use. The acceptability of losing such a function is denoted in the model as a tolerance.
– Quality assurance and testing: the ability of an object to perform the specified function may have been subjected to quality assurance and testing, which may be recorded against the relationship.
– Alternate and Composite relationships can be thought of as logical AND (denoted in diagrams by a circle) or OR (denoted in diagrams by a diamond) relationships: either all relationships must function in order to fulfil the required objective or, in the latter case, only one relationship needs to function in order to fulfil the specified objective.
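Purely as an illustration (this is not the CASPAR implementation), the object and relationship attributes just described could be captured in a small data structure such as the following; the class names, fields and example objects are invented for this sketch.

```python
# Illustrative sketch of a preservation network model: objects carrying
# information/location/state/risk attributes, linked by relationships that
# carry a function, a tolerance and an AND/OR (composite/alternate) type.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DigitalObject:
    name: str
    information: str          # key information the object contributes
    location: str             # where the end user can retrieve it
    physical_state: str       # version/variant/instance/dependencies
    risks: List[str] = field(default_factory=list)

@dataclass
class Relationship:
    source: DigitalObject
    targets: List[DigitalObject]
    function: str             # what the targets do for the objective
    tolerance: bool = False   # True if loss of this function is tolerable
    composite: bool = True    # True = AND (all needed), False = OR (any one)

netcdf_file = DigitalObject(
    "MST NetCDF data", "wind profile measurements",
    "archive directory dataset-name/data/...", "NetCDF classic format",
    risks=["loss of NetCDF community support"])
badc_help = DigitalObject(
    "BADC NetCDF help pages", "format description",
    "http://badc.nerc.ac.uk/help/formats/NetCDF/", "static HTML",
    risks=["web pages are ephemeral"])
unidata_docs = DigitalObject(
    "Unidata NetCDF documentation", "format description and libraries",
    "http://www.unidata.ucar.edu/software/NetCDF/docs/", "HTML/PDF")

# Either source of Representation Information suffices: an OR (alternate).
rep_info = Relationship(netcdf_file, [badc_help, unidata_docs],
                        function="describe the NetCDF format", composite=False)
```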

19.5.1.3 Quality Assurance and Testing of MST Simple Solution

19.5.1.3.1 Overall Solution Validated By

– The Curation Manager at the British Atmospheric Data Centre and the NERC Earth Observation Data Centre. His role is to oversee the operations of the data centres, ensuring that they are trusted repositories that deliver data efficiently to users. He has a particular interest in data publication issues. He is also the facility manager for the NERC MST radar facility.
– The NERC MST radar facility project scientist, who is part of the committee for the MST radar international workshop. The international workshop on MST radar, held about every 2–3 years, is a major event gathering together experts from all over the world engaged in research and development of radar techniques to study the mesosphere, stratosphere and troposphere (MST).


19.5.1.3.2 Elements of Solution Validated as Follows

MST1.1 Directory structure: validation trivial, as the very simple structure is easy to navigate.
MST1.2 MST website: content supplied, validated and managed by the project scientist and subject to community and user group scrutiny.
MST1.2.1 MST website provenance: validated by the website creator and manager.
MST1.2.2 Instructions for running the static website: this was tested locally with the group, where users were able to unzip and use the website provided they had Firefox/Internet Explorer, Adobe and Word installed on their laptops/PCs.
MST1.2.3 Reference testing: trivial; the risk that this reference needs to be monitored is accepted.
MST1.2.3.4 Composite strategy: elements of the MST website have been scrutinised by the research team. We confirmed that the site contained JPEG, PNG, Word, PDF and HTML files (Fig. 19.5). We then established that use of these file types was stable in the user community. Use of file types is monitored by the BADC, who carry out a regular survey of their user community. We accepted that there is a risk that users may at some point in the future not be able to use these files, and will use the BADC survey mechanism to monitor the situation. RepInfo for these file types was also added to the AIP so that the file types can easily be understood and monitored.

Fig. 19.5 MST web site files


MST 1.2.3.4.1 Information on Word 97 supplied by Microsoft
MST 1.2.3.4.2 Reference to British and ISO standards on JPEG
MST 1.2.3.4.3 W3C validated description
MST 1.2.3.4.4 Reference to ISO standard
MST 1.2.3.4.5 Reference to ISO standard

MST1.3.1 Reference to BADC software solutions for NetCDF. Tested by CASPAR STFC and IBM Haifa. The extraction of the parameters was successfully tested and validated using software supplied by:

– the BADC Infrastructure Manager, who looks after the software that runs the BADC, including the registration system and dataset access control software; and
– the Met Office Coordinator, who works for the NCAS/British Atmospheric Data Centre but is located in the Hadley Centre for Climate Prediction and Research at the UK Met Office (http://www.metoffice.gov.uk). His main duties involve work with: global model datasets obtained from the European Centre for Medium Range Weather Forecasts (ECMWF); liaison with the Met Office regarding scientific and technical interactions; development of software tools for data extraction, manipulation and delivery (based on the Climate Data Analysis Tools, CDAT); and development of software for data format conversion such as NAppy.

MST 1.3.2 & 1.3.3.1–4 RepInfo has been subjected to community scrutiny and published by UNIDATA. The Unidata mission is to provide the data services, tools, and cyberinfrastructure leadership that advance Earth system science, enhance educational opportunities, and broaden participation. Unidata, funded primarily by the National Science Foundation, is one of eight programs in the University Corporation for Atmospheric Research (UCAR) Office of Programs (UOP). UOP units create, conduct, and coordinate projects that strengthen education and research in the atmospheric, oceanic and earth sciences. Unidata is a diverse community of over 160 institutions vested in the common goal of sharing data, and tools to access and visualize that data. For 20 years Unidata has been providing data, tools, and support to enhance Earth-system education and research. In an era of increasing data complexity, accessibility, and multidisciplinary integration, Unidata provides a rich set of services and tools. The Unidata Program Center, as the leader of a broad community:

– Explores new technologies
– Evaluates and implements technological standards and tools
– Advocates for the community


– Provides leadership in solving community problems in new and creative ways
– Negotiates for new and valuable data sources
– Facilitates data discovery and use of digital libraries
– Enables student-centred learning in the Earth system sciences by promoting use of data and tools in education
– Values open standards, interoperability, and open-source approaches
– Develops innovative solutions and new capabilities to solve community needs
– Stays abreast of computing trends as they pertain to advancing research and education

MST1.4 CF standard names list. The conventions for climate and forecast (CF) metadata are designed to promote the processing and sharing of files created with the NetCDF API. The CF conventions are increasingly gaining acceptance and have been adopted by a number of projects and groups as a primary standard. The conventions define metadata that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, re-gridding, and display capabilities. The CF conventions generalize and extend the COARDS conventions.

19.5.1.3.3 Discussion and Validation of CF Metadata Takes Place in Two Forums

1. the CF metadata Trac, and
2. the cf-metadata mailing list.

The list is then published by Alison Pamment, the CF metadata secretary. Alison is a research scientist based at the Rutherford Appleton Laboratory and is responsible for Climate and Forecast (CF) metadata support.

MST 1.4.1 W3C validated standard
MST 1.4.1.1 PDF ISO standard

Inputs needed for the creation of the AIP are illustrated in Fig. 19.6.
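Relating back to MST1.3/MST1.4, monitoring the semantic integrity of the data can be partly mechanised. The sketch below checks the standard_name attributes used in a NetCDF file against an archived copy of the CF standard name list; the file names are invented, and the table is assumed to consist of entry elements with an id attribute, so the parsing would need adjusting to the actual table schema.

```python
# Illustrative sketch: check that the standard_name attributes in a NetCDF
# file appear in an archived copy of the CF standard name table.
# File names are invented; the table is assumed to use <entry id="..."> elements.
import xml.etree.ElementTree as ET
from netCDF4 import Dataset

def undefined_standard_names(nc_file, cf_table_xml):
    table = ET.parse(cf_table_xml).getroot()
    known = {entry.get("id") for entry in table.iter("entry")}
    missing = set()
    with Dataset(nc_file) as nc:
        for var in nc.variables.values():
            name = getattr(var, "standard_name", None)
            if name is not None and name not in known:
                missing.add(name)
    return missing

if __name__ == "__main__":
    print(undefined_standard_names("mst_example.nc", "cf-standard-name-table.xml"))
```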

19.5.2 Scenario2 MST-Complex


A user from a future designated user community should be able to extract the following information from the data for a given altitude and time:

– Horizontal wind speed and direction
– Wind shear
– Signal Velocity
– Signal Power
– Aspect Correlated Spectral Width


Fig. 19.6 Preservation information flow for scenario 1 – MST-simple

The Preservation Information Network is shown in Fig. 19.7.

19.5.2.1 Preservation Objectives for Scenario 2 MST-Complex

A user from a future designated user community should be able to extract the following information from the data for a given altitude and time:


Fig. 19.7 Preservation information network model for MST-complex solution


– Horizontal wind speed and direction
– Wind shear
– Signal Velocity
– Signal Power
– Aspect Correlated Spectral Width

In addition, future users should have access to user group notes, MST conference proceedings and peer-reviewed literature published by previous data users. MST Scenario 2 has a higher-level preservation objective and can be considered an extension of Scenario 1, as the AIP information content is simply extended. The significance of this is that future data users will have access to important information which will help in studying the following types of phenomena captured within the data:

– Precipitation
– Convection
– Gravity Waves
– Rossby Waves
– Mesoscale and Microscale Structures
– Fallstreak Clouds
– Ozone Layering

Implementation points are based on the strategies for Scenario 1.

MST1.7 We reviewed the bibliography contained in the website and the quality of the references. We carried out an investigation and review of technical reports which are used heavily at STFC but have not been generated there, identifying clear cases of reports which have been correctly cited but have never been deposited anywhere, as they have no natural home, and digitising them for inclusion within the AIP. The website additionally contains a bibliographic record of publications resulting from use of the data. This record contains good quality citations, but there would be concerns regarding permanent access to some of these materials; consider the two examples below:

W. Jones and S. P. Kingsley. MST radar observations of meteors. In Proceedings of the Wagstaff (USA) Conference on Asteroids, Comets and Meteors. Lunar and Planetary Institute (NASA Houston), July 1991.

S. P. Kingsley. Radio-astronomical methods of measuring the MST radar antenna. Technical report to MST radar user community, 1989.

Neither of these two items is currently held by either the British Library http://www.bl.uk/ or the Library of Congress http://catalog.loc.gov/, based on searches of their catalogues. Nor do they exist in the local STFC institutional repository http://epubs.cclrc.ac.uk/. The preservation strategy chosen to deal with this bibliography was to create MARC http://www.loc.gov/marc/ http://www.dcc.ac.uk/diffuse/?s=36 records in XML format for items held by the British Library, and to begin the process of obtaining copies of the other items from the current community and digitising them in PDF format for direct inclusion within the AIP.
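As a rough illustration of such a record (hand-written for this example rather than produced by the actual tooling, with invented field values), a minimal MARCXML record can be assembled with the Python standard library:

```python
# Minimal sketch: build a MARCXML record for a bibliography item.
# Field tags follow MARC 21 bibliographic conventions (100 = main author,
# 245 = title, 260 = publication info); the values are illustrative.
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", NS)

record = ET.Element(f"{{{NS}}}record")

def datafield(tag, code, value):
    df = ET.SubElement(record, f"{{{NS}}}datafield",
                       {"tag": tag, "ind1": " ", "ind2": " "})
    sf = ET.SubElement(df, f"{{{NS}}}subfield", {"code": code})
    sf.text = value

datafield("100", "a", "Kingsley, S. P.")
datafield("245", "a", "Radio-astronomical methods of measuring the MST radar antenna")
datafield("260", "c", "1989")

print(ET.tostring(record, encoding="unicode"))
```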


MST1.8 The international workshop on MST radar is held about every 2–3 years and is a major event gathering together experts from all over the world engaged in research and development of radar techniques to study the mesosphere, stratosphere and troposphere (MST). It is additionally attended by young scientists, research students and new entrants to the field, to facilitate close interactions with the experts on all technical and scientific aspects of MST radar techniques. It is this aspect which makes the proceedings an ideal resource for future users who are new to the field. Permanent access to these proceedings is again at risk. The MST 10 proceedings are available for download from the internet http://jro.igp.gob.pe/mst10/ and from the British Library. Proceedings 3 and 5–10 are also available from the British Library, meeting 4 is only available from the Library of Congress, and unfortunately the proceedings from meetings 1 and 2 have not been deposited in either institution. A number of strategies present themselves. Copies of proceedings 1, 2 and 4 could be obtained from the still active community, digitised and incorporated into the AIP. The proceedings which are currently held by the British Library could be obtained, digitised and incorporated into the AIP, or alternatively the XML MARC records could be obtained and incorporated into the AIP as references, as there is a high degree of confidence in the permanence of these holdings.

MST1.9 The project scientist has again been quite diligent in keeping minutes of the user group meetings which are run for data-using scientists several times a year. As a result this information is easily captured. It currently resides in the NCAS CEDA repository, which provides easy access for current data users; however there are no guarantees that this repository will persist in the longer term, so a simple reference in the form of a URL would not be considered sufficient to guarantee permanent access to this material. This leaves two strategies open to the archive. The first involves taking a copy of this material and including it physically within the AIP. The second involves orchestration, where the CEDA repository would be required to alert the custodians of the MST data to the demise of the repository or the migration of this material, so that it may be obtained for direct inclusion in the AIP. We created a reference to the MST user group minutes held in the newly created CEDA institutional repository for the National Centre for Atmospheric Science http://cedadocs.badc.rl.ac.uk/. We registered the demise of this repository as a risk to be monitored and recommended the development of an orchestration strategy for the material held there, as this repository is representative of a proliferation of repositories in academia whose longevity is not guaranteed.

19.5.2.2 Quality Assurance and Testing of MST Complex Solution

MST 2.5 Bibliography content supplied and validated by the project scientist
MST 2.5.1 MARC21 specification standard validated by the Library of Congress
MST 2.5.2 XML specification validated by W3C
MST 2.5.1.1 & 2.5.2.1 PDF ISO standard

Inputs needed for the creation of the AIP are illustrated in Fig. 19.8.


Fig. 19.8 Preservation information flow for scenario 2 – MST-complex

19.6 Ionosonde Data and the WDC Scenarios


The World Data Centre (WDC) system was created to archive and distribute data collected from the observational programmes of the 1957–1958 International Geophysical Year. Originally established in the United States, Europe, Russia, and Japan, the WDC system has since expanded to other countries and to new scientific disciplines. The WDC system now includes 52 Centres in 12 countries. Its holdings include a wide range of solar, geophysical, environmental, and human dimensions data. The WDC for Solar-Terrestrial Physics, based at the Rutherford Appleton Laboratory, holds ionospheric data comprising vertical soundings from over 300 stations, mostly from 1957 onwards, though some stations have data going back to the 1930s. The Ionosonde is a basic tool for ionospheric research. Ionosondes are vertical-incidence radars which record the time of flight of a radio signal swept through a range of frequencies (1–30 MHz) and reflected from the ionised layers of the upper atmosphere (90–800 km) as an ionogram. These results are analysed to give the variation of electron density with height up to the peak of the ionosphere. Such electron-density profiles provide most of the information required for studies of the ionosphere and its effect on radio communications. Only a small fraction of the recorded ionograms are analysed in this way, however, because of the effort required.


The traditional input to the WDC has been hourly-resolution scaled data, but many stations take soundings at higher resolutions. The WDC receives data from the many ionosonde stations around the world through a variety of means including ftp, email and CD-ROM. Data is provided in a number of formats: URSI (simple hourly resolution) and IIWG (more complex, time varying) standard formats, as well as station-specific bulletins. The WDC's stored data in digital formats comprises 2.9 GB of data in IIWG format and 70 GB of raw MMM, SAO and ART files from Lowell digisondes. The WDC also holds about 40,000 rolls of 16/35 mm film ionograms and ~10,000 monthly bulletins of scaled ionospheric data. Some of this data is already in digital form, but much, particularly the ionogram images, is not yet digitised. Many stations' data is provided in IIWG or URSI format directly; this data may be automatically or manually scaled. A selection of European stations provide raw-format data from Lowell digisondes, a particular make of ionosonde, as part of a COST project. This data is in a proprietary format, but Lowell provides Java-based software for analysis. The WDC uses this software to manipulate this data, particularly from CCLRC's own Ionospheric Monitoring Group's ionosondes at Chilton, UK and Stanley, Falkland Islands. The autoscaled data from these stations is also stored in a PostgreSQL database. Other stations provide a small set of standard parameters in a station-specific bulletin format which is similar to the paper bulletins traditionally produced from the 1950s onwards. The WDC has some bespoke, configurable software to extract the data from these bulletins and convert it to IIWG format. It is important to realise that this is a totally voluntary data collection and archive system. The WDCs have no control over, or means of enforcing, a standard means of data processing or dissemination, though weight of history and ease of use tend to make this the preferred option.

19.6.1 STFC3: Implementation Plan for Scenario3 Ionosonde-Simple


The first preservation scenario shows us again supporting and integrating with the existing preservation practices of the World Data Centre, which means creating a consistent global record from 252 stations by extracting a standardised set of parameters from the ionograms produced around the world. A user from a future designated community should be able to extract the following fourteen standard ionospheric parameters from the data for a given station and time, and should also be able to understand what these parameters represent: fmin, foE, h'E, foEs, h'Es, type of Es,


fbEs, foF1, M(3000)F1, h'F, h'F2, foF2, fx, M(3000)F2. The preservation information flow is shown in Fig. 19.9 and the corresponding information network is shown in Fig. 19.10.

19.6.1.1 Preservation Information Flow for Scenario 3 Ionosonde-Simple

Fig. 19.9 Preservation information flow for scenario 3 – Ionosonde-simple

Fig. 19.10 Preservation network model for scenario 3 – Ionosonde-simple


19.6.1.2 Implementation Points Based on Strategies for Scenario 3

IO1.1 New RepInfo based on the IIWG format description, removing the need to understand FORTRAN, as is the case with comprehending the current version
IO1.2 Create a DEDSL dictionary for the 14 standard parameters and add RepInfo from the Registry/Repository on the XML DEDSL standard
IO1.3 Authenticity information from the current archivist for the 252 stations and the data transformation/ingest process
IO1.4 Perform a CSV dump of the station information from the Postgres database
IO1.5 A logical description of the directory structure was created
IO1.6 PACK was used to create the AIP and add checksums, maintaining the existing directory/file structure (a sketch of this kind of checksum manifest follows)
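PACK itself is a CASPAR component; purely to illustrate the idea behind IO1.6 (fixity information that preserves the directory structure), the following sketch walks an archive directory and writes a SHA-256 checksum per file. The directory and manifest names are invented for this example.

```python
# Illustrative sketch of fixity generation: walk an archive directory,
# compute a SHA-256 checksum per file and write a manifest that records
# each file's path relative to the archive root (structure is preserved).
import hashlib
import os

def build_manifest(root_dir, manifest_path="manifest-sha256.txt"):
    with open(manifest_path, "w") as manifest:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                rel_path = os.path.relpath(full_path, root_dir)
                digest = hashlib.sha256()
                with open(full_path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                manifest.write(f"{digest.hexdigest()}  {rel_path}\n")

if __name__ == "__main__":
    # e.g. dataset-name/data/data-type-name/YYYY/MM/DD/... as in the MST archive
    build_manifest("dataset-name")
```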

19.6.2 STFC4: Implementation Plan for Scenario4 Ionosonde-Complex


The second preservation scenario for the Ionosonde can only be carried out for seven European stations, but will allow a consistent ionogram record for the Chilton site which dates back to the 1920s. A user from a future designated community should be able to reproduce an ionogram from the raw mmm/sao data files (see Fig. 19.11) and have access to the Ionospheric Monitoring Group's website, the URSI handbooks of interpretation and the Lowell technical documentation. Being able to preserve the ionogram record is significant, as it is a much richer source of information, more accurately able to convey the state of the atmosphere when correctly interpreted. The preservation information flow is shown in Fig. 19.12.

Fig. 19.11 Example plot of output from Ionosonde

Fig. 19.12 Preservation information flow for scenario 4 – Ionosonde-complex

19.6.2.1 Implementation Points Based on Strategies for Scenario 4

IO2.1 Archive SAO Explorer with RepInfo from the Registry/Repository for Java 5 software
IO2.2 Digitise and include the URSI handbooks of interpretation in the AIP, and deposit them in the Registry/Repository for other repository users
IO2.3 Digitise and include the Lowell technical documentation in the AIP, and deposit it in the Registry/Repository for other repository users
IO2.4 Archive the Ionospheric Monitoring Group website, carrying out an assessment of its constituent elements, and use the Registry/Repository to add basic information on HTML, Word, PDF, JPEG, PNG and PostScript to facilitate preservation of a simple static website
IO2.5 Review the bibliography contained in the website and the quality of references. Carry out an investigation and review of technical reports which are used heavily at STFC but have not been generated there. Identify clear cases of reports which have been correctly cited but have never been deposited anywhere, as they have no natural home, and digitise them for inclusion within the AIP
IO2.6 Perform a CSV dump of the station information from the Postgres database (a sketch of this is given after this list)
IO2.7 Create a logical description of the directory structure
IO2.8 Use PACK to create the AIP and add checksums, maintaining the existing directory structure


IO2.9 Use the GAP manager to identify a gap based on the demise of the Java virtual machine. Use POM to notify us of the gap and update the AIP with a replacement EAST description of the mmm file structure from the Registry/Repository.
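To illustrate the station-information dumps of IO1.4 and IO2.6, the following sketch uses the psycopg2 driver to export a table as CSV; the connection settings and the table name stations are assumptions, since the actual database schema is not described here.

```python
# Illustrative sketch: dump station information from a PostgreSQL
# database to a CSV file for inclusion in the AIP.
# Table name and connection settings are invented for this example.
import psycopg2

def dump_stations(csv_path="station_information.csv"):
    conn = psycopg2.connect(dbname="ionosonde", user="archive", host="localhost")
    try:
        with conn.cursor() as cur, open(csv_path, "w") as out:
            # COPY ... TO STDOUT streams the table as CSV with a header row
            cur.copy_expert("COPY stations TO STDOUT WITH CSV HEADER", out)
    finally:
        conn.close()

if __name__ == "__main__":
    dump_stations()
```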

19.7 Summary of Testbed Checks


At each of the steps listed above, checks were performed to ensure that the Representation Information was adequate; for example, for IO1.1 the description of the IIWG format was checked by extracting numbers from a data file using generic tools and comparing these to the values obtained using the current tools. The overall check was to go through the AIP with the archive managers and scientists and ensure that they agreed with the Representation Information and PDI which had been captured; this required several iterations, but in the end they were willing to sign off on all the materials. Users with the appropriate knowledge base have also been successful in extracting and performing the basic analysis tasks with the specified data. Taking this together with the acceptance by the archive managers and scientists of the preservation analysis, the risk analysis and the adequacy of the AIP, we believe that the aims of the testbed have been successfully achieved.
