Data Archive

Background

For many years there has been discussion of the need for a data archive to avoid the loss of data from the caving community. Over time the discussion has concentrated on two strands: how to store the data and how to make the data accessible. The William Pengelly Cave Studies Trust Ltd has put forward a proposal for making the data accessible to the non-surveyor. The Cave Registry http://british-caving.org.uk/?page=24 also aims to make subterranean data available; however, it has broader aims than either strand of the Archive. Both Archive strands should work closely with the Registry, ideally working towards a coherent system for storing and accessing information on the subterranean world. It is proposed that this proposal and the William Pengelly Cave Studies Trust Ltd proposal form two different strands of the National Data Subterranean Archive.

In CP25, September 1999, Andrew Atkinson proposed a system for archiving cave data. The main strand of this was the level of protection given to the data, split into 5 Classes.

The Data Classes

  1. Public domain - The data is stored and is free to any user for any purpose. The original author should be credited.
  2. Free Access - The data is stored and is free for any user for any purpose, as long as the original author is credited. Profit may not be made, but the costs of distribution may be recovered.
  3. Limited Access - The data is available to any user, but reproduction and use may only be carried out with the permission of the provider or holding body. Details of where to obtain permission will accompany the data. (i.e. the original author may pass permission to the holding body or provider.)
  4. No Access - The data may not be accessed by anyone; however, a record that it exists will be published. Any further enquiries will be referred to the provider.
  5. Secret - The data will be stored; however, no record will be publicly available. Anyone asking for data about the cave, or about entrances at the same location, will be told “nothing known for that site”. The authors will be informed of the request unless those asking request secrecy (i.e. secrecy can be reciprocal).

Note that different types of data can be kept under different Classes. Typically the survey data itself might be Class 4, whilst the completed survey is Class 2 or 3. There will also be a facility for providers to record what information they hold but do not or cannot (e.g. due to expense) give to the holding body. This should reduce duplication of work. Anyone wishing to send locations under Class 5 may also do so under the same conditions.
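
As an illustration only, the five Classes could be represented in software roughly as follows. This is a minimal sketch in Python, not part of the original proposal: the names DataClass and public_response, and the wording of the replies other than the Class 5 reply, are assumptions made for the example.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """The five protection Classes listed above."""
    PUBLIC_DOMAIN = 1
    FREE_ACCESS = 2
    LIMITED_ACCESS = 3
    NO_ACCESS = 4
    SECRET = 5


def public_response(data_class: DataClass) -> str:
    """Return what a member of the public is told about an item.

    The reply texts are illustrative, except for the Class 5 reply,
    which is quoted from the Class definitions.
    """
    if data_class == DataClass.SECRET:
        # Class 5: no public record at all.
        return "nothing known for that site"
    if data_class == DataClass.NO_ACCESS:
        # Class 4: existence is published, the data is not.
        return "data exists; enquiries are referred to the provider"
    if data_class == DataClass.LIMITED_ACCESS:
        # Class 3: readable, but reuse needs permission.
        return "data available; permission is needed to reproduce or use it"
    # Classes 1 and 2: freely usable with credit to the author.
    return "data available; credit the original author"


# Different types of data in one project can carry different Classes.
example_project = {
    "survey data": DataClass.NO_ACCESS,
    "completed survey": DataClass.FREE_ACCESS,
}
```

In this sketch each item in a project carries its own Class, mirroring the note that the survey data and the completed survey may be protected differently.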

The original article is available at http://www.sat.dundee.ac.uk/~arb/surveying/archive.html


In the intervening years technology has moved on, and it is now possible to achieve some of the aims using web-based solutions. The proposal below sets out how to implement Class 1, 2 and 3 data, with a vision of how the proposal should progress in the future.

Proposal

To provide a Subversion repository (hereafter "the Repository") for surveyors of any subterranean feature to use, allowing groups to work collaboratively on a surveying project.
To back up the Repository regularly and store the backups offline.
To publish web pages describing the data held, including information about the backups.

Data to be held

It is intended that the repository be an area that can be used as a working area for survey projects. Therefore, all files that are needed for a project should be stored in the repository. These should include:
Digital versions of original cave notes.
Data files.
Digital versions of drawn up surveys.
Digital versions of final surveys.

Any file that can be generated from other files in the repository should not be included in the repository.

More detailed information, including the rules and advice about the use of the repository, will be published on the Archive's website.

Who can store data in the Repository?

Any member of the BCA (of any status), any subscriber to the CSG, and anybody with data on UK caves can set up a Project on the Repository.

The SVN Repository

One repository will be set up per project.
One Project Coordinator will be given administration rights over the repository and is then responsible for issuing usernames and rights to other participants within the individual project.
All data will be available for anonymous download (during the period when only Class 1, 2 and 3 data is included).
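
One possible way to express these access rules is sketched below. It assumes that access is controlled with a Subversion path-based authorization file kept per repository; the proposal does not specify this. The function name render_authz and the usernames are hypothetical. The server would also need to be configured to permit anonymous access, and the issuing of usernames by the Project Coordinator is outside the scope of this file.

```python
from typing import List


def render_authz(coordinator: str, participants: List[str]) -> str:
    """Build the text of a per-project authorization file.

    Everyone (including anonymous users) may read; the coordinator
    and the named participants may also write.
    """
    # dict.fromkeys removes duplicate names while keeping their order.
    writers = list(dict.fromkeys([coordinator, *participants]))

    lines = [
        "[/]",      # rules apply from the root of the project repository
        "* = r",    # read-only access for all users
    ]
    for user in writers:
        lines.append(f"{user} = rw")  # read and write access
    return "\n".join(lines) + "\n"


# Hypothetical usernames, for illustration only.
print(render_authz("coordinator", ["surveyor1", "surveyor2"]))
```

Because one repository is set up per project, a single rule block at the root is enough in this sketch; finer control within a project would need additional path sections.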

Backup Procedure

Backups will be taken at regular intervals, provided there has been a change in the data.

Backups will be stored offline.

Backups will be stored in multiple locations. (Each Project will be backed up in a minimum of 2 locations.)

Backups are only for restoring the repository if it fails, not for correcting user errors.

Information about the backups will be published on the web site, including:
The version number that is stored.
Generic locations for the stores and what data is held in each, i.e. "Location 1", "Location 2" will be used, not addresses.
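
The backup rules above can be sketched in code as follows. This is an illustrative outline only, assuming that the "version number" is the repository revision number (which could, for example, be read with `svnlook youngest`). The names BackupRecord, needs_backup, make_record and published_line, and the project name used in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_LOCATIONS = 2  # each Project is backed up in a minimum of 2 locations


@dataclass
class BackupRecord:
    project: str
    revision: int          # repository revision held in the backup
    locations: List[str]   # generic labels only, never addresses


def needs_backup(youngest_revision: int, last_backed_up: Optional[int]) -> bool:
    """A backup is due only if the data changed since the last one."""
    return last_backed_up is None or youngest_revision > last_backed_up


def make_record(project: str, revision: int, locations: List[str]) -> BackupRecord:
    """Create a record, enforcing the minimum number of distinct stores."""
    unique = list(dict.fromkeys(locations))  # drop repeated labels
    if len(unique) < MIN_LOCATIONS:
        raise ValueError(f"need at least {MIN_LOCATIONS} backup locations")
    return BackupRecord(project, revision, unique)


def published_line(record: BackupRecord) -> str:
    """Text suitable for the public web page about the backups."""
    stores = ", ".join(record.locations)
    return f"{record.project}: revision {record.revision} held at {stores}"


# Hypothetical project: revision 41 was backed up, the repository is now at 42.
if needs_backup(42, 41):
    record = make_record("Example Cave", 42, ["Location 1", "Location 2"])
    print(published_line(record))
```

Comparing revision numbers keeps unchanged projects from being copied again at each interval, and publishing only generic labels keeps the store addresses private.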

Published information

The Project Coordinator will be responsible for maintaining the information for the Project. The Project Coordinator will be able to control the rights of the Project Users to edit the published pages. More details will be published on the Archive web site; however, the information should include:
Programs needed to process the data.
Description of the data structure and how to assemble the project.
All participants in the project, identifying the Project Coordinators and Project Administrators.
Location.
Details of data available.

Surveying Programs

A proportion of the data stored will require proprietary software to access it. The Project Coordinator should ensure that plain text and graphic files of the data are stored in the repository, so that the data can be recovered in the future, whether the software is available or not. It is hoped that schemes for translating data, and possibly copies of the software, can also be stored in the repository, although copyright and licences will cause difficulties in some cases. This will be an ongoing, developing project.

Links with the Cave Registry and William Pengelly Cave Studies Trust Ltd

These schemes are complementary to the Proposal, and close links should be maintained, so that the sites can be cross-referenced by users. Identifiers should be used to link the systems, e.g. the use of the same reference number.

Future

Work to be considered for the future:

Allow Project Coordinators to use the website to publish the existence of data without uploading the data to the repository.

Implement Class 4 and 5 for complete projects.

Implement different Classes within the same project.

Automate the translation of data into the William Pengelly Cave Studies Trust Ltd Archive.