The Australian Antarctic program Data Policy (2010)

This version of the Australian Antarctic program Data Policy was created in 2010, and applied to all projects conducted between 2010 and the creation of the 2013 policy on April 2, 2013.

Table of Contents

Summary

Detail


Summary

Australian Antarctic Data Centre (AADC) Responsibilities

The AADC will ensure that:

AAp Scientists' Data Management Responsibilities

AAp Scientists will ensure that they:


Detail

Australian Antarctic Program (AAp) Data Policy

The Australian Antarctic Program (AAp) has had a Data Policy since 1999. The original Policy was endorsed by the Antarctic Science Advisory Committee (i.e. the committee appointed to provide advice to the DSEWPAC Minister on the Australian Antarctic Science Strategy and its implementation). Minor revisions have been made to the Policy content since 1999, with a substantive re-write occurring in 2010 following publication of the 2010 Science Strategy. The current Policy has been endorsed by the AAp Chief Scientist.

Glossary of Terms

AAD
Australian Antarctic Division
AAp
Australian Antarctic Program
Data Centre/AADC
Australian Antarctic Data Centre
AMD
Antarctic Master Directory
ATCM
Antarctic Treaty Consultative Meeting
SCADM
SCAR Standing Committee on Antarctic Data Management
NADC
National Antarctic Data Centre
PI
Principal Investigator (project lead scientist)
SCAR
Scientific Committee on Antarctic Research
WDC
World Data Centre
WDS
World Data Centre System

Policy Rationale - The Value of Antarctic Science Data

Data are valuable assets. Acquiring Antarctic scientific data is logistically expensive. For example, a field person wintering over would cost in excess of $0.5M in logistical support alone (2010 figures), and costs will continue to rise with inflation and increasing commodity prices. Unlike less remote and more hospitable areas, opportunities to collect quality data on the ground are limited. The data that are collected are therefore extremely valuable and, in many cases, irreplaceable. Despite technological advances in remote sensing, field data are still required for validation and calibration purposes, and many types of observations are not yet amenable to remote measurement.

Data must be recognised as having a potential value that may well exceed the value of the individual publications that are derived from them. Traditionally, publications in scientific journals have been the primary means of evaluating scientific productivity, but all areas of society are now recognising the true value of the underlying data and the need for ready access to them. Despite this recognition, the data underpinning a publication are rarely published together with the scientific paper. Yet these data may increase in value more rapidly over time than their literary counterparts and remain capable of generating further research insights, often in areas unconnected to the original topic under study. Scientists may underestimate the true value of the data that they have collected and may not initially envisage alternative future uses of their data. This Policy aims to help AAp researchers maximise the value of the data they collect by providing guidance on how to use the AAp's dedicated data management facilities to make all AAp data potentially re-usable and publicly accessible.

Another key role of the Policy is to create a context that fosters development of qualified, foundational scientific digital databases and products, built over time through the process of data aggregation and pooling. These types of data assets will benefit and support all AAp researchers, both present and future.

Policy Overview

A condition of participation in the Australian Antarctic Program (AAp) is that all data collected under the AAp, products derived from those data, and samples remain the property of the Commonwealth of Australia. This excludes samples collected from Macquarie Island which are the property of the Tasmanian Government. It is the role of AAp Principal Investigators to ensure that all data and samples generated as part of their research are adequately managed for long-term re-use. This generally involves ensuring from the outset that all data/samples are adequately documented with metadata and that arrangements are made for data to be deposited with the Australian Antarctic Data Centre (AADC). Alternative long-term repositories will be considered to host data but this will require a due diligence check of the nominated repository by the AADC. The submission of a data management plan is a mandatory first milestone for all AAp projects.

Appropriate metadata must be created in the AADC's metadata system (CAASM) to describe any captured data and all data must be submitted to the AADC, or an approved long-term repository, by a project's end date. Progress towards completion of metadata and submission of all datasets will be monitored through the Australian Antarctic On-line progress reporting system. Completion of metadata involves ensuring that the record accurately describes the final state of the data, as it is progressively worked up through the project. Note that all metadata records are made public after initial moderation and should be available from an early point in the project's execution. Samples must be catalogued and submitted to recognised collection hosting facilities.

Unless there are extenuating circumstances, data submitted to the AADC will be made public, usually after a suitable embargo period. Extenuating circumstances preventing timely publication of data must be presented to the AADC Manager.

AAp Data Management Facilities

The Australian Antarctic Data Centre (AADC), the primary data management facility supporting the AAp, was established in 1996 to manage and disseminate scientific data resulting from research within the AAp. The AADC helps fulfil Australia's obligations under Article (III).(1).(c) of the Antarctic Treaty which states that "Scientific observations and results from Antarctica shall be exchanged and made freely available." Further, as a party to the Antarctic Treaty, Australia agreed to establish a National Antarctic Data Centre (NADC) and to publish data in a timely manner through the collaborative systems established by Antarctic Treaty members.

The AADC serves as Australia's NADC. It is one of a number of Antarctic NADCs whose data publication activities are internationally coordinated via the SCAR Standing Committee on Antarctic Data Management (SCADM). SCAR (http://www.scar.org) plays a key role in bringing together large international and inter-disciplinary Antarctic and Southern Ocean research programs. SCADM is SCAR's data management arm and has 25 member states, providing a forum for Antarctic data managers to collaborate on data management and international scientific data exchange issues. SCAR has its own Data Policy (at appendix one of http://www.scar.org/publications/reports/Report_34.pdf) with which this Policy is entirely consistent.

The services, data and products available via the AADC can be found online at http://data.aad.gov.au.

Who Owns Australian Antarctic Science Data?

A condition of participation in the AAp is that each supported expeditioner is required to acknowledge that data and samples collected from the Antarctic, sub-antarctic and Southern Ocean are the property of the Commonwealth of Australia. Samples collected on Macquarie Island are, as an exception, the property of the Tasmanian Government and the Tasmanian Government requires verification of the curation of such items into an approved collection. Permit conditions usually stipulate that samples must be registered with the Tasmanian Museum within 14 days of the expiry of a permit. This policy, however, recognises the original data collector as a data originator and potentially as a data custodian, i.e. an individual who has collected data on behalf of the Commonwealth and who has a vested interest in its use and management. A data custodian has certain functions and rights (explained in a later section).

As a consequence of Australia's adherence to the Antarctic Treaty System and specifically Article (III).(1).(c), the Commonwealth of Australia will make AAp data publicly available via a Creative Commons "Attribution Only" license (see the Creative Commons web site for more detail at http://creativecommons.org.au/learn-more/licences). Whenever a work is copied or redistributed under this type of Creative Commons licence, the original creator (and any other nominated parties) must be credited as the source of the data. This license has no other restrictions on use. Timing of data release is addressed below.

How Are Data Defined?

"Data" comprise almost any scientific observation or measurement, either raw or processed in any format, either electronic or paper. The AADC has the capability to manage a broad range of scientific data types. Data in this context could include-

A wide range of data formats are readily accepted by the Data Centre and are outlined in detail at http://data.aad.gov.au/aadc/guidelines/. If a data format is not on the accepted format list the AADC will still try to accommodate a submission but it will require some interaction with Data Centre staff, prior to deposition.

Where necessary and as appropriate, more than one form of the same dataset will need to be managed by the Data Centre, for example, both raw and processed data.

The AADC has facilities to scan and copy field notes. These types of data will not be made public, unless approved by the author. The notes will be managed by the AADC, however, to aid in any necessary future interpretation of data quality.

Custodianship and Public Release of Antarctic Science Data

The AADC links the primary responsibility for a dataset to a data custodian and sees itself as a custodial agency once it takes receipt of data. Normally, a data custodian would be:

The data custodian is usually someone who can act as a technical contact for anyone needing further details about a dataset and is generally someone very familiar with how, why and when the data were captured. They are listed as a data custodian in the metadata record that accompanies all AAp data. Metadata provides the linkage between the data, the custodian and a custodial agency.

A custodial agency, such as the AADC, provides a hosting service for the data in order to provide long-term continuity for data management and access. This is particularly useful in project-based science programs such as the AAp where data custodians regularly change interests, positions, roles, agencies and/or retire. The Data Centre also buffers the custodian from having to respond to requests for data by automating the request process. All data requests are automatically or manually logged by the AADC and responded to in most cases without recourse to the data custodian. The data custodian can request feedback on data usage from the Data Centre.

The Data Centre is, however, dependent on the custodian for information relating to a particular set of data. It is the responsibility of the custodian, in collaboration with the Data Centre, to ensure that data are well documented, primarily through metadata records, and that these data are then in a form that is acceptable for ingestion into AADC archives.

The data custodian has the right to negotiate with the AADC regarding the timing of the public release of data in their custodianship. However, in most cases release of the data will be timed to coincide with the completion date for the project by which time all deliverables, including proposed publications should be finalised. This default period will be used by the AADC in planning for the public release of data, but the Centre will always be happy to release data earlier if required. This means that the AADC is agreeing to embargo any data that is provided to it, during the project's execution, until the specified project end date (as listed in the Antarctic Applications Online system). This assumes that project end dates are relatively fixed and that they do not continuously extend to accommodate the production of unfinished or unforeseen publications. Exceptions to this project duration embargo period are listed in Table 1.

Data custodians will be notified of any impending data release and cases for an extension of an embargo period will need to be made to the AADC Manager. In difficult cases, where the extensions required appear unreasonable or unjustifiable, the Chief Scientist will determine the merits of the case presented. However, the default stance taken by the Data Centre will be that most requests for extensions will be legitimate and the Data Centre will work in good faith with the data custodian to reach an agreeable outcome.

When the AADC makes data public from its web site, the data (through bundled metadata) will carry a Creative Commons license. This license will stipulate how the data custodian is to be credited for collecting (or being the source of) the data. Whilst the AADC will supply a default template for citation requiring acknowledgement of the data custodian as the data originator, the data custodian has the right to negotiate with the AADC on how they wish to be cited.

Where requested, the AADC will also undertake to forward data to other agencies, institutions, World Data Centres or scientific data networks. By default, the AADC routinely publishes to a variety of public, global data networks, both to gain greater exposure for hosted datasets and to participate in global-scale data product development.

Table 1. Data Publication Embargo Periods by Exception
For Different Classes of Data

(unless stated otherwise in this table, data can be embargoed for a period equal to a project's duration)

| Data Type | Embargo Period | Explanatory Notes |
|---|---|---|
| Ship-sourced underway data suite. | Immediate release - potentially available in real-time. | Consists of sea surface and atmospheric parameters sampled whilst a ship is transiting. |
| Ship-sourced observations and measurements. | By a project's end date. | But copies of all sensor and instrument data to be deposited with the AADC at the end of every voyage. |
| Data captured by students aligned to the AAp. | Until the student has published their thesis. | |
| Data from monitoring projects (medium and long-term). | For projects approved for >5 years, up to a maximum of 5 years from time of data collection. | e.g. seabird observational study. |
| Data that are supplied as Commercial-In-Confidence, or are supplied under Privacy Laws. | Unlimited. | Some fisheries data are provided on a commercial-in-confidence basis. However, even in these circumstances, generalisation may render the data publishable in some circumstances. Patient records are examples of data not for release. |
| Data on threatened (listed) species. | Unlimited. | Release of data on the location of threatened species may render them more vulnerable. |

Note this table is not meant to be exhaustive but gives guidance based on different classes of data. If the table does not address a particular use-case, clarification can be provided by the AADC Manager.
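
As an illustration only, the embargo rules in Table 1 could be expressed as a small lookup. The sketch below is not an AADC tool: the class labels (`underway`, `student`, `monitoring` and so on) and the function names are hypothetical, and the treatment of monitoring data as "the earlier of the project end date and five years after collection" is one reading of the table rather than a rule stated in this Policy.

```python
from datetime import date
from typing import Optional

MONITORING_CAP_YEARS = 5  # Table 1: maximum of 5 years from time of data collection


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def release_date(data_class: str,
                 collected: date,
                 project_end: date,
                 thesis_published: Optional[date] = None) -> Optional[date]:
    """Return the earliest public release date, or None if no date can be set.

    The data_class labels are hypothetical names for the rows of Table 1.
    """
    if data_class == "underway":
        # Ship-sourced underway suite: immediate release.
        return collected
    if data_class in ("confidential", "threatened_species"):
        # Unlimited embargo: no release date.
        return None
    if data_class == "student":
        # Embargoed until the thesis is published (None while unpublished).
        return thesis_published
    if data_class == "monitoring":
        # Assumed reading: project end date, capped at 5 years after collection.
        return min(project_end, add_years(collected, MONITORING_CAP_YEARS))
    # Default, including ship-sourced observations: the project's end date.
    return project_end


if __name__ == "__main__":
    print(release_date("monitoring", date(2010, 1, 15), date(2018, 6, 30)))
```

Any case that the table does not cover would still need to be referred to the AADC Manager; a lookup like this can only encode the listed exceptions.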

How Will the AADC Understand My Data Management Requirements?

All AAp endorsed projects must, as the first milestone in their research project plans, complete and submit a Data Management Plan within the first 6 months after notification of a successful AAp project application. Progress in implementing this plan will be assessed annually until the project is complete. This will form a component of the annual progress reporting obligations that are associated with all endorsed AAp projects.

As a guide to what is required in the Plan, there is a Data Management Plan template and a sample plan on the AADC web site. To assist with completing these Plans, the AADC has assigned Liaison Officers to each Science Stream identified in the Science Strategic Plan. All AAp projects fit into at least one of these Streams. Stream Liaison Officers for each Stream are listed on the AADC web site at http://data.aad.gov.au/aadc/slo/.

The purpose of the Plan is to identify early in the research process what datasets are likely to be captured, what processing might need to be undertaken, how the data are to be managed within the life of the project, what the likely data flows are, what resources might be required to achieve effective management and manipulation of the data, how the Data Centre can help with data storage and project-based data dissemination and how data should eventually be published and cited.

Using these Plans the Data Centre will get a detailed AAp-wide view of the data management needs of each project and be in a position to use this information to help a project identify where they might usefully collaborate on data capture activities and how data being captured by one project could be gainfully used by another during the life of the projects concerned. The Plans will also enable the Data Centre to efficiently target its work-plans and resources to meet the needs of its scientific users.

The Plan will cover how the project intends to manage (in collaboration with a hosting agency) any data, metadata, samples, models and model data. Models and model data present particular data management challenges, notably maintaining a suitable run-time environment for archived models, storing large volumes of model output, and determining which model runs are of value for preservation purposes. Whilst the AADC will assist in establishing a preservation plan for models and model output, it should be recognised that it may not always be possible to find a useful and workable data preservation solution, and pragmatism will need to be applied.

What is Metadata and Why is it Important in Data Management?

All AAp projects are required to create metadata records describing captured data, and these records must be submitted to the AADC's metadata system – the Catalogue of Australian Antarctic and Sub-antarctic Metadata (CAASM). Regardless of whether the data ends up residing with the AADC, or with another hosting institution, a metadata record for captured datasets must be deposited into CAASM. Registration of metadata in CAASM is the only mechanism the AADC has for maintaining a complete inventory of the data captured as part of the AAp. This record can be periodically updated as data are processed, analysed and published on.

The AADC has a dedicated metadata officer who can assist data custodians with metadata creation and advice on the granularity at which records should be created for different classes of data. Skeleton metadata records will initially be automatically created by the Data Centre, based on information in a PI's project proposal and subsequently from information derived from the project's Data Management Plan. It is, however, the responsibility of the PI to ensure that these records are sufficient to describe the entirety of the data captured in his/her project and that these records are accurate and completed by the project's nominated end date.

Metadata is the primary mechanism for documenting data and, in relevant cases, the instruments, sensors and procedures involved in data collection. Metadata standards support unlimited links to other documents, particularly in the form of web pages. This enables the fundamental metadata parameters (who, when, where, what) to be augmented with detailed descriptions and parameters that the custodian considers necessary for other scientists to make effective use of their data. Metadata provides information about data in much the same way that a card catalogue provides information about publications within a library. A library catalogue facilitates searching for particular topics or authors, while metadata may be searched by data custodian, type of data, time or area of collection and other parameters to locate relevant data or samples.
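
To make the catalogue analogy concrete, the sketch below models a metadata record carrying the fundamental parameters (who, when, where, what) and a simple search over a list of such records. It is a minimal illustration only: the field names, the `search` function and the example values are all hypothetical and do not reflect the actual CAASM schema or any AADC interface.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class MetadataRecord:
    title: str                      # what
    custodian: str                  # who
    start: date                     # when (start of collection)
    end: date                       # when (end of collection)
    region: str                     # where (free-text place name in this sketch)
    keywords: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)  # supporting web pages


def search(records: List[MetadataRecord],
           custodian: Optional[str] = None,
           keyword: Optional[str] = None,
           on: Optional[date] = None) -> List[MetadataRecord]:
    """Return records matching every criterion that was supplied."""
    hits = []
    for r in records:
        if custodian is not None and r.custodian != custodian:
            continue
        if keyword is not None and keyword.lower() not in [k.lower() for k in r.keywords]:
            continue
        if on is not None and not (r.start <= on <= r.end):
            continue
        hits.append(r)
    return hits


if __name__ == "__main__":
    # Invented example records, for illustration only.
    catalogue = [
        MetadataRecord("Seabird counts", "A. Researcher", date(2010, 11, 1),
                       date(2011, 2, 28), "Macquarie Island", ["seabirds"]),
        MetadataRecord("Sea surface temperature", "B. Scientist", date(2010, 10, 1),
                       date(2011, 3, 31), "Southern Ocean", ["oceanography"]),
    ]
    for rec in search(catalogue, keyword="Seabirds"):
        print(rec.title, "-", rec.custodian)
```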

The AADC will work with data custodians to ensure that their datasets are linked to metadata records that are of a suitable standard to be used for data citation purposes within peer reviewed publications. This Policy strongly encourages all AAp scientists to routinely cite the data that they have used within their own publications via a link to metadata. Alternatively, scientists who publish dedicated "data" papers in data journals can simply reference these publications. The AADC can provide advice on how best to cite data in different types of publications, and this issue should be explicitly addressed in the project's Data Management Plan.

An AAp Project is considered incomplete until all data resulting from that project have been described by a metadata record and those data have been submitted to the AADC (or alternate host). Future project applications will not be supported whilst metadata or data are outstanding from earlier projects.
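
The completion rule above amounts to a simple check over a project's datasets. The following sketch, with hypothetical class and function names, shows one way a project team might track what is still outstanding; it is an illustration only, not a description of the AADC's progress reporting system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    has_metadata: bool     # described by a metadata record
    data_submitted: bool   # lodged with the AADC or an approved alternate host


def outstanding(datasets: List[Dataset]) -> List[str]:
    """Names of datasets still missing a metadata record or a data submission."""
    return [d.name for d in datasets if not (d.has_metadata and d.data_submitted)]


def project_complete(datasets: List[Dataset]) -> bool:
    """A project is complete only when nothing is outstanding."""
    return not outstanding(datasets)


if __name__ == "__main__":
    # Invented example datasets, for illustration only.
    project = [
        Dataset("penguin census", has_metadata=True, data_submitted=True),
        Dataset("ice core logs", has_metadata=True, data_submitted=False),
    ]
    print(project_complete(project), outstanding(project))
```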

Management of Field and Laboratory Notebooks

Many AAp science project staff have a requirement to use hard-copy laboratory and field notebooks. These documents often form an important part of the metadata that gives context to the generation of data and as such they should be digitised during the life of a project (through scanning, preferably with optical character recognition). These notebooks should then be linked to the metadata/data package (but may not necessarily be made public). Recommended notebook types and guidelines for digitisation can be found at http://data.aad.gov.au/aadc/guidelines/scanning.cfm

Alternate Hosting by Parties Other than the AADC

Preferably, data should be submitted electronically to the AADC via the AADC on-line Electronic Data submission System (EDS) at http://data.aad.gov.au/aadc/eds. Alternative hosting arrangements can, however, be made. The data must still be described by metadata submitted to the Australian Antarctic Data Centre metadata database (CAASM).

Alternative data management agencies may be considered if the facilities and practices used at these hosting sites can ensure long-term, public access to the data and guarantee data preservation into the future. This would generally mean that the data were readily identifiable, retrievable in suitable formats, available on-line and that the hosting agency had transparent, operational codes of practice for long-term data preservation.

If a data custodian wishes to use an alternate hosting facility this should be documented as part of the Data Management Plan. The AADC will undertake a due diligence check of the nominated repository to ensure it meets standards expected of a long-term data hosting facility.

Sample Management

How do samples differ from data? Data are digital while samples are inherently physical. The AADC was established to manage primarily digital data and has no facilities of its own for the storage of physical samples. However, the Antarctic Division does manage several large in-house biological collections and the AADC can provide advice about collections management and collections management facilities. Memorandums of Understanding (MOUs) are being established with national museums willing to host AAp samples. As these are developed they will be advertised on the AADC web site. At that time more detailed guidance will be available about recommended sample hosting facilities and practices.

In the interim, data custodians are requested to write a metadata record to describe sample collections. In the immediate future, however, the AAp will be introducing a dedicated sample tracking system that will assist projects to manage and track collected samples and reference material.

Geological specimens must be lodged with Geoscience Australia - please contact Chris Carson chris.carson@ga.gov.au.

Bird banding records must be submitted to the Australian Bird and Bat Banding Schemes. Guidelines for using ABBBS facilities can be found at: http://www.environment.gov.au/biodiversity/science/abbbs/pubs/bander-guidelines.pdf and the scheme managers can be contacted via email on abbbs@environment.gov.au.

The Special Case of Data Captured from the AAp's Marine Science Platform

A suite of fixed sensors routinely capture data during most marine science specific and cargo transit voyages of the main AAp (AAD-leased) marine science platform. These data will be captured, processed if required, and then released publicly in the shortest time feasible. In some cases this will be in near real-time directly from the ship.

Other data are captured from opportunistically deployed sensors that are mounted and operated upon request by ship-based AAD technical specialists or by the PIs themselves. All digital observations and measurements made and logged under the auspices of AAp projects onboard the ship must be copied to the ship's main data logging system. This will ensure that all instrument or sensor-based data captured using the platform are safely deposited in the AADC at the end of each voyage. These "raw" or "partially processed" data will not be publicly released until the expiry of any embargo period but can be supplied to the PI upon request. Any subsequently qualified or processed copies of these data will also ultimately need to be deposited in the Data Centre. Data can be taken off the ship by PIs at the end of a voyage provided that a copy has been deposited in the main data logging system. A skeletal metadata record should exist for each dataset that is captured.

Physical samples obtained during a voyage should also have a metadata record.

Purchasing Data and Data Agreements When Forming Partnerships

The purpose of this Policy is to maximise the availability and re-use of data acquired by projects within the AAp. Licenses associated with purchased third-party data should be negotiated such that they permit future re-use of the data by other AAp participants. Licenses should not be struck, wherever feasible, where conditions limit re-use to an individual, or a single project team. The AADC Manager or AADC Stream Liaison Officers can assist with third-party data purchases and license negotiations.

The same considerations apply when forming collaborative partnerships that effectively extend the reach of the AAp into other national and international science consortia. The AAp has an open data policy, albeit permitting some embargo periods to enable sufficient time for scientific publication. It is not unreasonable to expect collaborators to adhere to similar principles. Partnerships established with other individuals or consortia should, from the outset, directly address and document how the partnership will deal with data access and publication issues. Negotiated arrangements should not contradict the spirit and intent of the AAp Policy.

This document was last updated by Kim Finney in 2010.