The Australian Antarctic program Data Policy (2014)

This version of the Australian Antarctic program Data Policy was created in November, 2014, and applied to all projects approved from this date onwards until superseded by the 2015 data policy released on 22 June 2015. It also retrospectively applied to projects created under the previous policy (2013v2).

Table of Contents

Summary

Detail


Summary

Australian Antarctic Data Centre (AADC) Responsibilities

The AADC will ensure that:

AAp Scientist's Data Management Responsibilities

AAp Scientists will ensure that they:


Detail

Australian Antarctic Program (AAp) Data Policy

The Australian Antarctic Program (AAp) has had a Data Policy since 1999. Minor revisions have been made to the Policy content since 1999, with a substantive re-write occurring in 2010 following publication of the 2010 Science Strategy. The current Policy (2013 update of the 2010 Policy) has recently been endorsed by the AAp Chief Scientist.

Glossary of Terms

AAD
Australian Antarctic Division
AAp
Australian Antarctic Program
AMD
Antarctic Master Directory
ATCM
Antarctic Treaty Consultative Meeting
CI
Chief Investigator (project lead scientist)
Data Centre/AADC
Australian Antarctic Data Centre
Digital Object Identifier (DOI)
A permanent and unique digital identifier allocated to a resource that can be used in an addressable form to locate a resource.
Model
A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way. A model is typically implemented in software and it is usually the output of this software that produces valuable data about the behaviour of the system being modelled. A model's conceptual structure (including e.g. its assumptions and delimitations), implementation (software), and output are all of importance from a data archiving (and data re-use) perspective.
MyScience
An AADC-built online application that provides an integrated view of the human and scientific resources associated with any given AAp-approved project.
NADC
National Antarctic Data Centre
Sample Tracker System
An online system for cataloguing physical samples captured in all AAp projects. Samples that are housed in AAD facilities must also be tracked, when moved, using this system.
SCADM
SCAR Standing Committee on Antarctic Data Management
SCAR
Scientific Committee on Antarctic Research

Policy Rationale - The Value of Antarctic Science Data

Data are valuable assets. Acquiring Antarctic scientific data is logistically expensive. For example, a field person wintering over would cost in excess of $0.5M in logistical support alone (2012 figures) and costs will continue to rise with inflation and increasing commodity prices. Unlike in less remote and more hospitable areas, opportunities to collect data in the field are limited. The data that are collected are therefore extremely valuable and, in many cases, irreplaceable. Despite technological advances in remote sensing, field data are still required for validation and calibration purposes and many types of observations are not yet amenable to remote measurement.

Data must be recognised as having a potential value that may well exceed the value of the individual publications that are derived from it. Traditionally, publications in scientific journals have been the primary means of evaluating scientific productivity, but all areas of society are now recognising the true value of the underlying data and the need for ready access to it. It is now generally accepted that publishing the data used in a scientific paper is good scientific practice. These data may increase more rapidly in value over time as compared to their literary counterparts and remain capable of generating further research insights, often in areas unconnected to the original topic under study. It is possible that scientists may underestimate the true value of the data that they have collected and may not initially envisage alternative future uses of their data. This Policy aims to help AAp researchers maximise the value of the data they collect by providing guidance on how to use the AAp's dedicated data management facilities to make all AAp data potentially re-usable and publicly accessible.

Another key role of the Policy is to create a context that fosters development of qualified, foundational scientific digital databases and products, built over time through the process of data aggregation and pooling. These types of data assets will benefit and support all AAp researchers, both present and future.

Policy Overview

A condition of participation in the Australian Antarctic Program (AAp) is that all data and samples collected under the AAp, and products derived from those data and samples, remain the property of the Commonwealth of Australia. This excludes samples collected from Macquarie Island which are the property of the Tasmanian Government. It is the role of AAp Chief Investigators to ensure that all data and samples generated as part of their research are adequately managed for long-term re-use. This generally involves ensuring from the outset that all data/samples are adequately documented with metadata and that arrangements are made for data to be deposited with the Australian Antarctic Data Centre (AADC). Alternative long-term repositories will be considered to host data but this will require a due diligence check of the nominated repository by the AADC. The submission of a data management plan is a mandatory first milestone for all AAp projects (with a few minor exceptions where medical ethics prevents full compliance with this Policy).

Appropriate metadata must be created in the AADC's metadata system (CAASM) to describe any captured data and all data must be submitted to the AADC, or an approved long-term repository, by a project's end date. Progress towards completion of metadata and submission of all datasets will be monitored through the AADC on-line MyScience application (http://data.aad.gov.au/aadc/projects/). Completion of metadata involves ensuring that the metadata documentation accurately describes the final state of the data, as it is progressively worked up through the project. Note that all metadata records are made public after initial moderation and should be available from an early point in the project's execution. Physical samples must also be catalogued in the AAD online Sample Tracker System; this requirement applies from June 2014.

Unless there are extenuating circumstances, data submitted to the AADC will be made public, usually after a suitable embargo period. Extenuating circumstances preventing timely publication of data must be presented to the AADC Manager.

AAp Data Management Facilities

The AADC, the primary data management facility supporting the AAp, was established in 1996 to manage and disseminate scientific data resulting from research within the AAp. The AADC helps fulfill Australia's obligations under Article (III).(1).(c) of the Antarctic Treaty which states that "Scientific observations and results from Antarctica shall be exchanged and made freely available." Further, as a party to the Antarctic Treaty, Australia agreed to establish a National Antarctic Data Centre (NADC) and to publish data in a timely manner through the collaborative systems established by Antarctic Treaty members.

The AADC serves as Australia's NADC. It is one of a number of Antarctic NADCs whose data publication activities are internationally coordinated via the SCAR Standing Committee on Antarctic Data Management (SCADM). SCAR (http://www.scar.org) plays a key role in bringing together large international and inter-disciplinary Antarctic and Southern Ocean research programs. SCADM is SCAR's data management arm and has 25 member states, providing a forum for Antarctic data managers to collaborate on data management and international scientific data exchange issues. SCAR has its own Data Policy (at appendix one of http://www.scar.org/publications/reports/Report_34.pdf) with which this Policy is consistent.

The services, data and products available via the AADC can be found online at http://data.aad.gov.au.

Who Owns Australian Antarctic Science Data?

A condition of participation in the AAp is that each supported expeditioner is required to acknowledge that data and physical samples collected from the Antarctic, subantarctic and Southern Ocean are the property of the Commonwealth of Australia (unless by prior arrangement such property is waived because data or samples are to become an integral part of a research resource whose intellectual property is already owned by another party e.g. a model, a model simulation or a frequently updated gridded global dataset owned by an international institution or consortium). In waiving such rights and by contributing to the development of these types of products, it is assumed that there is intent on the part of the property owner to fully realise the value of the provided data through its public availability and reusability in future research.

Samples collected on Macquarie Island are, as an exception, the property of the Tasmanian Government and the Tasmanian Government requires verification of the curation of such items into an approved collection. Tasmanian Government permit conditions usually stipulate that samples must be registered with the Tasmanian Museum within 14 days of the expiry of a permit. Similarly, samples owned by the Commonwealth which form a recognised 'collection' should also be deposited in a sustainably managed collection facility.

This policy recognises the original data collector as a data originator and potentially as a data custodian, i.e. an individual who has collected data on behalf of the Commonwealth (or the Tasmanian Government) and who has a vested interest in its use and management. A data custodian has certain functions and rights (explained in a later section).

As a consequence of Australia's adherence to the Antarctic Treaty System and specifically Article (III).(1).(c), the Commonwealth of Australia will make AAp data publicly available via a Creative Commons "Attribution Only" license (see the Creative Commons Web site for more detail at http://creativecommons.org.au/learn-more/licences). Whenever a work is copied or redistributed under this type of Creative Commons license, the original creator (and any other nominated parties) must be credited as the source of the data. This license has no other restrictions on use. However, all AAp data users are strongly urged to adopt the norms of behaviour expected of participants in the Polar Information Commons (PIC) community (see http://www.polarcommons.org/ethics-and-norms-of-data-sharing.php). A PIC badge, with links to these norms, is presented on all AADC metadata-based data download pages. One of these norms is that users will make reasonable and timely efforts to notify data custodians when they intend to use accessed data. Timing of data release for AAp data submitted to the AADC is addressed below.

How Are Data Defined?

"Data" comprise almost any scientific observation or measurement, either raw or processed in any format, either electronic or paper. The AADC has the capability to manage a broad range of scientific data types. Data in this context could include-

A wide range of data formats are readily accepted by the Data Centre and are outlined in detail at http://data.aad.gov.au/aadc/guidelines/. If a data format is not on the accepted format list the AADC will still try to accommodate a submission but it will require some interaction with Data Centre staff, prior to deposition. The use of non-proprietary file formats is preferred wherever possible; they are more likely to be readable in the future.

Where necessary and as appropriate, more than one form of the same dataset will need to be managed by the Data Centre, for example, both raw and processed data.

The AADC has facilities to scan and copy field notes. These types of data will not be made public, unless approved by the author. The notes will be managed by the AADC, however, to aid in any necessary future interpretation of data quality.

Custodianship and Public Release of Antarctic Science Data

Whilst a Chief Investigator (CI) of an AAp project bears ultimate accountability for delivering project data into a secure repository for archiving purposes by the end of a project, the AADC links the responsibility for a dataset to an individual data custodian and sees itself as a custodial agency once it takes receipt of data. Normally, a data custodian would be:

The data custodian is usually someone who can act as a technical contact for anyone needing further details about a dataset and is generally someone very familiar with how, why and when the data were captured. They are listed as a data custodian in the metadata record that accompanies all AAp data. Metadata provides the linkage between the data, the custodian and a custodial agency.

A custodial agency, such as the AADC, provides a hosting service for the data in order to provide long-term continuity for data management and access. This is particularly useful in project-based science programs such as the AAp where data custodians regularly change interests, positions, roles, agencies and/or retire. The Data Centre also buffers the custodian from having to respond to requests for data by automating the request process. All data requests are automatically or manually logged by the AADC and responded to in most cases without recourse to the data custodian. The data custodian can request feedback on data usage from the Data Centre.

The Data Centre is, however, dependent on the custodian for information relating to a particular set of data. It is the responsibility of the custodian, in collaboration with the Data Centre, to ensure that data are well documented, primarily through metadata records, and that these data are then in a form that is acceptable for ingestion into AADC archives.

The data custodian has the right to negotiate with the AADC regarding the timing of the public release of data in their custodianship. However, in most cases release of the data will be timed to coincide with the completion date for the project, by which time all deliverables, generally including proposed publications, should be finalised. This default period will be used by the AADC in planning for the public release of data, but the Centre will always be happy to release data earlier if required. This means that the AADC is agreeing to embargo any data that is provided to it, during the project's execution, until the specified project end date (as listed in the Antarctic Applications Online and MyScience systems). This assumes that project end dates are relatively fixed and that they do not continuously extend to accommodate the production of unfinished or unforeseen publications. Exceptions to this project duration embargo period are listed in Table 1.

Data custodians will be notified of any impending data release and cases for an extension of an embargo period will need to be made to the AADC Manager. In cases where the extension requested necessitates broader consultation, the Chief Scientist will be approached to consider the merits of the case. However, the default stance taken by the Data Centre will be that most requests for extensions can be adequately argued and the Data Centre will work in good faith with the data custodian to reach an agreeable outcome.

When the AADC makes data public from its Web site, the data (through bundled metadata) will carry a Creative Commons license. This license will stipulate how the data custodian is to be credited for collecting (or being the source of) the data. Whilst the AADC will supply a default template for citation requiring acknowledgement of the data custodian as the data originator, the data custodian can negotiate with the AADC on how they wish to be cited.

Where requested, the AADC will also undertake to forward data to other agencies, institutions, World Data Centres or scientific data networks. By default, the AADC routinely publishes to a variety of public, global data networks to gain greater exposure for hosted datasets and as a way of participating in global-scale data product development.

Table 1. Data Publication Embargo Periods by Exception
For Different Classes of Data

(unless stated otherwise in this table, data can be embargoed for a period equal to a project's duration)

Data Type | Embargo Period | Explanatory Notes
Ship-sourced underway data suite. | Immediate release - potentially available in real-time. | Consists of measured sea surface and atmospheric variables that are generally recorded by the standard marine science instrumentation on the ship.
Ship-sourced observations and measurements. | By a project's end date. | But copies of all sensor and instrument data to be deposited with the AADC at the end of every voyage.
Data captured by students aligned to the AAp. | Until the student has published their thesis. |
Data from monitoring projects (medium and long-term). | For the life of the project, for those projects approved for < 5 years. Up to a maximum of 5 years for projects that are approved for > 5 years, commencing from time of data collection. | e.g. Antarctic continent-based seabird observational study.
Data that are supplied as Commercial-In-Confidence, or are supplied under Privacy Laws. | Unlimited. | Some fisheries data are provided on a commercial-in-confidence basis. However, even in these circumstances, generalisation may render the data publishable in some circumstances. Patient records are examples of data not for release.
Data on threatened (listed) species. | Unlimited. | Release of data on the location of threatened species may render them more vulnerable.

Note this table is not meant to be exhaustive but gives guidance based on different classes of data. If the table does not address a particular use-case, clarification can be provided by the AADC Manager.
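
As a rough illustration only, the embargo rules in Table 1 could be expressed as a small lookup function. The sketch below is not an AADC tool: the class labels, function names and parameters are hypothetical, a project approved for exactly 5 years is assumed to follow the "life of the project" rule (the table does not cover that case), and for longer monitoring projects the project end date is assumed to still cap the 5-year limit.

```python
from datetime import date
from typing import Optional

# Hypothetical labels for the rows of Table 1; the Policy defines no such codes.
UNDERWAY = "ship_underway"
SHIP_OBSERVATIONS = "ship_observations"
STUDENT = "student"
MONITORING = "monitoring"
CONFIDENTIAL = "commercial_or_privacy"
THREATENED_SPECIES = "threatened_species"
DEFAULT = "default"


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def embargo_end(
    data_class: str,
    project_end: date,
    collected: date,
    approved_years: int = 0,
    thesis_published: Optional[date] = None,
) -> Optional[date]:
    """Return the date the embargo lapses, or None if no release date can be set."""
    if data_class == UNDERWAY:
        # Immediate release: no embargo beyond the moment of collection.
        return collected
    if data_class in (CONFIDENTIAL, THREATENED_SPECIES):
        # Unlimited embargo.
        return None
    if data_class == STUDENT:
        # Embargoed until the thesis is published; None while unpublished.
        return thesis_published
    if data_class == MONITORING and approved_years > 5:
        # At most 5 years from collection. Assumption: never later than project end.
        return min(project_end, add_years(collected, 5))
    # Default rule (including ship observations): embargo equals project duration.
    return project_end
```

For example, data collected on 10 January 2015 in a monitoring project approved for 10 years would, under these assumptions, come out of embargo on 10 January 2020. Cases not covered by the table would still need to be referred to the AADC Manager.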

How Will the AADC Understand My Data Management Requirements?

All AAp endorsed projects must, as the first milestone in their research project plans, complete and submit a Data Management Plan within the first 6 months after notification of a successful AAp project application. Progress in implementing this plan will be assessed annually until the project is complete. This will form a component of the annual progress reporting obligations that are associated with all endorsed AAp projects.

To streamline the process of creating Plans, the AADC has developed an online data management planning application which is an embedded component of the broader MyScience research resource administration system (http://data.aad.gov.au/aadc/projects/). To assist with completing these Plans, the AADC has also assigned a Science Liaison Officer (SLO) to each approved project. A project's assigned SLO and their contact details are listed in MyScience. New user accounts for MyScience can be created online (http://data.aad.gov.au/aadc/user/user.cfm) if a user has an approved AAp project.

The purpose of the Plan is to identify early in the research process what datasets are likely to be captured, what processing might need to be undertaken, how the data are to be managed within the life of the project, what the likely data flows are, what resources might be required to achieve effective management and manipulation of the data, how the Data Centre can help with data storage and project-based data dissemination and how data should eventually be published and cited.

Using these Plans the Data Centre will get a detailed AAp-wide view of the data management needs of the program and be in a position to use this information to help a project identify where they might usefully collaborate on data capture activities and how data being captured by one project could be gainfully used by another during the life of the projects concerned. The Plans will also enable the Data Centre to efficiently target its work-plans and resources to meet the needs of its scientific users.

The Plan will cover how the project intends to manage (in collaboration with a hosting agency) any data, metadata, samples, models and model data. Models and model data can present data management challenges, not least: maintaining a suitable run-time environment for archived models; storage problems associated with model output volumes; and determining which model runs are of value for preservation purposes. Whilst the AADC will assist in establishing a preservation plan for models and model output, it should be recognised that it may not always be possible to find an ideal data preservation solution and pragmatism will need to be applied. The current roll-out of a national Research Data Storage Infrastructure (RDSI), through storage nodes at 6 sites around Australia that will be closely coupled to high performance computing infrastructure, should help to address some of the issues associated with model archiving.

What are Metadata and DOIs and Why are they Important in Data Management?

All AAp projects are required to create metadata records describing captured data, and these records must be submitted to the AADC's metadata system – the Catalogue of Australian Antarctic and Subantarctic Metadata (CAASM). Regardless of whether the data ends up residing with the AADC, or with another hosting institution, a metadata record for captured datasets must be deposited into CAASM. Registration of metadata in CAASM is the only mechanism the AADC has for maintaining a complete inventory of the data captured as part of the AAp. This record can be periodically updated as data are processed, analysed and published on.

The AADC has a dedicated metadata officer who can assist data custodians with metadata creation and advice on the granularity at which records should be created for different classes of data. Skeleton metadata records will initially be automatically created by the Data Centre, based on information in a CI's project proposal and subsequently from information derived from the project's Data Management Plan. It is, however, the responsibility of the CI to ensure that these records are sufficient to describe the entirety of the data captured in his/her project and that these records are accurate and completed by the project's nominated end date.

Metadata is the primary mechanism for documenting data and in relevant cases, the instruments, sensors and procedures involved in data collection. Metadata standards support unlimited links to other documents, particularly in the form of Web pages. This enables the fundamental metadata parameters (who, when, where, what) to be augmented with detailed descriptions and parameters that the custodian considers necessary for other scientists to make effective use of their data.

Metadata provides information about data in a similar way that a card catalogue provides information about publications within a library. A library catalogue facilitates searching for particular topics or author, while metadata may be searched by data custodian, type of data, time or area of collection and other parameters to locate relevant data or samples.

All data submitted to the AADC will have an accompanying metadata record. Where there are a number of files making up the data submission, these are zipped together and are recognised as a data 'package'. There is a one-to-one correspondence between a metadata record and its associated data. Upon request a Digital Object Identifier (DOI) can be allocated to submitted data (packages) by the AADC. This DOI acts as a unique and permanent (Web-addressable - http://dx.doi.org) identifier for the data and links the accompanying metadata to the data. This then allows the data to be easily cited in peer reviewed publications. Data must, however, be deposited with the AADC before it can be allocated a DOI.
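
For custodians who want to see what a one-package-per-metadata-record bundle might look like before submission, the following is a minimal sketch using Python's standard zipfile module. It is not the AADC's own packaging process: the function names are hypothetical, and naming the archive after a metadata record identifier is an assumption made purely to illustrate the one-to-one correspondence.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def build_package(metadata_id: str, files: Iterable[Path], out_dir: Path) -> Path:
    """Zip `files` into a single archive named after a metadata record ID.

    Assumption: using the record ID as the file name is illustrative only.
    """
    package_path = Path(out_dir) / f"{metadata_id}.zip"
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            # Store only the file name so the archive carries no local directory paths.
            archive.write(file_path, arcname=Path(file_path).name)
    return package_path


def package_contents(package_path: Path) -> List[str]:
    """List the file names held in a package, sorted for easy comparison."""
    with zipfile.ZipFile(package_path) as archive:
        return sorted(archive.namelist())
```

Listing the contents afterwards gives a simple way to check that every file described in the metadata record is actually in the package.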

All AAp scientists are strongly encouraged to routinely cite the data that underpins their research via an addressable DOI for submitted data. Citations that include DOIs can be tracked, thereby improving the discoverability and recognition of a custodian's data and published articles.

Data citation practices are evolving and are variable amongst publishers. However, if possible this Policy recommends using a full citation where the data is cited both inline (e.g., Terauds (2013)) and expanded in full in the bibliography (e.g., 'Terauds, Aleks (2013) Macquarie Is. Azorella dieback 5m x 5m quadrats 2008-2012 Australian Antarctic Data Centre doi:http://dx.doi.org/10.4225/15/50FF1931B92C1'). Regardless of where it is placed, the dataset DOI should be included somewhere in the publication.
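
The citation pattern in the example above can be assembled mechanically. The sketch below simply mirrors that one example; the function names are hypothetical, and publishers may require a different layout.

```python
def inline_citation(surname: str, year: int) -> str:
    """Inline form, e.g. 'Terauds (2013)'."""
    return f"{surname} ({year})"


def bibliography_entry(author: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Full form following the pattern of the Policy's example citation.

    `doi` is the bare identifier (e.g. '10.4225/15/50FF1931B92C1'); the
    resolver prefix is added here so the DOI is Web-addressable.
    """
    return f"{author} ({year}) {title} {publisher} doi:http://dx.doi.org/{doi}"


entry = bibliography_entry(
    author="Terauds, Aleks",
    year=2013,
    title="Macquarie Is. Azorella dieback 5m x 5m quadrats 2008-2012",
    publisher="Australian Antarctic Data Centre",
    doi="10.4225/15/50FF1931B92C1",
)
print(inline_citation("Terauds", 2013))
print(entry)
```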

Because practices are evolving there is confusion amongst some researchers about the use of dataset DOIs. This confusion escalates when dealing with the publication of 'Data Papers' (i.e., a relatively new form of writing papers that focuses quite narrowly on describing a particular dataset and the collection and processing steps used to produce it).

Traditionally DOIs have been assigned to publications and publications are a well entrenched measure of a researcher's credentials. Researchers generally favour citation of their papers, rather than their data. It is important, however, to understand that DOIs can be assigned to both publications and datasets. It is perfectly reasonable for a researcher to want their data paper to be cited in preference to their dataset, but this does not negate using a dataset DOI in the data paper.

An AAp Project is considered incomplete until all data resulting from that project have been described by a metadata record and those data have been submitted to the AADC (or alternate host). Future Australian Antarctic Science project applications from investigators will incur assessment penalties whilst metadata or data for which they are responsible are still outstanding from earlier projects.
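
The completeness rule above amounts to a check over every dataset in a project. The following sketch shows one way a CI might track it; the Dataset structure and field names are hypothetical and do not reflect how MyScience records progress.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    has_metadata_record: bool  # described by a metadata record in CAASM
    submitted: bool            # deposited with the AADC or an approved alternate host


def outstanding(datasets: List[Dataset]) -> List[str]:
    """Names of datasets still missing a metadata record or a data submission."""
    return [d.name for d in datasets if not (d.has_metadata_record and d.submitted)]


def project_complete(datasets: List[Dataset]) -> bool:
    """A project is complete only when nothing is outstanding."""
    return not outstanding(datasets)
```

A project with even one dataset that is described but not yet submitted (or submitted but not described) would be reported as incomplete.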

Management of Field and Laboratory Notebooks

Many AAp science project staff have a requirement to use hard-copy laboratory and field notebooks. These documents often form an important part of the metadata that gives context to the generation of data and as such they should be digitised during the life of a project (through scanning, preferably with optical character recognition). These notebooks should then be linked to the metadata/data package (but may not necessarily be made public). Recommended notebook types and guidelines for digitisation can be found at http://data.aad.gov.au/aadc/guidelines/scanning.cfm

Alternate Hosting by Parties Other than the AADC

Preferably, data should be submitted electronically to the AADC via the AADC on-line Electronic Data Submission System (EDS) at http://data.aad.gov.au/aadc/eds. Alternative hosting arrangements can, however, be made. The data must still be described by metadata submitted to the Australian Antarctic Data Centre metadata database (CAASM).

Alternative data management agencies may be considered if the facilities and practices used at these hosting sites can ensure long-term, public access to the data and guarantee data preservation into the future. This would generally mean that the data were readily identifiable, retrievable in suitable formats, available on-line and that the hosting agency had transparent, operational codes of practice for long-term data preservation.

If a data custodian wishes to use an alternate hosting facility this should be documented as part of the Data Management Plan. The AADC will undertake a due diligence check of the nominated repository to ensure it meets standards expected of a long-term data hosting facility.

Sample Management

How do samples differ from data? Data are generally in digital form or are numbers, codes, symbols or text sequences recorded on paper, while samples are inherently physical (e.g. ice cores, rock samples, or biological specimens). The AADC was established to manage primarily digital data and has no facilities of its own for the storage of physical samples. However, the Antarctic Division does manage several large in-house biological collections and has dedicated internal and external storage facilities for sample management. The AADC can provide advice about collections management and collections management facilities. Memorandums of Understanding (MOUs) are being established with national museums willing to host AAp samples. As these MOUs are developed they will be advertised on the AADC Web site.

Geological specimens must be lodged with Geoscience Australia - please contact Chris Carson chris.carson@ga.gov.au.

Bird banding records must be submitted to the Australian Bird and Bat Banding Scheme (ABBBS). Guidelines for using ABBBS facilities can be found at: http://www.environment.gov.au/biodiversity/science/abbbs/pubs/bander-guidelines.pdf and the scheme managers can be contacted via email on abbbs@environment.gov.au.

The Special Case of Data Captured from the AAp's Marine Science Platform

A suite of fixed sensors routinely capture data during most marine science specific and cargo transit voyages of the main AAp (AAD-leased) marine science platform. These data will be captured, processed if required, and then released publicly in the shortest time feasible. In some cases this will be in near real-time directly from the ship.

Other data are captured from opportunistically deployed sensors that are mounted and operated upon request by ship-based AAD technical specialists or by the CIs themselves. All digital observations and measurements made and logged under the auspices of AAp projects onboard the ship must be copied to the ship's main data logging system. This will ensure that all instrument or sensor-based data captured using the platform is safely deposited in the AADC at the end of each voyage. These "raw" or "partially processed" data will not be publicly released until the expiry of any embargo period but can be supplied to the originating CI upon request. Any subsequently qualified or processed copies of these data will also ultimately need to be deposited in the Data Centre. Data can be taken off the ship by CIs at the end of a voyage provided that a copy has also been deposited in the main data logging system. A skeletal metadata record should exist for each dataset that is captured before the ship returns to port.

Physical samples obtained during a voyage should also be recorded in the Sample Tracker System.

Purchasing Data and Data Agreements When Forming Partnerships

The purpose of this Policy is to maximise the availability and re-use of data acquired by projects within the AAp. Licenses associated with purchased third-party data should be negotiated such that they permit future re-use of the data by other AAp participants. Wherever feasible, licenses should not be struck where conditions limit re-use to an individual or a single project team. The AADC Manager or AADC Science Liaison Officers can assist with third-party data purchases and license negotiations.

The same considerations apply when forming collaborative partnerships that effectively extend the reach of the AAp into other national and international science consortia. The AAp has an open data policy, albeit permitting some embargo periods to enable sufficient time for scientific publication. It is not unreasonable to expect collaborators to adhere to similar principles. Partnerships established with other individuals or consortia should, from the outset, directly address how the partnership will deal with data access and publication issues, and these arrangements should be documented. Negotiated arrangements should not contradict the spirit and intent of the AAp Policy.

This document was last updated by Kim Finney in November 2014.