W3 PICES GLOBEC Data Management Workshop
Discussion Summary
- Data Management should be USER driven. Scientists must decide what
products they would like the GLOBEC IPO to produce. Collation of datasets
will be important for Synthesis to be successful. Comprehensive metadata is
the starting point for identifying datasets, so it is critical that the
metadata inventory is as comprehensive as possible.
- The carrot and stick scenario for encouraging scientists to submit
metadata/data was discussed. It was agreed that the stick method is rarely
effective. Data Managers need to offer scientists incentives to submit
data/metadata. Some system must be developed to give credit to individuals
whose data is used in publications.
- ‘Carrots’:
  - good software
  - good tools to extract data
  - good tools to view/visualize data
  - more data for people to work with
- Barriers to sharing data:
  - length of time to process samples (up to 2 years)
  - coding systems
  - effort required to ‘give’ data
  - metadata requirements
  - format issues
  Therefore flexibility is required in input format.
- Todd O’Brien suggested that submitting a dataset should carry similar
credit with funding agencies/employers to publishing a paper. The group
agreed that this would be a good idea but did not believe that it would
become common practice.
- The group felt that Funding Agencies should take a firmer line with those
scientists who do not submit data to an NDC in accordance with funding
requirements.
- It was agreed that biologists are generally much more reluctant and slower
to submit both data and metadata than physicists and chemists. It was felt
that the reason for this was the long time necessary to analyse biological
samples, and that this high level of individual investment in the data
increased the proprietorial feeling of the scientist toward the data. It was
noted that scientists are concerned that others will use their data without
their consent and before they have had a chance to publish. There is no
enforceable system in place to prevent this happening.
- Agreements exist at official levels to archive data. The system of
National Data Centres that already exists should be built on. NODCs have
highly variable abilities to deal with GLOBEC-like data. It was felt that
the GLOBEC and JGOFS communities have not really pushed NODCs to actively
handle biological data. Biological data is often collected at separate
institutes from physical and chemical data. The Ocean Climate Laboratory's
World Ocean Plankton Database will take data in any format. It is no longer
necessary for scientists to spend a lot of time preparing data before
submission. The key requirement is that the metadata is complete so that the
data is useful. It was confirmed that a scientist might be contacted by
telephone to confirm or fill in any holes in the metadata, but this would be
informal and time consuming. The WDC is more concerned with getting the data
in the first place than with the format it arrives in. Excel spreadsheets or
columns of data are acceptable.
- It was noted that many people had not heard of the World Ocean Plankton
Database. Ways of increasing the visibility of this facility were discussed.
- It was
understood by the group that biologists need a longer timescale in which
to submit their data than physicists or chemists, but it was felt that
this could not be used as an excuse for not submitting data within a
reasonable timescale. This
reasonable timescale would vary between disciplines.
- It was noted that a liaison system between NDCs and scientists increased
the amount of data submitted to the NDCs, because scientists were more
confident that their data would be well looked after when they knew the
person they were submitting it to. A best practice example of this is BODC,
which sends its Data Managers on cruises with the scientists. It was noted
that funding for this practice in the US has been reduced and a negative
effect was starting to be felt by the NDCs.
- It was discussed that the OCL had contact with 3 institutes in the FSU,
but the talk given by Igor Shevchenko noted many more labs. Sergey
Piontkovski noted that he was trying to organize a workshop for summer 2003
to bring together biologists from this region to discuss their data
archives. It was suggested that Todd O’Brien and SP discuss co-operating to
organize this workshop.
- It was suggested that scientists should ‘claim’ their data officially by
writing metadata entries. It was suggested that a skinny DIF would flag the
dataset as existing and identify who it belongs to. Increased visibility of
the dataset would increase awareness of those who were not following dataset
sharing etiquette. By submitting metadata, the scientist would notify the
community of the dataset's existence but would be allowed time to work on
the dataset and publish before sharing. This would also help to prevent
datasets being lost.
- It was also suggested that when a paper is published, the metadata entry
identifier and database should be cited. Publishers should check the
metadata entries for the data owner and check with the owner that permission
to use the dataset had been sought. It was generally felt that this would
not work in practice.
- It was felt that the value of a dataset increases the more people use it.
Multiple author papers are becoming more common, especially as funding
agencies are increasingly focused on multi-disciplinary science. Steps must
be taken to increase the confidence of biologists in sharing their data so
that the full benefits of multi-disciplinary studies can be realized.
- The role of GLOBEC National Representatives was discussed. It was agreed
that the GLOBEC IPO should have a clear idea of what GLOBEC representatives
think they should be doing. It was envisaged that this would be accomplished
by the GDM writing Terms of Reference for GLOBEC National and Regional
representatives, setting out a list of expectations, and confirming with
each representative that they were happy to accept these Terms of Reference
and continue in the position. The group felt that each National programme
should be encouraged to produce a CD-ROM of data collected in its projects.
Requirements should be made clear to incoming GLOBEC programmes. It was
asked if there was a formal procedure involved in a programme joining
GLOBEC. The GDM replied that a letter of application was generally required
but was not aware of any formal commitments being exchanged. GLOBEC
programmes are expected to abide by the GLOBEC data policy, which includes
submission of metadata and archiving of data to ensure the long-term future
of each dataset, but no enforcement is in place. It was suggested that the
GDM produce a ‘Report Card’ listing the GLOBEC programmes and detailing the
data status of each country/programme. A series of categories should be
designed and submitted to the DMTT for comment. The status report should be
presented at the end of 2003. The ‘Report Card’ should be submitted to the
SSC for action and follow up. Publication of the table in the Newsletter was
also suggested.
- Comparability between samples collected on different cruises and with
different instruments was mentioned. It was noted that at a GLOBEC Data
Management workshop in 1996 it was decided that GLOBEC research would not
lend itself to a strict set of protocols such as those adopted by JGOFS.
GLOBEC decided NOT to have a similar methodology. This means that metadata
is even more important for informing scientists of the comparability between
datasets. The report of the workshop is available on the ICES website
(www.ices.dk) or by contacting Keith Brander.
- The
relationship between GOOS and GLOBEC was mentioned/questioned briefly.
- The issue of sample curation was discussed, but no resolution was reached.
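
As an illustration of the ‘skinny DIF’ suggestion above, the following is a minimal sketch of what a short metadata entry that ‘claims’ a dataset might look like, with a check for missing fields. The field names, the `SkinnyEntry` class and the `missing_fields` helper are all hypothetical and were not defined at the workshop; the actual DIF specification sets its own required fields and should be consulted before use.

```python
from dataclasses import dataclass, asdict


# Hypothetical field set for a "skinny" metadata entry. These names are
# illustrative assumptions, not the fields of the real DIF specification.
@dataclass
class SkinnyEntry:
    entry_id: str            # identifier that a paper could cite
    title: str               # short name of the dataset
    owner: str               # scientist claiming the dataset
    contact: str             # how to reach the owner
    summary: str             # what was measured, where and when
    release_after: str = ""  # optional date after which data may be shared


def missing_fields(entry: SkinnyEntry) -> list[str]:
    """Return the names of mandatory fields that are empty or whitespace."""
    optional = {"release_after"}
    return [
        name
        for name, value in asdict(entry).items()
        if name not in optional and not value.strip()
    ]


# Placeholder values only; this is not a real dataset.
example = SkinnyEntry(
    entry_id="EXAMPLE-0001",
    title="Zooplankton net hauls, example cruise",
    owner="A. Scientist",
    contact="",
    summary="Placeholder description of an illustrative dataset.",
)
print(missing_fields(example))  # ['contact']
```

A check of this kind could support the follow-up contact described above, where a data centre asks the scientist to fill in any holes in the metadata while the data itself stays with the owner until publication.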