W3 PICES GLOBEC Data Management Workshop

Discussion Summary

  1. Data management should be USER-driven. Scientists must decide what products they would like the GLOBEC IPO to produce. Collating datasets will be important for the Synthesis to succeed. Comprehensive metadata is the starting point for identifying datasets, so it is critical that the metadata inventory be as comprehensive as possible.

  1. The carrot-and-stick approach to encouraging scientists to submit metadata and data was discussed. It was agreed that the stick is rarely effective. Data managers need to offer scientists incentives to submit data and metadata. Some system must be developed to give credit to individuals whose data are used in publications.

  1. ‘Carrots’: good software; good tools to extract data; good tools to view and visualize data; and more data for people to work with.

     Barriers to sharing data: the length of time needed to process samples (up to 2 years); coding systems; the effort required to ‘give’ data; metadata requirements; and format issues.

     Therefore, flexibility is required in the input format.

  1. Todd O’Brien suggested that funding agencies and employers should give submitting a dataset similar credit to publishing a paper. The group agreed that this would be a good idea but did not believe that it would become common practice.

  1. The group felt that funding agencies should take a firmer line with scientists who do not submit data to NDCs in accordance with funding requirements.

  1. It was agreed that biologists are generally much more reluctant, and slower, to submit both data and metadata than physicists and chemists. The reasons were felt to be the long time needed to analyse biological samples and the high level of individual investment in the data, which increases the scientist’s proprietorial feeling toward the data. It was noted that scientists are concerned that others will use their data without their consent and before they have had a chance to publish. There is no enforceable system in place to prevent this from happening.

  1. Agreements exist at official levels to archive data, and the existing system of National Data Centres should be built on. NODCs vary widely in their ability to deal with GLOBEC-like data. It was felt that the GLOBEC and JGOFS communities have not really pushed NODCs to actively handle biological data. Biological data are often collected at different institutes from physical and chemical data. The Ocean Climate Laboratory World Ocean Plankton Database will accept data in any format, so it is no longer necessary for scientists to spend a lot of time preparing data before submission. The key requirement is that the metadata be complete so that the data are useful. It was confirmed that a scientist might be contacted by telephone to confirm details or fill any gaps in the metadata, but this would be informal and time-consuming. The WDC is more concerned with getting the data in the first place than with the format in which they arrive. Excel spreadsheets or columns of data are acceptable.

  1. It was noted that many people had not heard of the World Ocean Plankton Database. Ways of increasing the visibility of this facility were discussed.

  1. The group understood that biologists need a longer timescale in which to submit their data than physicists or chemists, but felt that this could not be used as an excuse for not submitting data within a reasonable timescale. What counts as a reasonable timescale would vary between disciplines.

  1. It was noted that a liaison system between NDCs and scientists increased the amount of data submitted to the NDCs, because scientists were more confident that their data would be well looked after if they knew the person they were submitting it to. A best-practice example is BODC, which sends its data managers on cruises with the scientists. It was noted that funding for this practice in the US has been reduced and that the NDCs were starting to feel a negative effect.

  1. It was noted that the OCL had contact with 3 institutes in the FSU, but the talk given by Igor Shevchenko mentioned many more labs. Sergey Piontkovski said that he was trying to organize a workshop for summer 2003 to bring together biologists from this region to discuss their data archives. It was suggested that Todd O’Brien and Sergey Piontkovski discuss co-operating to organize this workshop.

  1. It was suggested that scientists should ‘claim’ their data officially by writing metadata entries. A skinny DIF (a minimal Directory Interchange Format record) would flag the dataset as existing and show who it belongs to. Increased visibility of the dataset would increase awareness of anyone not following data-sharing etiquette. By submitting metadata, the scientist would notify the community of the dataset’s existence but would still be allowed time to work on the dataset and publish before sharing it. This would also help prevent datasets from being lost.

  1. It was also suggested that when a paper is published, the metadata entry identifier and the database should be cited. Publishers would then check the metadata entries for the data owner and confirm with the owner that permission to use the dataset had been sought.

     However, it was generally felt that this would not work in practice.

  1. It was felt that the more people use a dataset, the greater its value. Multi-author papers are becoming more common, especially as funding agencies increasingly focus on multidisciplinary science. Steps must be taken to increase biologists’ confidence in sharing their data so that the full benefits of multidisciplinary studies can be realized.

  1. The role of GLOBEC National Representatives was discussed. It was agreed that the GLOBEC IPO should have a clear idea of what GLOBEC representatives think they should be doing. It was envisaged that the GDM would accomplish this by writing Terms of Reference for GLOBEC National and Regional Representatives that set out what is expected of them, and by confirming with each representative that they were happy to accept these Terms of Reference and continue in the position. The group felt that each national programme should be encouraged to produce a CD-ROM of the data collected in its projects. Requirements should be made clear to incoming GLOBEC programmes. It was asked whether there was a formal procedure for a programme joining GLOBEC. The GDM replied that a letter of application was generally required but was not aware of any formal commitments being exchanged. GLOBEC programmes are expected to abide by the GLOBEC data policy, which includes submitting metadata and archiving data to ensure the long-term future of datasets, but no enforcement is in place.

     It was suggested that the GDM produce a ‘Report Card’ listing GLOBEC programmes and detailing the data status of each country/programme. A series of categories should be designed and submitted to the DMTT for comment, and a status report should be presented at the end of 2003. The ‘Report Card’ should be submitted to the SSC for action and follow-up. Publication of the table in the Newsletter was also suggested.

  1. Comparability between samples collected on different cruises and with different instruments was mentioned. It was noted that a GLOBEC Data Management workshop in 1996 decided that GLOBEC research would not lend itself to a strict set of protocols such as those adopted by JGOFS, so GLOBEC deliberately chose NOT to adopt a common methodology. This makes metadata even more important for informing scientists about the comparability of datasets. The report of that workshop is available on the ICES website (www.ices.dk) or by contacting Keith Brander.

  1. The relationship between GOOS and GLOBEC was briefly raised and questioned.

  1. The issue of sample curation was discussed, but no resolution was reached.