This document has been updated!

The most recent version of this document (v3) was posted on 2007 July 18.

Document information

doi:10.1038/npre.2007.425.1

Examining the uses of shared data

Heather A. Piwowar1 & Douglas B. Fridsma1

  1. University of Pittsburgh
Document Type: Poster
Date: Received 11 July 2007 15:47 UTC; Posted 17 July 2007
Subjects: Bioinformatics
Abstract:

Background
Many initiatives and repositories exist to encourage the sharing of research data, and thousands of microarray gene expression datasets are publicly available. Many studies reuse this data, but it is not well understood which datasets are reused and for what purpose.

Materials and Methods
We trained a machine-learning algorithm to automatically classify full-text gene expression microarray studies into two classes: those that generated original microarray data (n=900) and those that only reused data (n=250). We then compared the Medical Subject Heading (MeSH) terms of the two classes to identify MeSH topics that were over- or under-represented among publications with reused data.

Results
Studies on humans, mice, chordata, and invertebrates were equally likely to be conducted using original or shared microarray data. Shared data was used in a relatively high proportion of studies involving fungi (odds ratio (OR)=2.4) and in a relatively low proportion of studies involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR<0.5). Unsurprisingly, when we used Major MeSH terms to represent the primary purpose of the studies, statistical and computational methods clearly dominated. The only biomedical Major MeSH topics with a relatively high proportion of data reuse were Promoter Regions, Evolution, and Protein Interaction Mapping.
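
To make the odds ratios reported above concrete, the following is a minimal sketch of the standard 2x2 odds-ratio calculation for a single MeSH term. The abstract does not say how the ORs were actually computed, so the function name `odds_ratio` and the per-term counts are hypothetical; only the class sizes (250 reuse studies, 900 original-data studies) come from the Materials and Methods.

```python
def odds_ratio(reuse_with_term, reuse_total, original_with_term, original_total):
    """Odds of a MeSH term among data-reuse studies divided by the odds
    of the same term among studies that generated original data."""
    a = reuse_with_term                      # reuse studies tagged with the term
    b = reuse_total - reuse_with_term        # reuse studies without the term
    c = original_with_term                   # original-data studies with the term
    d = original_total - original_with_term  # original-data studies without it
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (a * d) / (b * c)


# Hypothetical per-term counts, for illustration only (not taken from the study).
# Class sizes 250 and 900 match the abstract.
over_represented = odds_ratio(50, 250, 100, 900)   # term more common among reuse studies
under_represented = odds_ratio(10, 250, 90, 900)   # term less common among reuse studies
print(over_represented, under_represented)
```

An OR above 1 means the term appears disproportionately often among studies that reused data; an OR below 1 means it appears disproportionately often among studies that generated original data.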

Discussion
Identifying areas of particularly successful microarray data reuse—such as Saccharomyces cerevisiae datasets and studies of promoter regions and evolution—can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data.

Presented at:
..., 03 August 2007

Comments:

1 comment

Heather Piwowar on 17 July 2007 18:02 UTC

Whoops, important typo, off by a factor of 10.
In the results paragraph, it should read “a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR < 0.5)”

Additional information

License:
This document is licensed to the public under the Creative Commons Attribution 2.5 License
How to cite this document:

Piwowar, Heather and Fridsma, Douglas. Examining the uses of shared data. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2007.425.1> (2007)

Version info:

Other versions of this document in Nature Precedings

Version number Document title Date
v3 Posted 18 July 2007
v2 Posted 17 July 2007

Other versions of this document elsewhere on the web

None known.
