The most recent version of this document (v3) was posted on 2007 July 18.
Document information
Examining the uses of shared data
Correspondence:
- University of Pittsburgh
- Document Type:
- Poster
- Date:
- Received 17 July 2007 17:47 UTC; Posted 17 July 2007
- Subjects:
- Bioinformatics
- Abstract:
Background
Many initiatives and repositories exist to encourage the sharing of research data, and thousands of microarray gene expression datasets are publicly available. Many studies reuse this data, but it is not well understood which datasets are reused and for what purpose.
Materials and Methods
We trained a machine-learning algorithm to automatically classify full-text gene expression microarray studies into two classes: those that generated original microarray data (n=900) and those that only reused data (n=250). We then compared the Medical Subject Heading (MeSH) terms of the two classes to identify MeSH topics that were over- or under-represented among publications with reused data.
Results
Studies on humans, mice, chordata, and invertebrates were equally likely to be conducted using original or shared microarray data, whereas shared data was used in a relatively high proportion of studies involving fungi (odds ratio (OR)=2.4), and a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR<0.5). Unsurprisingly, when we looked at Major MeSH terms to represent the primary purpose of the studies, statistical and computational methods clearly dominated. The only biomedical topics whose Major MeSH terms showed a relatively high proportion of data reuse were Promoter Regions, Evolution, and Protein Interaction Mapping.
Discussion
Identifying areas of particularly successful microarray data reuse, such as Saccharomyces cerevisiae datasets and studies of promoter regions and evolution, can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data.
- Presented at:
- ISMB 2007, 22 July 2007
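The odds ratios quoted in the Results can be read as a comparison of how often a MeSH term appears among data-reuse studies versus original-data studies. The abstract does not state how the ratios were calculated, so the following is only a minimal sketch of the standard 2x2 odds ratio. The class sizes (250 and 900) come from the abstract; the function name `odds_ratio` and the per-term counts are hypothetical and chosen purely for illustration.

```python
def odds_ratio(reuse_with_term, reuse_total, original_with_term, original_total):
    """Odds of a MeSH term among reuse studies divided by the odds among
    original-data studies (standard 2x2 table odds ratio)."""
    a = reuse_with_term                      # reuse studies with the term
    b = reuse_total - reuse_with_term        # reuse studies without the term
    c = original_with_term                   # original-data studies with the term
    d = original_total - original_with_term  # original-data studies without the term
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (a * d) / (b * c)


# Class sizes reported in the abstract.
REUSE_TOTAL = 250
ORIGINAL_TOTAL = 900

# Hypothetical counts (not from the poster): 50 reuse studies and
# 100 original-data studies are indexed with some MeSH term.
example_or = odds_ratio(50, REUSE_TOTAL, 100, ORIGINAL_TOTAL)
print(f"OR = {example_or:.2f}")
```

Under this reading, an OR above 1 means the term is over-represented among studies that reused data, and an OR below 1 means it is under-represented.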
Additional information
- License:
- This document is licensed to the public under the Creative Commons Attribution 2.5 License
- How to cite this document:
-
Piwowar, Heather and Fridsma, Douglas. Examining the uses of shared data. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2007.425.2> (2007)
- Version info:
Heather Piwowar on 17 July 2007 18:01 UTC
Whoops, important typo, off by a factor of 10.
In the results paragraph, it should read “a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR < 0.5)”