<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Nature Precedings - Tag feed for Open data</title>
    <link>http://precedings.nature.com/tags/Open%20data</link>
    <description>Recently posted documents tagged with 'Open data'</description>
    <dc:publisher>Nature Publishing Group</dc:publisher>
    <dc:language>en</dc:language>
    <prism:publicationName>Nature Precedings</prism:publicationName>
    <image>
      <title>Nature Precedings</title>
      <url>http://precedings.nature.com/images/header_logo.gif</url>
      <link>http://precedings.nature.com</link>
    </image>
    <atom:link type="application/rss+xml" rel="self" href="http://precedings.nature.com/tags/Open%20data/feed"/>
    <item>
      <title>Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness</title>
      <link>http://dx.doi.org/10.1038/npre.2008.2083.1</link>
      <description>Molecular biology data are subject to terms of use that vary widely between databases and curating institutions. This research presents a taxonomy of contractual and technical restrictions applicable to databases in life science. It builds upon research led by Science Commons demonstrating why open data and the freedom to integrate facilitate innovation and how this openness can be achieved. The taxonomy describes technical and legal restrictions applicable to life science databases, and its metadata have been used to assess the terms of use of databases hosted by Life Science Resource Name (LSRN) Schema. While a few public domain policies are standardized, most terms of use are not harmonized, are difficult to understand, and impose controls that prevent others from effectively reusing data. Identifying a small number of restrictions allows one to quickly appreciate which databases are open. A checklist for data openness is proposed to help database curators who wish to make their data more open verify that they have done so.</description>
      <guid>http://dx.doi.org/10.1038/npre.2008.2083.1</guid>
      <pubDate>Fri, 18 Jul 2008 13:51:08 GMT</pubDate>
      <dc:title>Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness</dc:title>
      <dc:identifier>doi:10.1038/npre.2008.2083.1</dc:identifier>
      <dc:date>2008-07-18</dc:date>
      <dc:creator>Melanie Dulong de Rosnay</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-07-18T13:51:08Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/2083/version/1/files/npre20082083-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Open Data in Science</title>
      <link>http://precedings.nature.com/documents/1526/version/1</link>
      <description>Open Data (OD) is an emerging term in the process of defining how scientific data may be published and re-used without price or permission barriers. Scientists generally see published data as belonging to the scientific community, but many publishers claim copyright over data and will not allow its re-use without permission. This is a major impediment to the progress of scholarship in the digital age. This article reviews the need for Open Data, gives examples of why Open Data is valuable, and summarizes some early initiatives in formalizing the right of access to and re-use of scientific data.</description>
      <guid>http://precedings.nature.com/documents/1526/version/1</guid>
      <pubDate>Fri, 18 Jan 2008 19:51:35 GMT</pubDate>
      <dc:title>Open Data in Science</dc:title>
      <dc:identifier>hdl:10101/npre.2008.1526.1</dc:identifier>
      <dc:date>2008-01-18</dc:date>
      <dc:creator>Peter Murray-Rust</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-01-18T19:51:35Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Biotechnology</prism:section>
      <prism:section>Chemistry</prism:section>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/1526/version/1/files/npre20081526-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Open Notebook Science: Perspectives from a newbie</title>
      <link>http://dx.doi.org/10.1038/npre.2007.1130.1</link>
      <description>My group is using an electronic lab notebook based on a blog format that is being developed in collaboration with the group of Professor Jeremy Frey. Here I discuss how this led my group to adopt an open notebook science approach, as well as some of the consequences, both positive and negative, of doing so.</description>
      <guid>http://dx.doi.org/10.1038/npre.2007.1130.1</guid>
      <pubDate>Thu, 27 Sep 2007 14:49:10 GMT</pubDate>
      <dc:title>Open Notebook Science: Perspectives from a newbie</dc:title>
      <dc:identifier>doi:10.1038/npre.2007.1130.1</dc:identifier>
      <dc:date>2007-09-27</dc:date>
      <dc:creator>Cameron Neylon</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2007-09-27T14:49:10Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Biotechnology</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/1130/version/1/files/npre20071130-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
    </item>
    <item>
      <title>Examining the uses of shared data</title>
      <link>http://dx.doi.org/10.1038/npre.2007.425.3</link>
      <description>Does your research area re-use shared datasets? Re-using data has many benefits, including research synergy and efficient resource use. Some research areas have tools, communities, and practices which facilitate re-use. Identifying these areas will allow us to learn from them and apply the lessons to areas which underutilize the sharing and re-purposing of scientific data between investigators. Which datasets? This preliminary analysis examines the re-use of microarray gene expression datasets. Thousands of microarray gene expression datasets have been deposited in publicly available databases. Many studies reuse this data, but it is not well understood for what purposes. Here, we examined all publications found in PubMed Central on April 1, 2007 whose full-text contained the phrases &#8220;microarray&#8221; and &#8220;gene expression&#8221; to find studies which re-used microarray data. How did we identify re-use? We developed prototype machine-learning classifiers to identify a) studies containing original microarray data (n=900) and b) studies which instead re-used microarray data (n=250). Preprocessing (Python NLTK) extracted manually-selected keyword frequencies from the full-text publications as features for a Support Vector Machine (SVMlite). The classifier was trained and tested on a manually-labeled set of documents (PLoS articles prior to January 2007 containing the word &#8220;microarray,&#8221; n=200). How did we identify patterns of re-use? We compared the Medical Subject Heading (MeSH) terms of the two classes to estimate the odds that a specific MeSH term would be used given all studies with original microarray data, compared to the odds of the same term describing studies with re-used data. Terms were truncated to comparable levels in the MeSH hierarchy. Results: Publications with original vs. re-used microarray data have different distributions of MeSH terms (Figure 1), and occur in different proportions across various journals (Figure 2). Microarray data source (original vs. re-used) did not affect the odds of a study focusing on humans, mice, or invertebrates, whereas publications with re-used data did involve a relatively high proportion of studies involving fungi (odds ratio (OR)=2.4), and a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR Trends in odds ratios of MeSH terms for other attributes can be seen in Figure 3. Hope: Although not all research topics can be addressed by re-using existing data, many can. Identifying areas with frequent re-use can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data. Future Work: We plan to refine our tool for identifying studies which re-use data, and continue studying and measuring re-use and reusability.</description>
      <guid>http://dx.doi.org/10.1038/npre.2007.425.3</guid>
      <pubDate>Wed, 18 Jul 2007 13:26:38 GMT</pubDate>
      <dc:title>Examining the uses of shared data</dc:title>
      <dc:identifier>doi:10.1038/npre.2007.425.3</dc:identifier>
      <dc:date>2007-07-18</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2007-07-18T13:26:38Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/425/version/3/files/npre2007425-3.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
    </item>
    <item>
      <title>Examining the uses of shared data</title>
      <link>http://dx.doi.org/10.1038/npre.2007.425.2</link>
      <description>Background: Many initiatives and repositories exist to encourage the sharing of research data, and thousands of microarray gene expression datasets are publicly available. Many studies reuse this data, but it is not well understood which datasets are reused and for what purpose. Materials and Methods: We trained a machine-learning algorithm to automatically classify full-text gene expression microarray studies into two classes: those that generated original microarray data (n=900) and those which only reused data (n=250). We then compared the Medical Subject Heading (MeSH) terms of the two classes to identify MeSH topics which were over- or under-represented by publications with reused data. Results: Studies on humans, mice, chordata, and invertebrates were equally likely to be conducted using original or shared microarray data, whereas shared data was used in a relatively high proportion of studies involving fungi (odds ratio (OR)=2.4), and a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR Discussion: Identifying areas of particularly successful microarray data reuse&#8212;such as Saccharomyces cerevisiae datasets and studies of promoter regions and evolution&#8212;can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data.</description>
      <guid>http://dx.doi.org/10.1038/npre.2007.425.2</guid>
      <pubDate>Tue, 17 Jul 2007 13:56:37 GMT</pubDate>
      <dc:title>Examining the uses of shared data</dc:title>
      <dc:identifier>doi:10.1038/npre.2007.425.2</dc:identifier>
      <dc:date>2007-07-17</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2007-07-17T13:56:37Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/425/version/2/files/npre2007425-2.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
    </item>
    <item>
      <title>Examining the uses of shared data</title>
      <link>http://dx.doi.org/10.1038/npre.2007.425.1</link>
      <description>Background: Many initiatives and repositories exist to encourage the sharing of research data, and thousands of microarray gene expression datasets are publicly available. Many studies reuse this data, but it is not well understood which datasets are reused and for what purpose. Materials and Methods: We trained a machine-learning algorithm to automatically classify full-text gene expression microarray studies into two classes: those that generated original microarray data (n=900) and those which only reused data (n=250). We then compared the Medical Subject Heading (MeSH) terms of the two classes to identify MeSH topics which were over- or under-represented by publications with reused data. Results: Studies on humans, mice, chordata, and invertebrates were equally likely to be conducted using original or shared microarray data, whereas shared data was used in a relatively high proportion of studies involving fungi (odds ratio (OR)=2.4), and a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR Discussion: Identifying areas of particularly successful microarray data reuse&#8212;such as Saccharomyces cerevisiae datasets and studies of promoter regions and evolution&#8212;can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data.</description>
      <guid>http://dx.doi.org/10.1038/npre.2007.425.1</guid>
      <pubDate>Tue, 17 Jul 2007 13:13:40 GMT</pubDate>
      <dc:title>Examining the uses of shared data</dc:title>
      <dc:identifier>doi:10.1038/npre.2007.425.1</dc:identifier>
      <dc:date>2007-07-17</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2007-07-17T13:13:40Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/425/version/1/files/npre2007425-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
    </item>
  </channel>
</rss>
