<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Nature Precedings - Heather Piwowar</title>
    <link>http://precedings.nature.com/users/a56378f9b7e953b713da4222e8318e54/</link>
    <description>Documents posted by Heather Piwowar</description>
    <dc:publisher>Nature Publishing Group</dc:publisher>
    <dc:language>en</dc:language>
    <prism:publicationName>Nature Precedings</prism:publicationName>
    <image>
      <title>Nature Precedings</title>
      <url>http://precedings.nature.com/images/header_logo.gif</url>
      <link>http://precedings.nature.com</link>
    </image>
    <atom:link type="application/rss+xml" rel="self" href="http://precedings.nature.com/users/a56378f9b7e953b713da4222e8318e54/feed"/>
    <item>
      <title>Identifying Data Sharing in Biomedical Literature</title>
      <link>http://precedings.nature.com/documents/1721/version/2</link>
      <description>Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared in such a variety of mechanisms and locations. We propose a novel approach to find shared datasets: using NLP techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system was able to identify 61% of articles with shared datasets with 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.</description>
      <guid>http://precedings.nature.com/documents/1721/version/2</guid>
      <pubDate>Mon, 04 Aug 2008 20:32:00 UTC</pubDate>
      <dc:title>Identifying Data Sharing in Biomedical Literature</dc:title>
      <dc:identifier>hdl:10101/npre.2008.1721.2</dc:identifier>
      <dc:date>2008-08-04</dc:date>
      <dc:creator>Heather Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-08-04T20:32:00Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/1721/version/2/files/npre20081721-2.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Envisioning a data reuse registry</title>
      <link>http://dx.doi.org/10.1038/npre.2008.2152.1</link>
      <description>Repurposing research data holds many benefits for the advancement of biomedicine, yet is very difficult to measure and evaluate. We propose a data reuse registry to maintain links between primary research datasets and studies that reuse this data. Such a resource could help recognize investigators whose work is reused, illuminate aspects of reusability, and evaluate policies designed to encourage data sharing and reuse.</description>
      <guid>http://dx.doi.org/10.1038/npre.2008.2152.1</guid>
      <pubDate>Mon, 04 Aug 2008 20:13:00 UTC</pubDate>
      <dc:title>Envisioning a data reuse registry</dc:title>
      <dc:identifier>doi:10.1038/npre.2008.2152.1</dc:identifier>
      <dc:date>2008-08-04</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-08-04T20:13:00Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/2152/version/1/files/npre20082152-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>A review of journal policies for sharing research data</title>
      <link>http://precedings.nature.com/documents/1700/version/1</link>
      <description>Background:  Sharing data is a tenet of science, yet commonplace in only a few subdisciplines.  Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers.  The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals which are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing. Methods:  We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data.  We measured data sharing prevalence as the proportion of papers with submission links from NCBI&#8217;s Gene Expression Omnibus (GEO) database.  We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access). Results:  Of the 70 journal policies, 18 (26%) made no mention of sharing publication-related data within their Instruction to Author statements.  Of the 42 (60%) policies with a data sharing policy applicable to microarrays, we classified 18 (26% of 70) as moderately strong and 24 (34% of 70) as strong. Existence of a data sharing policy was associated with the type of journal publisher:  half of all commercial publishers had a policy compared to 82% of journals published by academic societies.  All four of the open-access journals had a data sharing policy. Policy strength was associated with impact factor:  the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.5, and 6.0.
Policy strength was positively associated with measured data sharing submission into the GEO database:  the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 11%, 19%, and 29% respectively. Conclusion:  This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes and thereby contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.</description>
      <guid>http://precedings.nature.com/documents/1700/version/1</guid>
      <pubDate>Thu, 20 Mar 2008 21:00:20 UTC</pubDate>
      <dc:title>A review of journal policies for sharing research data</dc:title>
      <dc:identifier>hdl:10101/npre.2008.1700.1</dc:identifier>
      <dc:date>2008-03-20</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-03-20T21:00:20Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/1700/version/1/files/npre20081700-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Prevalence and Patterns of Microarray Data Sharing</title>
      <link>http://dx.doi.org/10.1038/npre.2008.1701.1</link>
      <description>Sharing research data is a cornerstone of science. Although many tools and policies exist to encourage data sharing, the prevalence with which datasets are shared is not well understood. We report our preliminary results on patterns of sharing microarray data in public databases. The most comprehensive method for measuring occurrences of public data sharing is manual curation of research reports, since data sharing plans are usually communicated in free text within the body of an article. Our early findings from manual curation of 100 papers suggest that 30% of investigators publicly share their full microarray datasets. Of these, 70% of the datasets are deposited at NCBI&#8217;s Gene Expression Omnibus (GEO) database, 20% at EBI&#8217;s ArrayExpress, and 10% in smaller databases or lab or publisher websites. Next, we supplemented this manual process with a rough automated estimate of data sharing prevalence. Using PubMed, we identified research articles with MeSH terms for both &#8220;Gene Expression Profiling&#8221; and &#8220;Oligonucleotide Array Sequence Analysis&#8221; and published in 2006. We then searched GEO and ArrayExpress for links to these PubMed IDs to determine which of the articles had been credited as an originating data source. Of the 2503 articles, 440 (18%) articles had links from either GEO or ArrayExpress. Of these 440 articles, 70% had links from GEO and 30% from ArrayExpress, with an overlapping 12% from both GEO and ArrayExpress. Interestingly, studies with free full text at PubMed were twice (Odds Ratio=2.1; 95% confidence interval: [1.7 to 2.5]) as likely to be linked as a data source within GEO or ArrayExpress as those without free full text. Studies with human data were less likely to have a link (OR=0.8 [0.6 to 0.9]) than studies with only non-human data.
The proportion of articles with a link within these two databases has increased over time: the odds of a data-source link were 2.5 [2.0 to 3.1] times greater for studies published in 2006 than in 2002. As might be expected, studies with the fewest funding sources had the fewest data-sharing links: only 28 (6%) of the 433 studies with no funding source were listed within GEO or ArrayExpress. In contrast, studies funded by the NIH, the US government, or a non-US government source had data-sharing links in 282 of 1556 cases (18%), while studies funded by two or more of these mechanisms were listed in the databases in 130 out of 514 cases (25%). In summary, our initial manual approach for identifying studies which shared their data was comprehensive but time-consuming; natural language processing techniques could be helpful. Our subsequent automated approach yielded conservative estimates for total data sharing prevalence, nonetheless revealing several promising hypotheses for data sharing behavior. We hope these preliminary results will inspire additional investigations into data sharing behavior, and in turn the development of effective policies and tools to facilitate this important aspect of scientific research.</description>
      <guid>http://dx.doi.org/10.1038/npre.2008.1701.1</guid>
      <pubDate>Thu, 20 Mar 2008 17:08:03 UTC</pubDate>
      <dc:title>Prevalence and Patterns of Microarray Data Sharing</dc:title>
      <dc:identifier>doi:10.1038/npre.2008.1701.1</dc:identifier>
      <dc:date>2008-03-20</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-03-20T17:08:03Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/1701/version/1/files/npre20081701-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Examining the uses of shared data</title>
      <link>http://dx.doi.org/10.1038/npre.2007.425.3</link>
      <description>Does your research area re-use shared datasets? * Re-using data has many benefits, including research synergy and efficient resource use. * Some research areas have tools, communities, and practices which facilitate re-use. * Identifying these areas will allow us to learn from them, and apply the lessons to areas which underutilize the sharing and re-purposing of scientific data between investigators. Which datasets? This preliminary analysis examines the re-use of microarray gene expression datasets. Thousands of microarray gene expression datasets have been deposited in publicly available databases. Many studies reuse this data, but it is not well understood for what purposes. Here, we examined all publications found in PubMed Central on April 1, 2007 whose full-text contained the phrases &#8220;microarray&#8221; and &#8220;gene expression&#8221; to find studies which re-used microarray data. How did we identify re-use? We developed prototype machine-learning classifiers to identify a) studies containing original microarray data (n=900) and b) studies which instead re-used microarray data (n=250). Preprocessing (Python NLTK) extracted manually-selected keyword frequencies from the full-text publications as features for a Support Vector Machine (SVMlite). The classifier was trained and tested on a manually-labeled set of documents (PLoS articles prior to January 2007 containing the word &#8220;microarray,&#8221; n=200). How did we identify patterns of re-use? We compared the Medical Subject Heading (MeSH) of the two classes to estimate the odds that a specific MeSH term would be used given all studies with original microarray data, compared to the odds of the same term describing studies with re-used data. Terms were truncated to comparable levels in the MeSH hierarchy. Results: Publications with original vs. re-used microarray data have different distributions of MeSH terms (Figure 1), and occur in different proportions across various journals (Figure 2). Microarray data source (original vs. re-used) did not affect the odds of a study focusing on humans, mice, or invertebrates, whereas publications with re-used data did involve a relatively high proportion of studies involving fungi (odds ratio (OR)=2.4), and a relatively low proportion involving rats, bacteria, viruses, plants, or genetically-altered or inbred animals (OR     Trends in odds ratios of MeSH terms for other attributes can be seen in Figure 3. Hope: Although not all research topics can be addressed by re-using existing data, many can. Identifying areas with frequent re-use can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data. Future Work: We plan to refine our tool for identifying studies which re-use data, and continue studying and measuring re-use and reusability.</description>
      <guid>http://dx.doi.org/10.1038/npre.2007.425.3</guid>
      <pubDate>Wed, 18 Jul 2007 13:26:38 UTC</pubDate>
      <dc:title>Examining the uses of shared data</dc:title>
      <dc:identifier>doi:10.1038/npre.2007.425.3</dc:identifier>
      <dc:date>2007-07-18</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2007-07-18T13:26:38Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/425/version/3/files/npre2007425-3.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
    </item>
    <item>
      <title>Sharing Detailed Research Data Is Associated with Increased Citation Rate</title>
      <link>http://dx.doi.org/10.1038/npre.2007.361.1</link>
      <description>Presentation based on the publication here: Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308 Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.</description>
      <guid>http://dx.doi.org/10.1038/npre.2007.361.1</guid>
      <pubDate>Thu, 05 Jul 2007 13:07:43 UTC</pubDate>
      <dc:title>Sharing Detailed Research Data Is Associated with Increased Citation Rate</dc:title>
      <dc:identifier>doi:10.1038/npre.2007.361.1</dc:identifier>
      <dc:date>2007-07-05</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2007-07-05T13:07:43Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/361/version/1/files/npre2007361-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
    </item>
  </channel>
</rss>
