Document information

hdl:10101/npre.2008.1760.1
Document Type:
Manuscript
Date:
Received 03 April 2008 16:51 UTC; Posted 03 April 2008
Subjects:
Evolution and Ecology, Bioinformatics
Abstract:

A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers (such as DOIs and LSIDs), and the implementation of services that link those identifiers.
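As a rough illustration of the ranking idea mentioned in the abstract, the following minimal sketch runs a textbook power-iteration PageRank over a small graph of links between identifiers. It is not the implementation used in the manuscript: the identifiers, the link structure, the function name `pagerank`, and the damping value of 0.85 are all assumptions made up for this example.

```python
# Minimal PageRank sketch over links between identifiers.
# All identifiers and links below are hypothetical, invented for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """Return a dict mapping each node to its PageRank score.

    links maps a source identifier to the list of identifiers it points to.
    Nodes with no outgoing links spread their rank evenly over all nodes.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the same baseline "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its rank over every node.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical records: two sequences and a paper all cite one specimen.
EXAMPLE_LINKS = {
    "genbank:SEQ1": ["specimen:S1", "doi:PAPER1"],
    "genbank:SEQ2": ["specimen:S1"],
    "doi:PAPER1": ["specimen:S1"],
    "specimen:S1": [],
}

ranks = pagerank(EXAMPLE_LINKS)
for identifier, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{identifier}\t{score:.3f}")
```

In this toy graph the specimen record, which is linked from every other record, receives the highest score, which is the sense in which link structure could be used to order search results.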

Additional information

License:
This document is licensed to the public under the Creative Commons Attribution 3.0 License
How to cite this document:

Page, Roderic. Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Available from Nature Precedings <http://hdl.handle.net/10101/npre.2008.1760.1> (2008)

Version info:

Other versions of this document in Nature Precedings

None.

Other versions of this document elsewhere on the web

  • 10.1093/bib/bbn022 (Peer Reviewed): Briefings in Bioinformatics (2008); published by Oxford University Press.