ProSpecTome Homepage

Introduction

ProSpecTome is a protein-specific corpus designed to facilitate the fair evaluation of protein name taggers. It was compiled by re-annotating 243 MEDLINE abstracts from the widely used JNLPBA evaluation corpus (available from the Evaluation Data link on this page).

The annotation guidelines used in the construction of ProSpecTome differ substantially from those used to construct the JNLPBA corpus. For example, ProSpecTome incorporates two levels of specificity with regard to the category protein, with general references to proteins annotated separately from the names of individual proteins and protein families (see the ProSpecTome annotation guidelines for full details). Using both corpora together, a researcher can carry out a richer analysis of tagger performance than was previously possible.

ProSpecTome was constructed by Renata Kabiljo with the help of Diana Stoycheva under the supervision of Dr Adrian Shepherd. Inter-annotator agreement, assessed through the independent annotation of 43 of the 243 abstracts, was 0.89 (F-measure).
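
For readers unfamiliar with F-measure as an agreement score, the following is a minimal sketch of how such a figure can be computed from two annotators' outputs. It assumes exact matching of annotated spans, which this page does not state was the criterion used for ProSpecTome. The function name and the (abstract id, start, end, label) tuple layout are hypothetical and chosen only for illustration; they do not reflect the corpus file format.

```python
def agreement_f_measure(annotator_a, annotator_b):
    """F-measure between two collections of annotations.

    Each annotation is a hypothetical (abstract_id, start, end, label) tuple.
    Two annotations agree only if all four fields are identical
    (exact matching; an assumption, not the documented ProSpecTome criterion).
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a or not b:
        return 0.0
    matched = len(a & b)
    # Treat annotator A as the reference and annotator B as the response.
    precision = matched / len(b)
    recall = matched / len(a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
first = [(1, 0, 5, "protein"), (1, 10, 15, "protein"), (2, 3, 8, "protein")]
second = [(1, 0, 5, "protein"), (1, 10, 15, "protein"),
          (2, 4, 8, "protein"), (2, 20, 25, "protein")]
score = agreement_f_measure(first, second)
```

Because F-measure is the harmonic mean of precision and recall, swapping the two annotators exchanges precision and recall but leaves the score unchanged, which makes it convenient for agreement between two annotators when neither is a gold standard.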

Downloads

The following files are available for download:

Contact

All enquiries should be directed to Renata Kabiljo (email r.kabiljo@mail.cryst.bbk.ac.uk).

Publications

If you use ProSpecTome in your research, please cite: