PSORT-B Submit Sequences | Resources | Documentation | Contact  

Archived PSORTdb Datasets of Proteins of Known Localization

Important: This is not the current version of the PSORTdb dataset. The current version is available here.

PSORT-B v.1.1 was trained and evaluated using S-LOC, a dataset of Gram-negative proteins of experimentally verified localization developed by the authors of the program. For a detailed description of how the dataset was created, please see the PSORT-B paper (Gardy et al., 2003).

The dataset used in the initial training and testing of the program, as reported in the PSORT-B paper, is given below as Version 1.0 and includes 1441 proteins of experimentally verified localization. PSORT-B was then updated using Version 1.1, which includes 1572 proteins. The additional proteins in this expanded dataset were identified through further MEDLINE literature review and through examination of the ecosal.org literature on E. coli and S. typhimurium.

For each S-LOC dataset, a tab-delimited version of the full database is available, which can easily be imported into your favourite spreadsheet program. The database contains, for each protein:

  • SwissProt identifier OR NCBI GI Number (GI numbers are used for proteins without SwissProt records)
  • Experimentally verified subcellular localization
  • Protein description
  • Source organism
  • Sequence
  • Sequence Length

And, for proteins new to S-LOC Version 1.1:

  • Changes to the entry since Version 1.0
  • Source of the experimental verification (Literature Search, Proteomic Study, Ecosal.org Reference)
  • PMID of the reference reporting localization
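As a rough illustration of working with the tab-delimited version outside a spreadsheet, the following Python sketch reads rows into dictionaries, tallies proteins per localization, and checks that each stated sequence length matches the sequence itself. The column order is an assumption taken from the field list above, the function names are hypothetical, and the demo rows are invented placeholders rather than real S-LOC entries; check the downloaded file for its actual layout and for a header row before relying on this.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: columns appear in the order of the field list above.
# The real files may order or name them differently.
COLUMNS = ["identifier", "localization", "description",
           "organism", "sequence", "length"]


def read_sloc(handle, columns=COLUMNS, skip_header=False):
    """Yield one dict per tab-delimited row, keyed by the assumed column names."""
    # QUOTE_NONE keeps any quote characters in descriptions as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)  # discard a header row if the file has one
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # zip stops at the shorter input, so extra columns are ignored here
        yield dict(zip(columns, (cell.strip() for cell in row)))


def localization_counts(records):
    """Count how many records carry each localization label."""
    return Counter(rec["localization"] for rec in records)


def length_mismatches(records):
    """Return identifiers whose stated length differs from len(sequence)."""
    return [rec["identifier"] for rec in records
            if int(rec["length"]) != len(rec["sequence"])]


# Invented demo rows (NOT real S-LOC entries), used only to exercise the code.
DEMO = (
    "DEMO0001\tCytoplasmic\tDemo protein A\tEscherichia coli\tMKTAYIAK\t8\n"
    "DEMO0002\tPeriplasmic\tDemo protein B\tEscherichia coli\tMKKLLPA\t7\n"
    "DEMO0003\tCytoplasmic\tDemo protein C\tEscherichia coli\tMSDNR\t5\n"
)

records = list(read_sloc(io.StringIO(DEMO)))
print(localization_counts(records))
print(length_mismatches(records))
```

To read a downloaded file instead of the demo text, open it with `open(path, newline="")` and pass the file object to `read_sloc`. The additional Version 1.1 fields can be captured by extending `COLUMNS`, again assuming they follow the order listed above.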

FASTA format files are also available for both the full dataset and each localization site, and the information provided on the definition lines includes the protein's accession number, localization, and name.
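For readers who prefer to process the FASTA files programmatically, here is a minimal Python sketch of a FASTA reader. It returns each definition line unparsed, because the exact layout of the accession number, localization, and name on that line is not specified here; split it only after inspecting a real file. The function name and the demo records are hypothetical and do not come from the dataset.

```python
import io


def read_fasta(handle):
    """Yield (definition_line, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Invented demo records (NOT real S-LOC entries); the definition-line layout
# shown here is a placeholder, not the dataset's actual format.
DEMO_FASTA = (
    ">DEMO0001 demo definition line\n"
    "MKTAY\n"
    "IAK\n"
    ">DEMO0002 another demo definition line\n"
    "MKKLLPA\n"
)

for definition, sequence in read_fasta(io.StringIO(DEMO_FASTA)):
    print(definition, len(sequence))
```

Because the reader is a generator, it can stream through the full-dataset file without holding every sequence in memory at once.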

If you make use of the S-LOC (PSORT-B) dataset in your research, please cite:

Jennifer L. Gardy, Cory Spencer, Ke Wang, Martin Ester, Gabor E. Tusnady, Istvan Simon, Sujun Hua, Katalin deFays, Christophe Lambert, Kenta Nakai and Fiona S.L. Brinkman (2003). PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria, Nucleic Acids Research 31(13):3613-17.

* Important note: The Version 1.0 dataset reported in the paper contains two proteins that are not included here. GenPept record AAF63437.1, an Outer Membrane/Extracellular protein, was removed from the final version of the dataset because it references a viral protein (likely a contaminant in the proteomics experiment from which the protein record was derived). SwissProt record Q51368, a Periplasmic/Inner Membrane protein, appeared twice in the original dataset.



