Archived PSORTdb Datasets of Proteins of Known Localization

Important: This is not the current version of the PSORTdb dataset. The current version is available here.

PSORT-B v.1.1 was trained and evaluated using S-LOC, a dataset of Gram-negative proteins of experimentally verified localization developed by the authors of the program. For a detailed description of how the dataset was created, please see the PSORT-B paper (Gardy et al., 2003).

The dataset used in the initial training and testing of the program, as reported in the PSORT-B paper, is given below as Version 1.0 and includes 1441 proteins of experimentally verified localization. PSORT-B was then updated using Version 1.1, which includes 1572 proteins. The proteins added in this expanded dataset are the result of further Medline literature review, as well as examination of the ecosal.org literature on E. coli and S. typhimurium.

For each S-LOC dataset, a tab-delimited version of the full database is available, which can easily be imported into your favourite spreadsheet program. The database contains, for each protein:
And, for proteins new to S-LOC Version 1.1:
FASTA-format files are also available for both the full dataset and each localization site. The definition lines include the protein's accession number, localization, and name.

If you make use of the S-LOC (PSORT-B) dataset in your research, please cite:
* Important note: The Version 1.0 dataset reported in the paper contains two proteins not included here: GenPept record AAF63437.1, an Outer Membrane/Extracellular protein which was removed from the final version of the dataset because it references a viral protein (likely a contaminant in the proteomics experiment the protein record was derived from), and SwissProt record Q51368, a Periplasmic/Inner Membrane protein which appeared twice in the original dataset.
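For readers who would rather load the S-LOC files programmatically than through a spreadsheet, the following is a minimal Python sketch of reading the two file types described above. It is an illustration only: the function names `read_fasta` and `read_tab_delimited` are hypothetical, and because this page does not specify the exact field order of the definition lines or the column layout of the tab-delimited file, the code returns raw definition lines and raw rows rather than guessing at their structure.

```python
import csv
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (definition_line, sequence) pairs from a FASTA-formatted handle.

    The definition line is returned without its leading '>' and is not
    split into fields, since the field layout is an unknown here.
    """
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def read_tab_delimited(handle: TextIO) -> List[List[str]]:
    """Return every row of a tab-delimited file as a list of string fields."""
    return list(csv.reader(handle, delimiter="\t"))
```

A typical use would be to open a downloaded file (for example a hypothetical `sloc_v1.1.fasta`) with `open(path, newline="")` and pass the handle to the matching function. Inspecting the first few definition lines or rows is a reasonable way to confirm the actual field layout before parsing further.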