PSORT.org provides links to the PSORT family of programs for subcellular localization
prediction as well as other datasets and resources relevant to localization
prediction. The page is currently hosted by the Brinkman Laboratory
at Simon Fraser University, and our goal is to provide an open-source
resource centre for researchers interested in subcellular localization
prediction.
Please choose from the following PSORT programs for localization prediction:
Locally hosted resources:
PSORTb and PSORTdb are maintained by the Brinkman Laboratory,
Simon Fraser University, British Columbia, Canada.
PSORT and PSORT II are maintained by Kenta Nakai at the Human Genome Center,
Institute for Medical Science, University of Tokyo, Japan. iPSORT is maintained
by Hideo Bannai at the Human Genome Center.
Other predictive methods, datasets and resources:
The following is a collection of links relevant to
subcellular localization prediction. If you would like to see a link
to a particular program or resource added to this page, please contact us.
At the bottom of the page, we have also provided a
suggested reading list containing selected review articles describing
subcellular localization (SCL) and SCL prediction.
Other prokaryotic subcellular localization
predictors (with web servers):
- PRED-TAT
(Bagos
et al, 2010) predicts TAT and Sec signal peptides.
- CW-PRED
(Litou
et al, 2008) predicts cell wall-attached proteins in Gram-positive
bacteria using an HMM.
- PRED-SIGNAL
(Bagos
et al, 2009) predicts signal peptides for Archaea.
- PRED-LIPO
(Bagos
et al, 2008) predicts lipoprotein signal peptides of Gram-positive
bacteria using an HMM.
- iLoc-Gneg
(Xiao
et al, 2011) uses Gene Ontology and sequence information to
predict 8 sites in Gram-negative bacteria.
- NClassG+
(Restrepo-Montoya
et al, 2011) is a sequence-based classifier for identifying non-classically
secreted Gram-positive bacterial proteins.
- Gpos-mPLoc
(Shen and
Chou, 2009) and Gneg-mPLoc
(Shen and
Chou, 2010) predict bacterial subcellular localization by using
gene ontology, functional domain, and sequential evolution.
- SOSUI-GramN
(Imai
et al, 2008) predicts Gram-negative localizations based on N-
and C-terminal signal sequences.
- Augur
(Billion
et al, 2006) is a computational pipeline for Gram-positive bacterial
whole-genome surface protein predictions.
- SubcellPredict
(Niu
et al, 2008) uses the AdaBoost algorithm to predict cytoplasmic,
periplasmic and extracellular localization sites for prokaryotic
organisms.
- P-classifier
(Wang
et al, 2005) predicts subcellular localizations of proteins
for Gram-negative bacteria based on amino acid subalphabets and
a combination of multiple support vector machines.
- PSLDoc
(Chang et
al, 2008) uses document classification techniques and incorporates
a probabilistic latent semantic analysis with a support vector machine
model, for prediction on prokaryotes and eukaryotes.
- TBpred
(Rashid
et al, 2007) is a prediction server that predicts four subcellular
localizations (cytoplasmic, integral membrane, secretory, and membrane-attached
by lipid anchor) of mycobacterial proteins.
- PSL101
(Su
et al, 2007) is a hybrid prediction method for Gram-negative
bacteria that combines a one-versus-one support vector machine (SVM)
model and a structure homology approach.
- SLP-Local
(Matsuda
et al, 2005) predicts localizations for chloroplast, mitochondria,
secretory pathway, and other locations (nucleus or cytosol) for
eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm
for Gram-negative organisms.
- Gpos-PLoc
(Shen
and Chou, 2007) and Gneg-PLoc
(Chou
and Shen, 2006) use a K-nearest neighbor-based classifier to predict
localizations for Gram-positive and Gram-negative bacteria, respectively.
- CELLO version 2 (Yu
et al, 2006) uses a two-level Support Vector Machine system
to assign localizations to both prokaryotic and eukaryotic proteins.
Version 1 of the software is described in the Yu et al, 2004 paper.
- PSLpred
(Bhasin et al, 2005) is a localization prediction tool for
Gram-negative bacteria which utilizes support vector machine and
PSI-BLAST to generate predictions for 5 localization sites.
- Proteome Analyst's Subcellular Localization Server (Lu et al, 2004) is a specialized server, available at the PENCE
Proteome Analyst site, that can classify Gram-negative, Gram-positive,
fungal, plant and animal proteins to many localization sites. A database
of predictions is also available and is described below.
- LOCtree (Nair and Rost, 2005). LOCtree is a eukaryotic and prokaryotic
localization prediction tool available at the CUBIC site. Databases of localization predictions made by
CUBIC's servers are also available and are described below.
- SubLoc (Hua and Sun, 2001) uses Support Vector Machine to assign a
prokaryotic protein to the cytoplasmic, periplasmic, or extracellular
sites, and a eukaryotic protein to the cytoplasmic, mitochondrial,
nuclear, or extracellular sites. A modified version of SubLoc was
used in PSORT-B v.1.1 to differentiate cytoplasmic and non-cytoplasmic
proteins.
- SignalP 4.0 (Petersen
et al, 2011), (Bendtsen et al, 2004) predicts traditional N-terminal signal
peptides in both prokaryotic and eukaryotic proteins.
- TatP
(Bendtsen
et al, 2005) predicts twin-arginine signal peptides in bacteria.
- LipoP
(Juncker
et al, 2003) uses an HMM to predict lipoprotein signal peptides
in Gram-negative bacteria.
- Signal-BLAST
(Frank
and Sippl, 2008) uses BLAST to predict signal peptides in bacteria.
Other prokaryotic subcellular localization prediction methods (without web servers):
- Wang
et al, 2011 predict protein SCL by pseudo amino acid composition
with a segment-weighted and features-combined approach.
- FFT-based SCL predictor (Wang
et al, 2007) is a fast Fourier transform-based support vector
machine for subcellular localization prediction using different
substitution models.
- GNBSL (Guo
et al, 2006) generates subcellular localization predictions for
Gram-negative bacteria using a combination of several different
SVMs based on the PSSM and PSFM generated from the input protein.
- HensBC (Bulashevska
and Eils, 2006) predicts localizations by constructing a hierarchical
ensemble of classifiers, namely Bayesian classifiers based on Markov
chain models.
Other eukaryotic subcellular localization predictors:
- PredSL
(Petsalaki
et al, 2006) uses neural networks, Markov chains and HMMs to
predict eukaryotic protein SCLs based on their N-terminal amino
acid sequences.
- PSCL
(Wang
et al, 2011) uses InterPro domains to predict plant protein
SCLs.
- BCAR
SCL prediction (Yoon
and Lee, 2011) predicts plant, animal and fungal protein SCLs
by boosting association rules.
- SCLpred
(Mooney
et al, 2011) predicts SCLs for animals and fungi by N-to-1 neural
networks.
- Discriminative
HMMs (Lin
et al, 2011) predicts yeast SCLs using motifs that are present
in a compartment but absent in other, nearby, compartments by utilizing
a hierarchical structure that mimics the protein sorting mechanism.
- SecretP
(Yu
et al, 2010) predicts mammalian secreted proteins using PseAA
and SVMs.
- TESTLoc
(Shen
and Burger, 2010) predicts 9 plant protein subcellular localizations
for EST-DNA input.
- PROlocalizer
(Laurila and
Vihinen, 2010) predicts 12 animal protein localizations by integrating
11 methods.
- Plant-mPLoc
(Shen and
Chou, 2010) predicts plant protein subcellular localization
by Gene Ontology, functional domain, and 3 modes of pseudo-amino
acid composition.
- YLoc
(Briesemeister
et al, 2010, Briesemeister et al, 2010) provides attribute
explanations for users and multiple-localization prediction capabilities
for animal, plant and fungal protein subcellular localizations.
- KnowPredsite
(Lin
et al, 2009) predicts single and multiple localizations based
on local similarity of proteins at different sites.
- SherLoc2
(Briesemeister
et al, 2009) predicts animal, plant and fungal protein subcellular
localizations using sequence-based and text-based features.
- MultiLoc2
(Blum
et al, 2009) predicts animal, plant and fungal protein subcellular localizations
by integrating phylogeny and Gene Ontology terms into the new version
of the software.
- Hum-mPLoc
2.0 (Shen
and Chou, 2009) is an updated version of Hum-mPLoc.
- Signal-BLAST
(Frank
and Sippl, 2008) uses BLAST to predict signal peptides in eukaryotes
and bacteria.
- RSLpred
(Kaundal
and Raghava, 2009) predicts subcellular localization of rice
(Oryza sativa) proteins.
- SubCellProt
(Garg
et al, 2009) uses k Nearest Neighbor (k-NN) and Probabilistic
Neural Network (PNN) to classify proteins into 11 subcellular localizations.
- ESLpred2
(Garg
and Raghava, 2008) is an updated version of ESLpred and can
predict localizations for animal, plant, and fungus proteins.
- AdaBoost
Learner (Jin
et al, 2008) predicts 12 eukaryotic localizations using the
AdaBoost algorithm.
- SubcellPredict
(Niu
et al, 2008) uses the AdaBoost algorithm to predict cytoplasmic,
nuclear, mitochondrial, and extracellular localization sites for
eukaryotic organisms.
- PSLDoc
(Chang et
al, 2008) uses document classification techniques and incorporates
a probabilistic latent semantic analysis with a support vector machine
model, for prediction on prokaryotes and eukaryotes.
- EpiLoc (Brady
and Shatkay, 2008) is a text-based system for predicting animal,
plant and fungal protein subcellular locations.
- ProLoc-GO
(Huang et
al, 2008) utilizes Gene Ontology terms for sequence-based prediction
of subcellular localization.
- AAIndexLoc (Tantoso
and Li, 2007) predicts protein subcellular localization by using
amino acid composition and physicochemical properties.
- SLPFA
(Tamura
and Akutsu, 2007) predicts localizations by feature vectors
based on amino acid composition (frequency) and sequence alignment.
Subcellular locations predicted include chloroplast, mitochondria,
secretory pathway, and other locations (nucleus or cytosol) for
eukaryotic proteins.
- YimLOC
(Shen
and Burger, 2007) integrates previously published subcellular
localization prediction tools using a stacked decision tree and
makes predictions for mitochondrial proteins.
- SLP-Local
(Matsuda
et al, 2005) predicts localizations for chloroplast, mitochondria,
secretory pathway, and other locations (nucleus or cytosol) for
eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm
for Gram-negative organisms.
- SherLoc
(Shatkay
et al, 2007) integrates several sequence and text-based features
and provides predictions for plant, animal, and fungal proteins.
- SLPS (Jia
et al, 2007), or Subcellular Localization Predicting System,
predicts localization using a Nearest Neighbor Algorithm (NNA) and
incorporating a protein functional domain profile.
- Hum-mPLoc
(Shen
and Chou, 2007) is a localization predictor specific for human
proteins. It uses an ensemble classifier that handles cases where
a human protein has multiple possible location sites.
- Hum-PLoc
(Chou
and Shen, 2006) uses a KNN classifier to predict localizations
of human proteins.
- Euk-mPLoc
(Chou
and Shen, 2007) (Chou
and Shen, 2010) is a general eukaryotic predictor which hybridizes
gene ontology information, functional domain information, and sequential
evolutionary information to predict eukaryotic protein subcellular
localization.
- Euk-PLoc
(Shen
et al, 2007) is a general eukaryotic predictor that uses a KNN
(K-nearest neighbor)-based algorithm to predict localizations.
- Plant-PLoc (Chou
and Shen, 2007) is a plant-specific predictor that uses a KNN
algorithm to predict localizations.
- BaCelLo (Pierleoni
et al, 2006) is a predictor for five classes of eukaryotic subcellular
localization (secretory pathway, cytoplasm, nucleus, mitochondrion
and chloroplast) and it is based on different SVMs organized in
a decision tree.
- Protein Prowler version 1.2 (Hawkins
and Boden, 2006) uses a multi-layer classifier system for predicting
the subcellular localization of proteins based on their amino acid
sequence. It classifies eukaryotic targeting signals as secretory,
mitochondrion, chloroplast or other. Version 1.1 was originally
described in the Boden and Hawkins, 2005 paper.
- pTARGET
(Guda
2006), (Guda
and Subramaniam, 2005) uses amino acid composition and localization-specific
Pfam domains to assign a eukaryotic protein to one of nine localization
sites.
- CELLO version 2 (Yu
et al, 2006) uses a two-level Support Vector Machine system
to assign localizations to both prokaryotic and eukaryotic proteins.
- Golgi
Localization Predictor (Yuan
and Teasdale, 2002) predicts Golgi Type II membrane proteins
and can discriminate between proteins destined for the Golgi apparatus
or other post-Golgi locations.
- pSLIP (Sarda et al, 2005) uses support vector machine and multiple
physicochemical properties of amino acids to assign a eukaryotic
protein to one of six localization sites.
- HSLpred (Bhasin et al, 2005) is a localization prediction tool for
human proteins which utilizes support vector machine and PSI-BLAST
to generate predictions for 4 localization sites.
- LOCSVMPSI (Xie
et al, 2005) is a eukaryotic localization prediction method
that incorporates evolutionary information into its predictions.
The method uses PSI-BLAST and support vector machine to generate
predictions for up to 12 localization sites.
- PSLT (Scott et al, 2004) is a Bayesian network-based method that
predicts human protein localization based on motif/domain co-occurrence.
The tool is not yet available online; however, its predictions for
9793 human proteins in SWISS-PROT are available for download from
the PSLT site.
- ESLPred (Bhasin and Raghava, 2004) uses Support Vector Machine and
PSI-BLAST to assign eukaryotic proteins to the nucleus, mitochondrion,
cytoplasm, or extracellular space.
- Proteome Analyst's Subcellular Localization Server (Lu et al, 2004) is a specialized server, available at the PENCE
Proteome Analyst site, that can classify Gram-negative, Gram-positive,
fungal, plant and animal proteins to many localization sites. A database
of predictions is also available and is described below.
- LOCtree (Nair and Rost, 2005). LOCtree is a eukaryotic and prokaryotic
localization prediction tool available at the CUBIC site. Databases of localization predictions made by
CUBIC's servers are also available and are described below.
- SecretomeP (Bendtsen et al, 2004) predicts eukaryotic proteins which are
secreted via a non-traditional secretory mechanism.
- SignalP (Bendtsen et al, 2004) predicts traditional N-terminal signal
peptides in both prokaryotic and eukaryotic proteins.
- SubLoc (Hua and Sun, 2001) uses Support Vector Machine to assign a
prokaryotic protein to the cytoplasmic, periplasmic, or extracellular
sites, and a eukaryotic protein to the cytoplasmic, mitochondrial,
nuclear, or extracellular sites. A modified version of SubLoc was
used in PSORT-B v.1.1 to differentiate cytoplasmic and non-cytoplasmic
proteins.
- TargetP (Emanuelsson et al, 2000) predicts the presence of signal peptides,
chloroplast transit peptides, and mitochondrial targeting peptides
for plant proteins, and the presence of signal peptides and mitochondrial
targeting peptides for eukaryotic proteins.
- Predotar is designed to predict the presence of mitochondrial
and plastid targeting peptides in plant sequences.
Other eukaryotic subcellular localization prediction methods (without web servers):
- GO-TLM (Mei
et al, 2011) uses a Gene Ontology transfer model to predict
eukaryotic protein SCLs.
- Wang
et al, 2011 predict yeast protein SCL with a frequent pattern
tree (FPT) approach.
- Tian
et al, 2011 predict protein SCLs by combining PCA and WSVMs.
- Liao
et al, 2011 predict apoptosis protein SCLs with PseAAC by incorporating
tripeptide composition.
- M(3)-SVM (Yang
and Lu, 2010) uses an ensemble classifier that includes gene
ontology (GO) semantic information, amino acid composition with
secondary structure and solvent accessibility information to predict
SCLs.
- ngLOC (King
and Guda, 2007) uses an n-gram-based Bayesian classifier
that predicts the localization of a protein sequence over ten distinct
subcellular organelles. An enhanced version of ngLOC was developed
to estimate the subcellular proteomes of eight eukaryotic organisms:
yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse,
and human.
Nucleus-specific localization predictors:
- NoD
(Scott
et al, 2011) predicts human nucleolar SCL using a neural network
algorithm.
- SpectrumKernel+ (Mei
and Fei, 2010) predicts subnuclear localizations by embedding
into implicit size-varying motifs the multi-aspect amino acid physicochemical
properties captured by amino acid classification approaches.
- NLStradamus
(Nguyen
Ba et al, 2009) is a simple Hidden Markov Model for nuclear
localization signal prediction.
- Nuc-PLoc
(Shen
and Chou, 2007) is a web-server for predicting protein subnuclear
localization by fusing PseAA composition and PsePSSM.
- NUCLEO
(Hawkins
et al, 2007) predicts possible nuclear localization by taking
dually localized proteins into consideration. It uses an SVM-based
approach with a custom kernel that employs a composite spectrum
(or multiple k-mer) encoding conjoined with a bit vector
indicating the presence or absence of a range of sequence motifs
known to be important for nuclear proteins.
- NucPred
(Brameier
et al, 2007) predicts possible nuclear localization by using
a genetic programming-based algorithm. A previous version was described
in the Heddad et al, 2004 paper.
- ProLoc (Huang
et al, 2007) predicts subnuclear localizations using an evolutionary
SVM based classifier with automatic selection from a large set of
physicochemical composition (PCC) features.
- Subnuclear
Compartments Prediction System (Lei
and Dai, 2006), (Lei
and Dai, 2005) predicts subnuclear localization by combining
an SVM-based system for sequence analysis with a nearest-neighbor
classifier using a similarity measure derived from the GO annotation
terms for the protein sequences.
- NetNES (la
Cour et al, 2004) predicts nuclear export signals using neural
networks and HMMs.
- predictNLS (Cokol et al, 2000) uses nuclear localization signal motifs
to predict whether a protein might be localized to the nucleus.
Viral protein subcellular localization predictors:
- Virus-mPLoc (Shen and Chou, 2010) predicts viral protein subcellular localization with the ability to predict multiple localizations for a protein.
- Virus-PLoc (Shen and Chou, 2007) predicts viral protein subcellular localization using a fusion of classifiers implemented with K-nearest neighbor rules and Swissprot annotated viral proteins as training data.
Other subcellular localization-related databases:
- OMPdb
(Tsirigos
et al, 2011) is a comprehensive database of beta-barrel
outer membrane proteins in Gram-negative bacteria.
- ExTopoDB
(Tsaousis
et al, 2011) is a database of experimentally derived topological
models of transmembrane proteins.
- LocDB
(Rastogi
and Rost, 2011) is a manually curated database with experimental
annotations for the subcellular localizations of proteins in Homo
sapiens and Arabidopsis thaliana.
- FGsub
(Sun
et al, 2010) is a website that contains SCL prediction results
for the fungal pathogen Fusarium graminearum.
- CoBaltDB
(Goudenège
et al, 2010) is a database of prokaryotic subcellular localization
predictions that integrates the prediction results of many general
SCL predictors as well as specific signal sequence or cleavage site
predictors.
- LocateP-DB
(Zhou
et al, 2008) is a database of precomputed Gram-positive genomic
protein subcellular localization predictions.
- DBMLoc
(Zhang
et al, 2008) is a database of proteins with multiple subcellular
localizations.
- TOPDOM (Tusnady
et al, 2008) is a database of domains and sequence motifs located
consistently on the same side of the membrane in alpha-helical transmembrane
proteins.
- eSLDB
(Pierleoni
et al, 2007) collects the annotations of subcellular localizations
of eukaryotic proteomes based on experimental results, homology, and
computational predictions.
- SUBA
(Heazlewood
et al, 2007) is an Arabidopsis subcellular localization database
with annotations based on experimental results, literature references,
Swiss-Prot annotations, and computational predictions.
- FTFLP Database
(Li
et al, 2006) contains a collection of Arabidopsis protein localizations
verified using fluorescent tagging of full-length proteins.
- SPdb
(Choo
et al, 2005) is a signal peptide database containing a repository
of experimentally verified and predicted signal peptides.
- NESbase
(la
Cour et al, 2003) is a database with a collection of nuclear export
signals.
- LOCATE
(Sprenger
et al, 2007) (Fink
et al, 2006) is a database that houses data describing the membrane
organization and subcellular localization of human and mouse proteins.
- PA-GOSUB (Lu et al, 2005) is a database collecting the localization
predictions made by the Proteome Analyst tool.
- DBSubLoc (Guo et al, 2004) is a dataset of proteins with annotated subcellular
localizations according to SWISS-PROT and PIR.
- LOCtarget (Nair and Rost, 2004) is a database of LOCtree predictions
for structural genomics targets. LOC3D (Nair and Rost, 2003) is a database of predicted localizations
for eukaryotic proteins with 3D structures. LOCkey (Nair and Rost, 2002) contains predicted localizations for
the human, Arabidopsis, fly, yeast and worm genomes based on Swiss-Prot
keywords. LOChom (2002) is a database of predicted localizations based
on homology to experimentally annotated proteins.
- SignalP (Nielsen et al, 1997) is the dataset of prokaryotic and eukaryotic
secreted and non-secreted proteins used to train SignalP, and also
used to train PSORTb's signal peptide prediction module.
- Signal Peptides (Menne et al, 2000) is the dataset of prokaryotic and eukaryotic
secreted and non-secreted proteins used in an independent evaluation
of several signal peptide prediction methods, and used to test PSORTb's
signal peptide prediction module.
- STEPdb (Orfanoudaki et al.) is a database providing a comprehensive characterization of the subcellular localization and topology of the Escherichia coli proteome.
Transmembrane alpha-helix predictors and membrane
prediction software:
- MemPype
(Pierleoni
et al, 2011) is a pipeline that identifies membrane-associated
proteins and discriminates the types of membrane SCL and topology for
eukaryotic membrane proteins.
- TOPCONS
(Bernsel
et al, 2009) is a web server for consensus prediction of membrane
protein topology.
- SPOCTOPUS
(Viklund
et al, 2008) predicts signal peptides and transmembrane helices
as well as their topology.
- Philius
(Reynolds
et al, 2008) is an updated version of Phobius.
- HMM-TM
(Bagos et
al, 2006) incorporates prior topological information in HMMs.
- THUMBUP,
UMDHMM(TMHP) and TUPS (Zhou
et al, 2005) are web-based toolkits for topology prediction
of transmembrane helical proteins.
- SVMtop
(Lo
et al, 2007)
- LIPS
(Adamian
and Liang, 2006)
- MEMSAT3
(Jones,
2007)
- PONGO
(Amico
et al, 2006)
- Localizome
(Lee
et al, 2006)
- SVMtm
(Yuan
et al, 2003)
- PRODIV-TMHMM
(Viklund
and Elofsson, 2004)
- PolyPhobius (Käll
et al, 2005)
- Phobius
(Käll
et al, 2007; Käll
et al, 2004)
- ConPredII (Arai et al, 2004)
- TMHMM (Krogh et al, 2001)
- HMMTOP (Tusnady and Simon, 1998) HMMTOP is used in all versions of
PSORTb.
- DAS (Cserzo et al, 1997)
- TMpred (Hofmann and Stoffel, 1993)
- SOSUI (Tokyo Univ. of Agriculture & Technology)
- TMAP (Karolinska Institut; Sweden)
- TopPred 2 (Pasteur Institute)
Beta-barrel outer membrane protein
predictors:
- ConBBPRED
(Bagos
et al, 2005) is a consensus predictor for beta-barrel outer membrane
proteins.
- TMBETA-GENOME
(Gromiha
et al, 2007)
- PROFtmb
(Bigelow
and Rost, 2006); first described in the Bigelow
et al, 2004 paper.
- transFold
(Waldispuhl
et al, 2006), (Waldispuhl
et al, 2006)
- TMBETA-SVM (Park
et al, 2005)
- TMB-Hunt (Garrow et al, 2005)
- TMBETA-NET (Gromiha et al, 2005)
- Pred-TMBB (Bagos et al, 2004)