Protein Identification Terminology used in PeptideAtlas

Each PeptideAtlas build is associated with a reference database, usually a combination of several protein sequence databases (Swiss-Prot, IPI, Ensembl ...) for the species plus a database of contaminants. Any protein in the reference database that contains at least one observed peptide is considered a member of the Atlas. Because many database entries share peptides, the full list of proteins in an Atlas is highly redundant, so we label each Atlas protein using the terminology below.
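The membership rule above can be illustrated with a minimal sketch. It assumes the reference database is available as a simple mapping from protein identifier to sequence and that a peptide "is contained" in a protein when it occurs as a substring; the function name, identifiers, and sequences are hypothetical and not part of PeptideAtlas.

```python
def atlas_members(reference_db, observed_peptides):
    """Map each member protein to the observed peptides found in its sequence.

    reference_db: {protein_id: amino acid sequence} (hypothetical layout)
    observed_peptides: set of peptide sequences in the build
    """
    members = {}
    for protein_id, sequence in reference_db.items():
        hits = {pep for pep in observed_peptides if pep in sequence}
        if hits:  # at least one observed peptide makes the protein a member
            members[protein_id] = hits
    return members


# Toy data, invented for illustration only.
reference_db = {
    "P1": "MKTAYIAKQRQISFVK",
    "P2": "MKTAYIAKQRGGG",
    "P3": "MSSSSSSK",
}
observed = {"TAYIAK", "QISFVK"}
members = atlas_members(reference_db, observed)
print(members)
```

In this toy example P1 and P2 both become members through the shared peptide TAYIAK, which is the kind of redundancy the labels below are meant to resolve.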

The term "observed peptides" in this context refers to the set of peptides in the PeptideAtlas build. These peptides are selected using a PSM (peptide-spectrum match) FDR threshold applied to each experiment separately. (In older builds, peptides were selected by applying a probability cutoff to all PSMs for the Atlas.)
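The difference between a per-experiment threshold and a single cutoff can be sketched as follows. This page does not state how the FDR is estimated, so the sketch makes a loud assumption: each PSM carries a probability of being correct, and the FDR of an accepted set is estimated as the mean of (1 - probability) over that set. All names and values are hypothetical.

```python
def accept_by_fdr(psms, max_fdr):
    """Return the largest top-ranked prefix of PSMs with estimated FDR <= max_fdr.

    psms: list of (peptide, probability) pairs.
    Assumption: estimated FDR of a set = mean of (1 - probability) over the set.
    """
    ranked = sorted(psms, key=lambda psm: psm[1], reverse=True)
    accepted = 0
    error_sum = 0.0
    for count, (_, probability) in enumerate(ranked, start=1):
        error_sum += 1.0 - probability
        if error_sum / count <= max_fdr:
            accepted = count
    return ranked[:accepted]


def select_observed_peptides(psms_by_experiment, max_fdr):
    """Apply the FDR threshold to each experiment separately, then pool peptides."""
    peptides = set()
    for psms in psms_by_experiment.values():
        peptides.update(pep for pep, _ in accept_by_fdr(psms, max_fdr))
    return peptides


# Toy data, invented for illustration only.
psms_by_experiment = {
    "exp_a": [("LSDEK", 0.99), ("VGAHR", 0.98), ("TTPLK", 0.50)],
    "exp_b": [("NQWER", 0.60)],
}
selected = select_observed_peptides(psms_by_experiment, max_fdr=0.05)
print(sorted(selected))
```

Because the threshold is evaluated inside each experiment, a low-quality experiment such as exp_b contributes nothing rather than diluting the pooled result.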

A new implementation of the PeptideAtlas Browse Proteins tab was released in December 2009. This implementation allows you to select proteins based on the terminology below.

Protein Presence Levels

Taken together, the set of proteins with a Presence Level label for any Atlas has the property that no two members share exactly the same set of observed peptides.
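The property stated above can be checked mechanically. The following is a minimal sketch, assuming the labeled proteins are available as a mapping from protein identifier to its set of observed peptides; the function name and data are hypothetical.

```python
def has_unique_peptide_sets(peptides_by_protein):
    """Return True if no two proteins have exactly the same set of observed peptides."""
    seen = set()
    for peptides in peptides_by_protein.values():
        key = frozenset(peptides)  # hashable, order-independent representation
        if key in seen:
            return False
        seen.add(key)
    return True


# Subset relationships are allowed; only exact duplicates violate the property.
print(has_unique_peptide_sets({"A": {"p1", "p2"}, "B": {"p1"}}))
print(has_unique_peptide_sets({"A": {"p1"}, "B": {"p1"}}))
```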

Label: Canonical
Technical definition: From each ProteinProphet protein_group, the protein with the highest probability is selected to be canonical. Then, recursively, any other protein in that group which shares fewer than 80% of its peptides with any other canonical from that group is also labeled canonical. During this selection process, each set of indistinguishable proteins is considered to be a single entity, and the one from that set with the most preferred identifier (for human and mouse, Swiss-Prot primary splice variant) is the one labeled canonical.
Practical definition: The set of canonical proteins is a minimal, non-redundant list of proteins derived from the set of identified peptides for an Atlas. The number of canonicals is what we use as the protein count for the Atlas build.

Label: Possibly Distinguished
Technical definition: From each ProteinProphet protein_group, any protein that is not canonical and not subsumed is labeled possibly_distinguished. As above, from among any set of indistinguishable proteins, only one (the one with the most preferred identifier) is labeled possibly_distinguished.
Practical definition: The set of canonical proteins plus possibly_distinguished proteins is a more inclusive, but also non-redundant, list of proteins derived from the set of identified peptides for an Atlas. The canonical list will not explain all observed peptides, but the combined canonical plus possibly_distinguished list will explain all observed peptides.

Label: Subsumed
Technical definition: Any protein labeled subsumed by ProteinProphet. As above, from among any set of indistinguishable proteins, only one (the one with the most preferred identifier) is labeled subsumed.
Practical definition: A protein whose observed peptides are a subset of the observed peptides of a canonical or possibly distinguished protein is considered subsumed. For any pair of subsuming/subsumed proteins, it is possible that both have been observed, but it is more conservative to claim that only the subsuming protein has been observed. Subsumed proteins are not necessary to explain all observed peptides.

Label: NTT-Subsumed
Technical definition: Any protein that is possibly_distinguished by the above definition, but whose peptides differ from a canonical only by the number of tryptic termini.
Practical definition: Proteins that are ntt-subsumed contain exactly the same set of observed peptides as a canonical protein, but at least one of those peptides has fewer tryptic termini in the ntt-subsumed protein. For any pair of possibly_distinguished/ntt-subsumed proteins, it is much more likely that the possibly_distinguished has been observed, because that will be the one with the greater number of tryptic termini among its peptides.
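The canonical selection rule can be sketched for a single protein group. This is an illustration under stated assumptions, not the PeptideAtlas implementation: indistinguishable proteins are assumed to be already collapsed to one representative, the subsumed calls are taken as given input (they come from ProteinProphet), subsumed proteins are assumed to be ineligible for canonical status, and candidates are assumed to be considered in order of decreasing probability. The ntt-subsumed label is omitted because it needs per-peptide tryptic termini counts. All names and data are hypothetical.

```python
def label_group(group, subsumed_ids, threshold=0.8):
    """Assign presence level labels within one protein group.

    group: {protein_id: (probability, set of observed peptides)}, where every
           protein has at least one observed peptide
    subsumed_ids: proteins already called subsumed (assumed given)
    threshold: a protein sharing this fraction or more of its peptides with
               an existing canonical is not made canonical
    """
    labels = {}
    canonicals = []
    ranked = sorted(group, key=lambda pid: group[pid][0], reverse=True)
    for pid in ranked:
        if pid in subsumed_ids:
            labels[pid] = "subsumed"
            continue
        peptides = group[pid][1]
        # Canonical only if it shares fewer than 80% of its own peptides
        # with every canonical chosen so far. The first candidate always
        # passes because there is nothing to compare against yet.
        distinct = all(
            len(peptides & group[c][1]) / len(peptides) < threshold
            for c in canonicals
        )
        if distinct:
            canonicals.append(pid)
            labels[pid] = "canonical"
        else:
            labels[pid] = "possibly_distinguished"
    return labels


# Toy group, invented for illustration only.
group = {
    "A": (1.00, {"p1", "p2", "p3", "p4", "p5"}),
    "B": (0.95, {"p1", "p2", "p3", "p4", "p6"}),  # shares 4 of 5 peptides with A
    "C": (0.90, {"p5", "p7", "p8"}),              # shares 1 of 3 peptides with A
    "D": (0.40, {"p1", "p2"}),                    # subset of A
}
labels = label_group(group, subsumed_ids={"D"})
print(labels)
```

In this toy group, peptide p6 is found only in B, so the canonical list alone (A and C) does not explain every observed peptide, while canonical plus possibly_distinguished does.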

Protein Redundancy Labels

Proteins in the Atlas reference set that are redundant with proteins carrying the above labels are given the labels below.

Label: Indistinguishable
Technical definition: Indistinguishable from a protein with a Presence Level label, according to ProteinProphet.
Practical definition: Exactly the same peptides from this protein have been observed as have been observed for a protein with a Presence Level label.

Label: Identical
Technical definition: Identical in sequence to a protein with any other label.
Practical definition: Identical in sequence to a protein with any other label.

Label: No label, no presence level
Technical definition: Identified only by semi-tryptic or non-tryptic peptides.
Practical definition: Identified only by semi-tryptic or non-tryptic peptides.
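The two redundancy labels can be illustrated with a small sketch based on the practical definitions. It assumes each protein is described by its sequence and its set of observed peptides, and that the identical check takes priority over the indistinguishable check; that priority, the single-pass processing order, and all names are assumptions for illustration. The unlabeled case (semi-tryptic or non-tryptic peptides only) is not modeled.

```python
def redundancy_labels(presence_labeled, others):
    """Label proteins that are redundant with presence-labeled proteins.

    presence_labeled, others: {protein_id: (sequence, set of observed peptides)}
    Returns {protein_id: "identical" | "indistinguishable" | None} for others.
    """
    labeled_sequences = {seq for seq, _ in presence_labeled.values()}
    presence_sets = {frozenset(peps) for _, peps in presence_labeled.values()}
    result = {}
    for pid, (sequence, peptides) in others.items():
        if sequence in labeled_sequences:
            # Same sequence as a protein that already carries some label.
            result[pid] = "identical"
        elif frozenset(peptides) in presence_sets:
            # Same observed peptides as a presence-labeled protein.
            result[pid] = "indistinguishable"
            labeled_sequences.add(sequence)
        else:
            result[pid] = None
    return result


# Toy data, invented for illustration only.
presence_labeled = {"P1": ("MKTAYIAKQR", {"TAYIAK"})}
others = {
    "P1_dup": ("MKTAYIAKQR", {"TAYIAK"}),
    "P2": ("MRTAYIAKQW", {"TAYIAK"}),
    "P2_dup": ("MRTAYIAKQW", {"TAYIAK"}),
}
redundant = redundancy_labels(presence_labeled, others)
print(redundant)
```

Here P2_dup is labeled identical because its sequence matches P2, which itself carries the indistinguishable label, matching the "any other label" wording above.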

© 2004-2014, Institute for Systems Biology. All Rights Reserved