PeptideAtlas Overview
The long-term goal of the PeptideAtlas project is full annotation
of eukaryotic genomes through a thorough validation of expressed proteins.
The PeptideAtlas provides a method and a framework to accommodate
proteome information coming from high-throughput proteomics
technologies. The online database administers experimental data
in the public domain. We encourage you to contribute to the database.
Details of the PeptideAtlas construction can be found within the first
publication.
Briefly, the process of obtaining high-quality peptide sequences, mapping them,
and storing them in a database is shown in the figure below and outlined here.
A protein mixture sample is prepared (perhaps labeled, digested
with trypsin, purified, separated using chromatography).
The sample is run through a mass spectrometer (e.g., ESI MS/MS).
The MS/MS spectra are compared to theoretical spectra
to identify possible peptides (SEQUEST).
The peptide identifications are scored, formed into false-positive and
true-positive distributions, and subsequently filtered to retain
only the highest-scoring identifications (PeptideProphet).
The peptide sequences are compared to protein sequence databases using
the NCBI BLAST program (e.g., for human, we use the Ensembl protein
sequence database). When a peptide is identified in a given protein,
its location relative to the protein start (CDS coordinates) is recorded as well.
The search results are parsed for perfect pattern matches.
The peptide locations in chromosomal coordinates are calculated
from the CDS coordinates.
The data are stored in the SBEAMS database and can be accessed
through web pages (see Browse Database). Each peptide is assigned a
unique identifier of the form PAp[8-digit number], such as PAp00000001,
in our database, but peptides can also be found via their sequences.
The human peptides can be browsed as a track within Ensembl's
Genome Browser. This functionality is available for the
Human_P0.9_Ens30_NCBI35 build.
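The matching and coordinate steps above can be illustrated with a minimal Python sketch. It is not the PeptideAtlas code: the function names, the half-open CDS interval convention, and the exon representation (1-based inclusive chromosomal ranges listed in transcript order) are all assumptions made for this example, and a plain string search stands in for parsing BLAST output for perfect matches.

```python
import re


def find_perfect_matches(peptide, protein):
    """Return the 0-based start offset of every exact occurrence of peptide.

    A lookahead is used so that overlapping occurrences are reported too.
    (Hypothetical helper: stands in for parsing perfect hits from BLAST output.)
    """
    pattern = "(?=" + re.escape(peptide) + ")"
    return [m.start() for m in re.finditer(pattern, protein)]


def peptide_to_cds(start_aa, peptide_len):
    """Convert a 0-based residue offset into a 0-based, half-open CDS
    nucleotide interval (three nucleotides per residue)."""
    return start_aa * 3, (start_aa + peptide_len) * 3


def cds_to_chromosome(cds_start, cds_end, exons, strand):
    """Map a half-open CDS interval onto chromosomal segments.

    exons  : coding segments as (chrom_start, chrom_end), 1-based inclusive,
             listed in transcript order (assumed representation).
    strand : "+" or "-".
    Returns one (chrom_start, chrom_end) pair per exon the peptide touches,
    so a peptide spanning a splice junction yields two or more segments.
    """
    segments = []
    offset = 0  # CDS nucleotides consumed by the exons already visited
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start + 1
        lo = max(cds_start, offset)
        hi = min(cds_end, offset + ex_len)
        if lo < hi:  # the peptide overlaps this exon
            if strand == "+":
                segments.append((ex_start + (lo - offset),
                                 ex_start + (hi - offset) - 1))
            else:
                segments.append((ex_end - (hi - offset) + 1,
                                 ex_end - (lo - offset)))
        offset += ex_len
    return segments


# Illustrative data only: a made-up 10-residue protein encoded by two exons.
protein = "MKTAYIAKQR"
peptide = "AYIAK"
starts = find_perfect_matches(peptide, protein)
cds = peptide_to_cds(starts[0], len(peptide))
plus_segments = cds_to_chromosome(cds[0], cds[1], [(101, 112), (201, 218)], "+")
minus_segments = cds_to_chromosome(cds[0], cds[1], [(201, 212), (101, 118)], "-")
```

In this toy case the peptide starts at residue offset 3, covers CDS nucleotides 9 to 24 (half-open), and crosses the exon boundary, so it maps to two chromosomal segments on either strand.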
The PeptideAtlas is stored using a database
schema which accommodates different builds
of PeptideAtlas, different versions of Ensembl, different organisms
(for example, human, fly, mouse), and different reference protein sequence sets
as starting material.
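To make the idea of a build-aware schema concrete, here is a small sketch using Python's built-in sqlite3 module. It is purely illustrative: the table names, column names, and demo rows are hypothetical and do not reproduce the actual SBEAMS schema. The point is only that keying each mapping to a build, and each build to a reference set and organism, lets several builds coexist.

```python
import sqlite3

# Hypothetical schema: every name below is invented for this sketch.
SCHEMA = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE reference_set (
    reference_set_id INTEGER PRIMARY KEY,
    organism_id      INTEGER NOT NULL REFERENCES organism(organism_id),
    source           TEXT NOT NULL,   -- e.g. a protein sequence database
    version          TEXT NOT NULL    -- release of that database
);
CREATE TABLE atlas_build (
    build_id         INTEGER PRIMARY KEY,
    name             TEXT NOT NULL UNIQUE,
    reference_set_id INTEGER NOT NULL
                     REFERENCES reference_set(reference_set_id)
);
CREATE TABLE peptide_mapping (
    build_id    INTEGER NOT NULL REFERENCES atlas_build(build_id),
    sequence    TEXT NOT NULL,
    protein     TEXT NOT NULL,
    chromosome  TEXT NOT NULL,
    chrom_start INTEGER NOT NULL,
    chrom_end   INTEGER NOT NULL
);
"""


def builds_for_organism(conn, organism):
    """Return the names of all builds whose reference set belongs to organism."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM atlas_build AS b
        JOIN reference_set AS r ON r.reference_set_id = b.reference_set_id
        JOIN organism AS o ON o.organism_id = r.organism_id
        WHERE o.name = ?
        ORDER BY b.name
        """,
        (organism,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows, not real PeptideAtlas data.
conn.executemany("INSERT INTO organism VALUES (?, ?)",
                 [(1, "human"), (2, "mouse")])
conn.executemany("INSERT INTO reference_set VALUES (?, ?, ?, ?)",
                 [(1, 1, "Ensembl", "release A"),
                  (2, 1, "Ensembl", "release B"),
                  (3, 2, "Ensembl", "release A")])
conn.executemany("INSERT INTO atlas_build VALUES (?, ?, ?)",
                 [(1, "human_build_A", 1),
                  (2, "human_build_B", 2),
                  (3, "mouse_build_A", 3)])
conn.commit()

human_builds = builds_for_organism(conn, "human")
```

Because mappings hang off a build rather than off the peptide alone, the same peptide sequence can carry different protein and chromosomal locations in builds made against different reference releases.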
