Proteomics
Proteomics is the science of studying the multitude of proteomes found in living organisms. A proteome is the entire collection of proteins expressed by a genome or in a tissue. The contents of a proteome can differ in various tissue types, and it can change as a result of aging, disease, drug treatment, or environmental effects.
This contrasts with the concept of a genome, which is an organism's complete collection of DNA. A genome's composition remains more or less constant from tissue to tissue, except for mutations and polymorphisms that can occur.
The word "proteome" was first coined in late 1994. By 1997 there were a number of research conferences focusing on proteomics.
According to the first draft of the human genome, based on the work by the Human Genome Project and by Celera Inc., there are only between thirty thousand and seventy thousand genes in the human genome, many fewer than had been estimated previously. However, as of 2002 there were still groups that believed that there are at least 120,000 genes. Regardless of which of these estimates proves more accurate, the number of potential proteins in the human proteome is quite large. Although the first draft of the human genome reduced the estimates for the total number of human genes, it also predicted a greater amount of alternative splicing of genes, and therefore more distinct protein products per gene, than had been anticipated.
At its simplest level, proteomics is the study of protein expression in a proteome, or trying to understand the relative levels (amounts) of each protein within the mixture. Proteomics attempts to characterize proteins, compare variations in their expression levels in normal and disease states, study their interactions with other proteins, and identify their functional roles.
Unlike the traditional approach of studying individual proteins one at a time, proteomics uses an automated, high-throughput approach. High-throughput refers to the number of items (in this case, proteins) that can be analyzed or studied per unit of time. New technologies and substantial bioinformatics tools are required to compare entire proteomes. Expansion of the field of proteomics into the realm of "big science" (meaning many dollars invested by a large number of companies and universities) is several years behind the expansion of genomics. This is primarily because proteins are more difficult to work with in a laboratory setting than are nucleic acids such as DNA.
The development of protein analysis technologies is more difficult than the development of DNA analysis technologies for three reasons. First, the basic alphabet for encoding proteins consists of twenty amino acids, whereas there are only four different nucleotides, the alphabet of DNA. Second, the messenger RNA (mRNA) for some genes can be differentially spliced, meaning that multiple messages can be made from a single gene, resulting in multiple, distinct protein products. Finally, many proteins are modified once they have been synthesized. This is known as post-translational modification. There are a number of types of post-translational modifications, such as the addition of sugar, phosphate, sulfate, lipid, acetyl, or methyl groups. Each of these modifications has the ability to change the functional activity of a protein.
The above issues have made the development of reliable, high-throughput techniques for characterizing proteins, including their expression levels, on a proteome-wide scale a major challenge. Hence, techniques for high-throughput DNA sequencing and gene expression studies, for example, were developed and commercialized on a large scale sooner than similar protein analysis techniques. This is not to imply that all of the techniques involved in proteomics are new. Some, such as two-dimensional gel electrophoresis, have been around since the 1970s. However, the need to adapt these techniques to a large "proteome" scale brings with it a unique set of challenges.
For researchers involved in areas such as drug discovery, proteomics approaches will need to be used to obtain a greater understanding of disease mechanisms and drugs' mechanisms of action. Large-scale studies looking at gene expression via quantification of mRNA abundance are already possible and well commercialized. These technologies are very powerful, and the highest throughput approaches are capable of analyzing tens of thousands of genes per experiment. Sophisticated bioinformatics systems have been, and continue to be, developed to analyze these vast amounts of data. However, studies have shown that mRNA levels do not necessarily correlate well with protein levels.
Researchers must understand proteins and their roles, since proteins are the functional units within cells. As of 2002, the vast majority of drug targets were proteins. There are a handful of drugs, including some chemotherapeutic agents, that bind to DNA, but most drugs bind to specific protein targets. In the cases where the target is a protein, the drugs themselves are primarily small organic molecules or, in some cases, small proteins, such as hormones, that bind to a larger protein target in the body. Some drugs are actually therapeutic proteins that are delivered to the site of the disease.
Laboratory Techniques
The primary attributes used to identify proteins include the protein's mass and apparent mass, its isoelectric point, and its N- and C-terminal sequence tags. A protein's mass and its apparent mass are probably the most commonly used characteristics. Protein mass is determined by adding the total mass of all the amino acids in the protein to the mass of any molecules added through post-translational modification. A protein's isoelectric point is the pH at which it carries no net charge. A protein's N- and C-terminal sequence tags are short sequences of amino acids at either end of the protein. Since there are twenty different possible amino acids at each position in a protein, a peptide of only four or five amino acids in length is likely to be unique to a specific protein. There are 160,000 (20^4) possible sequences that are four amino acids long.
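The mass calculation and the count of possible sequence tags described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory tool: the function names are hypothetical, the table holds standard monoisotopic residue masses, and the sketch assumes that one water molecule is added for the free ends of the chain and that each modification is supplied as a simple mass shift.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # added once for the free N- and C-termini
PHOSPHATE = 79.96633  # mass shift of one phosphorylation (HPO3)


def protein_mass(sequence, modification_masses=()):
    """Sum the residue masses, one water, and any modification mass shifts."""
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return backbone + sum(modification_masses)


def tag_combinations(length, alphabet_size=20):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    print(tag_combinations(4))                    # four-residue tags
    print(protein_mass("GASP"))                   # unmodified peptide
    print(protein_mass("GASP", [PHOSPHATE]))      # one phosphate added
```

Passing modification masses separately keeps the sketch independent of where on the chain the modification sits, which is enough for a total-mass estimate.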
The most commonly used laboratory techniques in proteomics are two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry. These techniques have been modified for use in proteomics. Both can be used in combination with more traditional protein separation techniques, including column chromatography.
Starting in the late 1990s, several companies also started developing "protein chips," another strategy for studying proteomes and other complex protein mixtures. These chips allow a researcher to collect minute quantities of proteins that bind to specific molecules on their surface. By 2001, some companies announced they were developing "antibody chips" onto which antibodies will be attached. The antibodies can then be used as probes to capture and quantify specific proteins found in complex mixtures.
The use of 2-D PAGE allows the simultaneous separation of thousands of proteins, and the technique is still a key tool in proteomics technologies. The first dimension of protein separation on the gel is by isoelectric focusing, in which proteins are separated along a pH gradient until they reach a stationary position, where their net charge is zero.
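The pH at which a protein stops migrating during isoelectric focusing can be estimated from its sequence. The sketch below is a simplified model under stated assumptions: each ionizable group is treated independently using the Henderson-Hasselbalch relationship, the pKa values are approximate figures of the kind used by common sequence analysis tools (published sets differ), and the function names are hypothetical. Because net charge only decreases as pH rises, a bisection search is sufficient to find the pH of zero net charge.

```python
# Approximate side-chain and terminal pKa values (assumed; published sets vary).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Estimated net charge of a peptide at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free carboxyl terminus
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for peptide in ("KKKK", "DDDD", "ACDEFGHIK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Real proteins deviate from this model because neighboring residues and folding shift individual pKa values, so the result is best read as a rough position on the pH gradient.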
The second dimension of separation on the gel is by molecular mass. Sodium dodecyl sulphate (SDS) is applied, and it binds to all the proteins. This provides the proteins with a uniform charge along their length, so that they will migrate across the gel according to their molecular mass when a current is applied. After the 2-D PAGE is run, the gel is stained. The result is a two-dimensional map consisting of hundreds or thousands of protein spots.
Since 2-D PAGE was first used in the 1970s, a number of modifications have been made to make gels more reproducible and more amenable to the higher-throughput use necessary for proteomics applications. However, 2-D PAGE is still something of an art form, and high-quality, reproducible results are difficult to obtain except in the hands of very experienced users. The technology needs to be further simplified to allow casual and novice users to obtain reproducible, quality results.
Mass spectrometry is an analytical technique that very accurately measures the mass of proteins and peptides. There are two common types of mass spectrometry. The first type, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, can be used to analyze proteins that are embedded in solid samples and measures their mass in a flight tube. The second type, electrospray ionization (ESI) mass spectrometry, can be used to analyze proteins that are in a liquid solution and measures their mass in either a flight tube or a device known as a quadrupole. There are also other variations on these techniques.
Mass spectrometry is commonly used for peptide mass fingerprinting. In this process, a protein sample is isolated by 2-D PAGE and cut with an enzyme that specifically targets particular amino acids. Mass spectrometry is used to measure the masses of the resulting cut pieces, or peptides. These masses can be thought of as a fingerprint that can be compared to the fingerprints of proteins whose amino acid sequences have already been analyzed and stored in a database.
To determine the fingerprints of proteins that have already been sequenced, a computer program determines the amino acid composition, and thus the masses, of the pieces that would result if those proteins were also cut by the same enzyme. A list of proteins is generated from the database, sorted by how many peptides they share with the unknown experimental protein.
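The database comparison just described can be sketched as follows. This is an illustrative outline rather than the algorithm of any particular search program: it assumes trypsin as the enzyme (cutting after lysine or arginine unless the next residue is proline), ignores missed cleavages and modifications, and uses hypothetical function names and made-up database sequences.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cut after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Mass of an unmodified peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_candidates(observed_masses, database, tolerance=0.01):
    """Sort database proteins by how many observed masses they explain."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        hits = sum(
            any(abs(obs - theo) <= tolerance for theo in theoretical)
            for obs in observed_masses
        )
        scores.append((name, hits))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sequences used only to exercise the functions.
    database = {"protA": "AKRPGKC", "protB": "GGGKAAAR"}
    observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
    print(rank_candidates(observed, database))
```

Counting shared peptides is the simplest possible score; it mirrors the sorted list described in the text, whereas production search engines typically weight matches statistically.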
There are also technologies, including the yeast two-hybrid system, that can be used to study interactions between proteins. These approaches complement 2-D PAGE and mass spectrometry data by helping to elucidate functional cellular pathways.
Databases and Computational Approaches
There is an ever-increasing number of protein and proteome databases being developed. The most comprehensive information about specific proteins is found in databases that store protein sequences. One of the first and probably the best known such database is SWISS-PROT, which was created in 1986.
SWISS-PROT is a curated database that provides not only protein sequences but also such information as descriptions of a protein's function, its domain structure, and post-translational modifications, as well as links to other related databases. Other sequence-based protein databases include the Yeast Proteome Database and Human PSD.
There are also a number of widely used pattern and profile databases that are used to reveal relationships among proteins based on the presence of particular groups of amino acids in the proteins' sequences. Such groups, known as patterns, motifs, domains, signatures, or fingerprints, are found in specific regions of proteins that are important to some function of the protein. They could be in an area that performs some type of enzymatic activity or that is the site of a certain post-translational modification. Both their sequence and structure are typically well conserved. Some of the best known pattern and profile databases are: PROSITE, Pfam, PRINTS, and BLOCKS.
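Searching a sequence for such a pattern can be illustrated with a short script. The sketch below assumes the PROSITE pattern notation (elements separated by hyphens, `x` for any residue, square brackets for allowed residues, curly braces for excluded residues, parentheses for repeat counts) and converts it to a regular expression; the converter handles only these basic elements, and the function names and test sequence are hypothetical. The example pattern, N-{P}-[ST]-{P}, is the PROSITE signature for an N-glycosylation site.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")   # excluded residues
    regex = regex.replace("(", "{").replace(")", "}")    # repeat counts
    regex = regex.replace("x", ".")                      # any residue
    regex = regex.replace("<", "^").replace(">", "$")    # terminal anchors
    return regex


def find_motif(sequence, pattern):
    """Return (zero-based start, matched text) for every match, overlaps included."""
    regex = prosite_to_regex(pattern)
    # The lookahead lets matches that overlap one another all be reported.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({regex}))", sequence)]


if __name__ == "__main__":
    # Hypothetical sequence containing one site that fits the pattern.
    print(find_motif("MNGTANPSA", "N-{P}-[ST]-{P}"))
```

Profile databases such as Pfam use probabilistic models rather than exact patterns, so this approach corresponds only to the simplest, pattern-style entries.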
see also Alternative Splicing; Bioinformatics; Gel Electrophoresis; Genome; Human Genome Project; Mass Spectrometry; Post-translational Control; Proteins.
Anthony J. Recupero
Proteomics
Proteomics is a discipline of microbiology and molecular biology that arose from the gene sequencing efforts that culminated with the publication of the sequence of the human genome in 2001. In addition to the human genome, sequences of disease-causing bacteria and other microorganisms continue to be deduced. Although fundamental, knowledge of the sequence of nucleotides that comprise deoxyribonucleic acid (DNA) reveals only a portion of the protein structure encoded by the DNA. Proteins are an essential element of bacterial structure and function; for example, a variety of proteins enable a microorganism to establish and maintain an infection. Thus, knowledge of the three-dimensional structure and associations of proteins is vital for a full understanding of microorganism behavior and operation. Proteomics is an approach to unravel the structure and function of proteins.
The word proteomics is derived from PROTEin complement to a genOME. Essentially, this is the spectrum of proteins that are produced from the template provided by an organism’s genetic material under a given set of conditions. Different proteins can be produced under different environmental conditions. Proteomics compares the protein profiles of proteomes under different conditions in order to unravel biological processes.
The origin of proteomics dates back to the identification of the double-stranded structure of DNA by Watson and Crick in 1953. More recently, the development of the techniques of protein sequencing and gel electrophoresis in the 1960s and 1970s provided the technical means to probe protein structure. In 1986, one of the first protein sequence databases was created (SWISS-PROT, located at the University of Geneva). By the mid-1990s, the concept of the proteome and the discipline of proteomics were well established. The power of proteomics was manifest in March 2000, when the complete proteome of a whole organism was published, that of the bacterium Mycoplasma genitalium.
Proteomics research often involves the comparison of the proteins produced by a bacterium (for example, Escherichia coli) grown at different temperatures, or in the presence of different food sources, or a population grown in the lab versus a population recovered from an infection. Escherichia coli responds to changing environments by altering the proteins it produces. However, the full extent of the various alterations and their molecular bases are largely unknown. Proteomics research essentially attempts to provide a molecular explanation for bacterial behavior.
Proteomics can be widely applied to research of diverse microbes. For example, the yeast Saccharomyces cerevisiae is being studied to reveal the proteins produced and their functional associations with one another.
The task of sorting out all the proteins that can be produced by a bacterium or yeast cell is formidable. Targeting of the research effort is essential. For example, the comparison of the protein profile of a bacterium obtained directly from an infection (in vivo) with populations of the same microbe grown under defined conditions in the lab (in vitro) could identify proteins that are unique to the infection. Some of these could become targets for diagnosis, therapy, or for prevention of the infection.
The study of proteins is difficult. The amount of protein cannot be amplified as easily as can the amount of DNA, making the detection of minute amounts of protein challenging. The structure of proteins must be maintained, which can be difficult. For example, enzymes, heat, light, or the energy of mixing can break down some proteins.
With the advent of DNA chips, the expression of thousands of genes can be monitored simultaneously. But DNA is static. It exists and is either expressed or not. Moreover, the expression of a protein does not necessarily mean that the protein is active. Also, proteins can be modified after being produced. Proteins can adopt different shapes, which can determine different functions and levels of activity after they have been produced. These functions provide the structural and operational framework for the life of the bacterium. Proteomics represents the next step after gene expression analysis.
Proteomics utilizes various techniques to probe protein expression and structure. The migration of proteins can depend on their net charge and on the size of the protein molecule. When these migrations are in two dimensions, as in 2-D polyacrylamide gel electrophoresis, thousands of proteins can be distinguished in a single experiment. A technique called mass spectrometry analyzes a trait of proteins known as the mass-to-charge ratio, which essentially enables the sequence of amino acids comprising the protein to be determined. Techniques exist that detect modifications after protein manufacture, such as the addition of phosphate groups. Analogous to DNA chips, so-called protein microarrays have been developed. In these, a solid support holds various molecules (antibodies and receptors, as two examples) that will specifically bind protein. The binding pattern of proteins to the support can help determine what proteins are being made and when they are synthesized.
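The mass-to-charge ratio mentioned above is what the instrument actually reports, so converting between it and the mass of the molecule is a routine calculation. The sketch below assumes positive-ion measurements in which each charge comes from one added proton; the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # daltons


def mass_to_charge(neutral_mass, charge):
    """m/z of a molecule carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Recover the uncharged mass from an observed m/z and a known charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS


if __name__ == "__main__":
    # The same hypothetical 1000 Da peptide appears at different m/z values
    # depending on how many protons it carries.
    for z in (1, 2, 3):
        print(z, mass_to_charge(1000.0, z))
```

Because one molecule can appear at several m/z values, assigning the charge state is a necessary step before masses can be compared with a database.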
Proteomics typically operates in tandem with bioinformatics, which is an integration of mathematical, statistical, and computational methods to unravel biological data. The vast amount of protein information emerging from a single experiment would be impossible to analyze by manual computation or analysis. Accordingly, comparison of the data with other databases and the use of computer modeling programs, such as those that calculate three-dimensional structures, are invaluable in proteomics.
KEY TERMS
Chimeric protein: Protein containing at least two different parts derived from two separate genes, but expressed as a single protein.
DNA-binding domain: Part of a protein that interacts with DNA.
Electrophoresis: Separation of nucleic acid or protein molecules in an electric field.
Peptides: Low molecular weight molecules formed from two or more amino acids linked together by a peptide bond.
Polyacrylamide: Branched polymer of acrylamide, used to make gels for electrophoresis.
Reporter gene: Gene that encodes an easily assayable product (protein), for example luciferase, green fluorescent protein, or chloramphenicol acetyltransferase (CAT). It is fused to a promoter region of the gene that is being tested.
Transcriptional activator: A protein that induces transcription of a gene if stimulated; it contains a DNA-binding domain and an activator domain.
Brian Hoyle
Proteomics
A proteome is the complement of proteins expressed in a cell at a given time, and proteomics is the global analysis of this protein complement. Proteomics investigates the global changes of proteins in cells and tissues in response to a stimulus (for example, a temperature change, drug or nutrient treatment, or growth phase). It also studies protein-protein interactions. Proteomics came into prominence after 1997 and quickly became a popular research avenue, proving far more important than scientists initially suspected. The main reason is that it is impossible to predict from the genomic sequence alone how the gene products (proteins) are going to behave. Proteins are regulated at the level of translation, and they can subsequently be modified by the addition of various molecules (sugars, for example). Proteins can also have varying half-lives, and their intracellular distribution can be predicted only with limited certainty.
Methods
The most basic method used in proteomics is two-dimensional (2D) electrophoresis. Proteins in cellular or tissue extracts are separated on a polyacrylamide gel in two dimensions, according to their charge and size, producing a pattern of spots. Although up to 11,000 spots can be separated on one gel, a typical number is approximately 2,000. Following the separation, patterns obtained from test and control samples are overlaid and analyzed to determine any changes in protein expression, levels, or modifications. Proteins that are present in one sample but not the other are isolated from the gel. To identify them, the proteins are digested with an enzyme (usually trypsin), and the resulting small fragments (peptides) are analyzed by mass spectrometry to produce peptide fingerprints or protein tags that can be used to identify the unknown spot. A second method used in analysis is tandem mass spectrometry. Because each of the analyzed peptides is further fragmented and re-analyzed, this approach produces some sequence information in addition to mass. It is important to realize that neither mass spectrometry nor microsequencing is used to fully sequence the samples; each is used to generate enough information to identify the unknown protein by searching the databases.
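The overlay of test and control patterns can be pictured as a comparison of spot coordinates. The sketch below is a deliberately simplified illustration: it assumes each spot has already been reduced to an (isoelectric point, mass in kilodaltons) pair, treats two spots as the same protein when both values agree within fixed tolerances, and uses hypothetical function names, tolerances, and spot values. Real gel analysis software works on images and must also correct for distortion between gels.

```python
def unmatched_spots(spots, reference_spots, pi_tol=0.1, mass_tol=1.0):
    """Return the spots that have no counterpart in reference_spots."""
    unmatched = []
    for pi, mass in spots:
        has_partner = any(
            abs(pi - ref_pi) <= pi_tol and abs(mass - ref_mass) <= mass_tol
            for ref_pi, ref_mass in reference_spots
        )
        if not has_partner:
            unmatched.append((pi, mass))
    return unmatched


if __name__ == "__main__":
    # Hypothetical (pI, mass in kDa) coordinates for two small gels.
    control = [(5.0, 30.0), (6.5, 45.0)]
    test = [(5.05, 30.2), (7.8, 60.0)]
    print("only in test:", unmatched_spots(test, control))
    print("only in control:", unmatched_spots(control, test))
```

Spots reported by either call are the candidates that would be cut from the gel and passed on to mass spectrometry.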
An important area of proteomic studies is identifying the interactions between proteins in order to determine the networks that proteins form in cells. One method used for such studies is the yeast two-hybrid system, which is best compared to fishing. Scientists construct a bait molecule (a chimeric protein) by fusing the DNA sequence of a protein of interest to the sequence encoding the DNA-binding domain of a known transcriptional activator. The bait is used to identify which protein (the prey molecule) interacts with the protein of interest. Each prey molecule consists of a protein that might interact with the bait, fused to the activation domain of the same transcriptional activator used to create the bait. When bait and prey interact, the complex they form activates a reporter gene, and the interaction is detected by observing the resulting color change. As of 2002, a number of commercial companies (for example, Hybrigenics) had developed large-scale automated yeast two-hybrid screens.
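The results of such a screen are naturally recorded as a network. The sketch below assumes each experiment is summarized as a (bait, prey, reporter active) record and builds an undirected interaction map from the positive results; the function name and protein labels are hypothetical, and the sketch makes no attempt to model false positives or false negatives, which real screens must account for.

```python
from collections import defaultdict


def build_interaction_network(screen_results):
    """Map each protein to the set of partners found in positive screens."""
    network = defaultdict(set)
    for bait, prey, reporter_active in screen_results:
        if reporter_active:          # reporter activation signals an interaction
            network[bait].add(prey)
            network[prey].add(bait)
    return dict(network)


if __name__ == "__main__":
    # Hypothetical screen records: (bait, prey, reporter gene activated?)
    results = [
        ("ProteinA", "ProteinB", True),
        ("ProteinA", "ProteinC", False),
        ("ProteinB", "ProteinD", True),
    ]
    for protein, partners in sorted(build_interaction_network(results).items()):
        print(protein, sorted(partners))
```

Storing partners in both directions makes it straightforward to follow chains of interactions, which is how individual screens are assembled into larger cellular pathways.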
An alternative method for studying protein interactions is creating tagged proteins, introducing them into cells, and subsequently using the tag to isolate the protein complexes formed in cells. The complexes are then separated on a gel according to their size and individual proteins are isolated, and identified by mass spectrometry.
Computational tools are very important in proteomics because the data generated require image analysis, peptide and protein tag analysis, extensive database searching, and further investigation involving, for example, protein modification analysis.
Use of proteomics
Proteins play the most important part in creating cells and tissues and in directing their functions. It should be possible to identify the protein signatures of various diseases, especially at their onset, to help in diagnosis and treatment. Protein signatures are particularly valuable in drug design and clinical trials. Scientists also have more basic interests in the proteome, and proteomics is used to study bacterial, plant, and animal cells in order to understand how the proteins change during a particular treatment or phase of growth.
Future application of proteomic analysis is highly dependent on further technological developments to streamline analysis of clinical samples. The most obvious challenge for proteomics is developing protein chips similar to DNA microarrays. Also, new methods are required for studying the protein-protein interactions and large protein complexes.
Agnieszka Lichanska
KEY TERMS
Chimeric protein: Protein containing at least two different parts derived from two separate genes, but expressed as a single protein.
DNA-binding domain: Part of a protein that interacts with DNA.
Electrophoresis: Separation of nucleic acid or protein molecules in an electric field.
Peptides: Low molecular weight molecules formed from two or more amino acids linked together by a peptide bond.
Polyacrylamide: Branched polymer of acrylamide, used to make gels for electrophoresis.
Reporter gene: Gene that encodes an easily assayable product (protein), for example luciferase, green fluorescent protein, or chloramphenicol acetyltransferase (CAT). It is fused to a promoter region of the gene that is being tested.
Transcriptional activator: A protein that induces transcription of a gene if stimulated; it contains a DNA-binding domain and an activator domain.
Proteomics
Proteomics is a discipline of microbiology and molecular biology that has arisen from the gene sequencing efforts that culminated in the sequencing of the human genome in the last years of the twentieth century. In addition to the human genome, sequences of disease-causing bacteria are being deduced. Although fundamental, knowledge of the sequence of nucleotides that comprise deoxyribonucleic acid (DNA) reveals only a portion of the protein structure encoded by the DNA. Because proteins are an essential element of bacterial structure and function (for example, in causing infection), knowledge of the three-dimensional structure and associations of proteins is vital. Proteomics is an approach to unravel the structure and function of proteins.
The word proteomics is derived from PROTEin complement to a genOME. Essentially, this is the spectrum of proteins that are produced from the template of an organism's genetic material under a given set of conditions. Proteomics compares the protein profiles of proteomes under different conditions in order to unravel biological processes.
The origin of proteomics dates back to the identification of the double-stranded structure of DNA by Watson and Crick in 1953. More recently, the development of the techniques of protein sequencing and gel electrophoresis in the 1960s and 1970s provided the technical means to probe protein structure. In 1986, one of the first protein sequence databases was created (SWISS-PROT, located at the University of Geneva). By the mid-1990s, the concept of the proteome and the discipline of proteomics were well established. The power of proteomics was manifest in March 2000, when the complete proteome of a whole organism was published, that of the bacterium Mycoplasma genitalium.
Proteomics research often involves the comparison of the proteins produced by a bacterium (for example, Escherichia coli) grown at different temperatures, or in the presence of different food sources, or a population grown in the lab versus a population recovered from an infection. Escherichia coli responds to changing environments by altering the proteins it produces. However, the full extent of the various alterations and their molecular bases are largely unknown. Proteomics research essentially attempts to provide a molecular explanation for bacterial behavior.
Proteomics can be widely applied to research of diverse microbes. For example, the yeast Saccharomyces cerevisiae is being studied to reveal the proteins produced and their functional associations with one another.
The task of sorting out all the proteins that can be produced by a bacterium or yeast cell is formidable. Targeting of the research effort is essential. For example, the comparison of the protein profile of a bacterium obtained directly from an infection (in vivo) with populations of the same microbe grown under defined conditions in the lab (in vitro) could identify proteins that are unique to the infection. Some of these could become targets for diagnosis, therapy, or for prevention of the infection.
The study of proteins is difficult. The amount of protein cannot be amplified as easily as can the amount of DNA, making the detection of minute amounts of protein challenging. The structure of proteins must be maintained, which can be difficult. For example, enzymes, heat, light, or the energy of mixing can break down some proteins.
With the advent of the so-called DNA chips, the expression of thousands of genes can be monitored simultaneously. But DNA is static. It exists and is either expressed or not. Moreover, the expression of a protein does not necessarily mean that the protein is active. Also, proteins can be modified after being produced. Proteins can adopt different shapes, which can determine different functions and levels of activity after they have been produced. These functions provide the structural and operational framework for the life of the bacterium. Proteomics represents the next step after gene expression analysis.
Proteomics utilizes various techniques to probe protein expression and structure. The migration of proteins can depend on their net charge and on the size of the protein molecule. When these migrations are in two dimensions, as in 2-D polyacrylamide gel electrophoresis, thousands of proteins can be distinguished in a single experiment. A technique called mass spectrometry analyzes a trait of proteins known as the mass-to-charge ratio, which essentially enables the sequence of amino acids comprising the protein to be determined. Techniques exist that detect modifications after protein manufacture, such as the addition of phosphate groups. Analogous to DNA chips, so-called protein microarrays have been developed. In these, a solid support holds various molecules (antibodies and receptors, as two examples) that will specifically bind protein. The binding pattern of proteins to the support can help determine what proteins are being made and when they are synthesized.
Proteomics typically operates in tandem with bioinformatics, which is an integration of mathematical, statistical, and computational methods to unravel biological data. The vast amount of protein information emerging from a single experiment would be impossible to analyze by manual computation or analysis. Accordingly, comparison of the data with other databases and the use of computer modeling programs, such as those that calculate three-dimensional structures, are invaluable in proteomics.
The knowledge of protein expression and structure, and the potential changes in structure and function under different conditions, could allow the tailoring of treatment strategies. For example, in the lungs of those afflicted with cystic fibrosis, the bacterium Pseudomonas aeruginosa forms adherent populations on the surface of the lung tissue. These populations, which are enclosed in a glycocalyx that the bacteria produce, are very resistant to treatments and directly and indirectly damage the lung tissue to a lethal extent. Presently, it is known that the bacteria change their genetic expression as they become more firmly associated with the surface. Through proteomics, more details of the proteins involved in the initial approach to the surface and the subsequent, irreversible surface adhesion could be revealed. Once the targets are known, it is conceivable that they can be blocked. Thus, biofilms would not form and the bacteria could be more expeditiously eliminated from the lungs.
See also Biotechnology; Molecular biology and molecular genetics