Genomics and Proteomics

Microarrays and the Transcriptome

The transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA produced in cells.

Learning Objectives

Define the transcriptome

Key Takeaways

Key Points

  • Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions.
  • The transcriptome reflects the genes that are being actively expressed at any given time.
  • DNA microarrays can provide a method for comparing, on a genome-wide basis, the abundance of transcripts (measured as their DNA copies) in different samples.

Key Terms

  • transcriptome: The complete set of RNA molecules (transcripts), including messenger RNA, produced in a cell or a population of cells.
  • DNA microarray: A collection of microscopic DNA spots attached to a solid surface forming an array; used to measure the expression levels of large numbers of genes simultaneously.
  • PCR: Polymerase chain reaction; a technique for copying (amplifying) a specific DNA sequence many times.

The Transcriptome

The transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA, produced in one cell or a population of cells. The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of phenomena such as mRNA degradation and transcriptional attenuation.

Analysis of the Transcriptome

The study of transcriptomics, also referred to as expression profiling, examines the expression level of mRNAs in a given cell population, often using high-throughput techniques based on DNA microarray technology. A number of organism-specific transcriptome databases have been constructed and annotated to aid in the identification of genes that are differentially expressed in distinct cell populations.

DNA microarrays can provide a genome-wide method for comparing the abundance of transcripts in different samples. The DNA in each spot may be a PCR product or a synthetic oligonucleotide specific to an individual gene. RNA from each sample is first copied into complementary DNA (cDNA) using the enzyme reverse transcriptase; the cDNA is labeled, for example with a fluorescent dye, and hybridized to the spots. Sequencing is now being used instead of gene arrays to quantify transcript levels, at least semi-quantitatively.
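
To make "comparing abundance" concrete, here is a minimal Python sketch of a two-sample comparison, assuming each gene's spot has already been reduced to one background-corrected intensity per sample. The gene names, intensities, pseudocount, and two-fold cutoff are hypothetical choices for illustration, not part of any particular array platform.

```python
import math

def log2_ratios(test, reference, pseudocount=1.0):
    """log2(test / reference) for every gene measured in both samples.

    The pseudocount (a hypothetical choice) avoids dividing by, or taking
    the log of, zero for spots with no detectable signal.
    """
    return {
        gene: math.log2((test[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in test
        if gene in reference
    }

def differentially_expressed(ratios, threshold=1.0):
    """Genes whose |log2 ratio| is at least `threshold` (1.0 = two-fold)."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= threshold)

# Hypothetical background-corrected spot intensities (arbitrary units).
test_sample = {"geneA": 1599.0, "geneB": 499.0, "geneC": 99.0}
reference_sample = {"geneA": 399.0, "geneB": 519.0, "geneC": 399.0}

ratios = log2_ratios(test_sample, reference_sample)
print(differentially_expressed(ratios))  # ['geneA', 'geneC']
```

Log ratios are used here because a two-fold increase (+1) and a two-fold decrease (-1) become symmetric around zero, which makes up- and down-regulated genes easy to compare.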

The transcriptomes of stem cells and cancer cells are of particular interest to researchers who seek to understand the processes of cellular differentiation and carcinogenesis. Analysis of the transcriptomes of human oocytes and embryos is used to understand the molecular mechanisms and signaling pathways controlling early embryonic development, and could in principle become a powerful tool for selecting embryos during in vitro fertilization.

DNA microarray principle: The core principle behind microarrays is hybridization between two DNA strands, the property of complementary nucleic acid sequences to specifically pair with each other by forming hydrogen bonds between complementary nucleotide base pairs.
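
Base pairing can be sketched in a few lines of Python. The example below copies an mRNA into cDNA the way reverse transcriptase does and then checks whether a probe would pair with it, under the simplifying assumption that hybridization requires a perfect, antiparallel match (real probes tolerate some mismatches, and binding also depends on temperature and salt). The sequences and function names are illustrative.

```python
# Base pairing used when reverse transcriptase copies RNA into DNA
# (uracil in RNA pairs with adenine; adenine pairs with thymine in DNA).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna: str) -> str:
    """cDNA (written 5'->3') complementary to an mRNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna.upper()))

def reverse_complement(dna: str) -> str:
    """The antiparallel complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))

def hybridizes(probe: str, target: str) -> bool:
    """Simplification: True only for a perfect antiparallel match."""
    return probe.upper() == reverse_complement(target)

mrna = "AUGGCCUUA"               # hypothetical transcript fragment
cdna = reverse_transcribe(mrna)  # 'TAAGGCCAT'
print(hybridizes("ATGGCCTTA", cdna))  # True: probe has the mRNA's sense sequence
print(hybridizes("ATGGCCTTT", cdna))  # False: one mismatched base
```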

Proteomics

Proteomics is the large-scale study of proteins, particularly their structures and functions.

Learning Objectives

Summarize the purpose of, and methods used for, proteomics

Key Takeaways

Key Points

  • The proteome is the entire complement of proteins, including the modifications made to a particular set of proteins, produced by an organism or system.
  • The proteome varies with time and distinct requirements, or stresses, that a cell or organism undergoes.
  • Proteomics typically gives us a better understanding of an organism than genomics.

Key Terms

  • proteomics: The branch of molecular biology that studies the set of proteins expressed by the genome of an organism.
  • genomics: The study of the complete genome of an organism.
  • mass spectrometry: An analytical technique that measures the mass/charge ratio of the ions formed when a molecule or atom is ionized, vaporized, and introduced into a vacuum. Mass spectrometry may also involve breaking molecules into fragments, which enables their structure to be determined.

Proteomics is the large-scale study of proteins, particularly their structures and functions. The proteome is the entire complement of proteins, including the modifications made to a particular set of proteins, produced by an organism or system. This will vary with time and distinct requirements, or stresses, that a cell or organism undergoes.

While proteomics generally refers to the large-scale experimental analysis of proteins, the term is often used more specifically to refer to protein purification and mass spectrometry. After genomics and transcriptomics, proteomics is considered the next step in the study of biological systems. It is much more complicated than genomics, mostly because an organism’s genome is more or less constant, whereas the proteome differs from cell to cell and from time to time. This is because distinct genes are expressed in distinct cell types, which means that even the basic set of proteins produced in a cell needs to be determined. In the past, this was done by mRNA analysis, but mRNA levels were found not to correlate closely with protein content. It is now known that mRNA is not always translated into protein, and the amount of protein produced for a given amount of mRNA depends on the gene it is transcribed from and on the current physiological state of the cell.

Proteomics confirms the presence of a protein and provides a direct measure of the quantity present. Differences arise not only during translation from mRNA: many proteins are also chemically modified after translation, by phosphorylation, ubiquitination, methylation, acetylation, glycosylation, oxidation, or nitrosylation, and these modifications are often critical to the protein’s function. Some proteins undergo all of these modifications, often in time-dependent combinations, which illustrates the potential complexity involved in studying protein structure and function.

Proteomics typically gives us a better understanding of an organism than genomics. First, the level of transcription of a gene gives only a rough estimate of its level of expression into a protein: an mRNA produced in abundance may be degraded rapidly or translated inefficiently, resulting in a small amount of protein. Second, as mentioned above, many proteins undergo post-translational modifications that profoundly affect their activities; for example, some proteins are not active until they become phosphorylated. Third, many genes give rise to more than one protein through alternative splicing or alternative post-translational modifications. Fourth, many proteins form complexes with other proteins or RNA molecules and function only in the presence of these partners. Finally, the rate at which a protein is degraded strongly influences how much of it is present in the cell.
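
The first and last points can be illustrated with a common simplified kinetic model, used here as an assumption rather than a description of any specific gene: protein is made at a rate proportional to mRNA abundance and removed by first-order degradation, so dP/dt = k_tl * m - k_deg * P, which settles at P = k_tl * m / k_deg. The Python sketch below computes that steady state and checks it with a simple forward-Euler simulation; all parameter values are hypothetical.

```python
import math

def steady_state_protein(mrna, k_translation, half_life_h):
    """Steady state of dP/dt = k_translation * mrna - k_deg * P."""
    k_deg = math.log(2) / half_life_h  # first-order degradation rate (per hour)
    return k_translation * mrna / k_deg

def simulate_protein(mrna, k_translation, half_life_h, hours, dt=0.01):
    """Forward-Euler integration of the same model, starting from zero protein."""
    k_deg = math.log(2) / half_life_h
    protein = 0.0
    for _ in range(int(round(hours / dt))):
        protein += (k_translation * mrna - k_deg * protein) * dt
    return protein

# Hypothetical genes: X has 10x more mRNA than Y, but its mRNA is translated
# less efficiently and its protein is degraded much faster.
gene_x = dict(mrna=100.0, k_translation=2.0, half_life_h=1.0)
gene_y = dict(mrna=10.0, k_translation=5.0, half_life_h=20.0)

for name, params in [("X", gene_x), ("Y", gene_y)]:
    print(name, round(steady_state_protein(**params)))  # X 289, then Y 1443
```

Under these made-up parameters, gene Y ends up with five times more protein than gene X despite having ten times less mRNA, which is the kind of mismatch that makes direct protein measurement necessary.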

One way to study a particular protein modification is to develop an antibody specific to that modification. For example, there are antibodies, known as phospho-specific antibodies, that recognize certain proteins only when they are tyrosine-phosphorylated. Antibodies specific to other modifications also exist. These can be used to determine the set of proteins that have undergone the modification of interest. For more quantitative determinations of protein amounts, techniques such as ELISAs (enzyme-linked immunosorbent assays) can be used.

Most proteins function in collaboration with other proteins, so one goal of proteomics is to identify which proteins interact. This is especially useful in determining potential partners in cell signaling cascades. Several methods are available to probe protein–protein interactions. The traditional method is yeast two-hybrid analysis. Newer methods include protein microarrays, immunoaffinity chromatography followed by mass spectrometry, dual polarisation interferometry, microscale thermophoresis, and phage display, as well as computational methods.
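
Once interactions have been detected, for example as pairs of hits from a two-hybrid screen, they are naturally stored as an undirected network in which candidate signaling routes become paths. The Python sketch below uses the textbook Ras-Raf-MEK-ERK kinase cascade purely as illustrative input; the function names are hypothetical, and breadth-first search is just one simple way to query such a network.

```python
from collections import deque

def build_network(pairs):
    """Undirected interaction network: protein -> set of binding partners."""
    network = {}
    for a, b in pairs:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network

def shortest_path(network, start, goal):
    """Breadth-first search for the shortest chain of interactions, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for partner in sorted(network.get(node, ())):
            if partner not in previous:
                previous[partner] = node
                queue.append(partner)
    return None

# Illustrative interaction pairs from a well-known signaling cascade.
pairs = [("Ras", "Raf"), ("Raf", "MEK"), ("MEK", "ERK")]
network = build_network(pairs)
print(shortest_path(network, "Ras", "ERK"))  # ['Ras', 'Raf', 'MEK', 'ERK']
```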

Robotic preparation of MALDI mass spectrometry samples: Matrix-assisted laser desorption/ionization (MALDI) is a soft ionization technique used in mass spectrometry. It allows for the analysis of biomolecules and large organic molecules which tend to be fragile and fragment when ionized by more conventional ionization methods.

One of the most promising developments to come from the study of human genes and proteins has been the identification of potential new drugs for the treatment of disease. This approach uses genome and proteome information to identify proteins associated with a disease; these proteins can then be used, with the help of computer software, as targets for new drugs. For example, if a certain protein is implicated in a disease, its 3-D structure provides the information needed to design drugs that interfere with the protein’s action. A molecule that fits the active site of an enzyme but cannot be released by the enzyme will inactivate it. Understanding the proteome, the structure and function of each protein, and the complexities of protein–protein interactions will be critical for developing the most effective diagnostic techniques and disease treatments in the future. Proteomics is also used to diagnose disease through specific protein biomarkers: a number of techniques test for proteins produced during a particular disease, which helps to diagnose it quickly.

Metabolomics

Metabolomics is the scientific study of chemical processes involving metabolites.

Learning Objectives

Review metabolomics

Key Takeaways

Key Points

  • The metabolome represents the collection of all metabolites in a biological cell, tissue, organ or organism, which are the end products of cellular processes.
  • Metabolites are the intermediates and products of metabolism.
  • The metabolome forms a large network of metabolic reactions, where outputs from one enzymatic chemical reaction are inputs to other chemical reactions.
  • NMR spectroscopy and mass spectrometry are the most widely used techniques to identify metabolites.

Key Terms

  • metabolomics: The study of the range of metabolites present in a person’s body at normal times, and when suffering from specific diseases; may be useful as a diagnostic tool
  • mass spectrometry: An analytical technique that measures the mass/charge ratio of the ions formed when a molecule or atom is ionized, vaporized, and introduced into a vacuum. Mass spectrometry may also involve breaking molecules into fragments, which enables their structure to be determined.
  • metabolite: Any substance produced by, or taking part in, a metabolic reaction.

Metabolomics is the scientific study of chemical processes involving metabolites. The metabolome represents the collection of all metabolites, which are the end products of cellular processes, in a biological cell, tissue, organ, or organism. Because of this, metabolic profiling can give an instantaneous snapshot of the physiology of a cell, whereas mRNA gene expression data and proteomic analyses do not tell the whole story of what might be happening in it. One of the challenges of systems biology and functional genomics is to integrate proteomic, transcriptomic, and metabolomic information to give a more complete picture of living organisms.

History and Development

The idea that biological fluids reflect the health of an individual has existed for a long time. The term “metabolic profile” was introduced by Horning et al. in 1971, after they demonstrated that gas chromatography–mass spectrometry (GC-MS) could be used to measure compounds present in human urine and tissue extracts. GC-MS is a method that combines the features of gas-liquid chromatography and mass spectrometry to identify different substances within a test sample. Concurrently, NMR spectroscopy, which was discovered in the 1940s, was also undergoing rapid advances. In 1974, Seeley et al. demonstrated the utility of using NMR to detect metabolites in unmodified biological samples. This first study, on muscle tissue, highlighted the value of NMR by showing that 90% of cellular ATP is complexed with magnesium. As sensitivity has improved with higher magnetic field strengths and magic-angle spinning, NMR continues to be a leading analytical tool for investigating metabolism.

Gas Chromatography–mass spectrometry: Gas chromatography–mass spectrometry (GC-MS) is a method that combines the features of gas-liquid chromatography and mass spectrometry to identify different substances within a test sample.

In 2005, the first metabolomics web database for characterizing human metabolites, METLIN, was developed in the Siuzdak laboratory at The Scripps Research Institute. METLIN contained over 10,000 metabolites and tandem mass spectral data. On January 23, 2007, the Human Metabolome Project, led by Dr. David Wishart of the University of Alberta, Canada, completed the first draft of the human metabolome, consisting of a database of approximately 2500 metabolites, 1200 drugs and 3500 food components.

As late as mid-2010, metabolomics was still considered an “emerging field”. It was also noted that progress in the field had come in large part from addressing otherwise “irresolvable technical challenges” through the technical evolution of mass spectrometry instrumentation. The term metabolomics was coined by analogy with transcriptomics and proteomics. Like the transcriptome and the proteome, the metabolome is dynamic, changing from second to second. Although the metabolome can be defined readily enough, it is not currently possible to analyse the entire range of metabolites with a single analytical method.

Metabolites are the intermediates and products of metabolism. Within the context of metabolomics, a metabolite is usually defined as any molecule less than 1 kDa in size. However, there are exceptions to this, depending on the sample and detection method. Macromolecules such as lipoproteins and albumin are reliably detected in NMR-based metabolomics studies of blood plasma. In plant-based metabolomics, it is common to refer to “primary metabolites,” which are directly involved in growth, development and reproduction, and “secondary metabolites,” which are indirectly involved in growth, development and reproduction. In contrast, in human-based metabolomics it is more common to describe metabolites as being either endogenous (produced by the host organism) or exogenous (originating outside the organism, such as drugs or dietary compounds). The metabolome forms a large network of metabolic reactions, where outputs from one enzymatic chemical reaction are inputs to other chemical reactions. Such systems have been described as hypercycles.
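
The network idea can be sketched in Python by recording each reaction's substrates and products and then following outputs to the reactions that consume them. The reactions below are a simplified fragment of glucose metabolism (the first steps of glycolysis plus the branch point into the pentose phosphate pathway), with cofactors such as ATP and ADP deliberately left out so that they do not link every metabolite to every other; the function names are hypothetical.

```python
from collections import defaultdict, deque

# Simplified reactions as (substrates, products); cofactors omitted.
REACTIONS = [
    (["glucose"], ["glucose-6-phosphate"]),                         # hexokinase
    (["glucose-6-phosphate"], ["fructose-6-phosphate"]),            # isomerase
    (["glucose-6-phosphate"], ["6-phosphoglucono-delta-lactone"]),  # pentose phosphate branch
    (["fructose-6-phosphate"], ["fructose-1,6-bisphosphate"]),      # phosphofructokinase
    (["fructose-1,6-bisphosphate"],
     ["dihydroxyacetone phosphate", "glyceraldehyde-3-phosphate"]),  # aldolase
]

def build_graph(reactions):
    """Map each metabolite to the set of metabolites produced from it."""
    graph = defaultdict(set)
    for substrates, products in reactions:
        for substrate in substrates:
            graph[substrate].update(products)
    return graph

def downstream(graph, start):
    """All metabolites reachable from `start` by following reaction outputs."""
    seen, queue = set(), deque([start])
    while queue:
        for product in graph.get(queue.popleft(), ()):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen

graph = build_graph(REACTIONS)
print(sorted(downstream(graph, "fructose-6-phosphate")))
```

Because glucose-6-phosphate feeds two reactions, a perturbation at that node can propagate down both branches, which is the sense in which the metabolome behaves as a network rather than a list.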

Separation methods: Gas chromatography, especially when interfaced with mass spectrometry (GC-MS), is one of the most widely used and powerful methods. It offers very high chromatographic resolution, but requires chemical derivatization for many biomolecules: only volatile chemicals can be analysed without derivatization.

Detection methods: Mass spectrometry (MS) is used to identify and to quantify metabolites after separation. Surface-based mass analysis has seen a resurgence in the past decade, with new MS technologies focused on increasing sensitivity, minimizing background, and reducing sample preparation.
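
A first step in identifying a metabolite by mass spectrometry is comparing an observed mass-to-charge ratio with theoretical values for candidate compounds. The Python sketch below assumes singly protonated ions ([M+H]+), uses standard monoisotopic element masses, and applies a 5 ppm tolerance; the candidate list, the observed peaks, and the tolerance are illustrative choices, and real pipelines also consider other adducts, isotope patterns, and fragmentation (MS/MS) spectra.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
PROTON = 1.007276466812  # mass of a proton (Da)

def monoisotopic_mass(formula):
    """Neutral monoisotopic mass from a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def mz(formula, charge=1):
    """m/z of the protonated ion [M+zH]z+."""
    return (monoisotopic_mass(formula) + charge * PROTON) / charge

def match_peak(observed_mz, candidates, tol_ppm=5.0):
    """Names of candidates whose theoretical m/z lies within tol_ppm."""
    hits = []
    for name, formula in candidates.items():
        theoretical = mz(formula)
        error_ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append(name)
    return hits

CANDIDATES = {"glucose": "C6H12O6", "alanine": "C3H7NO2", "citric acid": "C6H8O7"}
print(match_peak(181.0707, CANDIDATES))  # ['glucose']
```

Because only the elemental formula enters the calculation, isomers such as glucose and fructose (both C6H12O6) give identical m/z values, which is one reason mass spectrometry is usually coupled to a separation step such as chromatography.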

Statistical methods: The data generated in metabolomics usually consist of measurements performed on subjects under various conditions. These measurements may be digitized spectra, or a list of metabolite levels. In its simplest form this generates a matrix with rows corresponding to subjects and columns corresponding to metabolite levels.
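
The matrix described above can be held as a list of rows in plain Python. A common preprocessing step before multivariate analysis, shown here as one option among several rather than a required method, is to log-transform the levels and then autoscale each metabolite column to zero mean and unit variance, so that abundant and trace metabolites carry comparable weight. The subjects, conditions, and values are hypothetical.

```python
import math
import statistics

def log_transform(matrix):
    """log2 of every entry; assumes all levels are strictly positive."""
    return [[math.log2(x) for x in row] for row in matrix]

def autoscale(matrix):
    """Mean-center each column and divide by its sample standard deviation."""
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in matrix
    ]

# Hypothetical levels: rows = subjects, columns = metabolites (arbitrary units).
metabolites = ["lactate", "glucose", "citrate"]
levels = [
    [2.0, 8.0, 1.0],   # subject 1, condition A
    [4.0, 8.0, 2.0],   # subject 2, condition A
    [8.0, 4.0, 4.0],   # subject 3, condition B
    [16.0, 4.0, 8.0],  # subject 4, condition B
]

scaled = autoscale(log_transform(levels))
for name, column in zip(metabolites, zip(*scaled)):
    print(name, [round(v, 2) for v in column])
```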

Key applications:

  • Toxicity assessment/toxicology. Metabolic profiling, especially of urine or blood plasma samples, can be used to detect the physiological changes caused by toxic insult of a chemical or mixture of chemicals. This is of particular relevance to pharmaceutical companies wanting to test the toxicity of potential drug candidates.
  • Functional genomics. Metabolomics can be an excellent tool for determining the phenotype caused by a genetic manipulation, such as gene deletion or insertion. Sometimes this can be a sufficient goal in itself, for instance to detect any phenotypic changes in a genetically modified plant intended for human or animal consumption. More exciting is the prospect of predicting the function of unknown genes by comparison with the metabolic perturbations caused by deletion/insertion of known genes.