Gene Expression: Transcription
The majority of genes are expressed as the proteins they encode.
The process occurs in two steps:
- Transcription = DNA -> RNA
- Translation = RNA -> protein
Taken together, they make up the "central dogma" of biology: DNA -> RNA -> protein.
This page examines the first step:
Gene Transcription: DNA -> RNA
DNA serves as the template for the synthesis of RNA much as it does for its own replication.
The Steps
- several protein transcription factors bind to promoter sites, usually on the 5' side of the gene to be transcribed
- an enzyme, RNA polymerase, binds to the complex of transcription factors
- working together, they open the DNA double helix
- RNA polymerase proceeds down one strand moving in the 3' -> 5' direction
- as it does so, it assembles ribonucleotides (supplied as triphosphates, e.g., ATP) into a strand of RNA
- each ribonucleotide is inserted into the growing RNA strand following the rules of base pairing. Thus for each C encountered on the DNA strand, a G is inserted in the RNA; for each G, a C; and for each T, an A. However, each A on the DNA guides the insertion of the pyrimidine uracil (U, from uridine triphosphate, UTP). There is no T in RNA.
- synthesis of the RNA proceeds in the 5' -> 3' direction.
- as each nucleoside triphosphate is brought in to add to the 3' end of the growing strand, the two terminal phosphates are removed
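The base-pairing rules in the steps above can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the example sequence are made up here, and the template strand is assumed to be written in the order the polymerase reads it (3' -> 5'), so the RNA comes out 5' -> 3'.

```python
# Base-pairing rules from the text: DNA template base -> RNA base inserted.
PAIRING = {"C": "G", "G": "C", "T": "A", "A": "U"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5' -> 3') copied from a DNA template strand.

    The template is given in the order RNA polymerase reads it (3' -> 5'),
    so walking it left to right builds the RNA in the 5' -> 3' direction.
    """
    rna = []
    for base in template_3_to_5.upper():
        if base not in PAIRING:
            raise ValueError(f"not a DNA base: {base!r}")
        rna.append(PAIRING[base])  # each new ribonucleotide joins the 3' end
    return "".join(rna)


# Hypothetical 9-base template, read 3' -> 5'.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

Note that the output contains U wherever the template had A, and no T at all.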
Note that at any place in a DNA molecule, either strand may be serving as the template; that is, some genes "run" one way, some the other (and in a few remarkable cases, the same segment of double helix contains genetic information on both strands!). In all cases, however, RNA polymerase proceeds along a strand in its 3' -> 5' direction.
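To make the point about either strand serving as template concrete, here is a small sketch. It assumes a stretch of double helix is described by its "top" strand written 5' -> 3'; the labels "top" and "bottom" and the function name `rna_from_duplex` are illustrative choices, not standard terms from this page.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rna_from_duplex(top_5_to_3, template="bottom"):
    """Return the RNA (5' -> 3') made from one strand of a DNA duplex.

    top_5_to_3: the top strand, written 5' -> 3'.
    template:   which strand the polymerase reads ("bottom" or "top").
    """
    top = top_5_to_3.upper()
    if template == "bottom":
        # The bottom strand is complementary to the top, so the RNA copied
        # from it has the same sequence as the top strand, with U for T.
        return top.replace("T", "U")
    if template == "top":
        # Reading the top strand 3' -> 5' means moving right to left, so the
        # RNA is the reverse complement of the top strand, with U for T.
        complement = "".join(DNA_COMPLEMENT[b] for b in reversed(top))
        return complement.replace("T", "U")
    raise ValueError("template must be 'top' or 'bottom'")


print(rna_from_duplex("ATGGCC", template="bottom"))  # AUGGCC
print(rna_from_duplex("ATGGCC", template="top"))     # GGCCAU
```

In both cases the polymerase moves along its template 3' -> 5'; only the choice of template differs.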
Several types of RNA are synthesized:
- messenger RNA (mRNA). This will later be translated into a polypeptide.
- ribosomal RNA (rRNA). This will be used in the building of ribosomes: machinery for synthesizing proteins by translating mRNA.
- transfer RNA (tRNA). RNA molecules that carry amino acids to the growing polypeptide.
- small nuclear RNA (snRNA). DNA transcription of the genes for mRNA, rRNA, and tRNA produces large precursor molecules ("primary transcripts") that must be processed within the nucleus to produce the functional molecules for export to the cytosol. Some of these processing steps are mediated by snRNAs.
Ribosomal RNA (rRNA)
There are 4 kinds of rRNA. In eukaryotes, these are
- 18S rRNA. One of these molecules, along with some 30 different protein molecules, is used to make the small subunit of the ribosome.
- 28S, 5.8S, and 5S rRNA. One of each of these molecules, along with some 45 different proteins, is used to make the large subunit of the ribosome.
The name given each type of rRNA reflects the rate at which the molecules sediment in the ultracentrifuge. The larger the number, the larger the molecule (but not proportionally).
The 28S, 18S, and 5.8S molecules are produced by the processing of a single primary transcript from a cluster of identical copies of a single gene. The 5S molecules are produced from a different cluster of identical genes.
Transfer RNA (tRNA)
There are some 32 different kinds of tRNA in a typical eukaryotic cell.
- each is the product of a separate gene
- they are small (~4S), containing 73-93 nucleotides
- many of the bases in the chain pair with each other forming sections of double helix
- the unpaired regions form 3 loops
- each kind of tRNA carries (at its 3' end) one of the 20 amino acids (thus most amino acids have more than one tRNA responsible for them)
- at one loop, 3 unpaired bases form an anticodon
- base pairing between the anticodon and the complementary codon on an mRNA molecule brings the correct amino acid into the growing polypeptide chain. Further details of this process are described in the discussion of translation.
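Codon-anticodon pairing can be illustrated with a short sketch. It assumes strict complementary pairing at all three positions, which is a simplification, and the function names are invented for this example. Because the two RNA strands pair in opposite orientations, the anticodon written 5' -> 3' is the reverse complement of the codon.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon_5_to_3):
    """Return the fully complementary anticodon, written 5' -> 3'."""
    # Pairing is antiparallel, so reverse the codon and complement each base.
    return "".join(RNA_COMPLEMENT[b] for b in reversed(codon_5_to_3.upper()))


def pairs_with(anticodon_5_to_3, codon_5_to_3):
    """True if the anticodon is an exact complement of the codon."""
    return anticodon_for(codon_5_to_3) == anticodon_5_to_3.upper()


print(anticodon_for("AUG"))      # CAU
print(pairs_with("CAU", "AUG"))  # True
print(pairs_with("CAU", "AUC"))  # False
```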
Messenger RNA (mRNA)
Messenger RNA comes in a wide range of sizes reflecting the size of the polypeptide it encodes. Most cells produce small amounts of thousands of different mRNA molecules, each to be translated into a polypeptide needed by the cell. Many mRNAs are common to most cells, encoding "housekeeping" proteins needed by all cells (e.g., the enzymes of glycolysis). Other mRNAs are specific to certain types of cells. These encode proteins needed for the function of that particular cell (e.g., the mRNA for hemoglobin in the precursors of red blood cells).
Small Nuclear RNA (snRNA)
Approximately a dozen different genes for snRNAs, each present in multiple copies, have been identified. The snRNAs have various roles in the processing of the other classes of RNA. For example, several snRNAs are part of the spliceosome that participates in converting pre-mRNA into mRNA by excising the introns and splicing the exons. (See the discussion of RNA processing below.)
The RNA Polymerases
The RNA polymerases are huge multi-subunit protein complexes. Three kinds are found in eukaryotes.
- RNA polymerase I (Pol I). It transcribes the rRNA genes for the precursor of the 28S, 18S, and 5.8S molecules, and is the busiest of the RNA polymerases.
- RNA polymerase II (Pol II). It transcribes the mRNA and snRNA genes.
- RNA polymerase III (Pol III). It transcribes the 5S rRNA genes and all the tRNA genes.
RNA Processing: pre-mRNA -> mRNA
All the primary transcripts produced in the nucleus must undergo processing steps to produce functional RNA molecules for export to the cytosol. We shall confine ourselves to a view of the steps as they occur in the processing of pre-mRNA to mRNA.
The steps:
- Synthesis of the cap. This is a stretch of three modified nucleotides attached to the 5' end of the pre-mRNA.
- Synthesis of the poly(A) tail. This is a stretch of adenine nucleotides attached to the 3' end of the pre-mRNA.
- Step-by-step removal of introns present in the pre-mRNA and splicing of the remaining exons. This step is required because most eukaryotic genes are split.
Most eukaryotic genes are split into segments. In decoding the open reading frame of a gene for a known protein, one usually encounters periodic stretches of DNA calling for amino acids that do not occur in the actual protein product of that gene. Such stretches of DNA, which get transcribed into RNA but not translated into protein, are called introns. Those stretches of DNA that do code for amino acids in the protein are called exons.
Examples:
- the gene for one type of collagen found in chickens is split into 52 separate exons
- the gene for dystrophin, which is mutated in boys with muscular dystrophy, has 79 exons
- even the genes for rRNA and tRNA are split.
The cutting and splicing of mRNA must be done with great precision. If even one nucleotide is left over from an intron or one is removed from an exon, the reading frame from that point on will be shifted, producing new codons specifying a totally different sequence of amino acids from that point to the end of the molecule (which often ends prematurely anyway when the shifted reading frame generates a STOP codon).
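The effect of a single leftover nucleotide on the reading frame can be seen in a small sketch. The sequences below are invented for illustration, and `codons` is a hypothetical helper that simply reads triplets from the first base until it meets a STOP codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons(mrna):
    """Split an mRNA into triplets from its first base, stopping after STOP."""
    triplets = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        triplets.append(codon)
        if codon in STOP_CODONS:
            break
    return triplets


# Hypothetical correctly spliced message: six codons ending in a STOP.
correct = "AUGGCUGAAUUCGGAUAA"

# The same message with one extra nucleotide (a stray "G", assumed to be
# left over from an intron) after the first codon.
faulty = correct[:3] + "G" + correct[3:]

print(codons(correct))  # ['AUG', 'GCU', 'GAA', 'UUC', 'GGA', 'UAA']
print(codons(faulty))   # ['AUG', 'GGC', 'UGA']
```

Every codon after the insertion changes, and in this example the shifted frame runs into a STOP codon (UGA) after only three codons.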
The removal of introns and splicing of exons is carried out by the spliceosome. This is a complex of several snRNA molecules and several proteins.
The introns in most pre-mRNAs begin with a GU and end with an AG.
Presumably these short sequences are essential for guiding the spliceosome.
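As a rough sketch of what intron removal does to the sequence, the function below cuts out introns at given positions and checks that each one begins with GU and ends with AG. It is not a model of how the spliceosome finds its sites: the intron positions are supplied by hand, and all sequences and names are made up for the example.

```python
def splice(pre_mrna, introns):
    """Remove introns from a pre-mRNA and join the remaining exons.

    introns: list of (start, end) index pairs, end exclusive. Each intron
    must begin with GU and end with AG, as in most pre-mRNAs.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"span {start}-{end} is not a GU...AG intron")
        exons.append(pre_mrna[position:start])  # keep the exon before it
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical pre-mRNA: three exons separated by two introns.
exon1, intron1 = "AUGGCU", "GUAAGUCCUUUCAG"
exon2, intron2 = "GAAUUC", "GUGAGUAACAG"
exon3 = "GGAUAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Work out the intron boundaries from the lengths of the pieces.
a = len(exon1)
b = a + len(intron1)
c = b + len(exon2)
d = c + len(intron2)

mrna = splice(pre_mrna, [(a, b), (c, d)])
print(mrna)  # AUGGCUGAAUUCGGAUAA
```

Shifting either boundary by even one position makes the check fail, which echoes the need for precision described above.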
Alternative Splicing
The processing of pre-mRNA for many proteins proceeds along various paths in different cells or under different conditions. For example, early in the differentiation of a B cell (a lymphocyte that synthesizes an antibody) the cell first uses an exon that encodes a transmembrane domain that causes the molecule to be retained at the cell surface. Later, the B cell switches to using a different exon whose domain enables the protein to be secreted from the cell as a circulating antibody molecule.
So, whether a particular segment of RNA will be retained as an exon or excised as an intron can vary under different circumstances. Clearly the switching to an alternate splicing pathway must be closely regulated.
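The idea that one primary transcript can yield different messages can be sketched by treating the pre-mRNA as an ordered list of named segments. The segment names below are placeholders loosely inspired by the B-cell example and are not a real map of an antibody gene.

```python
# Hypothetical segments of one pre-mRNA, in 5' -> 3' order.
PRE_MRNA_SEGMENTS = ["V", "C1", "C2", "secreted_tail", "membrane_anchor"]


def mature_transcript(segments, retained):
    """Return the segments kept as exons, in their original order.

    Anything not listed in `retained` is treated as an intron and removed.
    """
    unknown = set(retained) - set(segments)
    if unknown:
        raise ValueError(f"unknown segments: {sorted(unknown)}")
    return [name for name in segments if name in retained]


# Early in differentiation: keep the exon for the transmembrane domain.
membrane_form = mature_transcript(
    PRE_MRNA_SEGMENTS, {"V", "C1", "C2", "membrane_anchor"}
)

# Later: keep the exon that lets the protein be secreted instead.
secreted_form = mature_transcript(
    PRE_MRNA_SEGMENTS, {"V", "C1", "C2", "secreted_tail"}
)

print(membrane_form)  # ['V', 'C1', 'C2', 'membrane_anchor']
print(secreted_form)  # ['V', 'C1', 'C2', 'secreted_tail']
```

The same segment is an exon in one pathway and is excised in the other; the choice of which set to retain stands in for the regulation mentioned above.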
Why split genes?
Perhaps during evolution, eukaryotic genes have been assembled from smaller, primitive genes - today's exons. Some proteins, like the antibodies mentioned in the previous section, are organized in a set of separate sections or domains each with a special function to perform in the complete molecule. Each domain is encoded by a separate exon. Having the different functional parts of the antibody molecule encoded by separate exons makes it possible to use these units in different combinations. Thus a set of exons in the genome may be the genetic equivalent of the various modular pieces in a box of "Lego" for children to assemble in whatever forms they wish.
But the boundaries of other exons do not seem to correspond to domain boundaries of the protein. Furthermore, rRNA and tRNA genes are also split, and these do not encode proteins. So perhaps some introns are simply "junk" DNA that was inserted into the gene at some point in evolution without causing any harm.
Summary
Gene expression occurs in two steps:
- transcription of the information encoded in DNA into a molecule of RNA (described here) and
- translation of the information encoded in the nucleotides of mRNA into a defined sequence of amino acids in a protein (discussed in Gene Translation: RNA -> Protein).
24 June 1999