Gene expression and transcription


The genome contains the hereditary information that determines the structure and function of a cell or organism. This information is stored as a sequence of bases in DNA. A relatively small percentage of DNA codes for proteins and ribonucleic acids (RNAs), while a large part of the genome is composed of sequences without a clear function. The conversion of the information stored in DNA into functional molecules (RNA and proteins) is termed gene expression. Gene expression occurs in two stages: transcription and translation. During transcription, DNA is copied into RNA. RNA is then used to synthesize proteins during translation.

Key enzymes involved in transcription are DNA-dependent RNA polymerases. These enzymes synthesize the RNA molecule based on the genes encoded in DNA, which contain starting sites (promoters) where transcription begins. Transcription factors are required to recognize the promoter. RNA polymerase moves along the template strand of the double-stranded DNA. The RNA strand is synthesized until the end of the DNA segment (termination site) is reached. In eukaryotes, the newly formed primary transcript is further modified before it can be used, for example, for protein synthesis.

Gene expression is strongly regulated at all levels. Some genes are expressed in all cells and are required as housekeeping genes for basic cellular functions (i.e., constitutive expression). Other genes are only active in certain cells; their expression is regulated by a variety of mechanisms. Genes can undergo activation or silencing, and transcription depends on the presence of specific DNA-binding proteins. The newly formed RNA may also be degraded after transcription by various mechanisms before use in protein synthesis. There are also regulatory mechanisms at a translational level. Although each cell in an organism contains the same DNA, the regulated expression of certain genes causes the cells to specialize and assume different functions, e.g., muscle cells or hepatocytes.


In protein synthesis, DNA is initially transcribed into mRNA (transcription) and mRNA is translated into an amino acid chain (translation)!


In transcription, DNA serves as a template to produce a complementary RNA molecule. Only a single strand of the double-stranded DNA (dsDNA) is read.

RNA polymerases and transcription factors

RNA polymerases

Transcription reactions are catalyzed by (DNA-dependent) RNA polymerases. In eukaryotic cells, there are various types of RNA polymerase. They recognize different promoter types and transcribe different types of genes.

  • Structure: composed of two large subunits with many polypeptide chains
  • Function: synthesis of a new RNA strand from 5' to 3' direction; reading of the DNA strand from 3' to 5' direction
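Type of RNA polymerase | Transcripts | Location
RNA polymerase I | rRNA | Nucleolus
RNA polymerase II | mRNA | Nucleoplasm
RNA polymerase III | tRNA | Nucleoplasm
Mitochondrial RNA polymerase | Mitochondrial RNA | Mitochondria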

RNA polymerase II transcribes almost all genes that code for proteins!

The RNA polymerases are numbered in the order in which their products are utilized in the process of protein synthesis! I, II, and III → rRNA, mRNA, and tRNA, respectively

In prokaryotes, there is only one type of RNA polymerase that transcribes all three types of RNA!

Transcription factors

RNA polymerases require helper proteins for promoter recognition of the genes to be transcribed.

DNA-binding proteins

Proteins that bind to DNA, such as transcription factors, require specific protein domains, also termed structural motifs. These structural motifs usually use either an α-helix or a β-sheet to bind to the major groove of DNA. Transcription factors have DNA-binding domains through which they interact with specific DNA segments to perform their function. Numerous structural motifs of DNA-binding domains have been identified. Important examples are zinc finger domains, leucine zippers, the basic helix-loop-helix, and the homeobox.

An important structural motif of DNA-binding proteins is an α-helix with many basic amino acid residues! [1]

Stages of transcription

Transcription is divided into three phases: initiation, elongation, termination.

  1. Initiation (transcription): the start of transcription by the formation of the initiation complex and unwinding of DNA
    1. Preinitiation complex (RNA polymerase-promoter closed complex) formation by binding of general transcription factors and RNA polymerase to the promoter region (e.g., TATA box, CAAT box, GC box)
    2. Formation of a transcription bubble (open complex) by unwinding the DNA double helix into single strands over a length of 10–12 bases
    3. Start of RNA synthesis
  2. Elongation: extension of the RNA strand
    • 3'OH group of the growing RNA strand is attached to the α-phosphate group of the next complementary nucleoside triphosphate
  3. Termination: end of RNA synthesis once the termination site is reached; polyadenylation starts during termination

During transcription, base pairing occurs between DNA and RNA. Uracil (instead of thymine) in RNA pairs with adenine in DNA!

RNA and DNA pair in an antiparallel direction. The 5' end of one strand lies opposite the 3' end of the other strand and vice versa. In both cases, the base sequences are written in the usual 5'→3' direction!
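The pairing rules and the antiparallel orientation described above can be illustrated with a minimal Python sketch. The function name `transcribe` and the example sequence are hypothetical and chosen only for illustration; the template strand is assumed to be written in the usual 5'→3' notation.

```python
# RNA base that pairs with each base of the DNA template strand.
# Uracil (not thymine) pairs with adenine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA, written 5'->3', that is complementary to a DNA
    template strand given in 5'->3' notation."""
    # The template is read 3'->5', so the string is walked backwards;
    # the RNA is thereby built in its own 5'->3' direction.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


template = "TACGGT"  # hypothetical template strand, 5'->3'
rna = transcribe(template)
print(rna)  # ACCGUA
```

Because both sequences are written 5'→3', the RNA is the reverse complement of the template string rather than a position-by-position complement.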

Post-transcriptional modification (RNA processing)

In eukaryotes, the end product of transcription is heterogeneous nuclear RNA (hnRNA), which is then transformed into mature mRNA through post-transcriptional modifications in the nucleus. These modifications include capping, polyadenylation, splicing, and RNA editing. The mRNA then leaves the nucleus and enters the cytosol.




Splicing

  • Definition: excision of introns from hnRNA transcripts and direct linkage of exons
  • Function: removal of introns so that the resulting mRNA only contains relevant information in the form of exons
  • Process
    1. Spliceosome formation at the exon-intron border
      • Complex of:
        • Various snRNAs that are bound to proteins and form snRNPs (small nuclear ribonucleoproteins)
        • The hnRNA to be modified
        • Many other small proteins
      • Involved sequence segments on the hnRNA:
        • Exon-intron borders: characterized by specific base sequences (consensus sequences) on the RNA
          • 5' splice site
          • 3' splice site
        • Branch point: adenine nucleotide located in the intron, on which a lariat structure is formed (see below)
        • Pyrimidine-rich sequence in front of the 3' splice site
    2. Opening of the exon-intron border at the 5' splice site: A temporary lariat structure with a 2'→5' phosphodiester bond is formed, which brings the two ends to be joined into close proximity (loop formation)
    3. Opening of the exon-intron border at the 3' splice site
    4. Joining of the exon ends

The exons of a gene are the coding segments; the introns are removed from hnRNA by splicing!
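The net result of splicing (introns removed, exons joined directly) can be sketched in Python as a simple string operation. This is only an illustration under stated assumptions: the intron positions are passed in as known index pairs, whereas in the cell they are recognized via consensus sequences by the spliceosome, and the function name `splice` and the sequences are hypothetical.

```python
def splice(hnrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as (start, end) index pairs with `end` exclusive,
    and join the remaining exons in their original order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(hnrna[position:start])  # exon in front of this intron
        position = end                       # skip over the intron
    exons.append(hnrna[position:])           # exon after the last intron
    return "".join(exons)


# Hypothetical transcript: exon 1 + intron + exon 2
hnrna = "AUGGCC" + "GUAAGUCAG" + "UUUUAA"
mrna = splice(hnrna, [(6, 15)])
print(mrna)  # AUGGCCUUUUAA
```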

RNA editing

Alternative splicing

  • Definition: removal of introns within hnRNA with differential joining of exons
  • Process: similar to splicing with additional splicing factors that determine the range of splice locations
  • Function
    • Various proteins can be produced from one gene: increased information density of DNA
    • The formation of new proteins is facilitated: more rapid adaptation to altered living conditions.

The one gene-one enzyme hypothesis does not apply to eukaryotes! A variety of proteins can be formed from one gene by alternative splicing!
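How several mRNAs can arise from one gene can be sketched by enumerating exon combinations. The Python example below is a simplification that models only one possible form of differential exon joining, namely skipping of optional exons; the function name `isoforms`, the exon sequences, and the choice of which exons are optional are assumptions made for illustration.

```python
from itertools import combinations


def isoforms(exons, optional):
    """Yield every mRNA obtained by keeping or skipping the optional exons.

    exons:    exon sequences in their order within the gene
    optional: indices of exons that may be skipped
    """
    optional = sorted(optional)
    for count in range(len(optional) + 1):
        for skipped in combinations(optional, count):
            # Exon order is preserved; only the skipped exons are left out.
            yield "".join(seq for i, seq in enumerate(exons) if i not in skipped)


exons = ["AUG", "GCC", "UUU", "UAA"]  # hypothetical exons
variants = list(isoforms(exons, optional={1, 2}))
print(variants)  # ['AUGGCCUUUUAA', 'AUGUUUUAA', 'AUGGCCUAA', 'AUGUAA']
```

With two optional exons, one gene yields four different transcripts in this sketch, which illustrates the increased information density mentioned above.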

Quality control of mRNA

Regulation of transcription

Because transcription and protein synthesis require large amounts of energy, gene expression is strongly regulated. While some genes are continuously transcribed, other genes undergo regulation.

Prokaryotic gene regulation (operon model)

Regulation of gene expression was first analyzed in E. coli. Regulatory sequences in the bacterial genome ensure that the enzyme β-galactosidase is expressed when the sugar lactose is available as an energy source. Other proteins involved in lactose metabolism are synthesized at the same time, so regulation involves the coordinated expression of several genes.

In the lac operon, the repressor binds to the operator and prevents transcription of the operon genes in the absence of lactose.
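The repressor logic can be written as a small Boolean model in Python. This is a deliberately reduced sketch that covers only the repressor and lactose as described here; any other regulatory input is left out, and the function name is hypothetical.

```python
def lac_operon_transcribed(lactose_present: bool) -> bool:
    """Return True if the genes of the lac operon are transcribed."""
    # Without lactose, the repressor binds to the operator.
    repressor_on_operator = not lactose_present
    # A repressor on the operator prevents transcription of the operon genes.
    return not repressor_on_operator


for lactose in (False, True):
    print(f"lactose={lactose}: transcribed={lac_operon_transcribed(lactose)}")
```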

Eukaryotic gene regulation

Regulation of gene expression is considerably more complex in eukaryotes than in prokaryotes. One reason is the difference in genome size: eukaryotic genomes are significantly larger. Another reason is that eukaryotic DNA is strongly condensed in the nucleus and packaged as chromatin, which makes it less accessible than prokaryotic DNA. However, eukaryotes and prokaryotes share the importance of activators and repressors, which bind specific DNA sequences and increase or inhibit gene expression.

Transcriptional inhibitors

Transcriptional inhibitors are strong cytotoxins, but some can also be used as antibiotics.

Inhibitor Mechanism Occurrence/use
Actinomycin D
References:
  1. Cooper GM. The Cell: A Molecular Approach. Sunderland, MA: Sinauer Associates; 2000.
last updated 09/14/2020