Introduction to Various Tags in Protein Purification

Protein purification is a cornerstone technology in biochemistry, important both for basic biological research and for the development of biopharmaceuticals and diagnostics. The process separates a target protein from a complex biological mixture with the aim of obtaining a sample of high purity and activity. Tags are central to this technology: they are peptide sequences or protein domains fused to the target protein that serve as handles or markers for its detection, isolation, and purification. Because each tag provides a specific structural or sequence motif, the tagged protein is easy to distinguish from the rest of the mixture and to manipulate, which simplifies purification and improves the purity and overall yield of the target protein.

Designing an effective tag involves a few fundamental principles. First, the tag should bind specifically to a chosen affinity ligand or medium so that the target protein is captured selectively. Second, it should integrate into the target protein, whether at the N- or C-terminus or at a suitable surface position, without compromising the protein's native function or stability. Third, the tag should itself be soluble, stable, and biocompatible so that the target protein remains intact throughout purification.

The following sections describe the tags most commonly used in protein purification, covering each tag's features, applications, advantages, and disadvantages.

1. Histidine Tag

The His tag is one of the most commonly used protein purification tags. Its main advantages are ease of use, high specificity, and good purification performance. The tag consists of six consecutive histidine residues [1] and is purified by immobilized metal affinity chromatography (IMAC): the imidazole side chains of the histidines coordinate divalent metal ions such as Ni2+ that are chelated on a resin, most commonly Ni-NTA (nickel-nitrilotriacetic acid), so the tagged protein is retained while most other proteins flow through. In protein engineering, His tags are usually fused to the N- or C-terminus of the target protein, or inserted at specific surface positions, to facilitate subsequent purification and detection [2]. The His tag has the following advantages and disadvantages:

Advantages:

①High affinity: The His tag has a high affinity for nickel (Ni2+) and other transition metal ions, so it binds to resins or columns charged with these ions, allowing selective capture and purification of the target protein. ②Easy operation: Purification is relatively simple. In most cases the sample containing the His-tagged protein is loaded onto a nickel column, the column is washed, and the target protein is eluted. The procedure is therefore suitable for beginners and for rapid purification. ③Wide scope of application: The His tag is suitable for most recombinant proteins, including those expressed in bacteria, yeast, and other cell-based systems. ④Lower cost: Compared with some other affinity tags, such as GST tags or Strep tags, His-tag purification is relatively inexpensive and is suitable for large-scale protein purification.

Disadvantages:

①Non-specific binding: In some cases non-target proteins also bind to the Ni column, which lowers the purity and quality of the target protein. ②Possible effects on structure and function: A His tag fused at a particular region of the protein may alter its structure and function, especially when the tag lies near the active site. ③Condition optimization is required: Although His-tag purification is relatively simple, conditions sometimes have to be optimized to improve purification efficiency and selectivity.

In summary, the His tag, as a commonly used protein purification tag, has advantages such as ease of operation, wide applicability, and relatively low cost. However, it also has disadvantages such as non-specific binding and the potential to affect protein structure and function. Therefore, careful consideration and optimization are necessary when using it, depending on the specific circumstances.
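
The fusion of a six-histidine tag to either terminus, as described in this section, can be sketched at the sequence level as follows. This is only an illustration: the target sequence, the optional `GS` linker, and the function names `add_his_tag` and `find_his_runs` are hypothetical, and a real N-terminal construct would also need a start methionine in front of the tag.

```python
import re

HIS6 = "H" * 6  # six consecutive histidines, as described in the text


def add_his_tag(protein: str, terminus: str = "N", linker: str = "") -> str:
    """Return the protein sequence with a 6xHis tag fused at the chosen terminus."""
    protein = protein.upper()
    if terminus == "N":
        return HIS6 + linker + protein
    if terminus == "C":
        return protein + linker + HIS6
    raise ValueError("terminus must be 'N' or 'C'")


def find_his_runs(sequence: str, min_len: int = 6) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of histidine runs of at least min_len."""
    pattern = re.compile("H{%d,}" % min_len)
    return [match.span() for match in pattern.finditer(sequence.upper())]


target = "MKTAYIAKQR"  # hypothetical target sequence, for illustration only
n_fusion = add_his_tag(target, "N", linker="GS")  # assumed short flexible linker
c_fusion = add_his_tag(target, "C")

print(n_fusion, find_his_runs(n_fusion))
print(c_fusion, find_his_runs(c_fusion))
```

Scanning for histidine runs in the untagged target is also a quick way to spot native histidine-rich stretches, which is one plausible source of the non-specific binding mentioned above.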

2. Glutathione S-transferase Tag

The GST (glutathione S-transferase) tag is a relatively large protein tag from the transferase family that is widely used in molecular biology and biochemical research. It consists of about 220 amino acids and binds its substrate glutathione (GSH) with high specificity. Affinity purification exploits this interaction: the GST fusion protein is captured on a resin carrying immobilized glutathione, which binds the GST tag specifically, and is then eluted, typically with free glutathione [3]. The GST tag has the following advantages and disadvantages:

Advantages:

①High affinity: The GST tag binds glutathione resin with high affinity and specificity, enabling fast and efficient purification of the fusion protein [4]. ②High purity: The selective binding of the GST tag allows the target protein to be recovered from complex mixtures at high purity. ③Broad applicability: GST tags are suitable for a variety of expression systems, including bacterial, yeast, and mammalian cell systems, which makes them a common choice for protein purification. ④Convenient operation: Purification is usually simple: the GST fusion protein is bound to glutathione resin, the resin is washed, and the target protein is eluted.

Disadvantages:

①Tag size: The GST tag is relatively large (approximately 26 kDa) and may affect the structure and function of the fusion protein. ②Specificity: Although the GST tag has high affinity, non-specific binding may occur in some cases, resulting in co-purification of non-target proteins. ③Immunogenicity: Under certain experimental conditions the GST tag may induce immune responses, which requires particular attention in animal studies.

In summary, the GST tag is a versatile protein purification tool that offers efficient and selective affinity purification. However, when using the GST tag, researchers should consider its potential impact on protein structure and its immunogenicity.

3. Maltose-Binding Protein Tag

The MBP tag is a commonly used protein purification tag that is widely used to purify target proteins from expression systems such as bacteria and yeast. MBP is the maltose-binding protein of Escherichia coli, encoded by the malE gene, and is usually fused to the N- or C-terminus of the target protein or inserted at a specific site in it. MBP-tagged proteins are purified by affinity chromatography: they are captured selectively on an amylose resin and then eluted with maltose [5]. The MBP tag has the following advantages and disadvantages:

Advantages:

①High solubility and stability: MBP tags can increase the solubility of the target protein and improve its stability, helping to prevent aggregation and precipitation and thereby preserving the protein's native conformation and activity. ②High purity: By binding to the specific ligand on the affinity resin, the MBP tag allows the target protein to be purified efficiently and at high purity, yielding samples suitable for a range of biological experiments and applications [6]. ③Efficient expression: Target proteins fused to MBP are often expressed at higher levels, which helps improve the yield of the target protein. ④Convenient tag removal: When a protease recognition site is placed between the tag and the target, the MBP tag can be excised with a site-specific protease (such as TEV protease), separating the target protein from the tag.

Disadvantages:

①Tag size: The MBP tag is relatively large (about 42 kDa) and may affect the structure and function of the fusion protein, especially for structure-sensitive proteins. ②Immunogenicity: Under certain experimental conditions MBP tags may induce immune responses, so their potential immunogenicity should be considered, especially in animal studies. ③Digestion efficiency: Excising the MBP tag with a specific protease may be incomplete or generate by-products, affecting the purity and activity of the target protein.

In summary, the MBP tag is an effective protein purification tool that offers high solubility, stability, and purification efficiency and is suitable for a variety of expression systems. When using MBP tags for protein purification, researchers should weigh these advantages and disadvantages and choose the purification strategy that best fits their experimental needs.
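
To make the tag-size concern for GST and MBP concrete, the short calculation below estimates how much of a fusion protein's mass comes from the tag itself. It is a rough sketch: the tag masses are the approximate values quoted in this article, while the 30 kDa target and the helper name `tag_fraction` are hypothetical.

```python
# Approximate tag masses in kDa, as quoted in this article.
TAG_MASS_KDA = {"GST": 26.0, "MBP": 42.0}


def tag_fraction(tag: str, target_kda: float) -> float:
    """Fraction of the fusion protein's mass contributed by the tag."""
    tag_kda = TAG_MASS_KDA[tag]
    return tag_kda / (tag_kda + target_kda)


target_kda = 30.0  # hypothetical target protein mass, for illustration only

for tag, tag_kda in TAG_MASS_KDA.items():
    fusion_kda = tag_kda + target_kda
    share = tag_fraction(tag, target_kda)
    print(f"{tag}: fusion ~{fusion_kda:.0f} kDa, tag contributes {share:.0%} of the mass")
```

For a target of this size, either tag accounts for roughly half of the fusion protein, which is why tag removal is often planned from the start when a large tag is used.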

4. FLAG Tag

The FLAG tag is a commonly used protein tag with the sequence DYKDDDDK. The first three residues (DYK) contribute most to antigenicity, allowing the tag to bind anti-FLAG antibodies efficiently. The remaining five residues (DDDDK) make the tag highly hydrophilic, which helps keep it accessible to antibodies, and also form a recognition site for the protease enterokinase. These properties allow FLAG tags to be widely used for detection, purification, and localization of target proteins. The tag was first described by Hopp and colleagues in 1988 [7]. The FLAG tag has the following advantages and disadvantages:

Advantages:

①Small size: The FLAG tag consists of only eight amino acids (DYKDDDDK), so it has little impact on the structure and function of the target protein. ②Specificity: The FLAG tag is recognized with high specificity by anti-FLAG antibodies, which makes it suitable for highly sensitive protein detection and purification [7]. ③Easy to synthesize and clone: The short sequence is easy to synthesize and clone into expression vectors and is suitable for various expression systems. ④Versatility: In addition to protein detection and purification, FLAG tags can be used for protein localization and functional studies, for example by fluorescent labeling with anti-FLAG antibodies.

Disadvantages:

①Immunogenicity: Under certain experimental conditions FLAG tags may cause immune responses, so their potential immunogenicity should be considered, especially in animal studies. ②Affinity: Because the FLAG tag is small, its binding to the affinity matrix may be weaker than that of some larger tags, resulting in lower purification efficiency in some cases. ③Enzyme cleavage efficiency: Removing the FLAG tag with a specific protease may be incomplete or generate by-products, affecting the purity and activity of the target protein [8].

In summary, the FLAG tag offers small size, high specificity, and ease of synthesis and use, but it is limited by potential immunogenicity, modest affinity, and imperfect enzymatic cleavage. When using the FLAG tag, researchers should choose tags and experimental strategies according to the specific experimental requirements and the characteristics of the target protein.
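
Enzymatic removal of an N-terminal FLAG tag can be illustrated with a simple sequence-level sketch. It assumes that the protease is enterokinase, that it recognizes the DDDDK motif inside the tag and cuts immediately after the lysine, and that digestion is complete, which the text above notes is not always the case. The target sequence and the function name `cleave_n_terminal_flag` are hypothetical.

```python
FLAG = "DYKDDDDK"
EK_SITE = "DDDDK"  # assumption: enterokinase motif, cleaved after the lysine


def cleave_n_terminal_flag(fusion: str) -> tuple[str, str]:
    """Split an N-terminally FLAG-tagged sequence into (released tag, target).

    Models an idealized, complete digest; real digests may be partial.
    """
    fusion = fusion.upper()
    if not fusion.startswith(FLAG):
        raise ValueError("sequence does not start with the FLAG tag")
    cut = fusion.index(EK_SITE) + len(EK_SITE)  # position just after the lysine
    return fusion[:cut], fusion[cut:]


target = "MKTAYIAKQR"  # hypothetical target sequence, for illustration only
released_tag, recovered = cleave_n_terminal_flag(FLAG + target)
print(released_tag, recovered)
```

Because the cut falls at the very end of the tag, the recovered sequence in this idealized case carries no leftover tag residues.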

5. Hemagglutinin Tag

The HA tag (amino acid sequence: YPYDVPDYA) is a short sequence derived from amino acids 98 to 106 of the human influenza virus hemagglutinin (HA) protein, with a molecular weight of about 1.1 kDa. It is one of the most widely used epitope tags and forms a strong antibody recognition site. Once the HA tag has been fused to the C- or N-terminus of the target protein by molecular biology methods, anti-HA antibodies can be used to detect, isolate, and purify the HA-tagged protein without the need for protein-specific antibodies or probes [9]. The HA tag has the following advantages and disadvantages in protein research:

Advantages:

①Small size: The HA tag consists of only nine amino acids (YPYDVPDYA), so it has little impact on the structure and function of the target protein and helps preserve the protein's native state. ②High specificity: The HA tag is bound by specific antibodies with high specificity, enabling highly sensitive detection and purification of target proteins. ③Easy to synthesize and clone: The short sequence is easy to synthesize and clone into expression vectors and is suitable for various expression systems. ④Versatility: In addition to protein detection and purification, HA tags can be used for protein localization and functional studies.

Disadvantages:

①Immunogenicity: Under certain experimental conditions the HA tag may induce immune responses, so its potential immunogenicity should be considered, especially in animal studies. ②Affinity: Because the HA tag is small, its binding to the affinity matrix may be weaker than that of some larger tags, which may result in lower purification efficiency in some cases. ③Tag sequence interference: An HA tag fused to the N- or C-terminus may still affect the structure and function of certain target proteins.

Overall, the HA tag offers small size, high specificity, and ease of synthesis and use, but it is limited by potential immunogenicity, modest affinity, and possible interference with the tagged protein. When using the HA tag, researchers should weigh these pros and cons and select tags and experimental strategies according to the specific experimental requirements and the characteristics of the target protein.
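
The molecular weight quoted for the HA tag can be checked with a short calculation from its sequence. The sketch below sums standard average residue masses and adds one water molecule for the free termini; the helper name `peptide_mass` is illustrative, and post-translational modifications are ignored.

```python
# Average residue masses in daltons (standard values). Each residue is the
# free amino acid minus the water lost when the peptide bond forms.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Average mass in daltons of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


ha_mass = peptide_mass("YPYDVPDYA")
print(f"HA tag: {ha_mass:.1f} Da (~{ha_mass / 1000:.1f} kDa)")
```

The result is about 1102 Da, consistent with the value of roughly 1.1 kDa given above. The same function can be applied to other short tags in this article, such as DYKDDDDK.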

6. Myc Tag

The Myc tag is a commonly used protein tag that is fused to target proteins to facilitate their detection, purification, and localization. The tag is a short peptide (EQKLISEEDL) taken from the product of the human c-myc proto-oncogene, a gene whose name derives from avian myelocytomatosis virus. The c-Myc protein is a transcription factor involved in regulating biological processes such as cell growth and proliferation [10]. The Myc tag has the following advantages and disadvantages:

Advantages:

①Small size: The Myc tag consists of 10 amino acids (EQKLISEEDL) and has a low molecular weight, so it has little impact on the structure and function of the fusion protein and helps preserve the protein's native state. ②Antibody recognition: The Myc tag binds strongly to anti-Myc antibodies, enabling highly sensitive detection and purification of target proteins. ③Wide applicability: Myc tags are suitable for a variety of expression systems, including bacteria, mammalian cells, and yeast, and are easy to synthesize and clone into expression vectors. ④Location tracking: For proteins expressed in cells, the Myc tag can be used to track the target protein and observe its distribution and localization.

Disadvantages:

①Non-specific interactions: Under certain experimental conditions the Myc tag may interact with other proteins, leading to misinterpretation or false-positive results. ②Effects on protein stability: Some studies have shown that Myc tags may affect protein stability and expression levels, especially in certain expression systems. ③Sequence composition: The Myc tag contains repeated leucine residues, which may contribute to structural instability and susceptibility to degradation.

In summary, the Myc tag offers small size, strong antibody recognition, and wide applicability, but its disadvantages include non-specific interactions, potential effects on protein stability, and sequence-related instability. When using the Myc tag, researchers should weigh these pros and cons and select tags and experimental strategies according to the specific experimental requirements and the characteristics of the target protein.

7. Avi Tag

The Avi tag is a protein tag used for site-specific biotinylation of a target protein, which facilitates biotin-based experiments. It is a 15-residue peptide with the sequence GLNDIFEAQKIEWHE that acts as a substrate for the Escherichia coli biotin ligase BirA, which covalently attaches a single biotin to the lysine residue of the tag [11]. Biotin is a small organic molecule that binds avidin and streptavidin with very high affinity to form highly stable complexes. Fusing the Avi tag to the target protein and biotinylating it therefore allows the protein to be captured or detected with avidin- or streptavidin-based reagents for tracking, localization, purification, and other operations. The Avi tag, as a biotinylation tag, has the following advantages and disadvantages:

Advantages:

①Specificity: Biotin is attached at a single defined site in the Avi tag, and the subsequent binding of the biotinylated tag to avidin or streptavidin is highly specific with very little non-specific binding, so the tag is suitable for high-sensitivity biotin-based experiments. ②Flexibility: Avi tags can be fused to the target protein at different locations, often through synthetic genes, which makes them suitable for a variety of experimental designs and protein engineering. ③Stability: The complex between biotin and streptavidin (or avidin) is highly stable, so the interaction persists under a wide variety of conditions, which benefits the reproducibility of experiments.

Disadvantages:

①Dependence on biotinylation: The Avi tag functions only after it has been biotinylated, so biotin ligase (BirA) and sufficient biotin must be available, either in the expression host or in an in vitro reaction. ②Effects of biotin: Under certain experimental conditions biotinylation may affect protein structure and function, so appropriate controls and validation are needed. ③Cost: Synthesizing the Avi-tag gene sequence and purchasing biotin-related reagents may be relatively expensive, so cost-effectiveness should be weighed against experimental needs in the experimental design.
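
Because the short tags covered so far are defined by fixed sequences, a construct can be checked for them with a simple scan. The sketch below uses only the sequences given in this article; the example construct, its `GS` linker, and the function name `find_tags` are hypothetical. It also reports the position of the single lysine in the Avi tag, which is the residue that BirA biotinylates.

```python
# Tag sequences as given in this article.
TAGS = {
    "His6": "HHHHHH",
    "FLAG": "DYKDDDDK",
    "HA": "YPYDVPDYA",
    "AviTag": "GLNDIFEAQKIEWHE",
}


def find_tags(sequence: str) -> dict[str, int]:
    """Map each tag present in the sequence to its 0-based start index."""
    sequence = sequence.upper()
    return {
        name: sequence.find(motif)
        for name, motif in TAGS.items()
        if motif in sequence
    }


# 1-based position of the only lysine in the Avi tag.
avi_lysine = TAGS["AviTag"].index("K") + 1

# Hypothetical construct: Met, Avi tag, GS linker, target, C-terminal HA tag.
construct = "M" + TAGS["AviTag"] + "GS" + "MKTAYIAKQR" + TAGS["HA"]

print(find_tags(construct))
print("Avi tag lysine position:", avi_lysine)
```

An exact-match scan like this is only a sanity check on a designed sequence; it does not say whether the tag is accessible or functional in the folded protein.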

8. GFP Tag

The green fluorescent protein (GFP) tag is a commonly used protein tag derived from the jellyfish Aequorea victoria. Its defining feature is intrinsic fluorescence: it emits green light, allowing proteins fused to GFP to be tracked and observed in living cells and living animals. GFP consists of 238 amino acids. Its chromophore forms spontaneously from three adjacent residues (Ser65-Tyr66-Gly67) and emits green fluorescence when excited with blue or ultraviolet light. Engineered GFP variants have been developed to improve its properties and to reduce its impact on the structure and function of the fusion protein. The GFP tag has the following advantages and disadvantages:

Advantages:

①Visualization: The green fluorescence of the GFP tag allows the expression and localization of the target protein to be observed directly in cells or organisms without staining. ②Non-invasive: The GFP tag does not require external fluorescent dyes or antibodies, does not damage organisms or cell structures, and is suitable for in vivo imaging and for tracking dynamic changes of target proteins [12]. ③Flexibility: The GFP tag can be fused to the N- or C-terminus of the target protein, or inserted into its interior, making it suitable for a variety of expression systems and experimental designs. ④Stability: The fluorescence signal is relatively stable and long-lasting under typical imaging conditions, which is conducive to long-term real-time observation. ⑤Quantitative analysis: The fluorescence signal can be used to quantify the target protein, for example by fluorometry or flow cytometry.

Disadvantages:

①Large molecular weight: The GFP tag is relatively large (about 27 kDa) and may affect the structure and function of the target protein, especially in certain expression systems. ②Time delay: The GFP chromophore needs time to mature before an observable fluorescent signal is produced, so the tag may not be sensitive enough to report transient changes in protein levels or activity. ③Background signal: In some cases background fluorescence may lead to false-positive results, so appropriate controls and validation are required. ④Requires genetic fusion: GFP reports only on proteins expressed as GFP fusions and cannot label proteins that are not expressed with the tag [13]. ⑤Photobleaching: High-intensity laser illumination or long exposure may photobleach the GFP tag, reducing the stability and persistence of the fluorescence signal.

In summary, the GFP tag offers flexibility, stability, and the possibility of quantitative analysis, but it also has disadvantages such as its larger size, the maturation delay, and photobleaching. When using GFP tags, researchers should take these factors into account and select tags and experimental strategies according to the specific experimental needs and the characteristics of the target protein.

References:
[1] Hochuli, E., Bannwarth, W., Döbeli, H., Gentz, R., & Stüber, D. (1988). Genetic approach to facilitate purification of recombinant proteins with a novel metal chelate adsorbent. Biotechnology (N Y), 6(11), 1321-1325.
[2] Bornhorst, J. A., & Falke, J. J. (2000). Purification of proteins using polyhistidine affinity tags. Methods in Enzymology, 326, 245-254.
[3] Smith, D. B., & Johnson, K. S. (1988). Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase. Gene, 67(1), 31-40.
[4] Guan, K. L., & Dixon, J. E. (1991). Eukaryotic proteins expressed in Escherichia coli: an improved thrombin cleavage and purification procedure of fusion proteins with glutathione S-transferase. Analytical Biochemistry, 192(2), 262-267.
[5] di Guan, C., Li, P., Riggs, P. D., & Inouye, H. (1988). Vectors that facilitate the expression and purification of foreign peptides in Escherichia coli by fusion to maltose-binding protein. Gene, 67(1), 21-30.
[6] Kapust, R. B., & Waugh, D. S. (1999). Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Science, 8(8), 1668-1674.
[7] Hopp, T. P., Prickett, K. S., Price, V. L., Libby, R. T., March, C. J., Cerretti, D. P., Urdal, D. L., & Conlon, P. J. (1988). A short polypeptide marker sequence useful for recombinant protein identification and purification. Biotechnology (N Y), 6(10), 1204-1210.
[8] Einhauer, A., & Jungbauer, A. (2001). The FLAG peptide, a versatile fusion tag for the purification of recombinant proteins. Journal of Biochemical and Biophysical Methods, 49(1-3), 455-465.
[9] Field, J., Nikawa, J., Broek, D., MacDonald, B., Rodgers, L., Wilson, I. A., Lerner, R. A., & Wigler, M. (1988). Purification of a RAS-responsive adenylyl cyclase complex from Saccharomyces cerevisiae by use of an epitope addition method. Molecular and Cellular Biology, 8(5), 2159-2165.
[10] Evan, G. I., Lewis, G. K., Ramsay, G., & Bishop, J. M. (1985). Isolation of monoclonal antibodies specific for human c-myc proto-oncogene product. Molecular and Cellular Biology, 5(12), 3610-3616.
[11] Beckett, D., Kovaleva, E., & Schatz, P. J. (1999). A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation. Protein Science, 8(4), 921-929.
[12] Prasher, D. C., Eckenrode, V. K., Ward, W. W., Prendergast, F. G., & Cormier, M. J. (1992). Primary structure of the Aequorea victoria green-fluorescent protein. Gene, 111(2), 229-233.
[13] Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W., & Prasher, D. C. (1994). Green fluorescent protein as a marker for gene expression. Science, 263(5148), 802-805.