1 The Institute of Tandem Repeats, Sapporo 004-0882, Japan
2 Department of Biology, University of Virginia, Charlottesville 22904, USA
Norio Matsushima, Biophysicist, The Institute of Tandem Repeats, Sapporo 060-8556, Japan, Tel: +81 11 886 0087; Fax: +81 11 886 0087; E-mail: firstname.lastname@example.org
Received Date: 07th October 2014
Accepted Date: 07th November 2014
Published Date: 10th November 2014
Miyashita H, Kretsinger RH, Matsushima N (2014) Comparative Structural Analysis of the Extracellular Regions of the Insulin and Epidermal Growth Factor Receptors whose L1 and L2 Domains have Non-Canonical, Leucine-rich Repeats. Enliven: Bioinform 1(1): 005.
© 2014 Dr. Norio Matsushima. This is an Open Access article published and distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
Insulin receptor (IR) and epidermal growth factor receptor (EGFR) are members of the receptor tyrosine kinase superfamily. The extracellular regions of both IR and EGFR contain two L domains. Many crystal structures of the extracellular regions of the IR and EGFR families have been determined, both in the unliganded state and in complexes with ligands. The structures reveal that the L domains consist of four to six leucine-rich repeats (LRRs), although their amino acid sequences are highly variable. The present bioinformatic analysis reveals some features of the LRRs and of the structures. We conclude that the LRRs in the L domains belong to a non-canonical motif that differs from the known (canonical) motifs; the repeat units consist of two β-strands, and the overall shape of the LRRs resembles a prism. To characterize the spatial arrangement of the two L domains we propose two parameters: the distance between the two L domains (D) and the angle between the two axes showing the direction of the β-sheet stacking of the LRRs in the L domains (ψ). These two parameters, D and ψ, describe an essential feature of the structures and of ligand-induced structural changes.
Insulin receptor; Epidermal growth factor receptor; L domain; Leucine-rich repeats; Non-canonical LRR; Ligand interaction; Dimer; Toll-like receptor; Geometric analysis
CR: Cys-rich Region; EGF: Epidermal Growth Factor; EGFR: Epidermal Growth Factor Receptor; FnIII: Fibronectin type III domains; HCS: Highly Conserved Segment; IGF-1R: Insulin-like Growth Factor 1-Receptor; IR: Insulin Receptor; IRR: Insulin Receptor-related Receptor; LRR: Leucine-Rich Repeat; Nrg-1: Neuregulin-1; 3D: Three Dimensional; TGFα: Transforming Growth Factor-α; TLR: Toll-Like Receptor; VS: Variable Segment
The insulin receptor (IR) and epidermal growth factor receptor (EGFR) families are both members of the receptor tyrosine kinase superfamily [1-6]. The IR is a large, transmembrane, glycoprotein dimer consisting of several structural domains (figure 1). The N-terminal half of the ectodomain contains two L domains (L1 and L2) separated by a Cys-rich region (CR). The C-terminal half of the IR ectodomain consists of three fibronectin type III domains (FnIII), the second of which contains an insert region of about 120 residues. The IR family consists of the IR, the insulin-like growth factor 1 receptor (IGF-1R), and the insulin receptor-related receptor (IRR). The IR and IGF-1R interact with insulin, IGF-I, and IGF-II. The IR mediates the pleiotropic actions of insulin. The IGFs act via IGF-1R to promote cell proliferation, survival, and differentiation. Binding of insulin to the IR leads to phosphorylation of several intracellular substrates, including insulin receptor substrates (IRS1-4), SHC, GAB1, CBL, and other intermediates involved in cell signaling. Members of the IR family undergo processing to form two polypeptide chains, α and β, that are assembled into a hetero-tetramer, that is, an (αβ)2 homodimer, stabilized by disulfide bonds (figure 1).
The EGFR ectodomain contains four domains: L1, CR1, L2, and CR2 (figure 1). The L1 and L2 domains are homologous. The EGFR family consists of EGFR (ErbB1/HER1), ErbB2 (HER2/Neu), ErbB3 (HER3), and ErbB4 (HER4), and plays important roles in cell growth, differentiation, survival, and migration [8,9]. The ErbB receptors interact with eleven polypeptide growth factor ligands, including epidermal growth factor (EGF), transforming growth factor-α (TGFα), and neuregulin-1 (Nrg-1). Drosophila ErbB2 binds an antagonist, Spitz, although ErbB2 lacks a known ligand. The EGFR family is activated by ligand-induced dimerization of the receptors; however, there is increasing evidence that members of the EGFR family are present as pre-formed, inactive dimers prior to ligand binding.
The IR and EGFR families have multiple intra-chain disulfide bonds and N-linked glycosylation sites. A large number of crystal structures of the extracellular regions of eight proteins of the IR and EGFR families have been determined in the unliganded state and in complexes with ligands [12-43]. Moreover, their structures in complex with various (therapeutic) antibodies, including Cetuximab and Herceptin, have been determined [15,17,21-23,25,29,31,32,35-39,41-43]. The unliganded extracellular regions of EGFR, ErbB3, and ErbB4 all adopt a characteristic "tethered" conformation in which the primary receptor dimerization site is occluded by intramolecular interactions between CR1 and CR2. Upon ligand binding, the receptor undergoes a dramatic rearrangement of domains. The EGFR ligands, including EGF, bind simultaneously to both L1 and L2, forcing them to adopt the extended configuration that is capable of CR1-mediated dimerization.
The structures reveal that these L domains consist of four to six LRRs, although the LRRs show extreme variability, with major insertions in some of their repeats. LRRs are present in over 60,000 proteins. Each repeat is typically 20-30 residues long and can be divided into an HCS (Highly Conserved Segment) and a VS (Variable Segment). The HCS consists of LxxLxLxxNx(x/-)L, in which "L" is Leu, Ile, Val, or Phe, "N" is Asn, Thr, Ser, or Cys, "x" is a non-conserved residue, and "-" is a possible deletion site [44-46]. There are eight classes of LRRs: "Typical", "SDS22-like", "IRREKO", "Bacterial", "Plant specific", "TpLRR", "RI-like", and "Cysteine-containing" [46-48]. Tandem LRR domains consist of a super-helical arrangement of repeating structural units and fold into a horseshoe, a right-handed or left-handed helix, or a prism shape. Three residues at positions 3 to 5, xLx, in the HCS form a short β-strand. These β-strands stack in parallel, and the LRRs then assume their super-helical arrangements.
Very recently, Miyashita et al. identified novel LRRs in over three hundred proteins from unicellular eukaryotes and bacteria. Their HCS, with the consensus VxGx(L/F)x(L/C)xxNx(x/-)L, clearly differs from the canonical motifs: it is characterized by the addition of Gly between the first conserved Val and the second conserved Leu. However, the structure of these LRRs remains unknown.
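The two HCS consensus strings quoted above can be written as regular expressions, which makes the difference between them (the extra Gly) easy to see. The following is only a minimal sketch, not the search procedure used in the cited studies. The canonical pattern uses the residue classes given in the text; for the non-canonical pattern it is an assumption that only the residues written in the consensus are allowed at the conserved positions. The function name `find_hcs` and both toy sequences are hypothetical.

```python
import re

# Canonical HCS: LxxLxLxxNx(x/-)L
#   "L" = Leu, Ile, Val, or Phe; "N" = Asn, Thr, Ser, or Cys;
#   "(x/-)" = a residue that may be deleted, hence ".{1,2}".
CANONICAL_HCS = re.compile(r"[LIVF].{2}[LIVF].[LIVF].{2}[NTSC].{1,2}[LIVF]")

# Non-canonical HCS: VxGx(L/F)x(L/C)xxNx(x/-)L
#   Assumption: conserved positions accept only the residues written in
#   the consensus (no wider residue classes).
NONCANONICAL_HCS = re.compile(r"V.G.[LF].[LC].{2}N.{1,2}L")


def find_hcs(sequence, pattern):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical toy sequences, not taken from any real protein.
toy_canonical = "AALRELDLSNNQISAA"
toy_noncanonical = "TTVEGNLQLRGNNLSS"

print(find_hcs(toy_canonical, CANONICAL_HCS))
print(find_hcs(toy_noncanonical, NONCANONICAL_HCS))
```

A pattern match of this kind only flags candidate segments; real LRRs are far more variable than a strict consensus, which is why alignments are usually refined by eye.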
There have been many determinations of the 3D structures of the IR and EGFR families [12-43]. Moreover, excellent reviews of the IR and/or EGFR families have been published [1-6]. However, to our knowledge, no geometric analysis has yet provided a quantitative comparison of these 3D structures.
Here we show that LRRs in the L domains belong to the non-canonical motif and describe some features of this structure. We propose two parameters to characterize the spatial arrangement of the two L-domains. The two parameters provide fundamental features of the ligand interactions in the IR and EGFR families.
There were forty-three PDB files for the ectodomains of the IR and EGFR families in the NCBI database on June 24, 2014 (http://www.ncbi.nlm.nih.gov/). The files cover eight different proteins: human IR and IGF-1R; human EGFR; ErbB2 from human, rat, and Drosophila; human ErbB3; and human ErbB4. The 3D structures of almost the entire ectodomains, or of parts of them (such as L1-CR-L2 in the IR or L2 in the EGFR), have been determined [12-43].
Secondary structure assignments from the atomic coordinates of the IR and EGFR proteins were made by DSSP (http://crdd.osdd.net/raghava/ccpdb/beta2_up.php) (figure 2). Sequence alignments of individual LRRs were made by MAFFT (http://www.genome.jp/tools/mafft/) (Supplementary material) and finally refined by eye.
To understand the spatial arrangement of the L1 and L2 domains we propose two structural parameters, "D" and "ψ" (figure 3). "D" is the distance between the L1 and L2 domains, measured between the centers of gravity of the two L domains (each defined as the average of the Cα coordinates of the LRRs in that L domain). "ψ" is the angle between the two axes that indicate the direction of parallel stacking of the β-strands in the two L domains. Each axis was calculated by a straight-line fitting program in MATLAB (MathWorks) that fits a line in 3D space to a set of data points. The data points were the coordinates of the α-carbon (Cα) of the consensus leucine residue at position 5 (corresponding to the middle of each β-strand that forms a large β-sheet) in the individual repeat units.
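The definitions of D and ψ can be illustrated with a short Python sketch. This is a stand-in for illustration, not the MATLAB program used in this work. It assumes that the fitted line is the least-squares line (the dominant eigenvector of the covariance matrix, found here by power iteration) and that each axis is oriented from the first repeat to the last; the text does not state how the axis direction was chosen. All function names and the toy coordinates are hypothetical.

```python
import math


def centroid(points):
    """Average of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def fit_axis(points, iterations=200):
    """Unit direction of the least-squares line through 3D points.

    Assumption: the axis points from the first data point to the last
    (N-terminal repeat to C-terminal repeat).
    """
    c = centroid(points)
    centered = [tuple(p[i] - c[i] for i in range(3)) for p in points]
    cov = [[sum(q[i] * q[j] for q in centered) for j in range(3)]
           for i in range(3)]
    guide = tuple(points[-1][i] - points[0][i] for i in range(3))
    v = guide
    for _ in range(iterations):  # power iteration on the covariance matrix
        w = tuple(sum(cov[i][j] * v[j] for j in range(3)) for i in range(3))
        norm = math.sqrt(sum(x * x for x in w))
        v = tuple(x / norm for x in w)
    if sum(v[i] * guide[i] for i in range(3)) < 0:
        v = tuple(-x for x in v)
    return v


def d_and_psi(l1_axis_ca, l2_axis_ca, l1_all_ca=None, l2_all_ca=None):
    """Return (D, psi in degrees) for two L domains.

    *_axis_ca: C-alpha coordinates of the position-5 residue of each repeat.
    *_all_ca:  C-alpha coordinates of all LRR residues, used for the centers
               of gravity; defaults to the axis points (a simplification).
    """
    c1 = centroid(l1_all_ca or l1_axis_ca)
    c2 = centroid(l2_all_ca or l2_axis_ca)
    d = math.dist(c1, c2)
    a1, a2 = fit_axis(l1_axis_ca), fit_axis(l2_axis_ca)
    cos_psi = max(-1.0, min(1.0, sum(x * y for x, y in zip(a1, a2))))
    return d, math.degrees(math.acos(cos_psi))


# Hypothetical toy coordinates: two perpendicular stacks of strands.
l1 = [(0.0, 0.0, 0.0), (4.8, 0.0, 0.0), (9.6, 0.0, 0.0), (14.4, 0.0, 0.0)]
l2 = [(30.0, 0.0, 0.0), (30.0, 4.8, 0.0), (30.0, 9.6, 0.0)]
print(d_and_psi(l1, l2))
```

Because a fitted line has no inherent direction, some orientation convention is needed for ψ to range from 0° to 180°; a different convention would map ψ to 180° - ψ.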
The sequence analysis predicted six potential LRRs in each L domain. However, the structures reveal that in the IR family the L1 domain contains six LRRs and the L2 domain contains four or five LRRs, in which the β-strands stack in parallel, while in the EGFR family L1 has four or five LRRs and L2 has five (table 1). The N- or C-terminal units of the six potential LRR units in the L domains usually do not form a β-strand and thus cannot participate in the stacking of parallel β-strands.
| No. | Protein | Domains in crystal | Total LRRs (a) | L1 LRRs (b) | L2 LRRs (c) | D (Å) (d) | ψ (°) (e) | Ligand or state | PDB ID | Resolution (Å) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Human IR | L1-CR1-L2 | 11 | 6 | 5 | 55.7 | 112 | Free state | 2HR7_A | 2.32 |
| | " | L1-CR1-L2-(FnIII-1)-(FnIII-2) (α chain) | 10 | 6 | 4 | 45.8 | 125 | Fab 83-7 heavy and light chains | 3LOH_E | 3.8 |
| | " | L1-CR1-L2-(FnIII-1)-(FnIII-2) (α chain) | 11 | 6 | 5 | 43.6 | 125 | Fab 83-7 heavy and light chains | 2DTG | 3.8 |
| | " | L1-CR1-L2-(FnIII-1) | 11 | 6 | 5 | 49.8 | 162 | Insulin A and B chains, Fab 83-14 heavy and light chains | 3W14_E | 4.4 |
| | " | L1-CR1-L2-(FnIII-1) | 11 | 6 | 5 | 49.5 | 160 | " | 3W14_F | " |
| | | L1-CR1-L2 | 11 | 6 | 5 | 51.1 | 103 | Free state | 1IGR_A | 2.6 |
| 3 | Human EGFR | L1-CR1-L2-CR2 | 10 | 5 | 5 | 66.7 | 135 | Cetuximab Fab light chain and Fab heavy chain | 1YY9 | 2.61 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 61 | 111 | Cetuximab heavy and light chains, Nanobody/VHH domain EgA1 | 4KRO_A | 3.05 |
| | | | | | | | | Inactive (low pH) complex | | |
| | | L1-CR1-L2-CR2 | 10 | 5 | 5 | 26.7 | 109 | Herceptin Fab heavy and light chains | 1N8Z_C | 2.52 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 27.5 | 104 | Fab37 heavy and light chains | 3N85_A | 3.2 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 27.3 | 103 | Pertuzumab Fab heavy and light chains | 1S78_A | 3.25 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 27.5 | 104 | Immunoglobulin G-binding protein A | 3MZW_A | 2.9 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 27.2 | 107 | Fab heavy and light chains | 3BE1_A | 2.9 |
| | " | L1-CR1-L2 | 10 | 5 | 5 | 26.9 | 106 | Free state | 2A91 | 2.5 |
| 5 | Rat Erb-B2 | L1-CR1-L2-CR2 | 10 | 5 | 5 | 27.3 | 105 | Free state | 1N8Y_C | 2.4 |
| 6 | Drosophila Erb-B2 | L1-CR1-L2-CR2 | 10 | 5 | 5 | 31.6 | 120 | Free state | 3I2T_A | 2.7 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 69.1 | 157 | RG7116 Fab heavy and light chains | 4LEO_C | 2.64 |
| | " | L1-CR1-L2-CR2 | 10 | 5 | 5 | 72.4 | 150 | Fab heavy and light chains | 4P59 | 3.4 |
| | " | L1-CR1-L2 | 10 | 5 | 5 | 69.8 | 133 | Fab DL11 heavy and light chains | 3P11_A | 3.7 |
| | " | L1-CR1-L2-CR2 | 9 | 4 | 5 | 68 | 121 | Fab heavy and light chains | 3U9U_E | 3.42 |
Table 1 Geometric parameters of the structures of the L1 and L2 domains in the IR and EGFR families
a The total repeat number of LRRs determined by the crystal structure.
b The repeat number of LRRs in the L1 domain determined by the crystal structure.
c The repeat number of LRRs in the L2 domain determined by the crystal structure.
d The distance between the L1 and L2 domains
e The angle between the two axes that indicate the direction of parallel stacking of the β-strands in the two L domains.
The consensus sequence may be represented by IxGxLxIxxNxLxxxxxxxxxL/FxxL/Cxx, and the segments IxGxL and NxL are frequently replaced by IxxGxL and NxxL, respectively (figure 2A). Gly at position 3 is highly conserved. The VS is more variable. The HCS consists of a twelve-residue stretch that is characterized by the addition of Gly between the first conserved hydrophobic Ile and the second conserved hydrophobic Leu. This is consistent with the non-canonical motifs proposed by us. Three residues at positions 4 to 6, xLx, in the HCS (corresponding to the three residues at positions 3 to 5 in the canonical HCS) form a β-strand that is part of a large β-sheet (figure 2B). Moreover, two residues, xI, in which "x" is at the last position in the preceding repeat and "I" is at position 1, form an additional β-strand (figure 2B).
Most of the repeat units, including the non-canonical units, contain both a short β-strand of two to three residues and a longer β-strand of three to five residues that forms a large β-sheet (figure 2B). This β-β structural motif clearly differs from those of the other (canonical) LRR classes.
The non-canonical LRRs in individual L domains have two β-sheets. One consists of the parallel stacking of the β-strands at positions 4 to 6, xLx, in the HCS (called the large β-sheet). The other is formed by parallel stacking of the two-residue β-strands, xI. The overall shape of the non-canonical LRR domains resembles a prism (figure 3), as seen in the "TpLRR" class. A prism is also formed by β-helices consisting of other types of tandem repeats in proteins. A β-helix is a protein structure formed by the association of parallel β-strands in a helical pattern with two, three, or four faces [51-54].
The crystal structures of the extracellular regions of the eight proteins of the IR and EGFR families are available in the unliganded state [12-14,16,17,22,23,26,27,29-33,35-40,43]. The extracellular regions of the IR family in the crystals consist of L1-CR1-L2, L1-CR1-L2-(FnIII-1), or L1-CR1-L2-(FnIII-1)-(FnIII-2) (α chain), while those of the EGFR family contain L1-CR1-L2 or L1-CR1-L2-CR2 (table 1). The structures of L1-CR1 (in IR), L2 (in EGFR), and L1 (in ErbB2) have also been determined [15,21,22,25,31,41,42].
The D's of the IR family are smaller than those of the EGFR family, except for ErbB2. The D (= 27 Å) of ErbB2 is the smallest among the known structures of the receptors in the unliganded state. In contrast, the IR family has larger ψ's than does the EGFR family. The unliganded ErbB2s adopt an extended conformation, as do the complexes of EGFR with EGF or TGFα and the complex of ErbB4 with Nrg-1. Consequently, the D of the unliganded ErbB2 is comparable to those of the EGFR and ErbB4 complexes, while ψ is slightly smaller.
Structures of the human IR have been reported in complexes with insulin (figure 4A). Insulin interacts directly with the large β-sheet of the L1 domain. The αCT segment, residues 693-710, also contributes to the interaction with insulin. The structures of EGFR complexes bound to ligands (EGF, TGFα, Nrg-1, or Spitz) have been determined (figure 4B) [18-20,24,28]. In the EGFR family, the ligands interact directly with both the L1 and L2 domains, that is, with two intramolecular LRR domains, through side chains of the residues involved in the parallel stacking of the β-strands that form the large β-sheet. In toll-like receptors (TLRs), two intermolecular LRR domains likewise contribute to interactions with ligands, including dsRNA and myeloid differentiation protein 2 (MD-2), and consequently the TLRs form homo- or hetero-dimers [55-61].
The IR family is activated by ligand-induced dimerization of the receptors [1-3]. The active dimeric form (in the liganded state) has an almost equal or slightly smaller D than the inactive dimeric form (in the unliganded state) and a larger ψ (table 1 and figure 4A). The conformational change is small in comparison with that of the EGFR, as extensively reviewed by Ward et al. A limited conformational change in the IR upon hormone binding is compatible with a small-angle X-ray scattering study of IGF-I binding to the soluble IGF-1R ectodomain. Thorough studies indicate that members of the EGFR family are present as pre-formed, yet inactive, dimers prior to ligand binding, as reviewed by Maruyama, who proposed two models for the mechanism of activation of the EGFR. In both models the active form is dimeric in the liganded state. The D's of the active forms of human EGFR and ErbB4 are smaller than those of the inactive forms, while the ψ's are larger (table 1 and figure 4B). The D and ψ of the EGFR in an inactive (low pH) complex with EGF are more similar to those in the unliganded state than to those in the active liganded state. Furthermore, the D and ψ of the liganded Drosophila ErbB2 are comparable with those of the unliganded form (table 1).
Consequently, the structural changes induced by protein-ligand interactions may be grouped into three categories based on just two parameters: ΔD = D (liganded) - D (unliganded) and Δψ = ψ (liganded) - ψ (unliganded) (table 1). The first category is seen in the IR-insulin complex, where ΔD ≈ 0 Å and Δψ ≈ 34°. The second category is seen in the complexes of EGFR with EGF or TGFα and of ErbB4 with Nrg-1, where ΔD ≈ -30 Å and Δψ ≈ 110°. The ErbB2-Spitz complex illustrates the third category, where ΔD ≈ 0 Å and Δψ ≈ 0°.
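The three categories can be expressed as a simple decision rule on ΔD and Δψ. The sketch below is illustrative only: the text gives approximate values for each category but no cut-offs, so the tolerances `d_tol` and `psi_tol` are hypothetical, as is the function name `classify_change`.

```python
def classify_change(delta_d, delta_psi, d_tol=10.0, psi_tol=15.0):
    """Assign a ligand-induced change to category 1, 2, or 3.

    delta_d   = D(liganded) - D(unliganded), in angstroms
    delta_psi = psi(liganded) - psi(unliganded), in degrees
    d_tol and psi_tol are hypothetical cut-offs for "approximately zero".
    """
    small_d = abs(delta_d) <= d_tol
    small_psi = abs(delta_psi) <= psi_tol
    if small_d and small_psi:
        return 3  # neither D nor psi changes (ErbB2-Spitz type)
    if small_d:
        return 1  # D unchanged, psi changes (IR-insulin type)
    return 2      # D changes markedly (EGFR-EGF type)


# Approximate values quoted in the text for each category.
print(classify_change(0.0, 34.0))     # IR-insulin
print(classify_change(-30.0, 110.0))  # EGFR with EGF or TGFα; ErbB4 with Nrg-1
print(classify_change(0.0, 0.0))      # ErbB2-Spitz
```

With only three observed groups, any reasonable pair of tolerances separates them; the cut-offs would need to be revisited if intermediate cases were found.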
The structures of the EGFR family revealed that the antibodies interact with the L1 domain or the L2 domain of extracellular regions that adopt the "tethered" conformation, except in the case of ErbB2. Consequently, the antibodies block the ligand binding site on L1 or L2 and can prevent the ligands from interacting with the receptors. In addition, the antibodies will sterically prevent the extracellular region of the receptors from adopting the extended conformation that is required for dimerization. The values of the parameters "D" and "ψ" of ErbB3 and ErbB4 in complex with the antibodies are closely comparable to those in the free state, as expected (table 1).
ErbB2 adopts the extended conformation in complex with antibodies, as it does in the free state and in the liganded state. The two parameters do not change upon formation of the complex. Thus, the antibodies do not interact with the L1 and L2 domains.
The L1 and L2 domains of the IR and EGFR families consist of four to six LRRs. These LRRs have non-canonical motifs. Each repeating unit is represented by the structural unit β-β. The overall shape of the LRRs resembles a prism. The spatial arrangement of the two L domains is well characterized by two parameters: the distance between the two L domains (D) and the angle between the two axes of the β-sheet stacking in the L domains (ψ).
6. Ward CW, Menting JG, Lawrence MC (2013) The insulin receptor changes conformation in unforeseen ways on ligand binding: sharpening the picture of insulin receptor activation. Bioessays 35: 945-954.
7. Bajaj M, Waterfield MD, Schlessinger J, Taylor WR, Blundell T (1987) On the tertiary structure of the extracellular domains of the epidermal growth factor and insulin receptors. Biochim Biophys Acta 916: 220-226.
10. Rutledge BJ, Zhang K, Bier E, Jan YN, Perrimon N (1992) The Drosophila spitz gene encodes a putative EGF-like growth factor involved in dorsal-ventral axis formation and neurogenesis. Genes Dev 6: 1503-1517.
12. Lou MZ, Garrett TPJ, McKern NM, Hoyne PA, Epa VC, et al. (2006) The first three domains of the insulin receptor differ structurally from the insulin-like growth factor 1 receptor in the regions governing ligand specificity. Proc Natl Acad Sci USA 103: 12429-12434.
13. Smith BJ, Huang K, Kong G, Chan SJ, Nakagawa S, et al. (2010) Structural resolution of a tandem hormone-binding element in the insulin receptor and its implications for design of peptide agonists. Proc Natl Acad Sci USA 107: 6771-6776.
18. Lu C, Mi LZ, Grey MJ, Zhu J, Graef E, et al. (2010) Structural Evidence for Loose Linkage between Ligand Binding and Kinase Activation in the Epidermal Growth Factor Receptor. Mol Cell Biol 30: 5432-5443.
24. Garrett TPJ, McKern NM, Lou MZ, Elleman TC, Adams TE, et al. (2002) Crystal structure of a truncated epidermal growth factor receptor extracellular domain bound to transforming growth factor alpha. Cell 110: 763-773.
31. Schaefer G, Haber L, Crocker LM, Shia S, Shao L, et al. (2011) A two-in-one antibody against HER3 and EGFR has superior inhibitory activity compared with monospecific antibodies. Cancer Cell 20: 472-486.
32. Mirschberger C, Schiller CB, Schraml M, Dimoudis N, Friess T, et al. (2013) RG7116, a Therapeutic Antibody That Binds the Inactive HER3 Receptor and Is Optimized for Immune Effector Activation. Cancer Res 73: 5183-5194.
36. Fisher RD, Ultsch M, Lingel A, Schaefer G, Shao L, et al. (2010) Structure of the Complex between HER2 and an Antibody Paratope Formed by Side Chains from Tryptophan and Serine. J Mol Biol 402: 217-229.
40. Garrett TPJ, McKern NM, Lou MZ, Elleman TC, Adams TE, et al. (2003) The crystal structure of a truncated ErbB2 ectodomain reveals an active conformation, poised to interact with other ErbB receptors. Mol Cell 11: 495-505.
41. Jost C, Schilling J, Tamaskovic R, Schwill M, Honegger A, et al. (2013) Structural Basis for Eliciting a Cytotoxic Effect in HER2-Overexpressing Cancer Cells via Binding to the Extracellular Domain of HER2. Structure 21: 1979-1991.
42. Zhou HH, Zha Z, Liu Y, Zhang HT, Zhu JJ, et al. (2011) Structural Insights into the Down-regulation of Overexpressed p185(her2/neu) Protein of Transformed Cells by the Antibody chA21. J Biol Chem 286: 31676-31683.
43. Garner AP, Bialucha CU, Sprague ER, Garrett JT, Sheng Q, et al. (2013) An antibody that locks HER3 in the inactive conformation inhibits tumor growth driven by HER2 or neuregulin. Cancer Res 73: 6024-6035.
53. Mitraki A, Miller S, van Raaij MJ (2002) Review: conformation and folding of novel beta-structural elements in viral fiber proteins: the triple beta-spiral and triple beta-helix. J Struct Biol 137: 236-247.