By Venera S. Kamburova, Ilkhom B. Salakhutdinov, Shukhrat E. Shermatov, Zabardast T. Buriev and Ibrokhim Y. Abdurakhmonov
Cotton is one of the most important crops in the world. The Gossypium genus is represented by 50 species, divided into two ploidy levels: diploid (2n = 26) and tetraploid (2n = 52). This diversity of Gossypium species provides an ideal model for studying the evolution and domestication of polyploids. Studies of the origin and evolution of polyploid cotton species are therefore crucial for understanding the mechanisms of gene and genome evolution. Studies of cotton genome polyploidization will also allow more accurate localization of the QTLs that determine fiber quality. Moreover, because cotton fibers are single trichomes originating from epidermal cells, they are one of the most favorable model systems for studying the molecular mechanisms that regulate cell and cell wall elongation, as well as cellulose biosynthesis.
Currently, cotton (Gossypium L.) is one of the most important textile crops in the world, producing high-quality natural fiber. For example, in 2017/18, world cotton production and use were estimated at 25.1 million tons [1, 2]. World cotton production is projected to grow, reaching 26.1 million tons in 2026 [3].
The Gossypium genus is represented by more than 50 species, divided by ploidy into two groups: diploid (2n = 2x = 26) and tetraploid (2n = 4x = 52) [1, 4]. Of these, 45 species are diploid and the remaining five are tetraploid [4]. Only four of them are cultivated: the diploids G. arboreum L. and G. herbaceum L. and the tetraploids G. hirsutum L. and G. barbadense L. [4, 5]. Consequently, this diversity of Gossypium species makes the genus a suitable model for studying evolution, domestication and polyploidy, as well as the effect of ploidy on the most important agronomic traits of cotton (e.g., fiber quality) and the expression and inheritance of the corresponding genes of interest [6].
As in most plants, the evolution of cotton was characterized by repeated cycles of whole-genome duplication [1, 6, 7]. At the same time, as cotton spread across the globe, a parallel level of cytogenetic and genomic diversity emerged, which ultimately led to the appearance of eight groups of diploid (n = 13) species (genome groups A-G and K) [1, 6]. It should be noted that although different types of polyploidy exist [1, 6], the most common type is allopolyploidy, in which two differentiated genomes, usually from different species, are combined in one cell nucleus as a result of hybridization [1, 6].
Thus, allopolyploid genome duplication leads to numerous molecular genetic interactions, interlocus concerted evolution, differing rates of genome evolution, interlocus transfer of genetic material, and possibly changes in gene expression [1, 6]. In addition, allopolyploidy may have stimulated the morphological, ecological and physiological adaptation of cotton through natural selection acting on the greater variability that results from duplication of the gene set [1, 6].
For the same reasons, genome duplication may have provided new opportunities for cotton improvement through directional selection [7, 8]. Another important aspect of allopolyploidy is that not every allopolyploid strictly corresponds to a simple sum of its ancestral diploid genomes. In some cases, the fusion of two different genomes is accompanied by significant genomic reorganization, resulting in non-Mendelian inheritance [7, 9].
Considering the above, in this chapter we attempt to analyze the consequences of polyploid evolution at the genomic, epigenomic and phenotypic levels.
According to molecular genetic data, the evolutionary history of cotton spans about 10–15 million years since Gossypium diverged from other Gossypieae [6, 10, 11]. Over the same period, as cotton spread, eight groups of diploid species (genome groups A-G and K) evolved, giving rise to a parallel level of cytogenetic and genomic diversity [1, 6, 11]. It should be noted that molecular genetic and cytogenetic studies show that the species lineages on the genealogical tree of the genus coincide with the genome groups A-G, K and AD and with geographic origin [11, 12].
Evolutionary studies of Gossypium have shown that tetraploid species originated through polyploidization of A-genome (African) and D-genome (American) diploid species [1, 6, 11]. Allopolyploidization of these two genomes occurred about 1.5–2 million years ago and ultimately gave rise to five species with distinct AD genomes: G. darwinii, G. tomentosum, G. mustelinum, G. hirsutum and G. barbadense, the last two of which are cultivated [13]. It has also been shown that G. arboreum and G. herbaceum represent the A-genome lineage that acted as the recipient (maternal) parent during allopolyploidization and should be regarded as predecessors of the allotetraploids, because all existing polyploid species contain A-genome cytoplasm. The D-genome donor appears to have been G. raimondii [11].
After the progenitor of the allotetraploid species arose, initial divergence produced two evolutionary lineages of AD-genome cotton: the first includes G. mustelinum (AD4 genome), the second all other species (AD1–AD3 and AD5 genomes). Subsequent divergence of the second lineage gave rise to the modern allotetraploid species G. hirsutum (AD1 genome), G. barbadense (AD2 genome), G. tomentosum (AD3 genome) and G. darwinii (AD5 genome) [11, 12].
One important event in the evolution of Gossypium was the domestication of four wild species. Selection was based on the length and quality of the fiber, which consists of anatomically specialized unicellular trichomes on the surface of the seed epidermis [10, 11]. This process yielded four domesticated cotton species: two American (G. hirsutum and G. barbadense) and two Afro-Asian (G. arboreum and G. herbaceum) [11].
Subsequent phylogenetic studies have shown that prolonged trichome elongation first appeared in the A/F-genomes, which may explain the domestication of G. arboreum and G. herbaceum (A-genome). In contrast, a number of D-genome species (G. thurberi, G. trilobum, G. davidsonii and G. klotzschianum, as well as three species of subsection Caducibracteolata) lack clearly visible fibers [11, 12]. This suggests that prolonged trichome elongation was probably inherited by the allotetraploids (AD-genome) from the A-genome [11].
Moreover, domestication changed not only the length of the fiber but also its chemical composition: the fiber of wild species contains suberin in addition to cellulose, whereas that of cultivated species consists of cellulose only [11].
Summarizing the information above, Gossypium diverged from other Gossypieae about 10–15 million years ago, in the Miocene. The genus has evolved along two paths: divergence of diploid species (genome groups A-G and K), and allopolyploidization of the A- and D-genomes followed by the emergence of tetraploid species (AD1–AD5 genomes). In addition, the domestication of these species and artificial selection for fiber quality have strongly influenced the evolution of cultivated cotton.
Polyploidization of eukaryotic genomes is an important evolutionary event that has significantly affected the evolution of plants, including cotton [14, 15, 16]. Polyploids are divided into two large groups: autopolyploids and allopolyploids [17, 18, 19, 20]. The difference between them lies mainly in the type of hybridization: autopolyploids arise through intraspecific hybridization, whereas allopolyploids arise through a combination of interspecific hybridization and chromosome doubling [17, 20].
Allopolyploids, in turn, are of two types: true and segmental. True allopolyploids arise from hybridization of distantly related species, whereas segmental allopolyploids arise from hybridization of closely related species with partially differentiated genomes [20]. Segmental allopolyploids can therefore be considered intermediate between true allopolyploids and autopolyploids [20].
In autopolyploids, the presence of more than two homologous chromosomes in the genome may lead to the formation of multivalents during meiosis, which results in polysomic inheritance. In true allopolyploids, by contrast, bivalents are formed, leading to disomic inheritance. In segmental allopolyploids, univalent, bivalent and/or multivalent chromosome pairing is observed during meiosis [20].
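The practical difference between polysomic and disomic inheritance can be illustrated with a minimal Python sketch for a single locus in a tetraploid carrying two copies of allele A and two of allele a (a duplex genotype). The sketch assumes random chromosome segregation and ignores crossing-over between the locus and the centromere (so there is no double reduction); the function names are hypothetical and are not taken from the cited studies.

```python
from collections import Counter
from itertools import combinations, product


def genotype(alleles):
    """Write a gamete genotype with the upper-case allele first, e.g. ('a', 'A') -> 'Aa'."""
    return "".join(sorted(alleles))


def polysomic_gametes(chromosomes):
    """Autopolyploid model: any two of the four homologous chromosomes can enter
    a gamete (random segregation from multivalents or random bivalent pairing)."""
    return Counter(genotype(pair) for pair in combinations(chromosomes, 2))


def disomic_gametes(subgenome_a, subgenome_d):
    """True allopolyploid model: bivalents form only within each subgenome, so
    every gamete receives exactly one chromosome from each homologous pair."""
    return Counter(genotype(pair) for pair in product(subgenome_a, subgenome_d))


# Duplex AAaa in an autotetraploid: theory predicts AA:Aa:aa = 1:4:1.
print(polysomic_gametes(["A", "A", "a", "a"]))
# Same allele dosage in an allotetraploid, both A alleles in one subgenome.
print(disomic_gametes(["A", "A"], ["a", "a"]))
# Allotetraploid heterozygous in both subgenomes.
print(disomic_gametes(["A", "a"], ["A", "a"]))
```

Under polysomic inheritance the gamete ratio depends only on allele dosage, whereas under disomic inheritance it also depends on how the alleles are distributed between the two subgenomes.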
A second mechanism, and the main factor in the natural emergence of polyploidy, is the fusion of unreduced gametes. Such fusion may lead to unilateral polyploidization (fusion with a normally reduced gamete) or bilateral polyploidization (fusion with another unreduced gamete) [20].
Unreduced gametes can form as a result of errors during meiosis. Errors during meiosis I (first division restitution, FDR) can result from a failure of chromosome pairing in prophase I (synaptene/pachytene) or a failure of homologous chromosomes to separate in anaphase I [20]. Errors during meiosis II (second division restitution, SDR) occur in anaphase II when sister chromatids fail to separate and segregate [20]. Both FDR and SDR double the chromosome set in the gametes, resulting in the formation of dyads or triads [21].
The consequences of polyploidization differ depending on the meiotic restitution mechanism. After FDR, the heterozygosity of unreduced gametes remains similar to that of the parent, whereas SDR reduces the heterozygosity of unreduced gametes [20]. The heterozygosity level of the resulting polyploids is of decisive importance both in the struggle for survival and under artificial selection.
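The effect of the restitution mechanism on heterozygosity can be made concrete with a simplified single-locus model. This is an illustrative textbook-style assumption, not an analysis from [20]: the parent is heterozygous Aa, y is a hypothetical fraction of meioses with a single crossover between the locus and its centromere, and multiple crossovers are ignored. FDR places one chromatid of each homolog in the gamete, while SDR places both sister chromatids of one homolog. Enumerating the possible gametes gives a heterozygous fraction of 1 - y/2 for FDR and y for SDR.

```python
from fractions import Fraction


def sister_chromatids(crossover):
    """Alleles on the two chromatids of each homolog of an Aa parent after replication.

    A single exchange between the locus and the centromere swaps the locus between
    one chromatid of each homolog, so each centromere then carries both alleles.
    """
    if crossover:
        return ("A", "a"), ("a", "A")
    return ("A", "A"), ("a", "a")


def heterozygous_fraction(mechanism, crossover):
    """Fraction of unreduced gametes that stay heterozygous for one meiotic outcome."""
    homolog1, homolog2 = sister_chromatids(crossover)
    if mechanism == "FDR":
        # First division restitution: homologs stay together, so each gamete
        # receives one chromatid from each homolog (non-sister chromatids).
        gametes = [(c1, c2) for c1 in homolog1 for c2 in homolog2]
    elif mechanism == "SDR":
        # Second division restitution: sister chromatids stay together, so each
        # gamete receives both chromatids of a single homolog.
        gametes = [homolog1, homolog2]
    else:
        raise ValueError(f"unknown restitution mechanism: {mechanism}")
    heterozygous = sum(1 for g in gametes if g[0] != g[1])
    return Fraction(heterozygous, len(gametes))


def transmitted_heterozygosity(mechanism, y):
    """Expected heterozygosity of unreduced gametes when a fraction y of meioses
    has one crossover between the locus and the centromere."""
    y = Fraction(y)
    return ((1 - y) * heterozygous_fraction(mechanism, False)
            + y * heterozygous_fraction(mechanism, True))


for y in (Fraction(0), Fraction(1, 4), Fraction(1, 2)):
    print(y, transmitted_heterozygosity("FDR", y), transmitted_heterozygosity("SDR", y))
```

For loci close to the centromere (small y), FDR gametes retain nearly all of the parental heterozygosity while SDR gametes are largely homozygous, in line with the contrast described above.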
Polyploidy has significantly affected evolution and speciation by increasing phenotypic variability, heterosis and resistance to mutations. Moreover, from an evolutionary standpoint, allopolyploidization (interspecific hybridization) is more advantageous because of its pronounced heterosis, manifested in increased biomass, growth and growth rate, fertility and stress resistance of the resulting hybrids [22]. For example, the tetraploid cultivated cotton species (G. hirsutum and G. barbadense) have much higher fiber quality and yield than the cultivated diploids (G. arboreum and G. herbaceum) [23].
In summary, polyploidization is a widespread phenomenon in plant evolution: polyploid species account for approximately ¼ of all vascular plant species [24]. Polyploidy also brings an evolutionary “benefit” to a species, increasing its chances in the struggle for survival.
Allopolyploidization of the cotton genome cannot be considered a simple sum of the A- and D-genomes. Genome duplication has been shown to lead to various molecular genetic interactions, e.g., interlocus concerted evolution, different rates of genome evolution, interlocus transfer of genetic material and changes in gene expression [1, 6, 17].
Additionally, according to recent molecular data, tetraploid cotton species are at least paleo-octoploids and diploid species are paleo-tetraploids. Cotton may therefore be a good model system for studying the consequences of genome polyploidization [6, 9, 25].
With this in mind, let us review the changes that occurred after polyploidization of the cotton genome.
Although diploid Gossypium species share the same basic chromosome number (n = 13), their genome size varies widely, from ~900 Mb in D-genomes to ~2500 Mb in K-genomes [1, 6, 17]. Moreover, analysis of bivalent formation at meiotic metaphase also suggests that diploid cotton species are actually paleopolyploid organisms [6]. A number of studies have also shown that the ancestor of Gossypium went through cycles of polyploidization, followed by loss of some of the homologous genes and diploidization [6, 26, 27].
In this respect, it should be noted that allopolyploidization of cotton was not characterized by rearrangements at the chromosome level [1, 6]. This assumption has been confirmed by both classical cytogenetic and molecular genetic data [1, 6]. Cytogenetic data show that A- and D-genome chromosomes form fewer bivalents in hybrids derived from allotetraploids than in hybrids between diploid species [1, 6]. For example, hybrids derived from allotetraploids form less than one bivalent per cell at meiotic metaphase, whereas hybrids between extant A- and D-genome diploids form, on average, 5.8 and 7.8 bivalents [1, 6].
Additionally, analysis of gene order and synteny in the A- and D-genomes and in the allopolyploid genomes (A versus At and D versus Dt) revealed a low level of structural chromosome rearrangements, with collinear linkage groups retained [28]. Along with this, AFLP analysis of nine synthetic allotetraploids and allohexaploids of cotton showed substantial additivity of parental genetic loci [1, 6].
Taken together, these facts suggest that stabilization of the cotton genome after polyploidization involved a reorganization of the original genomes that left them no longer capable of homeologous pairing [1, 6].
It can therefore be concluded that the cotton genome is quite stable and that, unlike in some other polyploid plant models, genome stabilization was not achieved through structural rearrangements.
As mentioned above, genome size differs significantly among cotton species despite the same basic chromosome number [1, 6, 29]. This may be due to differences in the number of mobile genetic elements (MGEs) in Gossypium genomes [6]. Wu et al. (2017) showed that the Gossypium genome contains a large number of MGEs, particularly long terminal repeat (LTR) retrotransposons, compared with Theobroma cacao L. and A. thaliana (L.) Heynh. [30].
Moreover, analysis of the genomes of G. raimondii, G. arboreum and G. hirsutum showed that the greatest number of MGEs, especially LTR retrotransposons, is found in the A- and AD-genomes [6, 12, 31, 32]. However, Copia LTR retrotransposons occur more frequently in G. raimondii (D5 genome), which has the smallest genome (885 Mb), whereas Gypsy LTR retroelements occur more frequently in species with large genomes [6, 32, 33, 34]. Additionally, the wide distribution of GORGE3 (Gossypium retrotransposable gypsy-like element) in the A- and AD-genomes was found to account for their larger size [31, 32, 35, 36].
Besides changing genome size in various cotton species, MGEs have also been found to affect the expression of genes responsible for fiber development [30, 32]. For example, a Copia LTR retrotransposon insertion was found in the promoter region of the D-subgenome gene encoding the transcription factor GhMYB25, which is consistent with the overexpression of the D-genome homeolog in G. hirsutum [32]. Similarly, insertion of a LINE retrotransposon into the promoter of the ethylene response factor gene (GhERF) in the D-subgenome increases the expression of the D-homeolog relative to its A-copy [32].
It has also been suggested that silencing of CICR (Chinese Institute of Cotton Research) LTR elements had an appreciable effect on the formation of allotetraploid cotton species, because these MGEs are abundant in the A-subgenome but practically absent from the D-subgenome [37].
In summary, the presence of mobile elements in the genome, their polymorphism and their frequency probably had a significant influence on cotton evolution. In addition, MGEs are involved in regulating the activity of genes responsible for fiber quality.
Because the Gossypium genus includes both diploid and tetraploid genomes, cotton is an ideal model for studying the evolution and expression of homeologous genes after polyploidization.
As mentioned above, the trait of extended trichome elongation was probably inherited by the allotetraploid AD-genomes from the A-genome [11]. Further evolution of the domesticated tetraploids (G. hirsutum and G. barbadense) proceeded under artificial selection for improved fiber quality, which led to asymmetric evolution of the A- and D-subgenomes. According to Li et al. (2015), in G. hirsutum the mutation frequency and the rate of single nucleotide polymorphism (SNP) formation within intergenic collinear regions were significantly higher in the Dt-subgenome than in the At-subgenome [31]. Meanwhile, Ks values for pairs of collinear genes in the At- and Dt-subgenomes were lower than in the corresponding diploid A- and D-genomes. The dN/dS ratio for the Dt/D pair was also shown to be reduced in comparison with T. cacao and with the corresponding values for the At/A pair [31].
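The Ks and dN/dS statistics used in such comparisons can be illustrated with a minimal sketch of the Nei-Gojobori approach with a Jukes-Cantor correction. It assumes that the numbers of nonsynonymous and synonymous sites and the observed differences for an aligned gene pair have already been counted; the function names and example counts are hypothetical, and the published analyses [31] may have used other estimators, such as maximum-likelihood methods.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites p for multiple substitutions (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, omega) for one aligned pair of coding sequences.

    dS corresponds to Ks in the text; omega = dN/dS below 1 indicates purifying
    selection and above 1 indicates positive selection.
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return dn, ds, dn / ds


# Hypothetical counts for one homeologous gene pair (not data from the cited studies).
dn, ds, omega = dn_ds(nonsyn_diffs=6, nonsyn_sites=900, syn_diffs=12, syn_sites=300)
print(f"dN = {dn:.4f}, Ks = {ds:.4f}, dN/dS = {omega:.3f}")
```

Comparing the distribution of omega between sets of gene pairs, such as Dt/D versus At/A, is one way to assess relative selective constraint on the two subgenomes.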
In addition, a comparative study of interchromosomal rearrangements and SNP frequency in G. hirsutum and G. barbadense found a greater total extent of rearrangements in the At-subgenome (372.6 Mb) than in the Dt-subgenome (82.6 Mb) [38]. SNP frequency was also higher in the At-subgenome than in the Dt-subgenome in both G. hirsutum and G. barbadense (5.95 versus 5.81 per thousand nucleotides) [38].
These data also suggest that, owing to genetic redundancy, allotetraploid genomes are under weaker stabilizing selection, and that directional selection for fiber quality has had a greater effect on the At-subgenome [31, 38].
The asymmetry of the subgenomes is also evident in the types of mutations occurring in the allotetraploid genomes of G. hirsutum and G. barbadense. Duplications in the At-subgenome of G. hirsutum were found to be more conserved than those in the Dt-subgenome, whereas G. barbadense has more conserved deletions in the Dt-subgenome than in the At-subgenome [39]. These data indicate that artificial selection during cotton domestication promoted the fixation of duplications in the At-subgenome of G. hirsutum and of deletions in the Dt-subgenome of G. barbadense. The latter may have contributed to the higher fiber quality that distinguishes Pima cotton (G. barbadense) from other species [39].
Subgenome differences are also manifested in the frequency and activity of MGEs. Two independent research groups found that the number of MGEs in the At-subgenome exceeded that in the Dt-subgenome [31, 40], and that LTR-Gypsy elements were significantly more frequent in the At-subgenome than in the Dt-subgenome [31, 40]. Li et al. (2015) also found that the subgenomes differ not only in the number of MGEs but also in their transcriptional activity and location [31]. Specifically, transcription of both LTR-Copia and LTR-Gypsy elements was higher in the Dt-subgenome than in the At-subgenome [31]. However, LTR-Copia elements were more active and more frequently located near coding genes than LTR-Gypsy elements [16].
The asymmetry is also manifested in the unequal expression of At- and Dt-homeologs that regulate fiber development in cotton [31, 41, 42, 43]. For some transcription factors (e.g., MYB), expression of the At-subgenome homeologs was significantly higher [31]. A comprehensive proteomic analysis of fiber from the allopolyploid species G. hirsutum and G. barbadense showed that A-genome expression patterns predominated in G. hirsutum at different stages of fiber development, whereas in G. barbadense the direction of expression dominance changed from the D-genome to the A-genome [42].
Moreover, RNA-seq analysis of G. hirsutum revealed a shift in homeolog expression towards the A-subgenome in allotetraploid cotton [44]. This shift can be explained by the inactivation of homeologs in the non-dominant D-subgenome by negative regulators (miRNAs and transcriptional repressors) [6, 44]. It was also established that genes in the A-subgenome may contribute to fiber development by regulating fatty acid biosynthesis/metabolism and microtubule growth, whereas genes in the D-subgenome may be involved in transcriptional regulation and stress response [44].
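Homeolog expression bias of this kind is commonly summarized by comparing the normalized expression of each At/Dt homeolog pair. The sketch below classifies pairs by the log2 ratio of At to Dt expression; the two-fold threshold, the pseudocount, the gene identifiers and the expression values are hypothetical assumptions, and the cited RNA-seq study [44] may have used statistical tests on read counts rather than a fixed cutoff.

```python
import math
from collections import Counter


def homeolog_bias(at_expr, dt_expr, fold=2.0, pseudocount=1.0):
    """Classify one homeolog pair from normalized expression values (e.g. TPM).

    Returns "At-biased", "Dt-biased" or "balanced" depending on whether
    log2((At + pseudocount) / (Dt + pseudocount)) reaches +/- log2(fold).
    """
    ratio = math.log2((at_expr + pseudocount) / (dt_expr + pseudocount))
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "At-biased"
    if ratio <= -cutoff:
        return "Dt-biased"
    return "balanced"


# Hypothetical homeolog pairs: gene -> (At-homeolog TPM, Dt-homeolog TPM).
pairs = {
    "gene1": (40.0, 9.0),
    "gene2": (12.0, 11.0),
    "gene3": (3.0, 15.0),
    "gene4": (25.0, 5.0),
}
summary = Counter(homeolog_bias(at, dt) for at, dt in pairs.values())
print(summary)
```

Across a whole genome, an excess of At-biased over Dt-biased pairs would correspond to the shift of homeolog expression towards the A-subgenome described above.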
Thus, the available data point to asymmetric evolution of the allopolyploid cotton subgenomes, with dominance shifted towards the A-subgenome.
Fiber was one of the key traits in the domestication of four Gossypium species: the two diploids G. arboreum and G. herbaceum (A-genome) and the two tetraploids G. hirsutum and G. barbadense (AD-genome) [11]. The two tetraploid species were domesticated independently, as confirmed both by sequencing data and by significant differences in their fiber at the proteome level [42, 43, 45].
Cotton fiber is essentially an elongated single cell of the seed epidermis (a trichome) with clearly defined developmental stages: fiber initiation, elongation, secondary cell wall biosynthesis and maturation [33, 46, 47]. It first appeared in ancestral A-genome diploid cotton after its divergence from the F-genome [1, 6, 48]. Allotetraploid species (AD genomes) have significantly higher fiber quality, which can be explained by the nucleotypic effect following allopolyploidization of the A- and D-genomes [48, 49].
Polyploidization has also led to an increase in the number of nuclear genes associated with fiber development [47]. For example, several studies have shown that the number of Malvaceae-specific genes of the MIXTA family, which encode MYB transcription factors regulating fiber development, is significantly higher in allotetraploid species [50, 51]. Additionally, genome stabilization together with natural and artificial selection contributed to changes in the expression of fiber development genes. These changes were achieved through epigenetic modifications (DNA methylation, miRNA and siRNA biogenesis), histone modifications and other factors [48, 52].
Fiber development in cotton is a complex process ensured by the coordinated action of many genes involved in the biosynthesis of polysaccharides, lipids and phytohormones, pro- and antioxidant systems and calcium homeostasis, as well as transcription factor genes (MYB, C2H2, bHLH, WRKY and HD-ZIP) [40, 53, 54, 55]. In tetraploid species, the expression and co-expression of genes differ across fiber development stages: some genes are expressed at fiber initiation, others at fiber elongation and secondary cell wall biosynthesis [53, 54]. Genes in the Dt-subgenome have been shown to be expressed predominantly at the fiber initiation stage, which is very important for fiber yield [1, 33].
Differences in gene expression between G. hirsutum and G. barbadense were also established using whole-genome alignment of both species. The longer fiber of G. barbadense may result from more prolonged activity of genes encoding a sucrose transporter (GbTST1), a Na+/H+ antiporter (GbNHX1), an aluminum-activated malate transporter (GbALMT16) and vacuolar invertase (GbVIN1), and of genes associated with plasmodesmata (PD) [8].
It was also found that fiber development in tetraploids depends on gene expression in both the At- and Dt-subgenomes [1, 40, 48, 55]. Although the major genes for fiber quality were introduced into the allopolyploids from the A-genome, genes in the Dt-subgenome also have a significant effect on fiber development in tetraploid cotton [48]. For example, based on an integrated genetic and physical map of fiber development genes, several researchers proposed that transcription factors regulating the expression of fiber genes in the At-subgenome are transcribed from the Dt-subgenome [1, 56].
Along with this, another research group identified 811 positively selected genes (PSGs) in G. hirsutum, 591 of which were associated with fiber development [40, 55]. Of these PSGs, 58% were localized in the At-subgenome and 42% in the Dt-subgenome. PSGs in the At-subgenome were associated with beta-D-glucan biosynthesis, regulation of signal transduction, and carbohydrate and sucrose biosynthesis, whereas PSGs in the Dt-subgenome were associated with stress responses, which are known to affect fiber development [40, 55, 57].
These results were supported by functional enrichment analysis of proteins differentially expressed in cotton fiber [42]. Proteome studies of G. hirsutum and G. barbadense showed that the dominant expression pattern of G. hirsutum was more similar to the A-genome (G. arboreum), whereas that of G. barbadense depended on the fiber development stage, switching from the Dt-subgenome to the At-subgenome [42]. The dominant At-subgenome patterns involved enzymes for the biosynthesis of alcohols, monosaccharides and hexoses, while the Dt-subgenome patterns involved proteins participating in various stress responses [42]. These results suggest that the similar fiber phenotypes of the two species arose during evolution through different pathways at the proteomic level [42].
Genome sequencing of tetraploid G. hirsutum and diploid G. arboreum and G. raimondii showed that differences in gene expression between G. hirsutum and G. raimondii were significantly greater than those between G. hirsutum and G. arboreum [44]. A shift in expression towards the At-subgenome was also demonstrated, which the authors explained by activation/deactivation of Dt-homeologs by negative regulators such as miRNAs and transcriptional repressors. Deactivation of Dt-homeologs was confirmed by a reduced number of nonfunctional genes in the Dt-subgenome [44]. Other authors have shown that the Dt-subgenome dominant pattern of G. hirsutum is associated with stress responses (genes encoding phosphatidylinositol phosphate kinase (PIPK), plasma membrane intrinsic proteins (PIP), calmodulin (CaM), ethylene receptors and ethylene response factors (ERF), ABA receptors (PYR/PYL), the protein kinase SnRK and the protein phosphatase PP2C) [8].
Thus, all of these data show that hybridization of the A- and D-genomes in allopolyploids had a significant effect on fiber development in cotton, through both the nucleotypic effect and the changes and differentiation in expression of homeologs in the At- and Dt-subgenomes. At-genes appear to be associated with fiber development, while Dt-genes regulate the activity of At-genes with respect to fiber quality and determine the adaptive capacity of allotetraploid cotton under adverse environmental conditions [8, 42, 44].