François Anthony; Gennady Ershov; Alexis Dereeper

doi:10.19080/IJESNR.2022.30.556296

Research Article

Reconstructing the Evolutionary History of Nucleotide-Binding Site (NBS) Genes in Euasterids

François Anthony*, Gennady Ershov and Alexis Dereeper

Institut de Recherche pour le Développement (IRD), UMR Interactions Plantes Microorganismes Environnement (IPME), France

Submission: June 20, 2022; Published: July 04, 2022

*Corresponding author: François Anthony, Institut de Recherche pour le Développement (IRD), UMR Interactions Plantes Microorganismes Environnement (IPME), 34394 Montpellier Cedex 5, France

How to cite this article: Mohamed A, Cahit Y, Muhammet Y. Effect of Various Kinds of Stilling Basin’s Baffle Blocks Arrangement on River Bed Scour. Int J Environ Sci Nat Res. 2022; 30(4): 556296. DOI: 10.19080/IJESNR.2022.30.556296

Abstract

Most resistance (R) genes contain a nucleotide-binding site (NBS) domain characterised by several conserved motifs. Recent whole-genome sequencing data gave us the opportunity to explore the evolutionary history of NBS genes in euasterid clades, including tomato, potato, and for the first time coffee and monkey-flower. Two eurosid species (Arabidopsis, grapevine) were used as outgroups. A workflow based on hidden Markov model searches was designed to identify genes with a complete NBS domain. The coffee genome has the highest number of NBS genes reported in plants. Eight conserved motifs were easily identified in the NBS domain of euasterids, including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL, RNBS-D and MHDV. Differences were detected between the composition, clustering and origin of the NBS genes in euasterid species and those in eurosid species. The study of complex clusters with at least ten NBS genes revealed several patterns of tandem duplication with transfer to a contiguous site or to a more distant one. Tandem duplication appeared to be a continuous mechanism over time since eight gene pairs had zero diversity. The study of orthologous relationships revealed that most NBS genes arose from duplication of paralogues in a few orthologous groups. Evolution of NBS genes was inferred from an analysis of synonymous and non-synonymous substitutions in the orthologous groups. Traces of 11 major large-scale duplication events were observed and dated in the euasterid genomes. Specific ancestral signatures of large-scale duplication events were identified in the genomes.

Keywords: Euasterid; Evolution; Nucleotide-binding site domain; Orthology; Resistance gene analogues

Introduction

Pathogens have been a major threat in agriculture and breeding programmes for decades [1,2]. Changes in pathogen distribution and disease severity could be a response to certain aspects of climate change, and may increase crop yield losses [3,4]. In the face of pathogen diversity (bacteria, fungi, insects, nematodes, oomycetes, viruses), plants have set up a sophisticated defence system to detect attacks and activate innate immune responses [5,6]. Plant responses are governed by nonspecific transmembrane pattern recognition receptors (PRRs) and cytoplasmic immune receptors encoded by resistance (R) genes. The products of R genes play a critical role in recognizing proteins (effectors), which are introduced into plant cells by pathogens, and in triggering various defence responses including localized cell death [7,8]. Despite the diversity of pathogen attacks, R proteins share a high degree of homology and present a number of conserved motifs and domains among plant species [9]. Since the discovery of the first R gene in a plant in 1992, the Hm1 gene in maize [10], more than 100 R genes have been cloned from different plant species (http://prgdb.crg.eu/wiki/). The majority encode a nucleotide-binding site (NBS) domain and leucine-rich repeat (LRR) domains. The NBS domain forms part of a larger domain known as NB-ARC, which is present in the human apoptotic protease-activating factor-1 (APAF-1), the Caenorhabditis elegans death-4 protein (CED-4) and plant R proteins [11,12]. This domain contains the three-layered α-β fold and subsequent short α-helical region characteristic of the AAA+ ATPase domain superfamily [13]. In NBS-LRR-encoding genes, the LRR domain interacts with the product of pathogen AVR genes directly or indirectly and is thus involved in recognising R protein specificity [5,14,15]. The deduced NBS-LRR proteins can be divided into two subfamilies based on their N-terminal features [16].
(i) TIR-NBS-LRR (TNL) proteins contain an N-terminal domain which is similar to both the intracellular signalling domains of Drosophila Toll and the mammalian Interleukin-1 receptor (TIR). (ii) Non-TIR-NBS-LRR proteins often present a predicted N-terminal coiled-coil (CC) structure and are collectively named non-TIR proteins. In addition to architectural differences, NBS-LRR-encoding genes in these subfamilies differ considerably in their phyletic distribution and downstream signalling pathways, suggesting possible divergence in their functions [6,17].

Several conserved motifs have been identified throughout the NBS domain of non-TIR and TIR proteins, including P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL, RNBS-D and MHD [16]. The functional importance of these motifs is documented by the effect of mutations of motif residues which lead to either loss-of-function or auto-activation (i.e., a hypersensitive response in the absence of a pathogen or AVR protein) of the NBS-LRR protein [9]. The RNBS-A, kinase-2 and RNBS-D motifs display different features in non-TIR and TIR proteins, and these can be used as specific signatures to separate the two subfamilies of NBS-encoding (NBS) genes. Based on these highly conserved motifs, R gene analogues (RGAs) have been discovered using different genome-wide approaches with degenerate primers [18,19], BLAST [20,21] or HMMER searches [22,23]. Hundreds of NBS-LRR genes are generally detected in plant genomes, underlining their duplication dynamics and the key role played by these genes.

In recent years, genomic organisation, phylogenetic reconstruction and evolutionary patterns of NBS-LRR-encoding genes have been investigated extensively in plant genomes [24-28]. The Arabidopsis genome has become a reference for many genomic studies, including R gene analyses, because it was the first plant genome to be sequenced [29]. The main findings pointed to a major role of clustering in R gene expansion and to some basic evolutionary mechanisms, such as interlocus gene conversions within clustered R genes, and tandem and segmental duplication [30,31]. Although R genes in a cluster often display evidence for intergenic exchange, paralogues can also diverge considerably from one another [32]. Therefore, the evolution of R genes in plant genomes appears to be a complex process.

Recent advances in whole-genome sequencing gave us the opportunity to explore RGAs in the euasterid clade. Whole-genome data from coffee [33] and monkey-flower [34] were used for the first time in a comparative study of R genes, along with those from potato [35] and tomato [36]. Potato and tomato are members of the Solanaceae family, which contains several major food crops. Coffee belongs to the Rubiaceae family, which is also a family of economic importance. Monkey-flower belongs to an intermediate lineage (Figure 1). Data from two genomes belonging to the eurosid clade, Arabidopsis thaliana [29] and grapevine [37], were included in the analysis as outgroups. We designed a workflow to identify R genes presenting a complete NBS domain and used the same procedure for all genomes. We then conducted a comparative analysis to characterise composition, clustering and selection pressures. Our findings enabled us to retrace the evolutionary history of NBS genes in euasterids and provided insights into genome evolution.

Materials and Methods

Identification of NBS genes

Predicted gene sequences were downloaded from http://coffee-genome.org/ for Coffea canephora, http://www.phytozome.net/ for Mimulus guttatus, http://solanaceae.plantbiology.msu.edu/ for Solanum tuberosum, http://solgenomics.net/ for S. lycopersicum, http://www.genoscope.cns.fr/spip/Vitis-vinifera-whole-genome.html for Vitis vinifera, and http://www.arabidopsis.org/ for Arabidopsis thaliana. Only a scaffold assembly was available for the monkey-flower genome. For each genome, predicted protein sequences matching the Pfam NBS family (NB-ARC domain PF00931) (http://pfam.sanger.ac.uk) were identified using HMMER search 3.0 (http://hmmer.janelia.org) with an E-value cut-off of 10^-60 (Figure S1). These sequences were aligned using HMMER align and used to construct an NBS hidden Markov model (HMM) specific to each genome using HMMER build 3.0. New HMM searches (E-value cut-off of 0.01) led us to select specific sets of NBS candidate genes. The candidate genes were submitted to the National Center for Biotechnology Information’s (NCBI) Conserved Domains tool [38] for validation of the presence of an NBS domain (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Only sequences with a complete NBS domain at both N- and C-termini were retained for subsequent analyses. The identified sequences are available at the GreenPhyl website (http://www.greenphyl.org/cgi-bin/index.cgi) [39].
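The two E-value cut-offs described above can be illustrated with a minimal Python sketch. It assumes the search results were saved in HMMER's tabular per-target format (the `--tblout` option), in which the fifth whitespace-separated column is the full-sequence E-value. The gene names and scores in `SAMPLE_TBLOUT` are hypothetical, and the function names are ours rather than part of the published workflow.

```python
# Minimal sketch: filter HMMER per-target tabular output by E-value.
# SAMPLE_TBLOUT is hypothetical data, not output from the study.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
geneA          -          NB-ARC      PF00931.22 1e-75    250.1  0.1
geneB          -          NB-ARC      PF00931.22 3e-20     80.2  0.0
geneC          -          NB-ARC      PF00931.22 0.5        5.0  0.0
"""


def parse_tblout(text):
    """Return {target name: best full-sequence E-value} from tblout text."""
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        # Keep the lowest E-value if a target appears more than once.
        if target not in hits or evalue < hits[target]:
            hits[target] = evalue
    return hits


def select_hits(hits, cutoff):
    """Return the sorted names of targets with E-value <= cutoff."""
    return sorted(name for name, evalue in hits.items() if evalue <= cutoff)


hits = parse_tblout(SAMPLE_TBLOUT)
seed_set = select_hits(hits, 1e-60)       # strict first pass (Pfam model)
candidate_set = select_hits(hits, 0.01)   # relaxed second pass (genome-specific model)
```

In the workflow described above the second pass uses a genome-specific HMM built from the first-pass hits; here the same table is filtered twice only to show the effect of the two thresholds.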

Subsequently, the NCBI-CD tool was used to determine whether the corresponding NBS proteins presented TIR and LRR motifs. CC domains were specifically identified using COILS/PCOILS version 2.2 [40] (P ≥ 0.9) and PAIRCOIL2 [41] (P ≤ 0.025). The NBS genes were then classified according to detailed information on protein motifs and domains. Multiple alignments of amino acid sequences were performed for each genome using MAFFT [42]. The resulting alignments were manually cleaned to remove sequences with poor ends and incomplete motifs using MEGA (Molecular Evolutionary Genetics Analysis) version 5.2 [43].

Prediction of conserved motif structures

The structural diversity of the NBS proteins identified was studied through analysis of conserved motifs and domains. Non-TIR-NBS-LRR (CNL) and TIR-NBS-LRR (TNL) protein sequences were characterised using MEME (Multiple Expectation maximization for Motif Elucidation) [44] with specific conditions: (i) the optimum motif width was set at ≥ 6 and ≤ 50 and (ii) the maximum number of motifs to find was successively set at 15 and 30. The consensus sequences of each motif were aligned using MUSCLE [45]. Similarity and dissimilarity among sequences were further checked in the complete alignment regions. Core sequences containing amino acids conserved in all euasterid consensus sequences were determined for each motif.
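As an illustration of how a core sequence can be derived from aligned consensus sequences, the following Python sketch keeps a residue where all sequences agree and writes `x` elsewhere. The three input sequences are hypothetical examples chosen to reproduce the pattern of the CNL P-loop core; they are not the consensus sequences obtained in this study, and the function name is ours.

```python
def core_sequence(aligned_consensus, placeholder="x"):
    """Collapse aligned consensus sequences into a core motif.

    A column keeps its residue only if every sequence carries the same
    residue there; variable or gapped columns become the placeholder.
    """
    if len({len(seq) for seq in aligned_consensus}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    core = []
    for column in zip(*aligned_consensus):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            core.append(column[0])
        else:
            core.append(placeholder)
    return "".join(core)


# Hypothetical P-loop consensus sequences from three genomes.
example = ["IVGMGGLGKTT", "IVGLGGVGKTT", "IVGMGGIGKTT"]
core = core_sequence(example)
n_conserved = sum(1 for residue in core if residue != "x")
```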

Gene clusters

As clustering of R genes frequently occurs in plant genomes, physical clusters of NBS genes were investigated in the six plant genomes studied. We used the same parameters to define a cluster as Holub [46]: a chromosome region which contains more than three genes in a distance of less than 200 kb. In monkey-flower, this approach was used on scaffolds since pseudomolecules were not available. Complex clusters, i.e., clusters containing at least 10 NBS genes, were further investigated. Their nucleotide sequences were aligned by MUSCLE and the sequences corresponding to the NBS domains were extracted using the previously built alignments of amino acid sequences. Both ends of the NBS domains were defined using the core sequences of the P-loop and MHD motifs previously determined in euasterids (see section headed “Prediction of conserved motif structures”), e.g., IVGxGGxGKTT and MHDxxxD for CNL proteins respectively. Nucleotide diversity (π) between pairs of NBS domain sequences belonging to the same cluster was calculated using EggLib (Evolutionary Genetics and Genomics Library) tools version 2.1.5 [47]. Gene families were defined based on a π criterion < 20% [48]. Recent duplications of NBS genes were then identified assuming that low levels of diversity (< 5%) between nucleotide sequences corresponded to the most recent duplications [49]. Levels of nucleotide diversity of less than 10% and 20% were also used to retrace the chronology of duplication events in multi-gene families.
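The cluster definition can be sketched in Python as follows. The sketch rests on one interpretation that should be read as an assumption: neighbouring NBS genes on the same chromosome (or scaffold) are chained together while the distance between consecutive gene starts stays below 200 kb, and a chain is reported as a cluster when it holds more than three genes. Gene identifiers and coordinates are hypothetical.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_genes=4):
    """Group NBS genes into physical clusters.

    genes: iterable of (gene_id, chromosome, start) tuples.
    Assumption: consecutive genes are chained while the distance between
    their start positions is below max_gap; chains with at least
    min_genes members (i.e. more than three) are returned.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        run, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if prev_start is not None and start - prev_start >= max_gap:
                if len(run) >= min_genes:
                    clusters.append((chrom, run))
                run = []  # gap too large: start a new chain
            run.append(gene_id)
            prev_start = start
        if len(run) >= min_genes:
            clusters.append((chrom, run))
    return clusters


# Hypothetical coordinates: four chained genes, one isolated gene, one triplet.
example_genes = [
    ("g1", "chr1", 10_000), ("g2", "chr1", 50_000),
    ("g3", "chr1", 120_000), ("g4", "chr1", 300_000),
    ("g5", "chr1", 900_000),
    ("h1", "chr2", 0), ("h2", "chr2", 100_000), ("h3", "chr2", 150_000),
]
clusters = find_clusters(example_genes)
```

With these defaults the triplet on `chr2` is not reported as a cluster, in line with the separate treatment of triplets and doublets in the Results.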

Orthology

A clear distinction between paralogues and orthologues is critical for the construction of a robust evolutionary classification of genes [50]. The NBS genes from the six plant genomes were assigned to orthologous groups (orthogroups) using the OrthoMCL database 5 [51]. The sequences within the orthogroups were aligned using MAFFT. Their NBS domains were then cut following the method described above (see section headed “Gene clusters”). R genes (http://prgdb.crg.eu/wiki/) belonging to the same orthogroups were included in the datasets. Phylogenetic trees were constructed based on the neighbour-joining (NJ) method [52] with a Jones-Taylor-Thornton correction model implemented in MEGA. Branch lengths were assigned by pairwise calculation of the genetic distances. The confidence at each branching node was assessed by bootstrap analysis [53] with 500 replicates. Missing data were treated by pairwise deletion of the gaps. The distribution of the genes in the orthogroups was then used to classify the clusters as homogeneous when all sequences shared a common ancestor, or as heterogeneous in the case of more distantly related NBS genes [54].
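The classification of clusters from orthogroup membership can be expressed as a short Python sketch. Orthogroup membership is used here as a proxy for common ancestry. The gene identifiers are hypothetical, the orthogroup labels are reused from the Results only as examples, and the handling of genes without an orthogroup (ignored unless the whole cluster is unassigned) is an assumption of the sketch.

```python
def classify_cluster(gene_ids, orthogroup_of):
    """Label a cluster from the orthogroups of its member genes.

    gene_ids: genes in the cluster.
    orthogroup_of: dict mapping gene id to orthogroup id; genes missing
    from the dict are treated as unassigned (assumption of this sketch).
    """
    groups = {orthogroup_of[g] for g in gene_ids if g in orthogroup_of}
    if not groups:
        return "unassigned"
    return "homogeneous" if len(groups) == 1 else "heterogeneous"


# Hypothetical gene-to-orthogroup assignments.
assignments = {
    "geneA": "OG5_173771", "geneB": "OG5_173771", "geneC": "OG5_173771",
    "geneD": "OG5_160141",
}
label_single = classify_cluster(["geneA", "geneB", "geneC"], assignments)
label_mixed = classify_cluster(["geneA", "geneB", "geneD"], assignments)
label_none = classify_cluster(["geneX", "geneY"], assignments)
```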

Ks and Ka analyses

The timing of the divergence of homologous genes and the selective pressures on duplicated genes were estimated by calculating the synonymous (Ks) and nonsynonymous (Ka) substitutions per site between NBS genes. The nucleotide sequences within orthogroups were aligned using MAFFT and the sequences corresponding to the NBS domains were cut. Pairs with a nucleotide diversity (π) < 0.05 were eliminated because their Ks (as denominator) were too small to obtain a reliable estimate [55]. Ks and Ka values from pairwise alignments of NBS domains were calculated using the Nei-Gojobori [56] method of model averaging using KaKs-calculator 1.2 software [57]. Assuming that synonymous changes approximate the neutral rate of molecular evolution [58], the relative age-distribution of gene duplicates within a genome can be inferred indirectly from the distribution of Ks [59]. A Ka:Ks ratio >1 generally indicates a positive or diversifying selection for amino acid substitution. Conversely, a Ka:Ks ratio <1 corresponds to a negative or purifying selection [60].
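To make the interpretation of the Ka:Ks ratio concrete, the Python sketch below applies the Jukes-Cantor correction used in the Nei-Gojobori method to the proportions of synonymous and nonsynonymous differences, then labels the selection regime. It does not reproduce KaKs-calculator: the counts of sites and differences are assumed to be supplied by the caller, and the numbers in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (Ka, Ks, Ka:Ks) from counts of differences and sites."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ratio = ka / ks if ks > 0 else float("nan")  # undefined when Ks is zero
    return ka, ks, ratio


def selection_regime(ratio):
    """Interpret a Ka:Ks ratio."""
    if math.isnan(ratio):
        return "undefined"
    if ratio > 1:
        return "positive (diversifying)"
    if ratio < 1:
        return "negative (purifying)"
    return "neutral"


# Hypothetical counts for one gene pair.
ka, ks, ratio = ka_ks(syn_diffs=30, syn_sites=100,
                      nonsyn_diffs=20, nonsyn_sites=200)
regime = selection_regime(ratio)
```

The undefined case for Ks equal to zero is the reason nearly identical pairs are excluded before computing the ratio.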

Results

Identification and classification of NBS genes

A total of 5,998 candidate genes encoding NBS domains were detected in the six plant genomes with our HMM workflow (Figure S1) but only 2,151 genes were validated using NCBI-CD (Table S1). Both the absolute number and relative proportion of NBS genes were significantly higher in coffee (670, 2.62%) and grapevine (459, 1.51%) than in the other genomes (113-373, 0.42-1.11%). Among the validated genes, around 20% presented an incomplete NBS domain according to NCBI-CD and were not retained for further analyses. Differences in the composition of conserved domains associated with the NBS domain were observed between euasterids and eurosids. In euasterids, both the number and proportion of NBS genes containing CC domains were much higher than those containing TIR domains. Around 52.9% of NBS genes presented CC domains in euasterids while only 4.8% presented TIR domains. No TIR domains were detected in monkey-flower and only a few (9) in coffee. The proportion of NBS genes with LRR regions ranged from 13.4 to 24.5% in euasterid genomes. This proportion was higher in grapevine (51.7%) and Arabidopsis i4.8%). Including other NCBI conserved domains, the number of conserved domains associated with NBS ones ranged from six to 14 in euasterid genomes while there were 18 in Arabidopsis and 22 in grapevine (Table S2). Out of a total of 39 conserved domains, only 11 were shared by euasterids and eurosids. Except for the LRR_8 motif, the most frequently conserved domains associated with NBS domains were related to the ATPase family (AAA and AAA_16), which can act as DNA helicases and transcription factors.

^*CC: coiled-coil domain; D: DUF3542 domain; LRR: leucine-rich repeat domain; NBS: nucleotide binding site; TIR: Toll/Interleukin-1-receptor; W: WRKY domain; X: RPW8 gene of resistance.

Based on their N-terminal and C-terminal NCBI-CD domains, the NBS genes formed 15 subclasses in euasterids and 26 in eurosids (Table 1). The majority of NBS genes were classified in four subclasses: CNL, CN, NL, and N. The NBS genes without TIR or CC N-terminal domain were then aligned and divided into TIR-NBS and non-TIR-NBS genes according to their NBS signature [61].

Analysis of conserved motif structures

Structural divergence and conserved motifs shared among NBS domains were examined by analysing the predicted CNL and TNL proteins using MEME software. Eight conserved motifs were easily identified in the NBS domain, including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL, RNBS-D and MHDV. Divergence differed within and between CNL and TNL proteins (Table 2). A high level of similarity was found between CNL and TNL proteins in the P-loop, kinase-2, RNBS-B, GLPL, and MHDV motifs. The RNBS-A, RNBS-C, and RNBS-D motifs were dissimilar. The core motifs of euasterid consensus sequences were remarkably conserved in several motifs: 14 residues were conserved in the kinase-2 motif of TNL proteins (KVLIVLDDxDxxDxLxxLAG), 13 in the P-loop motif of CNL (IVGxGGxGKTTLAxxxYN) and TNL (GIWGxxGIGKTTxAKA) proteins, 12 in the GLPL motif of CNL (IxxxCxGLPLAxxxxGxLLRxK) proteins, and 11 in the GLPL motif of TNL (ExxxxExVxxAxxLPLALKV) proteins. Two consensus sequences of RNBS-A, kinase-2 and RNBS-B motifs were identified in the CNL proteins of potato. Some substitutions around the core motif residues were specific to euasterid (most CNL and TIR motifs) and Solanaceae (RNBS-A, Kinase-2, RNBS-C, RNBS-D and MHD of TNL proteins and RNBS-C of CNL proteins) sequences (Table 2).

Genomic organisation of NBS genes

Clustering analysis of NBS genes revealed different genomic organisations in euasterids. One hundred and nineteen clusters according to Holub’s [46] definition were detected in the six plant genomes (Table S3). The number of clusters varied from eight in tomato to 40 in coffee (Table 3). The proportion of clustered NBS genes was 25.2% in tomato, 38.8% in coffee, 40.4% in potato, and 57.8% in monkey-flower. The average number of NBS genes per cluster was higher in monkey-flower (9.1) and grapevine (8.1) than in other genomes (4.8-5.6). In addition to clusters, NBS genes were also grouped in triplets and doublets. Finally, the ratio of NBS genes in a cluster or in a tandem array was 58.9% in tomato, 66.4% in potato, 70.1% in coffee, and 79.9% in monkey-flower. The maximum number of NBS genes in a cluster was 33 genes (monkey-flower CL16).

The physical organisation and sequence diversity of complex clusters of at least ten NBS genes were then characterised. Ten complex clusters were identified in euasterids and four in grapevine, but none in Arabidopsis (Figure 2). All complex clusters in euasterids were composed of non-TIR sequences while in grapevine there were two clusters (CL09, CL16) with TIR sequences. The grapevine CL09 contained four TIR members followed by ten non-TIR members, suggesting the presence of two distinct clusters. The largest complex clusters covered more than 909 kb (monkey-flower CL09 and grapevine CL16), whereas the shortest cluster spanned only 151 kb (potato CL04) (Table 4). The number of predicted genes in a complex cluster varied from ten (potato CL04) to 115 (monkey-flower CL16). The majority of clustered NBS genes (97/165) occupied neighbouring positions in the complex clusters. Examples are monkey-flower CL16 which had 33 members, of which 23 were consecutive, and potato CL04, which is a chain of 10 NBS genes with no non-NBS genes. Average length of clustered NBS genes was lower in euasterids (2,723-3,804nt) than in grapevine (4,991-11,352nt). Similar differences were also observed in the non-NBS genes located in the complex clusters (1,812-3,086nt in euasterids vs. 3,785-8,924nt in grapevine). The average distance between clustered NBS genes ranged from 12,424nt (potato CL04) to 46,741nt (coffee CL23), reflecting variable gene density. Nucleotide diversity (π) between pairs of clustered NBS genes revealed different diversification patterns among complex clusters. The minimum values were low in all clusters (< 0.06) whereas the maximum values were either low, i.e., 0.153 in potato CL04, 0.157 in monkey-flower CL14 and 0.183 in coffee CL23, or high, i.e., 0.454 in monkey-flower CL09, 0.464 in CL13 and 0.469 in CL16 (Table 4).
Computed nucleotide diversity (π) between each pair of NBS domain sequences was then used to analyse relationships among the genes clustered in complex clusters (Figure 3). In monkey-flower CL14 (Figure 3c), coffee CL23 (Figure 3e) and potato CL04 (Figure 3f), all gene pairs presented low diversity and belonged to different gene families, suggesting a recent origin for these clusters. By contrast, the other complex clusters (monkey-flower CL09, CL13 and CL16) showed more diversified sequences, suggesting an ancient origin (Figure 3a, 3b & 3d). The most recent duplications (π < 5%) were observed in five out of six complex clusters of euasterids (Figure 3a-3e) and in the four complex clusters of grapevine (Figure 3g-3j). They involved two (e.g., monkey-flower CL09 and CL13) or three (e.g. monkey-flower CL14) genes and were often accompanied by a translocation or an inversion.
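The way a pairwise π value is read against the thresholds given in Materials and Methods (< 5% for the most recent duplications, < 20% for membership of the same gene family) can be sketched in Python. This is not the EggLib implementation: π between two aligned sequences is simply taken as the proportion of differing sites, gapped positions are skipped (an assumption of the sketch), and the sequences are hypothetical.

```python
def pairwise_diversity(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap in either sequence are ignored (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def label_pair(pi):
    """Interpret a pairwise diversity value with the 5% and 20% thresholds."""
    if pi < 0.05:
        return "recent duplication"
    if pi < 0.20:
        return "same gene family"
    return "distinct families"


# Hypothetical aligned NBS-domain fragments.
pi_one_change = pairwise_diversity("ACGTACGTAC", "ACGTACGTAA")
pi_gap_only = pairwise_diversity("ACGT-CGTAC", "ACGTACGTAC")
```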

Orthologous relationships of NBS genes

The 1,775 NBS genes identified in this study were assigned to orthologous groups (orthogroups) using OrthoMCL. In euasterids, 1,122 genes (90.3%) were distributed in 23 orthogroups (Table 5). Eight additional orthogroups were identified in eurosids. The orthogroups were composed of either CNL or TNL sequences. The proportion of genes assigned to an orthogroup was lower in coffee (82.6%) than in the other genomes (> 92%). Three orthogroups showed a large expansion in all euasterid genomes: OG5_160141 with 513 genes (45.7% of the genes assigned to OrthoMCL groups), OG5_134032 with 177 genes (15.8%) and OG5_173771 with 116 genes (10.3%). Together these orthogroups represented 64.8% of euasterid NBS genes. Eight orthogroups were composed of sequences from all euasterid genomes and were retained for phylogenetic analysis of NBS domains. They contained about three quarters of the NBS genes of euasterids. Two hundred and thirty grapevine genes but only five Arabidopsis genes were classified in these orthogroups.

^* 0.019 – 0.084 between TIR sequences and 0.023 – 0.218 between non-TIR sequences.

For each orthogroup, sequences and R genes (http://prgdb.crg.eu/wiki) were aligned using MAFFT. Regions corresponding to the NBS domain were then selected and the evolutionary history of the sequences was inferred using the neighbour-joining (NJ) method [52]. Allelic diversities appeared to be structured in branches specific to each genome or to the Solanaceae family (Figure 4, S2 & S3). Indeed, most genes in potato and tomato grouped together and formed Solanaceae branches. Eighteen R genes were classified in euasterid orthogroups, 13 of which belonged to the Solanaceae family. Except for the Rpi-blb1 gene, the other R genes in tomato, potato and their wild relatives were classified in accordance with their origin, i.e., in branches which only contained Solanaceae NBS genes (Figure 4, S2 & S3a). Similarly, the RPM1 gene grouped with an Arabidopsis NBS gene (Figure S3d). Different mutation rates were associated with NBS gene expansion in the orthogroups and were three-fold higher in OG5_160141 (Figure 4) and OG5_134032 (Figure S2), than in OG5_143984 (Figure S3b).

Based on their orthogroup, the clusters of NBS genes were separated into “homogeneous” clusters when all clustered genes derived from a single ancestor, or “heterogeneous” clusters when the genes originated from more than one ancestor. Thirty-four homogeneous and 45 heterogeneous clusters were identified in euasterids (Table 6). The proportions of homogeneous and heterogeneous clusters were similar in all genomes except for monkey-flower which had fewer homogeneous clusters (3/16). By contrast, the grapevine genome had a majority of homogeneous clusters (20/28). Most homogeneous clusters were classified in distinct clades, the others being mixed with NBS genes from the same genome (Figure 4, S2 & S3). Two complex clusters were homogeneous (coffee CL23 to OG5_173771 and potato CL04 to OG5_153323), while the others were heterogeneous. Cross analysis of gene positions and origins in the heterogeneous complex clusters revealed discontinuous distribution of orthogroups in the clusters, suggesting translocation events that locally concentrated NBS genes from different orthologues (Figure 3).

The NBS genes not assigned to an orthogroup represented specific sequences with no known orthologues. These genes were distributed unevenly in the genomes: 97 genes from coffee, 20 from monkey-flower but only three from potato and grapevine, and one from tomato (Table 5). They formed four clusters in coffee and one cluster in monkey-flower (Table 6). In coffee, chromosomes 1 and 3 carried respectively 22 and 17 NBS genes without orthologues.

Synonymous vs. non-synonymous substitutions

The evolution of the NBS domains was investigated by analysing synonymous and non-synonymous substitutions in orthogroups containing NBS genes from the four euasterid genomes. Gene pairs with a nucleotide diversity (π) < 0.05 or with N/A (not applicable) value for Ks or Ka were excluded. Eight gene pairs had zero diversity, suggesting recent duplications: two in monkey-flower (mgv1a000811-mgv11b017348, mgv1a019980-mgv1a022166), two in potato (PGSC0003DMP400013227-PGSC0003DMP400011616, PGSC0003DMP400047655-PGSC0003DMP400047657), one in tomato (Solyc00g102400.2.1-Solyc06g008800.1.1) and three in grapevine (GSVIVP00004211001-GSVIVP00000786001, GSVIVP00005246001-GSVIVP00026462001, GSVIVP00030986001-GSVIVP00030997001). Kruskal-Wallis tests showed differences in π, Ka, Ks and Ka:Ks between orthogroups and genomes (Table S4). The average ratio Ka:Ks was higher in OG5_160141 (0.789-0.811) and OG5_134032 (0.718-0.952) than in OG5_173771 (0.426-0.643). Differences between homogeneous and heterogeneous clusters were detected only for Ka using the Kolmogorov-Smirnov test (Table S5). All clusters were under purifying selection (Ka:Ks < 1), except one (monkey-flower CL04), under diversifying selection (Ka:Ks > 1). Its low diversity (0.07 < π < 0.11) showed that this cluster was recent.

The relative frequency of Ks revealed variable distribution patterns in the orthogroups. In OG5_160141, all genomes presented a Gaussian-like distribution with maximum Ks frequencies of around 0.6-0.7 for coffee and monkey-flower and 0.7-0.8 for potato and tomato (Figure 5a). A minor peak (0.1-0.2) was also present in potato. Such distribution patterns clearly showed that NBS genes of OG5_160141 arose from a major large-scale duplication event. In contrast, complex Ks distributions were observed in the other orthogroups. In OG5_134032, the distribution pattern was bimodal in all genomes, with the maximum around 0.6-0.7 and 0.1-0.2 (Figure 5b). In OG5_173771, the distribution was complex with several peaks in all genomes (Figure 5c). Relative frequencies of Ka showed similar distribution patterns to Ks in the species (Figure S4). Both Ks and Ka distributions were consequently shaped by the same evolutionary events.

The timing of duplication was estimated by building an evolutionary clock using grapevine as reference and based on fossil deposits for time calibration [62]. The Ks peaks of the three major euasterid orthogroups (Figure 5) were dated, and then grouped in 11 major events according to periods of time (Table S6). Three periods of time were observed following the divergence of eurosids (Figure 6). The oldest period (83-116Mya) lasted until the divergence of coffee and showed differential contributions of orthogroups to today’s genomes. Five events affected the ancestral sequence of euasterids but ancestral NBS genes were conserved in the four euasterid genomes only in the third event and in a single orthogroup (OG5_134032). The other events left traces of duplication in one (second and fifth events) or two (first and fourth) genomes. Interestingly the fourth event appeared to be at the origin of NBS gene expansion in OG5_160141 of both coffee and monkey-flower genomes, but without leaving any traces in the Solanaceae genomes. The following period of time (27-83Mya) occurred between these intensive duplication activities in the ancestral sequence and the recent period. This intermediate period was the longest, but only two minor events happened in a single orthogroup (OG5_173771). The most recent period (16-27Mya) included four minor duplication events, which all occurred during a relatively short period. One orthogroup (OG5_134032) was particularly concerned by these events.
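The text does not give the clock equation, so the following Python sketch should be read as an assumption: it uses the common relation T = Ks / (2r), where r is the synonymous substitution rate per site per million years, and calibrates r from a reference Ks value whose age is known. The calibration point and the Ks peak in the example are hypothetical and are not the values used to build Table S6.

```python
def calibrate_rate(reference_ks, reference_age_mya):
    """Synonymous substitution rate per site per million years.

    Assumes Ks accumulates on both lineages since the split: Ks = 2 * r * T.
    """
    if reference_ks <= 0 or reference_age_mya <= 0:
        raise ValueError("reference Ks and age must be positive")
    return reference_ks / (2 * reference_age_mya)


def date_ks_peak(ks_peak, rate):
    """Age in million years of a Ks peak under a constant-rate clock."""
    return ks_peak / (2 * rate)


# Hypothetical calibration: a Ks of 1.0 corresponding to 100 Mya.
rate = calibrate_rate(reference_ks=1.0, reference_age_mya=100.0)
age = date_ks_peak(ks_peak=0.65, rate=rate)  # about 65 Mya under these inputs
```

A constant rate across lineages is a strong simplification, which is one reason dated events are best interpreted as periods rather than points.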

An analysis of the NBS genes involved in duplication events revealed differential impacts on genome organisation. Duplication events involved several chromosomes in all genomes. For example, coffee chromosomes 3, 8 and 11 were the most affected by the events, with 53, 99 and 59 genes involved, respectively (Table S7). In potato, events 1 and 10 were the only ones to involve NBS genes of chromosomes 2, 7 and 12, as a specific signature left in OG5_160141. The coffee and Solanaceae genomes still had traces of important ancestral events (4 and 1, respectively) since all the chromosomes except one of potato were affected, indicating large-scale duplication events.

Discussion

Identification of NBS genes

Plant resistance to a range of pathogenic organisms (bacteria, fungi, insects, nematodes, oomycetes and viruses) is conferred by a diverse group of disease resistance proteins [63]. Classification of these proteins is primarily based on predicted domains and motifs. One of the largest families of resistance proteins encodes NBS domains. The recent availability of the complete genomic sequences of C. canephora and M. guttatus enabled us to perform comparative analyses of NBS domains in the euasterid clade. This clade encompasses many economically important crops such as coffee, potato and tomato, which were included in this study, as well as others like cinchona, jasmine, sesame, and other Solanaceae crops (eggplant, pepper, sweet potato, tobacco, etc.). The cultivated varieties are susceptible to many pests and diseases, and production depends on huge amounts of pesticides. We designed a workflow to perform an automatic search for NBS genes using the predicted gene sequences of a given genome. The procedure was applied to six plant genomes with the aim of constructing datasets of genes with a full NBS domain rather than making a complete inventory. Such predicted genes are stored on the GreenPhyl website [39].

Comparison with previous studies [35,36,61,64] revealed small differences in the number of NBS genes detected. However, these variations can be explained by the different methods used for the search (HMM, Basic Local Alignment Search Tool) and validation (NCBI, PFAM, InterProScan, InterPro) (data not shown). Instead of inventorying the NBS genes, our study focused on sequences having a complete NBS domain, excluding those with partial domains. This ensured that at least 80% of both N- and C-termini sequences were included in the alignment found by Reverse Position Specific-BLAST [38,65]. The NBS domain of euasterids is composed of eight conserved motifs in both TIR and non-TIR sequences. The core sequences of each motif resemble each other and are similar to eurosid motifs such as Populus trichocarpa [66], Theobroma cacao [67], Zea mays [68] and Cucumis sativus [69]. In addition, they resemble the motifs of CNL sequences in monocots like Oryza sativa [49] and other more distant genera [70], confirming the remarkable conservation of the NBS domain in the plant kingdom.

In addition to conservation of the NBS domain, euasterid genomes displayed specific features compared with those of the eurosids. The conserved domains with which the NBS domain of euasterids is associated differ from those in eurosids. Out of a total of 39 conserved domains recognized by NCBI, seven were specific to euasterids and 21 to eurosids. This points to a different genomic composition in the two clades. Similarly, only nine subclasses out of a total of 33 were common to euasterids and eurosids. The differences also included the TIR and LRR subclasses, which are underrepresented in euasterids. The expansion of TIR genes in eurosids and their absence in monkey-flower have already been reported by Kim et al. [71]. The underrepresentation of LRR domains in euasterids suggests that the mechanisms of interaction with pathogen-derived molecules differ between euasterids and eurosids. Other differences were found in the orthogroups. Few eurosid sequences were assigned to the three main euasterid orthogroups. This suggests that the phylogenetic origins of NBS genes differ in the two clades.

Insights into genome organisation and evolution

Our study of euasterid NBS genes provided insights into genome organisation and evolution. Examination of orthologous relationships may inform chromosomal rearrangements and guide the assembly of non-anchored sequences. Our results confirm that the coffee genome shows no sign of whole-genome polyploidisation since the γ triplication at the origin of the core eudicots [33]. However, our study revealed ancient traces of large-scale duplications in the orthogroups, whose impact varied with the genome. The Solanaceae genomes have conserved genes from an ancestral duplication event around 110 Mya, but not from the event around 94 Mya, which was at the origin of NBS gene expansion in the coffee and monkey-flower genomes. Conversely, no coffee or monkey-flower gene was found to date from the ancestral duplication displayed in the Solanaceae. This shows that NBS genes evolved leaving specific signatures in genomes and that their evolution included significant elimination phases. This corresponds to the birth-and-death model of evolution proposed for the vertebrate major histocompatibility complex genes and the immunoglobulin genes [72,73]. New genes are created by repeated gene duplication; some duplicate genes are maintained in the genomes for a long time, while others are deleted or become nonfunctional. Four duplication events occurred relatively recently (between 16 and 27 Mya), mainly in one orthogroup (OG5_134032) (Figure 6 & Table S6).

One of these events, the 10th, was previously observed in tomato and potato, and dated at 18.3-23.3 Mya using substitution rates of cereals and Arabidopsis [74]. Analysis of new sequence data in related lineages will help identify the species which have undergone these duplication events and estimate their relative importance.
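As an illustration of how a duplication event can be dated from synonymous substitutions, the following minimal sketch applies the standard molecular-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. It is not the pipeline used in this study: the function name, the example Ks value and the two rates are assumptions chosen only to show how strongly the estimated age depends on the rate.

```python
def duplication_age_mya(ks, rate_per_site_per_year):
    """Estimate the age of a duplication in Mya from Ks under a molecular clock.

    Uses T = Ks / (2 * r): both copies accumulate substitutions
    independently, hence the factor of two.
    """
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("Ks must be >= 0 and the rate must be > 0")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Assumed, illustrative rates (substitutions per synonymous site per year);
# they are placeholders, not the values used by the authors.
ASSUMED_RATES = {"slow": 6.5e-9, "fast": 1.5e-8}

# Hypothetical Ks value for one pair of duplicated genes.
example_ks = 0.3

ages = {name: duplication_age_mya(example_ks, rate)
        for name, rate in ASSUMED_RATES.items()}

for name, age in ages.items():
    print(f"{name} rate: {age:.1f} Mya")
```

Under these assumptions the same Ks value yields ages that differ by more than a factor of two, which is why a date range rather than a single value is usually reported.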

The NBS genes in the eurosid genomes included in our study provided information on their evolutionary history. The NBS genes in grapevine showed a high level of conservation of amino acid motifs similar to those in euasterids, especially in coffee. This is in accordance with previous results on microsynteny and collinearity observed in short regions of the grapevine and coffee genomes [75-77]. By contrast, the NBS genes in Arabidopsis differed in composition and organisation. The α and β whole-genome duplications, which occurred in the Arabidopsis genome following the divergence between eurosids I and II (Figure 1), likely split genomic rearrangements and relocated fragments of duplicated chromosomes around the genome, making comparison difficult.

Expansion and diversification of NBS genes

The proportion of euasterid genes predicted to encode an NBS domain is similar to estimates for other plant species and ranges between 0.64 and 1.11%, except for coffee, whose genome reveals an extreme expansion of NBS genes, up to 2.62% of the total number of predicted genes. This exceeds the highest proportion (2.05%) of NBS genes found in plants [78]. Expansion in the coffee genome is also reflected in the total number of NBS genes detected by HMM searches, which was higher in coffee than in the other genomes. The number of detected NBS genes in monkey-flower is probably underestimated since they were searched for in 2,216 scaffolds rather than in 14 pseudomolecules [34]. However, a mega-cluster of 33 NBS genes (CL16) was identified in a scaffold, which is a record in plants.

Most NBS genes (90.3%) arose from the duplication of paralogues in only a few orthogroups. This expansion is clearly visible in orthogroup OG5_160141, which contains nearly half (45.7%) of the euasterid sequences identified in our study and 35.1% of the coffee sequences. Evidence for diversification in the orthogroups was revealed by examination of the evolutionary distances between branches in our phylogenetic analyses. The speed of evolution varied within the orthogroups, revealing independent diversification dynamics. Expansion in orthogroups was associated with dispersion in the genomes, sometimes involving blocks of NBS genes or single NBS genes. At chromosome level, the genomic dispersion of NBS genes could benefit from chromosomal rearrangements and segmental duplications. When homozygous, these insertions may contribute to the divergence of intergenic regions, since they tend to decrease the chance of misalignment and therefore of unequal crossing-over [79]. They thus open new evolutionary contexts for duplicated gene diversification and the possibility of escape from homogenization within clusters [80]. Tandem and segmental duplications, but not whole-genome duplication, hence play a major role in NBS gene expansion in euasterids and in plants in general [48]. Besides expansion and diversification of paralogues, the NBS genes with no known orthologues are a valuable source of diversity despite their small numbers.

The physical distribution of NBS genes revealed that most euasterid sequences are organised in doublets (15.6%), triplets (12.8%) or clusters (41.6%). The coffee genome has at least twice as many clusters (40) as the other euasterid genomes (8-19), consistent with its gene expansion. The study of complex clusters with at least ten NBS genes revealed several patterns of tandem duplication with transfer to a contiguous site or to a more distant one. Tandem duplication is undoubtedly an important mechanism for stimulating gene expansion in clusters. As reported in Arabidopsis, tandemly clustered R genes may be a reservoir of genetic variation from which new disease resistance specificities can evolve [79]. Eight recent duplications (π = 0) were detected in our study, and three complex clusters comprised a single gene family with low diversity between all gene pairs (π < 0.2). Tandem duplication thus appears to be a continuous mechanism over time. Moreover, it appears to be a universal mechanism since it has also been described in monocots and eurosids [81].
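The detection of gene pairs with zero diversity can be sketched as follows. This is a minimal illustration, assuming that π for a gene pair is computed as the uncorrected proportion of differing sites between two aligned sequences, with gap columns ignored; the estimator actually used in the study may differ. The function names, gene names and sequences are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    This is an uncorrected distance (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def identical_pairs(cluster):
    """Return the gene pairs with zero diversity in a {name: sequence} mapping."""
    pairs = combinations(sorted(cluster.items()), 2)
    return [(name_1, name_2)
            for (name_1, seq_1), (name_2, seq_2) in pairs
            if pairwise_diversity(seq_1, seq_2) == 0.0]


# Hypothetical aligned sequences from one cluster.
toy_cluster = {
    "geneA": "ATGGCTGGA-TT",
    "geneB": "ATGGCTGGACTT",
    "geneC": "ATGACTGGACTA",
}

recent = identical_pairs(toy_cluster)
print(recent)
```

In this toy cluster only geneA and geneB are identical over the compared sites, so they would be flagged as a candidate recent duplication.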

Among the clusters identified in euasterids, the majority were heterogeneous, comprising sequences derived from several orthologues. They probably originated from random associations among NBS genes from different orthologues rather than from diversification within homogeneous clusters as a result of diversifying selection. Heterogeneous clusters can derive from ectopic rearrangement events [61]. However, they present similar diversity and the same mode of evolution as the homogeneous ones, confirming strong homogenization of neighbouring sequences. Such homogenization is likely the result of frequent intergenic exchanges among NBS genes which are closely related and physically linked [82].
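The distinction between homogeneous and heterogeneous clusters can be expressed as a simple rule on orthogroup membership. The sketch below assumes that a cluster is heterogeneous when its genes belong to more than one orthogroup; the function, gene names and cluster names are hypothetical, and only the two orthogroup identifiers are taken from the text.

```python
def classify_cluster(cluster_genes, gene_to_orthogroup):
    """Label a cluster by the number of distinct orthogroups among its genes.

    Genes without an assigned orthogroup are ignored (an assumption of
    this sketch); a cluster with no assigned gene is 'unassigned'.
    """
    orthogroups = {gene_to_orthogroup[gene]
                   for gene in cluster_genes
                   if gene in gene_to_orthogroup}
    if not orthogroups:
        return "unassigned"
    return "homogeneous" if len(orthogroups) == 1 else "heterogeneous"


# Hypothetical gene-to-orthogroup assignments.
gene_to_orthogroup = {
    "gene1": "OG5_160141",
    "gene2": "OG5_160141",
    "gene3": "OG5_134032",
}

# Hypothetical clusters; gene4 has no known orthologue.
clusters = {
    "clusterX": ["gene1", "gene2"],
    "clusterY": ["gene1", "gene3", "gene4"],
}

labels = {name: classify_cluster(genes, gene_to_orthogroup)
          for name, genes in clusters.items()}
print(labels)
```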

Evolutionary history of NBS genes

NBS genes have an ancient and complex origin in plant genomes. Their abundance allowed us to reconstruct their expansion and evolution in the euasterid clade. Our study based on orthology analysis revealed 11 large-scale duplication events. With the exception of the 10th event, none had previously been observed in plant genomes, which reinforces the value of our orthology approach for detecting ancient genomic rearrangements. The oldest events pinpointed by our study occurred in the euasterid ancestral sequence and left specific signatures in present-day genomes. By contrast, the latest events happened in separate genomic environments after divergence occurred in coffee and monkey-flower. The synchronicity of these events in separate genomes may be related to massive and simultaneous pathogen attacks. The oldest duplication events in the ancestral sequence may represent an adaptive response to ancient pathogen diversification and spread.

Our approach based on orthologous relationships enabled us to retrace the history of NBS genes in the euasterid clade. The orthogroups have their own evolutionary dynamics, with variable speeds of diversification and expansion in the genomes. Most orthogroups have been shaped by two large-scale duplication events, thus suggesting a possible reactivation over time. The synchronicity of large-scale duplication events in different genomes points to the impact of living conditions on NBS gene expansion and diversification in the orthogroups. NBS genes therefore represent a valuable source of information for understanding genome evolution.

IJESNR.MS.ID.556296
