Near intron pairs and the metazoan tree
Highlights
► We use near intron pairs (NIPs) for phylogenetic inference concerning metazoans.
► We could reliably detect shared intron gain events via NIPs.
► The application to a broad collection of genomes reveals a plausible phylogeny.
Introduction
The evolutionary relationships of metazoan phyla still constitute a challenge for both morphological and molecular-based analyses. The traditional view arranges bilaterian metazoans into acoelomates, pseudocoelomates, and coelomates. Starting with the work of Aguinaldo et al. (1997), sequence data, initially mostly rDNA, have been used to establish a “new animal phylogeny” with far-reaching consequences: (1) the protostomes were divided into ecdysozoans (Aguinaldo et al., 1997) and lophotrochozoans (Halanych et al., 1995), and (2) several phyla representing apparently lower grades of complexity (Platyhelminthes, Nemertea, and Nematoda) were relocated amongst the coelomate groups at the crown of the tree. In contrast, some studies employing genomic datasets containing only a few taxa (e.g. Wolf et al., 2004) supported the monophyly of coelomates. Later studies, however, have shown that these results are likely artefacts, misled by a faster evolution of some genomes, such as that of Caenorhabditis elegans (Philippe et al., 2005). Such fast-evolving genomes are thus often excluded from phylogenetic datasets. In addition, conflicting signals are often obtained from mitochondrial, nuclear rRNA, phylogenomic and also morphological data (Trautwein et al., 2012). Despite a plethora of studies based on both molecular and morphological data, a consensus on the phylogenetic tree of metazoan phyla is still not in sight (Edgecombe et al., 2011). This concerns in particular the non-bilaterians (Dunn et al., 2008, Schierwater et al., 2009, Pick et al., 2010) and the Lophotrochozoa (Hejnol, 2010).
Characters resulting from structural changes of the genomic sequence, so-called rare genomic changes (RGCs), such as coding insertions/deletions (indels) (Belinky et al., 2010), spliceosomal intron positions (Irimia and Roy, 2008), and positions of mobile genetic elements (Kriegs et al., 2006, Kriegs et al., 2010), are expected to be less prone to homoplasy than substitution patterns of sequence data and hence provide valuable additional information to resolve conflicts in phylogenetic tree reconstruction. For holometabolous insects, novel phylogenetic hypotheses have been introduced on the basis of such characters, for example the basal position of Hymenoptera (Krauss et al., 2005). Later, this proposal received strong support from sequence-based analyses of single-copy nuclear genes and additional intron position data (Savard et al., 2006, Zdobnov et al., 2007, Krauss et al., 2008, Wiegmann et al., 2009). Another study used retrotransposon insertions to improve our knowledge about the basal branching order of rodents (Churakov et al., 2010). Earlier attempts to reconstruct the radiation of rodents are well known to have suffered from long-branch attraction (LBA) artefacts.
The present study utilizes the conservation of positions of spliceosomal introns among orthologous coding sequences (CDS). Intron positions have already been used by several authors to resolve problematic branches of the metazoan tree (for review see Irimia and Roy, 2008). For instance, an intense debate emerged about the concepts of Ecdysozoa (Roy and Gilbert, 2005) and Coelomata (Zheng et al., 2007) using intron position data. Roy and Gilbert (2005) supported the taxon Ecdysozoa using a pattern of intron conservation. Zheng et al. (2007) criticized this conclusion by showing that intron loss rates within specific branches are strongly correlated. These authors argued that high rates of independent intron losses within the nematode and arthropod species used had misled the former study. However, in turn, Roy and Irimia (2008) identified several weaknesses of the latter analysis, among them biases in the procedure used to differentiate between intron gain and loss. Pointing to large variations in both intron loss and gain rates, Roy and Irimia (2008) avoided a clear-cut conclusion about the Ecdysozoa/Coelomata problem.
In order to reduce the impact of homoplastic characters due to parallel intron gains or losses, we specifically consider pairs of nearby introns. More precisely, a near intron pair (NIP) consists of two intron positions in an alignment of two or more orthologous genes that are separated by a small number of nucleotides. Exons smaller than about 50 nt are relatively rare (Saeys et al., 2007) and in general functionally detrimental (Weir et al., 2006). The two nearby intron positions are thus very unlikely to have coexisted. Under the assumption that parallel intron gain is very rare, a NIP can be used to parsimoniously infer an edge of the phylogenetic tree along which both intron loss and gain must have occurred, separating the species sharing one of the positions from those that share the other.
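The NIP definition above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `find_nips`, the input layout (a mapping from species to intron positions in shared alignment coordinates), and the 50 nt distance cutoff are assumptions chosen for illustration, the cutoff being motivated only by the rarity of exons shorter than about 50 nt mentioned above.

```python
from itertools import combinations


def find_nips(intron_positions, max_distance=50):
    """Return candidate near intron pairs (NIPs).

    intron_positions: dict mapping a species name to the set of intron
        positions (nucleotide coordinates in one shared alignment).
    max_distance: hypothetical cutoff in nucleotides; an assumption,
        not a value taken from the study.
    """
    # All distinct intron positions observed in any species, in order.
    all_positions = sorted(set().union(*intron_positions.values()))
    nips = []
    for a, b in combinations(all_positions, 2):
        if b - a > max_distance:
            continue  # positions too far apart to count as "near"
        with_a = {s for s, pos in intron_positions.items() if a in pos}
        with_b = {s for s, pos in intron_positions.items() if b in pos}
        if with_a & with_b:
            continue  # both introns coexist in one species: not a NIP
        # The two species sets are the sides of the split implied by the pair.
        nips.append((a, b, sorted(with_a), sorted(with_b)))
    return nips


# Toy alignment of one orthologous gene in five hypothetical species.
introns = {
    "sp1": {120, 400},
    "sp2": {120},
    "sp3": {141, 400},
    "sp4": {141},
    "sp5": {400, 430},
}
print(find_nips(introns))
```

In this toy input, positions 120 and 141 form a NIP that separates sp1 and sp2 from sp3 and sp4, whereas positions 400 and 430 are rejected because both introns occur together in sp5. Each reported pair corresponds to one binary character of the kind that a parsimony analysis could use.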
In previous work, we found some evidence that NIPs arise not only from uncoupled, successive processes of intron loss and intron gain, but also from intron sliding (Krauss et al., 2005, Krauss et al., 2008, Lehmann et al., 2010). For Drosophila we could show that some of the younger NIPs were indeed caused by shifts of splice donor and acceptor sites in relation to conserved CDS (Lehmann et al., 2010). In the same study, we used NIPs for a systematic investigation of intron gain mechanisms in Drosophila.
Encouraged by the successful application of NIPs to the phylogeny of holometabolan insects (Krauss et al., 2008, Niehuis et al., 2012), we here try to resolve the phylogenetic tree of animals based exclusively on NIP data from 45 metazoan and 3 outgroup taxa. Our results demonstrate the usefulness of NIPs as phylogenetic markers also for deep metazoan phylogeny. In particular, we evaluate the Ecdysozoa hypothesis, as well as the general agreement of our tree reconstructions with current proposals of metazoan phylogeny.
Compilation of ortholog dataset
Initially, we retrieved orthologous protein-coding genes from the Ensembl Compara database (release 67, May 2012) (Flicek et al., 2011) in the following manner: For a set of eight selected query species (Acyrthosiphon pisum, Caenorhabditis elegans, Drosophila melanogaster, Ixodes scapularis, Nematostella vectensis, Schistosoma mansoni, Strongylocentrotus purpuratus, and Trichoplax adhaerens), all protein-coding gene IDs with the status ‘Known’ were determined using Ensembl Biomart. Then, these
Collection and characterization of a large NIP dataset
Starting from selected Ensembl Compara ortholog predictions, we compiled a set of orthologs covering 48 taxa, comprising 12 metazoan phyla: Cnidaria, Placozoa, Ctenophora, Porifera, Annelida, Mollusca, Platyhelminthes, Chordata, Echinodermata, Hemichordata, Arthropoda, and Nematoda (see Supplementary Table S1, Supplementary Material online). Monosiga brevicollis (Choanoflagellata), Coprinopsis cinerea (Fungi), and Dictyostelium purpureum (Amoebozoa) were added as outgroups.
The automated alignment
Discussion
Here, we extracted near intron pairs (NIPs) from a broad collection of genomes for the first time and used them as binary genome-level characters in maximum parsimony analyses to explore the information value of NIPs for reconstructing deep metazoan phylogeny. The resulting tree deviates remarkably from contemporary hypotheses of metazoan relationships. Notably, the unusually distributed taxa Mnemiopsis, Ixodes, and Ambulacraria prevented the tree from resolving major clades such as Bilateria,
Conclusions
Overall, our study demonstrates that near intron pair (NIP) data could be used to derive a working hypothesis of the metazoan phylogeny using maximum parsimony (MP) as the tree reconstruction method. In particular, the analysis of NIP characters appears superior to an approach based on simple intron presence/absence data. Corresponding tree searches using Dollo parsimony and Wagner parsimony resulted in topologies that are clearly worse than the NIP-based predictions (see Supplementary Figs. S8–S9). Thus,
Funding
This work was supported by the Deutsche Forschungsgemeinschaft (KR2065/2 to VK and STA850/6 to PFS). The Deutsche Forschungsgemeinschaft had no role in the design or interpretation of the study.
Acknowledgments
We gratefully acknowledge the availability of the sequencing data of the not yet published genomes of Aplysia californica, Capitella teleta, Danio rerio, Helobdella robusta, Heterorhabditis bacteriophora, Ixodes scapularis, Lottia gigantea, Mnemiopsis leidyi, Rhodnius prolixus, Saccoglossus kowalevskii, Schistosoma japonicum, and Schmidtea mediterranea. We thank three anonymous reviewers for their insightful comments on a previous version of this manuscript.
References (69)
- et al. Testing the new animal phylogeny: a phylum level molecular analysis of the animal kingdom. Mol. Phylogenet. Evol. (2008)
- et al. Nearly complete rRNA genes assembled from across the metazoan animals: effects of more taxa, a structure-based alignment, and paired-sites evolutionary models on phylogeny reconstruction. Mol. Phylogenet. Evol. (2010)
- et al. Genomic and morphological evidence converge to resolve the enigma of Strepsiptera. Curr. Biol. (2012)
- et al. Intron sliding in conserved gene families. Trends Genet. (2000)
- et al. Quantification of insect genome divergence. Trends Genet. (2007)
- et al. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature (1997)
- et al. Large-scale parsimony analysis of metazoan indels in protein-coding genes. Mol. Biol. Evol. (2010)
- transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics (2005)
- et al. BLAST+: architecture and applications. BMC Bioinformatics (2009)
- et al. Three distinct modes of intron dynamics in the evolution of eukaryotes. Genome Res. (2007)
- A phylogeny of Caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res.
- Rodent evolution: back to the root. Mol. Biol. Evol.
- Origins of recently gained introns in Caenorhabditis. Proc. Natl. Acad. Sci. USA
- Solvent exposure imparts similar selective pressures across a range of yeast proteins. Mol. Biol. Evol.
- In search of lost introns. Bioinformatics
- Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature
- Broad phylogenomic sampling improves resolution of the animal tree of life. Nature
- HaMStR: profile hidden Markov model based search for orthologs in ESTs. BMC Evol. Biol.
- MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res.
- Higher-level metazoan relationships: recent progress and remaining questions. Organ. Divers. Evol.
- The retention index and the rescaled consistency index. Cladistics
- Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool.
- Ensembl 2011. Nucleic Acids Res.
- Evidence from 18S ribosomal DNA that the lophophorates are protostome animals. Science
- A twist in time–the evolution of spiral cleavage in the light of animal phylogeny. Integr. Comp. Biol.
- Assessing the root of bilaterian animals with scalable phylogenomic methods. Proc. Biol. Sci.
- Spliceosomal introns as tools for genomic and evolutionary analysis. Nucleic Acids Res.
- Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol. Biol. Evol.
- Phylogenetic mapping of intron positions: a case study of translation initiation factor eIF2gamma. Mol. Biol. Evol.
- Near intron positions are reliable phylogenetic markers: an application to holometabolous insects. Mol. Biol. Evol.
- Retroposed elements as archives for the evolutionary history of placental mammals. PLoS Biol.
- Retroposon insertions provide insights into deep lagomorph evolution. Mol. Biol. Evol.
- Clustal W and Clustal X version 2.0. Bioinformatics
- PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics