Near intron pairs and the metazoan tree

https://doi.org/10.1016/j.ympev.2012.11.012

Abstract

Gene structure data can substantially advance our understanding of metazoan evolution and deliver an independent approach to resolve conflicts among existing hypotheses. Here, we used changes of spliceosomal intron positions as a novel phylogenetic marker to reconstruct the animal tree. This kind of data is inferred from orthologous genes containing mutually exclusive introns at pairs of sequence positions in close proximity, so-called near intron pairs (NIPs). NIP data were collected for 48 species and utilized as binary genome-level characters in maximum parsimony (MP) analyses to reconstruct deep metazoan phylogeny. All groupings that were obtained with more than 80% bootstrap support are consistent with currently supported phylogenetic hypotheses. This includes monophyletic Chordata, Vertebrata, Nematoda, Platyhelminthes and Trochozoa. Several other clades such as Deuterostomia, Protostomia, Arthropoda, Ecdysozoa, Spiralia, and Eumetazoa, however, failed to be recovered due to a few problematic taxa such as the mite Ixodes and the warty comb jelly Mnemiopsis. The corresponding unexpected branchings can be explained by the paucity of synapomorphic changes of intron positions shared between some genomes, by the sensitivity of MP analyses to long-branch attraction (LBA), and by the very unequal evolutionary rates of intron loss and intron gain during evolution of the different subclades of metazoans. In addition, we obtained an assemblage of Cnidaria, Porifera, and Placozoa as the sister group of Bilateria + Ctenophora with medium support, a disputable but remarkable result. We conclude that NIPs can also be used as phylogenetic characters within a broader phylogenetic context, given that they have emerged regularly during evolution irrespective of the large variation of intron density across metazoan genomes.

Highlights

► We use near intron pairs (NIPs) for phylogenetic inference concerning metazoans.
► We could reliably detect shared intron gain events via NIPs.
► The application to a broad collection of genomes reveals a plausible phylogeny.

Introduction

The evolutionary relationships of metazoan phyla still constitute a challenge for both morphological and molecular-based analyses. The traditional view arranges bilaterian metazoans into acoelomates, pseudocoelomates, and coelomates. Starting with the work of Aguinaldo et al. (1997), sequence data, initially mostly rDNA, have been used to establish a “new animal phylogeny” with far-reaching consequences: (1) the protostomes were divided into ecdysozoans (Aguinaldo et al., 1997) and lophotrochozoans (Halanych et al., 1995), and (2) several phyla representing apparently lower grades of complexity (Platyhelminthes, Nemertea, and Nematoda) were relocated amongst the coelomate groups at the crown of the tree. In contrast, some studies employing genomic datasets containing only a few taxa (e.g. Wolf et al., 2004) supported the monophyly of coelomates. Later studies, however, have shown that these results are likely artefacts, misled by a faster evolution of some genomes, such as that of Caenorhabditis elegans (Philippe et al., 2005). These are thus often excluded from phylogenetic datasets. In addition, conflicting signals are often obtained from mitochondrial, nuclear rRNA, phylogenomic and also morphological data (Trautwein et al., 2012). Despite a plethora of studies based on both molecular and morphological data, a consensus on the phylogenetic tree of metazoan phyla is still not in sight (Edgecombe et al., 2011). This concerns in particular the non-bilaterians (Dunn et al., 2008, Schierwater et al., 2009, Pick et al., 2010) and the Lophotrochozoa (Hejnol, 2010).

Characters resulting from structural changes of the genomic sequence, so-called rare genomic changes (RGCs), such as coding insertions/deletions (indels) (Belinky et al., 2010), spliceosomal intron positions (Irimia and Roy, 2008), and positions of mobile genetic elements (Kriegs et al., 2006, Kriegs et al., 2010), are expected to be less prone to homoplasy than substitution patterns of sequence data and hence provide valuable additional information to resolve conflicts in phylogenetic tree reconstruction. For holometabolous insects, novel phylogenetic hypotheses have been introduced on the basis of such characters, for example the basal position of Hymenoptera (Krauss et al., 2005). Later, this proposal received strong support from sequence-based analyses of single-copy nuclear genes and additional intron position data (Savard et al., 2006, Zdobnov et al., 2007, Krauss et al., 2008, Wiegmann et al., 2009). Another study used retrotransposon insertions to improve our knowledge about the basal branching order of rodents (Churakov et al., 2010). Earlier attempts to reconstruct the radiation of rodents are well known to have suffered from long-branch attraction (LBA) artefacts.

The present study utilizes the conservation of positions of spliceosomal introns among orthologous coding sequences (CDS). Intron positions have already been used by several authors to resolve problematic branches of the metazoan tree (for review see Irimia and Roy, 2008). For instance, an intense debate emerged about the concepts of Ecdysozoa (Roy and Gilbert, 2005) and Coelomata (Zheng et al., 2007) using intron position data. Roy and Gilbert (2005) supported the taxon Ecdysozoa using a pattern of intron conservation. Zheng et al. (2007) criticized this conclusion, showing that intron loss rates within specific branches are strongly correlated. These authors argued that high rates of independent intron losses within the nematode and arthropod species used had misled the former study. In turn, however, Roy and Irimia (2008) identified several weaknesses of the latter analysis, among them biases in the procedure used to differentiate between intron gain and loss. Pointing to large variations in both intron loss and gain rates, Roy and Irimia (2008) avoided a clear-cut conclusion about the Ecdysozoa/Coelomata problem.

In order to reduce the impact of homoplastic characters due to parallel intron gains or losses, we specifically consider pairs of nearby introns. More precisely, a near intron pair (NIP) consists of two intron positions in an alignment of two or more orthologous genes that are separated by a small number of nucleotides. Exons smaller than about 50 nt are relatively rare (Saeys et al., 2007) and in general functionally detrimental (Weir et al., 2006). The two nearby intron positions are thus very unlikely to have coexisted. Under the assumption that parallel intron gain is very rare, a NIP can be used to parsimoniously infer an edge of the phylogenetic tree along which both intron loss and gain must have occurred, separating the species sharing one of the positions from those that share the other.
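The NIP definition above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names `find_nips` and `encode_character`, the 50 nt distance threshold, the use of alignment coordinates in nucleotides, and the coding of species without either intron as missing data ("?") are all assumptions made for illustration only.

```python
from itertools import combinations

# Assumed threshold (nt): the text only says "a small number of nucleotides".
MAX_DISTANCE = 50


def find_nips(introns, max_distance=MAX_DISTANCE):
    """Return candidate near intron pairs from one alignment of orthologs.

    `introns` maps a species name to the set of intron positions
    (hypothetical alignment coordinates, in nt) observed in its ortholog.
    Each result is (pos_a, pos_b, species_with_a, species_with_b).
    """
    positions = sorted(set().union(*introns.values()))
    nips = []
    for a, b in combinations(positions, 2):
        if b - a > max_distance:
            continue  # the two positions are too far apart to be "near"
        with_a = frozenset(s for s, p in introns.items() if a in p)
        with_b = frozenset(s for s, p in introns.items() if b in p)
        if with_a & with_b:
            continue  # some species carries both introns: not mutually exclusive
        nips.append((a, b, with_a, with_b))
    return nips


def encode_character(nip, species):
    """Encode one NIP as a binary character: '0' for the first position,
    '1' for the second, '?' (assumed missing-data code) for neither."""
    _, _, with_a, with_b = nip
    return {
        s: "0" if s in with_a else "1" if s in with_b else "?"
        for s in species
    }


if __name__ == "__main__":
    # Toy data, invented for illustration.
    toy = {
        "sp1": {120, 400},
        "sp2": {120},
        "sp3": {135},
        "sp4": {135, 400},
        "sp5": {400},
    }
    for nip in find_nips(toy):
        print(nip[:2], encode_character(nip, sorted(toy)))
```

In the toy data, positions 120 and 135 form a NIP that separates sp1 and sp2 from sp3 and sp4, while sp5 carries neither intron and is left uninformative for this character. A column of such characters per NIP is the kind of binary matrix a parsimony program could take as input.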

In previous work, we found some evidence that NIPs arise not only from uncoupled, successive processes of intron loss and intron gain, but also from intron sliding (Krauss et al., 2005, Krauss et al., 2008, Lehmann et al., 2010). For Drosophila we could show that some of the younger NIPs were indeed caused by shifts of splice donor and acceptor sites in relation to conserved CDS (Lehmann et al., 2010). In the same study, we used NIPs for a systematic investigation of intron gain mechanisms in Drosophila.

Encouraged by the successful application of NIPs to the phylogeny of holometabolan insects (Krauss et al., 2008, Niehuis et al., 2012), we here try to resolve the phylogenetic tree of animals based exclusively on NIP data from 45 metazoan and 3 outgroup taxa. Our results demonstrate the usefulness of NIPs as a phylogenetic marker for deep metazoan phylogeny as well. In particular, we evaluate the Ecdysozoa hypothesis, as well as the general agreement of our tree reconstructions with current proposals of metazoan phylogeny.

Compilation of ortholog dataset

Initially, we retrieved orthologous protein-coding genes from the Ensembl Compara database (release 67, May 2012) (Flicek et al., 2011) in the following manner: For a set of eight selected query species (Acyrthosiphon pisum, Caenorhabditis elegans, Drosophila melanogaster, Ixodes scapularis, Nematostella vectensis, Schistosoma mansoni, Strongylocentrotus purpuratus, and Trichoplax adhaerens), all protein-coding gene IDs with the status ‘Known’ were determined using Ensembl Biomart. Then, these

Collection and characterization of a large NIP dataset

Starting from selected Ensembl Compara ortholog predictions, we compiled a set of orthologs covering 48 taxa, comprising 12 metazoan phyla: Cnidaria, Placozoa, Ctenophora, Porifera, Annelida, Mollusca, Platyhelminthes, Chordata, Echinodermata, Hemichordata, Arthropoda, and Nematoda (see Supplementary Table S1, Supplementary Material online). Monosiga brevicollis (Choanoflagellata), Coprinopsis cinerea (Fungi), and Dictyostelium purpureum (Amoebozoa) were added as outgroups.

The automated alignment

Discussion

Here, we extracted for the first time near intron pairs (NIPs) from a broad collection of genomes and used them as binary genome-level characters in maximum parsimony analyses to explore the information value of NIPs for reconstructing deep metazoan phylogeny. The resulting tree deviates remarkably from contemporary hypotheses of metazoan relationships. Notably, the unusually distributed taxa Mnemiopsis, Ixodes, and Ambulacraria prevented the tree from resolving major clades such as Bilateria,

Conclusions

Overall, our study demonstrates that near intron pair (NIP) data could be used to derive a working hypothesis of the metazoan phylogeny using MP as tree reconstruction method. In particular, the analysis of NIP characters appears superior to an approach based on simple intron presence/absence data. Corresponding tree searches using Dollo parsimony and Wagner parsimony resulted in topologies that are clearly worse than the NIP-based predictions (see Supplementary Figs. S8–S9). Thus,

Funding

This work was supported by the Deutsche Forschungsgemeinschaft (KR2065/2 to VK and STA850/6 to PFS). The Deutsche Forschungsgemeinschaft had no role in the design or interpretation of the study.

Acknowledgments

We gratefully acknowledge the availability of the sequencing data of the as yet unpublished genomes of Aplysia californica, Capitella teleta, Danio rerio, Helobdella robusta, Heterorhabditis bacteriophora, Ixodes scapularis, Lottia gigantea, Mnemiopsis leidyi, Rhodnius prolixus, Saccoglossus kowalevskii, Schistosoma japonicum, and Schmidtea mediterranea. We thank three anonymous reviewers for their insightful comments on a previous version of this manuscript.

References (69)

  • S. Cho et al. A phylogeny of Caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res. (2004)
  • G. Churakov et al. Rodent evolution: back to the root. Mol. Biol. Evol. (2010)
  • A. Coghlan et al. Origins of recently gained introns in Caenorhabditis. Proc. Natl. Acad. Sci. USA (2004)
  • G.C. Conant et al. Solvent exposure imparts similar selective pressures across a range of yeast proteins. Mol. Biol. Evol. (2009)
  • M. Csurös et al. In search of lost introns. Bioinformatics (2007)
  • F. Delsuc et al. Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature (2006)
  • C.W. Dunn et al. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature (2008)
  • I. Ebersberger et al. HaMStR: profile hidden Markov model based search for orthologs in ESTs. BMC Evol. Biol. (2009)
  • R.C. Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. (2004)
  • G. Edgecombe et al. Higher-level metazoan relationships: recent progress and remaining questions. Organ. Divers. Evol. (2011)
  • J.S. Farris. The retention index and the rescaled consistency index. Cladistics (1989)
  • J. Felsenstein. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. (1978)
  • P. Flicek et al. Ensembl 2011. Nucleic Acids Res. (2011)
  • K.M. Halanych et al. Evidence from 18S ribosomal DNA that the lophophorates are protostome animals. Science (1995)
  • A. Hejnol. A twist in time–the evolution of spiral cleavage in the light of animal phylogeny. Integr. Comp. Biol. (2010)
  • A. Hejnol et al. Assessing the root of bilaterian animals with scalable phylogenomic methods. Proc. Biol. Sci. (2009)
  • M. Irimia et al. Spliceosomal introns as tools for genomic and evolutionary analysis. Nucleic Acids Res. (2008)
  • H. Jow et al. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol. Biol. Evol. (2002)
  • V. Krauss et al. Phylogenetic mapping of intron positions: a case study of translation initiation factor eIF2gamma. Mol. Biol. Evol. (2005)
  • V. Krauss et al. Near intron positions are reliable phylogenetic markers: an application to holometabolous insects. Mol. Biol. Evol. (2008)
  • J.O. Kriegs et al. Retroposed elements as archives for the evolutionary history of placental mammals. PLoS Biol. (2006)
  • J.O. Kriegs et al. Retroposon insertions provide insights into deep lagomorph evolution. Mol. Biol. Evol. (2010)
  • M.A. Larkin et al. Clustal W and Clustal X version 2.0. Bioinformatics (2007)
  • N. Lartillot et al. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics (2009)