
A Retrospective Study on Genetic Heterogeneity within Treponema Strains: Subpopulations Are Genetically Distinct in a Limited Number of Positions

  • Darina Čejková,

    Affiliations Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic, Department of Immunology, Veterinary Research Institute, Brno, Czech Republic

  • Michal Strouhal,

    Affiliation Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic

  • Steven J. Norris,

    Affiliation Pathology & Laboratory Medicine, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America

  • George M. Weinstock,

    Affiliation The Genome Institute, Washington University in St. Louis, St. Louis, Missouri, United States of America

  • David Šmajs

    dsmajs@med.muni.cz

    Affiliation Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic

Abstract

Background

Pathogenic uncultivable treponemes comprise human and animal pathogens, including the agents of syphilis, yaws, bejel, pinta, and venereal spirochetosis in rabbits and hares. A set of 10 treponemal genome sequences, including those of 4 Treponema pallidum ssp. pallidum (TPA) strains (Nichols, DAL-1, Mexico A, SS14), 4 T. p. ssp. pertenue (TPE) strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), 1 T. p. ssp. endemicum (TEN) strain (Bosnia A) and one strain (Cuniculi A) of Treponema paraluisleporidarum ecovar Cuniculus (TPLC), was examined for the presence of intrastrain heterogeneous nucleotide sites.

Methodology/Principal Findings

The number of intrastrain heterogeneous sites identified in individual genomes ranged from 0 to 7. Altogether, 23 intrastrain heterogeneous sites (in 17 genes) were found in 5 of the 10 investigated treponemal genomes, including TPA strains Nichols (n = 5), DAL-1 (n = 4), and SS14 (n = 7), TPE strain Samoa D (n = 1), and TEN strain Bosnia A (n = 5). Whereas only one heterogeneous site was identified among the 4 tested TPE strains, 16 such sites were identified among the 4 TPA strains. Heterogeneous sites were mostly strain-specific and were identified in four tpr genes (tprC, GI, I, K), in genes involved in bacterial motility and chemotaxis (fliI, cheC-fliY), in genes involved in cell structure (murC), translation (prfA), and general and DNA metabolism (putative SAM-dependent methyltransferase, topA), and in seven hypothetical genes.

Conclusions/Significance

Heterogeneous sites likely represent both the selection of adaptive changes during infection of the host as well as an ongoing diversifying evolutionary process.

Author Summary

The genus Treponema comprises several uncultivable human and animal pathogens, including Treponema pallidum ssp. pallidum (TPA, the causative agent of syphilis), T. p. ssp. pertenue (TPE, the causative agent of yaws), and T. p. ssp. endemicum (TEN, the causative agent of bejel). The simian TPE strain Fribourg-Blanc and T. paraluisleporidarum, the agents of primate infections and of venereal spirochetosis in rabbits and hares, respectively, are animal pathogens. In this study, whole genome sequences of 10 treponemal strains were systematically analyzed for nucleotide sites at which subpopulations within a single strain differ. Interestingly, most heterogeneous sites were identified in TPA and TEN strains but not in the tested TPE strains. Although heterogeneous sites were mostly strain-specific, in several cases the same heterogeneous site was identified in two genomes. These findings indicate that the number of intrastrain heterogeneous sites per genome is limited and that different treponemal strains tend to display variability at the same positions in several genes. The abundance of nonsynonymous mutations and nonconservative amino acid replacements, together with the fact that most heterogeneous sites were located within coding regions, suggests that the heterogeneous sites represent beneficial adaptive mutations.

Introduction

The genus Treponema comprises several uncultivable human and animal pathogens, including Treponema pallidum ssp. pallidum (TPA, the causative agent of syphilis), T. p. ssp. pertenue (TPE, the causative agent of yaws), and T. p. ssp. endemicum (TEN, the causative agent of bejel). The treponemal isolate Fribourg-Blanc, isolated from a baboon (Papio cynocephalus) in West Africa [1],[2], was recently reclassified as a TPE strain [3]. Another animal pathogen closely related to the uncultivable human treponemal pathogens is T. paraluisleporidarum ecovar Cuniculus (TPLC; formerly Treponema paraluiscuniculi) [4]–[6], the causative agent of venereal spirochetosis in rabbits. In addition, T. paraluisleporidarum ecovar Lepus [6] causes venereal spirochetosis in hares [7]–[10]. The human disease pinta is caused by a morphologically identical organism, T. carateum, but this organism has not been propagated in experimentally infected animals and has not been characterized genetically.

The first complete genome sequence, that of TPA strain Nichols, was determined in 1998 [11]. In the last several years, whole genome sequences of twelve treponemal strains (including re-sequenced TPA strains Nichols and SS14) have been completed and published [3],[12]–[20]. In general, these genome analyses revealed that differences between individual treponemal strains are very subtle: genome sequences differ by less than 2% between TPA strains and TPLC [21] and by 0.2% between TPA and TPE strains [12]. Genetic diversity among the uncultivable pathogenic treponemes is localized mainly within the tpr [22]–[25], arp [25]–[27], TP0470 [25], TP0136 [28],[29], TP0548 [29],[30], tp92 [31],[32], and mcp genes [15]. In addition, relatively high interstrain genetic diversity has been detected in several other genes, e.g. TP0304 (hypothetical protein), TP0346 (lipoprotein), TP0515 (outer membrane protein), TP0558 (nickel-cobalt transporter) [33] and TP0967 (hypothetical protein) [25].

The presence of different treponemal subpopulations infecting the same host was suggested by several early findings, e.g. the detection of two subpopulations by velocity sedimentation during the Hypaque separation procedure [34] and the identification of a subpopulation resistant to phagocytosis [35]. Genetic diversity within individual treponemal strains, i.e. intrastrain genetic diversity, was first found in the tprJ and tprK genes during infection of human or animal hosts [36]–[38]. Several other examples of intrastrain heterogeneity were found in the TPA Nichols genome [21] and in the TPA SS14 genome [14],[16]. In general, intrastrain heterogeneity was found within tpr genes, in sequences paralogous to tpr genes and in the intergenic regions between tpr genes [14],[16],[36]–[40]. Other loci with identified intrastrain heterogeneity comprised TP0402 (encoding flagellum-specific ATP synthase), TP0971 (encoding the Tp34 lipoprotein, a membrane antigen), TP1029 (encoding a hypothetical protein), TP0341 (encoding MurC), and TP0967 (encoding a hypothetical protein) [14],[16].

The occurrence of genome heterogeneity within strains (including point mutations, insertions or deletions, and gain and loss of mobile genetic elements such as plasmids or phages) is common to many pathogenic bacteria [41]–[44] and has been found to occur during the course of infection [45]–[51]. In general, heterogeneous sites may contribute to immune evasion [49] and/or represent adaptive changes during infection of disparate host tissues and compartments [52]. The identification of within-host heterogeneity is an important step in studies tracking transmission networks or in studies mapping bacterial populations during colonization, dissemination and immune clearance [53],[54].

In this communication, whole genome sequences of 10 treponemal strains were systematically analyzed for the presence of intrastrain nucleotide heterogeneous sites. Distinct patterns in the frequency and locations of intrastrain heterogeneous sites were identified among the individual genomes examined.

Materials and Methods

Strains used in this study

The original sequencing data obtained during next-generation sequencing of pathogenic treponemes (Table 1) were used to analyze intrastrain genetic variability. In total, 10 treponemal strains were examined in this study, including 4 TPA strains (Nichols, DAL-1, Mexico A, SS14), 4 TPE strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), 1 TEN strain (Bosnia A) and one strain of TPLC (Cuniculi A). The two remaining published whole genome sequences (TPA strains Chicago and Sea84-1) were not examined because their original sequencing data were not deposited in the SRA database.

To examine intrastrain heterogeneity within a single strain, selected intrastrain heterogeneous sites were tested in the TPA SS14 strain using four different DNA preparations (4933, 4934, 4950 and 4951) originating from two different rabbit passages. The original treponemal SS14 cells were obtained from Dr. D. L. Cox as stocks 2735 (dated 09/24/97) and 2736 (dated 06/20/97), which were used to inoculate rabbits and to harvest treponemal cells of stocks 2839 and 2840, respectively. Bacterial stock 2839 of TPA SS14 was used for two independent isolations of genomic DNA using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA), resulting in DNA isolates 4933 and 4950. Similarly, bacterial stock 2840 of TPA SS14 was used for two independent isolations of genomic DNA, designated 4934 and 4951. At least one independent rabbit passage was performed between stock 2735 and stock 2736.

Ethics statement

No animals were used in this study.

Identification of intrastrain heterogeneous sites

To assess heterogeneity within individual treponemal strains, Illumina and 454 reads obtained during whole-genome sequencing were used. The data analysis workflow is depicted in Fig 1. Initially, individual reads were mapped to the corresponding complete genome sequence with the Burrows-Wheeler Aligner (BWA) [55],[56] using default parameters and requiring at least 95% read identity relative to the reference genome. Duplicated reads were identified with the rmdup algorithm in the SAMtools package [55] and removed. To determine the frequency of each nucleotide (allele frequency) at every genome position, the mpileup function of the SAMtools package and a python script were used [57]. Because of their higher coverage depth and lower indel error rate, the Illumina sequencing reads were used to identify intrastrain alleles.
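As a rough illustration of the allele-frequency step, the sketch below parses the read-bases column of `samtools mpileup` output for a single position and counts each allele per strand. It is not the authors' python script [57], which is not reproduced here; the function names are hypothetical, and deleted-base placeholders (`*`, `#`) and reference skips (`>`, `<`) are simply ignored.

```python
import re
from collections import Counter

_INDEL_LEN = re.compile(r"\d+")

def parse_pileup_bases(ref_base, bases):
    """Count alleles per strand in one mpileup read-bases string.

    Returns a Counter keyed by (allele, strand), strand being '+' or '-'.
    Insertions/deletions are keyed as '+SEQ' / '-SEQ'.
    """
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                 # read start; next char is the mapping quality
            i += 2
        elif c == "$":               # read end marker
            i += 1
        elif c in "+-":              # indel: sign, length, then the inserted/deleted bases
            m = _INDEL_LEN.match(bases, i + 1)
            n = int(m.group())
            seq = bases[m.end():m.end() + n]
            strand = "+" if seq.isupper() else "-"
            counts[(c + seq.upper(), strand)] += 1
            i = m.end() + n
        else:
            if c == ".":             # match to reference, forward strand
                counts[(ref, "+")] += 1
            elif c == ",":           # match to reference, reverse strand
                counts[(ref, "-")] += 1
            elif c in "ACGTN":       # mismatch, forward strand
                counts[(c, "+")] += 1
            elif c in "acgtn":       # mismatch, reverse strand
                counts[(c.upper(), "-")] += 1
            # '*', '#', '>' and '<' (deleted bases, reference skips) are ignored
            i += 1
    return counts

def base_frequencies(counts):
    """Fraction of each single base among reads covering the position."""
    per_base = Counter()
    for (allele, _strand), n in counts.items():
        if len(allele) == 1:         # indel keys ('+SEQ'/'-SEQ') are longer
            per_base[allele] += n
    depth = sum(per_base.values())
    return {b: n / depth for b, n in per_base.items()} if depth else {}

counts = parse_pileup_bases("A", "..,tT^I.+2AG$,")
print(base_frequencies(counts))
```

Keeping the strand in the key makes it straightforward to apply a per-direction read requirement later in the pipeline.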

Fig 1. Data analysis workflow.

(A) An automated identification pipeline and optimization process. (B) An application of further restrictions and verification of identified putative candidates.

https://doi.org/10.1371/journal.pntd.0004110.g001

To filter out sequencing errors present in the raw data [58]–[65], only nucleotide positions at which the less frequent allele was supported by at least six independent (non-duplicated) reads and reached a frequency of ≥ 20% were examined further. In addition, several other restrictions were applied during identification of treponemal heterogeneous sites (Fig 1). First, nucleotide positions located within homopolymeric tracts (defined as stretches of 6 or more identical nucleotides) or within 2 nt of these tracts were omitted from further analysis. Second, at least three independent reads from each direction were required. Third, reads in which the less frequent allele was located at the 3’ terminus (i.e. four or fewer nucleotides from the 3’ end) were omitted. Fourth, heterogeneous positions separated from each other by less than 7 bp were also omitted. The resulting candidate heterogeneous nucleotide positions were subsequently inspected visually using the Integrative Genomics Viewer (IGV) [63]–[66].
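The four restrictions above can be sketched as follows, under our reading that the six-read and 20% thresholds apply to the less frequent allele and that "three reads from both directions" means at least three minor-allele reads on each strand. Read-level filtering (dropping duplicates and minor-allele calls within 4 nt of a read's 3' end) is assumed to have been done before the per-strand counts are built. `Site`, `homopolymer_mask`, `passes_site_filters`, `drop_clustered` and `filter_candidates` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Site:
    pos: int        # 0-based genome coordinate
    minor_fwd: int  # unique forward-strand reads carrying the less frequent allele
    minor_rev: int  # unique reverse-strand reads carrying the less frequent allele
    depth: int      # unique reads covering the position

def homopolymer_mask(seq, min_len=6, flank=2):
    """True for positions in a run of >= min_len identical bases or within `flank` nt of one."""
    n = len(seq)
    mask = [False] * n
    i = 0
    while i < n:
        j = i
        while j < n and seq[j] == seq[i]:
            j += 1
        if j - i >= min_len:                   # the run covers seq[i:j]
            for k in range(max(0, i - flank), min(n, j + flank)):
                mask[k] = True
        i = j
    return mask

def passes_site_filters(site, mask, min_minor=6, min_freq=0.20, min_per_strand=3):
    """Apply the read-count, frequency, strand and homopolymer criteria to one site."""
    minor = site.minor_fwd + site.minor_rev
    return (site.depth > 0
            and not mask[site.pos]
            and minor >= min_minor
            and minor / site.depth >= min_freq
            and site.minor_fwd >= min_per_strand
            and site.minor_rev >= min_per_strand)

def drop_clustered(positions, min_gap=7):
    """Discard candidates lying less than `min_gap` bp from another candidate."""
    pos = sorted(positions)
    keep = []
    for idx, p in enumerate(pos):
        near_prev = idx > 0 and p - pos[idx - 1] < min_gap
        near_next = idx + 1 < len(pos) and pos[idx + 1] - p < min_gap
        if not (near_prev or near_next):
            keep.append(p)
    return keep

def filter_candidates(seq, sites):
    """Positions of sites passing all per-site filters and the spacing rule."""
    mask = homopolymer_mask(seq)
    return drop_clustered(s.pos for s in sites if passes_site_filters(s, mask))
```

Note that the spacing rule removes both members of a close pair, which matches the wording "positions separated from each other by less than 7 bp were also omitted".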

Putative heterogeneous sites were identified by applying this workflow to Illumina reads and were then confirmed using a parallel 454 workflow or by Sanger sequencing (Fig 1, Table 2 and S2 Table). The regions omitted from the Illumina analysis, comprising paralogous sequences and/or direct repeats, are described in detail in S1 and S2 Tables. Altogether, 32 genomic regions covering 26,636 bp (2.34% of the entire genome length) were omitted in the TPA Nichols genome (S1 Table). Because paralogous regions differ between individual genomes, slightly different regions were omitted from the automated analyses of Illumina reads in each examined genome (S2 Table). Moreover, the TEN Bosnia A genome was sequenced using pooled segment genome sequencing (PSGS) [12] in separate sequencing runs; therefore, the total length of the excluded regions was lower than in the other examined genomes (S2 Table).
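Summarizing excluded regions (total length and share of the genome) amounts to merging overlapping intervals before summing their lengths, so that overlapping paralogous and repeat regions are not counted twice. The sketch below uses hypothetical function names and toy coordinates; the actual regions are those listed in S1 and S2 Tables.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting 1-based, inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:   # overlaps or touches the previous one
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def excluded_fraction(intervals, genome_length):
    """Total excluded bp and its percentage of the genome, without double counting."""
    excluded_bp = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return excluded_bp, 100.0 * excluded_bp / genome_length

regions = [(10, 20), (15, 30), (31, 40), (100, 110)]   # toy coordinates
print(merge_intervals(regions))                        # [(10, 40), (100, 110)]
print(excluded_fraction(regions, 1000))
```

For orientation, 26,636 bp amounting to 2.34% implies a genome of roughly 1.14 Mb, in line with the size of the TPA Nichols genome.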

Table 2. Summary of the intrastrain variable sites identified within Illumina sequencing reads in investigated treponemal genomes.

https://doi.org/10.1371/journal.pntd.0004110.t002

DNA amplification and DNA sequencing

Altogether, 26 putative heterogeneous positions identified in the Illumina workflow but not confirmed by the 454 sequences (Fig 2, Table 2 and S3 Table) were subjected to DNA amplification and Sanger sequencing. In addition, six heterogeneous positions identified in the TPA SS14 genome in this study or by Matějková et al. [14] were tested in four different SS14 DNA preparations originating from two different rabbit passages (Table 3). Primers used for DNA amplification and sequencing are listed in S4 and S5 Tables. PCR was performed as follows: an initial step at 94°C (1 minute) was followed by 30 cycles of 94°C (30 seconds), 55°C (30 seconds), and 72°C (1 minute), and by a final extension step at 72°C (7 minutes). PCR products were sequenced with the amplification primers using dye-terminator Sanger sequencing. The frequency of alternative alleles at heterogeneous positions was calculated from the ratio of the corresponding areas under the chromatogram curves. Sanger reads were analyzed using Lasergene software (DNASTAR, Inc., Madison, WI, USA).
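The chromatogram-based estimate reduces to a ratio of peak areas, f_alt = A_alt / (A_alt + A_ref). The sketch below integrates the reference and alternative base traces over the same sample window with the trapezoidal rule. The integration method, the absence of baseline correction and all names are assumptions, since the text does not say how the areas were computed in Lasergene.

```python
def peak_area(trace, start, end):
    """Trapezoidal area under trace[start..end] with unit sample spacing."""
    window = trace[start:end + 1]
    return sum((a + b) / 2 for a, b in zip(window, window[1:]))

def alt_allele_frequency(ref_trace, alt_trace, start, end):
    """Fraction of total signal in the window contributed by the alternative base."""
    ref_area = peak_area(ref_trace, start, end)
    alt_area = peak_area(alt_trace, start, end)
    total = ref_area + alt_area
    if total <= 0:
        raise ValueError("no signal in the selected window")
    return alt_area / total

# Toy intensities for the reference (e.g. C) and alternative (e.g. T) channels.
ref_trace = [0, 2, 4, 2, 0]
alt_trace = [0, 1, 2, 1, 0]
print(alt_allele_frequency(ref_trace, alt_trace, 0, 4))
```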

Fig 2. A schematic representation of the identified heterogeneous positions in all investigated genomes.

The proportion of alternative alleles is based on nucleotide frequencies within individual Illumina reads. Red cells represent sites of intrastrain heterogeneity and grey cells represent sites of intrastrain homogeneity. Numbers within cells indicate the number of alternative/standard reads at sites where the proportion of alternative reads exceeded 10% but was lower than 20% and therefore remained below the threshold used in this study. Blue cells show nucleotide positions omitted from analysis because they lie within excluded paralogous sequences (S2 Table). For the Bosnia A strain, the intrastrain heterogeneous sites TENDBA_0314/331578, TENDBA_0314/331618, TENDBA_0317/333355 and TENDBA_0621/672156 are not shown because these positions were excluded from analysis in all other genomes due to paralogous sequences. Note that the TPADAL_0897/976678 and TENDBA_0897/974407 positions are the same.

https://doi.org/10.1371/journal.pntd.0004110.g002

Table 3. Selected intrastrain heterogeneous sites identified in TPA SS14, examined in four different DNA preparations.

https://doi.org/10.1371/journal.pntd.0004110.t003

Conserved protein domain database search

The NCBI Conserved Domain Database [67] and InterProScan [68] were used to predict protein domains. Subcellular localization of proteins was predicted using the PSORTb program [69].

Results

Identification of intrastrain heterogeneous sites

A set of 10 treponemal whole genome sequences, including those of 4 TPA strains (Nichols, DAL-1, Mexico A, SS14), 4 TPE strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), 1 TEN strain (Bosnia A) and one strain of TPLC (Cuniculi A), was examined for the presence of intrastrain heterogeneous sites. All genomes except TPA Mexico A were sequenced using both Illumina and 454 sequencing. Characteristics of the sequence data obtained for each strain, including the average coverage attained during Illumina and 454 sequencing, are shown in Table 1. Altogether, 890 potentially heterogeneous positions were identified among the investigated genomes using an automated pipeline (Fig 1). Several criteria (see Materials and Methods) were used to distinguish sequencing errors from genetic heterogeneity naturally occurring in treponemal strains (i.e. intrastrain heterogeneous sites), which reduced the 890 nucleotide positions to 46 candidates (Fig 1). Regions containing paralogous sequences and tandem repeats (summarized in S1 and S2 Tables) were omitted from the automated analyses of intrastrain heterogeneity because of the risk of ambiguously mapped reads. Using these criteria, 32 genomic regions covering 26,636 bp (2.34% of the entire genome length) were excluded from the analysis of Illumina sequencing reads in the TPA Nichols genome (S1 Table). Except for the TEN strain Bosnia A, similar regions were also excluded in the other tested genomes (S2 Table) (see Materials and Methods).

Intrastrain heterogeneity was considered present if 1) two different nucleotides (or an indel) were detected at a given genome coordinate, and 2) this heterogeneity was detected by at least two sequencing methods with different sequencing chemistries. The automated analysis of Illumina reads revealed 46 candidates (Fig 1), of which 20 were directly verified by automated analysis of 454 reads. The remaining 26 candidate sites, found only in Illumina reads, were sequenced using Sanger technology, and heterogeneity was confirmed at three of them (Tables 2 and S3).
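The two-part acceptance rule can be expressed as a small predicate over candidate sites. The sketch below also reproduces the tally reported above (46 Illumina candidates, 20 confirmed by 454, 3 of the remaining 26 confirmed by Sanger sequencing) using placeholder site identifiers rather than real coordinates. `Candidate`, `is_confirmed` and `needs_sanger` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    site_id: str
    illumina: bool                  # flagged by the automated Illumina pipeline
    seq454: bool = False            # heterogeneity also seen in 454 reads
    sanger: Optional[bool] = None   # None = not (yet) Sanger-sequenced

def is_confirmed(c):
    """Accept a site only if a second chemistry (454 or Sanger) supports it."""
    return c.illumina and (c.seq454 or c.sanger is True)

def needs_sanger(c):
    """Illumina-only candidates that have not been checked by Sanger sequencing."""
    return c.illumina and not c.seq454 and c.sanger is None

# Placeholder data mirroring the counts in the text (identifiers are invented).
candidates = [Candidate(f"site{i}", True, seq454=True) for i in range(20)]
candidates += [Candidate(f"site{20 + i}", True, sanger=(i < 3)) for i in range(26)]

confirmed = [c.site_id for c in candidates if is_confirmed(c)]
print(len(candidates), len(confirmed))
```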

Intrastrain heterogeneous sites are mainly present in TPA and TEN but not in TPE strains

The 23 intrastrain heterogeneous sites, identified using the automated analysis of Illumina sequencing reads together with either 454 or Sanger sequencing reads, were found in 5 of the 10 investigated treponemal genomes (Table 2): TPA strains Nichols, DAL-1, and SS14, TPE strain Samoa D and TEN strain Bosnia A. No intrastrain heterogeneous sites were identified in the TPA Mexico A, TPE CDC-2, Gauthier and Fribourg-Blanc, and TPLC Cuniculi A genomes. Up to 7 intrastrain heterogeneous sites were identified in individual genomes. Whereas only one heterogeneous site was identified in the 4 examined TPE strains, 16 heterogeneous sites were detected among the 4 TPA strains analyzed. The TEN strain Bosnia A contained 5 single nucleotide heterogeneous sites; however, four of these (TENDBA_0314/331578, TENDBA_0314/331618, TENDBA_0317/333355 and TENDBA_0621/672156) were located within paralogous regions that had been excluded from analysis in all other genomes (S2 Table). Unlike the other genomes, the TEN Bosnia A genome was sequenced using the pooled segment genome sequencing method (PSGS) [20] as four distinct samples, whereas the other treponemal genomes were not subdivided prior to Illumina sequencing. Therefore, the orthologs of the TENDBA_0314, TENDBA_0317 and TENDBA_0621 genes were not completely analyzed in the other genomes. In contrast, the heterogeneous site found in the tprK gene of TEN Bosnia A (TENDBA_0897/974407) was also identified in the TPA DAL-1 strain (TPADAL_0897/976768). Interestingly, this genome position lies within tprK variable regions in the TPA SS14 and Mexico A genomes but within non-variable regions in all other genomes [37]. Therefore, these tprK hypervariable regions were excluded from the analyses of the TPA SS14 and Mexico A genomes (Fig 2).

In four genes, TPASS_20117 (tprC), TENDBA_0314 (hypothetical gene), TPASS_20402 (fliI) and TPADAL_0720 (fliY), two heterogeneous sites were found per gene (Fig 2 and Table 2).

Characteristics of identified intrastrain heterogeneous sites

All but one of the heterogeneous sites were nucleotide substitutions; the remaining site was an indel (Table 2). Of the 23 identified heterogeneous sites, one was located in an intergenic region and all others (n = 22) were within predicted coding regions of 17 genes. These genes encode Tpr proteins (TprC, TprI, TprK and a chimeric TprGI), proteins involved in bacterial motility and chemotaxis (FliI and CheC-FliY), translation (PrfA), peptidoglycan synthesis (MurC), general metabolism (putative SAM-dependent methyltransferase) and DNA metabolism (TopA), and hypothetical proteins of unknown function (TPANIC_0006, TPANIC_0222, TPANIC_0471; TPASS_21029; TPESAMD_0134; TENDBA_0314, TENDBA_0967).

One alternative allele replaced a stop codon, resulting in protein elongation, while the others resulted in synonymous (n = 2) or nonsynonymous (n = 18) mutations. Of the nonsynonymous mutations, 3 resulted in conservative and 15 in nonconservative amino acid replacements (Table 2). Transitions (n = 13) were found more frequently than transversions (n = 9). The most frequent were C→T and G→A transitions (n = 9), whereas T→C and A→G transitions were less frequent (n = 4). C→A and T→A transversions were not found.
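Classifying each substitution as a transition or transversion, and its coding consequence, can be sketched as below using the standard genetic code (bacterial translation table 11 has the same amino acid assignments). Assumptions: the reference codon is given on the coding strand, so for genes on the minus strand the genomic alleles must first be reverse-complemented. The conservative/nonconservative split depends on an amino acid grouping that the text does not specify and is therefore not shown. All names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order: index = 16*first + 4*second + third.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}

def substitution_type(ref, alt):
    """'transition' (purine<->purine, pyrimidine<->pyrimidine) or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def coding_effect(codon, pos_in_codon, alt):
    """Effect of replacing codon[pos_in_codon] (0, 1 or 2) with `alt` on the coding strand."""
    codon = codon.upper()
    mutant = codon[:pos_in_codon] + alt.upper() + codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if ref_aa == "*":
        return "stop lost"          # stop codon replaced: protein elongation
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"

print(substitution_type("C", "T"), coding_effect("TAA", 0, "C"))
```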

Identification of the intrastrain heterogeneous sites in different passages of TPA SS14

To test whether intrastrain heterogeneous sites were stably present across rabbit passages, a set of intrastrain heterogeneous sites identified in TPA SS14 was examined in four different DNA preparations originating from two different rabbit passages (see Materials and Methods, Table 3). DNA samples 4933 and 4950 were isolated from the same stock of treponemal cells (stock 2839), whereas DNA samples 4934 and 4951 were prepared from bacterial stock 2840. Only minimal differences in the presence and frequency of alternative alleles were found between 4933 and 4950 (and between 4934 and 4951), whereas clear differences were found between DNA preparations obtained from bacterial stocks 2839 and 2840 (Table 3).

Discussion

In this study, correct identification of intrastrain variable sites was considered critically important. To filter out sequencing errors, several restrictions were applied in the detection algorithm. Paralogous genome regions were omitted from analyses because of the risk of incorrectly mapping reads that belong to different genome regions. Duplicated reads, i.e. reads with identical start and end points, were automatically identified and removed from further analyses so that only uniquely generated sequencing reads were analyzed and potential bias introduced during DNA amplification was removed. Since most Illumina errors are nucleotide substitutions located at the 3’ end of reads [58],[70], sequence differences close to the 3’ end (4 or fewer nucleotides from the end) of individual reads were filtered out. An increased error rate within and in close proximity to homopolymeric regions was also reported for the original Solexa chemistry [71]. Therefore, we also filtered out differences in homopolymeric tracts and in their close vicinity (within 2 nt), although we are aware that variations in the length of homopolymeric tracts, especially those composed of guanosine tandem repeats, are of biological importance. These tandem repeats are known to regulate transcription (when located in promoter regions) and have been identified in T. pallidum genomes [72],[73]. To further increase the validity of the results, only alternative alleles reaching a frequency of at least 20% were analyzed. Taken together, these relatively stringent measures certainly led to a number of missed heterogeneous sites in both the analyzed and the non-analyzed genome regions. In addition to missed single nucleotide heterogeneous sites, larger sequences showing genetic heterogeneity were likely also missed because of the relatively short length of Illumina reads and the restrictions applied in the detection algorithm.

Examples of such sites could be the 1.3 kb-long tprK-like sequence between TP0126 and TP0127 or the 64 bp-long indel between TP0135 and TP0136, previously identified in the TPA Nichols genome [25],[39]. Another example comes from this work: a region of intrastrain heterogeneity comprising a 9 nt-long insertion in TENDBA_0967 was found in the Bosnia A strain during manual inspection of individual reads. The insertion represents an additional tandem repeat unit within a larger region between coordinates 1044918 and 1044951. Despite the possibility of missed sites of intrastrain heterogeneity, the automated analysis pipeline used in this study revealed 46 putative heterogeneous sites, 23 of which (50.0%) were verified using an independent sequencing method with a different sequencing chemistry. The remaining 23 non-verified positions likely represent falsely identified sites resulting from the accumulation of error-containing Illumina reads. The majority of heterogeneous sites identified in this study were transitions rather than transversions; transversions are common Illumina sequencing errors, with A→C being the most common, followed by G→T [59],[70]. The number of heterogeneous sites in a particular genome did not correlate with either the average sequencing coverage or the estimated Illumina error rate per nucleotide (Table 1).
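The duplicate criterion described above (reads sharing identical start and end points) can be sketched as grouping reads by alignment coordinates and strand and keeping one representative per group. Keeping the read with the highest mapping quality mirrors the behavior documented for `samtools rmdup`, but rmdup's actual grouping key differs (in particular for paired-end data), so this is a simplified, hypothetical stand-in rather than a reimplementation.

```python
from typing import NamedTuple

class Read(NamedTuple):
    name: str
    start: int    # leftmost aligned reference coordinate
    end: int      # rightmost aligned reference coordinate
    strand: str   # "+" or "-"
    mapq: int     # mapping quality

def remove_duplicates(reads):
    """Keep one read per (start, end, strand), preferring the highest mapping quality."""
    best = {}
    for r in reads:
        key = (r.start, r.end, r.strand)
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.start, r.end, r.strand))

reads = [
    Read("r1", 100, 199, "+", 30),
    Read("r2", 100, 199, "+", 40),   # duplicate of r1 with higher MAPQ
    Read("r3", 100, 199, "-", 20),   # same span, opposite strand: kept
    Read("r4", 150, 249, "+", 10),
]
print([r.name for r in remove_duplicates(reads)])   # ['r2', 'r3', 'r4']
```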

Although heterogeneous sites were mostly strain-specific, in several cases the same heterogeneous site was identified in two genomes. The same heterogeneous site was found in the tprK gene of the DAL-1 and Bosnia A genomes. Interestingly, the same position also appeared heterogeneous in the Nichols genome, although the number of Illumina reads supporting the less frequent nucleotide remained below the threshold (SRX012305, Fig 2). A similar situation was found at two other sites, one shared by the SS14 and Cuniculi A genomes and the other by the Samoa D and Nichols genomes (Fig 2). These findings indicate that the number of intrastrain heterogeneous sites per genome is limited and that different treponemal strains tend to display variability at the same positions in several genes. The abundance of nonsynonymous mutations and nonconservative amino acid replacements, together with the fact that most heterogeneous sites were located within coding regions, suggests that the heterogeneous sites represent beneficial adaptive mutations [74].

In this study, 23 intrastrain heterogeneous sites in 17 genes were identified in 5 of the 10 investigated treponemal genomes, predominantly in TPA strains. It is not clear why most of the heterogeneous sites were identified in TPA but not in TPE strains; this might reflect different tissue tropisms of TPA and TPE strains, different growth rates in experimental rabbits, differences in pathogenesis or other factors. Regardless, this finding indicates distinct genetic characteristics of TPA and TPE strains. Although the TEN strain Bosnia A resembled TPA strains in this respect, most of its heterogeneous positions were identified in paralogous regions that were excluded from the automated analysis of other genomes (Fig 2). With only a single heterogeneous site in nonparalogous regions, the Bosnia A genome thus resembles TPE strains. In fact, the Bosnia A genome is more closely related to TPE strains than to TPA strains, although several sequences similar to TPA sequences were identified in the Bosnia A genome [20]. In contrast to the other TPA strains, analysis of the TPA Mexico A strain did not reveal any heterogeneous sites (Fig 1 and Table 2). The Mexico A genome has also been shown, unlike other TPA strains, to contain two TPE-like sequences [15]. However, it remains unclear whether these two observations are related.

A comparison of our results with a previously published study describing heterogeneous sites in the TPA SS14 strain [14] is shown in Table 4. In the analyzed portion of the SS14 genome, Matějková et al. found 18 heterogeneous sites. Of these 18 sites, we automatically detected 5. At 4 other sites, the frequency of the alternative allele was below the threshold and/or did not meet the restriction criteria; nonetheless, manual inspection revealed the presence of the alternative allele. In two additional cases, the heterogeneity was identified in 454 reads (SRX000109) but not in Illumina reads. Comparison of our results with those published by Matějková et al. [14] thus revealed a substantial overlap; however, 7 sites (38.9%) detected by Matějková et al. were not found in our study. Interestingly, all non-detected heterogeneous sites were located in tpr genes (including tprC, I, J) or in the intergenic regions between them. At least two independent explanations can be proposed. One explanation is that the BWA (Burrows-Wheeler Aligner) mapping algorithm used in this study could not detect closely spaced heterogeneous sites representing a specific haplotype in relatively short Illumina or 454 reads because of alignment restrictions: to align an individual read to the reference sequence, 95% identity with the reference genome sequence was required in our study. However, no such reads were found in the raw data sets (SRX012306, SRX000109). The other explanation involves falsely identified heterogeneous sites resulting from PCR errors introduced during amplification of diluted target DNA and subsequent cloning of PCR products, as was done in the work of Matějková et al. [14]. The latter explanation is also supported by the fact that the undetected heterogeneous sites were often supported by low numbers of alternative clones (Table 4). Deeper sequencing of the identified heterogeneous genome sites will be needed to answer these questions.
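The 95% identity requirement caps the number of mismatches a read may carry, which is why reads from a haplotype with several closely spaced differences could fall out of the alignment. The sketch treats identity as a simple post-alignment filter (mismatches over aligned length); this is an assumption, since the text does not say how the threshold was enforced with BWA. Function names and the read lengths shown are illustrative only.

```python
def max_mismatches(aligned_length, min_identity_pct=95):
    """Largest mismatch count that still meets the identity threshold (exact integer math)."""
    return aligned_length * (100 - min_identity_pct) // 100

def passes_identity(mismatches, aligned_length, min_identity_pct=95):
    """True if (aligned_length - mismatches) / aligned_length >= min_identity_pct / 100."""
    return 100 * (aligned_length - mismatches) >= min_identity_pct * aligned_length

for length in (36, 75, 100, 400):
    print(length, max_mismatches(length))
```

For example, a 36-nt read tolerates only one mismatch at 95% identity, so a read carrying two closely spaced haplotype differences would already be rejected, whereas a 100-nt read tolerates five.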

Table 4. Comparison of heterogeneous positions identified in TPA SS14 strain by Matějková et al. [14] and by the automated pipeline used in this study.

https://doi.org/10.1371/journal.pntd.0004110.t004

In bacterial genomes, the most common mutations are C→T transitions arising via deamination of cytosine [75], T→C transitions via oxidation of thymine and/or inefficient DNA repair [76], A→G transitions via deamination of adenine [76], and G→T transversions via oxidation of guanine [76]. In fact, these 4 (of the 12 possible) substitution types accounted for 11 of the 22 single nucleotide substitutions (50%), indicating that the substitution types observed here overlap with the most frequently seen bacterial mutations. In contrast, sample oxidation frequently results in C→A and G→T changes [77], while Illumina errors are predominantly A→C transversions [59],[70]. Only three such substitutions (of 22; 13.6%) were found in this study, indicating that these substitutions are not overrepresented. Interestingly, the candidate sites identified using the Illumina pipeline but not verified by other sequencing techniques (S3 Table) frequently (73.9%) included these types of mutations, which points to Illumina sequencing as a source of errors and false-positive results.
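The spectrum comparisons above are simple proportions over the 12 possible directional substitution types. The sketch tallies what fraction of a list of substitutions falls into a chosen set of classes; the class sets follow the text, but the input list is a made-up toy example, not the study's data (see Table 2). Names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# The 12 directional single-nucleotide substitutions, written "REF>ALT".
ALL_TYPES = {f"{a}>{b}" for a, b in permutations("ACGT", 2)}

BACTERIAL_COMMON = {"C>T", "T>C", "A>G", "G>T"}    # common bacterial mutations
OXIDATION_OR_ILLUMINA = {"C>A", "G>T", "A>C"}      # sample oxidation / Illumina errors

def class_fraction(substitutions, classes):
    """Return (count, fraction) of substitutions belonging to `classes`."""
    if not substitutions:
        raise ValueError("empty substitution list")
    counts = Counter(substitutions)
    unknown = set(counts) - ALL_TYPES
    if unknown:
        raise ValueError(f"unrecognized substitution types: {sorted(unknown)}")
    hits = sum(counts[c] for c in classes)
    return hits, hits / len(substitutions)

toy = ["C>T", "C>T", "G>A", "T>C", "A>C", "G>T"]   # hypothetical example
print(class_fraction(toy, BACTERIAL_COMMON))
print(class_fraction(toy, OXIDATION_OR_ILLUMINA))
```

Applied to the counts reported in the text, the same arithmetic gives 11/22 = 50% and 3/22 ≈ 13.6%.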

TPA SS14 bacterial stocks 2839 and 2840 were separated by at least 12–14 treponemal generations of independent cultivation, corresponding to two rabbit subcultivations, each with an approximately 100-fold increase in the number of treponemes. Heterogeneous sites clearly differed between DNA preparations obtained from the different bacterial stocks, indicating the dynamic nature of this heterogeneity. This observation could also explain the strain-specificity of the intrastrain heterogeneous sites identified in this study. The role of rabbit passages in the occurrence of heterogeneous sites remains unknown; however, genetic heterogeneity has also been identified in treponemes isolated directly from human hosts (Natasha Arora, personal communication). The occurrence of intrastrain heterogeneity in TPA from human samples suggests its potential significance for molecular typing of syphilis treponemes by both sequencing-based approaches [78],[79] and RFLP analysis of amplified genes [80],[81].
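The 12–14 generation estimate follows from treating each generation as one doubling (binary fission without cell loss, an idealizing assumption), so that a 100-fold increase per subcultivation corresponds to:

```latex
g_{\text{one subcultivation}} = \log_2 100 \approx 6.64,
\qquad
g_{\text{two subcultivations}} = 2\log_2 100 = \log_2 10^{4} \approx 13.3
```

which falls within the stated range of 12–14 generations.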

Of the 22 heterogeneous sites showing alternative nucleotides, 16 were found at conserved genome positions (where all investigated genomes had identical sequences), while 6 were found at positions where the analyzed genomes differed in sequence. At 5 of these 6 sites, the alternative nucleotide at the heterogeneous position matched a nucleotide present in the analyzed genomes. The highest divergence observed among treponemal genomes is 0.84% sequence diversity between the conserved regions of the TPA and TPLC genomes [17]. Given this value, the theoretical probability that a heterogeneous site would be located at a nonconserved genome position is 8.4 × 10⁻³. In our study, heterogeneous sites were found at nonconserved genome positions considerably more often than expected (6 of 22, i.e., 2.7 × 10⁻¹; p < 0.001), suggesting a role for heterogeneous sites in the process of treponemal genome diversification.
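
The text does not name the statistical test behind p < 0.001. One plausible reading, assumed here, is a one-sided exact binomial test. The question it asks is this: if each of the 22 sites independently falls at a nonconserved position with probability 8.4 × 10⁻³, how likely is it to observe 6 or more such sites? The sketch below uses only the counts stated above. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly by summing the PMF."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_sites = 22         # heterogeneous sites showing alternative nucleotides
    n_nonconserved = 6   # sites located at nonconserved genome positions
    p_expected = 0.0084  # 0.84% TPA/TPLC divergence [17]

    observed = n_nonconserved / n_sites
    p_value = binomial_upper_tail(n_nonconserved, n_sites, p_expected)
    print(f"observed fraction = {observed:.2f}, expected = {p_expected}")
    print(f"one-sided binomial P(X >= {n_nonconserved}) = {p_value:.1e}")
```

Under this assumption, the expected number of nonconserved sites is only about 0.18 (22 × 0.0084). The upper-tail probability of observing 6 or more is orders of magnitude below 0.001, which is consistent with the reported significance.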

This study identified heterogeneous sites in four tpr genes, in genes involved in bacterial motility and chemotaxis (2), cell structure (1), translation (1), and general and DNA metabolism (2), and in seven hypothetical genes. During experimental rabbit infection, the average expression rate of these 17 genes (1.33) was higher than the whole-genome average (1.0) [82], indicating that these genes are expressed during host infection. Notably, the affected tpr genes were tprC, tprI, tprK and the chimeric tprGI gene. Several studies have shown that Tpr antigens are expressed during infection and can elicit antibody and cellular immune responses in the infected host [23],[83],[84]. Moreover, several Tpr proteins have been predicted to be outer membrane proteins [23],[85]. In addition, the tprK gene undergoes antigenic variation in seven variable regions, and TprK variants are selected by the immune response [86],[87]. It has also been shown that tprK variants accumulate during infection of the host [88],[89] and that antigenic variation of TprK facilitates the development of secondary syphilis [87]. As demonstrated by LaFond et al. [90], TprK variable regions elicit a variant-specific antibody response, indicating that minor sequence changes may affect antibody binding. In this context, nonconservative changes could result in strain-specific surface-exposed epitopes that are crucial for immune evasion, as previously predicted for discrete variable regions within TprC and TprD [23]. In E. coli, a mutation in topA (which corresponds to TPASS_20394) has been shown to affect fitness relative to isogenic constructs [91]. Moreover, topA and genes involved in cell wall biosynthesis and translation have been shown to mutate repeatedly in independent lines of E. coli during a long-term cultivation experiment [74].
Heterogeneous sites in pathogenic treponemal strains may therefore represent adaptive changes that take place during infection of various host tissues and compartments, as described in other bacteria [52]. At the same time, these sites may represent snapshots of an ongoing evolutionary trajectory. In the future, advances in deep sequencing techniques and prospective whole-genome sequencing or metagenomic studies should help identify a larger and perhaps more complete set of treponemal intrastrain heterogeneous sites [53],[54],[92].

Supporting Information

S1 Table. Chromosomal paralogous regions not included in the automated analysis of Illumina sequencing reads of the TPA Nichols genome.

https://doi.org/10.1371/journal.pntd.0004110.s001

(XLS)

S2 Table. Chromosomal paralogous regions not included in the automated analyses of Illumina sequencing reads of all investigated genomes.

https://doi.org/10.1371/journal.pntd.0004110.s002

(XLS)

S3 Table. A set of 23 putative heterogeneous positions identified solely by the Illumina workflow and not verified by other sequencing methods.

https://doi.org/10.1371/journal.pntd.0004110.s003

(XLS)

S4 Table. Primers used for DNA amplification and Sanger sequencing of 26 heterogeneous candidate positions (not verified by the 454 workflow).

https://doi.org/10.1371/journal.pntd.0004110.s004

(XLS)

S5 Table. List of primers used for DNA amplification and Sanger sequencing of selected intrastrain heterogeneous sites in four different TPA SS14 DNA preparations.

https://doi.org/10.1371/journal.pntd.0004110.s005

(XLS)

Acknowledgments

The authors thank Dr. David L. Cox for providing the DAL-1, Fribourg-Blanc, Mexico A and SS14 strains and Dr. Sylvia M. Bruisten for the Bosnia A strain. The authors are grateful to Dr. Ivan Rychlík for critical reading of the manuscript.

Author Contributions

Conceived and designed the experiments: MS DŠ GMW. Analyzed the data: DČ. Wrote the paper: DČ MS SJN DŠ.

References

1. Fribourg-Blanc A, Mollaret HH, Niel G. Serologic and microscopic confirmation of treponemosis in Guinea baboons. Bull Soc Pathol Exot Filiales. 1966;59: 54–59. pmid:5333741
2. Fribourg-Blanc A, Mollaret HH. Natural treponematosis of the African primate. Primates Med. 1969;3: 113–121. pmid:5006024
3. Zobaníková M, Strouhal M, Mikalová L, Čejková D, Ambrožová L, Pospíšilová P, et al. Whole genome sequence of the Treponema Fribourg-Blanc: unspecified simian isolate is highly similar to the yaws subspecies. PLoS Negl Trop Dis. 2013;7: e2172. pmid:23638193
4. Jacobsthal E. Untersuchungen über eine syphilisähnliche Spontanerkrankungen des Kaninchens (Paralues cuniculi). Derm Wschr. 1920;71: 569–571.
5. Smith JL, Pesetsky BR. The current status of Treponema cuniculi. Review of the literature. Br J Vener Dis. 1967;43: 117–127. pmid:5338028
6. Lumeij JT, Mikalová L, Smajs D. Is there a difference between hare syphilis and rabbit syphilis? Cross infection experiments between rabbits and hares. Vet Microbiol. 2013;164: 190–194. pmid:23473645
7. Horvath I, Kemenes F, Molnar L, Szeky A, Racz I. Experimental syphilis and serological examination for treponematosis in hares. Infect Immun. 1980;27: 231–234. pmid:6987170
8. Horvath I, Kemenes F, Molnar L. Isolation of pathogenic treponemes from hare. Experientia. 1979;35: 320–321. pmid:446601
9. Lumeij JT, de Koning J, Bosma RB, van der Sluis JJ, Schellekens JF. Treponemal infections in hares in the Netherlands. J Clin Microbiol. 1994;32: 543–546. pmid:8150971
10. Lumeij JT. Widespread treponemal infections of hare populations (Lepus europaeus) in the Netherlands. Eur J Wildl Res. 2011;57: 183–186.
11. Fraser CM, Norris SJ, Weinstock GM, White O, Sutton GG, Dodson R, et al. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science. 1998;281: 375–388. pmid:9665876
12. Cejková D, Zobaníková M, Chen L, Pospíšilová P, Strouhal M, Qin X, et al. Whole genome sequences of three Treponema pallidum ssp. pertenue strains: yaws and syphilis treponemes differ in less than 0.2% of the genome sequence. PLoS Negl Trop Dis. 2012;6: e1471. pmid:22292095
13. Giacani L, Jeffrey BM, Molini BJ, Le HT, Lukehart SA, Centurion-Lara A, et al. Complete genome sequence and annotation of the Treponema pallidum subsp. pallidum Chicago strain. J Bacteriol. 2010;192: 2645–2646. pmid:20348263
14. Matejková P, Strouhal M, Smajs D, Norris SJ, Palzkill T, Petrosino JF, et al. Complete genome sequence of Treponema pallidum ssp. pallidum strain SS14 determined with oligonucleotide arrays. BMC Microbiol. 2008;8: 76. pmid:18482458
15. Pětrošová H, Zobaníková M, Čejková D, Mikalová L, Pospíšilová P, Strouhal M, et al. Whole genome sequence of Treponema pallidum ssp. pallidum, strain Mexico A, suggests recombination between yaws and syphilis strains. PLoS Negl Trop Dis. 2012;6: e1832. pmid:23029591
16. Pětrošová H, Pospíšilová P, Strouhal M, Čejková D, Zobaníková M, Mikalová L, et al. Resequencing of Treponema pallidum ssp. pallidum strains Nichols and SS14: correction of sequencing errors resulted in increased separation of syphilis treponeme subclusters. PLoS One. 2013;8: e74319. pmid:24058545
17. Šmajs D, Zobaníková M, Strouhal M, Čejková D, Dugan-Rocha S, Pospíšilová P, et al. Complete genome sequence of Treponema paraluiscuniculi, strain Cuniculi A: the loss of infectivity to humans is associated with genome decay. PLoS One. 2011;6: e20415. pmid:21655244
18. Zobaníková M, Mikolka P, Cejková D, Pospíšilová P, Chen L, Strouhal M, et al. Complete genome sequence of Treponema pallidum strain DAL-1. Stand Genomic Sci. 2012;7: 12–21. pmid:23449808
19. Giacani L, Iverson-Cabral SL, King JC, Molini BJ, Lukehart SA, Centurion-Lara A. Complete genome sequence of the Treponema pallidum subsp. pallidum Sea81-4 strain. Genome Announc. 2014;2: e00333–14. pmid:24744342
20. Staudová B, Strouhal M, Zobaníková M, Cejková D, Fulton LL, Chen L, et al. Whole genome sequence of the Treponema pallidum subsp. endemicum strain Bosnia A: the genome is related to yaws treponemes but contains few loci similar to syphilis treponemes. PLoS Negl Trop Dis. 2014;8: e3261. pmid:25375929
21. Smajs D, Norris SJ, Weinstock GM. Genetic diversity in Treponema pallidum: implications for pathogenesis, evolution and molecular diagnostics of syphilis and yaws. Infect Genet Evol. 2012;12: 191–202. pmid:22198325
22. Giacani L, Sun ES, Hevner K, Molini BJ, Van Voorhis WC, Lukehart SA, et al. Tpr homologs in Treponema paraluiscuniculi Cuniculi A strain. Infect Immun. 2004;72: 6561–6576. pmid:15501788
23. Centurion-Lara A, Giacani L, Godornes C, Molini BJ, Brinck Reid T, Lukehart SA. Fine analysis of genetic diversity of the tpr gene family among treponemal species, subspecies and strains. PLoS Negl Trop Dis. 2013;7: e2222. pmid:23696912
24. Strouhal M, Smajs D, Matejková P, Sodergren E, Amin AG, Howell JK, et al. Genome differences between Treponema pallidum subsp. pallidum strain Nichols and T. paraluiscuniculi strain Cuniculi A. Infect Immun. 2007;75: 5859–5866. pmid:17893135
25. Mikalová L, Strouhal M, Čejková D, Zobaníková M, Pospíšilová P, Norris SJ, et al. Genome analysis of Treponema pallidum subsp. pallidum and subsp. pertenue strains: most of the genetic differences are localized in six regions. PLoS One. 2010;5: e15713. pmid:21209953
26. Harper KN, Liu H, Ocampo PS, Steiner BM, Martin A, Levert K, et al. The sequence of the acidic repeat protein (arp) gene differentiates venereal from nonvenereal Treponema pallidum subspecies, and the gene has evolved under strong positive selection in the subspecies that causes syphilis. FEMS Immunol Med Microbiol. 2008;53: 322–332. pmid:18554302
27. Liu H, Rodes B, George R, Steiner B. Molecular characterization and analysis of a gene encoding the acidic repeat protein (Arp) of Treponema pallidum. J Med Microbiol. 2007;56: 715–721. pmid:17510254
28. Brinkman MB, McGill MA, Pettersson J, Rogers A, Matejková P, Smajs D, et al. A novel Treponema pallidum antigen, TP0136, is an outer membrane protein that binds human fibronectin. Infect Immun. 2008;76: 1848–1857. pmid:18332212
29. Flasarová M, Smajs D, Matejková P, Woznicová V, Heroldová-Dvoráková M, Votava M. Molecular detection and subtyping of Treponema pallidum subsp. pallidum in clinical specimens. Epidemiol Mikrobiol Imunol. 2006;55: 105–111. pmid:16970074
30. Marra C, Sahi S, Tantalo L, Godornes C, Reid T, Behets F, et al. Enhanced molecular typing of Treponema pallidum: geographical distribution of strain types and association with neurosyphilis. J Infect Dis. 2010;202: 1380–1388. pmid:20868271
31. Cameron CE, Lukehart SA, Castro C, Molini B, Godornes C, Van Voorhis WC. Opsonic potential, protective capacity, and sequence conservation of the Treponema pallidum subspecies pallidum Tp92. J Infect Dis. 2000;181: 1401–1413. pmid:10762571
32. Harper KN, Ocampo PS, Steiner BM, George RW, Silverman MS, Bolotin S, et al. On the origin of the treponematoses: a phylogenetic approach. PLoS Negl Trop Dis. 2008;2: e148. pmid:18235852
33. Nechvátal L, Pětrošová H, Grillová L, Pospíšilová P, Mikalová L, Strnadel R, et al. Syphilis-causing strains belong to separate SS14-like or Nichols-like groups as defined by multilocus analysis of 19 Treponema pallidum strains. Int J Med Microbiol. 2014;304: 645–653. pmid:24841252
34. Baseman JB, Nichols JC, Rumpp JW, Hayes NS. Purification of Treponema pallidum from infected rabbit tissue: resolution into two treponemal populations. Infect Immun. 1974;10: 1062–1067. pmid:16558090
35. Lukehart SA, Shaffer JM, Baker-Zander SA. A subpopulation of Treponema pallidum is resistant to phagocytosis: possible mechanism of persistence. J Infect Dis. 1992;166: 1449–1453. pmid:1431264
36. Stamm LV, Bergen HL. The sequence-variable, single-copy tprK gene of Treponema pallidum Nichols strain UNC and Street strain 14 encodes heterogeneous TprK proteins. Infect Immun. 2000;68: 6482–6486. pmid:11035764
37. Centurion-Lara A, Godornes C, Castro C, Van Voorhis WC, Lukehart SA. The tprK gene is heterogeneous among Treponema pallidum strains and has multiple alleles. Infect Immun. 2000;68: 824–831. pmid:10639452
38. LaFond RE, Centurion-Lara A, Godornes C, Rompalo AM, Van Voorhis WC, Lukehart SA. Sequence diversity of Treponema pallidum subsp. pallidum tprK in human syphilis lesions and rabbit-propagated isolates. J Bacteriol. 2003;185: 6262–6268. pmid:14563860
39. Smajs D, McKevitt M, Wang L, Howell JK, Norris SJ, Palzkill T, et al. BAC library of T. pallidum DNA in E. coli. Genome Res. 2002;12: 515–522. pmid:11875041
40. Giacani L, Brandt SL, Puray-Chavez M, Reid TB, Godornes C, Molini BJ, et al. Comparative investigation of the genomic regions involved in antigenic variation of the TprK antigen among treponemal species, subspecies, and strains. J Bacteriol. 2012;194: 4208–4225. pmid:22661689
41. van der Woude MW, Bäumler AJ. Phase and antigenic variation in bacteria. Clin Microbiol Rev. 2004;17: 581–611. pmid:15258095
42. Palmer GH, Bankhead T, Lukehart SA. “Nothing is permanent but change”- antigenic variation in persistent bacterial pathogens. Cell Microbiol. 2009;11: 1697–1705. pmid:19709057
43. Golubchik T, Batty EM, Miller RR, Farr H, Young BC, Larner-Svensson H, et al. Within-host evolution of Staphylococcus aureus during asymptomatic carriage. PLoS One. 2013;8: e61319. pmid:23658690
44. Stoesser N, Sheppard AE, Moore CE, Golubchik T, Parry CM, Nget P, et al. Extensive within-host diversity in fecally carried extended-spectrum-beta-lactamase-producing Escherichia coli isolates: implications for transmission analyses. J Clin Microbiol. 2015;53: 2122–2131. pmid:25903575
45. Braden CR, Morlock GP, Woodley CL, Johnson KR, Colombel AC, Cave MD, et al. Simultaneous infection with multiple strains of Mycobacterium tuberculosis. Clin Infect Dis. 2001;33: e42–47. pmid:11512106
46. Cave MD, Eisenach KD, Templeton G, Salfinger M, Mazurek G, Bates JH, et al. Stability of DNA fingerprint pattern produced with IS6110 in strains of Mycobacterium tuberculosis. J Clin Microbiol. 1994;32: 262–266. pmid:7907344
47. Niemann S, Richter E, Rüsch-Gerdes S. Stability of Mycobacterium tuberculosis IS6110 restriction fragment length polymorphism patterns and spoligotypes determined by analyzing serial isolates from patients with drug-resistant tuberculosis. J Clin Microbiol. 1999;37: 409–412. pmid:9889229
48. Niemann S, Richter E, Rüsch-Gerdes S, Schlaak M, Greinert U. Double infection with a resistant and a multidrug-resistant strain of Mycobacterium tuberculosis. Emerging Infect Dis. 2000;6: 548–551. pmid:10998389
49. Iverson-Cabral SL, Astete SG, Cohen CR, Rocha EP, Totten PA. Intrastrain heterogeneity of the mgpB gene in Mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences. Infect Immun. 2006;74: 3715–3726. pmid:16790744
50. Snyder LA, Loman NJ, Linton JD, Langdon RR, Weinstock GM, Wren BW, et al. Simple sequence repeats in Helicobacter canadensis and their role in phase variable expression and C-terminal sequence switching. BMC Genomics. 2010;11: 67. pmid:20105305
51. Arias CA, Torres HA, Singh KV, Panesso D, Moore J, Wanger A, et al. Failure of daptomycin monotherapy for endocarditis caused by an Enterococcus faecium strain with vancomycin-resistant and vancomycin-susceptible subpopulations and evidence of in vivo loss of the vanA gene cluster. Clin Infect Dis. 2007;45: 1343–1346. pmid:17968832
52. Sokurenko EV, Gomulkiewicz R, Dykhuizen DE. Source-sink dynamics of virulence evolution. Nat Rev Microbiol. 2006;4: 548–555. pmid:16778839
53. Worby CJ, Lipsitch M, Hanage WP. Within-host bacterial diversity hinders accurate reconstruction of transmission networks from genomic distance data. PLoS Comput Biol. 2014;10: e1003549. pmid:24675511
54. Paterson GK, Harrison EM, Murray GGR, Welch JJ, Warland JH, Holden MTG, et al. Capturing the cloud of diversity reveals complexity and heterogeneity of MRSA carriage, infection and transmission. Nat Commun. 2015;6: 6560. pmid:25814293
55. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25: 1754–1760. pmid:19451168
56. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26: 589–595. pmid:20080505
57. Kao D. Code: get base counts from SAMtools' mpileup output. 2012. In: Next genetics blog [Internet]. Available: http://blog.nextgenetics.net/?e=56#body-anchor.
58. Yu G. GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing. BMC Bioinformatics. 2010;11: 508. pmid:20939910
59. Jerome JP, Bell JA, Plovanich-Jones AE, Barrick JE, Brown CT, Mansfield LS. Standing genetic variation in contingency loci drives the rapid adaptation of Campylobacter jejuni to a novel host. PLoS One. 2011;6: e16399. pmid:21283682
60. Liu Q, Guo Y, Li J, Long J, Zhang B, Shyr Y. Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data. BMC Genomics. 2012;13 Suppl 8: S8. pmid:23281772
61. Leshchiner I, Alexa K, Kelsey P, Adzhubei I, Austin-Tse CA, Cooney JD, et al. Mutation mapping and identification by whole-genome sequencing. Genome Res. 2012;22: 1541–1548. pmid:22555591
62. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43: 491–498. pmid:21478889
63. Altmann A, Weber P, Bader D, Preuss M, Binder EB, Müller-Myhsok B. A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet. 2012;131: 1541–1554. pmid:22886560
64. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009;25: 2283–2285. pmid:19542151
65. Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA, et al. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res. 2010;20: 273–280. pmid:20019143
66. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29: 24–26. pmid:21221095
67. Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009;37: 3124.
68. Zdobnov EM, Apweiler R. InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17: 847–848. pmid:11590104
69. Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, et al. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics. 2010;26: 1608–1615. pmid:20472543
70. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008;36: e105. pmid:18660515
71. Minoche AE, Dohm JC, Himmelbauer H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 2011;12: R112. pmid:22067484
72. Giacani L, Brandt SL, Ke W, Reid TB, Molini BJ, Iverson-Cabral S, et al. Transcription of TP0126, Treponema pallidum putative OmpW homolog, is regulated by the length of a homopolymeric guanosine repeat. Infect Immun. 2015;83: 2275–2289. pmid:25802057
73. Giacani L, Lukehart S, Centurion-Lara A. Length of guanosine homopolymeric repeats modulates promoter activity of subfamily II tpr genes of Treponema pallidum ssp. pallidum. FEMS Immunol Med Microbiol. 2007;51: 289–301. pmid:17683506
74. Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, et al. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature. 2009;461: 1243–1247. pmid:19838166
75. Fryxell KJ, Zuckerkandl E. Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol Biol Evol. 2000;17: 1371–1383. pmid:10958853
76. Gros L, Saparbaev MK, Laval J. Enzymology of the repair of free radicals-induced DNA damage. Oncogene. 2002;21: 8905–8925. pmid:12483508
77. Costello M, Pugh TJ, Fennell TJ, Stewart C, Lichtenstein L, Meldrim JC, et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 2013;41: e67. pmid:23303777
78. Flasarová M, Pospíšilová P, Mikalová L, Vališová Z, Dastychová E, Strnadel R, et al. Sequencing-based molecular typing of Treponema pallidum strains in the Czech Republic: all identified genotypes are related to the sequence of the SS14 strain. Acta Derm Venereol. 2012;92: 669–674. pmid:22434073
79. Grillová L, Pĕtrošová H, Mikalová L, Strnadel R, Dastychová E, Kuklová I, et al. Molecular typing of Treponema pallidum in the Czech Republic during 2011 to 2013: increased prevalence of identified genotypes and of isolates with macrolide resistance. J Clin Microbiol. 2014;52: 3693–3700. pmid:25100820
80. Pillay A, Liu H, Chen CY, Holloway B, Sturm AW, Steiner B, et al. Molecular subtyping of Treponema pallidum subspecies pallidum. Sex Transm Dis. 1998;25: 408–414. pmid:9773432
81. Marra C, Sahi S, Tantalo L, Godornes C, Reid T, Behets F, et al. Enhanced molecular typing of Treponema pallidum: geographical distribution of strain types and association with neurosyphilis. J Infect Dis. 2010;202: 1380–1388. pmid:20868271
82. Smajs D, McKevitt M, Howell JK, Norris SJ, Cai WW, Palzkill T, et al. Transcriptome of Treponema pallidum: gene expression profile during experimental rabbit infection. J Bacteriol. 2005;187: 1866–1874. pmid:15716460
83. Centurion-Lara A, Castro C, Barrett L, Cameron C, Mostowfi M, Van Voorhis WC, et al. Treponema pallidum major sheath protein homologue TprK is a target of opsonic antibody and the protective immune response. J Exp Med. 1999;189: 647–656. pmid:9989979
84. Leader BT, Hevner K, Molini BJ, Barrett LK, Van Voorhis WC, Lukehart SA. Antibody responses elicited against the Treponema pallidum repeat proteins differ during infection with different isolates of Treponema pallidum subsp. pallidum. Infect Immun. 2003;71: 6054–6057. pmid:14500529
85. Cox DL, Luthra A, Dunham-Ems S, Desrosiers DC, Salazar JC, Caimano MJ, et al. Surface immunolabeling and consensus computational framework to identify candidate rare outer membrane proteins of Treponema pallidum. Infect Immun. 2010;78: 5178–5194. pmid:20876295
86. Centurion-Lara A, LaFond RE, Hevner K, Godornes C, Molini BJ, Van Voorhis WC, et al. Gene conversion: a mechanism for generation of heterogeneity in the tprK gene of Treponema pallidum during infection. Mol Microbiol. 2004;52: 1579–1596. pmid:15186410
87. Reid TB, Molini BJ, Fernandez MC, Lukehart SA. Antigenic variation of TprK facilitates development of secondary syphilis. Infect Immun. 2014;82: 4959–4967. pmid:25225245
88. Giacani L, Molini BJ, Kim EY, Godornes BC, Leader BT, Tantalo LC, et al. Antigenic variation in Treponema pallidum: TprK sequence diversity accumulates in response to immune pressure during experimental syphilis. J Immunol. 2010;184: 3822–3829. pmid:20190145
89. LaFond RE, Centurion-Lara A, Godornes C, Van Voorhis WC, Lukehart SA. TprK sequence diversity accumulates during infection of rabbits with Treponema pallidum subsp. pallidum Nichols strain. Infect Immun. 2006;74: 1896–1906. pmid:16495565
90. LaFond RE, Molini BJ, Van Voorhis WC, Lukehart SA. Antigenic variation of TprK V regions abrogates specific antibody binding in syphilis. Infect Immun. 2006;74: 6244–6251. pmid:16923793
91. Crozat E, Philippe N, Lenski RE, Geiselmann J, Schneider D. Long-term experimental evolution in Escherichia coli. XII. DNA topology as a key target of selection. Genetics. 2005;169: 523–532. pmid:15489515
92. Tong SYC, Holden MTG, Nickerson EK, Cooper BS, Köser CU, Cori A, et al. Genome sequencing defines phylogeny and spread of methicillin-resistant Staphylococcus aureus in a high transmission setting. Genome Res. 2015;25: 111–118. pmid:25491771
93. Nichols HJ, Hough WH. Demonstration of Spirochaeta pallida in the cerebrospinal fluid: from a patient with nervous relapse following the use of salvarsan. JAMA. 1913;60: 108.
94. Wendel GD Jr, Sanchez PJ, Peters MT, Harstad TW, Potter LL, Norgard MV. Identification of Treponema pallidum in amniotic fluid and fetal blood from pregnancies complicated by congenital syphilis. Obstet Gynecol. 1991;78: 890–895. pmid:1923218
95. Stamm LV, Kerner TC Jr, Bankaitis VA, Bassford PJ Jr. Identification and preliminary characterization of Treponema pallidum protein antigens expressed in Escherichia coli. Infect Immun. 1983;41: 709–721. pmid:6347894
96. Turner TB, Hollander DH. Biology of the treponematoses based on studies carried out at the International Treponematosis Laboratory Center of the Johns Hopkins University under the auspices of the World Health Organization. Monogr Ser World Health Organ. 1957;35: 3–266. pmid:13423342
97. Liska SL, Perine PL, Hunter EF, Crawford JA, Feeley JC. Isolation and transportation of Treponema pertenue in golden hamsters. Curr Microbiol. 1982;7: 41–43.
98. Gastinel P, Vaisman A, Hamelin A, Dunoyer F. Study of a recently isolated strain of Treponema pertenue. Prophyl Sanit Morale. 1963;35: 182–188. pmid:13946770
99. Turner TB, Hollander DH. Studies on treponemes from cases of endemic syphilis. Bull World Health Organ. 1952;7: 75–81. pmid:13019545
100. Anand A, Luthra A, Dunham-Ems S, Caimano MJ, Karanian C, LeDoyt M, et al. TprC/D (Tp0117/131), a trimeric, pore-forming rare outer membrane protein of Treponema pallidum, has a bipartite domain structure. J Bacteriol. 2012;194: 2321–2333. pmid:22389487