Introduction

Mitochondrial DNA (mtDNA) is a genetic marker frequently used in forensic analysis [1 and references therein], and in population genetic and phylogenetic research [2, 3 and references therein]. The advantages of this marker are manifold. A high copy number in cells enables analysis of small amounts of cell material (e.g., [1]), which is especially important in forensic studies where samples often contain small amounts of DNA and/or highly degraded DNA. Because of its high mutation rate compared to nuclear DNA, analysis of a relatively short mtDNA region is often sufficient for forensic identification [1] or fine-scale phylogeographical studies [2, 3]. In general, uniparental (maternal) inheritance as well as lack of recombination of mtDNA simplifies pedigree analysis and the reconstruction of population history [2, 3].

However, heteroplasmy [4, 5] may render analysis in forensic case work difficult. Heteroplasmic organisms carry more than one mitochondrial genotype (e.g., [1, 6, 7]). Three possible mechanisms may generate heteroplasmy: paternal leakage, recombination, and mutation [2, 3 and references therein]. Paternal leakage and recombination seem to be rare events normally not leading to heteroplasmy; instead mutations in the haploid mitochondrial genome are the most important cause. Heteroplasmic point mutations clearly present a challenge for correct individual identification [1, 6] in forensics and for haplotype identification in population genetic/phylogenetic studies [2, 3].

As a consequence of mutation, heteroplasmy represents the intermediate step to fixation of either the wild type or the mutated type [5, 8, 9] and is therefore an essential step in the generation of genetic diversity in mtDNA. For a neutral mitochondrial heteroplasmic variant, the time to fixation has been estimated to be approximately 200 generations in humans and chinook salmon [2]. However, there have been reports of extremely fast fixation of heteroplasmic point mutations, within a few generations (e.g., [10, 11]) and in an extreme case in one generation [12]. The rapid segregation of haplotypes is thought to be generated by a bottleneck during oogenesis or embryogenesis reducing the number of mtDNA molecules, leading to random genetic drift [13–15]. Therefore, the proportions of heteroplasmic components may vary among siblings or from mother to offspring [16, 17]; e.g., one sibling may have an apparently fixed, homoplasmic genotype while others display extensive variation of proportions. In forensic case work, this may lead to related individuals incorrectly being ruled as nonrelated. Variable levels of heteroplasmy may even occur within individuals between tissues (mosaicism) [1, 18–21] and may therefore lead to incorrect haplotype assignment and exclusion of an individual as the source of evidence material. On the other hand, occurrence of the same heteroplasmic point mutation in otherwise identical sequences was used to increase the significance of a match for identifying the Romanov family members [22–24]. Forensic guidelines for analysis of human samples allow for one sequence difference between individuals without excluding the possibility that the individuals belong to the same lineage (e.g., [25]). However, for dogs, the majority of haplotypes, based on the control region, differ by a single mutation from each other [26]. Therefore, basing exclusions on at least two sequence differences may not be practical since match probabilities would often be too high for an informative analysis. The reason for this low degree of difference between haplotypes is that dogs were domesticated only 10,000–15,000 years ago [26] and, consequently, few sequence differences have accumulated since the origin of dogs.

In dogs, the control region of the mtDNA genome has been used to study the geographical origin and age of dogs [26–28] as well as their migration routes, e.g., of the Dingo to Australia or the Old World origin of New World dogs [29, 30]. In addition, the mtDNA control region has been widely used for the identification of dogs in forensic analyses [31–39]. It has been proposed to standardize the nomenclature of the mtDNA control region in order to launch a dog database for forensic use [40], and complementary information about heteroplasmic states and mutational hot spots may be useful in this context. All of the aforementioned utilizations warrant the systematic study and identification of heteroplasmy in the mtDNA control region in dogs.

Although heteroplasmy has been occasionally reported in dogs [20, 35, 41], to our knowledge, there are no extensive studies of this phenomenon in dogs. Also, in other animal species, very few studies [11, 42] have evaluated how heteroplasmy levels vary among related individuals and between generations (but see, e.g., [8, 43] for studies in humans). This is probably because samples from distantly related specimens or family/pedigree information are not available in most animal species. The domestic dog, with extensively recorded pedigrees (e.g., the Swedish and Finnish Kennel Club databases), dense populations, and short generation times, provides a unique opportunity to study heteroplasmy on a larger scale. Here, we present the first comprehensive study of heteroplasmy in a large number of dog pedigrees. We identify pedigrees with heteroplasmic point mutations and study the segregation and variation of heteroplasmy between generations and among siblings as well as between distant parts of the pedigrees. We also study the correlation between nucleotide positions showing heteroplasmy and mutational hot spots, and we discuss the implications of these results for forensic studies.

Materials and methods

Samples

Pedigree information was obtained from the public databases available through the Swedish (http://kennet.skk.se/hunddata/) and Finnish (http://jalostus.kennelliitto.fi) Kennel Clubs (Supplementary Tables 1 and 2). The representation of the samples analyzed in this study is shown in pedigrees (Fig. 1a–c; Supplementary Figs. 1, 2, and 3). Great care was taken to collect samples from different parts of the pedigree and to include several dogs from the same litters. The samples were collected as buccal epithelial cells using FTA-indicating cards according to the manufacturer’s specifications (Whatman International, UK) or as EDTA–blood samples from which genomic DNA was extracted using a commercially available kit (Puregene; Gentra Systems, Minneapolis, MN, USA). In order to test whether the level of heteroplasmy differs between tissues, we tested 18 dogs for both buccal cells and blood samples. As an additional quality measure, eight blood samples and 12 buccal samples were sequenced twice to compare the possible differences between reruns of the same sample.

Fig. 1

Simplified pedigrees A, B, and C. Squares: males; circles: females; black filled squares/circles: sampled males/females. Vertical lines represent one generation; the number of horizontal lines crossing a vertical line indicates additional generations. The dashed line in pedigree B indicates that sample Z1766 belongs to this pedigree, but the exact number of generations is unknown. The dashed box in pedigree C indicates that heteroplasmic sequences are found only in this part of the pedigree. Numbers 1–10 refer to the litters given in Table 1

mtDNA sequence analysis

Amplification and sequencing of a 582-bp-long fragment of the mtDNA control region was performed as described before [34]. Four primer pairs were used to score each heteroplasmic position three times and in both forward and reverse directions. Sequencing of PCR products was performed with BigDye Terminator chemistry on ABI 377 and ABI 3700 instruments (Applied Biosystems). Sequences were aligned with BioEdit [44] and checked by eye. Comparisons of sequences and identification of haplotypes were performed with DnaSP v5 [45].

Sequences and contigs were visually checked to guarantee optimal sequence quality with a minimum of background signal. Proportions of major and minor components were scored by eye (Supplementary Table 2). The proportion of heteroplasmy was estimated by assessing the heights of the two peaks at the heteroplasmic position and expressing each peak height as a proportion of the sum of both peak heights (similar to [8]). Importantly, direct sequencing approaches have been found to score minor component levels down to ~10% [9, 43]. Therefore, what appears as a fixed state may be a heteroplasmic state with the minor component at a frequency of less than 10%.

The proportions from the three respective electropherograms were averaged. For the indel case, the peak height of the visible nucleotide was compared to that of the nearest nucleotides because the deletion itself cannot be scored. This was feasible because neighboring nucleotides generally showed congruent peak height differences relative to the heteroplasmic position in all three electropherograms. However, it is worth pointing out that these measures represent approximations because sequence quality may differ due to signal intensity, nucleotide incorporation, and laboratory conditions [9 and references therein].
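The peak-height calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Supplementary Table 2, where proportions were scored by eye; the function names and the example peak heights are hypothetical.

```python
from statistics import mean


def peak_proportions(peak_a, peak_b):
    """Return each peak height as a proportion of the sum of both heights."""
    total = peak_a + peak_b
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return peak_a / total, peak_b / total


def averaged_proportion(peak_pairs):
    """Average the proportion of the first nucleotide over several electropherograms."""
    return mean(peak_proportions(a, b)[0] for a, b in peak_pairs)


# Hypothetical peak heights (arbitrary units) for G and A at one position,
# read from three electropherograms of the same sample.
reads = [(300, 700), (350, 650), (250, 750)]
g_fraction = averaged_proportion(reads)
print(f"G: {g_fraction:.0%}, A: {1 - g_fraction:.0%}")
```

Because the minor peak is only detectable down to roughly 10%, an averaged value near 0% or 100% from such a calculation should be read as an apparent, not a proven, fixed state.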

Comparison of heteroplasmic proportions between runs of the same dog sample

In order to ensure that laboratory conditions were stable and to assess the difference between runs, we separately amplified and sequenced eight blood samples and 12 buccal cell samples twice. For blood samples, five samples were scored with identical proportions in both runs and three differed by 10% between runs. For buccal cell samples, seven samples showed identical proportions in both runs and the other five differed by 5–10% between runs (Supplementary Table 2). All samples showed the same predominant nucleotide and were therefore identified as the same haplotype.

Comparison of heteroplasmic proportions between blood and buccal cell samples from the same dog

The difference in heteroplasmic proportions between buccal cell and blood samples from the same dog was assessed for a subset of nine heteroplasmic individuals. One sample had identical scores in buccal cell and blood sample, but the other eight cases showed discrepancies: by a maximum of 10% in heteroplasmic proportions in four cases, 20% in one case, 35% in two, and by a maximum of 40% in one case (Supplementary Table 2). In conclusion, in all cases, heteroplasmy could be identified when amplifying from different DNA sources, but the proportion of heteroplasmy differed more between different DNA sources than between different runs from the same DNA source. Thus, in general, there was good agreement between runs, and the few cases which showed higher discrepancies did not impact our analysis noticeably. However, where possible, only one source (either buccal cell or blood sample) was used for closely related individuals in order to avoid potential bias in heteroplasmic scores.

Results

Screening of pedigrees for heteroplasmies

We studied altogether 180 dogs in 58 pedigrees for signs of heteroplasmy by sequencing a 582-bp-long fragment of the mtDNA control region (nps 15,458–16,039). We used two to 12 dogs from each pedigree corresponding to a total minimum of 875 generations (3,469 years; Supplementary Table 1). Interestingly, the average generation time was 3.96 years (based on the birth year of sampled dogs, and going back in time to the most recent common ancestor of each pedigree), which is in good agreement with previous estimates for canids [46]. Out of the 58 pedigrees, we found three pedigrees (5.17%) with heteroplasmic point mutations, two samples from each pedigree having different haplotypes. The nucleotide positions were clearly heteroplasmic when checked by eye. It should be noted that the resolution of the approach used is limited (see “Materials and methods” section) and cannot detect low levels of heteroplasmy (<10%), so some heteroplasmic pedigrees may have been missed.

Segregation of heteroplasmies through generations

To identify the characteristics of heteroplasmy, we studied additional dogs in the three identified pedigrees with heteroplasmy. A total of 131 dogs were studied in these pedigrees, which are subsequently referred to as pedigrees A, B, and C (Fig. 1; Supplementary Figs. 1, 2, and 3). The studied samples were chosen from litters and across generations to cover the entire pedigrees. We found heteroplasmy at nucleotide positions (pos) 16,003 (G/A), 15,639 (G/A), and 15,931 (A/−). All three positions have been found to vary between haplotypes within three or four out of the six major phylogenetic clades A–F (pos 16,003, clades B–D; pos 15,639, A–D; pos 15,931, A–D) previously identified from dogs [26–28]. Therefore, it appears that these positions mutate more frequently than other positions.

For the smallest pedigree, B (Fig. 1b; Supplementary Fig. 2), 11 samples were analyzed, spanning a minimum of 107 years and 32 generations (average generation time 3.34 years). Out of these, seven samples had a heteroplasmic point mutation at position 16,003 resulting in a heteroplasmic mix of G/A (G and A giving haplotypes D1 and D4, respectively [47]). The other four samples carried the mutated haplotype D4. D1 and D4 belong to a haplogroup called d1, and it has been shown that haplogroup d1 is almost exclusively found in Scandinavia in spitz-type dog breeds [47] and therefore this special haplogroup may be especially informative in forensic case work. Taking also this specific heteroplasmy into account, a dog may be assigned with high confidence to pedigree B (and to Lapponian herders) because, so far, only dogs belonging to this pedigree show this heteroplasmy.

For pedigree C (Fig. 1c; Supplementary Fig. 3), we analyzed 46 dogs representing 120 generations (627 years) with an average generation time of 5.23 years. The pedigree had a heteroplasmic point mutation at pos 15,639 with a heteroplasmic mix of G/A (G and A giving haplotypes B1 and B35, respectively). Notably, this pedigree showed heteroplasmic dogs (seven samples) in only one lineage (indicated by a black-framed box in Fig. 1c and Supplementary Fig. 3) out of five descending from the founder individual, and all samples from other parts were homoplasmic for the wild-type nucleotide G (38 samples).

Remarkably, a second heteroplasmy was identified in the same part of pedigree C (black-framed box in Fig. 1c and Supplementary Fig. 3). One dog (Z2178) which was homoplasmic wild-type G at pos 15,639 (giving haplotype B1) had a heteroplasmic single base indel at another position [15,931; the same position as the heteroplasmy in pedigree A (see below)] resulting in a mix of A/− (A and − giving the haplotypes B1 and B3, respectively). Thus, each heteroplasmy occurred independently in different dogs. A second sample (Z2225), which is closely related to Z2178 (Fig. 1c; Supplementary Fig. 3), showed apparently the wild-type nucleotide at both pos 15,639 and pos 15,931. Both dogs were sampled twice at independent occasions, and each sample was run twice to confirm sequence data. Thus, in this part of the pedigree, close relatives carried two different heteroplasmies or were homoplasmic for the wild type, potentially resulting in three different haplotype scores (B1, B3, and B35).

Finally, pedigree A (Fig. 1a; Supplementary Fig. 1) had a heteroplasmic single base indel at pos 15,931 with mixes of A/− (A and − giving haplotypes D1 and D3, respectively). Heteroplasmy occurred in all parts of the pedigree: 26 samples were heteroplasmic, 36 samples had the wild-type nucleotide A, and 12 samples appeared fixed for the deletion (− at 100%). The 74 samples analyzed across the pedigree encompassed 137 generations and 533 years, giving an average generation time of 3.89 years.
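As a quick arithmetic check, the average generation times quoted in this section follow from dividing the reported years by the reported generations. The sketch below uses only figures given in the text; the variable and function names are illustrative.

```python
# (years, generations) as reported in the text for each data set.
reported = {
    "all 58 pedigrees": (3469, 875),
    "pedigree A": (533, 137),
    "pedigree B": (107, 32),
    "pedigree C": (627, 120),
}


def generation_time(years, generations):
    """Average generation time in years."""
    return years / generations


times = {name: generation_time(y, g) for name, (y, g) in reported.items()}
for name, value in times.items():
    print(f"{name}: {value:.3f} years per generation")

# Share of screened pedigrees with heteroplasmy (3 of 58).
heteroplasmic_share = 3 / 58
print(f"heteroplasmic pedigrees: {heteroplasmic_share:.2%}")
```

The quotient for pedigree C is 5.225 years, consistent with the reported 5.23 after rounding.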

Proportions of heteroplasmy at different kinship levels

Within heteroplasmic parts of the pedigrees, proportions of the respective nucleotides varied considerably among dogs, from apparent homoplasmy for either haplotype to any proportion in between (Fig. 1, Table 1). Proportions of heteroplasmy differed strikingly at all kinship levels, demonstrated best by pedigree A (Fig. 1; Supplementary Fig. 1). At the sibling level (within the same litter), some siblings had identical or nearly identical proportions of heteroplasmic point mutations while some had 100% of one of the two nucleotides and therefore appeared to have fixed haplotypes. For example, among ten litters with at least three members, the heteroplasmic proportions differed by 100% in one litter (i.e., each opposite fixed state was carried by two siblings) and by 80% in two other litters (Table 1). The detection limit in Sanger sequencing for a minority peak is approximately 10%, and a proportion up to 20% may easily be overlooked. Therefore, in at least 10% of cases, siblings would normally have been scored with different haplotypes.
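The effect of the detection limit on haplotype scoring can be illustrated with a short sketch. It assumes a hard 10% cutoff for the minority peak, which is a simplification of the approximate limit stated above; the function names and the example litter are hypothetical and are not taken from Table 1.

```python
DETECTION_LIMIT = 0.10  # approximate Sanger limit for a minority peak (assumed hard cutoff)


def apparent_call(mutant_fraction, limit=DETECTION_LIMIT):
    """Classify a sample by how it would appear in direct sequencing."""
    if mutant_fraction < limit:
        return "wild type"      # mutant peak below detection
    if mutant_fraction > 1 - limit:
        return "mutant"         # wild-type peak below detection
    return "heteroplasmic"


def litter_range(mutant_fractions):
    """Largest difference in mutant proportion between any two siblings."""
    return max(mutant_fractions) - min(mutant_fractions)


# Hypothetical litter of three siblings (mutant proportions).
litter = [0.0, 0.4, 1.0]
calls = [apparent_call(f) for f in litter]
print(calls, f"range: {litter_range(litter):.0%}")
```

In this toy litter, the first and last siblings would be scored with different, apparently fixed haplotypes even though they share a mother.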

Table 1 The range and difference of heteroplasmic proportions between dogs from the same litter

Between generations, 30 mother–offspring pairs could be studied (Fig. 2). In general, the whole range of possible nucleotide proportion differences was observed. Thirteen cases (43.3%) included nucleotide proportion changes of 60% or more, thereby leading to changes in the haplotype score. Most importantly, three out of 30 mother–offspring pairs (10%) had proportion differences of 80–90% between generations. Thus, at the 10% detection level of minority peaks, switches between two apparently fixed states appeared frequently. Accordingly, at larger distances in the pedigrees, the probability of an erroneous exclusion of a dog is even higher. For example, dog Y418 and the sibling pair Y426 and Y427 (pedigree A; Fig. 1) are cousins (their mothers are siblings). Also here a total shift with apparent fixation of two different haplotypes has occurred which would lead to exclusion of kinship at the cousin level in dogs. Further, closely related litters in pedigree C (black framed box, Fig. 1c) showed full fixation of the wild-type allele in one litter (Z2147 and Z2141), whereas the other litter (litter 10 in Table 1; Z2137, Z2027, and Z2163) showed a heteroplasmic mix of wild-type (40%) and mutated allele (60%) with the mutated allele in majority. Further, shifts of heteroplasmic proportions among closely related dogs (e.g., Z2139, Z2146, and Z2135) covering three to four generations could be observed. Thus, the heteroplasmic proportions differ strongly and randomly between generations, and the pattern of transmission suggests that severe bottlenecks are involved in mtDNA inheritance in dogs. Comparing different main branches of the pedigrees, no clear pattern could be identified except that pedigree C showed heteroplasmy in only one branch of the pedigree. In the other pedigrees, all parts of the pedigrees showed full range of heteroplasmic proportions and therefore heteroplasmy persists through several generations in these pedigrees.
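Why a narrow bottleneck produces large, random shifts between mother and offspring can be illustrated with a simple binomial sampling model. This is a generic sketch of drift through a bottleneck, not an analysis performed in this study; the bottleneck sizes, the 50% starting proportion, and all names below are assumptions chosen for illustration only.

```python
import random


def transmit(mutant_fraction, bottleneck_size, rng):
    """Draw `bottleneck_size` mtDNA units at random from the mother and
    return the mutant proportion among them (binomial sampling)."""
    mutants = sum(rng.random() < mutant_fraction for _ in range(bottleneck_size))
    return mutants / bottleneck_size


rng = random.Random(1)  # fixed seed so the sketch is reproducible
mother = 0.5            # hypothetical maternal mutant proportion
mean_shift = {}
for size in (5, 50, 500):  # hypothetical bottleneck sizes, not estimates for dogs
    offspring = [transmit(mother, size, rng) for _ in range(1000)]
    mean_shift[size] = sum(abs(o - mother) for o in offspring) / len(offspring)
    print(f"bottleneck {size}: mean shift {mean_shift[size]:.3f}")
```

Under this model, the smaller the number of transmitted units, the larger the average mother-to-offspring shift, which is the qualitative pattern the observed proportions suggest.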

Fig. 2

Difference of heteroplasmic proportions among mother–offspring pairs in percent

Discussion

We present here a comprehensive study of heteroplasmy in canine pedigrees. We show that close relatives or even siblings cannot be reliably used for forensic identification because the proportions of wild-type (WT) and mutant (MT) nucleotide vary considerably among siblings and across generations. Close relatives may show total reversion of WT and MT proportions, which may lead to an erroneous exclusion of case material. Additionally, the special case of two independent mutations and therefore three different haplotypes in one of the studied pedigrees further challenges case studies in dogs.

Intergenerational changes have been previously shown in a series of other animals and in humans (e.g., [8, 10–12, 43]). For example, [43] found that five out of 135 human mother–child comparisons had a strong shift in heteroplasmic point mutations (two point heteroplasmies found in mother and resolved to homoplasmy in children; three point heteroplasmies found in children only). The full fixation of the mutant variant within one or two generations in the North Atlantic right whale, Eubalaena glacialis, resulting in different haplotypes between siblings, has been identified by McLeod and White [11]. Finally, Santos et al. [8] studied 48 human pedigrees (422 individuals) corresponding to 321 mtDNA transmissions and identified both shifts (up to ~40% change in heteroplasmic proportions) in some pedigrees as well as an apparently stable pattern of heteroplasmy in other pedigrees. In the present study, it could be confirmed that rapid switches of heteroplasmic point mutations occur between generations among dogs and that different offspring may carry opposite genotypes (of the wild-type or mutant variant) or show a range of different heteroplasmic proportions.

It has been proposed that heteroplasmy may occur more frequently at hypervariable sites (mutational hot spots) [6, 9, 48–50]. All three heteroplasmic positions identified in this study have shown variation in previous studies and are possibly mutational hot spots. To study the connection between heteroplasmy and hypervariability, we compiled data from this study and from literature about these two phenomena for positions within the 582-bp region (Table 2). The positions were identified based on having shown heteroplasmy and/or signs of hypervariability [26, 41].

Table 2 A review of heteroplasmic positions and potential mutational hot spots within the 582-bp segment of the control region
  1. A study of 105 dogs from the UK by Wetton et al. [20] found a heteroplasmic single base indel in one dog in a blood sample. The heteroplasmic proportions ranged from near fixation of the deletion to predominance of the wild-type nucleotide in a sample of 12 hairs from the same dog. This heteroplasmy (15,931; A/−) occurred at the same position as in pedigree A.

  2. Five point heteroplasmic positions among 133 dogs from Austria, four of which within the region sequenced in the present study, were detected by Eichmann and Parson [35] by direct sequencing. One of these positions (15,931; A/−) was the same as in pedigree A and the case found by [20]. However, [35] found this position to be heteroplasmic in sequences from phylogenetic clades A and B and [20] in clade B, whereas we found it to be heteroplasmic in sequences from clade D in pedigree A (and clade B in pedigree C). Additionally, three other positions in HVS-I were found to vary considerably among clades and therefore suggested to be mutational hot spots (Table 2).

  3. A phylogenetic tree approach to calculate how often character changes (substitutions) have occurred at different nucleotide positions in the tree was applied by Gundry et al. [41]. They found three highly variable positions, with three to six substitutions in the phylogeny, in the 582-bp region studied in the present study. One of these, pos 15,639 with six changes and the three nucleotide states A/T/G, is identical to one of the two heteroplasmic positions in pedigree C in the present study.

  4. A recent study of 1,543 dogs by Pang et al. [26] increased the number of reported haplotypes considerably to 220 for the here studied 582-bp region (for comparison, [41] reported 45 haplotypes). Therefore, we checked if additional highly variable positions can be identified on the basis of the existing phylogeny with six major phylogenetic clades (A–F). Thus, mutated positions were identified by studying the alignments within each clade: more than one nucleotide, or an indel, at a position implies a mutation within a clade. Positions with at least three mutational changes were scored as a potential mutational hot spot. Position 15,639 carried a “C” in haplotype A165 and therefore all four character states (T/G/A/C) have now been observed in this position. Position 15,931 carried a “G” in haplotype A94 and in haplotypes found by [41] and therefore, in addition to the indel, also has an A–G substitution (A/G/−).
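The within-clade screening for potential hot spots described in the last item can be sketched roughly as below. The counting rule is an assumption made for illustration: a column showing k distinct states within one clade is taken to imply at least k − 1 changes in that clade, and these minima are summed over clades. The function names and the toy alignments are hypothetical, and column indices are 0-based alignment columns rather than mtDNA nucleotide positions.

```python
from collections import defaultdict

HOT_SPOT_THRESHOLD = 3  # "at least three mutational changes", per the text


def min_changes_per_column(clade_alignments):
    """Sum, over clades, the minimum number of changes implied at each column.

    `clade_alignments` maps a clade name to a list of equal-length aligned
    sequences ('-' marks an indel). Differences between clades are ignored;
    only variation within a clade counts (assumed rule: k states -> k - 1 changes).
    """
    changes = defaultdict(int)
    for sequences in clade_alignments.values():
        for column, states in enumerate(zip(*sequences)):
            changes[column] += len(set(states)) - 1
    return dict(changes)


def hot_spots(clade_alignments, threshold=HOT_SPOT_THRESHOLD):
    """Return the columns with at least `threshold` implied changes."""
    counts = min_changes_per_column(clade_alignments)
    return sorted(col for col, n in counts.items() if n >= threshold)


# Toy alignments with four columns; these are not real dog haplotypes.
toy = {
    "A": ["GATA", "AATA", "GAT-"],
    "B": ["GATA", "AATA"],
    "C": ["GACA", "AACA"],
}
print(hot_spots(toy))  # -> [0]
```

In the toy data, column 0 varies within all three clades and is flagged, whereas column 2 differs only between clades and column 3 varies within a single clade, so neither reaches the threshold.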

To conclude, all three heteroplasmic positions identified in the present study appear to be mutational hot spots. The indel position 15,931 which was heteroplasmic both in two of our pedigrees and in the studies by [20] and [35], and varies within phylogenetic clades A–D, has to be considered a mutational hot spot. The same applies to position 15,639 which changed six times in the phylogenetic tree of [41], carries all four possible nucleotides [26], and was heteroplasmic in our analysis. Finally, position 16,003 carries two different nucleotides in different haplotypes in clades B–D and therefore likely mutated at least three times in history and showed a heteroplasmic mix of both nucleotides in the current study. Thus, all three heteroplasmies identified in the present study are found at positions which are among the most variable. In total, only six positions have been reported so far to show heteroplasmy, and all these positions show mutations. This clearly demonstrates that heteroplasmy occurs more often at mutational hot spots.

In this study, we show that extreme shifts of heteroplasmic proportions can be observed among dogs both in mother–offspring pairs and among siblings as well as other relatives, with, e.g., one dog (sibling or mother) carrying the wild type in an apparently fixed state and siblings or offspring having the mutated type in fixation. Therefore, using heteroplasmy for increasing match significance of case material should be done with caution in dogs. The issue of how to quantify the match significance in these cases is also not yet solved. Further, the heteroplasmies occurred in what appear to be mutational hot spot positions with higher mutation rates than other nucleotide positions. It was also demonstrated that up to three haplotypes occurred in one part of a pedigree because mutational hot spots generated increased diversity. Thus, it seems that mutational hot spot positions are unreliable for exclusions or confirmations of case material while other positions with lower mutation rates are more reliable. For example, an exclusion based on differences at two hot spot positions may be unreliable, while a single difference occurring in a position with a low mutation rate may be highly informative. This should be taken into account in forensic analysis. A problem is that knowledge about which positions are hypervariable is largely missing and relies on studies of phylogenies and a few reported cases of heteroplasmy as reviewed in this study. The detailed pedigree analysis presented here contributes a first step to increased knowledge about the prevalence and nature of heteroplasmy and hot spots in dogs, but further detailed analyses are warranted.