Systematic evaluation of error rates and causes in short samples in next-generation sequencing

Sci Rep. 2018 Jul 19;8(1):10950. doi: 10.1038/s41598-018-29325-6.

Abstract

Next-generation sequencing (NGS) is the method of choice when large numbers of sequences have to be obtained. While the technique is widely applied, varying error rates have been observed. We analysed millions of reads obtained by sequencing a single sequence on an Illumina sequencer. According to our analysis, the index-PCR used for sample preparation has no effect on the observed error rate, even though PCR is traditionally seen as one of the major contributors to elevated error rates in NGS. In addition, we observed very persistent pre-phasing effects, although the base-calling software corrects for these. Removal of shortened sequences abolished these effects and allowed analysis of the actual mutations. The average error rate determined was 0.24 ± 0.06% per base, and the percentage of mutated sequences was found to be 6.4 ± 1.24%. Constant regions at the 5'- and 3'-ends, e.g., primer binding sites used in in vitro selection procedures, seem to have no effect on mutation rates, and re-sequencing of samples yields very reproducible results. As phasing effects and other sequencing problems vary between equipment and individual setups, we recommend that all NGS users evaluate error rates and error types to improve the quality and analysis of NGS data.
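The two figures reported above (error rate per base and percentage of mutated sequences) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `error_stats`, the substitution-only comparison, and the rule of discarding every read whose length differs from the reference (a simple stand-in for "removal of shortened sequences") are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def error_stats(reference: str, reads: Iterable[str]) -> Tuple[float, float, int]:
    """Return (per-base error rate, fraction of mutated reads, reads kept).

    Hypothetical helper: counts substitutions only, by comparing each
    full-length read to the known reference position by position.
    """
    kept = 0              # reads that passed the length filter
    mismatched_bases = 0  # total substituted positions across kept reads
    mutated_reads = 0     # kept reads with at least one substitution

    for read in reads:
        # Assumption: shortened (or lengthened) reads are dropped entirely.
        if len(read) != len(reference):
            continue
        kept += 1
        mismatches = sum(1 for a, b in zip(read, reference) if a != b)
        mismatched_bases += mismatches
        if mismatches:
            mutated_reads += 1

    if kept == 0:
        raise ValueError("no full-length reads to analyse")

    per_base_rate = mismatched_bases / (kept * len(reference))
    mutated_fraction = mutated_reads / kept
    return per_base_rate, mutated_fraction, kept
```

Called as `error_stats(reference, reads)` with the known template sequence and an iterable of read strings, the sketch returns both rates as fractions; multiply by 100 to compare with percentages such as those quoted in the abstract. Insertions and deletions would need an alignment step, which this sketch deliberately leaves out.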

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • High-Throughput Nucleotide Sequencing / methods*
  • Mutation
  • Polymerase Chain Reaction / methods*
  • Quality Control
  • Sequence Analysis, DNA
  • Software