Introduction

Next-generation sequencing (NGS) has revolutionized the field of biology over the last decade. The Genomes OnLine Database (GOLD), which monitors sequencing projects worldwide, has grown from just 1575 sequencing projects in 2005 to over 70,000 in 2015 (Reddy et al. 2015). This growth is driven partly by a rapid drop in the price of high-throughput sequencing (Hayden 2014), but also by a growing number of free, user-friendly bioinformatics tools such as MG-RAST (Meyer et al. 2008) and MEGAN (Huson et al. 2016), and by user forums such as seqanswers.com and biostars.org.

This “brave new world” was introduced into soil sciences more than 10 years ago (Daniel 2005) and is becoming increasingly popular, as it is the only known approach that allows a direct assessment of microbial community composition and function at various trophic levels. Today, according to the Web of Science, more than 900 papers have been published on soil metagenomes. Initially, sequencing depth was below 1 Gbase and often allowed the identification of only major functional traits and housekeeping genes; recent publications report up to 100 Gbases per sample (Hultman et al. 2015), which even permits partial reconstruction of individual microbial genomes from the obtained reads. However, the interpretation of soil metagenomic data remains a challenge, given the often complex composition of the microbiomes as well as their strong dynamics in time and space (Ebrahimi and Or 2016).

Previous papers have focused on specific aspects of metagenomic data generation or analysis, such as the impact of the DNA extraction methods and read annotation stringency on the apparent composition of a metagenome (Delmont et al. 2013), the importance of coverage estimation (Rodriguez-R and Konstantinidis 2014a) or the change from the current use of gene-centric snapshots towards genome-centric temporal studies (Prosser 2015). Major steps and recommendations for further reading are summarized in Table 1. In this paper, we discuss some basic guidelines for the experimental design of metagenomic surveys to characterize community composition and function of soil microbiomes, without losing the environmental context.

Table 1 Checklist for analysis of metagenomic datasets

Sampling strategy

Soils are vertically and horizontally structured ecosystems composed of a multitude of different microhabitats with diverse physical, chemical, and biological properties (Totsche et al. 2010). The degree of heterogeneity strongly depends on (i) the sampled compartment, e.g., the rhizosphere is less heterogeneous than bulk soil (Hinsinger et al. 2009), (ii) the soil texture, which strongly influences aggregate formation and also nucleic acid extraction efficiency, (iii) the aboveground diversity and plant coverage, (iv) the season, and (v) specific site characteristics like slope, shadowing, and groundwater table (Petersen and Esbensen 2005). Given this heterogeneity, the 500 mg to 10 g of soil typically used for DNA extraction often does not reflect a single microsite but a mixture of compartments with differing chemical, physical, and biological properties. This makes data interpretation quite challenging: it allows only a correlative analysis of microbial data with abiotic soil properties and does not increase our mechanistic understanding of how soil ecosystems work.

Although the effect of spatial heterogeneity can be reduced by increasing the amount of soil used for DNA extraction (Penton et al. 2016), at the cost of the above-mentioned problems of data interpretation, a single soil sample from a given site is far from representative; as in all other biological experiments, true replicates need to be analyzed (Prosser 2010). Ideally, the underlying sampling design should include a geostatistical setting to better characterize the sampling site. In any case, a minimum of three replicates per treatment is needed for proper statistical testing. To identify the optimal number of replicates, a statistical power analysis can be performed, which reduces the chance of type II errors (Klironomos et al. 1999). However, the result of such a power analysis may exceed the financial and computational budget of a sequencing project. Thus, to limit the influence of soil heterogeneity, a representative sampling strategy is often used, which includes pooling of sub-samples (Fig. 1). For agricultural soils in particular, vertical stratification is of interest in addition to horizontal distribution patterns; the sampling strategy should therefore be chosen based on the soil stratification to avoid mixing different soil horizons. Temporal as well as spatial variability must also be taken into account, as soil microbiomes change strongly in response to land management (e.g., fertilization or tillage), plant development stage, climate, and season (Ollivier et al. 2011). A single sampling time point therefore does not capture the complexity of a microbiome at a given site but is merely a snapshot, strongly influenced by recent local conditions. Given these strong dynamics of soil microbiomes in time and space (Kuzyakov and Blagodatskaya 2015), the sampling strategy should be driven by a clear research hypothesis.
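As a rough illustration of such a power analysis, the number of replicates per treatment needed for a two-sample comparison can be approximated from the expected effect size via the standard normal approximation (a minimal sketch; the effect sizes and the significance/power levels below are hypothetical placeholders, not recommendations):

```python
from math import ceil
from statistics import NormalDist

def replicates_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample comparison
    (normal approximation; effect_size = Cohen's d)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Even a large expected effect (d = 1.0) calls for ~16 replicates
# per treatment -- far more than the minimum of three.
print(replicates_per_group(1.0))  # 16
print(replicates_per_group(0.5))  # 63
```

Exact t-distribution-based calculations (e.g., `TTestIndPower` in the statsmodels package) give slightly larger values for small n, but the message is the same: modest effect sizes quickly demand more replicates than most sequencing budgets allow.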

Fig. 1
figure 1

Soil sampling. “Randomized block” and “spatial constraint-induced pseudo-replication” are two common soil sampling designs that trade off practicality against reproducibility. Field studies are often divided into blocks for pragmatic reasons (plowing, irrigation, etc.), while other variables, here illustrated by the colors green and red, are more easily distributed randomly to increase reproducibility. A, B, and C indicate replicates. Spatial constraints are often imposed because many biological systems are not uniformly distributed, such as pine trees in a wild forest. Given microenvironments etc., soil samples are optimally composed of several subsamples that avoid anomalous locations such as, in the examples shown, wheel tracks

As abiotic soil parameters are a major driver of soil microbiomes, a minimum metadata set is required beyond the factors of interest, and it needs to be analyzed and reported independently of the research questions. Besides exact GPS coordinates and climatic conditions during the sampling period, such metadata should include the soil type, soil texture, soil pH, stable pools of soil organic matter like total organic C and N, and labile pools of C, N, P, and S. If agricultural sites are studied, management-related properties like fertilization regimes, tillage, cropping sequence, plant protection measures, and plant biomass should be reported. For unmanaged sites, at least the aboveground diversity should be characterized.
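Such a minimum metadata set can be enforced with a simple structured record and completeness check before samples leave the field; a sketch (the field names and the check are our own illustration, not a community standard):

```python
# Hypothetical minimal metadata record for one soil sample;
# field names are illustrative, not an official standard.
REQUIRED_FIELDS = [
    "gps_coordinates", "sampling_date", "climate",
    "soil_type", "soil_texture", "soil_ph",
    "total_organic_c", "total_n",
    "labile_c", "labile_n", "labile_p", "labile_s",
]

def missing_metadata(sample: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not sample.get(f)]

sample = {"gps_coordinates": "48.08N 11.28E", "soil_ph": 6.2}
print(missing_metadata(sample))  # lists the fields still to record
```

For managed sites, the list would be extended with fertilization regime, tillage, cropping sequence, plant protection measures, and plant biomass, as discussed above.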

Sample processing and downstream analysis

Soils should be stored after sampling at a suitable temperature: below 4 °C for short-term storage in the field and at −20 °C for long-term storage (Lauber et al. 2010; Tatangelo et al. 2014). Compared to amplicon-based sequencing, direct DNA sequencing (metagenomics) requires higher amounts of high-quality DNA, the exact amount depending on the kit used for library preparation (500 pg–1 μg). Thus, the DNA extraction protocol often needs to be adapted to fulfill these requirements. Multiple displacement amplification should be avoided, given the significant bias it introduces (Yilmaz et al. 2010). Since DNA extraction protocols vary, depending on the nature of the samples, in their extraction efficiency and in their ability to remove various inhibitors, we recommend testing the workflow on a few non-essential samples first (Frostegård et al. 1999). Once a DNA extraction method has been selected, it should be used consistently throughout the whole project, given the inherent bias it introduces. Finally, depending on the aim of the study, one might also consider methods that separate extracellular from intracellular DNA (Pietramellara et al. 2009), which allows discrimination between living and dead microbes. As recommended by the Earth Microbiome Project (Gilbert et al. 2014), and given the impact of downstream procedures like DNA extraction or library preparation on the detected microbial communities (Albertsen et al. 2015), it is essential to include negative controls, e.g., blank DNA extractions (Salter et al. 2014), especially if low amounts of DNA (<5 ng) are used for sequencing.

Rapid advances in sequencing technologies, each of which has its own specific challenges, make it impossible to provide universal guidelines. With 454 pyrosequencing outdated and long-read technologies such as Oxford Nanopore Technologies and PacBio® not yet frequently used for metagenomics, we focus here on Illumina-based technologies, which are currently the de facto standard in metagenomics (Sanchez-Flores et al. 2015).

The read quality required depends strongly on the questions asked, but quality filtering of the sequences is always essential and should be adjusted specifically to the dataset at hand to optimize the trade-off between read loss and the final quality of the dataset (Del Fabbro et al. 2013). Key quality controls should include the following steps: removal of sequencing adapters, quality and length filtering, and removal of possible contaminants such as PhiX and/or host DNA. A good combination is AdapterRemoval for adapter removal, quality/length trimming, and merging of paired reads (Schubert et al. 2016), followed by DeconSeq for the removal of contaminants (Schmieder and Edwards 2011). Proper contaminant removal is especially critical with Illumina sequencing, as apparent from the large-scale contamination of microbial isolate genomes with Illumina PhiX control DNA (Mukherjee et al. 2015).
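The core of the quality/length filtering step can be illustrated with a minimal, dependency-free sketch (a toy stand-in for dedicated tools such as those named above; the Q20 and 50-bp thresholds are arbitrary examples):

```python
def trim_read(seq, qual, min_q=20, min_len=50, offset=33):
    """Trim the 3' end of a read at the first base below min_q
    (Phred+33 encoding); return None if the read becomes too short."""
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            seq, qual = seq[:i], qual[:i]
            break
    if len(seq) < min_len:
        return None
    return seq, qual

# A 60-base read whose quality drops below Q20 at position 55
seq = "A" * 60
qual = "I" * 55 + "#" * 5  # 'I' = Q40, '#' = Q2
trimmed = trim_read(seq, qual)
print(len(trimmed[0]))  # 55
```

Real trimmers additionally use sliding-window or error-sum strategies rather than cutting at the first low-quality base, which is why the trade-off between read loss and final quality must be tuned per dataset.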

The sequencing depth needed for a sound bioinformatic analysis strongly depends on the aims of the project. If binning is planned to assemble larger contigs from the obtained reads, sequencing depths of up to 100 Gbases per sample are needed (Hultman et al. 2015); for a pure comparison of single reads, for example to reconstruct major nutrient cycles in a given soil, much lower sequencing depths (5–10 Gbases) are required (Bergkemper et al. 2016). While highly recommended, estimating the achieved sequencing depth or coverage of a metagenome is challenging compared to, e.g., 16S rRNA-based amplicon sequencing. With 16S rRNA-based amplicon sequencing, we can assume that public databases allow us to identify the vast majority of reads. In contrast, comparing metagenomic datasets to public databases such as the NCBI non-redundant protein database or functional assignment databases such as KEGG (Kanehisa et al. 2016), SEED (Overbeek et al. 2005), or COG (Tatusov et al. 2000) identifies only a fraction of the reads and is biased towards model and/or medically relevant organisms. Therefore, while rarefaction analysis of 16S rRNA amplicons is a sensible way to assess species richness and sample coverage, rarefaction of metagenomic datasets would overestimate coverage, and the degree of overestimation is not even consistent across samples. Thus, for more accurate coverage estimation of metagenomic data, database-independent approaches are needed. Nonpareil (Rodriguez-R and Konstantinidis 2014b), which examines the degree of overlap among individual sequences to assess whether sufficient coverage has been achieved, is a good alternative that overcomes the above-mentioned problems (Rodriguez-R and Konstantinidis 2014a).
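The redundancy idea behind such database-independent coverage estimates can be illustrated with a toy metric: the fraction of reads sharing at least one k-mer with a previously seen read (a deliberately simplified sketch of the principle, not the actual Nonpareil algorithm, which uses read-overlap alignment and model fitting):

```python
def kmers(seq, k=15):
    """All k-mers of a read as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def redundancy(reads, k=15):
    """Fraction of reads sharing >= 1 k-mer with an earlier read.
    Values near 1 suggest the community is well covered;
    values near 0 suggest deeper sequencing is needed."""
    seen, redundant = set(), 0
    for read in reads:
        km = kmers(read, k)
        if km & seen:
            redundant += 1
        seen |= km
    return redundant / len(reads)

# Duplicated reads -> high redundancy; unique reads -> low redundancy
reads = ["ACGTACGTACGTACGTACGT"] * 9 + ["TTTTTTTTTTTTTTTTTTTT"]
print(redundancy(reads))  # 0.8
```

In a real dataset, this fraction would be tracked as a function of subsampled sequencing effort, and the resulting curve extrapolated to estimate how much additional sequencing a sample still needs.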

Assembling contigs from reads can significantly increase the quality of annotation, especially when working with the shorter reads provided by the HiSeq platforms. Assembly programs such as IDBA-UD (Peng et al. 2012) and MegaHit (Li et al. 2016) provide well-established pipelines that are also well accepted in the literature. While general functional annotation databases such as those mentioned above are useful for descriptive studies and for obtaining a broad overview of the data, they are often based on eukaryotic or model organisms, leading to suboptimal functional assignments (Darzi et al. 2016). Thus, more targeted approaches can be very useful, such as the FOAM database (Prestat et al. 2014), which was developed specifically to screen environmental metagenomic data and is an improvement for any soil-related study. For studies of particular genes of interest, even more focused approaches and specialized databases are needed, depending on the research question. Depending on the availability of such specialized databases, one should either use or create custom databases to compare the metagenomic sequences against, and/or employ hidden Markov models to detect conserved domains in the metagenomic sequences. Combining an initial metagenomic screen with subsequent amplicon sequencing can in some cases further increase sensitivity, albeit often at the cost of restricting the diversity covered (Bergkemper et al. 2016). For assembly-free taxonomic classification, several solutions are recommendable, such as Kraken and Kaiju (Wood and Salzberg 2014; Menzel et al. 2016). In any case, the bioinformatics pipeline used must be well described, as no “gold standard” for data analysis is available so far. The first data provided by the CAMI initiative (Critical Assessment of Metagenome Interpretation) have demonstrated significant differences in the outcome of read analysis depending on the software used.
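The exact-k-mer matching strategy underlying classifiers like Kraken can be sketched with a toy index (the reference sequences below are hypothetical; real tools use genome-scale indexes and resolve ambiguous k-mers to the lowest common ancestor of a taxonomy rather than to a single taxon):

```python
from collections import Counter

def build_index(references, k=8):
    """Map each reference k-mer to its taxon. Toy simplification:
    on collision the last taxon wins; Kraken instead assigns the
    lowest common ancestor of all taxa sharing the k-mer."""
    index = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]] = taxon
    return index

def classify(read, index, k=8):
    """Assign the read to the taxon with the most k-mer hits."""
    hits = Counter(index[read[i:i + k]]
                   for i in range(len(read) - k + 1)
                   if read[i:i + k] in index)
    return hits.most_common(1)[0][0] if hits else "unclassified"

# Hypothetical reference fragments
refs = {"Taxon_A": "ACGTACGTACGTACGT", "Taxon_B": "TTGGCCAATTGGCCAA"}
idx = build_index(refs)
print(classify("ACGTACGTACGT", idx))   # Taxon_A
print(classify("GGGGGGGGGGGG", idx))   # unclassified
```

The large share of reads falling into the "unclassified" bucket for real soil samples is precisely the database-dependence problem discussed above.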
Sequences should therefore be deposited in public databases in their raw form, as even trimming introduces biases that depend on the method used.

Outlook

Despite the ever-growing sequence databases, most metagenomic reads cannot be assigned to a function, limiting both our ability to test hypotheses and the value of metagenomic datasets as a tool for novel discoveries. Besides developing targeted approaches for the isolation of microorganisms from soil, which allows a classical taxonomic assignment of genotypic and phenotypic traits, novel approaches integrating metagenomic datasets with other types of data, such as metabolomics and abiotic factors, are starting to yield much greater insight into the workings of the microbiome (Feng et al. 2016).

As DNA analysis reveals only the potential for the expression of certain genes, there has been great interest in applying a pipeline comparable to that described above to the analysis of soil metatranscriptomes (Baldrian et al. 2012). In principle, the same approach can also be used for the analysis of RNA extracted from soil after reverse transcription. Due to the high stability of rRNA compared to mRNA, depletion techniques are needed to reduce the amount of rRNA. Furthermore, the issue of spatial and temporal heterogeneity is more pronounced when analyzing RNA: as the stability of mRNA in cells is often in the order of minutes to hours, one sampling may reflect only a snapshot of the actual environmental conditions.

Moreover, the development of long-read sequencing technologies opens a new field of application, with the potential to provide additional information about operon structures from samples with low diversity or samples in which a specific target was enriched beforehand. Such approaches will help us to improve our understanding of how gene expression is regulated, opening a new field of soil microbial ecology addressing issues of “metaregulation.” Such studies could improve our understanding of, for example, the molecular mechanisms behind major ecosystem services provided by soils, like plant growth promotion or carbon sequestration.