Elsevier

Geoderma

Volume 157, Issues 1–2, 15 June 2010, Pages 51-63

Observer-dependent variability of the thresholding step in the quantitative analysis of soil images and X-ray microtomography data

https://doi.org/10.1016/j.geoderma.2010.03.015

Abstract

For the investigation of many geometrical features of soils, computer-assisted image analysis has become a method of choice over the last few decades. This analysis involves numerous steps, regarding which subjective decisions have to be made by the individuals conducting the research. This is particularly the case with the thresholding step, required to transform the original (color or greyscale) images into the type of binary representation (e.g., pores in white, solids in black) needed for fractal analysis or simulation with Lattice–Boltzmann models. Limited information exists at present on whether different observers, analyzing the same soil, would be likely to obtain similar results. In this general context, the first objective of the research reported in this article was to determine, through a so-called “round-robin” test, how much variation exists among the outcomes of various image thresholding strategies (including any image pre-treatment deemed appropriate), routinely adopted by soil scientists. Three test images – of a field soil, a soil thin section, and a virtual section through a 3-dimensional CT data set – were thresholded by 13 experts, worldwide. At the same time, variability of the outcomes of a set of automatic thresholding algorithms, applied to portions of the test images, was also investigated. The experimental results obtained illustrate the fact that experts rely on very different approaches to threshold images of soils, and that there is considerable observer influence associated with this thresholding. This observer dependence is not likely to be alleviated by adoption of one of the many existing automatic thresholding algorithms, many of which produce thresholded images that are as variable as, or even more variable than, those of the experts.
These observations suggest that, at this point, analysis of the same image of a soil, be it a simple photograph or 3-dimensional X-ray CT data, by different individuals can lead to very different results, without any assurance that any of them would be even approximately “correct” or best suited to the objective at hand. Different strategies are proposed to cope with this situation, including the use of physical “standards”, adoption of procedures to assess the accuracy of thresholding, benchmarking with physical measurements, or the development of computational methods that do not require binary images.

Introduction

In the early 1970s, the commercialization of image analysis instruments like the Quantimet 720 (Fisher, 1971; Nawrath & Serra, 1979a, 1979b), with dedicated minicomputers and integrated scanners, contributed significantly to promoting the quantitative use of photographs in the study of soils and natural porous media. For the first time, it became possible to make fast, relatively accurate measurements on scanned images produced by cameras, light and electron microscopes, or X-ray spectrometers. Soil scientists rapidly seized that opportunity (e.g., Jongerius et al., 1972; Murphy et al., 1977a, 1977b). They continued to do so in the 1980s (e.g., Ringrose-Voase and Bullock, 1984), after the development of personal computers, and particularly in the 1990s, as inexpensive scanners, digital cameras, and versatile image manipulation software made it increasingly straightforward to acquire images and to analyze them (often with the help of fractal geometry), for a variety of purposes (e.g., Hallaire & Cointepas, 1993; Bielders et al., 1996; Deleporte et al., 1997; DeLeo et al., 1997; Beaudet-Vidal et al., 1998; Baveye, 2002; Ohrstrom et al., 2002; Dathe & Baveye, 2003; Ohrstrom et al., 2004; Morris & Mooney, 2004; Pendleton et al., 2005; Persson et al., 2005; Wantanaphong et al., 2006; Lipsius & Mooney, 2006; Vogel et al., 2006; Otten & Gilligan, 2006; Jacobson et al., 2007; Marcelino et al., 2007; Mooney & Morris, 2008; Papadopoulos et al., 2008; Persson & Olsson, 2008; Tarquis et al., 2008).
In recent years, synchrotron-based X-ray computed tomography (CT) and table-top X-ray micro-CT scanners have allowed researchers to visualize in three dimensions the structure and composition of soils at micrometric resolutions, and have enabled significant advances to be made in our understanding of the functioning of soils at previously unexplored spatial scales (e.g., Garnier et al., 1998; Baveye et al., 2002; Elliot & Heck, 2007a, 2007b; Sleutel et al., 2008).

In the midst of this technological evolution, Thompson et al. (1992) advised caution in the interpretation of data generated by analysis of images of soils, arguing that this analysis involves many successive steps, all of which can be affected by artefacts or subjectivity. Before measurements can be performed, unless one can simply take pictures in the field, soils need to be sampled, dried in some way, solidified by impregnation with one of a number of available resins, and cured for a period of time. At the stage when pictures of the soil are taken (in the field or in the laboratory), lighting arrangements, cameras, lenses, resolutions, apertures, and exposure settings have to be selected from among many possible choices. If computed tomography (CT) scanners are used to generate three-dimensional data, the resolution of the scanning, and a number of settings related to attenuation and contrast, can differ, depending on who does the scanning. Finally, the resulting 2-D or 3-D images of the soil have to be thresholded or segmented to produce a binary image, to which a wide range of statistical or mathematical methods are then applied. All these steps involve operational decisions that can vary from one observer to another.

To alleviate some of the resulting variability, Thompson et al. (1992) recommended that efforts should be made to standardize practices, in particular in terms of the thresholding step, to “assist operators in identifying discrepancies before measurement.” To that end, these authors suggested that standard images of soil pore space could be used for calibration of thresholding protocols. At present, it is unclear to what extent this advice has been heeded and what type of standardization, if any, has occurred. Certainly, the concept of a standard image to which Thompson et al. (1992) were referring has not made much headway since 1992, as evinced by the renewed call for standardization by Marcelino et al. (2007), which is almost word-for-word identical to that of Thompson et al. (1992). None of the various soil science laboratories operating CT scanners in the world appears to have an agreed-upon standard that is run systematically to validate subsequent analyses of the data. Nevertheless, it is possible that, since many of the groups carrying out image analysis on photographs of soils or CT data end up using the same software, some standardization has occurred anyway.

In similar situations, when questions arise about the variability associated with the way different institutions or individuals approach a particular process, it is customary to carry out a so-called “round robin” or “ring” test. A number of knowledgeable participants are provided independently with the same materials/data and are asked to apply the given process to them as they would routinely. Such round-robin tests have been carried out in soil science in the past, with interesting results, in particular to assess inter- and intra-laboratory variability in the chemical analysis of soils (e.g., Sager, 1999; Cools et al., 2004; Creamer et al., 2009). Tests involving individual researchers have also been carried out. Murphy et al. (1985), for example, asked seven experienced micromorphologists to characterize independently, using the same reference handbook, a number of thin sections obtained from soil horizons representing soil materials formed by a variety of pedogenic processes. In spite of differences among the seven descriptions, the overall degree of uniformity among them was considered encouraging. More recently, Brown et al. (1996) gave five modellers a full description of a field experiment carried out to determine the leaching potential of a novel pesticide, and asked them to use the same three mathematical models to predict concentrations of pesticide in soil water at a depth of 1 m, and in the soil itself over a 1 m profile, 220 days after application. The simulation outcome revealed appreciable differences in the way modellers approached their analysis, and in the quantitative results they obtained. Further round-robin tests of pesticide fate or exposure modelling by Francaviglia et al. (2000) and Beulke et al. (2006) led to essentially similar observations.

A full-fledged round-robin test of the quantitative analysis of soil images would require placing different observers separately in the same pits in the field, so they could photograph soil profiles as each of them sees fit, or circulating the same soil columns among different laboratories around the world, where CT scanners are routinely used, so that each observer can adjust the settings of their instrument as deemed appropriate. Aside from the logistical (and therefore financial) hurdles that such a test would need to overcome, a major impediment is that some of the more intrusive steps involved in the full image analysis process (such as resin impregnation, cutting of soil blocks, or acid etching of thin sections to enhance contrast) cannot be duplicated by different observers on exactly the same samples, giving rise to uncertainty in treatment comparison. Therefore, until some of these challenges can be resolved, it is reasonable to envisage a round-robin test only with respect to specific steps of the whole image analysis process. There have been some attempts in that area, especially in relation to the thresholding/segmentation of images. Baveye et al. (1998), Ogawa et al. (1999), and Boast & Baveye (2006) applied two different thresholding algorithms to images of a soil profile, as a preliminary step in the fractal characterization of a preferential pathway. Marcelino et al. (2007) carried out a mini round-robin test with two individuals, one of whom replicated his analysis at two different times, to assess the influence of the method of microscopic visualization and thresholding (manual, semi-automatic, or automatic algorithms) on a number of soil parameters. Tarquis et al. (2008) applied four different threshold criteria to transform computed tomography grey-scale imagery of four Brazilian soils into binary imagery to estimate their mass fractal and entropy dimensions.
They found that the threshold criteria used had a direct influence on the porosity obtained, varying from 8 to 24% in one of the samples, and on the fractal dimensions. Nevertheless, since they involve either few individual observers or a limited number of different image manipulation techniques, these various comparisons remain very limited in scope, and a larger-scale investigation appears necessary.
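The sensitivity of porosity to the threshold value can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it is taken from the studies cited above: the image is synthetic, pores are taken to be the darker phase (grey value below the threshold), and the function names `make_test_image` and `porosity` are hypothetical.

```python
import random


def make_test_image(size=64, seed=0):
    """Build a hypothetical synthetic 8-bit greyscale image.

    Roughly 15% of pixels are drawn around a dark 'pore' grey level and
    the rest around a brighter 'solid' level. The overlap between the
    two populations mimics the blurred pore/solid boundaries of real
    soil images.
    """
    rng = random.Random(seed)
    image = []
    for _ in range(size):
        row = []
        for _ in range(size):
            mean = 80 if rng.random() < 0.15 else 170
            value = int(rng.gauss(mean, 30))
            row.append(min(255, max(0, value)))  # clamp to 0..255
        image.append(row)
    return image


def porosity(image, threshold):
    """Fraction of pixels classified as pore (grey value < threshold)."""
    pixels = [v for row in image for v in row]
    return sum(1 for v in pixels if v < threshold) / len(pixels)


image = make_test_image()
# Three plausible manual choices, all lying between the two grey-level modes.
for t in (110, 125, 140):
    print(f"threshold {t}: porosity = {porosity(image, t):.1%}")
```

Because the grey-level populations of pores and solids overlap, each of the three thresholds is defensible, yet the computed porosity rises steadily with the threshold. This is the kind of dependence on the chosen criterion that the studies above report.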

In this general context, this article is meant to pursue and improve on the preliminary efforts of Baveye et al. (1998), Ogawa et al. (1999), Boast & Baveye (2006), Marcelino et al. (2007), and Tarquis et al. (2008). The key objective is to assess, through a round-robin test, how much variation exists among the outcomes of image thresholding strategies (including any image pre-treatment deemed appropriate), routinely adopted by soil scientists. To that end, three test images – of a field soil, a soil thin section, and a virtual section through a 3-dimensional CT data set – were thresholded by 13 experts, worldwide. The majority of these experts were soil scientists who had published articles dealing with image analysis. Other experts had a recognized track record of image analyses in other fields (e.g., material science, remote sensing, or biofilm research). The outcome of their manipulation of the images was compiled and analyzed. At the same time, because application of automatic thresholding algorithms may offer at least a partial solution to the observer-dependence identified in the round-robin test, variability among the outcomes of a set of algorithms was also investigated. Subsets of the three test images were subjected to forty different thresholding algorithms, described in Sezgin and Sankur (2004), and the variability of their results was compared to that of the experts consulted.
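To make concrete what comparing the outcomes of automatic algorithms involves, the sketch below implements two widely used histogram-based methods, Otsu's between-class variance criterion and an iterative intermeans (isodata-type) scheme, and applies both to the same synthetic data. It is an illustration under stated assumptions, not the procedure or code of this study: the pixel data are synthetic, pores are assumed to be the darker phase (grey value at or below the threshold), and the function names are hypothetical.

```python
import random


def histogram(pixels, levels=256):
    """Count the occurrences of each grey level."""
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    return counts


def otsu_threshold(counts):
    """Grey level t maximising the between-class variance, where the
    classes are 'values <= t' and 'values > t'."""
    total = sum(counts)
    total_sum = sum(i * c for i, c in enumerate(counts))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of grey values in the lower class
    for t, c in enumerate(counts):
        w0 += c
        sum0 += t * c
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: variance undefined
        m0 = sum0 / w0
        m1 = (total_sum - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def isodata_threshold(counts, tol=0.5, max_iter=100):
    """Iterative intermeans: move t to the midpoint of the two class
    means until it changes by less than tol (or max_iter is reached)."""
    total = sum(counts)
    t = sum(i * c for i, c in enumerate(counts)) / total  # global mean
    for _ in range(max_iter):
        lo_n = sum(c for i, c in enumerate(counts) if i <= t)
        hi_n = total - lo_n
        if lo_n == 0 or hi_n == 0:
            break
        lo_mean = sum(i * c for i, c in enumerate(counts) if i <= t) / lo_n
        hi_mean = sum(i * c for i, c in enumerate(counts) if i > t) / hi_n
        new_t = (lo_mean + hi_mean) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


def pore_fraction(pixels, t):
    """Fraction of pixels at or below the threshold (assumed pores)."""
    return sum(1 for v in pixels if v <= t) / len(pixels)


# Hypothetical bimodal grey values: 30% dark (pores), 70% bright (solids).
rng = random.Random(1)
pixels = [min(255, max(0, int(rng.gauss(80, 15)))) for _ in range(300)]
pixels += [min(255, max(0, int(rng.gauss(170, 15)))) for _ in range(700)]

counts = histogram(pixels)
for name, t in (("Otsu", otsu_threshold(counts)),
                ("isodata", isodata_threshold(counts))):
    print(f"{name}: threshold = {t:.1f}, "
          f"porosity = {pore_fraction(pixels, t):.1%}")
```

On a cleanly bimodal histogram such as this one, the two methods can be expected to land close together. Real soil images rarely offer such separation, and that is where the outcomes of different algorithms tend to diverge.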

Section snippets

Test images — round robin

Three test images – of a field soil, a soil thin section, and a virtual section through a 3-dimensional CT data set – were selected for this research. The key criterion in selecting these images was the fact that they either had been used for a piece of research leading to a publication in the soils literature, or they were about to be used in such a context. Therefore, these images were deemed representative of images or CT data analyzed by soils researchers. That does not mean that these

Round-robin test

Faced with the same test images, the 13 experts who participated in the round-robin test showed great variability in the software they used. It appears clear that, up to now, the image analysis software business has not caused any standardization of the methods used in soil science research, as hypothesized in the introduction. Even the availability of freeware, like ImageJ, does not seem to prevent some experts from diversifying their software choices, including commercial packages. Similarly, the

Conclusion and perspectives

The results presented in the previous section illustrate the fact that experts rely on very different approaches to threshold images of soils and that there is observer bias associated with this thresholding. This observer dependence is not likely to be alleviated by adoption of one of the existing automatic thresholding algorithms, some of which produce thresholded images that are clearly pathological (e.g., zero or 100% porosity) and can be discarded, while the remaining ones yield outcomes

Acknowledgements

Thanks are due to Drs Andre Egbert and Hanna Jacobs from GE Sensing and Inspection Technologies Phoenix X-ray for their hospitality during D.V. Grinev's visit and for their assistance in obtaining the Nanotom image.

References (59)

  • M. Laba et al.

    Use of textural measurements to map invasive wetland plants in the Hudson River National Estuarine Research Reserve with IKONOS satellite imagery

    Remote Sensing of Environment

    (2010)
  • K. Lipsius et al.

    Using image analysis of tracer staining to examine the infiltration patterns in a water repellent contaminated sandy soil

    Geoderma

    (2006)
  • S.J. Mooney et al.

    Morphological approach to understanding preferential flow using image analysis with dye tracers and X-ray computed tomography

    Catena

    (2008)
  • C. Morris et al.

    A high-resolution system for the quantification of preferential flow in undisturbed soil using observations of tracers

    Geoderma

    (2004)
  • C.P. Murphy et al.

    Description of soil thin sections: an international comparison

    Geoderma

    (1985)
  • S. Ogawa et al.

    Surface fractal characteristics of preferential flow patterns in field soils: evaluation and effect of image processing

    Geoderma

    (1999)
  • P. Ohrstrom et al.

    Field-scale variation of preferential flow as indicated from dye coverage

    Journal of Hydrology

    (2002)
  • P. Ohrstrom et al.

    Characterizing unsaturated solute transport by simultaneous use of dye and bromide

    Journal of Hydrology

    (2004)
  • S. Sleutel et al.

    Comparison of different nano- and micro-focus X-ray computed tomography set-ups for the visualization of the soil microstructure and soil organic matter

    Computers & Geosciences

    (2008)
  • M.L. Thompson et al.

    Cautionary notes for the automated analysis of soil pore-space images

    Geoderma

    (1992)
  • S.D.C. Walsh et al.

    A new partial-bounceback Lattice–Boltzmann method for fluid flow through heterogeneous media

    Computers and Geosciences

    (2009)
  • J. Wantanaphong et al.

    Quantification of pore clogging characteristics in potential permeable reactive barrier (PRB) substrates using image analysis

    Journal of Contaminant Hydrology

    (2006)
  • P. Baveye et al.

    Influence of image resolution and thresholding on the apparent mass fractal characteristics of preferential flow patterns in field soils

    Water Resources Research

    (1998)
  • P. Baveye et al.

    Effect of sampling volume on the measurement of soil physical properties: simulation with X-ray tomography data

    Measurement Science & Technology

    (2002)
  • S. Beulke et al.

    User subjectivity in Monte Carlo modeling of pesticide exposure

    Environmental Toxicology and Chemistry

    (2006)
  • C.L. Bielders et al.

    Tillage-induced spatial distribution of surface crusts on a sandy paleustult from Togo

    Soil Science Society of America Journal

    (1996)
  • C. Boast et al.

    Alleviation of an indeterminacy problem affecting two classical iterative image thresholding algorithms

    International Journal of Pattern Recognition and Artificial Intelligence

    (2006)
  • C.D. Brown et al.

    Ring test with the models LEACHP, PRZM-2 and VARLEACH: variability between model users in prediction of pesticide leaching using a standard data set

    Pesticide Science

    (1996)
  • Y. Chen et al.

    A study of the upper limit of solid scatters density for grey lattice Boltzmann method

    Acta Mechanica Sinica

    (2008)
    1

    The order of authors 4–16 is alphabetical.
