Efficient parallelization of geostatistical inversion using the quasi-linear approach
Highlights
► Parallel geostatistical inversion. ► Two level parallelization. ► Efficient C++ programming.
Introduction
Hydraulic conductivity is a key parameter in subsurface hydrology. In inverse modeling its spatial distribution is inferred from indirect measurements, such as hydraulic heads or concentrations. In principle, hydraulic conductivity can vary on all spatial scales, implying a high resolution of the estimated conductivity field. In three-dimensional applications, this may result in millions of hydraulic-conductivity values to be estimated and correspondingly large groundwater flow problems to be solved. The associated high demands with respect to computational power and memory make the application of high-performance computing techniques necessary. To be efficient, a parallelization strategy has to be adapted for each computational problem individually. In this paper, we explore the implementation of efficient parallel methods for geostatistical inversion.
All quantities that depend directly or indirectly on hydraulic conductivity may be used in the inversion procedure. These include, among others, direct estimates of hydraulic conductivity (e.g., from grain-size analysis, flowmeter tests, or small-scale hydraulic tests), hydraulic heads, and solute concentrations from tracer tests. For the sake of readability we limit ourselves to the inversion of hydraulic heads in this paper, but the same general inversion strategy can be applied to other types of data as well (Cirpka and Kitanidis, 2000, Nowak and Cirpka, 2006, Li et al., 2005, Li et al., 2007, Li et al., 2008, Schwede and Cirpka, 2009, Pollock and Cirpka, 2010).
The general objective of our approach is to obtain the hydraulic-conductivity field as a continuous spatial field. Upon discretization, this leads to as many discrete parameter values as elements of the grid, which is typically a number much higher than the number of the measurements available. The inversion problem is therefore mathematically ill posed (Hadamard, 1902). For regularization, one may try to define a small set of zones with constant conductivity values (Carrera and Neuman, 1986), or introduce smoothness constraints (Tikhonov, 1963). While classical Tikhonov regularization is achieved by adding a norm of the spatial parameter gradients to the objective function, geostatistical regularization is obtained by assuming that the parameter field is a random, auto-correlated space variable with certain spatial correlation structure (Matheron, 1971). In this context, inversion becomes (geo)statistical conditioning.
Various versions of geostatistical inversion exist, differing, among others, in the exact representation of the generated fields, the approach of minimizing the objective function (most commonly a variant of the Gauss–Newton or conjugate-gradient methods), and the question whether only a smooth best estimate is sought for or multiple conditional realizations (Sahuquillo et al., 1992, Gómez-Hernández et al., 1997, RamaRao et al., 1995, Benett, 1992, Valstar et al., 2004, Kitanidis, 1995, McLaughlin and Townley, 1996, Yeh et al., 1996, Hernandez et al., 2006).
Only few studies exist using methods of parallel computation in the context of geostatistical inversion. Berg and Illman (2011) implemented the sequential successive linear estimator (SSLE) (e.g., Yeh et al., 1996, Yeh and Liu, 2000) on parallel computers. Doherty, 2010a, Doherty, 2010b introduced a parallel version of the general parameter estimation package PEST, allowing parallel model runs. Specifically using parallel PEST, or the more recent BeoPEST (Schreuder, 2009), with the option of the pilot-point method (e.g. RamaRao et al., 1995), it is possible to estimate hydraulic-conductivity fields as continuous spatial distributions based on geostatistical rules exploiting parallel computer architecture (Hunt et al., 2010, Fienen et al., 2011). However, there are no solid rules of choosing the number and locations of pilot points, leading to non-unique user-dependent results. Cardiff and Barrash (2011) used the quasi-linear geostatistical approach (QLGA) of inversion (Kitanidis, 1995) on a parallel computer implementing simple parallelization of the sensitivity calculations. To the best of our knowledge, the QLGA has not yet been implemented using domain decomposition techniques on parallel computers.
In this paper, we choose the quasi-linear geostatistical approach (QLGA) (Kitanidis, 1995) as inverse kernel, using extensions and modifications to make the scheme more stable and efficient (e.g., Nowak and Cirpka, 2004, Sun and Yeh, 1990). In Section 2.3, we will review the existing approach of quasi-linear geostatistical inversion used as our basis.
In recent years, parallel computing has become increasingly important in the numerical solution of partial differential equations arising in many fields of computational physics, including flow and transport in porous media. In the present study, we will analyze the possibilities for parallelization of geostatistical inversion on mid-range and large computer clusters. The software framework “Distributed Unified Numerics Environment”—DUNE (Bastian et al., 2008a, Bastian et al., 2008b), providing a programming library with tools to solve partial differential equations in parallel, is used to implement a parallelized geostatistical inversion environment in C++.
Section snippets
Governing equations
For the sake of simplification, we restrict our analysis to steady-state groundwater flow without internal sources or sinksin which K is the hydraulic conductivity, h is the hydraulic head, and denotes the domain. We set Dirichlet boundary conditions at the in- and outflow of the domain, and , and no-flow Neumann boundary conditions along all other boundaries of the domain, in which and are given functions along the
Parallelizing the quasi-linear geostatistical inverse approach
Given a sequential implementation of the quasi-linear geostatistical inverse approach, the question arises which parts can be parallelized. The computation of sensitivities for different measurements may easily be parallelized, because the individual sensitivities are independent from each other. Each of these computations requires the solution of an adjoint problem, Eq. (16). On a second level, the solution of both the adjoint and forward problem, Eq. (1), can be parallelized using domain
Implementation
The software framework DUNE (Bastian et al., 2008a, Bastian et al., 2008b) provides an easy access to different grids (regular and non-regular, conforming and non-conforming) and facilitates the implementation of various finite element schemes with different shape functions. Here, we choose the continuous Galerkin method with bilinear finite elements on a regular cuboidal grid.
For the implementation of efficient parallel programs, three principal programming models exist: distributed memory
Conclusion and outlook
In this study, we have realized a parallelized implementation of the quasi-linear geostatistical approach (QLGA). We have demonstrated that three-dimensional groundwater flow in heterogeneous structures can be simulated with adequate spatial resolutions of the model domain on a mid-range cluster. With the two-level parallelization approach, the code can be adjusted to the number of available computer nodes in clusters of different sizes.
We have shown for our test examples that,
- (a)
a two-level
Acknowledgments
This study has been funded by the Baden-Württemberg Stiftung in its high-performance computing program, contract HPC-8, and by the German Federal Ministry for Education and Research (BMBF) within the geotechnologies program under the Grant 03G0742A. We thank Randall J. Hunt and an anonymous reviewer for helpful comments on the manuscript.
References (45)
On the geostatistical approach to the inverse problem
Advances Water Resources
(1996)- et al.
A modified Levenberg–Marquardt algorithm for quasi-linear geostatistical inversing
Advances in Water Resources
(2004) - et al.
Use of steady-state concentration measurements in geostatistical inversion
Advances Water Resources
(2009) - et al.
A generic grid interface for parallel and adaptive scientific computing. Part II: implementation and tests in DUNE
Computing
(2008) - et al.
A generic grid interface for parallel and adaptive scientific computing. Part I: abstract framework
Computing
(2008) Inverse Methods in Physical Oceanography
(1992)- et al.
Three-dimensional transient hydraulic tomography in a highly heterogeneous glaciofluvial aquifer-aquitard system
Water Resources Research
(2011) - Blatt, M., 2010. A parallel algebraic multigrid method for elliptic problems with highly discontinuous coefficients....
Towards algebraic multigrid for elliptic problems of second order
Computing
(1995)- et al.
3-D transient hydraulic tomography in unconfined aquifers with fast drainage response
Water Resources Research
(2011)
Estimation of aquifer parameters under transient and steady-state conditions. 3. Application to synthetic and field data
Water Resources Research
Characterization of mixing and dilution in heterogeneous aquifers by means of local temporal moments
Water Resources Research
Circulant Matrices, Pure and Applied Mathematics
Addendum to the PEST Manual
PEST, Model-Independent Parameter Estimation-User Manual
The design and implementation of fftw3
Proceedings of the IEEE
Stochastic simulation of transmissivity fields conditional to both transmissivity and piezometric data, 1. Theory
Journal of Hydrology
On the inversion of circulant matrices
Biometrika
Sur les problèmes aux dérivées partielles et leur signification physique
Inverse stochastic moment analysis of steady state flow in randomly heterogeneous media
Water Resources Research
Using a cloud to replenish parched groundwater modeling efforts
Ground Water
Cited by (7)
Efficient geostatistical inversion of transient groundwater flow using preconditioned nonlinear conjugate gradients
2017, Advances in Water ResourcesCitation Excerpt :While the new approach turned out to be more expensive for steady-state flow problems with isolated head observations, it leads to a significant reduction of computational effort in the case of transient data. While this work presents the algorithm in its basic form, we omitted aspects that are necessary for real-world application, namely the handling of three-dimensional observations and the inclusion of several types of observations (hydraulic heads, tracer data, data obtained by geophysical monitoring, among others Schwede et al., 2012). Preliminary results in this regard have been promising (Klein, 2016), and we will elaborate on these points in future work.
HT2DINV: A 2D forward and inverse code for steady-state and transient hydraulic tomography problems
2015, Computers and GeosciencesCitation Excerpt :In this approach, the measurements are introduced sequentially and the weights coefficients are derived by finding a Best Linear Unbiased Estimation (BLUE methods). The SSLE has been applied in several studies (e.g., Berg and Illman, 2011b; Huang et al., 2011; Illman et al., 2009; Mao et al. 2013; Straface et al., 2007; Yeh and Liu, 2000; Yeh and Zhu, 2007; Zhu and Yeh, 2005). Another method, the pilot point method, has been applied as a parameterization approach for reducing the number of unknown parameters to estimate by evaluating a few key-locations (called pilot points) in the aquifer where the values of the transmissivity can be assessed to fit the hydraulic head measurements via a deterministic or stochastic optimization process (e.g., Doherty, 2003).
Numerical solution of steady-state groundwater flow and solute transport problems: Discontinuous Galerkin based methods compared to the Streamline Diffusion approach
2015, Computer Methods in Applied Mechanics and EngineeringCitation Excerpt :For the solution of the steady-state convection-dominant transport equation, we have compared a higher order symmetric interior penalty discontinuous Galerkin (DG) method to the streamline diffusion (SDFEM) method. Putting special emphasis on their applicability to geostatistical inversion problems in 3-D, aiming at simulations with high mesh resolutions [57,58], we have analyzed efficiency and accuracy. Two main issues occurring in the solution of the convection-dominated transport equation were tackled:
Three-dimensional geostatistical inversion of synthetic tomographic pumping and heat-tracer tests in a nested-cell setup
2014, Advances in Water ResourcesCitation Excerpt :This leads to nested parallelization, including domain decomposition methods for each PDE to be solved and parallel evaluation of sensitivities. Details on the parallelization strategy have been published elsewhere [59]. The artificial test case analyzed in this study resembles the Lauswiesen field site near Tübingen in Germany [55,32], where field experiments on heat-tracer tomography are currently performed, even though not yet with the full setup described in the following.
A real-time interactive simulation framework for watershed decision making using numerical models and virtual environment
2013, Journal of HydrologyCitation Excerpt :Not only is it difficult to substantially improve the calculation speed of CPU frequency and thereby take full advantage of multi-core computers, neither can we realize computer cluster computing using network technology. In watershed numerical simulation, many parallel computation methods to reduce computing time have been studied (Iannone et al., 2007; Lecca et al., 2011; Singh et al., 2011; Vivoni et al., 2011; Schwede et al., 2012). However, most of these methods require large investments in equipment purchase and installation, as well as designated space for computer installation and related cooling system capacity.