Position paper
Characterising performance of environmental models

https://doi.org/10.1016/j.envsoft.2012.09.011

Abstract

In order to use environmental models effectively for management and decision-making, it is vital to establish an appropriate level of confidence in their performance. This paper reviews techniques available across various fields for characterising the performance of environmental models, with a focus on numerical, graphical and qualitative methods. General classes of direct value comparison, coupling real and modelled values, preserving data patterns, indirect metrics based on parameter values, and data transformations are discussed. In practice, environmental modelling requires workflows that combine several methods, tailored to the model's purpose and dependent upon the data and information available. A five-step procedure for performance evaluation of models is suggested, with the key elements including: (i) (re)assessment of the model's aim, scale and scope; (ii) characterisation of the data for calibration and testing; (iii) visual and other analysis to detect under- or non-modelled behaviour and to gain an overview of overall performance; (iv) selection of basic performance criteria; and (v) consideration of more advanced methods to handle problems such as systematic divergence between modelled and observed values.

Highlights

► Numerical, graphical and qualitative methods for characterising the performance of environmental models are reviewed.
► A structured, iterative workflow that combines several evaluation methods is suggested.
► Selection of methods must be tailored to the model scope and purpose, and to the quality of the data and information available.

Introduction

Quantitative environmental models are extensively used in research, management and decision-making. Establishing our confidence in the outputs of such models is crucial in justifying their continuing use while also recognising their limitations. The question of evaluating a model's performance relative to our understanding and observations of the system has resulted in many different approaches and much debate on the identification of the most appropriate technique (Alexandrov et al., 2011; McIntosh et al., 2011). One reason for continued debate is that performance measurement is intrinsically case-dependent. In particular, the manner in which performance is characterised depends on the field of application, the characteristics of the model, the data, information and knowledge at our disposal, and the specific objectives of the modelling exercise (Jakeman et al., 2006; Matthews et al., 2011).

Modelling is used across many environmental fields: hydrology, air pollution, ecology, hazard assessment, and climate dynamics, to name a few. In each of these fields, many different types of models are available, each incorporating a range of characteristics to measure and represent natural system behaviour. Environmental models for management typically consist of multiple interacting components with errors that do not exhibit predictable properties. This makes the traditional hypothesis-testing associated with statistical modelling less suitable, at least on its own, because of the strong assumptions generally required and the difficulty (sometimes impossibility) of testing hypotheses separately. Additionally, a single performance criterion generally measures only specific aspects of a model's performance, which may lead to counterproductive results such as favouring models that do not reproduce important features of a system (e.g., Krause et al., 2005; Hejazi and Moglen, 2008). Consequently, systems of metrics focussing on several aspects may be needed for a comprehensive evaluation of models, as advocated, for example, by Gupta et al. (2012).
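
To make the multi-metric point concrete, the sketch below computes three widely used criteria on the same observed and modelled series: RMSE, which penalises large errors; the Nash–Sutcliffe efficiency (Nash et al., 1970); and percent bias, which exposes systematic over- or under-estimation. The data and helper names are hypothetical, and the formulas follow their common textbook definitions rather than any single source in this review.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error: strongly penalises large residuals."""
    return np.sqrt(np.mean((sim - obs) ** 2))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 means the model
    predicts no better than the mean of the observations."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def pbias(obs, sim):
    """Percent bias: positive values indicate average overestimation."""
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

obs = np.array([2.1, 3.5, 8.0, 4.2, 2.9])   # hypothetical observations
sim = np.array([2.4, 3.1, 6.5, 4.8, 3.0])   # hypothetical model output
for name, metric in [("RMSE", rmse), ("NSE", nse), ("PBIAS %", pbias)]:
    print(f"{name}: {metric(obs, sim):.3f}")
```

A model could score well on NSE yet carry a large bias, which is exactly why no one of these numbers suffices on its own.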

It is generally accepted that the appropriate form of a model will depend on its specific objectives (Jakeman et al., 2006), which often fall into the broad categories of improved understanding of natural processes or response to management questions. The appropriate type of performance evaluation clearly depends on the model objectives as well. Additionally, there may be several views as to the purpose of a model, and multiple performance approaches may have to be used simultaneously to meet the multi-objective requirements of a given problem. In the end, the modeller must be confident that a model will fulfil its purpose, and that a ‘better’ model could not have been selected given the available resources. These decisions are a complex mixture of objectively identified criteria and subjective judgements that represent essential steps in the cyclic process of model development and adoption. In addition, the end-users of a model must also be satisfied, and may not be comfortable using the same performance measures as the expert modeller (e.g., Miles et al., 2000). It is clear that, in this context, a modeller must be eclectic in the choice of methods for characterising the performance of models.

Regardless, assessing model performance with quantitative tools is useful, and indeed most often necessary, so it is important that the modeller be aware of the available tools. Quantitative tools allow comparison of models, point out where models differ from one another, and provide some measure of objectivity in establishing the credibility and limitations of a model. Quantitative testing involves the calculation of suitable numerical metrics to characterise model performance. Calculating a metric value provides a single common point of comparison between models and offers great benefits in terms of automation, for example automatic calibration and selection of models. The use of metric values also minimises potential inconsistencies arising from human judgement. Because of the expert knowledge often required to use these tools, the methods discussed in this paper are intended primarily for modellers, but they may also be useful to inform end-users or stakeholders about aspects of model performance.
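
As a hedged illustration of metric-driven automation, the fragment below scores two hypothetical candidate models against held-out observations and selects the one with the higher Nash–Sutcliffe efficiency; all names and values are invented for the example, and in a real calibration loop the candidates would typically be alternative parameterisations of one model structure.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Hypothetical candidate models mapping a forcing variable to a response.
candidates = {
    "linear":    lambda x: 1.9 * x + 0.3,
    "quadratic": lambda x: 0.45 * x ** 2 + 0.6,
}
x_val   = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # held-out inputs
obs_val = np.array([2.3, 4.0, 6.4, 7.7, 9.6])   # held-out observations

scores = {name: nse(obs_val, model(x_val)) for name, model in candidates.items()}
best = max(scores, key=scores.get)  # a single metric yields an automatic ranking
print(scores, "-> selected:", best)
```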

This paper reviews methods for quantitatively characterising model performance, identifying key features so that modellers can make an informed choice suitable for their situation. A classification is used that cuts across a variety of fields. Methods with different names or developed for different applications are sometimes more similar than they first appear, and studies in one domain can take advantage of developments in others. Although the primary applications under consideration are environmental, methods developed in other fields are also included in this review. We assume that a model is available, along with data representing observations from a real system, which preferably have not been used at any stage during model development and which can be compared with the model output. This dataset should be representative of the model aims; for instance, it should contain flood episodes or pollution peaks if the model is to be used in such circumstances.

The following section provides a brief overview of how characterisation of model performance fits into the broader literature on the modelling process. Section 3 reviews selection of a so-called ‘validation’ dataset. In Section 4, quantitative methods for characterising performance are summarised, within the broad categories of direct value comparison, coupling real and modelled values, preserving data patterns, indirect metrics based on parameter values, and data transformations. Section 5 discusses how qualitative and subjective considerations enter into adoption of the model in combination with quantitative methods. Section 6 presents an approach to selecting performance criteria for environmental modelling. Note that a shorter and less comprehensive version of this paper was published as Bennett et al. (2010).

Section snippets

Performance characterisation in context

With characterisation of model performance being a core part of model development and testing, there is naturally a substantial body of related work. This section presents some key links between similar methods that have developed separately in different fields. In many fields of environmental modelling, methods and criteria to judge the performance of models have been considered in the context of model development. Examples include work completed for hydrological models (Krause et al., 2005) …

Data for characterising performance

The most important component of quantitative testing is the use of observational data for comparison. However, some of the data must be used in the development and calibration (if required) of the model. This necessitates the division of available data to permit development, calibration and performance evaluation. Common methods for this division are presented in Table 1 and include cross-validation and bootstrapping. In cross-validation, the data are split into separate groups for development …
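
A minimal sketch of the two splitting schemes named above, assuming 100 records indexed 0–99; the fold count and random seed are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed for reproducibility
idx = rng.permutation(100)         # shuffled indices of 100 hypothetical records

# k-fold cross-validation: every record is held out for evaluation exactly once.
k = 5
folds = np.array_split(idx, k)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # ...calibrate the model on train_idx, evaluate performance on test_idx...

# Bootstrapping: sample with replacement for calibration; the records never
# drawn (the "out-of-bag" set) form an independent evaluation sample.
boot_idx = rng.choice(100, size=100, replace=True)
oob_idx = np.setdiff1d(np.arange(100), boot_idx)
print(len(oob_idx), "out-of-bag records available for evaluation")
```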

Methods for measuring quantitative performance

Quantitative testing methods can be classified in many ways. We use a convenient grouping based on common characteristics. Direct value comparison methods directly compare model output to observed data as a whole (4.1). These contrast with methods that combine individual observed and modelled values in some way (4.2). Within this category, values can be compared point-by-point concurrently (4.2.1), by calculating the residual error (4.2.2), or by transforming the error in some way (4.2.3). …
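
The distinction between raw residuals (4.2.2) and transformed errors (4.2.3) is easiest to see side by side. The sketch below uses hypothetical values and a log transform, one common choice that reduces the dominance of errors at high magnitudes; the offset value is an arbitrary assumption for the example.

```python
import numpy as np

obs = np.array([0.8, 1.2, 5.0, 12.0, 3.3])   # hypothetical observations
sim = np.array([1.0, 1.0, 4.2, 13.5, 3.0])   # hypothetical model output

residuals = sim - obs                        # raw point-by-point errors (4.2.2)

# Transformed errors (4.2.3): the log compresses large values, so errors on
# low-magnitude behaviour carry relatively more weight; the small offset
# guards against log(0) and is an arbitrary choice here.
eps = 0.1
log_residuals = np.log(sim + eps) - np.log(obs + eps)

print(np.round(residuals, 2))
print(np.round(log_residuals, 3))
```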

Qualitative model evaluation

Despite the power of quantitative comparisons, model acceptance and adoption depend in the end strongly on qualitative, and often subjective, considerations. There may be even more subjectivity in considering the results of quantitative testing. Suppose one model is superior to another according to one metric, while the reverse holds according to another metric. What weight should be assigned to individual quantitative criteria in the overall assessment? How are these weights decided, and by whom? …
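
One simple, avowedly subjective, way to combine criteria is a weighted score over normalised metrics. The figures below are hypothetical and merely illustrate how the choice of weights, rather than the metrics themselves, can decide the ranking.

```python
# Hypothetical normalised scores (1 = best) for two models on three criteria.
scores = {
    "model_A": {"nse": 0.85, "peak_timing": 0.60, "low_flow_bias": 0.90},
    "model_B": {"nse": 0.80, "peak_timing": 0.95, "low_flow_bias": 0.70},
}
weights = {"nse": 0.5, "peak_timing": 0.3, "low_flow_bias": 0.2}  # subjective

for name, s in scores.items():
    total = sum(weights[c] * s[c] for c in weights)
    print(name, round(total, 3))
# model_A: 0.5*0.85 + 0.3*0.60 + 0.2*0.90 = 0.785
# model_B: 0.5*0.80 + 0.3*0.95 + 0.2*0.70 = 0.825 -> ranking hinges on weights
```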

Performance evaluation in practice: a suggested general procedure

As pointed out by many authors (e.g., Jakeman et al., 2006), performance evaluation is just one step of iterative model development. Evaluation results may indicate whether additional study is necessary. If performance is unsatisfactory, then different data, calibration procedures and/or model structures should be considered. With satisfactory performance, one may also evaluate whether simplification or other modification would entail significant performance loss. Modelling is an iterative …

Conclusions

This paper provides an overview of qualitative and quantitative methods of characterising performance of environmental models. Qualitative issues are crucial to address. Not everything can be measured. However, quantitative methods that make use of data were emphasised as they are somewhat more objective and can be generalised across model applications. This includes not just methods that couple real and modelled values point by point, but also methods that involve direct value comparison, …

References (166)

  • H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control (1974)
  • V. Andréassian et al., Crash tests for a standardized evaluation of hydrological models, Hydrology and Earth System Sciences (2009)
  • G.B. Arhonditsis et al., Evaluation of the current state of mechanistic aquatic biogeochemical modeling, Marine Ecology Progress Series (2004)
  • D. Bates et al., Nonlinear Regression Analysis and its Applications (1988)
  • B. Beck, Model Evaluation and Performance (2006)
  • G. Bellocchi et al., Validation of biophysical models: issues and methodologies. A review, Agronomy for Sustainable Development (2009)
  • N. Bennett et al., Performance evaluation of environmental models (2010)
  • L. Berthet et al., How significant are quadratic criteria? Part 2. On the relative contribution of large flood events to the value of a quadratic criterion, Hydrological Sciences Journal (2010)
  • K.J. Beven, Environmental Modelling: an Uncertain Future? An Introduction to Techniques for Uncertainty Estimation in Environmental Prediction (2008)
  • M.A. Boucher et al., Tools for the assessment of hydrological ensemble forecasts obtained by neural networks, Journal of Hydroinformatics (2009)
  • D.P. Boyle et al., Toward improved calibration of hydrologic models: combining the strengths of manual and automatic methods, Water Resources Research (2000)
  • F.H.S. Chiew et al., Assessing the adequacy of catchment streamflow yield estimates, Australian Journal of Soil Research (1993)
  • R. Costanza, Model goodness of fit: a multiple resolution procedure, Ecological Modelling (1989)
  • C. Dawson et al., HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts, Environmental Modelling and Software (2007)
  • C. Dawson et al., HydroTest: further development of a web resource for the standardised assessment of hydrological models, Environmental Modelling and Software (2010)
  • N.D. Evans et al., Identifiability of uncontrolled nonlinear rational systems, Automatica (2002)
  • J. Ewen, Hydrograph matching method for measuring model performance, Journal of Hydrology (2011)
  • K.V. Gernaey et al., Activated sludge wastewater treatment plant modelling and simulation: state of the art, Environmental Modelling and Software (2004)
  • R. Giordano et al., A fuzzy GIS-based system to integrate local and technical knowledge in soil salinity monitoring, Environmental Modelling and Software (2012)
  • V. Grimm et al., A standard protocol for describing individual-based and agent-based models, Ecological Modelling (2006)
  • V. Grimm et al., The ODD protocol: a review and first update, Ecological Modelling (2010)
  • H.V. Gupta et al., Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling, Journal of Hydrology (2009)
  • A. Holmberg, On the practical identifiability of microbial models incorporating Michaelis–Menten-type nonlinearities, Mathematical Biosciences (1982)
  • A.J. Jakeman et al., Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments, Journal of Hydrology (1990)
  • A.J. Jakeman et al., Ten iterative steps in development and evaluation of environmental models, Environmental Modelling and Software (2006)
  • E.L. Kara et al., Time-scale dependence in numerical simulations: assessment of physical, chemical, and biological predictions in a stratified lake at temporal scales of hours to months, Environmental Modelling and Software (2012)
  • C. Kelleher et al., Ten guidelines for effective data visualization in scientific publications, Environmental Modelling and Software (2011)
  • T. Krueger et al., The role of expert opinion in environmental modelling, Environmental Modelling and Software (2012)
  • V. Makler-Pick et al., Sensitivity analysis for complex ecological models – a new approach, Environmental Modelling and Software (2011)
  • S. Marsili-Libelli, Parameter estimation of ecological models, Ecological Modelling (1992)
  • S. Marsili-Libelli et al., Confidence regions of estimated parameters for ecological systems, Ecological Modelling (2003)
  • S. Marsili-Libelli et al., Identification of dynamic models for horizontal subsurface constructed wetlands, Ecological Modelling (2005)
  • S. Marsili-Libelli, Control of SBR switching by fuzzy pattern recognition, Water Research (2006)
  • S. Marsili-Libelli et al., Water quality modelling for small river basins, Environmental Modelling and Software (2008)
  • L.S. Matott et al., A benchmarking framework for simulation-based optimization of environmental models, Environmental Modelling and Software (2012)
  • K.B. Matthews et al., Raising the bar? – the challenges of evaluating the outcomes of environmental modelling and software, Environmental Modelling and Software (2011)
  • J. Nash et al., River flow forecasting through conceptual models. Part I – a discussion of principles, Journal of Hydrology (1970)
  • T.G. Nguyen et al., A new approach to testing an integrated water systems model using qualitative scenarios, Environmental Modelling and Software (2007)
  • J.P. Norton, Algebraic sensitivity analysis of environmental models, Environmental Modelling and Software (2008)
  • J. Nossent et al., Sobol' sensitivity analysis of a complex environmental model, Environmental Modelling and Software (2011)
Position papers aim to synthesise some key aspect of the knowledge platform for environmental modelling and software issues. The review process is twofold – a normal external review process followed by extensive review by EMS Board members. See the Editorial in Volume 21 (2006).