Position paper
Characterising performance of environmental models☆
Highlights
- Numerical, graphical and qualitative methods for characterising performance of environmental models are reviewed.
- A structured, iterative workflow that combines several evaluation methods is suggested.
- Selection of methods must be tailored to the model scope and purpose, and to the quality of the data and information available.
Introduction
Quantitative environmental models are extensively used in research, management and decision-making. Establishing our confidence in the outputs of such models is crucial in justifying their continuing use while also recognizing limitations. The question of evaluating a model's performance relative to our understanding and observations of the system has resulted in many different approaches and much debate on the identification of a most appropriate technique (Alexandrov et al., 2011; McIntosh et al., 2011). One reason for continued debate is that performance measurement is intrinsically case-dependent. In particular, the manner in which performance is characterised depends on the field of application, characteristics of the model, data, information and knowledge that we have at our disposal, and the specific objectives of the modelling exercise (Jakeman et al., 2006; Matthews et al., 2011).
Modelling is used across many environmental fields: hydrology, air pollution, ecology, hazard assessment, and climate dynamics, to name a few. In each of these fields, many different types of models are available, each incorporating a range of characteristics to measure and represent the natural system behaviours. Environmental models for management typically consist of multiple interacting components with errors that do not exhibit predictable properties. This makes the traditional hypothesis-testing associated with statistical modelling less suitable, at least on its own, because of the strong assumptions generally required, and the difficulty (sometimes impossibility) of testing hypotheses separately. Additionally, if a single performance criterion is used, it generally measures only specific aspects of a model's performance, which may lead to counterproductive results such as favouring models that do not reproduce important features of a system (e.g., Krause et al., 2005; Hejazi and Moglen, 2008). Consequently, systems of metrics focussing on several aspects may be needed for a comprehensive evaluation of models, as advocated e.g., by Gupta et al. (2012).
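The way a single criterion can favour one model over another is easy to demonstrate. The sketch below, using hypothetical observed and modelled series (the numbers are illustrative, not from the paper), shows two common metrics disagreeing on which of two models is "better": a squared-error-based efficiency rewards fitting the peak, while mean absolute error rewards fitting the more numerous low values.

```python
import numpy as np

# Hypothetical observed series with one peak, and two candidate models
# (illustrative numbers only).
obs = np.array([1.0, 2.0, 8.0, 3.0, 1.5, 1.0])
mod_a = np.array([1.1, 2.1, 6.0, 2.8, 1.4, 1.1])  # underestimates the peak
mod_b = np.array([0.5, 1.2, 8.2, 2.0, 0.8, 0.4])  # matches the peak, poorer elsewhere

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: squared errors, so dominated by large values."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def mae(sim, obs):
    """Mean absolute error: weights every time step equally."""
    return np.mean(np.abs(sim - obs))

# NSE ranks model B above model A; MAE ranks them the other way round.
print(nse(mod_a, obs), nse(mod_b, obs))
print(mae(mod_a, obs), mae(mod_b, obs))
```

Neither ranking is wrong; each metric simply measures a different aspect of performance, which is why a system of complementary metrics is advocated.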
It is generally accepted that the appropriate form of a model will depend on its specific objectives (Jakeman et al., 2006), which often fall in the broad categories of improved understanding of natural processes or response to management questions. The appropriate type of performance evaluation clearly depends on the model objectives as well. Additionally, there may be several views as to the purpose of a model, and multiple performance approaches may have to be used simultaneously to meet the multi-objective requirements for a given problem. In the end, the modeller must be confident that a model will fulfil its purpose, and that a ‘better’ model could not have been selected given the available resources. These decisions are a complex mixture of objectively identified criteria and subjective judgements that represent essential steps in the cyclic process of model development and adoption. In addition, the end-users of a model must also be satisfied, and may not be comfortable using the same performance measures as the expert modeller (e.g., Miles et al., 2000). It is clear that in this context, a modeller must be eclectic in choice of methods for characterising the performance of models.
Regardless, assessing model performance with quantitative tools is found to be useful, indeed most often necessary, and it is important that the modeller be aware of available tools. Quantitative tools allow comparison of models, point out where models differ from one another, and provide some measure of objectivity in establishing the credibility and limitations of a model. Quantitative testing involves the calculation of suitable numerical metrics to characterise model performance. Calculating a metric value provides a single common point of comparison between models and offers great benefits in terms of automation, for example automatic calibration and selection of models. The use of metric values also minimises potential inconsistencies arising from human judgement. Because of the expert knowledge often required to use these tools, the methods discussed in this paper are intended for use primarily by modellers, but they may also be useful to inform end-users or stakeholders about aspects of model performance.
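One benefit mentioned above, automation, follows directly from having a single scalar metric: calibration reduces to minimising it. A minimal sketch, assuming a toy one-parameter linear model and illustrative data (none of it from the paper):

```python
import numpy as np

# Automatic calibration sketch: fit the gain k of a toy model y = k * x
# by grid search over RMSE (hypothetical data, roughly y = 2x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
obs = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

def rmse(sim, obs):
    """Root mean squared error: the scalar being minimised."""
    return np.sqrt(np.mean((sim - obs) ** 2))

candidates = np.linspace(0.5, 4.0, 701)           # candidate values of k
scores = [rmse(k * x, obs) for k in candidates]   # one metric value per candidate
best_k = candidates[int(np.argmin(scores))]       # automatic selection
print(round(best_k, 2))  # close to 2.0
```

In practice the search would use a proper optimiser rather than a grid, but the principle is the same: the metric provides the common point of comparison that makes the selection mechanical.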
This paper reviews methods for quantitatively characterising model performance, identifying key features so that modellers can make an informed choice suitable for their situation. A classification is used that cuts across a variety of fields. Methods with different names or developed for different applications are sometimes more similar than they at first appear. Studies in one domain can take advantage of developments in others. Although the primary applications under consideration are environmental, methods developed in other fields are also included in this review. We assume that a model is available, along with data representing observations from a real system, that preferably have not been used at any stage during model development and that can be compared with the model output. This dataset should be representative of the model aims; for instance it should contain flood episodes or pollution peaks if the model is to be used in such circumstances.
The following section provides a brief view of how characterisation of model performance fits into the broader literature on the modelling process. Section 3 reviews selection of a so-called ‘validation’ dataset. In Section 4, quantitative methods for characterising performance are summarized, within the broad categories of direct value comparison, coupling real and modelled values, preserving data patterns, indirect metrics based on parameter values, and data transformations. Section 5 discusses how qualitative and subjective considerations enter into adoption of the model in combination with quantitative methods. Section 6 presents an approach to selecting performance criteria for environmental modelling. Note that a shorter and less comprehensive version of this paper was published as Bennett et al. (2010).
Section snippets
Performance characterisation in context
With characterisation of model performance being a core part of model development and testing, there is naturally a substantial body of related work. This section presents some key links between similar methods that have developed separately in different fields. In many of the fields of environmental modelling, methods and criteria to judge the performance of models have been considered in the context of model development. Examples include work completed for hydrological models (Krause et al., …
Data for characterising performance
The most important component of quantitative testing is the use of observational data for comparison. However, some of the data must be used in the development and calibration (if required) of the model. This necessitates the division of available data to permit development, calibration and performance evaluation. Common methods for this division are presented in Table 1 and include cross-validation and bootstrapping. In cross-validation, the data are split into separate groups for development …
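The cross-validation idea described above can be sketched in a few lines. This is a generic k-fold split, not the paper's Table 1; the fold count and dataset size are illustrative, and real studies with time series may need splits that respect temporal ordering.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for a reproducible split

def k_fold_indices(n_samples, k):
    """Partition sample indices into k disjoint groups; each group serves
    once as the evaluation set while the remainder is used for
    development/calibration."""
    idx = rng.permutation(n_samples)
    return np.array_split(idx, k)

folds = k_fold_indices(20, 4)
for i, eval_idx in enumerate(folds):
    calib_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(i, len(calib_idx), len(eval_idx))  # 15 for calibration, 5 for evaluation
```

Rotating the evaluation group through all folds means every observation is used for evaluation exactly once, which makes fuller use of limited data than a single fixed split.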
Methods for measuring quantitative performance
Quantitative testing methods can be classified in many ways. We use a convenient grouping based on common characteristics. Direct value comparison methods directly compare model output to observed data as a whole (4.1). These contrast with methods that combine individual observed and modelled values in some way (4.2). Within this category, values can be compared point-by-point concurrently (4.2.1), by calculating the residual error (4.2.2), or by transforming the error in some way (4.2.3). The …
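The residual-based family (4.2) can be illustrated with a short sketch: point-by-point residuals, an aggregate of the raw residuals (here, bias), and the same kind of aggregate after a transformation of the values. The data and the choice of a log transform (which emphasises errors at low values) are illustrative, not prescribed by the paper.

```python
import numpy as np

# Hypothetical observed and simulated series (illustrative only).
obs = np.array([0.5, 1.0, 4.0, 2.0, 0.8])
sim = np.array([0.7, 0.9, 3.5, 2.2, 1.1])

residuals = sim - obs            # 4.2.2: point-by-point residual error
bias = residuals.mean()          # aggregate of the raw residuals

# 4.2.3: transform values before computing the error; log emphasises low flows.
log_rmse = np.sqrt(np.mean((np.log(sim) - np.log(obs)) ** 2))

print(residuals, round(bias, 3), round(log_rmse, 3))
```

The point is that the same residual vector supports many different summaries, and the transformation applied before aggregation determines which parts of the record dominate the result.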
Qualitative model evaluation
Despite the power of quantitative comparisons, model acceptance and adoption depend in the end strongly on qualitative, and often subjective, considerations. There may be even more subjectivity in considering the results of quantitative testing. Suppose one model is superior to another according to one metric, while it is the reverse according to another metric. What is the weight to be assigned to individual quantitative criteria in the overall assessment? How are these weights decided and by …
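The weighting question raised above can be made concrete. A minimal sketch of a weighted aggregation, assuming each metric has been normalised so that higher is better; the models, metric names and scores are hypothetical, and the example exists only to show that the choice of weights, itself a subjective judgement, can reverse the overall ranking.

```python
# Hypothetical normalised scores for two models on two criteria.
scores = {
    "model_1": {"nse": 0.92, "volume_bias_skill": 0.60},
    "model_2": {"nse": 0.85, "volume_bias_skill": 0.80},
}

def overall(metric_scores, weights):
    """Weighted sum of per-metric scores; the weights encode a value judgement."""
    return sum(weights[m] * v for m, v in metric_scores.items())

peak_focused = {"nse": 0.8, "volume_bias_skill": 0.2}  # weights favouring fit to peaks
balanced     = {"nse": 0.5, "volume_bias_skill": 0.5}

for name, s in scores.items():
    print(name, overall(s, peak_focused), overall(s, balanced))
```

Under the peak-focused weights model_1 scores higher; under equal weights model_2 does. The arithmetic is objective, but the ranking it produces is only as defensible as the weights, which is exactly the subjectivity the section describes.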
Performance evaluation in practice: a suggested general procedure
As pointed out by many authors (e.g., Jakeman et al., 2006), performance evaluation is just one step of iterative model development. Evaluation results may indicate whether additional study is necessary. If performance is unsatisfactory, then different data, calibration procedures and/or model structures should be considered. With satisfactory performance, one may also evaluate whether simplification or other modification would entail significant performance loss. Modelling is an iterative …
Conclusions
This paper provides an overview of qualitative and quantitative methods of characterising performance of environmental models. Qualitative issues are crucial to address. Not everything can be measured. However, quantitative methods that make use of data were emphasised as they are somewhat more objective and can be generalised across model applications. This includes not just methods that couple real and modelled values point by point, but also methods that involve direct value comparison, …
References (166)
- et al. Technical assessment and evaluation of environmental models and software: letter to the Editor. Environmental Modelling and Software (2011).
- et al. On structural identifiability. Mathematical Biosciences (1970).
- et al. Ten iterative steps for model development and evaluation applied to Computational Fluid Dynamics for Environmental Fluid Mechanics. Environmental Modelling and Software (2012).
- et al. Modelling with stakeholders within a development project. Environmental Modelling and Software (2010).
- et al. Evaluation of a neighbourhood scale, street network dispersion model through comparison with wind tunnel data. Environmental Modelling and Software (2012).
- et al. Nonlinear compartmental model indistinguishability. Automatica (1996).
- et al. Reliability of parameter estimation in respirometric models. Water Research (2005).
- et al. Multi-period and multi-criteria model conditioning to reduce prediction uncertainty in an application of TOPMODEL within the GLUE framework. Journal of Hydrology (2007).
- Applying multi-resolution analysis to differential hydrological grey models with dual series. Journal of Hydrology (2007).
- A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment (1991).
- Model goodness of fit: a multiple resolution procedure. Ecological Modelling.
- HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environmental Modelling and Software.
- HydroTest: further development of a web resource for the standardised assessment of hydrological models. Environmental Modelling and Software.
- Identifiability of uncontrolled nonlinear rational systems. Automatica.
- Hydrograph matching method for measuring model performance. Journal of Hydrology.
- Activated sludge wastewater treatment plant modelling and simulation: state of the art. Environmental Modelling and Software.
- A fuzzy GIS-based system to integrate local and technical knowledge in soil salinity monitoring. Environmental Modelling and Software.
- A standard protocol for describing individual-based and agent-based models. Ecological Modelling.
- The ODD protocol: a review and first update. Ecological Modelling.
- Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling. Journal of Hydrology.
- On the practical identifiability of microbial models incorporating Michaelis–Menten-type nonlinearities. Mathematical Biosciences.
- Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments. Journal of Hydrology.
- Ten iterative steps in development and evaluation of environmental models. Environmental Modelling and Software.
- Time-scale dependence in numerical simulations: assessment of physical, chemical, and biological predictions in a stratified lake at temporal scales of hours to months. Environmental Modelling and Software.
- Ten guidelines for effective data visualization in scientific publications. Environmental Modelling and Software.
- The role of expert opinion in environmental modelling. Environmental Modelling and Software.
- Sensitivity analysis for complex ecological models – a new approach. Environmental Modelling and Software.
- Identification of dynamic models for horizontal subsurface constructed wetlands. Ecological Modelling.
- Water quality modelling for small river basins. Environmental Modelling and Software.
- Confidence regions of estimated parameters for ecological systems. Ecological Modelling.
- Parameter estimation of ecological models. Ecological Modelling.
- Control of SBR switching by fuzzy pattern recognition. Water Research.
- A benchmarking framework for simulation-based optimization of environmental models. Environmental Modelling and Software.
- Raising the bar? – the challenges of evaluating the outcomes of environmental modelling and software. Environmental Modelling and Software.
- River flow forecasting through conceptual models part I – a discussion of principles. Journal of Hydrology.
- A new approach to testing an integrated water systems model using qualitative scenarios. Environmental Modelling and Software.
- Algebraic sensitivity analysis of environmental models. Environmental Modelling and Software.
- Sobol' sensitivity analysis of a complex environmental model. Environmental Modelling and Software.
- A new look at the statistical model identification. IEEE Transactions on Automatic Control.
- Crash tests for a standardized evaluation of hydrological models. Hydrology and Earth System Sciences.
- Evaluation of the current state of mechanistic aquatic biogeochemical modeling. Marine Ecology-Progress Series.
- Nonlinear Regression Analysis and its Applications.
- Model Evaluation and Performance.
- Validation of biophysical models: issues and methodologies. A Review. Agronomy for Sustainable Development.
- Performance evaluation of environmental models.
- How significant are quadratic criteria? Part 2. On the relative contribution of large flood events to the value of a quadratic criterion. Hydrological Sciences Journal.
- Environmental Modelling: an Uncertain Future? An Introduction to Techniques for Uncertainty Estimation in Environmental Prediction.
- Tools for the assessment of hydrological ensemble forecasts obtained by neural networks. Journal of Hydroinformatics.
- Toward improved calibration of hydrologic models: combining the strengths of manual and automatic methods. Water Resources Research.
- Assessing the adequacy of catchment streamflow yield estimates. Australian Journal of Soil Research.
☆ Position papers aim to synthesise some key aspect of the knowledge platform for environmental modelling and software issues. The review process is twofold – a normal external review process followed by extensive review by EMS Board members. See the Editorial in Volume 21 (2006).