Introduction

Globalisation has led to an accompanying increase in propagule and colonisation pressure (Briski et al. 2012) and, thereby, to an increase in the number of established non-native species, which has reached unprecedented levels (Seebens et al. 2017; Daly et al. 2023). In particular, while the European continent has been a historical epicentre for species translocations and introductions worldwide (Pyšek et al. 2008), it has itself experienced a substantial increase in the number of non-native and invasive species in recent decades (Seebens et al. 2017, 2021). This trend is a growing concern due to the harmful consequences that non-native species pose to biodiversity and human well-being. These include, but are not limited to, the impairment of crops and infrastructure (Laverty et al. 2015), and biotic and functional homogenisation (Olden et al. 2004). Also, the lack of understanding of which non-native species will ultimately become invasive limits our ability to undertake preventive and control measures (Jarić et al. 2019; Pyšek et al. 2020). Therefore, it is vital to understand the invasion process, as this provides insights into the dynamics of ecosystem functioning, the response of native species to the invader, and the impact of human activities on the environment (Arim et al. 2006). At local and regional scales, increasing rates of introduction have been associated with a loss of ecosystem resilience and the facilitation of subsequent non-native species (Haubrock et al. 2021; Le Hen et al. 2023). As the different drivers of invasions are context dependent and can act synergistically, more integrative monitoring efforts are needed so that management investments are no longer focused solely on the few invasive species that are better known and considered to be problematic (Watkins et al. 2021).
Despite the well-known risk that biological invasions pose worldwide, only a few countries have fulfilled the requirements to achieve “Aichi Target 9” of the Convention on Biological Diversity (CBD), which relates to biological invasions (McGeoch et al. 2016), underlining the need for future improvements. In this sense, historical data can be an important source for understanding how the introduction of non-native species and their establishment in ecosystems vary, and what the possible implications are (Clavero & Villero 2014).

Long-term biomonitoring data are highly valuable to the ecological sciences, as they can provide a fine-scale record of changes in the composition and abundance of species and, thus, of community composition over time (Haubrock et al. 2023a, b; Haubrock and Soto 2023; Fig. 1). The information within long-term biomonitoring data could be used to understand the dynamics of non-native species introductions and to track their spread across a range of spatio-temporal scales, modulated by the biological resistance of native species and the availability of local resources, but also, possibly, to gain a better understanding of the influence of environmental conditions on their success or failure (Guareschi et al. 2021). This could help to develop strategies to prevent their introduction, control their spread, or eradicate established populations. Yet, the ability of long-term biomonitoring to assess introduction dynamics and potentially introduction rates has not been tested (but see Haubrock et al. 2022; Soto et al. 2023). Long-term biomonitoring data, therefore, present enormous potential, e.g. by covering large areas across Europe, but also a major challenge, because the extent of the benefits they may provide to invasion biologists remains untested.

Fig. 1

Two possible scenarios of non-native species presence (i.e. abundance or occurrence) over time (right axis) following an external change (e.g. climatic or anthropogenic disturbance) and how long-term biomonitoring can capture it. In scenario 1, an external change leads to a decrease in non-native species presence, whereas in scenario 2 it results in an increase. External changes can also vary, altering ecosystem naturalness (left axis) from pristine to highly impacted through time. Long-term biomonitoring is assumed to capture these changes in community composition and ecosystem alterations

Hence, we used long-term biomonitoring data collected across four European regions—Denmark, Hungary, England, and the Dutch-German-Luxembourg region—to evaluate their capability to assess non-native species introductions in regard to regional differences in temporal dynamics and introduction rates. We hypothesised that (i) introduction rates of non-native species have been consistent over time, not reflecting any distinguishable temporal patterns. We consequently hypothesised that (ii) long-term biomonitoring data can be indicative of non-native species introductions at large scale, albeit being taxonomically limited, whereas (iii) regional differences will be prominent, but depend on sampling effort. Therefore, the information can be used to identify trends and patterns in the spread of introduced species and to develop effective strategies for their management and control. Ultimately, our goal is to better understand the dynamics of introduced non-native species in ecosystems and promote international cooperation to develop more effective approaches for mitigating their potentially harmful effects.

Methods

To investigate the rate of non-native species introduction and their occurrence in long-term biomonitoring data, we utilised a recently collated database of aquatic macroinvertebrate abundances (Haase et al., in review; Haubrock et al. 2022; Fig. 2) and considered four regional clusters located within Europe with sufficiently large spatial coverage: Denmark, Hungary, England, and the Dutch-German-Luxembourg region (henceforth referred to as DGL). These time series contained abundance data of macroinvertebrate groups identified at the species level, were exclusively collected from streams and rivers, and covered a minimum of eight, not necessarily consecutive, sampling years. Each time series was surveyed at the same geographic location throughout the sampling period (Table 1). Comparable sampling techniques and protocols [e.g. RIVPACS (in England) or DIN 38410 (in Germany)] were used across the sampling sites within each region and were uniform across each site’s sampling period.

Fig. 2

Location of time series from Denmark (n = 174, purple), Hungary (n = 84, blue), England (n = 296, orange), and Dutch-German-Luxembourg region (n = 178, green) containing non-native species

Table 1 Description of the study time series by region: number of time series, range of sampling durations, average duration of time series, and average number of annual samplings

Identification of non-native species

Non-native species were identified and verified by consulting the Centre for Agriculture and Bioscience International (CABI 2023), Google Scholar (https://scholar.google.com/), the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/), and the Global non-native Species First Record Database (sTWIST; Seebens et al. 2017). A species was considered non-native only when confirmed by at least three of the four sources.

Unique species occurrences over time

To investigate the temporal occurrence of unique species, we plotted the Empirical Cumulative Distribution Functions (ECDF) for both non-native and native species over time within each region (Fig. 4). The ECDF is a step function that increases by 1/n for each additional species occurrence, where n is the total number of non-native or native species identified in a given region.
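The step function described above can be illustrated with a minimal Python sketch. The function name and the first-record years are hypothetical and serve only as an example; they are not taken from the study data.

```python
from collections import Counter


def ecdf_first_records(first_years):
    """Return (year, cumulative proportion) steps of the ECDF.

    Each species contributes a step of 1/n in the year it was first
    recorded, where n is the total number of species in the region.
    """
    n = len(first_years)
    if n == 0:
        raise ValueError("at least one species is required")
    per_year = Counter(first_years)
    steps = []
    cumulative = 0
    for year in sorted(per_year):
        cumulative += per_year[year]
        steps.append((year, cumulative / n))
    return steps


# Hypothetical first-record years of five non-native species in one region.
example_years = [2001, 2001, 2004, 2009, 2009]
steps = ecdf_first_records(example_years)
for year, proportion in steps:
    print(year, proportion)
```

Years in which several species are first recorded produce a single larger step, so the curve always ends at a proportion of 1.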

We considered a Weibull probability distribution p(t) of unique species occurrences over time t, given by

$$p (t)=\frac{k}{\lambda }{\left(\frac{t}{\lambda }\right)}^{k-1}{e}^{-{\left(\frac{t}{\lambda }\right)}^{k}}$$
(1)

where t is measured in years after the first species was recorded in a given region, λ > 0 is a scale parameter of the probability distribution, and k > 0 is a shape parameter that indicates whether the rate of unique species occurrences decreases (k < 1), is constant (k = 1) or increases (k > 1) over time. The mean time to new species detection \(E(t)\) and its variance \(\mathrm{ var}(t)\) are given by

$$E(t)=\lambda\Gamma (1+1/k), \quad \mathrm{ var}(t)={\lambda }^{2}\left[\Gamma (1+2/k)-{\left(\Gamma (1+1/k)\right)}^{2}\right]$$
(2)

where \(\Gamma\) is the Gamma function defined by \(\Gamma (a)={\int }_{0}^{\infty }{z}^{a-1}{e}^{-z}\mathrm{d}z\).
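As a quick numerical check of Eq. (2), the mean and variance can be evaluated with the Gamma function from the Python standard library. This is a minimal sketch; the function name and the parameter values are illustrative assumptions, not estimates from this study.

```python
import math


def weibull_mean_var(lam, k):
    """Mean and variance of a Weibull distribution (Eq. 2).

    lam is the scale parameter (lambda > 0), k the shape parameter (k > 0).
    """
    if lam <= 0 or k <= 0:
        raise ValueError("lam and k must be positive")
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    mean = lam * g1
    var = lam ** 2 * (g2 - g1 ** 2)
    return mean, var


# Illustrative values only: with k = 1 the Weibull reduces to an
# exponential distribution, so the mean equals lam and the variance lam^2.
mean_exp, var_exp = weibull_mean_var(lam=10.0, k=1.0)
print(mean_exp, var_exp)
```

The k = 1 case is a convenient sanity check because the closed-form exponential moments are known.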

We fitted the cumulative probability distribution function P(t) (CDF) for the Weibull distribution given by

$$P(t)=1-{e}^{-(t/\lambda {)}^{k}}$$
(3)

to the empirical ECDF data points. The non-linear regression curve fitting tool fitnlm from Matlab was used for the distribution fitting, and the goodness of fit was quantified by the coefficient of determination (R²) and the root mean square error (RMSE). We reported the estimated model parameters k and λ for non-native and native species in each region. In the context of multiple species occurrences, the CDF can be used to determine the time to saturation in recorded species. In addition, the number of non-native species per region was compared with the number of non-native species identified in the four regions according to the Global non-native Species First Record Database (Seebens et al. 2017).
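To illustrate how Eq. (3) relates to ECDF points, the sketch below estimates λ and k in plain Python. It does not reproduce the non-linear least-squares fit of fitnlm: it uses the linearisation ln(−ln(1 − P)) = k ln t − k ln λ followed by ordinary least squares, a simpler swapped-in technique that could, for instance, supply starting values. All function names and the synthetic data are assumptions made for this example.

```python
import math


def weibull_cdf(t, lam, k):
    """Weibull cumulative distribution function (Eq. 3)."""
    return 1.0 - math.exp(-((t / lam) ** k))


def fit_weibull_linearised(times, props):
    """Estimate (lam, k) from ECDF points by linearised least squares.

    Points with t <= 0 or a proportion of exactly 0 or 1 are dropped,
    because the double-log transform is undefined there.
    """
    pts = [(math.log(t), math.log(-math.log(1.0 - p)))
           for t, p in zip(times, props) if t > 0 and 0.0 < p < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    k = sxy / sxx                   # slope of the linearised relation
    lam = math.exp(mx - my / k)     # from intercept = -k * ln(lam)
    return lam, k


def fit_quality(times, props, lam, k):
    """R^2 and RMSE on the original (proportion) scale.

    RMSE here divides by the number of points; fitting tools may divide
    by the residual degrees of freedom instead.
    """
    resid = [p - weibull_cdf(t, lam, k) for t, p in zip(times, props)]
    ss_res = sum(r * r for r in resid)
    mean_p = sum(props) / len(props)
    ss_tot = sum((p - mean_p) ** 2 for p in props)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(props))


# Synthetic, noise-free points generated from known parameters.
times = list(range(1, 13))
props = [weibull_cdf(t, 8.0, 1.5) for t in times]
lam_hat, k_hat = fit_weibull_linearised(times, props)
r2, rmse = fit_quality(times, props, lam_hat, k_hat)
print(lam_hat, k_hat, r2, rmse)
```

Because the transform weights points unevenly, the linearised estimates will generally differ somewhat from a direct non-linear fit on noisy data.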

Evaluating the usefulness of long-term data for invasion dynamics

To gather insights into regional differences in the presence of non-native species, we investigated (i) the full number of non-native species records, (ii) the number of time series with at least one non-native record, and (iii) the average relative abundance of non-native species across time. For this, we modelled each response variable (i–iii) solely as a function of time and the number of time series per year, the latter to account for temporally varying data availability. As a means of evaluation, the slopes of (i–iii) for each of the four regions as well as their respective 95% confidence intervals were compared using linear mixed-effect models fitted with the lme function of the nlme R package (Pinheiro et al. 2013).

To determine whether sample size (i.e. number of time series over time) was sufficient to describe the presence of non-native species in the four regions, the cumulative numbers of identified non-native and native taxa were plotted separately against the cumulative number of time series investigated per year (Haubrock et al. 2023). For this, we used the specaccum function of the vegan R package (Oksanen et al. 2013), randomising each time series ten times (Ferry & Cailliet 1996; Ferry et al. 1997). Cumulative curves were considered to be asymptotic if the ten previous values of the total number of taxa were within ± 0.5 of the asymptotic number of taxa, indicating the minimum number of monitored time-series years required to describe the diversity of non-native and native taxa (Huveneers et al. 2007).
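The accumulation procedure can be sketched in plain Python as follows. This is not the vegan implementation: it only mimics the idea of averaging the cumulative number of taxa over random orderings of the samples, and the asymptote rule is one possible reading of the ± 0.5 criterion described above. Function names, the fixed seed and the toy samples are assumptions.

```python
import random


def mean_accumulation(samples, n_perm=10, seed=0):
    """Mean cumulative number of taxa over random sample orderings.

    samples is a list of sets, one set of taxon names per sampled
    time-series year.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_perm):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, taxa in enumerate(order):
            seen |= taxa
            totals[i] += len(seen)
    return [total / n_perm for total in totals]


def samples_to_asymptote(curve, window=10, tol=0.5):
    """Smallest number of samples at which the preceding `window` values
    all lie within `tol` of the final curve value, or None if never."""
    final = curve[-1]
    for end in range(window, len(curve) + 1):
        if all(abs(v - final) <= tol for v in curve[end - window:end]):
            return end
    return None


# Toy data: 15 sampled years, three taxa in total.
toy_samples = [{"a", "b"}] * 12 + [{"c"}] * 3
curve = mean_accumulation(toy_samples, n_perm=10, seed=1)
print(curve[-1], samples_to_asymptote(curve))
```

The last value of the curve always equals the total number of distinct taxa, since every ordering has seen all samples by then.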

Results

All 732 time series analysed from the four regions contained records of non-native species (Fig. 3). The number of native species in Denmark (n = 517), Hungary (n = 508), England (n = 511) and the DGL (n = 716) outweighed the number of non-native species in each region (n = 3 in Denmark; 34 in Hungary; 17 in England; 37 in the DGL; Supplement 1), with ratios ranging from 172:1 in Denmark to 15:1 in Hungary. The species records within our time series showed that the overlap of non-native species present (%) was higher than that of native species for DGL, Hungary, DGL+Hungary, and DGL+Hungary+England (Fig. 3).

Fig. 3

Venn-diagram showing the relative overlap of non-native (a) and native species (b) across Denmark, Hungary, England, and the Dutch-German-Luxembourg region (DGL) according to the available long-term data

Unique species occurrences over time

The ECDF plots highlight that, in each region, all non-native species reported over time were identified more rapidly than native species were collected over the same period. While this does not account for the possibility of not having found or correctly identified all non-native and native species, it shows that the effective observations of non-native and native species began at the same point in time and recurred in short succession (Fig. 4). Also, the ECDF plots show that with consistent sampling efforts non-native species are identified in approximately the same proportion over time as native species (Fig. 4e–h).

Fig. 4

Empirical Cumulative Distribution Functions (ECDF; blue step functions) for the proportion of non-native and native species from Denmark (a, b), Hungary (c, d), England (e, f), and the Dutch-German-Luxembourg region (DGL) (g, h). The Cumulative Distribution Function (CDF) given in Eq. (3) (black solid curves) was fitted against the proportion of species occurrences over time, with the R² and RMSE values that quantify the fit reported in Table 2. The red shaded areas represent 95% confidence regions for the range of predicted CDF values

We found that the rate of unique species occurrences decreased over time (k < 1) in Hungary and England for both non-native and native species, and in Denmark only for native species: in all other cases, this rate increased (k > 1). The mean time to detection was shortest for non-native species in Hungary, and longest for native species in the DGL region (Table 2).

Table 2 Estimated distribution parameter values \(\lambda\) and \(k\) from fitting the CDF (Eq. 3) to the empirical proportion of non-native and native species occurrences across different regions. The fit is quantified by the coefficient of determination (R²) and the root mean squared error (RMSE). The mean and variance of time to detection were computed using Eq. (2)

Among the four investigated regions, no temporal pattern in the occurrence of non-native species could be identified, as these species were identified sporadically. Only in Hungary were the majority of species (n = 15) already present in 2005, in the first years of monitoring campaigns (Supplement 2). New native species were identified periodically in bursts, possibly reflecting the inclusion of new time series. Comparing non-native species within the time series from the four investigated regions with the freshwater macroinvertebrates reported in sTWIST (Seebens et al. 2017), we found considerable inconsistencies in the number of non-native species. This was especially evident in England, where sTWIST reported 1.8 times more non-native freshwater macroinvertebrate species than the available time series, followed by the DGL region (1.7:1). In Denmark, our time series reported 3 non-native freshwater macroinvertebrate species versus 5 reported in sTWIST, while for Hungary, our time series reported 8 times more non-native freshwater macroinvertebrate species than sTWIST (Supplement 3).

Evaluating the usefulness of long-term data for invasion dynamics

Model assumptions were met in all cases, except for the number of invaded time series in the DGL and changes in the relative abundance of non-native species identified in Hungary. Our models indicated that, over time, the number of non-native species recorded in long-term data from the four investigated regions was growing, although the increase was significant only in England and the DGL (Fig. 5a; Supplement 4). The number of invaded time series, however, increased significantly in Denmark and the DGL, while it decreased significantly in Hungary (Fig. 5b; Supplement 5). The average relative abundance of non-native species remained constant over time across all four regions (Fig. 5c; Supplement 6). Species accumulation curves further revealed that non-native species observations saturated considerably earlier than observations of native species, reaching their asymptote after 3.5 to 41.7 monitoring years compared to 720.2 to 1010.9 monitoring years (Fig. 6; Supplement 7).

Fig. 5

Slopes (± 95% CI) of the relationships between time and the number of non-native species records (a), the number of invaded time series (b), and the average relative abundance of non-native species (c) for Denmark, Hungary, England, and the Dutch-German-Luxembourg region (DGL). Positive trajectories are displayed in blue, negative trajectories in red. Slopes marked with an asterisk were significantly different from zero (p < 0.05)

Fig. 6

Saturation curves indicating the years it takes until the completeness of species inventories (A) is reached, showing all native (left side; with the number of species indicated in the bottom right) and all non-native species (right; with the number of species indicated in the bottom right) detected in Denmark (a), Hungary (b), England (c), and the Dutch-German-Luxembourg region (d) based on the available long-term data

Discussion

Recognition of the importance of using long-term data for a more in-depth understanding of impacts (e.g. costs and magnitude) is not new (Blossey 1999; Gill et al. 2021; Le Hen et al. 2023). However, demands have not been aligned with realistic efforts in using wider datasets to allow more effective resource allocation in combating introductions and mitigating negative outcomes from non-native species. In this study, we investigated the occurrences of non-native and native species within long-term data from four European regions with particularly high spatio-temporal coverage of time series. We found marked differences not only between non-native and native species, but also across regions. We further demonstrated that new non-native species occurrences were rare but continuous over time, detached from temporal patterns in native species occurrences. Continuous monitoring efforts recording occurrences of non-native species over time likely capture only a subset of the non-native species established in the respective region; even so, time series provide a powerful tool for the detection and effective management and control of invasive populations.

Unique species occurrences over time

Empirical Cumulative Distribution Function (ECDF) plots are a useful tool to highlight features of an investigated dataset. Compared with a histogram or density plot, they have the advantage of visualising each observation directly, meaning that there are no binning or smoothing parameters that need to be adjusted (Magurran 2013; Gatti 2014; Langrené & Warin 2021). The ECDF plots indicated that with prolonged sampling effort, non-native species are identified at the same pace as native species (Fig. 4e–h). This suggests that despite the known lag time in identifying and reporting non-native species (McGeoch et al. 2012), established monitoring efforts will be able to identify all non-native species within study sites, possibly including new arrivals. However, it should be noted that detecting non-native species with similar characteristics to native species can be difficult, often prolonging the lag time. Interestingly, ECDF plots also revealed a decrease in first identifications of new native species, whereas first non-native species occurrences remained constant, indicating that more non-native species will possibly appear over time. In practical applications, the mean time to detection can be used to make predictions about the expected rate of species detection over time. It can help to guide conservation efforts aimed at preserving biodiversity by providing estimates of the time remaining before a certain proportion or number of species is lost.

However, ECDF plots have to be interpreted with care if saturation in terms of species identifications has not been reached. Here, non-native species identifications were continuous over the period, likely in the process of saturating and potentially reaching saturation in the future. The comparison with sTWIST further indicated that, despite a considerable disparity in the total number of species recorded, exemplified by Hungary (1 non-native species in sTWIST vs 34 in our long-term data), many more non-native species are likely to be identified in the respective regions. Yet, it should be noted that recent analyses based on sTWIST present established species cumulatively, ignoring that not all species reported eventually established, thereby ignoring the possibility of non-native species disappearing again post establishment, and possibly inadequately representing the true present-day regional presence of non-native species (Seebens et al. 2018, 2021). In our dataset, non-native species occurred over multiple years and, therefore, were considered as established, although some likely failed to establish and to form populations that created new propagules (Briski et al. 2012). This discrepancy originates from the underlying assumption made by the sTWIST database, which records first records and assumes that non-native species remain in the country as established.

Evaluating the usefulness of long-term data

Globally, occurrence records of new non-native species are growing (Seebens et al. 2017). The resulting data are usually compiled in, and subsequently extracted again from, large databases such as GBIF. The uncertainty associated with the occurrence data from these databases can lead to misrepresentations of current distributions of non-native species, and these data do not provide information on species' permanence in the environment. Hence, the ecology of invasions still lacks specific long-term monitoring to detect species compared with other long-term ecological studies (Sukhotin & Berger 2013; Harvey et al. 2020). Local and regional biomonitoring is, therefore, an important temporal record of both the first records of non-native species and the local variation in establishment and dispersal, particularly for underrepresented but highly impactful taxa such as invasive vertebrates (Haubrock et al. 2023; Le Hen et al. 2023). Additionally, gaps in historical monitoring can be detrimental for accurately determining the environmental space where species can establish populations. As anthropogenic disturbances, i.e. landscape and climate changes, cause uncertain impacts on biodiversity, tracking long-term data on species distributions will become more important, as it will provide information on species tolerances and survivorship outside optimal ranges (Hellmann et al. 2008). Thus, long-term information can help to avoid false positives (locations where species are no longer able to sustain populations) and false negatives (by providing robust data on distributions of rare species, or those with detection issues, i.e. “sleeper populations”; Spear et al. 2021; Bracken et al. 2022). In doing so, we will be able to use statistical and computational techniques to more accurately predict potential distributions and map areas at risk of invasions.

Here, the applied model found comparable patterns in the presence of non-native species across regions, including significant increases in England and the DGL. Due to the consistent sampling methodology over time, the obtained trends are more reliable than if based on large occurrence databases, as the origin of the data and the methods used are traceable, as is the temporal variation, allowing analysis of uncertainties (Hughes et al. 2021). The positive patterns found in England and the DGL may be related to a greater number of time series, showing that with more extensive monitoring networks, the detection of new species, both native and non-native, is improved. Interestingly, the number of time series with non-native species increased significantly in the DGL and Denmark, in the latter case artificially increased by P. antipodarum spreading passively (Haubrock et al. 2022). This suggests progressing invasions, as these regions are also those with time series collected over a longer time frame. However, the number of time series with non-native species was decreasing in Hungary. While management interventions could be a reason, it is also possible that local conditions resulted in non-native species failing to establish or that time series were not continuously monitored, underlining that monitoring has to be continuous and consistent (Haase et al. 2018). Hungary was also the country with the smallest number of time series spanning the shortest time interval, but the country with the second highest number of non-native species, likely due to the arrival of Ponto-Caspian species (Soto et al. 2023b).

Increasing introduction rates at regional and local scales can indicate that ecosystems are becoming less resilient to new introductions owing to a decreasing ability to support native species, and are thus at risk of being functionally overwhelmed (Haubrock et al. 2021; Le Hen et al. 2023). On the other hand, the ongoing introduction of non-native species may also be the result of the loss of biodiversity due to the large-scale anthropisation of ecosystems, which creates new opportunities for invasions and for the replacement of ecosystem functions that would otherwise be lost (Lundgren et al. 2020). This can exacerbate biodiversity loss (Bellard et al. 2022), loss of ecosystem functions (Vilà et al. 2010), and ultimately, affect human well-being (Bacher et al. 2018). Conversely, decreasing introduction rates can indicate that efforts to prevent, control, or eradicate non-native and potentially invasive species are successfully hindering their spread, that dispersal is limited due to new (i.e. geographic) barriers, or that a form of ‘saturation’ has been reached, which would enable additional management possibilities to bolster native biodiversity (Watkins et al. 2021). Such management interventions are crucial, yet difficult to implement, as non-native aquatic macroinvertebrate species are particularly hard to detect due to the hidden nature of life underwater. In the case of Denmark, due to the number of time series within a small area and the widespread species P. antipodarum, we found that only a low number of monitoring years is needed to identify all non-natives. In both England and the DGL, which are more species-rich, more monitoring years were needed. Thus, our results show that high monitoring effort, i.e. many monitored sites, decreases the time needed to identify all non-native species in a specific region. High spatial monitoring coverage, evenly distributed regionally but also globally (Hughes et al. 2021), will result in non-native species and their spread being identified more rapidly. While our analysis only focused on riverine ecosystems, we believe that including monitored lakes could, on the one hand, result in more non-native species being identified within each region and, on the other hand, lead to a decrease in the monitoring years required due to more sites being monitored over time.

Conclusion

Long-term data across wide geographic scales play a critical role in ecology and conservation biology, as they help to provide a realistic picture of the status and trends of native and non-native species accumulation across geographical scales. Overall, long-term macroinvertebrate monitoring data could become critical in enabling invasion biologists, authorities and stakeholders to detect invasions early and apply rapid responses. Here, we showed a consistently increasing trend across macroinvertebrate groups, which was supported by the observed increase in new records and in the number of invaded time series. Of particular interest is the number of monitoring sites needed to collect sufficient monitoring years before all non-native species are identified. As more introductions are expected worldwide, continuous and widespread monitoring efforts will be needed to quantify the effectiveness of management interventions, and to propose priority areas for control and mitigation of plausible impacts, with the aim of protecting biodiversity.