LTER-LIFE: a research infrastructure to develop Digital Twins of ecosystems in a changing world
The challenge: to make ecology a predictive science
One of the grand challenges in ecology is to understand and predict how ecosystems are impacted by changes in environmental conditions and external pressures. Given the unprecedented rate of global change in climate, land use, and urbanisation, tackling this challenge is urgent. To do so, we need to transform ecology into a predictive science. Predicting how complex ecosystems will behave under different scenarios requires combining empirical and observational data. This is analogous to global climate models, which also integrate fundamental knowledge and empirical data on climate processes to forecast consequences of different global emission scenarios. To make predictions for entire ecosystems, we need an integrated infrastructure with predictive power at temporal and spatial scales and levels of complexity not previously possible. To truly predict how ecosystems and landscapes will respond to current and future global change, we need to increase the availability of existing environmental long-term datasets, integrate disparate types of ecological data, and create a user-friendly and secure cloud-based digital modelling and simulation platform that can be used to link data to models and scenarios. This can be done by creating Digital Twins of entire ecosystems. The Netherlands, with its long tradition of long-term data collection on plants and animals and of modelling ecological processes, is the perfect place to do this.
The approach: Digital Twins of entire ecosystems
A Digital Twin is “a digital replica of a living or non-living physical entity”. Digital Twins allow advanced data-driven modelling and simulation, using Big Data tools to generate novel insights that cannot be obtained with traditional observation models. Building Digital Twins of ecosystems has only recently become possible as Big Data, artificial intelligence (AI) applications and analytics, advanced computing infrastructure, and the FAIR principles have been developed and made available for ecology, ecosystem restoration, and biodiversity science. A Digital Twin of an ecosystem also provides tools to integrate data on abiotic factors (e.g., nutrient deposition, temperature, droughts), biotic factors (e.g., long-term occurrence data on animals and plants, life history data), and human activities (e.g., tourism, agriculture, fishery). It provides diagnostic (data-driven) and dynamic (process-based) models in one logical place, ensuring their necessary interoperability, scalability, storage, and processing capacity (Figure 1). Thus, “Big Data” collected on ecosystems and model-based data integration can be coupled in an unprecedented way with well-developed process-based models on the relationships between species and their environment. This approach brings together data scientists, informaticians, and ecologists in research communities around ecosystems.
The infrastructure: LTER-LIFE
To produce Digital Twins of entire ecosystems, LTER-LIFE will develop an infrastructure that includes a Virtual Research Environment (VRE) and services (e.g., catalogues and repositories that contain FAIR data, models, and software tools) that together can instantiate custom Virtual Laboratories to build Digital Twins of specific ecosystems (Figure 2).
The Virtual Laboratories provide VRE components, data, models, and other Big Data tools (e.g., AI) and allow scientific users to discover, access, and integrate research assets for specific scientific purposes, and manage the deployment, execution, and provenance of in silico experiments (analysis and simulations) on local or remote platforms. This includes the following assets:
- FAIR datasets of long-term studies of plant and animal populations as well as of environmental data. These data can be hosted (a) by the data owner when they are FAIR and open, (b) on an LTER-LIFE server if the data owner cannot facilitate this, or (c) elsewhere (data or model outputs from authoritative sources such as remote-sensing observations, climate model projections, or historical weather data).
- FAIR models such as (a) process-based models that formalise ecological knowledge on abiotic and biotic relationships, or (b) data-driven models that explore relationships between data using classical statistics or more contemporary artificial intelligence tools. These models themselves are made FAIR, so that they can be discovered, accessed, and reused in new scientific workflows.
- Rules of interaction between assets (i.e., data and models), including solutions for scaling and for enhancing the interoperability of datasets that were collected with different methodologies.
- Tools for scenario studies to explore the effects of in situ local management strategies on ecosystem functioning.
These assets can be coupled in various ways (which is enabled by the LTER-LIFE platform) depending on the objectives of the specific research. Together, they will ultimately compose a Digital Twin of a specific ecosystem, which can then be used for scientific research or scenario studies to address societal issues.
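To illustrate how such assets might be coupled, the sketch below links one dataset to one model through a simple rule of interaction. It is not the LTER-LIFE implementation: the class names, the identifiers, the monthly temperature values, and the toy linear model are all hypothetical, and aggregation to annual means merely stands in for whatever scaling rule a real workflow would need.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """A data asset (hypothetical structure, not an LTER-LIFE schema)."""
    identifier: str                         # placeholder for a persistent identifier
    variable: str                           # what was measured
    records: List[Tuple[int, int, float]]   # (year, month, value)


@dataclass(frozen=True)
class Model:
    """A model asset that maps one input value to one output value."""
    identifier: str
    input_variable: str
    run: Callable[[float], float]


def to_annual_means(dataset: Dataset) -> Dict[int, float]:
    """Scaling rule: aggregate monthly records to one mean value per year."""
    by_year: Dict[int, List[float]] = {}
    for year, _month, value in dataset.records:
        by_year.setdefault(year, []).append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def couple(dataset: Dataset, model: Model) -> Dict[int, float]:
    """Run the model on the rescaled data, refusing mismatched variables."""
    if dataset.variable != model.input_variable:
        raise ValueError(
            f"{model.identifier} expects {model.input_variable!r}, "
            f"got {dataset.variable!r}"
        )
    annual = to_annual_means(dataset)
    return {year: model.run(value) for year, value in annual.items()}


# Made-up example values, for illustration only.
temperature = Dataset(
    identifier="example:dataset/temperature",
    variable="air_temperature",
    records=[(2020, 4, 8.0), (2020, 5, 10.0), (2020, 6, 12.0),
             (2021, 4, 9.0), (2021, 5, 11.0), (2021, 6, 13.0)],
)
toy_model = Model(
    identifier="example:model/toy-linear",
    input_variable="air_temperature",
    run=lambda value: 2.0 * value,   # toy relationship, not an ecological model
)

print(couple(temperature, toy_model))
```

Keeping the scaling rule in its own function separates it from both the data and the model, so the same dataset could be offered to a model that needs a different temporal resolution.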
With LTER-LIFE, we will offer a single point of entry for scientists studying entire ecosystems or components thereof (Figure 2). The LTER-LIFE portal gives users tailored access to assets and facilities. LTER-LIFE develops:
- Databases and process models that are findable, accessible, interoperable, and reusable (FAIR), achieved by applying (meta)data standards, controlled vocabularies, ontologies, and persistent identifiers.
- Workflows that allow coupling of these data and models at the desired temporal and spatial scale, and that provide Big Data methodology to analyse these data (AI and machine learning approaches such as deep learning and active learning which enable new theory discovery and novel insights from high-dimensional sparse data spaces) as well as tools for uncertainty analysis, scenario studies and forecasting.
- Virtual Labs that build upon infrastructure services and components of a VRE, allowing interdisciplinary communities of scientists to collaborate via a digital platform (including Big Data storage, cloud-based modelling, and dynamic simulations).
- Basic and advanced training in the use of the infrastructure.
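As a rough illustration of the FAIR requirements named in the first item of this list, the sketch below checks a metadata record for a persistent identifier, a licence, and keywords drawn from a controlled vocabulary. The field names, the vocabulary terms, and the example records are assumptions made for this example only; a real profile would follow a community metadata standard.

```python
from typing import Any, Dict, List

# Hypothetical minimal field set, not an LTER-LIFE or community standard.
REQUIRED_FIELDS = ("identifier", "title", "licence", "keywords")

# Stand-in for a controlled vocabulary (made-up terms).
VOCABULARY = {"air_temperature", "nitrogen_deposition", "bird_abundance"}


def fair_issues(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in one metadata record."""
    issues = [f"missing {field}" for field in REQUIRED_FIELDS
              if not record.get(field)]
    for keyword in record.get("keywords") or []:
        if keyword not in VOCABULARY:
            issues.append(f"keyword not in vocabulary: {keyword}")
    return issues


# Made-up records, for illustration only.
complete = {
    "identifier": "example:dataset/1",
    "title": "Example bird counts",
    "licence": "CC-BY-4.0",
    "keywords": ["bird_abundance"],
}
incomplete = {
    "title": "Old field notebook",
    "keywords": ["birds"],
}

print(fair_issues(complete))
print(fair_issues(incomplete))
```

A check of this kind only covers the machine-readable part of FAIRification; curating heterogeneous historic data still requires domain knowledge.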
LTER-LIFE will be an open-source infrastructure, and users can either use the developed assets within their own local environment or run them in the cloud, facilitated by LTER-LIFE through our partner SURF. Setting up the LSRI LTER-LIFE as an open-source infrastructure aligns well with the current FAIR and Open Science developments. New data or models can be added as assets, which can then be used by other users. Workflows and data used in projects will receive a digital identifier and will be stored by LTER-LIFE’s partners DANS-KNAW and SURF.
Building Digital Twins of entire ecosystems requires the integration of information and computer science; ecological knowledge on species interactions and processes; and long-term data on plants, animals, and the environment. It is impossible to create a Digital Twin for a generic ecosystem because ecosystems are too diverse to capture in a single model. We will thus need to develop our Digital Twins ecosystem by ecosystem, building both generic and ecosystem-specific assets. Therefore, we have initially selected two iconic sites, the Wadden Sea and the Veluwe, representing aquatic and terrestrial ecosystems that are characteristic of the Netherlands in terms of their biogeography and the anthropogenic stressors they face, which are typical of human-dominated landscapes. Long-term ecological and environmental data have been collected extensively at both sites. In the second half of the project, we will include additional Dutch ecosystems. Using the generic assets that will be developed during the first half of the project (in addition to ecosystem-specific assets), we will be able to extend the infrastructure to other ecosystems relatively easily. Our ecosystem-oriented approach is essential because predictions of ecosystem consequences need to be tailored to specific ecosystems: extreme regional effects will influence local ecosystems more than global average changes. However, we emphasise that more ecosystems will be taken on board and that we will be able to make generalisations across ecosystems because we will focus on generalisable use cases.
Scientific impact of LTER-LIFE
LTER-LIFE will greatly advance our understanding of, and ability to forecast, how ecosystems are affected by global change, including climate change, changes in land and water use, and nutrient enrichment. LTER-LIFE will provide the data, modelling, networking, and cross-disciplinary training infrastructure to address major outstanding ecological research questions such as: a) How do different sources of global change interact? b) What is the influence of climatic extremes? c) How do trophic effects cascade through ecosystems? d) How can global change factors be scaled to local impacts? This research is facilitated by the LTER-LIFE infrastructure, as it will make long-term ecological and environmental data and process-based and data-driven models FAIR, and it will provide Virtual Labs to construct Digital Twins of entire ecosystems. This will create a dynamic, virtual representation of a specific ecosystem, something that does not currently exist.
LTER-LIFE facilitates curation of historic data, urgently needed to understand the future of ecosystems
For decades, ample data have been collected on Dutch ecosystems by professional organisations and citizen science projects, for example through societal partners such as SOVON and Dutch Butterfly Conservation. At present, these data are only partly findable and accessible, e.g., through our data cooperation partners GBIF and NDFF (the Dutch database of flora and fauna data). Data that are not yet in these databases need to be made FAIR. This is challenging because these data are very heterogeneous and scattered. They are “long tail of science data”: relatively small sets of data collected by a large number of research groups that do not use standardised formats. LTER-LIFE will boost the FAIRification of these data, ensuring their proper curation. Curating long-term ecosystem data and making them available for research is essential because these data represent our only window into the past. They are crucial for understanding how the large changes in ecosystem composition and functioning over the past decades relate to changes in the abiotic environment, and we will never be able to collect these data again.
Societal impact of LTER-LIFE
Already during its development, LTER-LIFE will be used to address major societal challenges. Digital Twins allow in silico modelling of mitigation and adaptation measures, such as interventions to reduce nitrogen deposition. This can be used to forecast the effects of such measures on biodiversity and ecosystem functioning, facilitating evidence-based decision making. This will enable end-users to balance, for example, various forms of exploitation of the Wadden Sea against each other and against the requirements for preserving an area of Outstanding Universal Value. LTER-LIFE will thus support decision making in the management and exploitation of protected ecosystems and contribute to innovation in related fields of research through linking ecological science with, for example, social sciences and economics.
LTER-LIFE is built on three pillars and is well embedded in the Dutch Green Life Sciences
LTER-LIFE is built on and will capitalise on the synergy between three Large-Scale Research Infrastructures (LSRIs) that are on the Dutch Roadmap for LSRI. Long-Term Ecosystem Research Netherlands (LTER-NL) carries out and connects time series on long-term ecosystem monitoring within so-called LTER sites and makes these data available for research. It is part of LTER-Europe, which is on the European ESFRI roadmap for large infrastructure. National Environmental Monitoring Network (NemNet) runs several national reference schemes of abiotic monitoring of soils, water and air, models and sensors, and various data products and maps. LifeWatch develops Virtual Laboratories to answer fundamental questions on the functioning and resilience of ecosystems. It is part of the European LifeWatch ERIC. These three LSRIs, which have joined forces in LTER-LIFE, are highly complementary to several existing LSRIs within the Dutch Green Life Sciences. Together, they will greatly advance the understanding of the impressive diversity of life on Earth. LTER-LIFE’s partner LSRI ARISE combines DNA methods, image analysis, and AI to identify species and map their interactions. The XL-EFES LSRI creates large-scale experimental outdoor facilities to study ecological interactions. Hence, these two LSRIs complement LTER-LIFE very well. Other LTER-LIFE partner LSRIs are the Ruisdael Observatory and C-MetNet, which collect atmospheric and climatic data that will be used as assets in LTER-LIFE.
LTER-LIFE constitutes a large and diverse consortium
The consortium is depicted in Figure 3. The eight applicants of LTER-LIFE are affiliated with five institutions: Netherlands Institute of Ecology (NIOO-KNAW), University of Amsterdam (UvA) - Faculty of Science, National Institute for Public Health and the Environment (RIVM), Royal Netherlands Institute for Sea Research (NIOZ), and Wageningen University & Research (WUR) - Wageningen Data Competence Center. They have highly complementary expertise, from field ecology, environmental monitoring, and ecological modelling to computer and data science. The consortium is strengthened by four subcontracting partners (DANS-KNAW, eScience Center, SURF, Wageningen Research); scientific partners from all Dutch universities with an ecology research department; and societal partners ranging from ministries to local nature conservation organisations. The PE&RC graduate school is our educational partner. International partners come in via LTER-Europe and LifeWatch ERIC.
General approach in developing LTER-LIFE
LTER-LIFE will be developed using an agile approach, so that the infrastructure can be tested with relatively simple research questions from year 3 onwards. We will follow a use case-based study approach, initially focusing on scientific use cases from the Veluwe and Wadden Sea ecosystems. Common to these use cases is that they require integration across different scientific disciplines, heterogeneous data sources, and modelling techniques (including AI and machine learning). For each use case, we will FAIRify the required data and models and define how these interact with each other. After these assets have been developed, they will be tested and released for anybody to use within the LTER-LIFE infrastructure. The experiences from developing these initial use cases will be used to generalise, prioritise, and extend the services and VRE components offered by LTER-LIFE. Novel use cases will exploit the assets that were already developed. Over time, fewer and fewer additional assets will need to be developed to run a use case. Funding to run such use cases will need to come from sources other than LTER-LIFE. Training of scientific and societal users as well as PhD students is an integral part of the approach.
Use case development cycle
When a use case is taken into development, it is first broken down into so-called user stories that are needed to realise infrastructure components and specific scenarios. The development will be incremental and follow agile practice; the components can be developed in parallel (e.g., data access next to model use) or sequentially (e.g., tools for data and model integration after data access). When a user story has been prototyped and tested, the mature components will be taken into production and made available to other use cases as well. When (most) user stories have reached a stage of initial viability, the Digital Twin to address the main scientific question in the use case can be assembled and tested, with the results being used to refine the user stories if necessary. The Digital Twin will also be available for scientific and societal exploitation of other questions in the domain of the initial use case. This prolonged and diversified use is supported through capacity building activities related to both the infrastructure in general and the use case-specific capacity building modules, used in the training part of the infrastructure. This diversified use in turn inspires the formulation of novel use cases (e.g., different ecosystems with similar questions, novel processes within the existing use case, combination of use cases), closing the development cycle (Figure 3).
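The assembly condition in this cycle can be sketched as a simple status check. The stage names, the choice to treat "tested" as initial viability, and the 75% threshold standing in for "(most) user stories" are all assumptions made for this illustration, as are the example user stories.

```python
from enum import Enum
from typing import Dict


class Stage(Enum):
    """Hypothetical maturity stages of a user story."""
    PLANNED = 1
    PROTOTYPED = 2
    TESTED = 3          # treated here as "initial viability" (assumption)
    IN_PRODUCTION = 4


def ready_to_assemble(stories: Dict[str, Stage], threshold: float = 0.75) -> bool:
    """True when at least `threshold` of the user stories are viable."""
    if not stories:
        return False
    viable = sum(1 for stage in stories.values()
                 if stage.value >= Stage.TESTED.value)
    return viable / len(stories) >= threshold


# Made-up user stories for one use case.
stories = {
    "data access": Stage.IN_PRODUCTION,
    "model use": Stage.TESTED,
    "data and model integration": Stage.PROTOTYPED,
    "scenario tools": Stage.TESTED,
}

print(ready_to_assemble(stories))
```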
The development of LTER-LIFE will be structured in nine work packages (WPs, Figure 4). The core of the project is the close synergy of domain scientists (WP2 and WP7) and data scientists and modellers (WP3-6). They will develop the LTER-LIFE infrastructure through use cases (section 2.1.2). These use cases will be prioritised in WP1. Use case-specific Digital Twins will be prototyped using research assets and the VRE provided by the LTER-LIFE infrastructure (Figure 2). After completing the Digital Twin of a specific use case, the users (scientists within and outside LTER-LIFE) will be trained in WP7 and the Digital Twins will be further used in externally funded research (WP8). Project management is the focus of WP9. All WPs are led by one or two applicants. The Netherlands eScience Center, SURF, DANS-KNAW, and Wageningen Research will take up crucial parts of the WPs on software development, data storage and access, and training.