Skip to contents
library(diseasy)
#> Loading required package: diseasystore
#> 
#> Attaching package: 'diseasy'
#> The following object is masked from 'package:diseasystore':
#> 
#>     diseasyoption

Introduction

One of the model families in diseasy is the family of Ordinary Differential Equation (ODE) models. This family of models is presently represented only by the ?DiseasyModelOdeSeir, a model template that implements SEIR (Susceptible, Exposed, Infected, Recovered) ODE models according to user specification. This template allows for any number of EE, II or RR states in the model and interfaces with the ?DiseasySeason, ?DiseasyActivity, ?DiseasyVariant and ?DiseasyImmunity modules for scenario definitions.

The workflow for creating a specific ?DiseasyModelOdeSeir model would roughly follow these steps:

  1. Defining the model scenario. The first step to using most models in diseasy is to define the scenario. That is, to configure the functional modules that the model templates takes as argument (e.g. ?DiseasyActivity). These steps provide the model template with information about the population, the activity, variants to include etc.
  2. Choosing the model configuration. The model template may have some configurability that you control. In the case of ?DiseasyModelOdeSeir for example, you must choose the number of consecutive EE, II and RR states in the model.
  3. Mapping to observables. The ODE family of models is in some ways bespoke in the sense that the user needs to provide a mapping from and to real-world observables.

We distinguish between two mappings: First, observed real-world data, such as positive tests or hospitalisations, is used to estimate the “true” incidence of infections. Secondly, the modelled incidence is then mapped back to the observable quantities of interest.

The model must have an estimate of the disease incidence to be initialised. If no reverse mappings are given, the model can only provide predictions for the true incidence and true number of infected.

These steps are outlined in individual sections below where we work through defining a model in a specific example. Note that this example is contrived in the sense that the data we use in this example have been generated with this exact model template and we therefore can match the observations near-perfectly.

Example data

The data we will use this example is provided by ?DiseasystoreSeirExample which represents data in a Danish context.

We first create an observables module to interface with this data:

# Configure a observables module with the example data and database
observables <- DiseasyObservables$new(
  diseasystore = DiseasystoreSeirExample,
  conn = DBI::dbConnect(RSQLite::SQLite())
)

The data is stratified by age and covers a single infection wave. It contains three observables of interest for our example:

  • n_infected: The true number of infected in the data.
  • n_positive: The observed subset of infected (65% of true infected are observed but fewer are observed during the weekend).
  • n_admissions: The number of new hospital admissions due to infections (risk of hospitalisation dependent on age group).

When running the models, we should also define at what point in time they should provide predictions for. This is done by setting the last_queryable_date in the observables module:

last_queryable_date <- observables$ds$min_start_date + 45

Lets first look at the data we have available:

observables$get_observation(
  observable = "n_infected",
  stratification = rlang::quos(age_group),
  start_date = observables$ds$min_start_date,
  end_date = observables$ds$max_end_date
) |>
  ggplot2::ggplot(ggplot2::aes(x = date, y = n_infected)) +
  ggplot2::geom_vline(
    xintercept = last_queryable_date,
    color = "black",
    linetype = "dashed"
  ) +
  ggplot2::geom_point(color = "deepskyblue3") +
  ggplot2::labs(title = "True number of infected") +
  ggplot2::facet_grid(~ age_group)

Data to model: True number of infected.

observables$get_observation(
  observable = "n_positive",
  stratification = rlang::quos(age_group),
  start_date = observables$ds$min_start_date,
  end_date = observables$ds$max_end_date
) |>
  ggplot2::ggplot(ggplot2::aes(x = date, y = n_positive)) +
  ggplot2::geom_vline(
    xintercept = last_queryable_date,
    color = "black",
    linetype = "dashed"
  ) +
  ggplot2::geom_point(color = "seagreen") +
  ggplot2::labs(title = "Observed number of positive cases") +
  ggplot2::facet_grid(~ age_group)

Data to model: Number of positive cases.

observables$get_observation(
  observable = "n_admission",
  stratification = rlang::quos(age_group),
  start_date = observables$ds$min_start_date,
  end_date = observables$ds$max_end_date
) |>
  ggplot2::ggplot(ggplot2::aes(x = date, y = n_admission)) +
  ggplot2::geom_vline(
    xintercept = last_queryable_date,
    color = "black",
    linetype = "dashed"
  ) +
  ggplot2::geom_point(color = "violetred3") +
  ggplot2::labs(title = "Number of admissions") +
  ggplot2::facet_grid(~ age_group)

Data to model: Number of hospital admissions.

Defining the scenario

The first step to creating a ?DiseasyModelOdeSeir model is to define the model scenario. In this case, as we stated above, we can exactly match the model scenario to the data since the data was created using the same class of models.

First we define the population to include in the model via the ?DiseasyPopulation module, and then we define the activity scenario via the ?DiseasyActivity module, before finally defining the seasonality scenario via the ?DiseasySeason module.

Model population

In this example scenario, we use a simple three age-group stratification of the population in the model.

# Split the model population into three age groups
population <- DiseasyPopulation$new()
population$stratify_age(age_cuts_lower = c(0, 30, 60))

Activity scenario

This scenario describes the changes in activity in the society (e.g. restrictions). To see more on how to configure this module, see vignette("diseasy-activity").

# Configure an activity module using Danish population and contact information.
activity <- DiseasyActivity$new()
activity$set_contact_basis(contact_basis = contact_basis_nordic$DK)
activity$set_activity_units(dk_activity_units)

# The level of activity is fixed to the "baseline" level throughout the simulation.
activity$change_activity(date = as.Date("2020-01-01"), opening = "baseline")

Seasonality scenario

This particular scenario also includes simple seasonality and waning immunity:

# Waning immunity scenario
immunity <- DiseasyImmunity$new()
immunity$set_exponential_waning(time_scale = 180)
#> $infection
#> function (t) 
#> exp(-t/time_scale)
#> <environment: 0x5648db61d5b8>
#> attr(,"name")
#> [1] "exponential_waning"
#> attr(,"dots")
#> attr(,"dots")$time_scale
#> [1] 180

# Season scenario
season <- DiseasySeason$new()
season$set_reference_date(as.Date("2020-01-20"))
season$use_cosine_season()

Choosing the model configuration

The ?DiseasyModelOdeSeir model template has a number of hyperparameters and parameters that should be configured for the model to run.

The hyperparameters are:

  • compartment_structure: A named vector specifying the number of consecutive EE, II and RR states in the model.

The parameters are:

  • disease_progression_rates: A named vector specifying the progression rates from EE to II and from II to RR (i.e. waiting times). The rates determine the expected time spent in each state).
  • overall_infection_risk: A population-level scaling factor for transmission, controlling the overall risk of infection in the population.
# Hyper parameters
compartment_structure <- c("E" = 2L, "I" = 1L, "R" = 2L)

# Parameters
disease_progression_rates <- c("E" = 1 / 2.1, "I" = 1 / 4.5)
overall_infection_risk <- 0.025

# When to predict?
observables$set_last_queryable_date(last_queryable_date)

Mapping observables to model and back

Being a SEIR model, the ?DiseasyModelOdeSeir model template runs the dynamics of the infections in the population. These dynamics are inferred from the incidence of the infections in the population. These dynamics are not usually observed directly. Instead, they must be inferred from one or more observed data sources, such as positive tests or hospital admissions.

Mathematically, we map from the set of real-world observables O(t)\vec{O}(t) to the incidence Inc(t)Inc(t) via the map ff: O(t)fInc(t)\vec{O}(t) \overset{f}{\rightarrow} Inc(t) The model then produces a number of observations under the given configuration.

By default, the model will predict the evolution of the incidence1, Inc̃(t)\tilde{Inc}(t), but other outputs can be configured (see further below).

To get predictions for other real-world observables, we must map from the incidence (or other output) to the observable of interest.

From the model incidence Inc̃(t)\tilde{Inc}(t), we then map to the observable estimates Õi(t)\tilde{O}_i(t) via the maps gig_i: Inc̃(t)g1Õ1(t)\tilde{Inc}(t) \overset{g_1}{\rightarrow} \tilde{O}_1(t)Inc̃(t)g2Õ2(t)\tilde{Inc}(t) \overset{g_2}{\rightarrow} \tilde{O}_2(t)\vdotsInc̃(t)gnÕn(t)\tilde{Inc}(t) \overset{g_n}{\rightarrow} \tilde{O}_n(t)

Mapping from observables to incidence

For the model to be initialised, it needs an estimate of the incidence in the population. This incidence is then used to infer the dynamics of the disease in the population (See the article SEIR-initialisation in the online documentation for more information).

If an estimate of the incidence is not defined in the diseasystore, or if you wish to provide your own estimate from the observable, we can define a “synthetic observable” in the DiseasyObservables module for the model to use.

In this example, the incidence is not defined in the diseasystore and we therefore define a synthetic observable for the incidence which maps from the number of positive cases to the incidence. In many cases, testing data may not be reliable, and you may want to infer the incidence from other sources such as waste water data or admissions data.

# Map observables to incidence
observables$define_synthetic_observable(
  name = "incidence",
  mapping = \(n_positive, n_population) n_positive / (n_population * 0.65)
)

Mapping from model incidence to observables

By default, the model can only return predictions for the “true” number of infected (n_infected) and the incidence (incidence) since these are the only observables “known” to the model.

Any additional observables must be mapped from the incidence provided by the model, or other output must be defined to map from (see further below).

To convert from model outputs to real-world observables, we define a “map” and a “reduce” function:

The map function is passed to a dplyr::group_map() call on a data.frame grouped by the stratification. The function should be a two-argument function where the first argument contains a data.frame with columns n_infected and population (i.e. the size of the group), and where the second argument contains the group (as a data.frame). Note that the output of this function should contain the groups provided in the second argument.

The reduce function is passed to a dplyr::summarise() call on the output of the map. By default, a simple sum is used which works for all counting data. If the observable is not a count, the user must provide a custom reduce function. For example, the incidence is a rate and to summarise a weighted sum must be made.

In this example, we define maps from the incidence to the number of positive cases and the number of hospital admissions. These are both counting observables and the default reduce function can be used.

# Maps between the internal model incidence and observables
map_to_n_positive <- list(
  "map" = \(.x, .y) {
    dplyr::transmute(
      .x,
      .data$date,
      "n_positive" = 0.65 * .x$n_infected
    )
  }
)

map_to_n_admission <- list(
  "map" = \(.x, .y) {
    # Risk per age group
    risk_of_admission <- c("00-29" = 0.001, "30-59" = 0.01, "60+" = 0.1)

    # Delay admissions by a fixed distribution
    delay_distribution <- c(0, 0, 0.2, 0.3, 0.3, 0.1, 0.1) # Must sum = 1

    data.frame(
      "date" = as.vector(
        outer(
          X = .x$date,
          Y = seq_along(delay_distribution) - 1,
          FUN = "+"
        )
      ),
      "n_admission" = as.vector(
        outer(
          X = .x$n_infected * risk_of_admission[.y$age_group],
          Y = delay_distribution,
          FUN = "*"
        )
      )
    )
  }
)

# These maps are provided via a list to the model with the names of
# the elements denoting the observable being mapped to.
model_output_to_observable <- list(
  "n_positive" = map_to_n_positive,
  "n_admission" = map_to_n_admission
)

Custom model output

The diseasy ODE model templates uses a generic system for generating observables from the state_vector, ψ¯\overline{\psi}.

A observable can be configured to either derive from the full state vector ψ¯\overline{\psi} or the infection matrix ff (see the article SEIR-implementation in the online documentation).

For example, if the user wants to configure an observable for the prevalence of the disease rather than the incidence, a observable summing across the II states in the model can be used.

Alternatively, if the user wants to configure observables related to infections, such as hospitalisation, a observables summing across the infection matrix with age-specific risks of hospitalisation can be used.

In either case, the observable is configured by defining a vector of weights. For each of such vectors, a “surveillance” state is added to the state_vector which integrates the inner-product ω¯oψ¯\overline{\omega}_o \cdot \overline{\psi} or ω¯orowsf¯¯\overline{\omega}_o \cdot \sum_{rows}\overline{\overline{f}} depending on which element the observable is set to track.

In the former case, the surveillance states are derived from the state_vector whereas in the latter the observable is derived from the new infections occurring in the model.

In the case of the “n_infected” observable defined by default in the model, the weights are essentially a matrix with weights only for each I1I_1 state2 and the observable tracks the state_vector.

For example, in SEIR case (N=L=M=1N = L = M = 1) with two age-groups, the weights take the following form:

Ω¯¯ninfected=[0ri0000000000ri000]E1I1R1E2I2R2S1S2 \overline{\overline{\Omega}}_{n_{infected}} = \underbrace{\begin{bmatrix} 0 & r_i & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & r_i & 0 & 0 & 0 \end{bmatrix}}_{\begin{matrix} E_1 & I_1 & R_1 & E_2 & I_2 & R_2 & S_1 & S_2 \end{matrix}} where rir_iis the rate of transition out of II (i.e. the reciprocal of the time spent in II state).

As stated above, the model always provides a prediction for the “n_infected” observable but there are cases where other observables are automatically configured by diseasy.

For example, when the model includes a ?DiseasyImmunity module, observables are configured to track outcomes related to immunity.

Here, waning target named “hospitalisation” or “death” in the immunity scenario are assigned their own observables which are derived from the infection matrix (Note that there is no difference in waning between variants included in the model).

In the SEIR3 case (N=L=1,M=3N = L = 1, M = 3) with two age-groups, the weights take the following form: Ω¯¯immunity=[γ1γ2γ300010000γ1γ2γ301]R11R12R13R21R22R23S1S2 \overline{\overline{\Omega}}_{immunity} = \underbrace{\begin{bmatrix} \gamma_1 & \gamma_2 & \gamma_3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & \gamma_1 & \gamma_2 & \gamma_3 & 0 & 1 \end{bmatrix}}_{\begin{matrix} \;\;R^1_1\; & R^2_1\; & R^3_1\;\; & R^1_2\; & R^2_2\; & R^3_2\;\; & S_1\; & S_2 \end{matrix}}

In the SEIR2 case (N=L=1,M=2N = L = 1, M = 2) with two age-groups and two variants, the weights take the following form: Ω¯¯immunity=[γ1γ200γ1γ2001000γ1γ200γ1γ201]R1,11R1,12R2,11R2,12R1,21R1,22R2,21R2,22S1S2 \overline{\overline{\Omega}}_{immunity} = \underbrace{\begin{bmatrix} \gamma_1 & \gamma_2 & 0 & 0 & \gamma_1 & \gamma_2 & 0 & 0 & 1 & 0 \\ 0 & 0 & \gamma_1 & \gamma_2 & 0 & 0 & \gamma_1 & \gamma_2 & 0 & 1 \end{bmatrix}}_{\begin{matrix} R^1_{1,1} & R^2_{1,1} & R^1_{2,1} & R^2_{2,1} & R^1_{1,2} & R^2_{1,2} & R^1_{2,2} & R^2_{2,2} & S_1 & S_2 \end{matrix}}

Putting it all together

With all the configuration done above, we can now create our model instance.

model <- DiseasyModelOdeSeir$new(
  # Scenario configuration
  population = population,
  activity = activity,
  immunity = immunity,
  season = season,
  observables = observables,

  parameters = list(
    # Parameter and hyper-parameter configuration
    "compartment_structure" = compartment_structure,
    "overall_infection_risk" = overall_infection_risk,
    "disease_progression_rates" = disease_progression_rates,

    # Mapping to observables
    "model_output_to_observable" = model_output_to_observable
  )
)

Predicting the incidence

From this point, we can easily get predictions for the individual observables.

model$get_results(
  observable = "incidence",
  prediction_length = 5
)
#> # A tibble: 5 × 4
#>   date       incidence realisation_id weight
#>   <date>         <dbl>          <dbl>  <dbl>
#> 1 2020-02-18   0.00199              1      1
#> 2 2020-02-19   0.00212              1      1
#> 3 2020-02-20   0.00227              1      1
#> 4 2020-02-21   0.00243              1      1
#> 5 2020-02-22   0.00261              1      1

… and we can plot the predictions along with observations for the data

plot(
  model,
  observable = "incidence",
  prediction_length = 60
)

Model prediction for the number of incidence - stratified by age group.

The results are easily stratified by the stratifications in the data

model$get_results(
  observable = "incidence",
  prediction_length = 3,
  stratification = rlang::quos(age_group)
)
#> # A tibble: 9 × 5
#>   date       age_group incidence realisation_id weight
#>   <date>     <chr>         <dbl>          <dbl>  <dbl>
#> 1 2020-02-18 00-29       0.00259              1      1
#> 2 2020-02-19 00-29       0.00275              1      1
#> 3 2020-02-20 00-29       0.00294              1      1
#> 4 2020-02-18 30-59       0.00207              1      1
#> 5 2020-02-19 30-59       0.00220              1      1
#> 6 2020-02-20 30-59       0.00236              1      1
#> 7 2020-02-18 60+         0.00106              1      1
#> 8 2020-02-19 60+         0.00113              1      1
#> 9 2020-02-20 60+         0.00121              1      1

… and we can see the plots for each strata.

plot(
  model,
  observable = "incidence",
  prediction_length = 60,
  stratification = rlang::quos(age_group)
)

Model prediction for the number of incidence - stratified by age group.Model prediction for the number of incidence - stratified by age group.Model prediction for the number of incidence - stratified by age group.

This can be done for all the observables we have mapped in the model.

model$get_results(
  observable = "n_infected",
  prediction_length = 5
)
#> # A tibble: 5 × 4
#>   date       n_infected realisation_id weight
#>   <date>          <dbl>          <dbl>  <dbl>
#> 1 2020-02-18     11642.              1      1
#> 2 2020-02-19     12360.              1      1
#> 3 2020-02-20     13232.              1      1
#> 4 2020-02-21     14202.              1      1
#> 5 2020-02-22     15243.              1      1
plot(
  model,
  observable = "n_infected",
  prediction_length = 60
)

Model prediction for the number of infected - stratified by age group.

model$get_results(
  observable = "n_infected",
  prediction_length = 3,
  stratification = rlang::quos(age_group)
)
#> # A tibble: 9 × 5
#>   date       age_group n_infected realisation_id weight
#>   <date>     <chr>          <dbl>          <dbl>  <dbl>
#> 1 2020-02-18 00-29          5350.              1      1
#> 2 2020-02-19 00-29          5673.              1      1
#> 3 2020-02-20 00-29          6076.              1      1
#> 4 2020-02-18 30-59          4682.              1      1
#> 5 2020-02-19 30-59          4974.              1      1
#> 6 2020-02-20 30-59          5321.              1      1
#> 7 2020-02-18 60+            1609.              1      1
#> 8 2020-02-19 60+            1713.              1      1
#> 9 2020-02-20 60+            1835.              1      1
plot(
  model,
  observable = "n_infected",
  prediction_length = 60,
  stratification = rlang::quos(age_group)
)

Model prediction for the number of infected - stratified by age group.Model prediction for the number of infected - stratified by age group.Model prediction for the number of infected - stratified by age group.

model$get_results(
  observable = "n_positive",
  prediction_length = 5
)
#> # A tibble: 5 × 4
#>   date       n_positive realisation_id weight
#>   <date>          <dbl>          <dbl>  <dbl>
#> 1 2020-02-18      7567.              1      1
#> 2 2020-02-19      8034.              1      1
#> 3 2020-02-20      8601.              1      1
#> 4 2020-02-21      9232.              1      1
#> 5 2020-02-22      9908.              1      1
plot(
  model,
  observable = "n_positive",
  prediction_length = 60
)

Model prediction for the number of positive.

model$get_results(
  observable = "n_positive",
  prediction_length = 3,
  stratification = rlang::quos(age_group)
)
#> # A tibble: 9 × 5
#>   date       age_group n_positive realisation_id weight
#>   <date>     <chr>          <dbl>          <dbl>  <dbl>
#> 1 2020-02-18 00-29          3478.              1      1
#> 2 2020-02-19 00-29          3687.              1      1
#> 3 2020-02-20 00-29          3949.              1      1
#> 4 2020-02-18 30-59          3043.              1      1
#> 5 2020-02-19 30-59          3233.              1      1
#> 6 2020-02-20 30-59          3459.              1      1
#> 7 2020-02-18 60+            1046.              1      1
#> 8 2020-02-19 60+            1114.              1      1
#> 9 2020-02-20 60+            1193.              1      1
plot(
  model,
  observable = "n_positive",
  prediction_length = 60,
  stratification = rlang::quos(age_group)
)

Model prediction for the number of positive - stratified by age group.Model prediction for the number of positive - stratified by age group.Model prediction for the number of positive - stratified by age group.

model$get_results(
  observable = "n_admission",
  prediction_length = 5
)
#> # A tibble: 5 × 4
#>    date n_admission realisation_id weight
#>   <dbl>       <dbl>          <dbl>  <dbl>
#> 1 18310        167.              1      1
#> 2 18311        169.              1      1
#> 3 18312        184.              1      1
#> 4 18313        200.              1      1
#> 5 18314        217.              1      1
plot(
  model,
  observable = "n_admission",
  prediction_length = 60
)

Model prediction for the number of hospital admissions.

model$get_results(
  observable = "n_admission",
  prediction_length = 3,
  stratification = rlang::quos(age_group)
)
#> # A tibble: 9 × 5
#>    date age_group n_admission realisation_id weight
#>   <dbl> <chr>           <dbl>          <dbl>  <dbl>
#> 1 18310 00-29            4.38              1      1
#> 2 18311 00-29            4.38              1      1
#> 3 18312 00-29            4.71              1      1
#> 4 18310 30-59           37.2               1      1
#> 5 18311 30-59           37.4               1      1
#> 6 18312 30-59           40.9               1      1
#> 7 18310 60+            125.                1      1
#> 8 18311 60+            127.                1      1
#> 9 18312 60+            139.                1      1
plot(
  model,
  observable = "n_admission",
  prediction_length = 60,
  stratification = rlang::quos(age_group)
)

Model prediction for the number of hospital admissions - stratified by age group.Model prediction for the number of hospital admissions - stratified by age group.Model prediction for the number of hospital admissions - stratified by age group.