Automated Detection of Seasonal Epidemic Onset in R
Source:vignettes/aedseo_introduction.Rmd
aedseo_introduction.Rmd
Introduction
The methodology used to detect the onset of seasonal respiratory epidemics can be divided into two essential criteria:
- The local estimate of the exponential growth rate, \(r\), is significantly greater than zero.
- The sum of cases (SoC) over the past \(k\) units of time exceeds a disease-specific threshold.
Here, \(k\) denotes the window size employed to obtain the local estimate of the exponential growth rate and the SoC. When both of these criteria are met, an alarm is triggered and the onset of the seasonal epidemic is detected.
Exponential growth rate
The exponential growth rate, denoted as \(r\), represents the per capita change in the number of new cases per unit of time. Given that incidence data are integer-valued, the proposed method relies on generalized linear models (GLM). For count data, the Poisson distribution is a suitable choice as a model. Hence, the count observations denoted as \(Y\) are assumed to follow a Poisson distribution
\[\begin{equation} Y \sim \mathrm{Pois}(\lambda) \end{equation}\]
Here, the link function, \(\log()\), connects the linear predictor to the expected value of the data point, expressed as \(\log(\lambda)=\mu\). Given a single continuous covariate \(t\), the mean \(\mu\) can be expressed as
\[\begin{equation} \mu = \alpha + r t \end{equation}\]
This is equivalent to a multiplicative model for \(\lambda\), i.e.
\[\begin{equation} \lambda = \exp(\alpha + r t) = \exp(\alpha) \exp(r t) \end{equation}\]
Intuitively, negative values of \(r\) result in a decline in the number of observed cases, while \(r=0\) represents stability, and positive values of \(r\) indicate an increase.
It is important to note that the Poisson distribution assumes that the mean and variance are equal. In reality, real data often deviate from this assumption, with the variance (\(v\)) being significantly larger than the mean. This biological phenomenon, known as overdispersion, can be addressed within a model in various ways. One approach is to employ quasi-Poisson regression, which assumes \(v=\sigma\lambda\), or to use negative binomial regression (not implemented yet), which assumes \(v=\lambda+\lambda^2/\theta\), where both \(\sigma\) and \(\theta\) are overdispersion parameters.
Simulations
To assess the effectiveness of the aedseo
function, we
simulate some data. The simulated data is generated using a negative
binomial model with a mean parameter \(\mu\) and a variance parameter \(\phi\mu\). In this context \(\phi\) denotes a dispersion parameter,
which is assumed to be greater than or equal to 1. The mean, denoted as
\(\mu(t)\), is defined by a linear
predictor that incorporates both a trend component and seasonality
components represented by Fourier terms. The equation \(\mu(t)\) is as follows:
\[\begin{equation} \mu(t) = \exp\Biggl( \theta + \beta t + \sum_{j=1}^m \biggl( \gamma_{\sin} \sin\Bigl( \frac{2 \pi t}{52}\Bigl) + \gamma_{\cos} \cos \Bigl( \frac{2 \pi t}{52} \Bigl) \biggl) \Biggl) \end{equation}\]
mu_t <- function(
t,
theta = 1,
exp_beta = 1.001,
gamma_sin = 1,
gamma_cos = 1,
trend = 1,
j = 1,
...) {
# Start construction of linear predictor
linear_predictor <- theta
# ... add a trend if the scenario request it
if (trend == 1) {
linear_predictor <- linear_predictor + log(exp_beta) * t
}
# ... add a seasonal component
linear_predictor <- linear_predictor +
gamma_sin * sin(2 * pi * t * j / 52) + gamma_cos * cos(2 * pi * t * j / 52)
return(exp(linear_predictor))
}
simulate_from_nbinom <- function(...) {
# Set some default values for the simulation
default_pars <- list(
"t" = 1,
"theta" = 1,
"exp_beta" = 1.001,
"gamma_sin" = 1,
"gamma_cos" = 1,
"trend" = 1,
"j" = 1,
"phi" = 1,
"seed" = 42
)
# Match call
mc <- as.list(match.call())[-1]
# ... and change parameters relative to the call
index_mc <- !names(default_pars) %in% names(mc)
mc <- append(mc, default_pars[index_mc])[names(default_pars)]
# Set the seed
set.seed(mc$seed)
# Calculate the number of time points
n <- length(eval(mc$t))
# Calculate mu_t
mu_t_scenario <- do.call(what = "mu_t", args = mc)
# ... and compute the variance of the negative binomial distribution
variance <- mu_t_scenario * mc$phi
# ... and infer the size in the negative binomial distribution
size <- (mu_t_scenario + mu_t_scenario^2) / variance
# Plugin and simulate the data
simulation <- rnbinom(n = n, mu = mu_t_scenario, size = size)
return(list("mu_t" = mu_t_scenario, "simulation" = simulation, "pars" = mc))
}
# Define the number of years and the number of months within a year
years <- 3
weeks <- 52
# ... calculate the total number of observations
n <- years * weeks
# ... and a vector containing the overall time passed
time_overall <- 1:n
# Create arbitrary dates
dates <- seq.Date(from = as.Date("2010-01-01"), by = "week", length.out = n)
# Simulate the data
simulation <- simulate_from_nbinom(t = time_overall, theta = log(1000), phi = 5)
# Collect the data in a 'tibble'
sim_data <- tibble(
Date = dates,
mu_t = simulation$mu_t,
y = simulation$simulation
)
A total of three years of weekly data are then simulated with the following set of parameters:
- \(\theta=\log(1000)\), representing an average intensity of 1000 cases.
- \(\exp(\beta)=1.001\), indicating a slowly increasing trend.
- \(\gamma_{\sin} = 1\) and \(\gamma_{\cos} = 1\), representing the seasonality component.
- \(\phi = 5\), which represents the dispersion parameter for the negative binomial distribution.
Applying the algorithm
In the following section, the application of the algorithm to the
simulated data is outlined. The first step is to transform the simulated
data into a aedseo_tsd
object using the tsd()
function.
# Construct an 'aedseo_tsd' object with the time series data
tsd_data <- tsd(
observed = sim_data$y,
time = sim_data$Date,
time_interval = "week"
)
In the following figure, the simulated data (solid circles) is visualized alongside the mean (solid line) for the three arbitrary years of weekly data.
# Have a glance at the time varying mean and the simulated data
plot(tsd_data)
Next, the time series data object is passed to the
aedseo()
function. Here, a window width of \(k=5\) is specified, meaning that a total of
5 weeks is used in the local estimate of the exponential growth rate.
Additionally, a 95% confidence interval is specified. Finally, the
exponential growth rate is estimated using quasi-Poisson regression to
account for overdispersion in the data.
# Apply the 'aedseo' algorithm
aedseo_results <- aedseo(
tsd = tsd_data,
k = 5,
level = 0.95,
disease_threshold = 2000,
family = "quasipoisson"
)
The aedseo package implements S3 methods
In this section, we will explore how to use the plot()
,
predict()
and summary()
S3 methods provided by
the aedseo
package. These methods enhance the functionality
of your aedseo
objects, allowing you to make predictions
and obtain concise summaries of your analysis results.
Visualizing Growth Rates
In the figure below, we present the local estimate of the growth rate
along with its corresponding 95% confidence interval. This visualization
can be generated by utilizing the plot()
S3 method
specifically designed for objects of the aedseo
class.
# Employing the S3 `plot()` method
plot(aedseo_results)
#> Warning: Using alpha for a discrete variable is not advised.
#> Warning: Using alpha for a discrete variable is not advised.
Predicting Growth Rates
The predict
method for aedseo
objects
allows you to make predictions for future time steps based on the
estimated growth rates. Here’s how to use it:
# Example: Predict growth rates for the next 5 time steps
(prediction <- predict(aedseo_results, n_step = 5))
#> # A tibble: 6 × 5
#> t time estimate lower upper
#> <int> <date> <dbl> <dbl> <dbl>
#> 1 0 2012-12-21 3279. 3279. 3279.
#> 2 1 2012-12-22 3736. 3590. 3889.
#> 3 2 2012-12-23 4257. 3930. 4613.
#> 4 3 2012-12-24 4851. 4303. 5471.
#> 5 4 2012-12-25 5527. 4710. 6490.
#> 6 5 2012-12-26 6298. 5157. 7697.
In the example above, we use the predict method to predict growth rates for the next 5 time steps. The n_step argument specifies the number of steps into the future you want to forecast. The resulting prediction object will contain estimates, lower bounds, and upper bounds for each time step.
Summarizing aedseo results
The summary method for aedseo
objects provides a concise
summary of your automated early detection of seasonal epidemic onset
(aedseo) analysis. You can use it to retrieve important information
about your analysis, including the latest growth warning and the total
number of growth warnings:
summary(aedseo_results)
#> Summary of aedseo Object
#>
#> Called using distributional family:
#> quasipoisson
#>
#> Window size for growth rate estimation and
#> calculation of sum of cases:
#> 5
#>
#> Disease specific threshold:
#> 2000
#>
#> Reference time point:
#> 2012-12-21
#>
#> Sum of cases at reference time point:
#> 12468
#> Latest sum of cases warning:
#> 2012-12-21
#>
#> Growth rate estimate at reference time point:
#> Estimate Lower (2.5%) Upper (97.5%)
#> 0.131 0.091 0.171
#>
#> Total number of growth warnings in the series:
#> 59
#> Latest growth warning:
#> 2012-12-21
#>
#> Latest seasonal onset alarm:
#> 2012-12-21
The summary method generates a summary message that includes details such as the reference time point, growth rate estimates, and the number of growth warnings in the series. It helps you quickly assess the key findings of your analysis.