Reproduction numbers during epidemics in R software

Reprouction number, R, is the number of secondary cases generated by a single individual over its entire infectious period in a complete susceptile population , which is commonly used to characterize pathogen transmissibility during a epidemic. The monitoring of R over time provides the feedback on the effectiveness of interventions.


There are two R packages used to measure the reproduction number: R0 and EpiEstim. The former package can calculate the basic reproduction number(R0) and the time-varing reproduction number(Rt), however, the latter package supplies a new framework to estimate the time-vary reproduction number. There are two papers which give the detail introduction to the packages,respectively and you can download them in here and here.


Basic reproduction number


The concept of reproduction number is firstly introduced by the work of Alfred Lotka and Ronald Ross (PS: He received the Nobel Prize for Physiology or Medicine in 1902). However,the first modern application to the basic reproduction number in epidemiology was George MacDonald in 1952, who constructed population models on malaria. The R0 is a nonnegative value, if above 1 indicate that the disease will spread among a population, however, on the contrary, the infection will be die in the future. But some literatures also emphasized that even when the R0 below one it can be transimitted during a long-distance flights.

A review of generic methods used to estimate transmissibility parameters during outbreaks was carried out. Most methods used the epidemic curve and the generation time distribution (Note: The generation time reflect the time lag between infection in a primary case and a secondary case). Overall, the epidemic curve can be finished by the surveillance data by the epicurve function in epitools. package, however, the time distribution should be obtained from the time lag between all infectee/infector pairs and can’t be observed directly. Therefore, the serial interval distribution comes!


Package R0


As aformentioned, the value of generation time was substituted with the serial interval distribution. In R0 package, the generation.time function can creat a discretized generation time distribution, which based on the tiem interval to choose. Two types of distribution can be used: Emperical distribution and Parametric distribution. The former requires the full specification of the distribution, however, the latter includes thress methods (gamma, lognormal and weibull distribution) and need the mean and standard deviation information. Aterlatively, you can use the maximum likelihood method on the observed time intervals between the symptom onsets in primary cases and secondary cases. The est.GT function do solve the process.

The package R0 includes five methods to calculate the basic reproduction number(also called initial number) and the time varying reproduction number:Attack Rate (AR), Exponential Growth (EG), Maximum Likelihood (ML),Time-Dependant (TD), and Sequential Bayesian (SB). A detailed description to the method aboved can be found here. Here, we just point out the difference among the five methods:

  1. Attack Rate (AR):
    Background: Origined from the classical suspective-infective-recoery model. Attack rate is the percentage of population eventually infected.
    Assumption: homogeneous mixing and closed population, No intervention.

  2. Exponential growth (EG):
    Background: Phenomenon about the exponential growth rate rate during the early phase of an outbreak.
    Assumption: The growth rate of the study period in epidemic curve is expential( deviance based R-squared statistic guide the choose).

  3. Maximum Likelihood (ML):
    Background: Phenomenon about the exponential growth rate rate during the early phase of an outbreak.
    Assumption: The secondary cases caused by an index case is Possion distribution with the excepted value of R. Moreover,the growth rate of study period in epidemic curve (from the first case on) is expential (Deviance based R-squared statistic guide the choose).

  4. Time-Dependant (TD):
    Background: A time-dependant method to compute the reproduction number by averaging over all the transmission networks compatible with observations.
    Assumption: It is possible to account for the importation cases during the epidemic. The confidence interval for Rt can be obtained by simulation.

  5. Sequential Bayesian (SB):
    Background: Sequential bayesian estimation for the initial reproduction number. Origined from the approximation to tehe classical suspective-infective-recoery model with Possion distribution.
    Assumption: Epidemic with a period of exponential growth and random mixing in the population.

The estimate.R function can realize the aformentioned methods by the “method” arguement. Overall, the exponential growth method (ML and TD) performance was the least affacted by the either aggregation or the over dispersion.

Example and code

Here, we used the dataset from the 1918 influenza pandemic and you can see its structure by the code “demo(Germany.1918)”. For simlicity, the example code only fixed on the Maximum Likelihood method. Overall, three steps to calculate the reproduction number: generate time distribution, estimate, and sensitively analysis. Analysis codes are as followed:

 library(R0);options(device = "options");data(Germany.1918) 
 epid = c("2012-01-01", "2012-01-02", "2012-01-02","2012-01-03")
 epid.count = c(1,2,4,8)
 GT.flu<-generation.time("gamma", c(2.6,1))  
 res.R <- estimate.R(Germany.1918, GT=GT.flu,methods=c("ML"))
 plotfit(res.R)

The below code is the sensitivity analysis:

 sensitivity.analysis(Germany.1918, GT.type="gamma",
GT.mean=seq(1,5,1), GT.sd.range=1, begin=1, end=27,est.method="EG",sa.type="GT")

Package EpiEstim


Most methdology to the reproduction number require the incidence data and the distribution of series interval. However, the estimate is right censored for the estimate of Rt requires the incidence data after time t. Moreover, some methods are sensitive to the selected time step or smoothing parameters. Anne Cori proposed a new framework and software[EpiEstim] to estimate the time-varying reproduction number. In the EpiEstim package, the time-varying reproduction number was divided into two types: Rt and Rtc. Here, we call Rt the instaneous reproduction number and call the Rtc the case reproduction number. The distinction between the Rtc and Rt is on the observation perspective: one is measured retrospectively, however,another is prospective based on the assumption that the infectious situation is constant in the future.

In practice, we assume that the infectious probability distribution ws of individual infected, depend on the individual biological factors such as symptom severity. Therefore, the infectious profile lie on the ws and the time since infection of the case, but indepent of the calendar time. Generaly, Rt can be estimate by the multiplication on the ratio of the number of new infectious cases at time t-1 and the ws. Rtc also called the cohort reproduction number becasue it counts the average number of secondary casued by a cohort infected at time t. Comparatively speaking, Rtc is more suitable for the situation that the contact rate and transmissibility can change over time, especially when the control measures are initiated. However, the correspending methods to the Rtc is inavailiable in EpiEstim package .

The EpiEstim package includes three types of methods to estimate the Rt: NonParametricSIParametricSI, and UncertainSI. The detained description to the method aboved can be foound here.All the method can be applied by the EstimateR function.

  1. For method “NonParametricSI”, the discrete distribution of the serial interval is directly specified in the argument “SI.Distr”;
  2. For method “ParametricSI”, the mean and standard deviation of the continuous distribution of the serial interval are given in the arguments “Mean.SI”” and “Std.SI”, which is derived automatically using “DiscrSI”;
  3. For method “UncertainSI”, mean and standard deviation of the serial interval vary according to truncated normal distributions.Firstly, we sample the mean from its truncated normal distribution(with mean Mean.SI,standard deviation Std.Mean.SI, minimum Min.Mean.SI and maximum Max.Mean.SI). Then,we sample the standard deviation from its truncated normal distribution (with mean Std.SI, standard deviation Std.Std.SI, minimum Min.Std.SI and maximum Max.Std.SI).

Example and code

Here, we use the dataset about the pendemic influenza in a school in Pennsylvania, 2009. You can see the structure of the dataset by the code “demo(Flu2009)”. Due the article length, we just give the result of the method “NonParametricSI”. The analysis code is as followed:

library(EpiEstim)
data("Flu2009") 
EstimateR(Flu2009$Incidence, T.Start=2:26, T.End=8:32, method="NonParametricSI", SI.Distr=Flu2009$SI.Distr, plot=TRUE, leg.pos=xy.coords(1,3)) 

The follow codes are the method ParametricSI and UncertainSI:

EstimateR(Flu2009$Incidence, T.Start=2:26, T.End=8:32, method="ParametricSI", Mean.SI=2.6, Std.SI=1.5, plot=TRUE) 

EstimateR(Flu2009$Incidence, T.Start=2:26, T.End=8:32, method="UncertainSI", Mean.SI=2.6, Std.Mean.SI=1, Min.Mean.SI=1, Max.Mean.SI=4.2, Std.SI=1.5, Std.Std.SI=0.5, Min.Std.SI=0.5, Max.Std.SI=2.5, n1=100, n2=100, plot=TRUE) 
Spatial-R /
Published under (CC) BY-NC-SA in categories statistics  tagged with Reproduction number  R