Spatial-R 2017-03-29T08:55:38+00:00 Resources for the research on infectious disease 2017-03-03T00:00:00+00:00 Spatial-R RRID is a repository of resources for research on infectious disease. Most of the resources focus on infectious respiratory diseases and vector-borne diseases (e.g. influenza, mumps, dengue and Zika). The resources are grouped into five parts: public datasets, software packages, risk assessment, public codes and famous labs.

Resources for Infectious Disease

Public Datasets

Combined dataset

Taiwan National Infectious Disease Statistics System: provides weekly datasets from the Taiwan National Infectious Disease Surveillance System. You can download the case dataset together with other information such as gender, country and age group (in 5-year intervals).

Ministry of Health in Singapore: provides weekly notifiable disease including Dengue Fever, Acute Upper Respiratory Tract infections, Chickenpox and Acute Diarrhoea in Singapore.

Australian Government Department of Health: provides monthly data on bloodborne diseases, gastrointestinal diseases, other bacterial infections, quarantinable diseases, sexually transmissible infections, vaccine preventable diseases, vectorborne diseases and zoonoses at state level, as well as annual datasets by age group and sex at country level.

Project Tycho: currently includes data from all weekly notifiable disease reports for the United States dating back to 1888. For the level 2 dataset, the project provides an API (detailed document is here). I wrote some rough code to download the dataset with R. You can also use the code in lgautier/project-tycho-utilities.

Morbidity and Mortality Weekly Report (MMWR): data for selected nationally notifiable diseases reported by the 50 states, New York City, the District of Columbia, and the U.S. territories at weekly intervals.

Department of Health in Hong Kong: includes monthly notifiable infectious diseases, sentinel surveillance (weekly number of hospital admission episodes of hand, foot and mouth disease (HFMD)) and monthly antibiotic resistance surveillance.


DengueNet: a standard platform for sharing current surveillance data in order to detect and monitor incidence and trends of dengue and DHF.

Dengue in Southeast Asia: monthly dengue surveillance data collected from eight countries in Southeast Asia for the period 1993-2010. These data were integrated from various sources, as described in detail in the Supporting Information of the paper Region-wide synchrony and traveling waves of dengue across eight countries in Southeast Asia.

Dengue in Brazil: weekly number of dengue cases for cities in Brazil hosting games or football teams during the FIFA 2014 World Cup (from 2001 to 2014). These data have been used for the paper Risk of Dengue for Tourists and Teams during the World Cup 2014 in Brazil.

Dengue in Peru and Puerto Rico: this dataset was used to design an infectious disease forecasting project with the aim of galvanizing efforts to predict epidemics of dengue. Other information for those two cities, including population, satellite precipitation and satellite vegetation, is available on the website.

DengueCasesMalaysia: weekly dengue dataset for Malaysia at state level from 2011 to 2015.

Dengue in Guangdong: weekly dataset at county level. In addition, whether each case was imported or local is also included.

Chikungunya: weekly reports of Chikungunya cases and deaths provided by the Pan American Health Organization (PAHO) and the French Agence régionale de santé (ARS) at country level.


The dataset resources for flu have been aggregated in the res4flu repository of Caijun. Here, we just list some famous resources and then add some additional ones.

FluID: collects defined epidemiological indicators and data on seasonal and pandemic influenza from national, regional and global systems at weekly level.

FluView: outpatient illness and viral surveillance data aggregated at the regional and national levels. The cdcfluview package can download the dataset with information about age group.

ILINet: influenza-like illness (ILI) activity level indicator determined by data reported to ILINet.

FluWeb Historical Influenza Database: free access to a number of rare and valuable sources of data concerning past influenza outbreaks.

EMPRES Global Animal Disease Information System (EMPRES-i): provides access to the H7N9 and H5N1 case dataset with geographic information and reporting date.

Influenza activity in Europe: weekly influenza activity surveillance for European countries. Period: 2009 to now.

2013-17 Human Case List of Provincial/Ministry of Health/Government Confirmed Influenza A(H7N9) Cases: includes additional information such as gender, age, city or province, and clinical outcome (death or not).

World Animal Health Information Database: provides access to all data (including avian influenza) held within OIE’s new World Animal Health Information System from 2016-02-06. Besides, you can gather the information about the follow-up report.


DELPHI: contains influenza data from FluView, ILINet, Google Flu Trends, Twitter Stream, Wikipedia Access Logs, and Outpatient ILI from Taiwan’s National Infectious Disease Statistics System at hexchotomy region .

Dengue-data-stub: downloads time series for dengue in Thailand. However, the package is still under development.


Immunization schedule for countries: immunization schedule by disease or country, which is collected by the WHO.

Software Packages

R0: estimation of R0 and real-time reproduction number from epidemics.

EpiEstim: implements a Bayesian approach for quantifying transmissibility (instantaneous and case reproduction numbers) over time during an epidemic (Cori et al., AJE, 2013). Examples are here.

EpiDynamics: implements the dynamic models described in the book Modeling Infectious Diseases in Humans and Animals. You can also find the web introduction to this book.

EpiModel: mathematical modeling of infectious disease with deterministic compartmental models, stochastic agent-based models, and stochastic network models.

episerve: web interface aiming to make the R tools for disease outbreak analysis available to non-specialists. It relies on epibase for data structure and graphics, is integrated with Epicollect for data collection using mobile devices, and uses EpiEstim for reproduction number estimation.

pomp: provides facilities for implementing partially observed Markov process (POMP) models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. The website for pomp is here. Famous but hard to use.

panelPomp: R package for statistical inference using panel (longitudinal data) POMPs (Partially Observed Markov Processes). Example is here: dynamic variation in sexual contact rates

bayessir: Bayesian inference for hidden Markov Susceptible-Infected-Removed (SIR) models. The method is described in the paper Predictive modeling of cholera outbreaks in Bangladesh, Annals of Applied Statistics, 10, 575-595.

debinfer: Bayesian inference for dynamical models of biological systems. Document: Bayesian inference for dynamical models of biological systems in R.

WeibullHM: fitting a hierarchical Bayesian Weibull hidden Markov model via a forward filtering backward sampling MCMC algorithm. Document will come soon.

BDAepimodel: tractable fitting of stochastic epidemic models via Bayesian data augmentation. A detailed vignette for the package is provided, as well as some examples of fitting the dynamic model in a Bayesian framework using the pomp package.

ABSEIR: spatial SEIR modeling via Approximate Bayesian Computation. The analytical techniques are described more completely in this paper, as well as in the author's thesis: Application Of Heterogeneous Computing Techniques To Compartmental Spatiotemporal Epidemic Models.

SimInf: provides an efficient and flexible framework for data-driven stochastic disease spread modelling that integrates within-herd infection dynamics as continuous-time Markov chains and livestock movements between herds as scheduled events. Website.

IDSpatialStats: presents an interpretable measure of spatial clustering, τ, which can be understood as a measure of relative risk.

GI: calculate generation interval distributions. Document: Intrinsic and realized generation intervals in infectious-disease transmission.

rEDM: empirical dynamic modeling based on attractor reconstruction. Document: Nonlinear Tools for a Nonlinear World: Applications of Empirical Dynamic Modeling to Marine Ecosystems. This method was also used to test threshold values for environmental factors in infectious disease; see the paper Global environmental drivers of influenza.

rAedesSim: population mosquito modeling.


riem: retrieves weather data and air pollutant data from Automated Surface Observing System (ASOS) stations (airports) around the world via the Iowa Environment Mesonet website.

GSODR: global summary daily weather data in R. Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (GSOD) weather data from the USA National Climatic Data Center (NCDC). Vignettes.

rnoaa: client for many ‘NOAA’ data sources including the ‘NCDC’ climate ‘API’. In addition, we have an interface for ‘NOAA’ sea ice data, the ‘NOAA’ severe weather inventory, ‘NOAA’ Historical Observing ‘Metadata’ Repository (‘HOMR’) data, ‘NOAA’ storm data via ‘IBTrACS’, and tornado data via the ‘NOAA’ storm prediction center.

rcaaqs: facilitates the calculation of air quality metrics according to Canadian Ambient Air Quality Standards.

Risk Assessment

Vector-borne Disease Airport Importation Risk Tool: uses the methods published in the Nature paper The global distribution and burden of dengue and in Web-based GIS: the vector-borne disease airline importation risk (VBD-AIR) tool.

Epigrass: a framework for the construction and simulation of complex network epidemiology models.

heemod: models for health economic evaluation. Features: accounting for time-dependency, probabilistic uncertainty analysis, and deterministic sensitivity analysis. Most of the analyses presented in Decision Modelling for Health Economic Evaluation can be performed with heemod.

Assessing the Impact of OCV Use on Protection and Epidemic Risk: This tool estimates the proportion of the population directly protected from vaccine, the number of cases prevented (direct + indirect effects), and the final number expected to be infected for a given population size and vaccine coverage (required inputs).

Public Codes

chikungunya_simulation: a generalized agent-based model (ABM) that captures the interactions between humans and mosquitoes at a micro-scale by explicitly modeling each human and mosquito to predict the complex trajectory of the infection. The model has been integrated with GIS, census and climate data to model host and agent behavior effectively and, as a proof of concept, has been trained and validated using 2013-14 Caribbean Chikungunya epidemic data.

Heterogeneity, Mixing, and the Spatial Scales of Mosquito-Borne Pathogen Transmission: Perkins TA, Scott TW, Le Menach A, Smith DL (2013) Heterogeneity, Mixing, and the Spatial Scales of Mosquito-Borne Pathogen Transmission. PLoS Computational Biology 9(12): e1003327.

Socially structured human movement shapes dengue transmission despite the diffusive effect of mosquito dispersal: Reiner, R.C. Jr, Stoddard, S.T., Scott, T.W. (2014) Socially structured human movement shapes dengue transmission despite the diffusive effect of mosquito dispersal. Epidemics 6: 30-36.

Model-based projections of Zika virus infections in childbearing women in the Americas: Perkins TA, Siraj AS, Ruktanonchai CW, Kraemer MUGK, Tatem AJ. (2016) Model-based projections of Zika virus infections in childbearing women in the Americas. Nature Microbiology 1:16126.

A Comparative Study of Techniques for Estimation and Inference of Nonlinear Stochastic Time Series: Master's thesis on nonlinear stochastic time series forecasting, including basic particle filter, Iterated Filtering 2, S-maps and spatiotemporal epidemics.

Controlling dengue with vaccines in Thailand: the methods were published in Projected Impact of Dengue Vaccination in Yucatán, Mexico and Controlling Dengue with Vaccines in Thailand. The code is here.

Bayesian Inference for Infectious Disease Transmission Models Based on Ordinary Differential Equations: dissertation thesis on Bayesian parameter inference for dynamic infectious disease modelling: rotavirus in Germany.

The Impact of a One- versus Two-Dose Oral Cholera Vaccine Regimen in Outbreak Settings: A Modeling Study: Azman AS, Luquero FJ, Ciglenecki I, Grais RF, Sack DA, Lessler J (2015) The Impact of a One-Dose versus Two-Dose Oral Cholera Vaccine Regimen in Outbreak Settings: A Modeling Study. PLoS Med 12(8): e1001867.

Household Transmission of Influenza A and B in a School-based Study of Non-Pharmaceutical Interventions: Andrew S. Azman, James H. Stark, Benjamin M. Althouse. Household Transmission of Influenza A and B in a School-based Study of Non-Pharmaceutical Interventions. Epidemics, 2013, 5, 181-186.

Association between Severity of MERS-CoV Infection and Incubation Period: Victor Virlogeux, Minah Park, Joseph T. Wu, Benjamin J. Cowling (2016). Association between Severity of MERS-CoV Infection and Incubation Period. Emerging Infectious Diseases, 22(3): 526-528.

pomp-astic inference methods for epidemic models illustrated on German rotavirus surveillance data: calculates the MLE with iterated filtering using the pomp package for what is referred to as Model StSt+ in the manuscript.

Famous Labs

DELPHI: developing the theory and practice of epidemiological forecasting. Github. Publicly available tools: epidemiological time series visualizer (EpiVis), API to automatically updated epidemiological data sources (Epi-Data) and visual comparison of scored submissions to CDC’s Predict the Flu Challenge (FluScores).

Infectious Disease Dynamics Group at Johns Hopkins University: run the gamut of the study of disease dynamics, from original data collection, to methodological research, to policy engagements. Github.

Koelle Research Group: focuses on using mathematical models and statistical approaches to better understand the ecological and evolutionary dynamics of infectious diseases.

Dennis L. Chao: develop computer simulation models of infectious disease outbreaks such as influenza, dengue and cholera. Open analysis codes on dengue and cholera.

Vladimir N. Minin: Predictive modeling of cholera outbreaks in Bangladesh, Latent continuous time Markov chains for partially-observed multistate disease processes and Likelihood-based inference for partially observed multi-type Markov branching processes. Package:bayessir.

Edward Ionides: time series analysis with applications to ecology, epidemiology, health economics, cell motion and neuroscience. Methodological work on inference for partially observed stochastic dynamic systems. The tutorials and slides on time series analysis are useful.

Aaron A. King: sophisticated mathematical, computational, and statistical tools to advance theoretical ecology and evolution on infectious disease.

CHICAS: Centre for Health Informatics, Computing and Statistics. Campylobacter Transmission, Inference for Vector-borne Diseases [Github], Modelling the Evolution of Seasonal Influenza.

Perkins Lab: use mathematical models to answer questions about the transmission and control of infectious diseases. Our work primarily focuses on dengue, Zika, chikungunya, malaria, and other vector-borne diseases.

seeg lab: spatial ecology and epidemiology group in Department of Zoology, University of Oxford. movement: analysis of movement data in disease modelling and mapping, seegMBG: contain a number of miscellaneous functions to streamline model-based geostatistical analyses as applied in several SEEG projects, seegSDM: Streamlined Functions for Species Distribution Modelling in the SEEG Research Group.

SpeLL: concerned with baseline works on the denominator, i.e. the number of hosts, which are key spatial variables used in most epidemiological models, and we actively work on the improvement of large-scale data sets on the distribution of human and livestock populations. A second focus of active research is the spatial epidemiology of avian influenza, a major disease of poultry with a strong zoonotic potential.

Wan Yang: applies mathematical modeling and Bayesian inference methods to study the transmission dynamics of infectious diseases such as influenza, Ebola, and measles. Also studies how environmental factors influence the transmission of influenza, its seasonality, and the underlying mechanisms.


Your contributions are always welcome! If you have additional resources on infectious disease, please open an issue here.


This work is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License - CC BY-NC-SA 4.0.

Incomplete vaccination with an imperfect vaccine 2016-05-05T00:00:00+00:00 Spatial-R These are my reading notes on the paper Pertussis immunity and epidemiology: mode and duration of vaccine-induced immunity, which addresses the nature, degree and durability of vaccine protection. As the analysis code was published on the website, I have been editing it to suit the mumps data in Zhejiang Province (mumps is a contagious disease that has recently often broken out in highly vaccinated populations).


  • The resurgence of pertussis in some countries that maintain high vaccination coverage has drawn attention to the gaps in our understanding of the epidemiological effects of the pertussis vaccine.
  • Candidate explanations range from vaccine-driven evolution of the aetiological agent, to changes in reporting and surveillance, to a loss of efficacy due to the switch from the whole-cell pertussis vaccine to the acellular vaccine (aP).
  • The nature of the vaccine-induced protection can leave distinct footprints in the transient disease incidence patterns following the roll-out of a vaccination program.
  • The mode of vaccine failure can determine the depth and duration of the honeymoon period, as well as the characteristics of the resurgence.
  • Although immune memory shapes the epidemiological dynamics, the uncertainty can be reduced by a mechanistic model.


It has been shown that the nature of vaccine-induced protection can leave distinct footprints in the transient disease incidence patterns following the roll-out of a vaccination programme. In particular, the mode of vaccine failure can determine the depth and duration of the honeymoon period, as well as the characteristics of the resurgence.

The effect of a vaccine on a vaccine-preventable disease has two aspects: immunity against infection, and immunity against transmission and disease. This is also the key to understanding the model structure and the flows among the different compartments.

Immunity against infection

Three, not mutually exclusive, modes by which vaccines might fail in this goal are:

  • Primary vaccine failure: a vaccine exhibits primary failure if it fails to provide any form of protection against infection to some fraction of the vaccinated people. Primary vaccine failure is therefore quantified by the fraction of vaccinated individuals the vaccine fails to protect.
  • Leakiness: a vaccine is said to be leaky when it reduces, but does not eliminate, the probability of infection. The leakiness of a vaccine is measured by the probability of infection upon exposure for a vaccinated individual relative to the same probability for an unvaccinated individual.
  • Waning: vaccine-induced protection is said to wane when it ceases after some time. Here, we quantify the speed of waning by the mean duration of protection or, equivalently, its reciprocal, the rate at which immunity is lost.
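
The three failure modes can be combined into a single relative risk for a vaccinated person. The sketch below is only an illustration: the function name, the argument names and the exponential form of waning are my assumptions, not the code published with the paper.

```python
import math

def infection_risk_ratio(t_years, primary_failure, leakiness, mean_duration_years):
    """Per-exposure infection probability of a vaccinee relative to an
    unvaccinated person, t_years after vaccination (illustrative only)."""
    # Waning: assume protection is lost at a constant rate 1 / mean duration.
    still_protected = math.exp(-t_years / mean_duration_years)
    # Primary failure: this fraction never gained any protection.
    protected = (1.0 - primary_failure) * still_protected
    # Leakiness: protected individuals keep a reduced relative risk;
    # everyone else has the same risk as an unvaccinated person.
    return protected * leakiness + (1.0 - protected)
```

With no primary failure, no leakiness and no elapsed time the ratio is 0 (full protection); as t grows the ratio approaches 1 whatever the other two parameters are.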

Immunity against transmission and disease

Even when a vaccine fails to provide protection against infection, it might still reduce the transmissibility of the infection and/or the severity of the disease symptoms. These effects can be quantified by:

  • relative infectiousness, which is modelled as the ratio of the transmission rate of a vaccinated person to that of an unvaccinated person.
  • relative reporting probability, the ratio of the reporting probability of a vaccinated individual to that of an unvaccinated individual.
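
A minimal sketch of where these two ratios enter a transmission model. The function and argument names are hypothetical, and the frequency-dependent form of the force of infection is an assumption on my part.

```python
def force_of_infection(beta, i_unvacc, i_vacc, population, rel_infectiousness):
    """Per-capita infection hazard when vaccinated cases transmit at a
    fraction rel_infectiousness of the unvaccinated transmission rate."""
    return beta * (i_unvacc + rel_infectiousness * i_vacc) / population

def expected_reports(new_unvacc, new_vacc, report_prob, rel_reporting):
    """Expected notified cases when an infection in a vaccinated person is
    reported with probability report_prob * rel_reporting."""
    return report_prob * (new_unvacc + rel_reporting * new_vacc)
```

A relative infectiousness of 1 means vaccinated cases transmit as much as unvaccinated ones, while a relative reporting probability well below 1 means most infections in vaccinated people never appear in the surveillance data.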


Some statements:

  • In the absence of primary vaccine failure, vaccinated individuals whose protection against infection has failed are less likely to be recorded as cases (possibly due to vaccine-induced protection against severe disease) but may be just as infectious as unvaccinated individuals. Here, we relax our assumption of zero primary failure, considering models with modest levels of aP primary failure.
  • For simplicity, infection- and vaccine-derived immunity were assumed to be perfect and lifelong, although infection-derived immunity lasts longer than vaccine-derived immunity.
  • The transmission rate was assumed to be a periodic function of time, with a period of 1 year.
  • Parameters were estimated by fitting the deterministic models by maximum likelihood via trajectory matching. The best fits from the initial runs were used as starting points for subsequent iterations of trajectory matching to find the ML estimate.
  • A profile is derived by creating an array of fixed values for a parameter and maximizing the likelihood at each fixed value over the remaining model parameters.
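
The profiling idea in the last point can be sketched on a toy problem. Everything here is hypothetical: `toy_loglik` stands in for the model's log-likelihood, and a plain grid search replaces the trajectory-matching optimizer used in the paper.

```python
def profile_likelihood(loglik, profile_values, nuisance_values):
    """Return [(fixed value, best log-likelihood over the nuisance grid)]."""
    profile = []
    for fixed in profile_values:
        # Maximize over the remaining parameter while the profile
        # parameter is held at its fixed value.
        best = max(loglik(fixed, nuisance) for nuisance in nuisance_values)
        profile.append((fixed, best))
    return profile

def toy_loglik(a, b):
    """Hypothetical log-likelihood surface with its maximum at a = 1, b = 1."""
    return -((a - 1.0) ** 2 + (b - a) ** 2)

grid_b = [i / 10 for i in range(31)]  # nuisance grid 0.0, 0.1, ..., 3.0
profile = profile_likelihood(toy_loglik, [0.0, 1.0, 2.0], grid_b)
mle_value, mle_loglik = max(profile, key=lambda point: point[1])
```

The highest point of the profile gives the ML estimate of the profile parameter, and the shape of the profile around it shows how well the data pin that parameter down.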


  • Monthly reports for six regions from the beginning of 1996 until the end of 2009.
  • Regional demographic data: population size, annual number of live births and deaths.
  • The coverage of aP and the vaccination schedule.


A standard susceptible-exposed-infected-recovered (SEIR) model with eight compartments. It includes two compartments each of susceptible (S_{i}), exposed (E_{i}) and infected (I_{i}) individuals in order to distinguish individuals who were never vaccinated (i=1) from those who were vaccinated (i=2). The vaccinated compartment contains individuals who were vaccinated and still maintain some vaccine-derived protection against infection.
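
To make the compartment structure concrete, here is a rough Euler-step sketch of such an eight-compartment model. The exact flows, the cosine form of the seasonal transmission rate, vaccination at birth, and all parameter names and values are assumptions for illustration; they are not taken from the paper's code.

```python
import math

def seasonal_beta(t_years, beta0, amplitude):
    """Transmission rate with a period of 1 year (cosine form is assumed)."""
    return beta0 * (1.0 + amplitude * math.cos(2.0 * math.pi * t_years))

def seir_step(state, t_years, dt, p):
    """One Euler step for (S1, E1, I1, V, S2, E2, I2, R)."""
    S1, E1, I1, V, S2, E2, I2, R = state
    N = sum(state)
    beta = seasonal_beta(t_years, p["beta0"], p["amplitude"])
    lam = beta * (I1 + p["rel_infectiousness"] * I2) / N  # force of infection
    births = p["mu"] * N                    # births balance deaths
    vaccinated = p["coverage"] * births     # assumed: vaccination at birth
    protected = (1.0 - p["primary_failure"]) * vaccinated
    derivs = (
        births - vaccinated - (lam + p["mu"]) * S1,                         # S1
        lam * S1 - (p["sigma"] + p["mu"]) * E1,                             # E1
        p["sigma"] * E1 - (p["gamma"] + p["mu"]) * I1,                      # I1
        protected - (p["leakiness"] * lam + p["waning"] + p["mu"]) * V,     # V
        vaccinated - protected + p["waning"] * V - (lam + p["mu"]) * S2,    # S2
        lam * S2 + p["leakiness"] * lam * V - (p["sigma"] + p["mu"]) * E2,  # E2
        p["sigma"] * E2 - (p["gamma"] + p["mu"]) * I2,                      # I2
        p["gamma"] * (I1 + I2) - p["mu"] * R,                               # R
    )
    return tuple(x + dt * dx for x, dx in zip(state, derivs))

def simulate(state, years, dt, p):
    """Integrate forward with fixed Euler steps of size dt (in years)."""
    for k in range(int(round(years / dt))):
        state = seir_step(state, k * dt, dt, p)
    return state

# Hypothetical parameter values, rates per year.
params = {"beta0": 400.0, "amplitude": 0.2, "mu": 1.0 / 70.0,
          "sigma": 365.0 / 8.0, "gamma": 365.0 / 14.0, "coverage": 0.9,
          "primary_failure": 0.05, "leakiness": 0.1, "waning": 1.0 / 20.0,
          "rel_infectiousness": 1.0}
initial = (100000.0, 10.0, 10.0, 500000.0, 50000.0, 0.0, 0.0, 349980.0)
final = simulate(initial, years=1.0, dt=1.0 / 3650.0, p=params)
```

In this sketch primary failures enter S2 directly, protected vaccinees sit in V until their protection wanes or a leaky exposure infects them, and the total population stays constant because births equal deaths.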

A one-dimensional likelihood profile gives the best likelihood that can be obtained at each fixed value of the profile parameter by maximizing over all the remaining model parameters. For each region, we started the profiling procedure by dividing the profile parameter’s given range into equally spaced intervals; then we gathered all the points from the earlier searches where the profile parameter had values that fell within each interval and selected the point with the highest likelihood. To refine the initial profile, we created seven copies of each initial point and perturbed the value of the profile parameter by multiplying it by 0.95^3, 0.95^2, 0.95, 1.00, 1.05, 1.05^2 and 1.05^3 (the factors depend on the argument). A local profile was generated for each region using 100 equally spaced intervals over the entire allowed range of each profile parameter, selecting in each interval the point with the highest likelihood. A function was then fitted through the collection of points using local regression (locfit package).
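
The two bookkeeping steps of this procedure, perturbing the profile parameter and keeping the best point per interval, can be sketched as follows. The function names and the data layout (pairs of profile value and log-likelihood) are hypothetical.

```python
def perturbation_factors(step=0.05, power=3):
    """Multipliers 0.95^3, 0.95^2, 0.95, 1, 1.05, 1.05^2, 1.05^3 by default."""
    down = [(1.0 - step) ** k for k in range(power, 0, -1)]
    up = [(1.0 + step) ** k for k in range(1, power + 1)]
    return down + [1.0] + up

def best_point_per_interval(points, lower, upper, n_intervals):
    """Keep, for each equally spaced interval of [lower, upper], the
    (profile value, log-likelihood) pair with the highest log-likelihood."""
    width = (upper - lower) / n_intervals
    best = {}
    for value, loglik in points:
        if not lower <= value <= upper:
            continue  # outside the allowed range of the profile parameter
        index = min(int((value - lower) / width), n_intervals - 1)
        if index not in best or loglik > best[index][1]:
            best[index] = (value, loglik)
    return best
```

For example, `best_point_per_interval([(0.1, -5.0), (0.15, -3.0), (0.9, -4.0)], 0.0, 1.0, 2)` keeps `(0.15, -3.0)` for the first half of the range and `(0.9, -4.0)` for the second.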


  • Deterministic models do not explain the data as well as stochastic models.
  • The relative infectiousness of vaccinated infected individuals is equal to that of unvaccinated ones.
  • The full model has too many degrees of freedom for these parameters (waning and leakiness) to be uniquely identifiable.
  • In other words, there is a spectrum of models, differing in certain parameters and not in others, that are all roughly equally well supported by the data.
  • A low relative reporting probability can be interpreted to mean that the preponderance of infections in vaccinated individuals are mild or asymptomatic, which indicates that the vaccine does benefit individuals directly by diminishing disease severity.

A short insight into Vaccine Epidemiology 2016-05-02T00:00:00+00:00 Spatial-R What is right to be done cannot be done too soon!


This is a short insight into vaccine epidemiology after working in the department of immunization program of the Zhejiang Provincial Center for Disease Control and Prevention for nearly two years. The reason for writing this blog is to sum up the knowledge on the vaccination research program and extract some useful ideas or clues for further investigation into vaccine epidemiology. Here, I simply define vaccine epidemiology as all the epidemiological issues related to both vaccines and infectious diseases. For instance, decision makers may be very interested in how effective a certain vaccine is against an infectious disease, or in testing in clinical trials whether there is cross-protection against other types of virus. Compared to traditional epidemiology, vaccine epidemiology focuses more on the vaccine than on the disease. As my primary interest is respiratory disease, many of the analysis methods discussed in the following sections may not be suitable for vector-borne or other types of infectious diseases. If you have any questions, just feel free to contact me(Email:


Infectious diseases, also called communicable diseases, are caused by pathogenic microorganisms, such as bacteria, viruses, parasites or fungi; the diseases can be spread, directly or indirectly, from one person to another. Vaccination is the administration of antigenic material (a vaccine) to stimulate an individual’s immune system to develop adaptive immunity to a pathogen, and it is the most powerful and cost-effective way to prevent infectious disease [@Meireles_2015]. Generally speaking, a vaccine contains an intact but inactivated (non-infective) or attenuated (with reduced infectivity) form of the pathogen, an agent that resembles a disease-causing micro-organism and primes the immune system [@greycite40640]. According to the World Health Organization, licensed vaccines are currently available to prevent, or contribute to the prevention and control of, twenty-five preventable infections.

Epidemiology is the study and analysis of the patterns, causes, and effects of health and disease conditions in defined populations. It is the cornerstone of public health, and shapes policy decisions and evidence-based practice by identifying risk factors for disease and targets for preventive healthcare [@greycite40642]. As we focus particularly on infectious diseases, it is important to understand the three essential elements of an infectious disease: source of infection (host), route of transmission (including droplet contact, fecal-oral transmission, sexual transmission, oral transmission, transmission by direct contact, vertical transmission, iatrogenic transmission, and vector-borne transmission) and susceptible population (population with no immunity against the pathogen). Therefore, the priority in vaccine epidemiology is to explore the effect of vaccines on the distribution of hosts and the susceptible population, as well as the impact on transmissibility.

Despite this obvious progress and the established programs, some vaccine-preventable diseases have re-emerged worldwide, such as pertussis and measles [@Wicker_2014]. Moreover, numerous large-scale mumps and varicella outbreaks have occurred in populations with high vaccination coverage [@Rubin_2011]. Thus, deeper insight into vaccine epidemiology is necessary to sustain the achievement of widespread immunity and to take targeted vaccine-related prophylactic measures against infectious disease.

Vaccine Epidemiology

Before we talk about vaccine epidemiology, we should define three vaccine-related terms: vaccine efficacy, vaccine effectiveness and vaccine impact. Vaccine efficacy is commonly defined as the direct effect of a vaccine measured in pre-licensure randomized clinical trials, where vaccination is allocated under optimal conditions. Vaccine (direct) effectiveness is estimated by comparing vaccinated and unvaccinated individuals exposed to the same vaccination programme in observational post-licensure studies (case-control and cohort studies in the same population). The impact of a vaccination programme, defined here as the population prevented fraction when exposure is the programme, is measured by comparing populations with and without a vaccination programme, most commonly the same population before and after vaccination, or in cluster randomized trials [@Hanquet_2013].
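
As a rough numerical illustration of these definitions, the sketch below computes direct effectiveness from a cohort as one minus the relative risk, and programme impact as a prevented fraction. The function names and the example figures in the test values are hypothetical, and real analyses would adjust for confounding.

```python
def vaccine_effectiveness(cases_vacc, n_vacc, cases_unvacc, n_unvacc):
    """Direct effectiveness from cohort counts: 1 - relative risk."""
    risk_vacc = cases_vacc / n_vacc
    risk_unvacc = cases_unvacc / n_unvacc
    return 1.0 - risk_vacc / risk_unvacc

def programme_impact(incidence_with, incidence_without):
    """Population prevented fraction: relative drop in incidence between a
    population without the programme and one with it."""
    return 1.0 - incidence_with / incidence_without
```

Effectiveness compares vaccinated and unvaccinated people inside one programme, whereas impact compares whole populations, so impact also picks up indirect (herd) effects.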

Vaccine Strategies

The common interventions against infectious disease include vaccination, screening (which allows infected individuals to be treated before they progress to more severe disease and/or infect other individuals), social distancing (isolation of susceptible individuals, school closure, travel restriction and cancellation of mass gatherings such as football matches) and post-exposure treatment (chemotreatment options such as antimicrobials, which can be dealt with using static models).

An optimal vaccination strategy needs to take the following into consideration to maximize the health impact and minimize the logistical barriers: vaccine supply, vaccination timing, and how non-vaccine-related factors change. Vaccine supply falls into two types: limited or full. Vaccination timing concerns when to carry out the vaccination program. Use of the vaccine prior to an outbreak is considered preemptive vaccination, while vaccination after the outbreak begins is referred to as reactive vaccination [@Boettiger_2013]. How to allocate limited vaccine in a reactive vaccination campaign is complicated but valuable. Moreover, great attention must be paid to the delay in reactive vaccination campaigns. Previous publications have found that the vaccination delay in an outbreak setting also shapes the performance of different vaccine strategies [@Azman_2014]. In contrast to regular vaccination campaigns, reactive campaigns are often subject to limited vaccine and logistical delays, forcing difficult allocation decisions.

Some infectious diseases, especially respiratory and vector-borne ones, are strongly influenced by non-vaccine-related factors such as meteorological factors (temperature and relative humidity) and air pollution (particulate matter, O3 and NO2) [@Cowling_2013] [@Passos_2014]. Some studies have reported that the combination of low temperature and high humidity contributes more to respiratory infections, because recovery of the virus was higher at higher relative humidity and the stability of the aerosol was greatest at a relative humidity of 60%. Moreover, natural disasters or extreme weather such as typhoons and heat waves may change the transmission of, and susceptibility to, communicable disease. Therefore, we need to consider the influence of non-vaccine-related factors on vaccination strategies.

For many multi-strain pathogens there is evidence of a broad, short-lived immune response immediately after infection with one strain that confers at least partial cross-protection against all strains of that pathogen [@Reich_2013]. On the other hand, infection with one pathogen can affect the severity, infectivity, or susceptibility to subsequent infection with other pathogens, and these effects can have profound clinical, epidemiological, and evolutionary implications [@Shrestha_2013]. The nature, strength and timing of the putative interaction between influenza and pneumococcal pneumonia have been well identified at the individual and population levels [@Weinberger_2011] [@Shrestha_2013]. However, whether other infectious diseases, such as hepatitis A, B, C and E, interact with one another is not clear.

Measure Indicators

The measure indicators in vaccine epidemiology can be separated into three parts: regular epidemiological indicators, sociological indicators and indicators for special situations. The regular epidemiological indicators are attack rate, morbidity, mortality and fatality rate. The definitions of these indicators do not differ between vaccine epidemiology and traditional epidemiology.

Sociological indicators include the disability-adjusted life year (DALY) and the cost-benefit (cost-effectiveness) ratio in health economic analysis. The DALY, which was jointly developed for the Global Burden of Disease (GBD) study by the World Bank, the World Health Organization (WHO) and the Harvard School of Public Health in the late 1980s, measures both mortality and morbidity and combines them in one single figure, allowing the comparison of health hazards and providing an evidence-based tool for healthcare policy prioritization and for monitoring intervention effects. Since its development, the DALY measure has been used widely in both national and global disease burden and cost-effectiveness studies [@OOSTVOGELS_2014]. A specific property of infectious diseases that distinguishes them from chronic diseases is that infected persons who are treated may not only be cured (i.e. curative intervention) or protected against an infection (i.e. preventive intervention such as vaccination), but a successful intervention might also reduce or prevent transmission of the pathogen to other susceptible persons (called herd immunity or indirect effect). If herd immunity has a large impact on transmission, dynamic transmission models are recommended for health economic analyses [@Jit_2011]. Mathematical modelling offers public health planners the ability to make predictions about the impact of emerging diseases as well as the effects of possible response and control measures. Such models are needed to bridge the gap between clinical trials and population-level use. This is particularly crucial for infectious diseases, where mass interventions such as vaccination and screening can result in effects at the population level that are not seen at the individual level, including herd immunity, changes in the epidemiology of infection and changes to pathogen ecology as a consequence of selective pressure.
Such models can be categorised with a taxonomy based on whether (i) states in the model change over time (dynamic) or not (static), (ii) changes to the model occur at random (stochastic) or are fixed (deterministic), (iii) the model averages the behaviour of populations (aggregate) or tracks individuals (individual-based), (iv) events occur in discrete or continuous time, (v) individuals can enter or leave the population (open) or not (closed) and (vi) the model’s equations are linear or non-linear functions of the parameters.
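To make the indirect effect concrete, here is a minimal sketch of a dynamic, deterministic, aggregate, closed-population model: an SIR model stepped forward with a simple Euler scheme. The function name `sir_attack_rate`, the parameter values (R0 = 2, a 3-day infectious period) and the all-or-nothing vaccine are assumptions made for illustration only.

```r
# Attack rate among people NOT protected by the vaccine, for a given coverage.
# All parameter values are illustrative assumptions.
sir_attack_rate <- function(R0, coverage, efficacy = 1, gamma = 1 / 3,
                            days = 1000, dt = 0.1, I0 = 1e-4) {
  beta      <- R0 * gamma              # transmission rate
  protected <- coverage * efficacy     # fraction immunised before the outbreak
  S <- 1 - protected - I0              # susceptible fraction of the population
  I <- I0                              # infectious fraction (initial seed)
  C <- 0                               # cumulative new infections
  for (step in seq_len(round(days / dt))) {
    new_inf <- beta * S * I * dt
    S <- S - new_inf
    I <- I + new_inf - gamma * I * dt
    C <- C + new_inf
  }
  (C + I0) / (1 - protected)
}

ar <- sapply(c(0, 0.3, 0.6), function(p) sir_attack_rate(R0 = 2, coverage = p))
round(ar, 3)
```

In this sketch the attack rate among unvaccinated people falls as coverage rises, and the outbreak fades out once coverage exceeds 1 - 1/R0. A static model would miss this, because it keeps the risk of infection for unvaccinated people fixed.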

The indicators for special situations include the basic reproduction number (R0) and the minimum relative single-dose efficacy. The reproduction number is the number of secondary cases generated by a single infected individual over its entire infectious period in a completely susceptible population, and it is commonly used to characterize pathogen transmissibility during an epidemic. Monitoring R over time provides feedback on the effectiveness of interventions [@Griffin_2015]. The minimum relative single-dose efficacy was first used to determine under what conditions single-dose campaigns are expected to be as effective as, or more effective than, two-dose campaigns using the same amount of vaccine [@Azman_2015].


This is a short insight into vaccine epidemiology at the population level. Other issues such as adverse events following immunization (AEFI) and vaccine clinical trials are not discussed in this blog. However, if you have any ideas on AEFI or other topics, just contact me. Here, I will point out several analysis directions in vaccine epidemiology, and I hope that more people will contribute to vaccine analysis.

  1. The impact or health economic analysis of vaccine interventions on infectious disease.
  2. The influence of non-vaccine-related factors, such as meteorological factors or air pollutants, on infectious disease.
  3. The spatial or temporal distribution of infectious disease and outbreak detection.


Reproduction numbers during epidemics in R software 2015-07-11T00:00:00+00:00 Spatial-R The reproduction number, R, is the number of secondary cases generated by a single infected individual over its entire infectious period in a completely susceptible population, and it is commonly used to characterize pathogen transmissibility during an epidemic. Monitoring R over time provides feedback on the effectiveness of interventions.

Two R packages can be used to estimate the reproduction number: R0 and EpiEstim. The former can calculate the basic reproduction number (R0) and the time-varying reproduction number (Rt), while the latter supplies a new framework to estimate the time-varying reproduction number. Two papers give a detailed introduction to each package, and you can download them here and here.

Basic reproduction number

The concept of the reproduction number was first introduced in the work of Alfred Lotka and Ronald Ross (PS: Ross received the Nobel Prize for Physiology or Medicine in 1902). However, the first modern application of the basic reproduction number in epidemiology was by George MacDonald in 1952, who constructed population models of malaria. R0 is a nonnegative value: if it is above 1, the disease will spread in the population; otherwise, the infection will die out in the future. Some studies have also emphasized that even when R0 is below one, the disease can still be transmitted, for example during long-distance flights.

A review of generic methods used to estimate transmissibility parameters during outbreaks has been carried out. Most methods use the epidemic curve and the generation time distribution (note: the generation time reflects the time lag between infection in a primary case and in a secondary case). The epidemic curve can be built from surveillance data with the epicurve functions in the epitools package. However, the generation time distribution should be obtained from the time lag between all infectee/infector pairs and cannot be observed directly. This is where the serial interval distribution comes in.

Package R0

As mentioned above, the generation time is substituted with the serial interval distribution. In the R0 package, the generation.time function creates a discretized generation time distribution based on the chosen time step. Two types of distribution can be used: empirical and parametric. The former requires the full specification of the distribution, while the latter includes three options (gamma, lognormal and Weibull distributions) and needs the mean and standard deviation. Alternatively, you can fit the distribution by maximum likelihood to the observed time intervals between symptom onsets in primary and secondary cases; the est.GT function does this.
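To show what a discretized parametric generation time looks like, here is a rough base-R sketch. It is not the code of generation.time: the helper name `discretize_gamma_gt`, the day-centred bins and the truncation at 20 days are assumptions made for illustration.

```r
# Discretize a gamma distribution with given mean and sd onto days 0, 1, 2, ...
# Each day k collects the probability mass of the interval [k - 0.5, k + 0.5).
discretize_gamma_gt <- function(mean, sd, max_day = 20) {
  shape <- (mean / sd)^2
  rate  <- mean / sd^2
  days  <- 0:max_day
  upper <- pgamma(days + 0.5, shape = shape, rate = rate)
  lower <- pgamma(pmax(days - 0.5, 0), shape = shape, rate = rate)
  w <- upper - lower
  w / sum(w)   # renormalise after cutting the tail at max_day
}

gt <- discretize_gamma_gt(mean = 2.6, sd = 1)
round(gt[1:6], 3)   # weights for days 0 to 5
```

The mean of 2.6 days and standard deviation of 1 day are the values used for influenza in the example further down.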

The R0 package includes five methods to calculate the basic reproduction number (also called the initial reproduction number) and the time-varying reproduction number: Attack Rate (AR), Exponential Growth (EG), Maximum Likelihood (ML), Time-Dependent (TD), and Sequential Bayesian (SB). A detailed description of these methods can be found here. Here, we just point out the differences among the five methods:

  1. Attack Rate (AR):
    Background: Derived from the classical susceptible-infectious-recovered (SIR) model. The attack rate is the percentage of the population eventually infected.
    Assumption: Homogeneous mixing, a closed population and no intervention.

  2. Exponential Growth (EG):
    Background: Based on the exponential growth rate during the early phase of an outbreak.
    Assumption: The epidemic curve grows exponentially over the study period (a deviance-based R-squared statistic guides the choice of period).

  3. Maximum Likelihood (ML):
    Background: Based on the exponential growth rate during the early phase of an outbreak.
    Assumption: The number of secondary cases caused by an index case follows a Poisson distribution with expected value R. Moreover, the epidemic curve grows exponentially over the study period, from the first case on (a deviance-based R-squared statistic guides the choice of period).

  4. Time-Dependent (TD):
    Background: A time-dependent method that computes the reproduction number by averaging over all the transmission networks compatible with the observations.
    Assumption: Imported cases during the epidemic can be accounted for. The confidence interval for Rt can be obtained by simulation.

  5. Sequential Bayesian (SB):
    Background: Sequential Bayesian estimation of the initial reproduction number. Derived from an approximation to the classical SIR model with a Poisson distribution.
    Assumption: An epidemic with a period of exponential growth and random mixing in the population.
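The first two methods reduce to short formulas, so a hand-rolled base-R sketch may help to see what they compute. This is not the package's implementation: the function names, the made-up incidence counts and the simplifications (a fully susceptible population for AR, a gamma generation time for EG) are assumptions.

```r
# Attack Rate: final-size relation of the SIR model, 1 - AR = exp(-R0 * AR),
# assuming everyone is susceptible at the start.
r0_from_attack_rate <- function(ar) -log(1 - ar) / ar

# Exponential Growth: R = 1 / M(-r), where M is the moment generating function
# of the generation time. For a gamma distribution this has a closed form.
r_from_growth_rate <- function(r, gt_mean, gt_sd) {
  shape <- (gt_mean / gt_sd)^2
  rate  <- gt_mean / gt_sd^2
  (1 + r / rate)^shape
}

# Growth rate r from a Poisson regression on the early epidemic curve
incid <- c(1, 2, 3, 5, 8, 12, 19, 30)   # made-up daily counts
day   <- seq_along(incid)
fit   <- glm(incid ~ day, family = poisson)
r_hat <- unname(coef(fit)["day"])

r0_from_attack_rate(0.5)
r_from_growth_rate(r_hat, gt_mean = 2.6, gt_sd = 1)
```

Both estimators inherit the assumptions listed above: AR ignores interventions and needs the final outbreak size, while EG only makes sense on the part of the curve that really grows exponentially.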

The estimate.R function applies the methods mentioned above through its “methods” argument. Overall, the performance of the exponential growth method (ML and TD) was the least affected by either aggregation or overdispersion.

Example and code

Here, we use the dataset from the 1918 influenza pandemic; you can load it with the code “data(Germany.1918)” and inspect its structure with “str(Germany.1918)”. For simplicity, the example code only focuses on the Maximum Likelihood method. Overall, there are three steps to calculate the reproduction number: generating the generation time distribution, estimation, and sensitivity analysis. The analysis code is as follows:

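 library(R0)
 data(Germany.1918)   # epidemic curve of the 1918 influenza pandemic
 # An epidemic curve can also be given as a vector of onset dates or of counts (not used below)
 epid = c("2012-01-01", "2012-01-02", "2012-01-02","2012-01-03")
 epid.count = c(1,2,4,8)
 GT.flu <- generation.time("gamma", c(2.6, 1))   # gamma distribution with mean 2.6 and sd 1
 res.R <- estimate.R(Germany.1918, GT = GT.flu, methods = c("ML"))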

The code below runs the sensitivity analysis:

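 # GT.sd.range = 1 is an assumption: the original call has an empty argument at this position
 sensitivity.analysis(Germany.1918, GT.type="gamma",
 GT.mean=seq(1,5,1), GT.sd.range=1, begin=1, end=27, est.method="EG", sa.type="GT")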

Package EpiEstim

Most methods for estimating the reproduction number require incidence data and the distribution of the serial interval. However, the estimate is right-censored, because estimating Rt requires incidence data after time t. Moreover, some methods are sensitive to the selected time step or smoothing parameters. Anne Cori and colleagues proposed a new framework and software (EpiEstim) to estimate the time-varying reproduction number. In this framework, the time-varying reproduction number is divided into two types: Rt and Rtc. Here, we call Rt the instantaneous reproduction number and Rtc the case reproduction number. The distinction between them lies in the observation perspective: Rtc is measured retrospectively, whereas Rt is prospective, based on the assumption that conditions remain constant in the future.

In practice, we assume that the infectivity profile ws of an infected individual depends on individual biological factors such as symptom severity. Therefore, infectiousness depends on ws and on the time since infection of the case, but is independent of calendar time. Generally, Rt can be estimated as the ratio of the number of new infections at time t to the sum of the incidence up to time t-1, weighted by ws. Rtc is also called the cohort reproduction number because it counts the average number of secondary cases caused by a cohort infected at time t. Comparatively speaking, Rtc is more suitable when the contact rate and transmissibility can change over time, especially when control measures are initiated. However, a corresponding method for Rtc is unavailable in the EpiEstim package.
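The ratio behind Rt can be written in a few lines of base R. This is only a naive point estimate for illustration: EpiEstim itself works in a Bayesian way over sliding time windows. The function name `instantaneous_r` and the toy inputs are assumptions.

```r
# Naive instantaneous reproduction number:
#   R_t = I_t / sum_s( I_{t-s} * w_s ),  s = 1, 2, ...
# incid: new cases per day; w: infectivity weights, w[s] for s days after infection
instantaneous_r <- function(incid, w) {
  sapply(seq_along(incid), function(t) {
    s <- seq_len(min(t - 1, length(w)))
    lambda <- sum(incid[t - s] * w[s])   # total infectiousness at time t
    if (lambda > 0) incid[t] / lambda else NA_real_
  })
}

# Toy example: cases double every day and all infectivity falls one day after infection
instantaneous_r(c(1, 2, 4, 8), w = 1)
```

The first day has no earlier cases to compare with, so the estimate is missing there; this also hints at why early estimates are unstable when few cases have been observed.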

The EpiEstim package includes three methods to estimate Rt: NonParametricSI, ParametricSI, and UncertainSI. A detailed description of these methods can be found here. All of them can be applied through the EstimateR function.

  1. For method “NonParametricSI”, the discrete distribution of the serial interval is directly specified in the argument “SI.Distr”;
  2. For method “ParametricSI”, the mean and standard deviation of the continuous distribution of the serial interval are given in the arguments “Mean.SI” and “Std.SI”, and the discrete distribution is derived automatically using “DiscrSI”;
  3. For method “UncertainSI”, the mean and standard deviation of the serial interval vary according to truncated normal distributions. First, the mean is sampled from its truncated normal distribution (with mean Mean.SI, standard deviation Std.Mean.SI, minimum Min.Mean.SI and maximum Max.Mean.SI). Then, the standard deviation is sampled from its truncated normal distribution (with mean Std.SI, standard deviation Std.Std.SI, minimum Min.Std.SI and maximum Max.Std.SI).
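The sampling step of “UncertainSI” can be pictured with a small rejection sampler in base R. This is a sketch of the idea only, not the package's internal code; the helper name `rtruncnorm_simple` is hypothetical, and any extra constraints EpiEstim may impose on the sampled pairs are left out.

```r
# Draw n values from a normal distribution truncated to [min, max]
# by discarding draws that fall outside the interval.
rtruncnorm_simple <- function(n, mean, sd, min, max) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rnorm(n, mean, sd)
    out  <- c(out, draw[draw >= min & draw <= max])
  }
  out[seq_len(n)]
}

set.seed(1)
mean_si <- rtruncnorm_simple(100, mean = 2.6, sd = 1,   min = 1,   max = 4.2)
std_si  <- rtruncnorm_simple(100, mean = 1.5, sd = 0.5, min = 0.5, max = 2.5)
range(mean_si)
range(std_si)
```

The numbers are those passed to the UncertainSI call in the example below (Mean.SI = 2.6, Std.Mean.SI = 1, and so on), with 100 draws standing in for n1 = 100.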

Example and code

Here, we use the dataset on pandemic influenza in a school in Pennsylvania, 2009. You can load the dataset with the code “data(Flu2009)”. To keep the article short, we only show the method “NonParametricSI” first. The analysis code is as follows:

library(EpiEstim); data(Flu2009)   # EstimateR() is the interface of EpiEstim 1.x
EstimateR(Flu2009$Incidence, T.Start=2:26, T.End=8:32, method="NonParametricSI", SI.Distr=Flu2009$SI.Distr, plot=TRUE, leg.pos=xy.coords(1,3))

The following code applies the methods ParametricSI and UncertainSI:

EstimateR(Flu2009$Incidence, T.Start=2:26, T.End=8:32, method="ParametricSI", Mean.SI=2.6, Std.SI=1.5, plot=TRUE) 

EstimateR(Flu2009$Incidence, T.Start=2:26, T.End=8:32, method="UncertainSI", Mean.SI=2.6, Std.Mean.SI=1, Min.Mean.SI=1, Max.Mean.SI=4.2, Std.SI=1.5, Std.Std.SI=0.5, Min.Std.SI=0.5, Max.Std.SI=2.5, n1=100, n2=100, plot=TRUE) 

Methods for Environmental Epidemiology in R 2014-08-14T00:00:00+00:00 Spatial-R Recently, the short-term effect of air pollution on mortality has drawn more and more attention in China. Many experts have tried different methods to analyse the impact of air pollution on health. Study designs fall into four types: ecological time series, case-crossover, panel and cohort studies. In general, the first three are the best way to analyse the short-term (acute) effect, whereas the cohort study is used to estimate both the acute and chronic effects of air pollution. Here, I will introduce the ecological time series and case-crossover methods based on R. Later, I will cover the hierarchical model for multi-site time series studies, which pools the risk across locations. Note that the two methods can also be applied to other studies, for example, the short-term effect of meteorological factors on mortality or acute infectious disease. (Some methods can be found in the book **Statistical methods in environmental epidemiology with R: A case study in air pollution and health**. You can download it from here.)

Time series and case-crossover

Time series and case-crossover analyses are the most common methods used to estimate the short-term effects of air pollution on health. Both methods typically treat the outcome as counts representing the number of times a particular event occurred on a given day. The time series approach allows for over-dispersion relative to the Poisson distribution and controls for long-term trend and seasonality using nonparametric or parametric splines. The case-crossover method compares the exposure on a case day, when the event occurred, with the exposure on nearby control days, and examines whether the event is associated with the exposure. Confounding by individual characteristics is therefore controlled by design. Both methods have advantages and weaknesses, so the choice of method depends on your purpose. Some experts have also compared the two methods, and you can download the paper from here.

No matter which method you choose, you should install the tsModel package in R. Thanks to Roger D. Peng, who developed the package for statistical methods in environmental epidemiology. Another of his contributions is reproducible research. His homepage can be found here.

Models in R software

Generally, the original data from the Center for Disease Control and Prevention (CDC) are recorded as individual cases with information such as death date, cause of death and address. You can tidy up the data conveniently with the dplyr package. For example, the following code gives the daily count for each city:
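A minimal sketch of this counting step, on a made-up line list, could look as follows. The column names (death_date, city, cause) and the data are assumptions; base R's aggregate is used here to avoid extra dependencies, and the dplyr equivalent is given in a comment.

```r
# Made-up line list: one row per death (column names are assumptions)
deaths <- data.frame(
  death_date = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02",
                         "2013-01-01", "2013-01-02", "2013-01-02")),
  city  = c("A", "A", "A", "B", "B", "B"),
  cause = c("I21", "J18", "I63", "I21", "J44", "I21")
)

# Add a column of ones and sum it within each date-city combination
daily <- aggregate(count ~ death_date + city,
                   data = transform(deaths, count = 1), FUN = sum)

# dplyr equivalent:
#   deaths %>% group_by(city, death_date) %>% summarise(count = n())
daily
```

Days without any death do not appear in the result, so they should be added back with a count of zero before modelling.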


In China, getting air pollution data and meteorological factors such as daily temperature and humidity is indeed a big problem. The air pollution data can be obtained from the website; the way to do so is shown on my homepage. However, you need to keep the computer running all the time to collect the data, which are updated every hour. The meteorological data can be obtained from the China Meteorological Data Sharing Service System. Once all the data are ready, you can merge the air pollution, meteorological and mortality data into one dataset. The code is here:


Here, the air pollution data and meteorological data are merged with the function merge, setting the argument by to c("date", "city"). Therefore, you should make sure that both dat.air (air pollution data) and dat.mer (meteorological data) include the two variables date and city. The argument all.y=T means that dat.air is added to dat.mer according to date and city, because the dates in the meteorological data are always complete. The final data may have some missing values, for example in the concentrations of air pollutants; you can fill them using the interpolation methods in the xts package. There are some examples you can learn from.
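The merge described here can be tried on two toy data frames. The object names dat.air and dat.mer follow the text; the values are invented for illustration.

```r
# Toy air pollution data: one day is missing
dat.air <- data.frame(date = as.Date(c("2013-01-01", "2013-01-02")),
                      city = c("A", "A"),
                      pm10 = c(80, 95))

# Toy meteorological data: complete in time
dat.mer <- data.frame(date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03")),
                      city = c("A", "A", "A"),
                      temperature = c(2, 4, 3),
                      humidity    = c(60, 55, 70))

# all.y = TRUE keeps every row of dat.mer, even without matching air data
dat <- merge(dat.air, dat.mer, by = c("date", "city"), all.y = TRUE)
dat   # the 2013-01-03 row is kept, with pm10 = NA
```

The mortality counts can be attached in the same way with a second call to merge on the same two keys.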

Now, we will try the statistical methods and estimate the short-term effect of air pollution on health. The time-series model is as follows:


Here, ns() means a natural cubic spline, with 7 degrees of freedom per year for time (ns(date, 6*7) when the data cover 6 years). Indicator variables for the day of the week (Dow) and public holidays (holiday) are also included in the model. The final components are natural cubic splines for temperature and humidity with 3 df each. Sometimes, the model also includes another variable, factor(infectious), to adjust for the impact of infectious disease.
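A model with these components can be sketched on simulated data as below. The quasi-Poisson family, the variable names, the two-year study period and every simulated value are assumptions made for illustration; a numeric day index stands in for the date inside ns().

```r
library(splines)   # ns(): natural cubic spline, shipped with R

set.seed(42)
n_years <- 2
dat <- data.frame(date = seq(as.Date("2012-01-01"), by = "day",
                             length.out = 365 * n_years))
dat$time        <- seq_len(nrow(dat))                 # numeric day index
dat$dow         <- weekdays(dat$date)                 # day of the week
dat$holiday     <- as.integer(format(dat$date, "%m-%d") %in% c("01-01", "10-01"))
dat$temperature <- 15 - 10 * cos(2 * pi * dat$time / 365) + rnorm(nrow(dat), sd = 3)
dat$humidity    <- runif(nrow(dat), 30, 90)
dat$pm10        <- rgamma(nrow(dat), shape = 4, scale = 25)
dat$count       <- rpois(nrow(dat), exp(3 + 0.0005 * dat$pm10))

# 7 df per year for time, indicators for weekday and holiday,
# 3 df each for temperature and humidity
fit.ts <- glm(count ~ pm10 + ns(time, 7 * n_years) + factor(dow) + factor(holiday) +
                ns(temperature, 3) + ns(humidity, 3),
              family = quasipoisson, data = dat)

# Percent change in daily deaths per 10-unit increase in PM10
100 * (exp(10 * coef(fit.ts)["pm10"]) - 1)
```

The quasi-Poisson family is one way to allow for over-dispersion; the effect is usually reported as the percent change in daily counts per 10-unit increase in the pollutant, as in the last line.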

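There are many different designs for choosing the control days in a case-crossover study. The **time-stratified case-crossover** design is commonly used, with fixed and disjoint time strata (e.g., month). The code for the case-crossover model is as follows:

 # the object name fit.cc, the family and the data argument are assumed; the original line is truncated
 fit.cc <- glm(count ~ pm10 + ns(temperature, 3) + ns(humidity, 3) + factor(strata), family = quasipoisson, data = dat)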

Here, the term factor(strata) is the main difference from the time-series model. The appropriate length of the time strata depends on your experience. Once the time strata are fixed, creating the variable strata is not difficult.
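As an illustration of how the strata variable might be built, the sketch below uses one stratum per calendar month and day of the week, which is a common choice but an assumption here. The data are simulated and all names and values are made up.

```r
library(splines)

set.seed(7)
dat <- data.frame(date = seq(as.Date("2013-01-01"), by = "day", length.out = 365))
dat$temperature <- 15 - 10 * cos(2 * pi * seq_len(365) / 365) + rnorm(365, sd = 3)
dat$humidity    <- runif(365, 30, 90)
dat$pm10        <- rgamma(365, shape = 4, scale = 25)
dat$count       <- rpois(365, exp(3 + 0.0005 * dat$pm10))

# Stratum = year-month plus weekday number, so that control days fall in the
# same month and on the same day of the week as the case day
dat$strata <- paste(format(dat$date, "%Y-%m"), format(dat$date, "%u"), sep = "_")

fit.cc <- glm(count ~ pm10 + ns(temperature, 3) + ns(humidity, 3) + factor(strata),
              family = quasipoisson, data = dat)

length(unique(dat$strata))   # 12 months x 7 weekdays = 84 strata
```

Because the strata absorb trend, season and weekday, no spline for time is needed in this model.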

Static Map in R 2014-07-19T00:00:00+00:00 Spatial-R Numerous packages in R can help you create geographic maps. Here, I will summarize the map-related packages so that you can visualize your geographical data more conveniently.

Origin map data

If you don’t have the map data, you can download them from GADM with the function getData in the raster package. The website supplies “RData” files (including four levels) which can be used in R directly. Alternatively, you can download the shapefile and then use the function readShapePoly (in the maptools package) to read the data. Considering the specificity of the China map, I have put all the relevant data on my github. The code to download the data from GADM is as follows:

library(raster)   # provides getData()
adm <- getData('GADM', country='China', level=1)       # provinces
adm1 <- getData('GADM', country='Taiwan', level=0)
adm2 <- getData('GADM', country='Hong kong', level=0)
adm3 <- getData('GADM', country='Macao', level=0)
# draw the first layer, then overlay the others
plot(adm); plot(adm1, add = TRUE); plot(adm2, add = TRUE); plot(adm3, add = TRUE)

The maps package also supplies map data for the world. However, the data are out of date and the China data do not show the city of Chongqing. Generally speaking, the data from GADM are the best choice.

Sp and maptools package

The sp and maptools packages are the basic packages in R for dealing with shapefiles. After you have read the map data, you can just use the plot function to draw the map.

plot(china_map, main="China map in maptools package")  # china_map: the object read from the shapefile (name assumed)

The way to fill polygons and points with different colors can be found here and here.

Rworldmap package

Rworldmap is a package for visualising global data, concentrating on data referenced by country codes or gridded at half degree resolution. The mapping process then involves 3 steps (or 2 if your data are already in an R dataframe):

  1. read data into R
  2. join data to a map (using function joinCountryData2Map())
  3. display the map (using function mapCountryData)

The example is as follows:

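library(rworldmap); data(countryExData)   # example country-level data shipped with the package
sPDF <- joinCountryData2Map(countryExData, joinCode = "ISO3", nameJoinColumn = "ISO3V10")
mapParams <- mapCountryData(sPDF, nameColumnToPlot = "BIODIVERSITY", addLegend = FALSE)
# draw a customised legend from the parameters returned by mapCountryData()
do.call(addMapLegend, c(mapParams, legendWidth = 0.5, legendMar = 4))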

Ggplot2 package

The ggplot2 package offers great power to plot data in R. The plots are designed to comply with the grammar of graphics philosophy and can be produced to a publishable level relatively easily. After you have read the shapefile data into R, the fortify function can be applied to transform the shapefile into a data frame which can be used directly in a ggplot2 plot. The example is as follows:

 library(ggplot2)
 china_map1 <- fortify(china_map, region = 'BOU2_4M_ID') ### transform to the dataframe
 ggplot(china_map1, aes(long, lat)) + theme_bw(base_family = "serif", base_size = 10) +
 geom_path(aes(long, lat, group = group), color = "black", size = 0.3)  ### draw the boundaries


The ggmap, plotGoogleMaps and RgoogleMaps packages can use the Google Maps API to get or display map data. Moreover, the downloaded map can be combined with the ggplot2 package. However, Google Maps has not worked in China for a long time.

Air Quality Data from Website 2014-06-11T00:00:00+00:00 Spatial-R More and more people are concerned about air quality in China, especially the concentration of PM2.5, which is believed to pose the greatest health risks. If you want to analyse the relationship between air pollution or temperature and mortality, access to data on air pollutants is a prerequisite. Here, I will introduce three ways to get air quality data from websites.

First website

The website supplies the real-time air quality index for PM10, PM2.5, SO2, NO2, O3 and NO. You can back-calculate the concentration of each air pollutant from its air quality index. The air quality index follows the China air quality standard (2012). Moreover, the website gives meteorological data such as temperature, dew point temperature, pressure, humidity and wind speed. The code to fetch the data is as follows:

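library(XML); library(plyr)   # htmlParse(), readHTMLTable(); ldply(), rbind.fill()
options(verbose = TRUE); URL = ""   # index page of the website (address not shown here)
doc = htmlParse(URL);  nodes = getNodeSet(doc, "//a[@href]")
hrefs = sapply(nodes, function(x) xmlGetAttr(x, "href"))  ### get all the sites
ks = which(hrefs == "")    # link of the first Chinese site (not shown here)
ks1 = which(hrefs == "")   # link of the last Chinese site (not shown here)
ref2 = hrefs[ks:ks1]  ### get the sites in China
n = length(ref2)   # assumption: n is never defined in the original code

data = list(); i = 1
while (i < (n/3 + 1)) {
  url = ref2[i]; names = substr(url, 23, (nchar(url) - 1))   # site name taken from the link
  y = NULL   # reset, so that a failed download is not mistaken for the previous table
  try(y <- readHTMLTable(url, warn = FALSE, which = 6), silent = TRUE)
  if (length(y) == 5) {
    mo = data.frame(y[, c(1, 3)])
    dy = data.frame(t(as.numeric(as.character(mo$Current))))
    names(dy) <- as.character(mo$Var.1); dy$site = rep(names, 1)
    data[[i]] = dy
  }
  i = i + 1   # move on whether or not the table was usable
}
now.time = substr(Sys.time(), 1, 13)
file.name = paste(now.time, "-pollutant", ".csv", sep = "")   # variable name assumed; lost in the original
dat = ldply(data, rbind.fill)   # stack the sites into one data frame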

Second website

The website provides an API to the data from the National Bureau of Environmental Protection. You just apply for an appkey, and then you can conveniently get air quality data for 190 cities in China. The website also provides detailed materials on how to use the appkey. Thanks to anson and his team. The code is as follows:

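 library(RCurl)   # getURL(); fromJSON() comes from a JSON package such as rjson (assumed)
 # url: the API request address including your appkey (not shown here)
 pm=getURL(url);pm2 <-fromJSON(pm);pm3=data.frame(do.call("rbind",pm2))
 pm4=apply(pm3,2,as.character); dt=gsub("-","",substr(pm4[1,21],1,13))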

Third website

Some websites also publish the air quality data for a certain city. For example, the Haikou municipal environmental protection bureau provides the data on its website. You can also use R to download the data. The code is as follows:



With R, you can get air quality data very conveniently. However, if you want to collect the data continuously, a while loop works well: let your computer run the code at fixed times, for example on the hour.
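One possible way to run a download on the hour is sketched below. Both helper names (`seconds_to_next_hour`, `collect_hourly`) are hypothetical, and `fetch` stands for whatever download function you wrote for the website.

```r
# Seconds left until the next full hour
seconds_to_next_hour <- function(now = Sys.time()) {
  lt <- as.POSIXlt(now)
  3600 - (lt$min * 60 + floor(lt$sec))
}

# Call fetch() n_runs times, waiting before each call
collect_hourly <- function(fetch, n_runs, wait = seconds_to_next_hour) {
  results <- vector("list", n_runs)
  for (k in seq_len(n_runs)) {
    Sys.sleep(wait())                              # sleep until the next full hour
    results[[k]] <- try(fetch(), silent = TRUE)    # a failed download does not stop the loop
  }
  results
}

seconds_to_next_hour(as.POSIXct("2014-06-11 10:59:30", tz = "UTC"))   # 30
```

A call such as collect_hourly(my_download, n_runs = 24) would then collect one day of data, where my_download is a placeholder for your own function. Wrapping the download in try() matters in practice, because a single network error would otherwise end the whole collection.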

Coordinate Data from Website 2014-06-11T00:00:00+00:00 Spatial-R Longitude and latitude data are an essential prerequisite for spatial analysis. In epidemiology in particular, detailed address data are usually available, so the only thing you need to do is transform the addresses into coordinates. If the dataset is not large, you can look up the coordinates one by one in Google Maps or Baidu Maps. However, when there are more than 50 records, the “input-copy” way is extremely tedious and error-prone. Thanks to the open APIs of Google Maps and Baidu Maps, you can get the coordinate data very conveniently. Note that the coordinate formats of Google Maps and Baidu Maps are different: if you get coordinates from Google Maps and then show them in Baidu Maps, there will be a deviation.

Google Map

The Google Places API is powerful for converting English addresses. Make sure the format of the address is correct. Before running the code in your R console, fill in your own Google Places API key (apply for one first). Note that the Google Maps API has a daily limit of 2,500 requests, so your address vector shouldn’t be too long. The code can be found in the work of Neal D. Goldstein, Amy H. Auchincloss and Brian K. Lee.

Baidu Map

Based on the Baidu Maps geocoding API v2.0, you can look up the coordinates of Chinese addresses. There is no daily limit for this API. You can also translate the Chinese address into English and then use the Google Places API. The code is as follows:

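library(RCurl); library(stringr)   # getURL(); str_match_all()
key = ""  #### Copy your own key here
get.latlong <- function(data) {
  url.base <- ""   # base address of the geocoding API (not shown here)
  url <- paste(url.base, key, "&callback=renderOption&output=xml&address=",data, sep = "")
  url.result <- getURL(url)
  # numbers directly followed by a tag; the 2nd and 3rd matches are the coordinates
  longlat <- unlist(str_match_all(url.result, "[0-9]+[.]*[0-9]*[<>]"))[c(2:3)]
  return(longlat)
}
lt <- data.frame(t(sapply(dat, get.latlong)))   # dat: character vector of addresses
lt[, 1] <- gsub("<", "", lt[, 1]);lt[, 2] <- gsub("<", "", lt[, 2])  ### The final result is in lt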

Moreover, you can use the geoconv API 2.0 to convert the format of the coordinate data (from Google Maps or Sogou Maps to Baidu Maps).


Which way you choose depends on your data and the platform used to show the result. The Baidu Maps API is suitable for Chinese addresses and Google Maps for English addresses.