Course description

Block 1

Principles of Biostatistics - M. Pagano (Harvard School of Public Health)

Introduces the fundamental principles of statistics applied to biomedicine. The topics to be covered include: descriptive statistics, measures of central tendency, probability, diagnostic testing, population and sample, comparison of proportions. At the end of the course, students will be able to understand the descriptive statistical methodologies which are used in clinical and epidemiological studies and to utilize the estimates obtained from suitably selected samples, in order to draw statistical inferences.

Linear Regression for Medical Research - R. Bellocco (University of Milano-Bicocca and Karolinska Institutet)

The course introduces students to the practice and application of regression modeling. Through the use of Stata®, students will learn how to fit a regression model and how to estimate and test regression coefficients. Particular emphasis will be placed on the interpretation of the regression coefficients of continuous and categorical predictors. Analysis of variance models and their correspondence with regression models will also be covered together with procedures and issues in model selection, including confounding and interaction. Model building, goodness of fit, residual analysis, and appropriate regression diagnostics will be discussed.

Causal Inference in Epidemiology - N.P. Jewell (University of California, Berkeley)

Causal inference from observational data is a key task of biostatistics and of allied sciences such as sociology, econometrics, behavioral sciences, demography, economics, health services research, etc. These disciplines share a methodological framework for causal inference that has been developed over the last decades. This course presents this unifying causal theory and shows how biostatistical concepts and methods can be understood within this general framework. The course emphasizes conceptualization but also introduces statistical models and methods for causal effect estimation. Specifically, this course strives to a) formally define causal concepts such as causal effect and confounding using potential outcomes and counterfactuals, b) identify the conditions required to estimate causal effects using Directed Acyclic Graphs (DAGs), and c) introduce analytical methods that, under those conditions, provide estimates that can be endowed with a causal interpretation. Examples of such methods are regression adjustment, standardization and inverse probability weighting.

Block 2

Principles of Epidemiology - E. Mostofsky (Harvard School of Public Health)

This course provides an introduction to the skills needed by public health professionals and clinicians to critically interpret the epidemiologic literature. It will provide participants with the basic principles and practical experience needed to develop these skills. This will be accomplished by covering the basic principles and methods of the design, conduct and interpretation of epidemiologic studies, including descriptive studies, observational analytic studies (case-control and cohort), and randomized clinical trials. In addition, the course will address the calculation and interpretation of measures of disease frequency and association; the assessment of association versus causation in the interpretation of study results; and an introduction to issues related to the evaluation of chance, bias, confounding, and effect modification. Lectures will be complemented by seminars devoted to case studies, exercises, or critiques of relevant examples of epidemiologic studies.

Logistic Regression for Medical Research - K. Gauvreau (Harvard School of Public Health)

This course introduces students to the practice and application of logistic regression modeling for binary outcomes. Students will fit, evaluate, and interpret binary data models arising from epidemiological studies, clinical trials, or other application areas. Topics include assessment of confounding and effect modification, use of indicator variables, model building methods, goodness-of-fit assessment, presentation of logistic regression models for reports and publications, and an introduction to conditional and ordinal logistic regression. Data sets from the medical and public health literature will be used as case studies to be analyzed using the Stata® statistical package.

Mediation Analysis - A. Bellavia (Harvard School of Public Health)

The course will introduce traditional and new methods for mediation analysis. These methods are commonly used to assess social and biological pathways by which causal effects operate. Fundamentals of mediation analysis will be presented for dichotomous, continuous, and time-to-event outcomes, and discussion will be given as to when the standard approaches to mediation analysis are or are not valid. The relationship between traditional methods for mediation in the biomedical and the social sciences and new methods in causal inference will be discussed. The course will also introduce some of the recent developments in the field, including extensions to incorporate multiple mediators and interactions. Stata macros and commands to implement these techniques will be presented, and several applications from epidemiology and the social sciences will be illustrated and discussed. Basic knowledge of linear and logistic regression is recommended.

Block 4

Research Methods in Health: Biostatistics - M. Bonetti (Bocconi University)

This course is designed to provide the student with an understanding of the foundations of biostatistics and of the various statistical techniques that have been developed to answer research questions in the health sciences. Students will be introduced to methods for the comparison of outcomes between two groups (t-test and nonparametric tests), as well as the extension to the comparison of outcomes across several groups (ANOVA); methods for the study of association between two continuous variables (correlation and linear regression); the analysis of contingency tables; and the study of survival (time-to-event) data. The afternoon sessions are devoted to discussion and to learning to use Stata® to implement the material covered in the morning lectures.

Longitudinal Data Analysis - G. Fitzmaurice (Harvard School of Public Health)

This course focuses on methods for analyzing longitudinal and repeated measures data. The defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time, thereby allowing the direct study of change over time. This type of study design encompasses epidemiological follow-up studies as well as clinical trials. The course covers many well-established methods for the analysis of longitudinal data when the response variable is continuous. Methods for discrete response variables (e.g., repeated binary responses and counts) are introduced, but not emphasized. An introductory course in biostatistics and a good background in linear regression analysis are prerequisites for this course.

Molecular Epidemiology - L. Mucci (Harvard School of Public Health)

This course will give students an introduction to the key biomarkers assessed in molecular epidemiology studies and to frequently used study design options. Topics include blood-based biomarkers including metabolomics, genome-wide association studies, tissue-based biomarkers such as immunohistochemistry, and the transcriptome. While examples will be drawn primarily from cancer molecular epidemiology studies, the content will be relevant to any field of chronic disease epidemiology (e.g., cardiovascular epidemiology, neuroepidemiology). Throughout the course, we will emphasize study design strategies particularly related to validity and efficiency, employ publicly available data resources during hands-on labs, and discuss biological and clinical translation. Working knowledge of the fundamentals of epidemiologic study design as covered in Principles of Epidemiology is a prerequisite.

Monitoring and Evaluation of Public Health Programs: Systems Approaches and Techniques - E. Savoia (Harvard School of Public Health)

It is important for individuals working within the field of public health and social programs to be able to determine if existing programs are effective. Evaluation methods applied to complex system interventions are rarely taught in degree programs. This course is a unique opportunity to gather skills in applying evaluation frameworks and methods to real practice interventions. Course participants will gain knowledge of program evaluation applied to complex public health and social interventions (e.g., response to public health emergencies, violence prevention efforts); traditional evaluation methods as well as developmental evaluation will be taught. Basic monitoring techniques will be addressed. Students will learn through lectures, active involvement in discussions, critical analysis of logic models and case studies, and group work to develop an evaluation plan for a public health or a social change initiative based on participants' interests.

Block 5

Research Methods in Health: Epidemiology - M. Mittleman (Harvard School of Public Health)

This course will explore in greater depth the fundamental epidemiologic concepts introduced in Principles of Epidemiology (Week 1). The course will be taught with an emphasis on causal inference in epidemiologic research. Topics will mainly focus on chronic disease epidemiology, with a special emphasis on practical study design. Epidemiologic examples from major chronic diseases/conditions (e.g., heart disease and cancer) will be discussed. Students will revisit the issues of confounding, selection bias, effect modification, and generalizability in the context of these topics. Lectures will be augmented by workshops to illustrate practical examples in the epidemiologic literature. Students entering this course are assumed to be familiar with the material covered in Principles of Epidemiology.

Survival Analysis - N. Orsini (Karolinska Institutet)

The course introduces statistical methods for survival analysis, that is, the analysis of studies where the outcome is a time-to-event. Measures covered are survival probabilities, survival percentiles, and rates. The methods include the Kaplan-Meier survival curve, Poisson regression, Cox regression, and Laplace regression. Modelling strategies include flexible modelling of quantitative predictors and interaction analysis. The concepts and methods are illustrated through real-life examples taken from medical, epidemiological, and public-health research. The emphasis is placed on interpretation and practical relevance. Guided, hands-on computer activities enable the participants to utilize the presented statistical methods.

Nutritional Epidemiology: Principles and Applications to Cancer - M. Song (Massachusetts General Hospital and Harvard Medical School, Boston, MA)

The course introduces students to the key principles and methods of nutritional epidemiology, including approaches for dietary assessment, utility of biochemical indicators and metabolomics, rationale and analytic approaches for adjustment for total energy intake, correction for measurement error, and selected issues relating to data analysis and presentation. Using diet-cancer research as an example, students will learn how to apply and integrate these principles and methods to address practical questions of their interest. Some special topics in the gut microbiome and clinical translation will also be covered to demonstrate the applications of nutritional epidemiology in novel areas.

Joint Modelling of Longitudinal and Survival Data - M. Crowther (University of Leicester)

The joint modelling of longitudinal and survival data has been an area of growing interest in recent years, with the benefits of the approach becoming recognised in ever widening fields of study. The models can provide an effective way of analysing a survival endpoint (e.g. time to death) influenced by a time-varying covariate measured with error, or alternatively can correct for non-random dropout in the analysis of a longitudinal outcome (e.g. a biomarker such as blood pressure). This week-long course will provide an introduction to joint modelling through real applications to both clinical trial data and electronic health records, using examples in cancer, liver cirrhosis and cardiovascular disease. We will study the methodological framework, underlying assumptions, estimation, model building and predictions. We will also consider current developments in the field, looking at some of the many extensions of the standard framework, such as the ability to model multiple biomarkers and competing risks. The course will consist of lectures, classroom exercises, and computing exercises making use of the stjm and megenreg packages in Stata, written by the course lecturer.

Block 6

Statistical methods for population-based cancer survival analysis - P. Dickman (Karolinska Institutet) & P. Lambert (University of Leicester and Karolinska Institutet)

The course will address the principles, methods, and application of statistical methods to studying the survival of cancer patients using data collected by population-based cancer registries. We cover central concepts, such as how to estimate and model relative survival, as well as recent methodological developments including cure models, flexible parametric models, loss in expectation of life, and estimation in the presence of competing risks. Comparison of alternative methodological approaches (e.g., to estimating and modeling relative/net survival) will be a focus of the course and participants will get the opportunity to apply and contrast a range of methods to real data. A large amount of time will be devoted to exercise sessions where Drs Lambert and Dickman along with 3 other experienced faculty members will be available to work with participants individually or in small groups. The exercise sessions will also provide an opportunity for participants to discuss their own research projects with the faculty (and with each other). We encourage potential participants to read the detailed course description at

Stata 1

Basics of Stata® - B. Pongiglione (Institute of Education, University College London)

This course is designed to introduce students to the basics of Stata. It will focus on the minimum set of commands everyone should know to organize their own work. Specific topics include data-management, data-reporting, graphics and basic use of do-files. By the end of this one-day course, the student should be capable of using Stata independently.

Data Visualization with Stata® - G. Capelli (University of Cassino and Southern Lazio)

The course introduces students to the logic and the strategies for visualizing data in Stata. Among the topics, the course will explore the issues in the choice of the most appropriate graphic (distributional, compositional or correlational) for different data and aims, and tips and tricks to prepare data for different graphical schemes. In particular, the power and flexibility of multiple "layers" in twoway Stata panels will be exploited. By the end of this one-day course, students will be able to produce Stata Graphs, and export them to JPG, TIFF or PDF formats for further applications.

Analysis of prospective studies with Stata® - F. Ghilotti (Karolinska Institutet)

This course is designed to introduce students to the analysis of cohort studies: managing person-time, estimating counts and incidence rate ratios for both fixed and time-varying exposures, and fitting count regression models. By the end of the course, the student will be familiar with these epidemiological techniques using Stata.

Stata 2

Basics of Stata® - F. Gallo (Local Health Authority of Cuneo, Epidemiology Division)

This course is designed to introduce students to the basics of Stata. It will focus on the minimum set of commands everyone should know to organize their own work. Specific topics include data-management, data-reporting, graphics and basic use of do-files. By the end of this one-day course, the student should be capable of using Stata independently.

Epi tables using Stata® - A. Discacciati (Karolinska Institutet)

This course is designed to introduce students to basic Stata commands useful in epidemiological research: descriptive statistics to estimate the incidence of a binary response and to characterize the demographic information supplied by study participants; statistical tests to identify univariate predictors associated with the binary response; graphs of the incidence of a binary response as a function of a predictor; and tables of standardized means and proportions.

Multiple Imputation using Stata® - N. Orsini (Karolinska Institutet)

The course provides a practical overview of methods for handling missing data. The course will introduce the basics of multiple imputation, in particular imputation by chained equations. By the end of this one-day course, participants should be capable of analysing data by multiple imputation in Stata. Students should have a background in linear regression methods prior to taking this course.