Course description

Block 1

Principles of Biostatistics - N. Orsini (Karolinska Institutet)

Introduces the fundamental principles of statistics applied to medical and epidemiological research. The topics to be covered include: descriptive statistics, Bayes' theorem, diagnostic testing, the distinction between a parameter and its estimate, hypothesis testing and degree of confidence, errors in statistical inference, and inference on comparative measures (mean difference, risk difference, risk ratio, odds ratio, rate ratio) in experimental and observational studies. At the end of the course, students will be familiar with fundamental statistical concepts in clinical and epidemiological studies and will be able to use estimates obtained from samples of data to carefully draw statistical inferences.
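As an illustration of the Bayes' theorem and diagnostic testing topics listed above, the short sketch below computes the positive predictive value of a test from its sensitivity, specificity, and the disease prevalence. It is not part of the course material: the course itself uses Stata, and the function name and the input values here are hypothetical, chosen only to show the calculation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Probability of testing positive and being diseased.
    true_positive = sensitivity * prevalence
    # Probability of testing positive and being disease-free.
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test: 90% sensitivity, 95% specificity, 1% prevalence.
ppv = positive_predictive_value(0.90, 0.95, 0.01)
print(round(ppv, 3))
```

With a rare disease, most positive results come from the large disease-free group, so the positive predictive value stays low even though the test itself is accurate.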

Linear Regression for Medical Research - R. Bellocco (University of Milano-Bicocca and Karolinska Institutet)

The course introduces students to the practice and application of regression modeling. Students will learn how to fit a regression model and how to estimate and test regression coefficients. Particular emphasis will be placed on the interpretation of the regression coefficients of continuous and categorical predictors in the presence of confounding and interaction. Analysis of variance models and their correspondence with regression models will also be covered. Model building, model predictions, goodness of fit, residual analysis, and appropriate regression diagnostics will be presented. In the applied sessions students will use Stata to apply linear regression models using data from epidemiological studies.

Causal Inference in Epidemiology - Michele Santacatterina (NYU Grossman School of Medicine)

In the last few decades, many techniques have been developed to estimate causal effects from observational real-world data. This course will introduce techniques to identify and estimate causal effects from such data. Specifically, the course will first explain the use of directed acyclic graphs, the potential outcomes framework, and identification assumptions to identify causal estimands. The course will then introduce statistical methods to estimate these estimands, such as regression adjustment, inverse probability weighting, matching, and doubly robust estimators. All theoretical concepts will be set into the context of real-life research problems, taken from medicine and epidemiology. Lab sessions in Stata will provide an opportunity for 'hands-on' training in causal inference. Causal inference is an essential research topic in the statistical, medical, epidemiological, and social sciences. By the end of this course, you will be able to identify, estimate and compute causal effects using observational data, thus improving your research and decision-making skills.
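To give a flavour of inverse probability weighting, one of the estimation methods named above, here is a minimal sketch in Python (the course labs use Stata). It assumes a single categorical confounder, estimates the propensity score as the proportion treated within each confounder level, and uses a normalised (Hajek-type) weighted mean. The function name and the eight-record data set are hypothetical, and the sketch assumes every confounder level contains both treated and untreated subjects.

```python
def ipw_mean_difference(records):
    """Estimate the average treatment effect by inverse probability weighting.

    records: list of (confounder_level, treated, outcome), treated in {0, 1}.
    """
    # Propensity score: proportion treated within each confounder level.
    counts = {}
    for level, treated, _ in records:
        n, n_treated = counts.get(level, (0, 0))
        counts[level] = (n + 1, n_treated + treated)
    propensity = {lvl: n_t / n for lvl, (n, n_t) in counts.items()}

    # Weighted outcome sums and weight totals for each treatment arm.
    sum_w1 = sum_wy1 = sum_w0 = sum_wy0 = 0.0
    for level, treated, outcome in records:
        p = propensity[level]
        if treated:
            w = 1.0 / p
            sum_w1 += w
            sum_wy1 += w * outcome
        else:
            w = 1.0 / (1.0 - p)
            sum_w0 += w
            sum_wy0 += w * outcome
    return sum_wy1 / sum_w1 - sum_wy0 / sum_w0


# Hypothetical data: treatment is more common when the confounder equals 1.
data = [
    (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1),
]
ate = ipw_mean_difference(data)
print(round(ate, 4))
```

In this toy data set the crude difference in outcome means is 0.25, while the weighted estimate is smaller because weighting balances the confounder across the two arms.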

Block 2

Principles of Epidemiology - E. Mostofsky (Harvard T.H. Chan School of Public Health)

This course provides an introduction to the skills needed by public health professionals and clinicians to critically interpret the epidemiologic literature. It will provide participants with the basic principles and practical experience needed to develop these skills. This will be accomplished by covering the basic principles and methods of the design, conduct and interpretation of epidemiologic studies, including descriptive studies, observational analytic studies (case-control and cohort), and randomized clinical trials. In addition, the course will address the calculation and interpretation of measures of disease frequency and association; the assessment of association versus causation in the interpretation of study results; and an introduction to issues related to the evaluation of chance, bias, confounding, and effect modification. Lectures will be complemented by seminars devoted to case studies, exercises, or critiques of relevant examples of epidemiologic studies.
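The measures of association mentioned above can be illustrated with a 2x2 table. The following Python sketch is not taken from the course; the function name, the cell labels, and the counts are hypothetical, and it assumes cumulative-incidence (count) data with no empty cells.

```python
def two_by_two_measures(a, b, c, d):
    """Measures of association from a 2x2 table.

    a: exposed cases        b: exposed non-cases
    c: unexposed cases      d: unexposed non-cases
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
measures = two_by_two_measures(30, 70, 10, 90)
for name, value in measures.items():
    print(name, round(value, 3))
```

Because the outcome is common in this example, the odds ratio is noticeably further from 1 than the risk ratio.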

Logistic Regression for Medical Research - M. Bottai (Karolinska Institutet)

The course introduces students to the practice and application of logistic regression modeling for binary outcomes. Students will estimate, evaluate, and interpret binary data models arising from epidemiological studies, clinical trials, or other application areas. Topics include assessment of confounding and effect modification, use of indicator variables, model building methods, goodness-of-fit assessment, presentation of logistic regression models for reports and publications, and an introduction to conditional and ordinal logistic regression. Data sets from the medical and public health literature will be used as case studies to be analyzed using the Stata statistical program.

Joint Modelling of Longitudinal and Survival Data - M. Crowther (Red Door Analytics)

The joint modelling of longitudinal and survival data has been an area of growing interest in recent years, with the benefits of the approach becoming recognised in ever widening fields of study. The models can either provide an effective way of analysing a survival endpoint (e.g. time to death) influenced by a time-varying covariate measured with error, or correct for non-random dropout in the analysis of a longitudinal outcome (e.g. a biomarker such as blood pressure). This week-long course will provide an introduction to joint modelling through real applications to both clinical trial data and electronic health records, using examples in cancer, liver cirrhosis and cardiovascular disease. We will study the methodological framework, underlying assumptions, estimation, model building and predictions. We will also consider current developments in the field, looking at some of the many extensions of the standard framework, such as the ability to model multiple biomarkers and competing risks. The course will consist of lectures, classroom exercises, and computing exercises making use of the stjm and merlin packages in Stata, written by the course lecturer.

Block 3

Statistical methods for population-based cancer survival analysis - P. Dickman (Karolinska Institutet) & P. Lambert (University of Leicester and Karolinska Institutet) & T. Andersson (Karolinska Institutet) & M. Rutherford (University of Leicester) & E. Syriopoulou (Karolinska Institutet)

The course will address the principles, methods, and application of statistical methods to studying the survival of cancer patients using data collected by population-based cancer registries. We cover central concepts, such as how to estimate and model relative/net survival. We will cover the use of flexible parametric survival models, cure models, loss in expectation of life, and estimation in the presence of competing risks. Comparison of different approaches (e.g., to estimating and modelling relative/net survival) will be a focus of the course and participants will get the opportunity to apply and contrast a range of methods to real data. A large amount of time will be devoted to exercise sessions where the faculty members will be available to work with participants individually or in small groups. The exercise sessions will also provide an opportunity for participants to discuss their own research projects with the faculty (and with each other). We encourage potential participants to read the detailed course description at

Block 4

Research Methods in Health: Biostatistics - M. Bonetti (Bocconi University)

This course is designed to provide the student with an understanding of the foundations of biostatistics and of the various statistical techniques that have been developed to answer research questions in the health sciences. Students will be introduced to methods for the comparison of outcomes between two groups (t-test and nonparametric tests), as well as the extension to the comparison of outcomes across several groups (ANOVA); methods for the study of association between two continuous variables (correlation and linear regression); the analysis of contingency tables; and the study of survival (time-to-event) data. The afternoon sessions are devoted to discussion and learning to use Stata to implement materials covered in the morning lectures.

Longitudinal Data Analysis - G. Fitzmaurice (Harvard T.H. Chan School of Public Health)

This course focuses on methods for analyzing longitudinal and repeated measures data. The defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time, thereby allowing the direct study of change over time. This type of study design encompasses epidemiological follow-up studies as well as clinical trials. The course covers many well-established methods for the analysis of longitudinal data when the response variable is continuous. Methods for discrete response variables (e.g., repeated binary responses and counts) are introduced, but not emphasized. An introductory course in biostatistics and a good background in linear regression analysis are prerequisites for this course.

Applied Epidemiologic Methods: Integrating Diet/Lifestyle and Omics in the Era of Precision Nutrition - M. Song (Harvard T.H. Chan School of Public Health)

This course provides an overview of the key principles and epidemiologic methods in studying the relation of diet/lifestyle and disease. It also presents the latest advances in integrating omics profiling (including metabolomics and microbiome) to better understand the role of diet/lifestyle in disease prevention in the era of precision nutrition. Students are expected to learn how to apply and integrate the basic principles and methods in epidemiology to address practical questions of their interest. Specifically, students are expected to: be familiar with the basic methodology of nutritional and lifestyle epidemiology; be able to interpret published studies; be familiar with study design and analysis considerations in applied epidemiology; and be introduced to integrated epidemiologic research of diet/lifestyle and omics.

Block 5

Research Methods in Health: Epidemiology - M. Mittleman (Harvard T.H. Chan School of Public Health)

This course will explore in greater depth the fundamental epidemiologic concepts introduced in Principles of Epidemiology (Week 1). The course will be taught with an emphasis on causal inference in epidemiologic research, with a focus on chronic disease epidemiology and an emphasis on practical study design. Students will revisit the issues of confounding, selection bias, effect measure modification on the additive and multiplicative scales, and generalizability. Workshops will augment lectures to illustrate practical examples in the epidemiologic literature. Students entering this course are assumed to be familiar with the material covered in Principles of Epidemiology.

Survival Analysis - N. Orsini (Karolinska Institutet)

The course introduces statistical methods for survival analysis, that is, the analysis of studies where the outcome is a time-to-event. Measures covered are survival probabilities, survival percentiles, and event rates. Popular statistical methods are covered: the non-parametric Kaplan-Meier estimator, parametric Exponential and Weibull models, semi-parametric Cox models, and quantile models. Advanced modelling strategies include splines for quantitative predictors and interaction analysis. The concepts and methods are illustrated through real-life examples taken from medical, epidemiological, and public-health research. The emphasis is placed on interpretation and practical relevance. Guided, hands-on computer activities enable the participants to understand and utilize the presented statistical methods.
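As a small illustration of the Kaplan-Meier method named above, the following Python sketch computes the product-limit survival estimate at each distinct event time. It is not course material (the course activities are computer-based and the rest of the programme uses Stata); the function name and the six follow-up times are hypothetical. Subjects censored at a given time are treated as still at risk for events at that same time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up times (e.g. months); 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 4))
```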

Mediation Analysis - A. Bellavia (Harvard T.H. Chan School of Public Health)

The course will introduce traditional and novel approaches for mediation analysis in clinical and epidemiologic research. Mediation analysis allows assessing social and biological pathways through which causal effects operate. Fundamentals of mediation analysis will be presented for dichotomous, continuous, and time-to-event outcomes, and discussion will be given as to when the standard approaches to mediation analysis are or are not valid. The relationship between traditional methods for mediation in the biomedical and the social sciences and recent developments in causal mediation analysis will be discussed. The course will also introduce some of the recent developments in the field, including extensions to evaluate complex datasets with multiple mediators and interactions. The course will introduce Stata macros and commands to implement these approaches and will illustrate several applications from epidemiology and the social sciences. Basic knowledge of linear and logistic regression is recommended.

Stata 1

Basics of Stata® - B. Pongiglione (Institute of Education, University College London)

This course is designed to introduce students to the basics of Stata. It will focus on the minimum set of commands everyone should know to organize their own work. Specific topics include data-management, data-reporting, graphics and basic use of do-files. By the end of this one-day course, the student should be capable of using Stata independently.

Meta-analysis using Stata® - R. D'Amico (University of Modena and Reggio Emilia)

Covers Stata commands for a variety of tasks regarding the combination of results from randomised controlled trials that consider binary, continuous and time-to-event outcomes: data preparation and input, fixed- and random-effects models, forest plots, heterogeneity across studies, publication bias, sensitivity analysis, and meta-regression models.
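The fixed-effect model listed above combines study results by inverse-variance weighting. The Python sketch below shows that calculation only; it is an illustration rather than the Stata workflow taught in the course, and the function name, the three log risk ratios, and their standard errors are hypothetical.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log risk ratios and standard errors from three trials.
log_rr = [math.log(0.80), math.log(0.90), math.log(0.70)]
se = [0.10, 0.20, 0.15]

pooled, pooled_se = fixed_effect_pool(log_rr, se)
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(round(math.exp(pooled), 3), round(lower, 3), round(upper, 3))
```

Pooling is done on the log scale and the result is exponentiated at the end, so the confidence interval is symmetric around the pooled log risk ratio rather than around the risk ratio itself.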

Analysis of prospective studies with Stata® - R. Bellocco (University of Milano-Bicocca and Karolinska Institutet) & M. Ponzano (University of Genova)

This course is designed to introduce students to the analysis of cohort studies: managing person-time, estimating counts and incidence rate ratios for both fixed and time-varying exposures, and fitting count regression models. By the end of the course, the student will be familiar with these epidemiological techniques using Stata.

Data Visualization with Stata® - G. Capelli (University of Cassino and Southern Lazio)

The course introduces students to the logic and the strategies for visualizing data in Stata. Among the topics, the course will explore the issues in the choice of the most appropriate graphic (distributional, compositional or correlational) for different data and aims, and tips and tricks to prepare data for different graphical schemes. In particular, the power and flexibility of multiple "layers" in twoway Stata panels will be exploited. By the end of this one-day course, students will be able to produce Stata Graphs, and export them to JPG, TIFF or PDF formats for further applications.

Stata 2

Basics of Stata® - F. Gallo (Local Health Authority of Cuneo, Epidemiology Division)

This course is designed to introduce students to the basics of Stata. It will focus on the minimum set of commands everyone should know to organize their own work. Specific topics include data-management, data-reporting, graphics and basic use of do-files. By the end of this one-day course, the student should be capable of using Stata independently.

Epi tables using Stata® - A. Discacciati (Karolinska Institutet)

This course is designed to introduce students to basic Stata commands useful in epidemiological research: descriptive statistics to estimate the incidence of a binary response and to characterize the demographic information supplied by study participants; statistical tests to identify univariate predictors associated with the binary response; graphs of the incidence of a binary response as a function of a predictor; and tables of standardized means and proportions.

Practical Introduction to Propensity Score - K. Diaz-Ordaz (London School of Hygiene and Tropical Medicine)

Propensity scoring is a popular analysis method for adjusting for confounding in order to estimate a causal effect using observational data. This intensive one-day workshop (6 hours: two 1-hour lectures and two 2-hour lab sessions) will provide participants with a practical understanding of the principles underlying propensity score analysis. Participants will learn how to apply the main propensity score methods (stratification, matching, and inverse-probability-of-treatment weighting), and understand their advantages and disadvantages. Participants will also work through practical exercises using Stata, to learn how to apply these techniques and how to interpret the results.

Multiple Imputation to handle missing data - T. Morris (MRC Clinical Trials Unit at UCL)

This course will teach you the practical tools to use multiple imputation. It will start by describing the problems created by missing data, including the classic taxonomy of 'missing completely at random', 'missing at random' and 'missing not at random'. This will be used to understand when analysis of complete cases is reasonable and when more complex methods are required: that is, when complete case analysis is likely to be biased or have poor precision. Participants will first learn about the idea of multiple imputation for a single incomplete variable. For multiple incomplete variables, multiple imputation using fully conditional specification will be introduced. Participants will learn how to combine inference from multiply imputed datasets, how many imputations to use, and to appreciate the concept of 'compatibility' for multiple imputation. Throughout the course, emphasis will be on getting multiple imputation right, rather than on getting Stata to do it. However, practical sessions will focus on how to implement methods in Stata and how to interpret output. The course will end with a discussion of how to report analyses using multiple imputation.