Data Science Boot Up Camp

July 9 - July 19, 2018

Room 056 and 052
Graduate School of Mathematical Sciences
The University of Tokyo

3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan


Travel information

Campus map

Flyer (in Japanese)

Objectives:
Data science resides at the intersection of mathematics, statistics and computer science,
and deals with collecting and analyzing large amounts of data.
The objective of the school is to introduce basic notions of statistical data analysis methods
as well as their implementation in computer software.
We welcome students from various disciplines, including mathematics, physics, astronomy,
engineering, linguistics and social science, to participate in the school.

Supported by
Top Global University Project, MEXT Japan

July 9 - July 13 Lecture series, Exercises and Tutorials
July 17 - July 19 Student Seminars

Lecturers will include
Philip B. Stark (Department of Statistics, University of California, Berkeley) | syllabus
Yuta Koike (Graduate School of Mathematical Sciences, The University of Tokyo) | syllabus
Documents used in Koike's lectures will be uploaded to

Research talks

July 13

10:00-11:00 Room 056
Christopher Hoover (UC Berkeley)
Data Science Applications to Global Environmental Health: Enhanced Understanding for Enhanced Impact

11:10-12:10 Room 056
Rosanna Neuhausler (UC Berkeley)
Investigating critical timescales of water storage and discharge in the Sierra Nevada, California | Abstract

13:30-14:30 Room 117
Yue You (UC Berkeley)
Targeted Learning for Variable Importance in Precision Medicine | Abstract

14:40-15:40 Room 117
Valerii Sopin (HSE Moscow)
Trees, Bagging, Random Forests and Boosting

Information: Toshitake Kohno, Graduate School of Mathematical Sciences, The University of Tokyo

Titles and Abstracts of Lectures

Philip B. Stark
Title: Foundations of Statistics and an Introduction to Statistical Inference
These lectures will complement those of Prof. Koike by focusing on foundational issues in statistics, statistical inferential thinking, the interpretation of statistical calculations, and nonparametric and exact methods. Topics will include types of uncertainty; theories of probability and their shortcomings; systematic and stochastic errors; frequentist and Bayesian approaches to estimation and inference and their shortcomings; confounding; the method of comparison; the importance of experimental/observational design; assessing estimators; interpreting p-values, confidence sets, posterior probabilities, and credible sets; common fallacies in statistical inference; the Neyman model for causal inference; interference in experiments; abstract permutation methods; pseudo-random number generation; computational implementation of permutation methods and resampling methods in Python. Examples will be drawn from physical, social, and health sciences.
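As a rough illustration of the kind of permutation method mentioned in this abstract, here is a minimal sketch of a two-sample permutation test in Python. It is not taken from the lecture materials. The function name permutation_test, the choice of test statistic (absolute difference in sample means), the default number of permutations, and the example data are all assumptions made for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Monte Carlo two-sample permutation test (illustrative sketch).

    Null hypothesis: the group labels are exchangeable, i.e. both samples
    come from the same distribution. The test statistic is the absolute
    difference in sample means (an assumed choice, not the only one).
    Returns an estimated two-sided p-value.
    """
    rng = random.Random(seed)  # seeded pseudo-random generator, for reproducibility
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        # Small tolerance so floating-point rounding does not hide ties.
        if stat >= observed - 1e-12:
            hits += 1

    # Count the observed labelling itself so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 4.0, 5.0]        # made-up data
    treatment = [11.0, 12.0, 13.0, 14.0, 15.0]  # made-up data
    print(permutation_test(control, treatment))
```

A small estimated p-value indicates that a difference as large as the observed one would be rare if the labels were assigned at random. The estimate depends on the seed and on the number of permutations drawn.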

Yuta Koike
Title: Introduction to Statistical Data Analysis
In these lectures we present elementary statistical data analysis methods and their implementation in R. Starting with the basic usage of R, we explain some elementary methods from multivariate analysis, such as linear regression, principal component analysis and discriminant analysis, and how to implement them in R. The lectures focus on the practical implementation of the methods rather than on the theoretical details.