Nonparametric methods featuring the bootstrap
Course Topics
Nonparametric methods form an important core of statistical techniques and are typically used when data do not meet parametric assumptions. Understanding the foundation of these methods, as well as when and how to implement them, is important for anyone who works with real data sets, where parametric assumptions often cannot be relied upon. In this short course, we will discuss the basic philosophy of nonparametric statistics: what the methods are, why we need them, and in which situations they are useful. We will list the assumptions required for parametric tests and then discuss specific nonparametric tests, such as the rank-sum and signed-rank tests (alternatives to the t-test*), the Kruskal-Wallis test (an alternative to ANOVA*), and the Spearman rank correlation (an alternative to the Pearson correlation). We will give examples of how to implement these methods in R. Additionally, we will examine nonparametric bootstrap sampling and show why it is a useful tool in many situations. We will give an example of how bootstrap sampling can be applied to a regression problem so that we can still make inferences when the normality assumptions of more traditional methods are not met. This course will feature both lecture and computer laboratory elements.
*LISA will host a short course on the different t-tests and ANOVA, taught by William Tyler Bradley. If you would like to review these methods, you may want to attend this course.
Packages used: R
Data Used: To examine the rank-sum and signed-rank tests, we will use “Yields from a Barley Field Trial,” a dataset from a 1934 article in the Journal of the American Society of Agronomy that is built into R within the MASS package. The variables of interest will be barley yield from 1931 and 1932. For the Kruskal-Wallis test, we will use “Motor Trend Car Road Tests,” a 1974 dataset from Motor Trend US magazine, also built into R (no packages are necessary). We will attempt to determine whether there is a relationship between the number of cylinders in a car and its miles per gallon.
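As a rough preview of how these tests might be run in R, here is a minimal sketch. It assumes the barley data is the `immer` data frame in MASS (with `Y1` and `Y2` holding the 1931 and 1932 yields) and that the car data is the built-in `mtcars`. The choice of `exact = FALSE` and the treatment of the two years as paired or independent are illustrative assumptions, not necessarily what the course will do.

```r
# Assumption: "Yields from a Barley Field Trial" is MASS::immer,
# where Y1 is the 1931 yield and Y2 is the 1932 yield.
library(MASS)
data(immer)

# Signed-rank test: treats each row's Y1 and Y2 as a matched pair.
# exact = FALSE uses the normal approximation, which avoids warnings about ties.
signed_rank <- wilcox.test(immer$Y1, immer$Y2, paired = TRUE, exact = FALSE)

# Rank-sum test: treats the two years as independent samples instead.
rank_sum <- wilcox.test(immer$Y1, immer$Y2, exact = FALSE)

# Kruskal-Wallis test: does mpg differ across the cylinder groups in mtcars?
kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

print(signed_rank)
print(rank_sum)
print(kw)
```

The signed-rank call uses the pairing of yields within each row, while the rank-sum call ignores it, so the two can give different answers on the same numbers.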
For the Spearman correlation and the nonparametric bootstrap, we will use a custom-made dataset with two variables. We will compare the average IQ in various countries in 2002 with the average number of televisions per 1,000 people in 1997. The data are available from the two links provided below.
(Note: datasets are subject to change).
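To give a sense of what the Spearman correlation and a bootstrap applied to regression can look like in R, here is a minimal sketch. The custom IQ and television dataset is not reproduced here, so the code simulates stand-in data. The column names `tv` and `iq`, the sample size, and every number used in the simulation are hypothetical. The bootstrap shown is a case-resampling percentile bootstrap of the regression slope, which is one common variant and not necessarily the one used in the course.

```r
set.seed(1)

# Hypothetical stand-in data (NOT the course dataset): a skewed predictor
# and a response with heavy-tailed noise, so normality is doubtful.
n   <- 50
tv  <- rexp(n, rate = 1 / 200)
iq  <- 85 + 0.03 * tv + 4 * rt(n, df = 3)
dat <- data.frame(tv = tv, iq = iq)

# Spearman rank correlation between the two variables.
sp <- cor.test(dat$tv, dat$iq, method = "spearman", exact = FALSE)

# Case-resampling bootstrap: resample rows with replacement,
# refit the regression, and keep the slope each time.
B <- 2000
boot_slopes <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(lm(iq ~ tv, data = dat[idx, ]))[["tv"]]
})

# Percentile 95% confidence interval for the slope.
ci <- quantile(boot_slopes, c(0.025, 0.975))

print(sp$estimate)
print(ci)
```

Resampling whole rows keeps each pair of values together, so the interval does not depend on the regression errors being normally distributed.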