Bootstrap Resampling

Interactive demonstration of hypothesis testing with bootstrap resampling in R.

Data Description

(Adapted from an example by Śaunak Sen)

Systolic blood pressure was measured in progeny from a backcross between two mouse strains. Fifty randomly chosen mice were genotyped at the D4Mit214 marker. We want to test for an association between the D4Mit214 marker genotype and blood pressure. The values are systolic blood pressures (in mmHg) grouped by marker genotype: BA (heterozygous, 26 mice) or BB (homozygous, 24 mice).

Bootstrapping


R code

This is the bare R code, using 1,000 bootstrap replicates and a significance level of α = 0.05.

# Heterozygous (BA)
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)

# Homozygous (BB)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

# Difference between means of observed datasets
diff.observed = mean(b) - mean(a)

# Level of significance
alpha = 0.05

# Number of replicates
n = 1000

# Difference between means of bootstrapped datasets (n replicates)
diff.bootstrap = numeric(n)

for (i in seq_len(n)) {
	# Sample with replacement, keeping each group at its original size
	a.bootstrap = sample(a, length(a), replace = TRUE)
	b.bootstrap = sample(b, length(b), replace = TRUE)

	diff.bootstrap[i] = mean(b.bootstrap) - mean(a.bootstrap)
}

# Confidence interval
quantile(diff.bootstrap, c(alpha/2, 1 - alpha/2))

See also Permutation Test.

1. Observed Samples

The vertical lines mark the means of the BA and BB groups. The “observed” difference between the two means (BB minus BA) is about 4.75 mmHg.
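As a quick check of the figure quoted above, the group means and their difference can be computed directly from the data in the R code section. This is only a small sketch; the names mean.a and mean.b are introduced here for illustration and do not appear in the original code.

```r
# Heterozygous (BA) and homozygous (BB) systolic blood pressures
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

# Group means (illustrative names) and the observed difference, BB minus BA
mean.a = mean(a)
mean.b = mean(b)
diff.observed = mean.b - mean.a

# Rounded to two decimals: about 98.46, 103.21 and 4.75
round(c(BA = mean.a, BB = mean.b, difference = diff.observed), 2)
```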

2. Means of Bootstrapped Samples

3. Difference between Means of Bootstrapped Samples

Distribution of the differences between the means of the bootstrapped datasets (BB minus BA). The solid line is the observed difference. The dashed line is the mean of the bootstrapped differences.

4. Confidence Interval and Decision
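The page does not spell out the decision rule, so the following is a sketch under a common assumption: reject the null hypothesis of no difference when the percentile confidence interval excludes zero. The fixed seed and the names ci and reject.null are illustrative additions, and replicate() is used here as a compact stand-in for the explicit loop in the R code section.

```r
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

alpha = 0.05
n = 1000
set.seed(1)  # assumption: fixed seed so the run is repeatable

# Bootstrapped differences between means; sample() defaults to the full group size
diff.bootstrap = replicate(n, mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE)))

# Percentile confidence interval at level 1 - alpha
ci = quantile(diff.bootstrap, c(alpha / 2, 1 - alpha / 2))

# Assumed decision rule: reject "no difference" if the interval excludes 0
reject.null = ci[[1]] > 0 || ci[[2]] < 0
ci
reject.null
```

For these data the lower bound tends to fall close to zero, so the decision can change from run to run; increasing n gives a more stable interval.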


5. Bootstrapped and Null Differences

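The dark gray is the distribution of the bootstrapped differences, as above. The light gray is the distribution of the differences under the null hypothesis, generated by subtracting the mean of the bootstrapped differences from each one, so that the distribution keeps its shape but is centered at zero.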
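The shift described above can be sketched in a couple of lines. The name diff.null and the fixed seed are assumptions made for illustration; the point is that the shifted values keep the spread of the bootstrapped differences while having mean zero.

```r
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

n = 1000
set.seed(1)  # assumption: fixed seed so the run is repeatable
diff.bootstrap = replicate(n, mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE)))

# Null distribution: same shape as the bootstrapped differences, centered at 0
diff.null = diff.bootstrap - mean(diff.bootstrap)

# The center moves to zero while the spread is unchanged
c(mean.bootstrap = mean(diff.bootstrap), mean.null = mean(diff.null))
c(sd.bootstrap = sd(diff.bootstrap), sd.null = sd(diff.null))
```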

6. P-value

The p-value is estimated as the proportion of the null (light gray) distribution that is equal to or more extreme than the “observed” difference (the solid vertical line), that is, the area in the tail beyond that line.
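A minimal sketch of this estimate follows. The text does not say whether one tail or both tails are counted, so both versions are shown; the names p.one.sided and p.two.sided and the fixed seed are illustrative assumptions.

```r
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

diff.observed = mean(b) - mean(a)

n = 1000
set.seed(1)  # assumption: fixed seed so the run is repeatable
diff.bootstrap = replicate(n, mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE)))

# Null distribution: bootstrapped differences shifted to have mean 0
diff.null = diff.bootstrap - mean(diff.bootstrap)

# One-sided: share of null differences at least as large as the observed one
p.one.sided = mean(diff.null >= diff.observed)

# Two-sided: share at least as extreme in either direction
p.two.sided = mean(abs(diff.null) >= abs(diff.observed))

c(one.sided = p.one.sided, two.sided = p.two.sided)
```

With 1,000 replicates the estimate has a resolution of 0.001, so small p-values are better estimated with a larger n.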