Permutation Test

Interactive demonstration of hypothesis testing with a permutation test in R

Steps

  • Compute the difference between the means of the two conditions; this is the observed difference
  • Combine the two conditions into one dataset (to break the association, i.e., to mimic the null hypothesis H0)
  • Repeat the following two steps a large number of times (e.g., 1,000)
    • Split the combined dataset into two datasets of the original sizes by sampling without replacement
    • Compute the difference between the means of the two sampled (permuted) datasets
  • Compute the fraction of permutations in which the permuted (i.e., null) difference is at least as extreme as the observed difference; this is the p-value
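
For a very small dataset, the steps above can be carried out exhaustively instead of by random sampling. The sketch below is an illustration only: the toy values in x and y are made up, and it replaces the random resampling with full enumeration via combn(), which is feasible only when the number of possible splits is small.

```r
# Toy data (hypothetical values, chosen only for illustration)
x <- c(1, 2, 3)
y <- c(4, 5, 6)

combined <- c(x, y)
diff_observed <- mean(y) - mean(x)

# Every way to choose which positions of the combined data form the first group
splits <- combn(length(combined), length(x))

# Difference between means for each possible relabelling
diff_null <- apply(splits, 2, function(idx) {
  mean(combined[-idx]) - mean(combined[idx])
})

# Fraction of relabellings at least as extreme as the observed difference
p_exact <- mean(abs(diff_null) >= abs(diff_observed))
print(p_exact)  # 2 of the 20 splits qualify, so this prints 0.1
```

With realistic group sizes the number of splits is far too large to enumerate, which is why the procedure samples a fixed number of random permutations instead.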

Data Description

(Adapted from an example by Śaunak Sen)

Systolic blood pressure was measured in progeny from a backcross between two mouse strains. Fifty randomly chosen mice were genotyped at the D4Mit214 marker. We want to detect an association between the D4Mit214 marker genotype and blood pressure. The values show the systolic blood pressure (in mm Hg) by marker genotype: BA (heterozygous, 26 mice) or BB (homozygous, 24 mice).

R code

This is the bare R code, with the number of permutations set to 1,000.

# Heterozygous (BA)
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)
 	
# Homozygous (BB)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

# Combine the two datasets into a single dataset
# i.e., under the null hypothesis, there is no difference between the two groups
combined = c(a,b)

# Observed difference
diff.observed = mean(b) - mean(a)

number_of_permutations = 1000

# Preallocate the vector of null (permuted) differences
diff.random = numeric(number_of_permutations)
for (i in 1 : number_of_permutations) {

	# Shuffle the combined dataset (sampling without replacement)
	shuffled = sample(combined, length(combined))

	a.random = shuffled[1 : length(a)]
	b.random = shuffled[(length(a) + 1) : length(combined)]

	# Null (permuted) difference
	diff.random[i] = mean(b.random) - mean(a.random)
}

# P-value is the fraction of permutations in which the permuted difference is
# as extreme as or more extreme than the observed difference (two-sided, hence abs)
pvalue = sum(abs(diff.random) >= abs(diff.observed)) / number_of_permutations
print(pvalue)
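
The same procedure can be written more compactly with replicate(). This is a sketch and not part of the original demonstration: the seed value and the names n_perm, diff_null and p_value are assumptions made here for illustration. Because the permutations are random, the p-value varies slightly from run to run unless a seed is fixed.

```r
set.seed(1)  # arbitrary seed (an assumption), only so repeated runs give the same result

# Heterozygous (BA) and homozygous (BB) blood pressure values from above
a <- c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98,
       99, 99, 101, 106, 107, 110, 113, 116, 118)
b <- c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106,
       107, 108, 108, 110, 110, 112, 114, 116, 116)

combined <- c(a, b)
diff_observed <- mean(b) - mean(a)
n_perm <- 1000

# One null difference per permutation
diff_null <- replicate(n_perm, {
  shuffled <- sample(combined)  # random permutation of all values
  first <- seq_along(a)         # positions assigned to the permuted BA group
  mean(shuffled[-first]) - mean(shuffled[first])
})

# Two-sided p-value: share of null differences at least as extreme as observed
p_value <- mean(abs(diff_null) >= abs(diff_observed))
print(p_value)
```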

See also Bootstrap Resampling.

1. Observed values

The original dataset. The vertical lines mark the means of the BA and BB groups. The “observed” difference between the two means (BB minus BA) is about 4.75.
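
As a quick check of the figure quoted above, the group means and their difference can be computed directly from the data. The variable names mean_ba, mean_bb and diff_observed are chosen here for illustration.

```r
# Heterozygous (BA) and homozygous (BB) blood pressure values
a <- c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98,
       99, 99, 101, 106, 107, 110, 113, 116, 118)
b <- c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106,
       107, 108, 108, 110, 110, 112, 114, 116, 116)

mean_ba <- mean(a)  # about 98.46
mean_bb <- mean(b)  # about 103.21

# Observed difference (BB minus BA)
diff_observed <- mean_bb - mean_ba
print(round(diff_observed, 2))  # 4.75
```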

2. Means of permuted datasets

3. Differences between means of permuted datasets

4. P-value and Decision