# Permutation Test

## Steps

1. Compute the difference between the means of the two conditions. This is the observed difference.
2. Combine the two conditions into one dataset. This breaks the association between condition and value, as assumed under the null hypothesis (H0).
3. Repeat the following two steps a large number of times (e.g., 1,000):
   - Split the combined dataset at random into two datasets of the original sizes (sampling without replacement).
   - Compute the difference between the means of the two sampled (permuted) datasets.
4. Compute the fraction of permutations in which the permuted (i.e., null) difference is ≥ the observed difference (in absolute value for a two-sided test). This fraction is the p-value.
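
The final step can be written compactly. The notation here is introduced for illustration and is not in the original: let $d_{\text{obs}}$ be the observed difference and $d_1, \dots, d_N$ the differences from $N$ permutations. In the two-sided form used by the R code in this document, the p-value is roughly:

$$
p = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left( |d_i| \ge |d_{\text{obs}}| \right)
$$

Here $\mathbf{1}(\cdot)$ equals 1 when the condition holds and 0 otherwise. With $N = 1{,}000$, the smallest nonzero p-value this estimate can report is $0.001$.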

## Data Description

(Adapted from an example by Śaunak Sen)

Systolic blood pressure was measured in progeny from a backcross between two mouse strains. Fifty randomly chosen mice were genotyped at the `D4Mit214` marker. We want to detect an association between the `D4Mit214` marker genotype and blood pressure. The values show the systolic blood pressure (in mm Hg) by marker genotype, `BA` (heterozygous) or `BB` (homozygous).

## R code

This is the bare R code, with the number of permutations set to `1,000`.

```r
# Heterozygous (BA)
a = c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 101, 106, 107, 110, 113, 116, 118)

# Homozygous (BB)
b = c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106, 107, 108, 108, 110, 110, 112, 114, 116, 116)

# Combine the two datasets into a single dataset
# i.e., under the null hypothesis, there is no difference between the two groups
combined = c(a,b)

# Observed difference
diff.observed = mean(b) - mean(a)

number_of_permutations = 1000

# Preallocate the vector of null differences
diff.random = numeric(number_of_permutations)
for (i in 1 : number_of_permutations) {

  # Shuffle the combined dataset (sampling without replacement)
  shuffled = sample(combined, length(combined))

  a.random = shuffled[1 : length(a)]
  b.random = shuffled[(length(a) + 1) : length(combined)]

  # Null (permuted) difference
  diff.random[i] = mean(b.random) - mean(a.random)
}

# P-value: the fraction of permutations in which the permuted difference
# is at least as extreme (in absolute value) as the observed difference

pvalue = sum(abs(diff.random) >= abs(diff.observed)) / number_of_permutations
print(pvalue)
```
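
The same procedure can be wrapped in a reusable function. The following is a minimal sketch, not the original author's code: the name `permutation_test` and its `n_perm` and `seed` arguments are hypothetical, introduced here for illustration. Fixing the seed makes the estimated p-value reproducible, since it otherwise varies slightly from run to run.

```r
# Hypothetical helper (not part of the original code): two-sided permutation
# test for a difference in means between two groups.
permutation_test <- function(x, y, n_perm = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional: make the shuffles reproducible
  combined <- c(x, y)
  observed <- mean(y) - mean(x)

  # Each replicate shuffles the pooled values and recomputes the difference
  null_diffs <- replicate(n_perm, {
    shuffled <- sample(combined)
    x_random <- shuffled[seq_along(x)]
    y_random <- shuffled[(length(x) + 1):length(combined)]
    mean(y_random) - mean(x_random)
  })

  list(observed = observed,
       null = null_diffs,
       p.value = mean(abs(null_diffs) >= abs(observed)))
}

# Heterozygous (BA) and homozygous (BB) blood pressure values from above
a <- c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98,
       99, 99, 101, 106, 107, 110, 113, 116, 118)
b <- c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106,
       107, 108, 108, 110, 110, 112, 114, 116, 116)

result <- permutation_test(a, b, n_perm = 1000, seed = 1)
print(result$observed)
print(result$p.value)
```

Returning the null differences alongside the p-value keeps them available for plotting the null distribution later.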

## 1. Observed values

This is the original dataset. The vertical lines mark the means of `BA` and `BB`. The “observed” difference between the two means is about `4.75`.
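
The quoted difference can be checked directly from the data. This short sketch recomputes the two group means and their difference; the variable names `mean.a` and `mean.b` are chosen here for illustration.

```r
# Heterozygous (BA) and homozygous (BB) blood pressure values
a <- c(86, 88, 89, 89, 92, 93, 94, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98,
       99, 99, 101, 106, 107, 110, 113, 116, 118)
b <- c(89, 90, 92, 93, 93, 96, 99, 99, 99, 102, 103, 104, 105, 106, 106,
       107, 108, 108, 110, 110, 112, 114, 116, 116)

mean.a <- mean(a)                  # mean of BA
mean.b <- mean(b)                  # mean of BB
diff.observed <- mean.b - mean.a   # observed difference (BB minus BA)

print(round(c(BA = mean.a, BB = mean.b, difference = diff.observed), 2))
```

The means come out at about 98.46 for `BA` and 103.21 for `BB`, which gives the difference of about 4.75.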