Detection of long repeats
Purpose: To decide whether a long run of one nucleotide (say, A) is “accidental”, i.e., consistent with a Bernoulli mechanism.
Technique: Derive the probability that the maximum repeat length (in a sequence of length N) exceeds a given number under Bernoulli trials.
- Derivation:
- Probability that #(repeats) in a run equals y
- Probability that #(repeats) in a run is at least y
- Probability that the maximum run of repeats is at least y, given the number of runs
- Expected #(failures) (non-A) = #(runs of length ≥ 0)
- Finally ...
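
The steps listed above can be sketched numerically. This is a minimal illustration, not necessarily the derivation the outline has in mind. It assumes each position is A independently with probability p, that a run is a block of y A's ended by a non-A (so its length is geometric and may be 0), and that the number of runs is replaced by its expected value N(1 - p). The function names and the symbol p are hypothetical, introduced only for this example. An exact dynamic-programming calculation is included so the approximation can be checked.

```python
def run_length_pmf(p, y):
    """P(run length == y): y A's followed by one non-A (assumed geometric model)."""
    return p**y * (1 - p)


def run_length_tail(p, y):
    """P(run length >= y): the first y positions of the run are all A."""
    return p**y


def max_run_tail_given_runs(p, y, n_runs):
    """P(longest of n_runs independent runs >= y)."""
    return 1 - (1 - run_length_tail(p, y)) ** n_runs


def max_run_tail_approx(p, y, N):
    """Approximate P(longest run >= y) in a sequence of length N.

    Assumption: the number of runs is replaced by the expected number
    of failures (non-A positions), N * (1 - p).
    """
    return max_run_tail_given_runs(p, y, N * (1 - p))


def max_run_tail_exact(p, y, N):
    """Exact P(longest run of A >= y) in N Bernoulli trials, by dynamic programming."""
    if y <= 0:
        return 1.0
    # state[k]: probability that no run of length y has occurred yet
    # and the current trailing run of A's has length k (0 <= k < y).
    state = [0.0] * y
    state[0] = 1.0
    for _ in range(N):
        new = [0.0] * y
        new[0] = sum(state) * (1 - p)      # a non-A resets the trailing run
        for k in range(y - 1):
            new[k + 1] = state[k] * p      # an A extends the trailing run
        state = new                        # mass leaving state[y-1] via A is absorbed
    return 1 - sum(state)


if __name__ == "__main__":
    # Example values are arbitrary: uniform base frequencies, N = 1000.
    p, N = 0.25, 1000
    for y in (4, 6, 8, 10):
        print(y, max_run_tail_approx(p, y, N), max_run_tail_exact(p, y, N))
```

If an observed run of A's has length y and the computed probability is very small, the run is unlikely to be accidental under the Bernoulli model. The approximation works best when N is large and p^y is small; the exact routine costs O(N·y) time.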