Analysis of Patterns
Purpose: Answer questions concerning the occurrence of a word of finite length (e.g. “gaga”) in a long sequence of nucleotides.
Null hypothesis: As usual, the DNA sequence is the result of a series of Bernoulli trials, i.e. each nucleotide is drawn independently with fixed probabilities.
Specific questions:
- How frequent is the word in a sequence of length N?
- What is the distance between successive repeats of the word?
Counting method: Occurrences are counted including overlaps (excluding overlaps makes the analysis more difficult).
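
Illustration: The two questions and the overlap-inclusive counting rule can be sketched in Python as follows. This is a minimal sketch, not a prescribed method: the function names, the example sequence and the uniform nucleotide probabilities are assumptions made for illustration. The expected count uses only linearity of expectation under the independence null hypothesis: each of the N - k + 1 start positions matches a word of length k with probability equal to the product of its letter probabilities.

```python
from math import prod


def overlapping_positions(seq, word):
    """Return the start index of every occurrence of word in seq, overlaps included."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one position (not by len(word)) so overlapping hits are kept.
        start = seq.find(word, start + 1)
    return positions


def successive_distances(positions):
    """Distances between the start positions of successive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def expected_count(word, n, probs):
    """Expected number of overlap-inclusive occurrences in an i.i.d. sequence of length n."""
    if n < len(word):
        return 0.0
    return (n - len(word) + 1) * prod(probs[c] for c in word)


# Hypothetical example data: a short sequence and uniform nucleotide probabilities.
seq = "gagagattgaga"
uniform = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}

positions = overlapping_positions(seq, "gaga")
print(positions)                                   # [0, 2, 8]
print(successive_distances(positions))             # [2, 6]
print(expected_count("gaga", len(seq), uniform))   # 0.03515625
```

Note that Python's built-in str.count counts non-overlapping occurrences only, so it would report 2 rather than 3 for this example; hence the explicit loop.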