2 #+seq_todo: TODO | DONE
5 * A simple model of evolution
6 Evolution is the result of changes in the genetic composition of
7 populations over time. One of the simplest models of evolution is as
8 follows. There is a population of N individuals, among which there
9 are two genetic types: red and blue[fn:1]. Here is the initial
10 generation of the population (N=10).
12 #+begin_src ditaa :file drift-1-gen.png :cmdline -r :exports both
13 /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
14 Generation 1 |cRED| |cBLU| |cBLU| |cBLU| |cRED| |cRED| |cBLU| |cRED| |cRED| |cRED|
15 | | | | | | | | | | | | | | | | | | | |
16 +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
20 [[file:../../../images/babel/drift-1-gen.png]]
21 There is no mutation, no selection and no sex; the next generation
22 is made up by randomly choosing 10 individuals from the previous
23 generation[fn:2]. A single individual can be chosen more than once,
24 or not at all; the number of times an individual is chosen
25 corresponds to the number of progeny it has in the next
26 generation. Even without mutation or natural selection the
27 proportions of red and blue types will change, because different
28 individuals will have different numbers of offspring, by chance.
30 So the first two generations might look like this.
32 #+begin_src ditaa :file drift-2-gen.png :cmdline -r :exports both
33 /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
34 Generation 1 |cRED| |cBLU| |cBLU| |cBLU| |cRED| |cRED| |cBLU| |cRED| |cRED| |cRED|
35 | | | | | | | | | | | | | | | | | | | |
36 +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
37 /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
38 Generation 2 |cBLU| |cBLU| |cRED| |cRED| |cRED| |cBLU| |cRED| |cRED| |cBLU| |cBLU|
39 | | | | | | | | | | | | | | | | | | | |
40 +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
44 [[file:../../../images/babel/drift-2-gen.png]]
47 This is a form of evolution called "genetic drift". It is inevitable,
48 although if the population is very large it will have less effect.
50 Let X_t be the number of red individuals in generation t, and let p_t
51 be the proportion of red individuals. So X_1 is 6 and p_1 is 0.6. To
52 choose the number of red individuals in generation 2 we make 10
53 choices, each time having probability 6/10 of getting a red
54 individual. So X_2 is a /binomial/ random variable, with 10 trials and
55 success probability 0.6. In general, the random process is described
56 by the following transition probabilities.
58 #+begin_src latex :file transprob.png
60 \Pr(X_t=j|X_{t-1}=i) = \frac{j(j-1)}{2}\Big(\frac{i}{N}\Big)^j\Big(\frac{N-i}{N}\Big)^{n-j}
65 [[file:../../../images/babel/transprob.png]]
67 We can simulate the evolution over many generations in R. This code
68 simulates the change in frequency in a single population over 100
69 generations. We'll make the population larger (N=1000) but still
70 start off with 60% red individuals.
72 #+source: simpledrift(N=1000, X1=600, ngens=100)
73 #+begin_src R :file simpledrift.png :exports code
77 p[g] <- rbinom(1, size=N, prob=p[g-1]) / N
78 plot(p, type="l", ylim=c(0,1), xlab="Generation", ylab="Proportion red")
81 #+results[03beb832ebe2136388baae04b9f9e699af5d0426]: simpledrift
82 [[file:../../../images/babel/simpledrift.png]]
84 But how variable is this process? To answer this we need to repeat
85 the simulation many times (i.e. simulate many identical but
86 independent populations). We could do that as follows
88 #+begin_src R :session t
89 drift.slow <- function(N, X1, ngens, nreps) {
90 p <- matrix(NA, nrow=ngens, ncol=nreps)
94 p[g,rep] <- rbinom(1, size=N, prob=p[g-1,rep]) / N
100 But this is not a good implementation. One should make use of
101 "vectorisation", which makes the simulation much more efficient when
102 there are many replicates[fn:3]. Note the way that rbinom simulates
103 all replicates at once, but still one generation at a time.
105 #+begin_src R :session t
106 drift.faster <- function(N, X1, ngens, nreps) {
107 p <- matrix(NA, nrow=ngens, ncol=nreps)
110 p[gen,] <- rbinom(n=nreps, size=N, prob=p[gen-1,]) / N
115 To run the simulation:
117 #+source: drift(N=1000, X1=600, nreps=10, ngens=100)
118 #+begin_src R :session t :file repdrift.png :exports code
119 p <- drift.faster(N, X1, ngens, nreps)
120 matplot(p, type="l", ylim=c(0,1), lty=1)
123 #+results[685ae7b4150a9413db180d2917384052ec288ab5]: drift
124 [[file:../../../images/babel/repdrift.png]]
125 And let's quickly see how much of a speed difference the vectorisation
128 #+source: compare-times(N=1000, X1=600, nreps=1000, ngens=100)
129 #+begin_src R :session t :colnames t :results output
130 functions <- c(drift.slow=drift.slow, drift.faster=drift.faster)
131 times <- sapply(functions, function(f) as.numeric(system.time(f(N, X1, ngens, nreps))[1]))
133 cat(sprintf("\nFactor speed-up = %.1f\n", times[1] / times[2]))
136 #+results[ba4b29e0bf6cc6da506361b76253285f7eab31a9]: compare-times
137 : drift.slow drift.faster
140 : Factor speed-up = 29.7
144 [fn:1] Every individual is chacterised by a single type; no sex,
145 recombination, mutation, selection, etc.
147 [fn:2] All members of the previous generation die as the next
148 generation is formed.
150 [fn:3] Note that we can't vectorise the entire simulation because
151 drift is a Markov process.
154 #+options: author:nil date:nil num:nil toc:nil
155 #+latex_header: \usepackage{amsmath}
156 #+latex_header: \usepackage[left=2cm,top=2cm,right=3cm,head=2cm,foot=2cm]{geometry}
157 #+latex_header: \newcommand{\Pr}{\text{Pr}}
158 #+latex_header: \newcommand{\pipe}{\arrowvert}
161 # org-export-latex-image-default-option: "width=30em"
164 *** TODO How do we put titles on figures?
165 *** TODO Connect daughters to parents with lines