FIXME/drift.org

   1 #+title: Genetic drift
   2 #+seq_todo: TODO | DONE
   3 #+property: cache no
   4
   5 * A simple model of evolution
   6   Evolution is the result of changes in the genetic composition of
   7   populations over time. One of the simplest models of evolution is as
   8   follows. There is a population of N individuals, among which there
   9   are two genetic types: red and blue[fn:1]. Here is the initial
  10   generation of the population (N=10).
  11
  12 #+begin_src ditaa :file drift-1-gen.png :cmdline -r :exports both
  13                   /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
  14   Generation 1    |cRED| |cBLU| |cBLU| |cBLU| |cRED| |cRED| |cBLU| |cRED| |cRED| |cRED|
  15                   |    | |    | |    | |    | |    | |    | |    | |    | |    | |    |
  16                   +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
  17 #+end_src
  18
  19 #+results:
  20 [[file:../../../images/babel/drift-1-gen.png]]
  21   There is no mutation, no selection and no sex; the next generation
  22   is made up by randomly choosing 10 individuals from the previous
  23   generation[fn:2]. A single individual can be chosen more than once,
  24   or not at all; the number of times an individual is chosen
  25   corresponds to the number of progeny it has in the next
  26   generation. Even without mutation or natural selection the
  27   proportions of red and blue types will change, because different
  28   individuals will have different numbers of offspring, by chance.
  29
  30   So the first two generations might look like this.
  31
  32 #+begin_src ditaa :file drift-2-gen.png :cmdline -r :exports both
  33                   /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
  34   Generation 1    |cRED| |cBLU| |cBLU| |cBLU| |cRED| |cRED| |cBLU| |cRED| |cRED| |cRED|
  35                   |    | |    | |    | |    | |    | |    | |    | |    | |    | |    |
  36                   +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
  37                   /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+ /----+
  38   Generation 2    |cBLU| |cBLU| |cRED| |cRED| |cRED| |cBLU| |cRED| |cRED| |cBLU| |cBLU|
  39                   |    | |    | |    | |    | |    | |    | |    | |    | |    | |    |
  40                   +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/ +----/
  41 #+end_src
  42
  43 #+results:
  44 [[file:../../../images/babel/drift-2-gen.png]]
  45
  46
  47   This is a form of evolution called "genetic drift". It is inevitable,
  48   although if the population is very large it will have less effect.
  49
  50   Let X_t be the number of red individuals in generation t, and let p_t
  51   be the proportion of red individuals. So X_1 is 6 and p_1 is 0.6. To
  52   choose the number of red individuals in generation 2 we make 10
  53   choices, each time having probability 6/10 of getting a red
  54   individual. So X_2 is a /binomial/ random variable, with 10 trials and
  55   success probability 0.6. In general, the random process is described
  56   by the following transition probabilities.
  57
  58 #+begin_src latex :file transprob.png
  59   \begin{equation}
  60   \Pr(X_t=j|X_{t-1}=i) = \frac{j(j-1)}{2}\Big(\frac{i}{N}\Big)^j\Big(\frac{N-i}{N}\Big)^{n-j}
  61   \end{equation}
  62 #+end_src
  63
  64 #+results:
  65 [[file:../../../images/babel/transprob.png]]
  66
  67   We can simulate the evolution over many generations in R. This code
  68   simulates the change in frequency in a single population over 100
  69   generations. We'll make the population larger (N=1000) but still
  70   start off with 60% red individuals.
  71
  72 #+source: simpledrift(N=1000, X1=600, ngens=100)
  73 #+begin_src R :file simpledrift.png :exports code
  74   p <- numeric(ngens)
  75   p[1] <- X1/N
  76   for(g in 2:ngens)
  77       p[g] <- rbinom(1, size=N, prob=p[g-1]) / N
  78   plot(p, type="l", ylim=c(0,1), xlab="Generation", ylab="Proportion red")
  79 #+end_src
  80
  81 #+results[03beb832ebe2136388baae04b9f9e699af5d0426]: simpledrift
  82 [[file:../../../images/babel/simpledrift.png]]
  83
  84   But how variable is this process? To answer this we need to repeat
  85   the simulation many times (i.e. simulate many identical but
  86   independent populations). We could do that as follows
  87
  88 #+begin_src R :session t
  89   drift.slow <- function(N, X1, ngens, nreps) {
  90       p <- matrix(NA, nrow=ngens, ncol=nreps)
  91       p[1,] <- X1/N
  92       for(rep in 1:nreps) {
  93           for(g in 2:ngens)
  94               p[g,rep] <- rbinom(1, size=N, prob=p[g-1,rep]) / N
  95       }
  96       p
  97   }
  98 #+end_src
  99
 100   But this is not a good implementation. One should make use of
 101   "vectorisation", which makes the simulation much more efficient when
 102   there are many replicates[fn:3]. Note the way that rbinom simulates
 103   all replicates at once, but still one generation at a time.
 104
 105 #+begin_src R :session t
 106   drift.faster <- function(N, X1, ngens, nreps) {
 107       p <- matrix(NA, nrow=ngens, ncol=nreps)
 108       p[1,] <- X1/N
 109       for(gen in 2:ngens)
 110           p[gen,] <- rbinom(n=nreps, size=N, prob=p[gen-1,]) / N
 111       p
 112   }
 113 #+end_src
 114
 115   To run the simulation:
 116
 117 #+source: drift(N=1000, X1=600, nreps=10, ngens=100)
 118 #+begin_src R :session t :file repdrift.png :exports code
 119   p <- drift.faster(N, X1, ngens, nreps)
 120   matplot(p, type="l", ylim=c(0,1), lty=1)
 121 #+end_src
 122
 123 #+results[685ae7b4150a9413db180d2917384052ec288ab5]: drift
 124 [[file:../../../images/babel/repdrift.png]]
 125   And let's quickly see how much of a speed difference the vectorisation
 126   makes.
 127
 128 #+source: compare-times(N=1000, X1=600, nreps=1000, ngens=100)
 129 #+begin_src R :session t :colnames t :results output
 130   functions <- c(drift.slow=drift.slow, drift.faster=drift.faster)
 131   times <- sapply(functions, function(f) as.numeric(system.time(f(N, X1, ngens, nreps))[1]))
 132   print(times)
 133   cat(sprintf("\nFactor speed-up = %.1f\n", times[1] / times[2]))
 134 #+end_src
 135
 136 #+results[ba4b29e0bf6cc6da506361b76253285f7eab31a9]: compare-times
 137   :   drift.slow drift.faster
 138   :        6.064        0.204
 139   :
 140   : Factor speed-up = 29.7
 141
 142 * Footnotes
 143
 144   [fn:1] Every individual is chacterised by a single type; no sex,
 145   recombination, mutation, selection, etc.
 146
 147   [fn:2] All members of the previous generation die as the next
 148   generation is formed.
 149
 150   [fn:3] Note that we can't vectorise the entire simulation because
 151   drift is a Markov process.
 152
 153 * Config                                                           :noexport:
 154 #+options: author:nil date:nil num:nil toc:nil
 155 #+latex_header: \usepackage{amsmath}
 156 #+latex_header: \usepackage[left=2cm,top=2cm,right=3cm,head=2cm,foot=2cm]{geometry}
 157 #+latex_header: \newcommand{\Pr}{\text{Pr}}
 158 #+latex_header: \newcommand{\pipe}{\arrowvert}
 159
 160 # Local Variables:
 161 # org-export-latex-image-default-option: "width=30em"
 162 # End:
 163 * Tasks                                                            :noexport:
 164 *** TODO How do we put titles on figures?
 165 *** TODO Connect daughters to parents with lines
 166