%\documentclass[a4paper]{article}[12pt]
%\usepackage{graphicx}
%\oddsidemargin 0.0in
%\textwidth 6.5in
%\evensidemargin 1.0in
%\textheight 9.0in
\documentclass[11pt,letterpaper]{article}

\usepackage{amsmath} % just math
\usepackage{graphicx}
\usepackage[left=1in,top=1in,right=1in,bottom=1in,nohead]{geometry}

\title{6.867 Final Project}
\author{John Lai and Jean Yang}
\begin{document}
\maketitle
\section*{The K Markov Chains Model}
We first define the standard Markov model:
\begin{eqnarray*}
p(x_1, \ldots, x_m) = t(x_1) \prod_{i=2}^m t(x_i | x_{i-1})
\end{eqnarray*}
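As a concrete illustration (not part of the derivation), here is a minimal Python sketch that evaluates this probability in log space for a single chain. The names \texttt{t0} (initial distribution) and \texttt{T} (transition matrix) are our own labels for the parameters $t(x_1)$ and $t(x_i | x_{i-1})$, and the sequence is assumed to be integer-coded.
\begin{verbatim}
import numpy as np

def chain_log_prob(x, t0, T):
    # Log-probability of an integer-coded sequence x under one Markov
    # chain with initial distribution t0[a] = t(x_1 = a) and transition
    # matrix T[a, b] = t(x_i = b | x_{i-1} = a).
    logp = np.log(t0[x[0]])
    for prev, cur in zip(x[:-1], x[1:]):
        logp += np.log(T[prev, cur])
    return logp
\end{verbatim}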
If we use a mixture of $k$ Markov models, then we have:
\begin{eqnarray*}
p(x_1, \ldots, x_m) = \sum_{z=1}^k q(z) \left(t_z(x_1) \prod_{i=2}^m t_z(x_i | x_{i-1})\right), \qquad \sum_{z=1}^k q(z) = 1
\end{eqnarray*}
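The mixture probability can then be sketched by summing the per-chain terms in log space. Here \texttt{q}, \texttt{t0s}, and \texttt{Ts} are assumed arrays holding the mixture weights and the per-chain parameters, and \texttt{chain\_log\_prob} is the helper sketched above.
\begin{verbatim}
from scipy.special import logsumexp

def mixture_log_prob(x, q, t0s, Ts):
    # log of sum_z q(z) * p_z(x) for a k-component Markov mixture.
    comp = [np.log(q[z]) + chain_log_prob(x, t0s[z], Ts[z])
            for z in range(len(q))]
    return logsumexp(comp)
\end{verbatim}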
\subsection*{Maximum Likelihood Estimation}
Before developing the EM algorithm for the K Markov Chains model, we first take a look at maximum likelihood estimation in the case where we observe exactly which underlying Markov chain each observation was drawn from. In this case, similar to GMMs, the maximum likelihood estimates for the parameters in our model are simple. Suppose that we have $n$ training sequences, and define $\delta(z, i)$ to be 1 if sample $i$ was drawn from chain $z$ and 0 otherwise. Writing $[[\cdot]]$ for the indicator function, the maximum likelihood estimates for our parameters will be:
\begin{eqnarray*}
n(z) & = & \sum_{i=1}^n \delta(z, i)\\
q^*(z) & = & \frac{n(z)}{n}\\
t_z^*(x) & = & \frac{\sum_{i=1}^n \delta(z, i) [[ x_{i1} = x]]}{\sum_{i=1}^n \delta(z, i)}\\
t_z^*(x' | x) & = & \frac{\sum_{i=1}^n \sum_{j=1}^{|\underline{x}_i| - 1} \delta(z, i) [[ x_{ij} = x \wedge x_{i,j+1} = x']]}{\sum_{i=1}^n \sum_{j=1}^{|\underline{x}_i| - 1} \delta(z, i) [[ x_{ij} = x]]}
\end{eqnarray*}
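These complete-data estimates are just normalized counts. The following Python sketch (illustrative names only) assumes \texttt{seqs} is a list of integer-coded sequences, \texttt{z\_of[i]} is the chain that generated sequence $i$, \texttt{K} is the number of chains, and \texttt{A} is the alphabet size; the small constant in the normalizations merely guards against empty counts.
\begin{verbatim}
def mle_from_labels(seqs, z_of, K, A):
    # Complete-data maximum likelihood estimates by counting.
    q  = np.zeros(K)          # -> q*(z)
    t0 = np.zeros((K, A))     # -> t_z*(x)
    T  = np.zeros((K, A, A))  # -> t_z*(x' | x)
    for x, z in zip(seqs, z_of):
        q[z] += 1.0
        t0[z, x[0]] += 1.0
        for prev, cur in zip(x[:-1], x[1:]):
            T[z, prev, cur] += 1.0
    q  /= q.sum()                                          # n(z) / n
    t0 /= np.maximum(t0.sum(axis=1, keepdims=True), 1e-12) # / n(z)
    T  /= np.maximum(T.sum(axis=2, keepdims=True), 1e-12)  # / source counts
    return q, t0, T
\end{verbatim}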
Now that we have the estimates for the case where we know which Markov chain each sequence was generated from, we can extend this to the case where we only have a probability distribution over the Markov chains that could have generated each sequence. We keep all of our estimates the same, except that we change the way the $\delta(z, i)$ are computed. Rather than being strictly 0 or 1, $\delta(z, i)$ now represents the normalized posterior probability that example $i$ was generated by the $z^{th}$ Markov chain. Here is how we compute $\delta(z, i)$:
\begin{eqnarray*}
\delta(z, i) = \frac{q(z) t_z(x_{i1}) \prod_{j=2}^{|\underline{x}_i|} t_z(x_{ij} | x_{i,j-1})}{\sum_{z'=1}^k q(z') t_{z'}(x_{i1}) \prod_{j=2}^{|\underline{x}_i|} t_{z'}(x_{ij} | x_{i,j-1})}
\end{eqnarray*}
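In code, this posterior can be sketched in log space for numerical stability, reusing \texttt{chain\_log\_prob} and \texttt{logsumexp} from the earlier sketches; the function below computes $\delta(z, i)$ for one sequence.
\begin{verbatim}
def responsibilities(x, q, t0s, Ts):
    # delta(z, i) for a single sequence x: posterior over the k chains.
    logs = np.array([np.log(q[z]) + chain_log_prob(x, t0s[z], Ts[z])
                     for z in range(len(q))])
    logs -= logsumexp(logs)   # normalize so the probabilities sum to 1
    return np.exp(logs)
\end{verbatim}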
\end{document}