NOTES

PUTATIVE INSIGHTS (things we think we've learned so far)

- At least for deeper mod searches, the time spent evaluating real spectra
  against synthetic ones swamps everything else.  (Generation of synthetic
  spectra is noticeable, at about 15%.)  This means that ordering spectra by
  parent mass is pointless?!

- We can afford to be a little sloppy in how we generate the comparisons
  (as long as we're not generating duplicates, of course).

- The number of leaves at level N is probably about N times as many as in all
  of the previous N-1 levels put together.

- SEQUEST does its FFT step only for a fixed number (500?) of candidate
  matches for each spectrum.  If the number of matches explodes with
  increasing depth, maybe this implies that only its preliminary scoring
  algorithm really matters for mod searches?

- X!Tandem limits modification combinations searched to 2**12 or so.  For
  deeper searches it just silently gives up.

- The way X!Tandem quantizes peaks leads to noticeable quantization error.

- MyriMatch silently fails if the number of possible mod positions/kinds is
  more than 2^31 or 2^64 (depending on sizeof(int)).

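A rough way to sanity-check the growth and limit guesses above.  This is only
a sketch under an assumed model, not how any of these programs actually count:
a peptide has `sites` modifiable positions, each with `kinds` possible mods,
and a search at depth d modifies exactly d positions.  The function names and
the example numbers (20 sites, 3 kinds) are made up for illustration; the 2**12
and 2**31 limits are the ones mentioned in the notes above.

```python
from math import comb


def leaves_at_depth(sites, kinds, depth):
    """Candidates with exactly `depth` modified positions (assumed model)."""
    return comb(sites, depth) * kinds ** depth


def leaves_below(sites, kinds, depth):
    """Total candidates over all shallower levels, 0 .. depth-1."""
    return sum(leaves_at_depth(sites, kinds, d) for d in range(depth))


def first_depth_exceeding(sites, kinds, limit):
    """Smallest depth at which the cumulative count exceeds `limit`.

    Returns None if even the deepest possible search stays within the limit.
    """
    total = 0
    for depth in range(sites + 1):
        total += leaves_at_depth(sites, kinds, depth)
        if total > limit:
            return depth
    return None


if __name__ == "__main__":
    sites, kinds = 20, 3  # hypothetical example values
    for depth in range(1, 7):
        here = leaves_at_depth(sites, kinds, depth)
        below = leaves_below(sites, kinds, depth)
        # ratio of this level to everything shallower, for the "N times" guess
        print(depth, here, below, "%.1f" % (here / below))
    for limit in (2 ** 12, 2 ** 31):
        print(limit, first_depth_exceeding(sites, kinds, limit))
```

Plugging in the real position and mod counts for a given search shows how
quickly the count passes each limit.  The per-level ratio depends heavily on
the model, so treat the output as a check on the guess, not a confirmation.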

------------------------------------------------------------------------------

One of our goals is to keep things simple and compact.  Here's a recent
comparison of source lines of code against similar programs (generated using
David A. Wheeler's 'SLOCCount'):

greylag:   cpp:   898  py: 1400 (+ 336 sh to set up parallel jobs at SIMR)
xtandem:   cpp: 13058  (+ 1271 for parallel tandem -> 14329)
omssa:     cpp:  7583  (plus an unknown, possibly large number from the NCBI
                        toolkits [33 distinct headers])
                       (the toolkits are 1000000 sloc, 65% cpp, 34% c)
myrimatch: cpp:  6534  (not counting expat code)


------------------------------------------------------------------------------

This is a nice way to print the source code three-up, in a fairly small font,
which makes it easy to study off-line:

  enscript -E -B -3 -r -s 0 --borders -fCourier4.8 --mark-wrapped-lines=arrow \
           --margins=:30::