development/lazy_lyxtext.txt

   1
   2      Lazy Generation of LyXText
   3      --------------------------
   4
   5      Currently we generate the Row structure at the time of LyXText
   6      creation. This is the "formatting" period of the document
   7      loading. As can be seen this has a significant hit on the amount
   8      of time used to load large documents. On my computer this
   9      manifests itself as a 15 second load time of the UserGuide. In
  10      many ways this is acceptable, but we actually get the same hit
  11      when switching between documents. Because of this the TextCache
  12      was created, however it seems that textcache has some internal
  13      problems or it makes problems in LyXText show.
  14
  15      My suggestion is to make LyXText only generate Rows as need, i.e.
  16      lazy evaluation/generation of the on screen formatting. This will
  17      give us: blinding fast document loading. It also lessens the need
  18      for the TextCache class significantly and with some small further
  19      alterations TextCache can be removed completely.
  20
  21      The Rows are currently stored in a home made double linked list.
  22      To achieve the lazy evaluation this should be changed slightly:
  23
  24      struct RowPar {
  25             // Pointer to the paragraph that this rowpar holds rows for.
  26             LyXParagraph * par;
  27             // Only used for lazy evaluation when switching between buffers.
  28             unsigned int par_y;
  29             typedef vector<Row> Rows;
  30             // The actual rows, generated as needed.
  31             Rows rows;
  32         };
  33
  34      struct Row {
  35             LyXParagraph::size_type pos;
  36             mutable int fill;
  37             unsigned short height;
  38             unsigned short ascent_of_text;
  39             unsigned baseline;
  40         };
  41
  42      typedef list<RowPar> ParRowList;
  43      ParRowList parrow;
  44
  45      Several LyXText methods needs to be changed to handle this new
  46      structure.
  47
  48      Note about RowPar::par_y: If we only want top-down¹ this variable
  49      is not needed. If we however want to be able to begin evaluation
  50      at the location of the cursor it is needed. Then we don't have to
  51      store the formatted document in the textcache. Only a list of
  52
  53      struct Cache {
  54             LyXParagraph * par;
  55             unsigned int par_y;
  56      };
  57
  58      is needed. Since we regenerate all the rows subtleties with row
  59      lifetime etc. are avoided.
  60
  61      Memory usage: the current Row has a size of about 28 bytes on a
  62      32 bit machine and we have one Row for each line on the screen.
  63      For a 10 page document with ca. 50 lines on each page this is
  64      28 * 10 * 50 = 14000 bytes.
  65      The approach above has three pointer less in each row, so
  66      this is 16 * 10 * 50 = 8000 bytes, we have however some
  67      additional overhead: the RowPars one for each paragraph if we
  68      assume 5 lines per paragraph and 20 bytes for each RowPar we get:
  69      10 * 50 / 5 * 20 = 2000. This is a sum of 10000 on the new scheme.
  70
  71      Speed: some operations will most likely be somewhat faster since
  72      they now can operate on the number of paragraph instead of the
  73      number of rows. I do not foresee any operations becoming slower.
  74      The actual lazy evaluation will be done upon displaying, and when
  75      the lyx process is idle (idle_callback), this will mostly be
  76      initiated by LyXScreen and done in LyXText::GetRowNearY and in
  77      LyXText::GetRow. Perhaps also in LyXText::GetVisibleRow. All of
  78      the evaluation will be done on a per-paragraph basis.
  79
  80      If we want to operate on rows only it is easy to create iterators
  81      that can handle this. (This means that a way to force the
  82      evaluation of the rows might be something like:
  83
  84                 // traverse all the rows and generate the missing ones.
  85                 for_each(rowlist.begin(), rowlist.end(), dummy_func);
  86
  87      Note about the buffer structure in the lyx repository: LyXText
  88      really is orthogonal to the new buffer structure in that the
  89      BufferStructure is used for storage of the paragraphs, and
  90      LyXText is used for the caching of on-screen display (of
  91      lines/rows).
  92
  93      Note on impact on other code: Since the current rowlist is
  94      internal to LyXText the impact on code outside LyXText will be
  95      very small. However _inside_ LyXText the impact will be huge, and
  96      all methods that access the rowlist will have to change.
  97
  98      When to begin coding on this. Of course since this is something
  99      that I want to do, I'd like to begin right away. There can be
 100      argued that this should wait until table and floats has been
 101      moved out of LyXText, but I only partially agree. These changes
 102      will be on a fairly low-level and the removal of table and floats
 103      will not impact the development of this code much. Also this can
 104      very well be done in a separate branch in cvs. Anyway this coding
 105      will not begin until after 1.1.5 is released.
 106
 107      Fun stuff: Instead of doing the lazy evaluation only when needed
 108      or by the idle callback this could also be done by a separate
 109      thread taking advantage of SMP and/or io-blocking in the main
 110      thread. We could even have the evaluation of each paragraph we
 111      done in its own thread (or by a few² worker threads)
 112
 113
 114          Lgb
 115
 116 ¹ Row evaluation from the beginning of the document to the end. Never
 117 from the middle.
 118
 119 ² Number of CPUs f.ex.