.werks/8052

Title: Speed up availability queries with new caching (disabled by default)
Level: 3
Edition: cee
Component: livestatus
Version: 1.2.5i4
Date: 1401867265
Class: feature

The Check_MK Micro Core now has an alternative implementation of the
Livestatus table <tt>statehist</tt>. This table is the basis for all
availability computations. The current implementation, which is still the
only one available when using the Nagios core, has to evaluate all historic
logfiles covering the query range for each query. Despite caching, this can
cause a heavy CPU and IO load. With a large number of hosts and services, a
query over a long time frame can take several minutes.
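
To illustrate what a query against <tt>statehist</tt> looks like, here is a minimal Python sketch that sends a Livestatus query over the Unix socket. The socket path and the helper names `build_statehist_query` and `query_livestatus` are assumptions for illustration only. The two <tt>time</tt> filters define the query range that, without the new cache, has to be covered by evaluating logfiles.

```python
import socket

# Assumed socket path: in an OMD site Livestatus usually listens below
# tmp/run/live, but adjust this to your installation.
LIVESTATUS_SOCKET = "/omd/sites/mysite/tmp/run/live"


def build_statehist_query(host, service, start, end):
    """Build a Livestatus query for the state history of one service.

    start/end are Unix timestamps that define the query range.
    """
    lines = [
        "GET statehist",
        "Columns: host_name service_description state duration_part",
        f"Filter: time >= {start}",
        f"Filter: time < {end}",
        f"Filter: host_name = {host}",
        f"Filter: service_description = {service}",
    ]
    # A Livestatus request is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def query_livestatus(query, path=LIVESTATUS_SOCKET):
    """Send one query over the Unix socket and return the raw answer."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal the end of the request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

For example, `query_livestatus(build_statehist_query("myhost", "CPU load", start, end))` returns the per-interval states of that service within the given range.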

The new implementation has to be enabled in the global settings of the
Check_MK Micro Core: <i>In-memory cache for availability data
(experimental)</i>. You also have to configure a time range, which limits how
far into the past availability queries can reach. The default setting is
two years.
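
The effect of the configured time range can be illustrated with a small check. `is_within_cache_horizon` is a hypothetical helper, not a Check_MK function, and it approximates the default of two years as 2 * 365 days. It tests whether the start of a query lies within the range the cache is configured to hold.

```python
import time

# Assumption: the configured range is handled as a number of seconds;
# the default of two years is approximated as 2 * 365 days.
DEFAULT_CACHE_HORIZON = 2 * 365 * 24 * 3600


def is_within_cache_horizon(query_start, horizon=DEFAULT_CACHE_HORIZON, now=None):
    """Return True if a query starting at query_start (Unix time) only
    needs data inside the configured cache range."""
    if now is None:
        now = time.time()
    oldest_cached = now - horizon
    return query_start >= oldest_cached
```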

When the Core starts, all historic log files within that time range are
parsed into a highly efficient in-memory database, so that subsequent
availability queries need neither disk IO nor logfile parsing. The cache is
automatically updated whenever new alerts occur. Please also note that the
Core is not restarted during normal operation or when changes are activated,
so the cache is only invalidated when you reboot your server or update
Check_MK.
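
The idea of building such a cache once from historic log lines and then keeping it current by feeding it each new alert can be sketched in Python as follows. The class `StateHistoryCache` and its methods are hypothetical and do not reflect the actual data structures of the Micro Core. The sketch only looks at <tt>HOST ALERT</tt> and <tt>SERVICE ALERT</tt> lines in the classic Nagios log format and ignores other entries such as initial states or downtimes.

```python
import re
from collections import defaultdict

# Classic Nagios-style alert lines, e.g.
# "[1401867265] SERVICE ALERT: myhost;CPU load;CRITICAL;HARD;3;CRIT - load 12"
# "[1401867265] HOST ALERT: myhost;DOWN;HARD;1;PING timeout"
ALERT_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$")


class StateHistoryCache:
    """Hypothetical in-memory state history.

    Keeps a chronological list of (timestamp, state) changes per object.
    Objects are keyed by (host, service); service is None for host states.
    """

    def __init__(self):
        self.changes = defaultdict(list)

    def add_line(self, line):
        """Record one log line; return True if it was an alert."""
        m = ALERT_RE.match(line)
        if not m:
            return False
        ts, kind, fields = int(m.group(1)), m.group(2), m.group(3).split(";")
        if kind == "HOST":
            key, state = (fields[0], None), fields[1]
        else:
            key, state = (fields[0], fields[1]), fields[2]
        # Log lines arrive in chronological order, so appending keeps
        # each list sorted by timestamp.
        self.changes[key].append((ts, state))
        return True

    def load(self, lines):
        """Initial bulk load at core start from all historic log lines."""
        for line in lines:
            self.add_line(line)

    def state_at(self, key, ts):
        """Return the last state recorded at or before ts, or None."""
        result = None
        # Linear scan for clarity; a real implementation would use an index.
        for change_ts, state in self.changes.get(key, []):
            if change_ts > ts:
                break
            result = state
        return result
```

In this sketch, `load()` is called once at startup with the historic lines, and afterwards every new alert goes through `add_line()`, so answering `state_at()` never touches the disk.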

The parser can process 500,000 messages per second or more, so if your disk
IO is fast enough, even parsing a large history takes no more than a couple
of minutes. Parsing runs in the background and prevents neither the Core from
working nor queries from being answered. Even availability queries are
answered while the cache is still being built. If the queried time range is
already in the cache, the query is processed immediately; otherwise it waits
until the cache is ready.
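
The waiting behaviour can be sketched with Python's `threading.Condition`. The class `CacheCoverage` is hypothetical: it tracks how far back the cache is already filled, and a query blocks only if its start lies before that point. The assumption that the background loader fills the cache from the newest data backwards belongs to this sketch alone; the werk does not say in which order the log files are parsed.

```python
import threading


class CacheCoverage:
    """Hypothetical tracker of the time range already present in the cache.

    Assumption: the loader works from the newest data backwards, so the
    covered range is [oldest_loaded, now] and grows into the past.
    """

    def __init__(self, now):
        self._cond = threading.Condition()
        self._oldest_loaded = now  # no historic data loaded yet

    def extend_to(self, ts):
        """Called by the loader thread after parsing history back to ts."""
        with self._cond:
            self._oldest_loaded = min(self._oldest_loaded, ts)
            self._cond.notify_all()  # wake up queries waiting for data

    def wait_for(self, query_start, timeout=None):
        """Block until query_start is covered; return True if it is.

        Returns immediately if the range is already cached, and returns
        False if the timeout expires first.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._oldest_loaded <= query_start, timeout
            )
```

A query thread calls `wait_for(start)` before evaluating the cache, while the loader thread calls `extend_to()` after each parsed chunk of history.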

With respect to timeperiod definitions, the new implementation behaves
differently: it reflects later changes to the definitions of your
timeperiods. This is convenient when you want to use service periods in
your availability queries. The classical implementation evaluates the
<tt>TIMEPERIOD TRANSITION</tt> entries in your logfiles, whereas the new one
takes the current definitions directly and applies them to the past time
range.
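
The difference between the two approaches can be sketched in Python. Both functions and the dictionary format of the timeperiod definition are hypothetical simplifications: each weekday maps to ranges in minutes, and timestamps are evaluated in UTC. The transition list stands for the state changes extracted from <tt>TIMEPERIOD TRANSITION</tt> log entries. If you later change the definition, `in_period_current` applies the new hours to past timestamps, whereas `in_period_logged` still returns what was logged at the time.

```python
from datetime import datetime, timezone

# Hypothetical timeperiod definition, loosely modeled on "monday 08:00-17:00"
# style entries: weekday (Monday = 0) -> list of (start, end) in minutes.
WORKHOURS = {day: [(8 * 60, 17 * 60)] for day in range(5)}  # Monday-Friday


def in_period_current(definition, ts, tz=timezone.utc):
    """New approach: evaluate the *current* definition at a past timestamp."""
    dt = datetime.fromtimestamp(ts, tz)
    minute = dt.hour * 60 + dt.minute
    ranges = definition.get(dt.weekday(), [])
    return any(start <= minute < end for start, end in ranges)


def in_period_logged(transitions, ts):
    """Classical approach: replay logged transitions.

    transitions: chronologically sorted (timestamp, new_state) pairs, where
    new_state 1 means "inside the period" and 0 means "outside" (assumed
    encoding). Returns None if no transition precedes ts.
    """
    inside = None
    for t, new_state in transitions:
        if t > ts:
            break
        inside = new_state == 1
    return inside
```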

<b>Note:</b> As of today, this implementation is still highly
<i>experimental</i>. It might not only produce wrong results but could also
crash your core.