vcl/README.scheduler

= Introduction =

The VCL scheduler handles LO's primary event queue. It is simple by design,
currently just a singly-linked list, processed in list order by priority,
using round-robin for recurring tasks.

The scheduler has the following behaviour:

B.1. Tasks are scheduled purely by priority
B.2. Implicitly cooperative AKA non-preemptive
B.3. It's not "fair" in any way (a consequence of B.2)
B.4. Tasks are handled round-robin (per priority)
B.5. Higher priorities have lower values
B.6. A small set of priorities instead of a flexible value AKA int

There are some consequences due to this design.

C.1. Higher priority tasks starve lower priority tasks
     As long as a higher priority task is available, lower priority tasks
     are never run! See Anti-pattern.

C.2. Tasks should be split into sensible blocks
     If this can't really be done, process pending tasks by calling
     Application::Reschedule(). Or use a thread.

C.3. This is not an OS scheduler
     There is no real way to "fix" B.2. and B.3.
     If you need to do a preemptive task, use a thread!
     Otherwise make your task suspendable.

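To illustrate B.1, B.4, B.5 and C.1, here is a minimal C++ sketch of such a
scheduling loop. It is not the VCL code: SketchTask, Priority and runAll are
hypothetical names, and a std::list stands in for the scheduler's own list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical priority set; a lower value means a higher priority (B.5).
enum class Priority { High = 0, Default = 1, Low = 2 };

struct SketchTask
{
    std::string name;
    Priority    priority;
    int         remainingRuns; // > 1 models a recurring task
};

// Runs tasks until the list is empty or nMaxRuns is reached and returns
// the task names in execution order.
std::vector<std::string> runAll(std::list<SketchTask> tasks, std::size_t nMaxRuns)
{
    std::vector<std::string> order;
    while (!tasks.empty() && order.size() < nMaxRuns)
    {
        // Pick the first task in list order with the best priority (B.1).
        auto best = tasks.begin();
        for (auto it = tasks.begin(); it != tasks.end(); ++it)
            if (it->priority < best->priority)
                best = it;

        SketchTask task = *best;
        tasks.erase(best);
        order.push_back(task.name); // "invoke"; nothing can preempt it (B.2)

        // A still active task goes to the back: round-robin per priority (B.4).
        if (--task.remainingRuns > 0)
            tasks.push_back(task);
    }
    return order;
}
```

With one Low task and two recurring High tasks in the list, the Low task only
runs after both High tasks have finished all their runs, which is the
starvation described in C.1.
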

= Driving the scheduler AKA the system timer =

  1. There is just one system timer, which drives the LO event loop
  2. The timer has to run in the main window thread
  3. The scheduler is run with the Solar mutex acquired
  4. The system timer is a single-shot timer
  5. The scheduler system event / message has a low system priority.
     All system events should have a higher priority.

Every time a task is started, the scheduler timer is adjusted. When the timer
fires, it posts an event to the system message queue. If the next most
important task is an Idle (AKA instant, 0ms timeout), the event is pushed to
the back of the queue, so we don't starve system messages, otherwise to the
front.

Every time the scheduler is invoked, it searches for the next task to process,
restarts the timer with the timeout for the next event and then invokes the
task. After invoking the task, and if the task is still active, it is pushed
to the end of the queue and the timeout is adjusted if necessary.

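The two decisions described above can be sketched as follows. This is only an
illustration under assumptions: PendingTask, nextTimeoutMs, Event and
postSchedulerEvent are made-up names, and a std::deque stands in for the
system message queue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <limits>
#include <vector>

// Hypothetical task record; a task that is already due models an Idle.
struct PendingTask
{
    std::uint64_t dueTimeMs;
};

// Returns the delay for re-arming the single-shot timer: the time until the
// earliest pending task is due, or the maximum value if nothing is pending.
std::uint64_t nextTimeoutMs(const std::vector<PendingTask>& tasks, std::uint64_t nowMs)
{
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (const PendingTask& task : tasks)
    {
        const std::uint64_t delay = task.dueTimeMs > nowMs ? task.dueTimeMs - nowMs : 0;
        if (delay < best)
            best = delay;
    }
    return best;
}

enum class Event { System, Scheduler };

// Posts the scheduler event: behind the waiting system events if the next
// task is an Idle, so these are not starved, otherwise in front of them.
void postSchedulerEvent(std::deque<Event>& queue, bool bNextTaskIsIdle)
{
    if (bNextTaskIsIdle)
        queue.push_back(Event::Scheduler);
    else
        queue.push_front(Event::Scheduler);
}
```
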

= Locking =

The locking is quite primitive: all interaction with internal Scheduler
structures is locked. This includes the ImplSchedulerContext and the
Task::mpSchedulerData, which is actually a part of the scheduler.
Before invoking the task, we have to release the lock, so others can
Start new Tasks.

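A minimal sketch of this locking pattern, using std::mutex instead of the real
scheduler lock. SketchScheduler and its members are hypothetical and only show
why the lock has to be dropped before the task is invoked.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the scheduler's internal data.
struct SketchScheduler
{
    std::mutex                       lock;  // guards 'tasks'
    std::list<std::function<void()>> tasks;

    // Any thread may start a task; the list is only touched while locked.
    void start(std::function<void()> task)
    {
        std::lock_guard<std::mutex> guard(lock);
        tasks.push_back(std::move(task));
    }

    // Runs one task; returns false if none was pending.
    bool runOne()
    {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> guard(lock);
            if (tasks.empty())
                return false;
            task = std::move(tasks.front());
            tasks.pop_front();
        } // lock released before the task runs
        task(); // may call start() again without deadlocking
        return true;
    }
};
```

If runOne() kept the lock while calling the task, a task that starts another
task would deadlock on the non-recursive mutex.
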

= Lifecycle / thread-safety of Scheduler-based objects =

A scheduler object is thread-safe in the sense that it can be associated with
any thread and any thread is free to call any function on it. The owner must
guarantee that the Invoke() function can be called while the Scheduler object
exists / is not disposed.


= Anti-pattern: Dependencies via (fine grained) priorities =

"Idle 1" should run before "Idle 2", therefore give "Idle 1" a higher priority
than "Idle 2". This only works correctly for low frequency idles, but
otherwise always breaks!

If you have some longer work - even if it can be split into schedulable,
smaller blocks - you normally don't want to schedule it with a non-default
priority, as it starves all lower priority tasks. Even if a block was processed
in "Idle 1", it is scheduled with the same (higher) priority again. Changing
the "Idle" to a "Timer" also won't work, as this breaks the dependency.

What is needed is task based dependency handling, so if "Task 1" is done, it
has to start "Task 2" and if "Task 1" is started again, it has to stop
"Task 2". This currently has to be done by the implementor, but this feature
could reasonably be added to the scheduler.

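As long as the scheduler doesn't offer this, an implementor could wire up the
dependency by hand, roughly like in the following sketch. DepTask and its
members are hypothetical and not part of the Task class.

```cpp
#include <cassert>

// Hypothetical minimal task with an active flag.
struct DepTask
{
    bool     active = false;
    DepTask* dependent = nullptr; // task that may only run after this one

    void start()
    {
        active = true;
        if (dependent)
            dependent->stop(); // new work is pending: hold back the dependent
    }

    void stop() { active = false; }

    // Called when this task has finished all of its work.
    void done()
    {
        active = false;
        if (dependent)
            dependent->start();
    }
};
```

Both tasks can then use the same default priority, as the ordering no longer
depends on it.
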

= Implementation details =

== General: event priority for DoYield ==

There are three types of events, with different priority:

1. LO user events
2. System events
3. LO Scheduler event

They should be processed according to the following code:

bool DoYield( bool bWait, bool bAllCurrent )
{
    bool bWasSchedulerEvent = false;
    bool bWasEvent = ProcessUserEvents( bAllCurrent );
    if ( !bAllCurrent && bWasEvent )
        return true;
    bWasEvent = ProcessSystemEvents( bAllCurrent, &bWasSchedulerEvent ) || bWasEvent;
    if ( !bWasSchedulerEvent && IsSchedulerEvent() )
    {
        ProcessSchedulerEvent();
        bWasEvent = true;
    }
    if ( !bWasEvent && bWait )
    {
        WaitForSystemEvents();
        bWasEvent = true;
    }
    return bWasEvent;
}

== General: main thread deferral ==

In almost all VCL backends, we run main thread deferrals by disabling the
SolarMutex using a boolean. In the case of the redirect, this makes
tryToAcquire and doAcquire return true or 1, while a release is ignored.
Also the IsCurrentThread() mutex check function will act accordingly, so all
the DBG_TESTSOLARMUTEX checks won't fail.

Since we just disable the locks when we start running the deferred code in the
main thread, we won't let the main thread run into stuff, where it would
normally wait for the SolarMutex.

Eventually this will move into the SolarMutex. KDE / Qt also does main
thread redirects using Qt::BlockingQueuedConnection.

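The boolean switch can be pictured with the following sketch. It is not the
SolarMutex implementation: RedirectableMutex, setRedirect and realLockCount
are assumptions made for this example, and the lock counter only exists to
make the effect visible.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical wrapper; the names are illustrative, not the VCL classes.
class RedirectableMutex
{
    std::recursive_mutex m_mutex;
    bool                 m_bRedirect = false; // set while deferred code runs
    int                  m_nRealLocks = 0;    // real acquisitions, for the demo

public:
    void setRedirect(bool bOn) { m_bRedirect = bOn; }

    // While redirected, acquiring always "succeeds" without touching the mutex.
    bool tryToAcquire()
    {
        if (m_bRedirect)
            return true;
        if (!m_mutex.try_lock())
            return false;
        ++m_nRealLocks;
        return true;
    }

    // While redirected, a release is ignored.
    void release()
    {
        if (m_bRedirect)
            return;
        --m_nRealLocks;
        m_mutex.unlock();
    }

    int realLockCount() const { return m_nRealLocks; }
};
```
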
== General: processing all current events for DoYield ==

This is easily implemented for all non-priority queue based implementations.
Windows and macOS both have a timestamp attached to their events / messages,
so simply get the current time and just process anything < timestamp.
For the KDE backend this is already the default behaviour - single event
processing isn't even supported. The headless backend accomplishes this by
just processing a copy of the list of current events.

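The "process a copy of the list" approach could look like the following
sketch. EventQueue and processCurrentEvents are hypothetical names and this is
not the headless backend's code.

```cpp
#include <cassert>
#include <deque>
#include <functional>

using EventQueue = std::deque<std::function<void()>>;

// Processes only the events that were queued when the call started and
// returns how many were run.
int processCurrentEvents(EventQueue& queue)
{
    EventQueue current;
    current.swap(queue); // 'queue' is now empty and collects new events

    int nProcessed = 0;
    for (auto& event : current)
    {
        event(); // may post to 'queue'; such events wait for the next call
        ++nProcessed;
    }
    return nProcessed;
}
```

Events posted by a handler end up in the emptied queue, so a handler that
keeps re-posting itself can't keep this call busy forever.
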
Problematic in this regard is the Gtk+ backend. g_main_context_iteration
dispatches "only those highest priority event sources". There is no real way
to tell when these became ready. I've added a workaround idea to the TODO
list. FWIW: Qt runs just a single timer source in the glib main context,
basically the same as we're doing with the LO scheduler as a system event.

The gen X11 backend has some levels of redirection, but needs quite some work
to get this fixed.

== General: non-main thread yield ==

Yielding from a non-main thread must not wait in the main thread, as this
may block the main thread until some events happen.

Currently we wait on an extra condition, which is cleared by the main event
loop.

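A manually reset condition of this kind can be built from standard C++ parts,
as in the sketch below. ManualCondition is a hypothetical class, and the
assumed protocol (the yielding thread calls reset() and then wait(), the main
event loop calls set() after processing events) is only one possible reading
of the paragraph above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical manually reset condition; not the class used by VCL.
class ManualCondition
{
    std::mutex              m_mutex;
    std::condition_variable m_cond;
    bool                    m_bSet = false;

public:
    // Assumed to be called by the main event loop after processing events.
    void set()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_bSet = true;
        }
        m_cond.notify_all();
    }

    // Assumed to be called by the yielding thread before it waits.
    void reset()
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_bSet = false;
    }

    // Blocks the calling (non-main) thread; returns false on timeout.
    bool wait(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> guard(m_mutex);
        return m_cond.wait_for(guard, timeout, [this] { return m_bSet; });
    }
};
```
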
== General: invalidation of elapsed timer event messages ==

Since the system timer to run the scheduler is single-shot, there should never
be more than one elapsed timer event in the system event queue. When stopping
or restarting the timer, we may have to remove the now invalid event from
the queue.

But for the Windows and macOS backends this may fail as they have delayed
posting of events, so a consecutive remove after a post will actually yield no
remove. On Windows we even get unwanted processing of events outside of the
main event loop, which may call the Scheduler, as timer management is handled
in critical scheduler code.

To prevent these problems, we don't even try to remove these events, but
invalidate them by versioning the timer events. Timer events with invalid
versions are processed but simply don't run the scheduler.

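The versioning idea, reduced to its core, might look like this. VersionedTimer
and its members are hypothetical; the real timers keep more state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-shot timer whose posted events carry a version.
class VersionedTimer
{
    std::int32_t m_nVersion = 0;
    int          m_nSchedulerRuns = 0;

public:
    // Starting or stopping invalidates every event that is still in flight.
    std::int32_t start() { return ++m_nVersion; } // version of the new event
    void         stop() { ++m_nVersion; }

    // Handler for a timer event taken from the system queue. Returns true
    // if the scheduler was run and false if the event was stale.
    bool handleEvent(std::int32_t nEventVersion)
    {
        if (nEventVersion != m_nVersion)
            return false; // processed, but ignored
        ++m_nSchedulerRuns; // stand-in for calling the scheduler
        return true;
    }

    int schedulerRuns() const { return m_nSchedulerRuns; }
};
```

No event ever has to be removed from the system queue; a stale one just
fails the version check when it finally arrives.
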
== General: track time of long running tasks ==

There is a TaskStopwatch class. It'll track the time and report a timeout
either when the task's time slice is finished or some system event did occur.

Eventually it will be merged into the main scheduler, so each invoked task can
easily track its runtime and eventually this can be used to "blame" / find
other long running tasks, so interactivity can be improved.

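To show the principle only, here is a sketch of such a time keeper. It does
not mirror the actual TaskStopwatch interface: SliceStopwatch, timedOut and
the injected clock and input callbacks are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical time keeper; not the actual TaskStopwatch interface.
class SliceStopwatch
{
    std::function<std::uint64_t()> m_now;      // clock in ms
    std::function<bool()>          m_hasInput; // is a system event pending?
    std::uint64_t                  m_nStartMs;
    std::uint64_t                  m_nSliceMs;

public:
    SliceStopwatch(std::function<std::uint64_t()> now,
                   std::function<bool()> hasInput, std::uint64_t nSliceMs)
        : m_now(std::move(now))
        , m_hasInput(std::move(hasInput))
        , m_nStartMs(m_now())
        , m_nSliceMs(nSliceMs)
    {
    }

    // True once the time slice is used up or a system event is waiting.
    bool timedOut() const
    {
        return m_now() - m_nStartMs >= m_nSliceMs || m_hasInput();
    }
};
```

A long running task would check timedOut() between its work blocks and return
to the scheduler as soon as it reports true.
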
There were some questions coming up when implementing the TaskStopwatch:

=== Why does the scheduler not detect that we only have idle tasks pending,
and skip the instant timeout? ===

You never know how long a task will run. Currently the scheduler simply asks
each task when it'll be ready to run, until two runnable tasks are found.
Normally this is very quick, as LO has a lot of one-shot instant tasks / Idles
and just a very few long term pending Timers.

Especially UNO calls add a lot of Idles to the task list, which just need to
be processed in order.

=== Why not use things like Linux timer wheels? ===

LO has relatively few timers and a lot of one-shot Idles. 99% of the time the
search for the next task is quick, because there are just ~5 long term timers
per document (cache invalidation, cursor blinking etc.).

This might become a problem, if you have a lot of open documents, so the long
term timer list increases AKA for highly loaded LOOL instances.

But the Linux timer wheel mainly relies on the fact that the OS timers are
expected to not expire, as they are used to catch "error" timeouts, which
rarely happen, so this definitely doesn't match LO's usage.

=== Not really usable to find misbehaving tasks ===

The TaskStopwatch class is just a little time keeper + detection of input
events. This is not about misbehaving Tasks, but long running tasks, which
have to yield to the Scheduler, so other Tasks and System events can be
processed.

There is the TODO to merge the functionality into the Scheduler itself, at
which point we can think about profiling individual Tasks to improve
interactivity.

== macOS implementation details ==

Generally the Scheduler is handled as expected, except on resize, which is
handled with different runloop-modes in macOS. In case of a resize, the normal
runloop is suspended in sendEvent, so we can't call the scheduler via posted
main loop-events. Instead the scheduler uses the timer again.

Like the Windows backend, all Cocoa / GUI handling also has to be run in
the main thread. We're emulating Windows' out-of-order PeekMessage processing
via a YieldWakeupEvent and two conditions. When in a RUNINMAIN call, all
the DBG_TESTSOLARMUTEX calls are disabled, as we can't release the SolarMutex,
but we can prevent running any other SolarMutex based code. Those wakeup
events must be ignored to prevent busy-locks. For more info read the "General:
main thread deferral" section.

We can neither rely on macOS dispatch_sync code block execution nor the
message handling, as both can't be prioritized or filtered and the first
also doesn't allow nested execution and is just processed in sequence.

There is also a workaround for a problem with pushing tasks to an empty queue,
as [NSApp postEvent: ... atStart: NO] doesn't append the event, if the
message queue is empty.

An additional problem is the filtering of events on Window close. This drops
posted timer events when a Window is closed, resulting in a busy DoYield loop,
so we have to re-post the event after closing a window.

== Windows implementation details ==

Posted or sent event messages often trigger processing of WndProc in
PeekMessage, GetMessage or DispatchMessage, independently from the message to
fetch, remove or dispatch ("During this call, the system delivers pending,
nonqueued messages..."). Additionally messages have an inherited priority
based on the function used to generate them. Even if WM_TIMER messages should
have the lowest priority, a manually posted WM_TIMER is processed with the
priority of a PostMessage message.

So we're giving up on processing all our Scheduler events as a message in the
system message loop. Instead we just indicate a 0ms timer message by setting
the m_bDirectTimeout in the timer object. This timer is always processed, if
the system message wasn't already our timer. As a result we can also skip the
polling. All this is one more reason to drop the single message processing
in favour of always processing all pending (system) events.

There is another special case we have to handle: window updates during move
and resize of windows. These system actions run in their own nested message
loop. So we have to completely switch to timers, even for 0ms. But these
posted events prevent any event processing, while we're busy. The only viable
solution seems to be to switch to WM_TIMER based timers, as these generate
messages with the lowest system priority (but they don't allow 0ms timeouts).
So processing slows down during resize and move, but we gain working painting,
even when busy.

An additional workaround is implemented for the delayed queuing of posted
messages, where PeekMessage in WinSalTimer::Stop() won't be able to remove the
just posted timer callback message. See "General: invalidation of elapsed
timer event messages" for the details.

To run the required GUI code in the main thread without unlocking the
SolarMutex, we "disable" it. For more info read the "General: main thread
deferral" section.

== KDE implementation details ==

This implementation also works as intended. But there is a different Yield
handling, because Qt's QAbstractEventDispatcher::processEvents will always
process all pending events.


= TODOs and ideas =

== Task dependencies AKA children ==

Every task can have a list of children / a child.

 * When a task is stopped, the children are started.
 * When a task is started, the children are stopped.

This should be easy to implement.

== Per priority time-sorted queues ==

This would result in an O(1) scheduler. It was used in the Linux kernel for
some time (search Ingo Molnar's O(1) scheduler). This can be a scheduling
optimization, which would prevent walking a longer event list. But probably the
management overhead would be too large, as we have many one-shot events.

To find the next task the scheduler just walks the (constant) list of priority
queues and schedules the first ready event of any queue.

The downside of this approach: Insert / Start / Reschedule (for "auto" tasks)
now need O(log(n)) to find the position in the queue of the priority.

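A rough sketch of the idea, to make the trade-off concrete. Nothing here
exists in VCL: PriorityQueues, NUM_PRIORITIES and the use of std::multimap as
the time-sorted queue are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

constexpr std::size_t NUM_PRIORITIES = 3; // hypothetical; 0 is the highest

class PriorityQueues
{
    // One queue per priority, each ordered by due time.
    std::array<std::multimap<std::uint64_t, std::string>, NUM_PRIORITIES> m_queues;

public:
    // O(log(n)) insert into the queue of the given priority.
    void start(std::size_t nPriority, std::uint64_t nDueMs, std::string name)
    {
        m_queues.at(nPriority).emplace(nDueMs, std::move(name));
    }

    // Walks the constant list of queues and pops the first ready task.
    // Returns false if no task is ready at nNowMs.
    bool popNextReady(std::uint64_t nNowMs, std::string& rName)
    {
        for (auto& queue : m_queues)
        {
            auto first = queue.begin();
            if (first != queue.end() && first->first <= nNowMs)
            {
                rName = std::move(first->second);
                queue.erase(first);
                return true;
            }
        }
        return false;
    }
};
```

Only the head of each queue is inspected, so the lookup cost depends on the
number of priorities and not on the number of tasks.
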
== Always process all (higher priority) pending events ==

Currently Application::Reschedule() processes a single event or "all" events,
with "all" defined as "100 events" in most backends. This already is ignored
by the KDE backend, as Qt defines its QAbstractEventDispatcher::processEvents
as processing all pending events (there are ways to skip event classes, but no
easy way to process just a single event).

Since the Scheduler is always handled by the system message queue, there is
really no more reason to stop after 100 events to prevent LO Scheduler
starvation.

== Drop static inherited or composed Task objects ==

The sequence of destruction of static objects is not defined. So the
destruction of a static Task can not be guaranteed to happen before that of
the Scheduler. When dynamic unloading is involved, this becomes an even worse
problem. Without static Tasks we could drop the mbStatic workaround from the
Task class.

== Run the LO application in its own thread ==

This would probably get rid of most of the macOS and Windows implementation
details / workarounds, but is quite probably a large amount of work.

Instead of LO running in the main process / thread, we run it in a 2nd thread
and defer all GUI calls to the main thread. This way it'll hopefully not block
and can process system events.

That's just a theory - it definitely needs more analysis before even
attempting an implementation.

== Re-evaluate the macOS ImplNSAppPostEvent ==

Probably a solution comparable to the Windows backend's delayed PostMessage
workaround using a validation timestamp is better than the current peek,
remove, re-postEvent, which has to run in the main thread.

Originally I didn't evaluate, if the event is actually lost or just delayed.

== Drop nMaxEvents from Gtk+ based backends ==

gint last_priority = G_MAXINT;
bool bWasEvent = false;
bool bHasPending; // declared outside the loop, as the loop condition needs it
do {
    gint max_priority;
    g_main_context_acquire( NULL );
    bHasPending = g_main_context_prepare( NULL, &max_priority );
    g_main_context_release( NULL );
    if ( bHasPending )
    {
        if ( last_priority > max_priority )
        {
            bHasPending = g_main_context_iteration( NULL, bWait );
            bWasEvent = bWasEvent || bHasPending;
        }
        else
            bHasPending = false;
    }
}
while ( bHasPending );

The idea is to use g_main_context_prepare and keep the max_priority as an
indicator. We cannot prevent running newer lower priority events, but we can
prevent running new higher priority events, which should be sufficient for
most stuff.

This also touches user event processing, which currently runs as a high
priority idle in the event loop.

== Drop nMaxEvents from gen (X11) backend ==

A few layers of indirection make this code hard to follow. The SalXLib::Yield
and SalX11Display::Yield architecture makes it impossible to process just the
current events. This really needs a refactoring and rearchitecture step, which
will also affect the Gtk+ and KDE backend for the user event handling.

== Merge TaskStopwatch functionality into the Scheduler ==

This way it can more easily be used to profile Tasks, eventually to improve
LO's interactivity.