vcl/README.scheduler.md

   1 # VCL Scheduler
   2
   3 ## Introduction
   4
   5 The VCL scheduler handles LOs primary event queue. It is simple by design,
   6 currently just a single-linked list, processed in list-order by priority
   7 using round-robin for reoccurring tasks.
   8
   9 The scheduler has the following behaviour:
  10
  11 B.1. Tasks are scheduled just priority based
  12 B.2. Implicitly cooperative AKA non-preemptive
  13 B.3. It's not "fair" in any way (a consequence of B.2)
  14 B.4. Tasks are handled round-robin (per priority)
  15 B.5. Higher priorities have lower values
  16 B.6. A small set of priorities instead of an flexible value AKA int
  17
  18 There are some consequences due to this design.
  19
  20 C.1. Higher priority tasks starve lower priority tasks
  21      As long as a higher task is available, lower tasks are never run!
  22      See Anti-pattern.
  23
  24 C.2. Tasks should be split into sensible blocks
  25      If this can't really be done, process pending tasks by calling
  26      `Application::Reschedule()`. Or use a thread.
  27
  28 C.3. This is not an OS scheduler
  29      There is no real way to "fix" B.2. and B.3.
  30      If you need to do a preemptive task, use a thread!
  31      Otherwise make your task suspendable.
  32
  33
  34 ## Driving the scheduler AKA the system timer
  35
  36   1. There is just one system timer, which drives LO event loop
  37   2. The timer has to run in the main window thread
  38   3. The scheduler is run with the Solar mutex acquired
  39   4. The system timer is a single-shot timer
  40   5. The scheduler system event / message has a low system priority.
  41      All system events should have a higher priority.
  42
  43 Every time a task is started, the scheduler timer is adjusted. When the timer
  44 fires, it posts an event to the system message queue. If the next most
  45 important task is an Idle (AKA instant, 0ms timeout), the event is pushed to
  46 the back of the queue, so we don't starve system messages, otherwise to the
  47 front.
  48
  49 Every time the scheduler is invoked it searches for the next task to process,
  50 restarts the timer with the timeout for the next event and then invokes the
  51 task. After invoking the task and if the task is still active, it is pushed
  52 to the end of the queue and the timeout is eventually adjusted.
  53
  54
  55 ## Locking
  56
  57 The locking is quite primitive: all interaction with internal Scheduler
  58 structures are locked. This includes the `ImplSchedulerContext` and the
  59 `Task::mpSchedulerData`, which is actually a part of the scheduler.
  60 Before invoking the task, we have to release the lock, so others can
  61 Start new Tasks.
  62
  63 The Scheduler just processes its own Tasks in the main thread and needs
  64 the `SolarMutex` for it and for `DeInit` (tested by `DBG_TESTSOLARMUTEX`). All
  65 the other interaction just take the scheduler mutex or don't need locking
  66 at all.
  67
  68 There is a "workaround" for static Task objects, which would crash LO on
  69 destruction, because `Task::~Task` would try to de-register itself in the
  70 Scheduler, while the `SchedulerLock` would be long gone. OTOH this makes
  71 Task handling less error-prone, than doing "special" manual cleanup.
  72 There is also a "= TODOs and ideas =" to get rid if static Tasks.
  73
  74 Actually the scheduler mutex should never be locked when calling into
  75 non-scheduler code, so it was converted to a non-recursive
  76 `std::mutex`.
  77
  78
  79 ## Idle processing
  80
  81 Confusingly, there are 2 concepts that are called 'idle':
  82
  83 * Instant (zero timeout) tasks, represented e.g. by the Idle class. This is
  84 a misnomer, as these tasks are processed after returning to the main loop.
  85 This is not necessarily when LO is idle, in fact such tasks may be invoked
  86 while there is input in the OS event queue pending.
  87 (TODO: This case should be fixed by renaming.)
  88
  89 * Low priority tasks, represented by priorities `TaskPriority::HIGH_IDLE` and
  90 lower. In addition to being invoked only when there is no task with a higher
  91 priority, pending input in the OS event queue also takes precedence.
  92
  93
  94 ## Lifecycle / thread-safety of Scheduler-based objects
  95
  96 A scheduler object it thread-safe in the way, that it can be associated to
  97 any thread and any thread is free to call any functions on it. The owner must
  98 guarantee that the `Invoke()` function can be called, while the Scheduler object
  99 exists / is not disposed.
 100
 101
 102 ## Anti-pattern: Dependencies via (fine grained) priorities
 103
 104 "Idle 1" should run before "Idle 2", therefore give "Idle 1" a higher priority
 105 then "Idle 2". This just works correct for low frequency idles, but otherwise
 106 always breaks!
 107
 108 If you have some longer work - even if it can be split by into schedulable,
 109 smaller blocks - you normally don't want to schedule it with a non-default
 110 priority, as it starves all lower priority tasks. Even if a block was processed
 111 in "Idle 1", it is scheduled with the same (higher) priority again. Changing
 112 the "Idle" to a "Timer" also won't work, as this breaks the dependency.
 113
 114 What is needed is task based dependency handling, so if "Task 1" is done, it
 115 has to start "Task 2" and if "Task 1" is started again, it has to stop
 116 "Task 2". This currently has to be done by the implementor, but this feature
 117 can be added to the scheduler reasonably.
 118
 119
 120 ## Implementation details
 121
 122 ### General: event priority for `DoYield`
 123
 124 There are three types of events, with different priority:
 125
 126 1. LO user events
 127 2. System events
 128 3. LO Scheduler event
 129
 130 They should be processed according to the following code:
 131
 132 ```
 133 bool DoYield( bool bWait, bool bAllCurrent )
 134 {
 135     bool bWasEvent = ProcessUserEvents( bAllCurrent );
 136     if ( !bAllCurrent && bWasEvent )
 137         return true;
 138     bWasEvent = ProcessSystemEvents( bAllCurrent, &bWasSchedulerEvent ) || bWasEvent;
 139     if ( !bWasSchedulerEvent && IsSchedulerEvent() )
 140     {
 141         ProcessSchedulerEvent()
 142         bWasEvent = true;
 143     }
 144     if ( !bWasEvent && bWait )
 145     {
 146         WaitForSystemEvents();
 147         bWasEvent = true;
 148     }
 149     return bWasEvent;
 150 }
 151 ```
 152
 153 ### General: main thread deferral
 154
 155 In almost all VCL backends, we run main thread deferrals by disabling the
 156 `SolarMutex` using a boolean. In the case of the redirect, this makes
 157 `tryToAcquire` and `doAcquire` return `true` or `1`, while a release is ignored.
 158 Also the `IsCurrentThread()` mutex check function will act accordingly, so all
 159 the `DBG_TESTSOLARMUTEX` won't fail.
 160
 161 Since we just disable the locks when we start running the deferred code in the
 162 main thread, we won't let the main thread run into stuff, where it would
 163 normally wait for the SolarMutex.
 164
 165 Eventually this will move into the `SolarMutex`. KDE / Qt also does main
 166 thread redirects using `Qt::BlockingQueuedConnection`.
 167
 168 ### General: processing all current events for `DoYield`
 169
 170 This is easily implemented for all non-priority queue based implementations.
 171 Windows and macOS both have a timestamp attached to their events / messages,
 172 so simply get the current time and just process anything < timestamp.
 173 For the **KDE** backend this is already the default behaviour - single event
 174 processing isn't even supported. The headless backend accomplishes this by
 175 just processing a copy of the list of current events.
 176
 177 Problematic in this regard is the **Gtk+** backend. `g_main_context_iteration`
 178 dispatches "only those highest priority event sources". There is no real way
 179 to tell, when these became ready. I've added a workaround idea to the TODO
 180 list. FWIW: **Qt** runs just a single timer source in the glib main context,
 181 basically the same we're doing with the LO scheduler as a system event.
 182
 183 The gen **X11** backend has some levels of redirection, but needs quite some work
 184 to get this fixed.
 185
 186 ### General: non-main thread yield
 187
 188 Yielding from a non-main thread must not wait in the main thread, as this
 189 may block the main thread until some events happen.
 190
 191 Currently we wait on an extra conditional, which is cleared by the main event
 192 loop.
 193
 194 ### General: invalidation of elapsed timer event messages
 195
 196 Since the system timer to run the scheduler is single-shot, there should never
 197 be more than one elapsed timer event in system event queue. When stopping or
 198 restarting the timer, we eventually have to remove the now invalid event from
 199 the queue.
 200
 201 But for the Windows and macOS backends this may fail as they have delayed
 202 posting of events, so a consecutive remove after a post will actually yield no
 203 remove. On Windows we even get unwanted processing of events outside of the
 204 main event loop, which may call the Scheduler, as timer management is handled
 205 in critical scheduler code.
 206
 207 To prevent these problems, we don't even try to remove these events, but
 208 invalidate them by versioning the timer events. Timer events with invalid
 209 versions are processed but simply don't run the scheduler.
 210
 211 ### General: track time of long running tasks
 212
 213 There is `TaskStopwatch` class. It'll track the time and report a timeout either
 214 when the tasks time slice is finished or some system event did occur.
 215
 216 Eventually it will be merged into the main scheduler, so each invoked task can
 217 easily track it's runtime and eventually this can be used to "blame" / find
 218 other long running tasks, so interactivity can be improved.
 219
 220 There were some questions coming up when implementing it:
 221
 222 #### Why does the scheduler not detect that we only have idle tasks pending,
 223 and skip the instant timeout?
 224
 225 You never know how long a task will run. Currently the scheduler simply asks
 226 each task when it'll be ready to run, until two runnable tasks are found.
 227 Normally this is very quick, as LO has a lot of one-shot instant tasks / Idles
 228 and just a very few long term pending Timers.
 229
 230 Especially UNO calls add a lot of Idles to the task list, which just need to
 231 be processed in order.
 232
 233 #### Why not use things like Linux timer wheels?
 234
 235 LO has relatively few timers and a lot one-shot Idles. 99% of time the search
 236 for the next task is quick, because there are just ~5 long term timers per
 237 document (cache invalidation, cursor blinking etc.).
 238
 239 This might become a problem, if you have a lot of open documents, so the long
 240 term timer list increases AKA for highly loaded LOOL instances.
 241
 242 But the Linux timer wheel mainly relies on the facts that the OS timers are
 243 expected to not expire, as they are use to catch "error" timeouts, which rarely
 244 happen, so this definitely not matches LO's usage.
 245
 246 #### Not really usable to find misbehaving tasks
 247
 248 The TaskStopwatch class is just a little time keeper + detecting of input
 249 events. This is not about misbehaving Tasks, but long running tasks, which
 250 have to yield to the Scheduler, so other Tasks and System events can be
 251 processed.
 252
 253 There is the TODO to merge the functionality into the Scheduler itself, at
 254 which point we can think about profiling individual Tasks to improve
 255 interactivity.
 256
 257 ### macOS implementation details
 258
 259 Generally the Scheduler is handled as expected, except on resize, which is
 260 handled with different runloop-modes in macOS. In case of a resize, the normal
 261 `runloop` is suspended in `sendEvent`, so we can't call the scheduler via posted
 262 main loop-events. Instead the scheduler uses the timer again.
 263
 264 Like the Windows backend, all Cocoa / GUI handling also has to be run in
 265 the main thread. We're emulating Windows out-of-order PeekMessage processing,
 266 via a `YieldWakeupEvent` and two conditionals. When in a `RUNINMAIN` call, all
 267 the `DBG_TESTSOLARMUTEX` calls are disabled, as we can't release the
 268 `SolarMutex`, but we can prevent running any other `SolarMutex` based code.
 269 Those wakeup events must be ignored to prevent busy-locks. For more info read
 270 the "General: main thread deferral" section.
 271
 272 We can neither rely on macOS dispatch_sync code block execution nor the
 273 message handling, as both can't be prioritized or filtered and the first
 274 does also not allow nested execution and is just processed in sequence.
 275
 276 There is also a workaround for a problem for pushing tasks to an empty queue,
 277 as [NSApp postEvent: ... atStart: NO] doesn't append the event, if the
 278 message queue is empty.
 279
 280 An additional problem is the filtering of events on Window close. This drops
 281 posted timer events, when a Window is closed resulting in a busy DoYield loop,
 282 so we have to re-post the event, after closing a window.
 283
 284 ### Windows implementation details
 285
 286 Posted or sent event messages often trigger processing of WndProc in
 287 `PeekMessage`, `GetMessage` or `DispatchMessage`, independently from the message
 288 to fetch, remove or dispatch ("During this call, the system delivers pending,
 289 nonqueued messages..."). Additionally messages have an inherited priority
 290 based on the function used to generate them. Even if `WM_TIMER` messages should
 291 have the lowest priority, a manually posted `WM_TIMER` is processed with the
 292 priority of a PostMessage message.
 293
 294 So we're giving up on processing all our Scheduler events as a message in the
 295 system message loop. Instead we just indicate a 0ms timer message by setting
 296 the `m_bDirectTimeout` in the timer object. This timer is always processed, if
 297 the system message wasn't already our timer. As a result we can also skip the
 298 polling. All this is one more reason to drop the single message processing
 299 in favour of always processing all pending (system) events.
 300
 301 There is another special case, we have to handle: window updates during move
 302 and resize of windows. These system actions run in their own nested message
 303 loop. So we have to completely switch to timers, even for 0 ms. But these
 304 posted events prevent any event processing, while we're busy. The only viable
 305 solution seems to be to switch to `WM_TIMER` based timers, as these generate
 306 messages with the lowest system priority (but they don't allow 0 ms timeouts).
 307 So processing slows down during resize and move, but we gain working painting,
 308 even when busy.
 309
 310 An additional workaround is implemented for the delayed queuing of posted
 311 messages, where `PeekMessage` in `WinSalTimer::Stop()` won't be able remove the
 312 just posted timer callback message. See "General: invalidation of elapsed
 313 timer event messages" for the details.
 314
 315 To run the required GUI code in the main thread without unlocking the
 316 `SolarMutex`, we "disable" it. For more infos read the "General: main thread
 317 deferral" section.
 318
 319 ### KDE implementation details
 320
 321 This implementation also works as intended. But there is a different Yield
 322 handling, because Qts `QAbstractEventDispatcher::processEvents` will always
 323 process all pending events.
 324
 325
 326 ## TODOs and ideas
 327
 328 ### Task dependencies AKA children
 329
 330 Every task can have a list of children / a child.
 331
 332  * When a task is stopped, the children are started.
 333  * When a task is started, the children are stopped.
 334
 335 This should be easy to implement.
 336
 337 ### Per priority time-sorted queues
 338
 339 This would result in O(1) scheduler. It was used in the Linux kernel for some
 340 time (search Ingo Molnar's O(1) scheduler). This can be a scheduling
 341 optimization, which would prevent walking longer event list. But probably the
 342 management overhead would be too large, as we have many one-shot events.
 343
 344 To find the next task the scheduler just walks the (constant) list of priority
 345 queues and schedules the first ready event of any queue.
 346
 347 The downside of this approach: Insert / Start / Reschedule(for "auto" tasks)
 348 now need O(log(n)) to find the position in the queue of the priority.
 349
 350 ### Always process all (higher priority) pending events
 351
 352 Currently `Application::Reschedule()` processes a single event or "all" events,
 353 with "all" defined as "100 events" in most backends. This already is ignored
 354 by the KDE backend, as Qt defines its `QAbstractEventDispatcher::processEvents`
 355 processing all pending events (there are ways to skip event classes, but no
 356 easy way to process just a single event).
 357
 358 Since the Scheduler is always handled by the system message queue, there is
 359 really no more reasoning to stop after 100 events to prevent LO Scheduler
 360 starvation.
 361
 362 ### Drop static inherited or composed Task objects
 363
 364 The sequence of destruction of static objects is not defined. So a static Task
 365 can not be guaranteed to happen before the Scheduler. When dynamic unloading
 366 is involved, this becomes an even worse problem. This way we could drop the
 367 mbStatic workaround from the Task class.
 368
 369 ### Run the LO application in its own thread
 370
 371 This would probably get rid of most of the macOS and Windows implementation
 372 details / workarounds, but is quite probably a large amount of work.
 373
 374 Instead of LO running in the main process / thread, we run it in a 2nd thread
 375 and defer al GUI calls to the main thread. This way it'll hopefully not block
 376 and can process system events.
 377
 378 That's just a theory - it definitely needs more analysis before even attending
 379 an implementation.
 380
 381 ### Re-evaluate the macOS `ImplNSAppPostEvent`
 382
 383 Probably a solution comparable to the Windows backends delayed PostMessage
 384 workaround using a validation timestamp is better then the current peek,
 385 remove, re-postEvent, which has to run in the main thread.
 386
 387 Originally I didn't evaluate, if the event is actually lost or just delayed.
 388
 389 ### Drop `nMaxEvents` from Gtk+ based backends
 390
 391 ```
 392 gint last_priority = G_MAXINT;
 393 bool bWasEvent = false;
 394 do {
 395     gint max_priority;
 396     g_main_context_acquire( NULL );
 397     bool bHasPending = g_main_context_prepare( NULL, &max_priority );
 398     g_main_context_release( NULL );
 399     if ( bHasPending )
 400     {
 401         if ( last_priority > max_priority )
 402         {
 403             bHasPending = g_main_context_iteration( NULL, bWait );
 404             bWasEvent = bWasEvent || bHasPending;
 405         }
 406         else
 407             bHasPending = false;
 408     }
 409 }
 410 while ( bHasPending )
 411 ```
 412
 413 The idea is to use g_main_context_prepare and keep the `max_priority` as an
 414 indicator. We cannot prevent running newer lower events, but we can prevent
 415 running new higher events, which should be sufficient for most stuff.
 416
 417 This also touches user event processing, which currently runs as a high
 418 priority idle in the event loop.
 419
 420 ### Drop nMaxEvents from gen (X11) backend
 421
 422 A few layers of indirection make this code hard to follow. The `SalXLib::Yield`
 423 and `SalX11Display::Yield` architecture makes it impossible to process just the
 424 current events. This really needs a refactoring and rearchitecture step, which
 425 will also affect the Gtk+ and KDE backend for the user event handling.
 426
 427 ### Merge TaskStopwatch functionality into the Scheduler
 428
 429 This way it can be easier used to profile Tasks, eventually to improve LO's
 430 interactivity.