vcl/README.scheduler.md

   1 # VCL Scheduler
   2
   3 ## Introduction
   4
   5 The VCL scheduler handles LOs primary event queue. It is simple by design,
   6 currently just a single-linked list, processed in list-order by priority
   7 using round-robin for reoccurring tasks.
   8
   9 The scheduler has the following behaviour:
  10
  11 B.1. Tasks are scheduled just priority based
  12
  13 B.2. Implicitly cooperative AKA non-preemptive
  14
  15 B.3. It's not "fair" in any way (a consequence of B.2)
  16
  17 B.4. Tasks are handled round-robin (per priority)
  18
  19 B.5. Higher priorities have lower values
  20
  21 B.6. A small set of priorities instead of an flexible value AKA int
  22
  23 There are some consequences due to this design.
  24
  25 C.1. Higher priority tasks starve lower priority tasks
  26      As long as a higher task is available, lower tasks are never run!
  27      See Anti-pattern.
  28
  29 C.2. Tasks should be split into sensible blocks
  30      If this can't really be done, process pending tasks by calling
  31      `Application::Reschedule()`. Or use a thread.
  32
  33 C.3. This is not an OS scheduler
  34      There is no real way to "fix" B.2. and B.3.
  35      If you need to do a preemptive task, use a thread!
  36      Otherwise make your task suspendable.
  37
  38
  39 ## Driving the scheduler AKA the system timer
  40
  41   1. There is just one system timer, which drives LO event loop
  42   2. The timer has to run in the main window thread
  43   3. The scheduler is run with the Solar mutex acquired
  44   4. The system timer is a single-shot timer
  45   5. The scheduler system event / message has a low system priority.
  46      All system events should have a higher priority.
  47
  48 Every time a task is started, the scheduler timer is adjusted. When the timer
  49 fires, it posts an event to the system message queue. If the next most
  50 important task is an Idle (AKA instant, 0ms timeout), the event is pushed to
  51 the back of the queue, so we don't starve system messages, otherwise to the
  52 front.
  53
  54 Every time the scheduler is invoked it searches for the next task to process,
  55 restarts the timer with the timeout for the next event and then invokes the
  56 task. After invoking the task and if the task is still active, it is pushed
  57 to the end of the queue and the timeout is eventually adjusted.
  58
  59
  60 ## Locking
  61
  62 The locking is quite primitive: all interaction with internal Scheduler
  63 structures are locked. This includes the `ImplSchedulerContext` and the
  64 `Task::mpSchedulerData`, which is actually a part of the scheduler.
  65 Before invoking the task, we have to release the lock, so others can
  66 Start new Tasks.
  67
  68 The Scheduler just processes its own Tasks in the main thread and needs
  69 the `SolarMutex` for it and for `DeInit` (tested by `DBG_TESTSOLARMUTEX`). All
  70 the other interaction just take the scheduler mutex or don't need locking
  71 at all.
  72
  73 There is a "workaround" for static Task objects, which would crash LO on
  74 destruction, because `Task::~Task` would try to de-register itself in the
  75 Scheduler, while the `SchedulerLock` would be long gone. OTOH this makes
  76 Task handling less error-prone, than doing "special" manual cleanup.
  77 There is also a "= TODOs and ideas =" to get rid if static Tasks.
  78
  79 Actually the scheduler mutex should never be locked when calling into
  80 non-scheduler code, so it was converted to a non-recursive
  81 `std::mutex`.
  82
  83
  84 ## Idle processing
  85
  86 Confusingly, there are 2 concepts that are called 'idle':
  87
  88 * Instant (zero timeout) tasks, represented e.g. by the Idle class. This is
  89 a misnomer, as these tasks are processed after returning to the main loop.
  90 This is not necessarily when LO is idle, in fact such tasks may be invoked
  91 while there is input in the OS event queue pending.
  92 (TODO: This case should be fixed by renaming.)
  93
  94 * Low priority tasks, represented by priorities `TaskPriority::HIGH_IDLE` and
  95 lower. In addition to being invoked only when there is no task with a higher
  96 priority, pending input in the OS event queue also takes precedence.
  97
  98
  99 ## Lifecycle / thread-safety of Scheduler-based objects
 100
 101 A scheduler object it thread-safe in the way, that it can be associated to
 102 any thread and any thread is free to call any functions on it. The owner must
 103 guarantee that the `Invoke()` function can be called, while the Scheduler object
 104 exists / is not disposed.
 105
 106
 107 ## Anti-pattern: Dependencies via (fine grained) priorities
 108
 109 "Idle 1" should run before "Idle 2", therefore give "Idle 1" a higher priority
 110 then "Idle 2". This just works correct for low frequency idles, but otherwise
 111 always breaks!
 112
 113 If you have some longer work - even if it can be split by into schedulable,
 114 smaller blocks - you normally don't want to schedule it with a non-default
 115 priority, as it starves all lower priority tasks. Even if a block was processed
 116 in "Idle 1", it is scheduled with the same (higher) priority again. Changing
 117 the "Idle" to a "Timer" also won't work, as this breaks the dependency.
 118
 119 What is needed is task based dependency handling, so if "Task 1" is done, it
 120 has to start "Task 2" and if "Task 1" is started again, it has to stop
 121 "Task 2". This currently has to be done by the implementor, but this feature
 122 can be added to the scheduler reasonably.
 123
 124
 125 ## Implementation details
 126
 127 ### General: event priority for `DoYield`
 128
 129 There are three types of events, with different priority:
 130
 131 1. LO user events
 132 2. System events
 133 3. LO Scheduler event
 134
 135 They should be processed according to the following code:
 136
 137 ```
 138 bool ImplYield(bool bWait, bool bAllCurrent)
 139 {
 140     DBG_TESTSOLARMUTEX();
 141     assert(IsMainThread());
 142
 143     bool bWasEvent = ProcessUserEvents( bAllCurrent );
 144     if ( !bAllCurrent && bWasEvent )
 145         return true;
 146
 147     SolarMutexReleaser();
 148     bWasEvent = ProcessSystemEvents( bAllCurrent, &bWasSchedulerEvent ) || bWasEvent;
 149     if ( !bWasSchedulerEvent && IsSchedulerEvent() )
 150     {
 151         ProcessSchedulerEvent()
 152         bWasEvent = true;
 153     }
 154     if ( !bWasEvent && bWait )
 155     {
 156         WaitForSystemEvents();
 157         bWasEvent = true;
 158     }
 159     return bWasEvent;
 160 }
 161 ```
 162
 163 ### General: main thread deferral
 164
 165 In almost all VCL backends, we run main thread deferrals by disabling the
 166 `SolarMutex` using a boolean. In the case of the redirect, this makes
 167 `tryToAcquire` and `doAcquire` return `true` or `1`, while a release is ignored.
 168 Also the `IsCurrentThread()` mutex check function will act accordingly, so all
 169 the `DBG_TESTSOLARMUTEX` won't fail.
 170
 171 Since we just disable the locks when we start running the deferred code in the
 172 main thread, we won't let the main thread run into stuff, where it would
 173 normally wait for the SolarMutex.
 174
 175 Eventually this will move into the `SolarMutex`. KDE / Qt also does main
 176 thread redirects using `Qt::BlockingQueuedConnection`.
 177
 178 ### General: processing all current events for `DoYield`
 179
 180 This is easily implemented for all non-priority queue based implementations.
 181 Windows and macOS both have a timestamp attached to their events / messages,
 182 so simply get the current time and just process anything < timestamp.
 183 For the **KDE** backend this is already the default behaviour - single event
 184 processing isn't even supported. The headless backend accomplishes this by
 185 just processing a copy of the list of current events.
 186
 187 Problematic in this regard is the **Gtk+** backend. `g_main_context_iteration`
 188 dispatches "only those highest priority event sources". There is no real way
 189 to tell, when these became ready. I've added a workaround idea to the TODO
 190 list. FWIW: **Qt** runs just a single timer source in the glib main context,
 191 basically the same we're doing with the LO scheduler as a system event.
 192
 193 The gen **X11** backend has some levels of redirection, but needs quite some work
 194 to get this fixed.
 195
 196 ### General: non-main thread yield
 197
 198 Yielding from a non-main thread must not wait in the main thread, as this
 199 may block the main thread until some events happen.
 200
 201 Currently we wait on an extra conditional, which is cleared by the main event
 202 loop.
 203
 204 ### General: invalidation of elapsed timer event messages
 205
 206 Since the system timer to run the scheduler is single-shot, there should never
 207 be more than one elapsed timer event in system event queue. When stopping or
 208 restarting the timer, we eventually have to remove the now invalid event from
 209 the queue.
 210
 211 But for the Windows and macOS backends this may fail as they have delayed
 212 posting of events, so a consecutive remove after a post will actually yield no
 213 remove. On Windows we even get unwanted processing of events outside of the
 214 main event loop, which may call the Scheduler, as timer management is handled
 215 in critical scheduler code.
 216
 217 To prevent these problems, we don't even try to remove these events, but
 218 invalidate them by versioning the timer events. Timer events with invalid
 219 versions are processed but simply don't run the scheduler.
 220
 221 ### General: track time of long running tasks
 222
 223 There is `TaskStopwatch` class. It'll track the time and report a timeout either
 224 when the tasks time slice is finished or some system event did occur.
 225
 226 Eventually it will be merged into the main scheduler, so each invoked task can
 227 easily track it's runtime and eventually this can be used to "blame" / find
 228 other long running tasks, so interactivity can be improved.
 229
 230 There were some questions coming up when implementing it:
 231
 232 #### Why does the scheduler not detect that we only have idle tasks pending, and skip the instant timeout?
 233
 234 You never know how long a task will run. Currently the scheduler simply asks
 235 each task when it'll be ready to run, until two runnable tasks are found.
 236 Normally this is very quick, as LO has a lot of one-shot instant tasks / Idles
 237 and just a very few long term pending Timers.
 238
 239 Especially UNO calls add a lot of Idles to the task list, which just need to
 240 be processed in order.
 241
 242 #### Why not use things like Linux timer wheels?
 243
 244 LO has relatively few timers and a lot one-shot Idles. 99% of time the search
 245 for the next task is quick, because there are just ~5 long term timers per
 246 document (cache invalidation, cursor blinking etc.).
 247
 248 This might become a problem, if you have a lot of open documents, so the long
 249 term timer list increases AKA for highly loaded LOOL instances.
 250
 251 But the Linux timer wheel mainly relies on the facts that the OS timers are
 252 expected to not expire, as they are use to catch "error" timeouts, which rarely
 253 happen, so this definitely not matches LO's usage.
 254
 255 #### Not really usable to find misbehaving tasks
 256
 257 The TaskStopwatch class is just a little time keeper + detecting of input
 258 events. This is not about misbehaving Tasks, but long running tasks, which
 259 have to yield to the Scheduler, so other Tasks and System events can be
 260 processed.
 261
 262 There is the TODO to merge the functionality into the Scheduler itself, at
 263 which point we can think about profiling individual Tasks to improve
 264 interactivity.
 265
 266 ### macOS implementation details
 267
 268 Generally the Scheduler is handled as expected, except on resize, which is
 269 handled with different runloop-modes in macOS. In case of a resize, the normal
 270 `runloop` is suspended in `sendEvent`, so we can't call the scheduler via posted
 271 main loop-events. Instead the scheduler uses the timer again.
 272
 273 Like the Windows backend, all Cocoa / GUI handling also has to be run in
 274 the main thread. We're emulating Windows out-of-order PeekMessage processing,
 275 via a `YieldWakeupEvent` and two conditionals. When in a `RUNINMAIN` call, all
 276 the `DBG_TESTSOLARMUTEX` calls are disabled, as we can't release the
 277 `SolarMutex`, but we can prevent running any other `SolarMutex` based code.
 278 Those wakeup events must be ignored to prevent busy-locks. For more info read
 279 the "General: main thread deferral" section.
 280
 281 We can neither rely on macOS `dispatch_sync` code block execution nor the
 282 message handling, as both can't be prioritized or filtered and the first
 283 does also not allow nested execution and is just processed in sequence.
 284
 285 There is also a workaround for a problem for pushing tasks to an empty queue,
 286 as `[NSApp postEvent: ... atStart: NO]` doesn't append the event, if the
 287 message queue is empty.
 288
 289 An additional problem is the filtering of events on Window close. This drops
 290 posted timer events, when a Window is closed resulting in a busy DoYield loop,
 291 so we have to re-post the event, after closing a window.
 292
 293 ### Windows implementation details
 294
 295 Posted or sent event messages often trigger processing of WndProc in
 296 `PeekMessage`, `GetMessage` or `DispatchMessage`, independently from the message
 297 to fetch, remove or dispatch ("During this call, the system delivers pending,
 298 nonqueued messages..."). Additionally messages have an inherited priority
 299 based on the function used to generate them. Even if `WM_TIMER` messages should
 300 have the lowest priority, a manually posted `WM_TIMER` is processed with the
 301 priority of a PostMessage message.
 302
 303 So we're giving up on processing all our Scheduler events as a message in the
 304 system message loop. Instead we just indicate a 0ms timer message by setting
 305 the `m_bDirectTimeout` in the timer object. This timer is always processed, if
 306 the system message wasn't already our timer. As a result we can also skip the
 307 polling. All this is one more reason to drop the single message processing
 308 in favour of always processing all pending (system) events.
 309
 310 There is another special case, we have to handle: window updates during move
 311 and resize of windows. These system actions run in their own nested message
 312 loop. So we have to completely switch to timers, even for 0 ms. But these
 313 posted events prevent any event processing, while we're busy. The only viable
 314 solution seems to be to switch to `WM_TIMER` based timers, as these generate
 315 messages with the lowest system priority (but they don't allow 0 ms timeouts).
 316 So processing slows down during resize and move, but we gain working painting,
 317 even when busy.
 318
 319 An additional workaround is implemented for the delayed queuing of posted
 320 messages, where `PeekMessage` in `WinSalTimer::Stop()` won't be able remove the
 321 just posted timer callback message. See "General: invalidation of elapsed
 322 timer event messages" for the details.
 323
 324 To run the required GUI code in the main thread without unlocking the
 325 `SolarMutex`, we "disable" it. For more infos read the "General: main thread
 326 deferral" section.
 327
 328 ### KDE implementation details
 329
 330 This implementation also works as intended. But there is a different Yield
 331 handling, because Qts `QAbstractEventDispatcher::processEvents` will always
 332 process all pending events.
 333
 334
 335 ## TODOs and ideas
 336
 337 ### Task dependencies AKA children
 338
 339 Every task can have a list of children / a child.
 340
 341  * When a task is stopped, the children are started.
 342  * When a task is started, the children are stopped.
 343
 344 This should be easy to implement.
 345
 346 ### Per priority time-sorted queues
 347
 348 This would result in O(1) scheduler. It was used in the Linux kernel for some
 349 time (search Ingo Molnar's O(1) scheduler). This can be a scheduling
 350 optimization, which would prevent walking longer event list. But probably the
 351 management overhead would be too large, as we have many one-shot events.
 352
 353 To find the next task the scheduler just walks the (constant) list of priority
 354 queues and schedules the first ready event of any queue.
 355
 356 The downside of this approach: Insert / Start / Reschedule(for "auto" tasks)
 357 now need O(log(n)) to find the position in the queue of the priority.
 358
 359 ### Always process all (higher priority) pending events
 360
 361 Currently `Application::Reschedule()` processes a single event or "all" events,
 362 with "all" defined as "100 events" in most backends. This already is ignored
 363 by the KDE backend, as Qt defines its `QAbstractEventDispatcher::processEvents`
 364 processing all pending events (there are ways to skip event classes, but no
 365 easy way to process just a single event).
 366
 367 Since the Scheduler is always handled by the system message queue, there is
 368 really no more reasoning to stop after 100 events to prevent LO Scheduler
 369 starvation.
 370
 371 ### Drop static inherited or composed Task objects
 372
 373 The sequence of destruction of static objects is not defined. So a static Task
 374 can not be guaranteed to happen before the Scheduler. When dynamic unloading
 375 is involved, this becomes an even worse problem. This way we could drop the
 376 mbStatic workaround from the Task class.
 377
 378 ### Run the LO application in its own thread
 379
 380 This would probably get rid of most of the macOS and Windows implementation
 381 details / workarounds, but is quite probably a large amount of work.
 382
 383 Instead of LO running in the main process / thread, we run it in a 2nd thread
 384 and defer al GUI calls to the main thread. This way it'll hopefully not block
 385 and can process system events.
 386
 387 That's just a theory - it definitely needs more analysis before even attending
 388 an implementation.
 389
 390 ### Re-evaluate the macOS `ImplNSAppPostEvent`
 391
 392 Probably a solution comparable to the Windows backends delayed PostMessage
 393 workaround using a validation timestamp is better then the current peek,
 394 remove, re-postEvent, which has to run in the main thread.
 395
 396 Originally I didn't evaluate, if the event is actually lost or just delayed.
 397
 398 ### Drop `nMaxEvents` from Gtk+ based backends
 399
 400 ```
 401 gint last_priority = G_MAXINT;
 402 bool bWasEvent = false;
 403 do {
 404     gint max_priority;
 405     g_main_context_acquire( NULL );
 406     bool bHasPending = g_main_context_prepare( NULL, &max_priority );
 407     g_main_context_release( NULL );
 408     if ( bHasPending )
 409     {
 410         if ( last_priority > max_priority )
 411         {
 412             bHasPending = g_main_context_iteration( NULL, bWait );
 413             bWasEvent = bWasEvent || bHasPending;
 414         }
 415         else
 416             bHasPending = false;
 417     }
 418 }
 419 while ( bHasPending )
 420 ```
 421
 422 The idea is to use `g_main_context_prepare` and keep the `max_priority` as an
 423 indicator. We cannot prevent running newer lower events, but we can prevent
 424 running new higher events, which should be sufficient for most stuff.
 425
 426 This also touches user event processing, which currently runs as a high
 427 priority idle in the event loop.
 428
 429 ### Drop nMaxEvents from gen (X11) backend
 430
 431 A few layers of indirection make this code hard to follow. The `SalXLib::Yield`
 432 and `SalX11Display::Yield` architecture makes it impossible to process just the
 433 current events. This really needs a refactoring and rearchitecture step, which
 434 will also affect the Gtk+ and KDE backend for the user event handling.
 435
 436 ### Merge TaskStopwatch functionality into the Scheduler
 437
 438 This way it can be easier used to profile Tasks, eventually to improve LO's
 439 interactivity.
 440
 441 ## See Also
 442
 443 - [Solar Mutex](https://wiki.openoffice.org/wiki/Terms/Solar_Mutex)
 444 - [LibreOffice Main Loop](https://wiki.documentfoundation.org/Development/LHM_LiMux/Main_Loop)
 445 - [AOO Advanced Threading-Architecture (proposal)](https://wiki.openoffice.org/wiki/Architecture/Proposal/Advanced_Threading-Architecture)
 446 - [Revise OOo Multi-Threading Efforts](https://wiki.openoffice.org/wiki/Effort/Revise_OOo_Multi-Threading)
 447 - [Multi-Threading Analysis](https://wiki.openoffice.org/wiki/Analysis/Multi-Threading)
 448 - [AOO Wiki - Category:Multi-Threading](https://wiki.openoffice.org/wiki/Category:Multi-Threading)