<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Queue Scheduler</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Queue Scheduler</h1>

<hr>

<h2> Overview </h2>

<p> The queue manager is by far the most complex part of the Postfix
mail system. It schedules delivery of new mail, retries failed
deliveries at specific times, and removes mail from the queue after
the last delivery attempt. There are two major classes of mechanisms
that control the operation of the queue manager. </p>

<p> Topics covered by this document: </p>

<ul>

<li> <a href="#concurrency"> Concurrency scheduling</a>, concerned
with the number of concurrent deliveries to a specific destination,
including decisions on when to suspend deliveries after persistent
failures.

<li> <a href="#jobs"> Preemptive scheduling</a>, concerned with
the selection of email messages and recipients for a given destination.

<li> <a href="#credits"> Credits</a>, something this document would not be
complete without.

</ul>
<!--

<p> Once started, the <a href="qmgr.8.html">qmgr(8)</a> process runs until "postfix reload"
or "postfix stop". As a persistent process, the queue manager has
to meet strict requirements with respect to code correctness and
robustness. Unlike non-persistent daemon processes, the queue manager
cannot benefit from Postfix's process rejuvenation mechanism that
limits the impact from resource leaks and other coding errors
(translation: replacing a process after a short time covers up bugs
before they can become a problem). </p>

-->
<h2> <a name="concurrency"> Concurrency scheduling </a> </h2>

<p> The following sections document the Postfix 2.5 concurrency
scheduler, after a discussion of the limitations of the existing
concurrency scheduler. This is followed by results of medium-concurrency
experiments, and a discussion of trade-offs between performance and
robustness. </p>

<p> The material is organized as follows: </p>

<ul>

<li> <a href="#concurrency_drawbacks"> Drawbacks of the existing
concurrency scheduler </a>

<li> <a href="#concurrency_summary_2_5"> Summary of the Postfix 2.5
concurrency feedback algorithm </a>

<li> <a href="#dead_summary_2_5"> Summary of the Postfix 2.5 "dead
destination" detection algorithm </a>

<li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix 2.5
concurrency scheduler </a>

<li> <a href="#concurrency_results"> Results for delivery to
concurrency-limited servers </a>

<li> <a href="#concurrency_discussion"> Discussion of concurrency-limited
server results </a>

<li> <a href="#concurrency_limitations"> Limitations of less-than-1
per-delivery feedback </a>

<li> <a href="#concurrency_config"> Concurrency configuration
parameters </a>

</ul>
<h3> <a name="concurrency_drawbacks"> Drawbacks of the existing
concurrency scheduler </a> </h3>

<p> From the start, Postfix has used a simple but robust algorithm
where the per-destination delivery concurrency is decremented by 1
after a delivery failed due to connection or handshake failure, and
incremented by 1 otherwise. Of course the concurrency is never
allowed to exceed the maximum per-destination concurrency limit.
And when a destination's concurrency level drops to zero, the
destination is declared "dead" and delivery is suspended. </p>

<p> Drawbacks of +/-1 concurrency feedback per delivery are: </p>

<ul>

<li> <p> Overshoot due to exponential delivery concurrency growth
with each pseudo-cohort(*). This can be an issue with high-concurrency
channels. For example, with the default initial concurrency of 5,
concurrency would proceed over time as (5-10-20). </p>

<li> <p> Throttling down to zero concurrency after a single
pseudo-cohort(*) failure. This was especially an issue with
low-concurrency channels, where a single failure could be sufficient
to mark a destination as "dead", causing the suspension of further
deliveries to the affected destination. </p>

</ul>

<p> (*) A pseudo-cohort is a number of delivery requests equal to
a destination's delivery concurrency. </p>

<p> The revised concurrency scheduler has a highly modular structure.
It uses separate mechanisms for per-destination concurrency control
and for "dead destination" detection. The concurrency control in
turn is built from two separate mechanisms: it supports less-than-1
feedback per delivery to allow for more gradual concurrency
adjustments, and it uses feedback hysteresis to suppress concurrency
oscillations. And instead of waiting for the delivery concurrency to
throttle down to zero, a destination is declared "dead" after a
configurable number of pseudo-cohorts reports connection or handshake
failure. </p>
<h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5
concurrency feedback algorithm </a> </h3>

<p> We want to increment a destination's delivery concurrency when
some (not necessarily consecutive) number of deliveries complete
without connection or handshake failure. This is implemented with
positive feedback g(N), where N is the destination's delivery
concurrency. With g(N)=1 feedback per delivery, concurrency increases
by 1 after each positive feedback event; this gives us the old
scheduler's exponential growth in time. With g(N)=1/N feedback per
delivery, concurrency increases by 1 after an entire pseudo-cohort
N of positive feedback reports; this gives us linear growth in time.
Less-than-1 feedback per delivery and integer truncation naturally
give us hysteresis, so that transitions to larger concurrency happen
every 1/g(N) positive feedback events. </p>

<p> We want to decrement a destination's delivery concurrency when
some (not necessarily consecutive) number of deliveries complete
after connection or handshake failure. This is implemented with
negative feedback f(N), where N is the destination's delivery
concurrency. With f(N)=1 feedback per delivery, concurrency decreases
by 1 after each negative feedback event; this gives us the old
scheduler's behavior where concurrency is throttled down dramatically
after a single pseudo-cohort failure. With f(N)=1/N feedback per
delivery, concurrency backs off more gently. Again, less-than-1
feedback per delivery and integer truncation naturally give us
hysteresis, so that transitions to lower concurrency happen every
1/f(N) negative feedback events. </p>

<p> However, with negative feedback we introduce a subtle twist.
We "reverse" the negative hysteresis cycle so that the transition
to lower concurrency happens at the <b>beginning</b> of a sequence
of 1/f(N) negative feedback events. Otherwise, a correction for
overload would be made too late. This makes the choice of f(N)
relatively unimportant, as borne out by measurements later in this
document. </p>

<p> In summary, the main ingredients for the Postfix 2.5 concurrency
feedback algorithm are a) the option of less-than-1 positive feedback
per delivery to avoid overwhelming servers, b) the option of
less-than-1 negative feedback per delivery to avoid giving up too
fast, c) feedback hysteresis to avoid rapid oscillation, and d) a
"reverse" hysteresis cycle for negative feedback, so that it can
correct for overload quickly. </p>

<h3> <a name="dead_summary_2_5"> Summary of the Postfix 2.5 "dead destination" detection algorithm </a> </h3>

<p> We want to suspend deliveries to a specific destination after
some number of deliveries suffers connection or handshake failure.
The old scheduler declares a destination "dead" when negative (-1)
feedback throttles the delivery concurrency down to zero. With
less-than-1 feedback per delivery, this throttling down would
obviously take too long. We therefore have to separate "dead
destination" detection from concurrency feedback. This is implemented
by introducing the concept of pseudo-cohort failure. The Postfix
2.5 concurrency scheduler declares a destination "dead" after a
configurable number of pseudo-cohorts suffers from connection or
handshake failures. The old scheduler corresponds to the special
case where the pseudo-cohort failure limit is equal to 1. </p>
<h3> <a name="pseudo_code_2_5"> Pseudocode for the Postfix 2.5 concurrency scheduler </a> </h3>

<p> The pseudocode below shows how the ideas behind the new concurrency
scheduler are implemented as of November 2007. The actual code can
be found in the module qmgr/qmgr_queue.c. </p>
<pre>
Types:
    Each destination has one set of the following variables
    int concurrency
    double success
    double failure
    double fail_cohorts

Feedback functions:
    N is concurrency; x, y are arbitrary numbers in [0..1] inclusive
    positive feedback: g(N) = x/N | x/sqrt(N) | x
    negative feedback: f(N) = y/N | y/sqrt(N) | y

Initialization:
    concurrency = initial_concurrency
    success = 0
    failure = 0
    fail_cohorts = 0

After success:
    fail_cohorts = 0
    Be prepared for feedback &gt; hysteresis, or rounding error
    success += g(concurrency)
    while (success &gt;= 1)          Hysteresis 1
        concurrency += 1          Hysteresis 1
        failure = 0
        success -= 1              Hysteresis 1
    Be prepared for overshoot
    if (concurrency &gt; concurrency limit)
        concurrency = concurrency limit

Safety:
    Don't apply positive feedback unless
        concurrency &lt; busy_refcount + init_dest_concurrency
    otherwise negative feedback effect could be delayed

After failure:
    if (concurrency &gt; 0)
        fail_cohorts += 1.0 / concurrency
        if (fail_cohorts &gt; cohort_failure_limit)
            concurrency = 0
    if (concurrency &gt; 0)
        Be prepared for feedback &gt; hysteresis, rounding errors
        failure -= f(concurrency)
        while (failure &lt; 0)
            concurrency -= 1      Hysteresis 1
            failure += 1          Hysteresis 1
            success = 0
        Be prepared for overshoot
        if (concurrency &lt; 1)
            concurrency = 1
</pre>
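
<p> For readers who prefer compilable code, here is a minimal,
self-contained C sketch of the same update rules. The names and the
stand-alone structure are illustrative, not the actual qmgr
identifiers, and the "Safety" guard above is omitted for brevity. </p>

<blockquote>
<pre>
/*
 * Illustrative C rendition of the pseudocode above; not the real
 * qmgr/qmgr_queue.c code. Feedback functions g(N) = f(N) = 1/N.
 */
typedef struct {
    int    concurrency;
    double success;
    double failure;
    double fail_cohorts;
} DEST;

static double pos_feedback(int n) { return 1.0 / n; }   /* g(N) */
static double neg_feedback(int n) { return 1.0 / n; }   /* f(N) */

static void after_success(DEST *d, int concurrency_limit)
{
    d-&gt;fail_cohorts = 0;
    d-&gt;success += pos_feedback(d-&gt;concurrency);
    while (d-&gt;success &gt;= 1) {                   /* hysteresis 1 */
        d-&gt;concurrency += 1;
        d-&gt;failure = 0;
        d-&gt;success -= 1;
    }
    if (d-&gt;concurrency &gt; concurrency_limit)     /* overshoot */
        d-&gt;concurrency = concurrency_limit;
}

static void after_failure(DEST *d, double cohort_failure_limit)
{
    if (d-&gt;concurrency &gt; 0) {
        d-&gt;fail_cohorts += 1.0 / d-&gt;concurrency;
        if (d-&gt;fail_cohorts &gt; cohort_failure_limit)
            d-&gt;concurrency = 0;                 /* destination "dead" */
    }
    if (d-&gt;concurrency &gt; 0) {
        d-&gt;failure -= neg_feedback(d-&gt;concurrency);
        while (d-&gt;failure &lt; 0) {                /* reverse hysteresis */
            d-&gt;concurrency -= 1;
            d-&gt;failure += 1;
            d-&gt;success = 0;
        }
        if (d-&gt;concurrency &lt; 1)                 /* overshoot */
            d-&gt;concurrency = 1;
    }
}
</pre>
</blockquote>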
<h3> <a name="concurrency_results"> Results for delivery to concurrency-limited servers </a> </h3>

<p> Discussions about the concurrency scheduler redesign started in
early 2004, when the primary goal was to find alternatives that did
not exhibit exponential growth or rapid concurrency throttling. No
code was implemented until late 2007, when the primary concern had
shifted towards better handling of server concurrency limits. For
this reason we measure how well the new scheduler does this
job. The table below compares the mail delivery performance of the old
+/-1 feedback per delivery with several less-than-1 feedback
functions, for different limited-concurrency server scenarios.
Measurements were done with a FreeBSD 6.2 client and with FreeBSD
6.2 and various Linux servers. </p>
<p> Server configuration: </p>

<ul>

<li> The mail flow was slowed down with 1 second latency per
recipient ("<a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> = sleep 1"). The purpose was
to make results less dependent on hardware details, by avoiding
slow-downs by queue file I/O, logging I/O, and network I/O.

<li> Concurrency was limited by the server process limit
("<a href="postconf.5.html#default_process_limit">default_process_limit</a> = 5" and "<a href="postconf.5.html#smtpd_client_event_limit_exceptions">smtpd_client_event_limit_exceptions</a>
= static:all"). Postfix was stopped and started after changing the
process limit, because the same number is also used as the backlog
argument to the listen(2) system call, and "postfix reload" does
not re-issue this call.

<li> Mail was discarded with "<a href="postconf.5.html#local_recipient_maps">local_recipient_maps</a> = static:all" and
"<a href="postconf.5.html#local_transport">local_transport</a> = discard". The discard action in access maps or
header/body checks
could not be used as it fails to update the <a href="postconf.5.html#in_flow_delay">in_flow_delay</a> counters.

</ul>
<p> Client configuration (summarized in the hypothetical main.cf
fragment after this list): </p>

<ul>

<li> Queue file overhead was minimized by sending one message to a
virtual alias that expanded into 2000 different remote recipients.
All recipients were accounted for according to the maillog file.
The <a href="postconf.5.html#virtual_alias_expansion_limit">virtual_alias_expansion_limit</a> setting was increased to avoid
complaints from the <a href="cleanup.8.html">cleanup(8)</a> server.

<li> The number of deliveries was maximized with
"<a href="postconf.5.html#smtp_destination_recipient_limit">smtp_destination_recipient_limit</a> = 2". A smaller limit would cause
Postfix to schedule the concurrency per recipient instead of per domain,
which is not what we want.

<li> Maximum concurrency was limited with
"<a href="postconf.5.html#smtp_destination_concurrency_limit">smtp_destination_concurrency_limit</a> = 20", and
<a href="postconf.5.html#initial_destination_concurrency">initial_destination_concurrency</a> was set to the same value.

<li> The positive and negative concurrency feedback hysteresis was
1. Concurrency was incremented by 1 at the END of 1/feedback steps
of positive feedback, and was decremented by 1 at the START of
1/feedback steps of negative feedback.

<li> The SMTP client used the default 30s SMTP connect timeout and
300s SMTP greeting timeout.

</ul>
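
<p> The client-side settings above consolidate into the following
main.cf fragment. This is a sketch for orientation only; the exact
virtual_alias_expansion_limit value used is not given in the text, so
the number below merely has to cover the 2000 recipients of the test
message. </p>

<blockquote>
<pre>
# Hypothetical client-side test configuration (see the list above).
smtp_destination_recipient_limit = 2
smtp_destination_concurrency_limit = 20
initial_destination_concurrency = 20
# Must at least cover the 2000-recipient virtual alias expansion.
virtual_alias_expansion_limit = 2000
</pre>
</blockquote>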
<h4> Impact of the 30s SMTP connect timeout </h4>

<p> The first results are for a FreeBSD 6.2 server, where our
artificially low listen(2) backlog results in a very short kernel
queue for established connections. The table shows that all deferred
deliveries failed due to a 30s connection timeout, and none failed
due to a server greeting timeout. This measurement simulates what
happens when the server's connection queue is completely full under
load, and the TCP engine drops new connections. </p>

<blockquote>

<table>

<tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
style</th> <th>connection<br> caching</th> <th>percentage<br>
deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
<th colspan="2">timed-out in<br> connect/greeting </th> </tr>

<tr> <td align="center" colspan="9"> <hr> </td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/N</td> <td align="center">no</td>
<td align="center">9.9</td> <td align="center">19.4</td>
<td align="center">0.49</td> <td align="center">198</td>
<td align="center">-</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/N</td> <td align="center">yes</td>
<td align="center">10.3</td> <td align="center">19.4</td>
<td align="center">0.49</td> <td align="center">206</td>
<td align="center">-</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/sqrt(N)</td> <td align="center">no</td>
<td align="center">10.4</td> <td align="center">19.6</td>
<td align="center">0.59</td> <td align="center">208</td>
<td align="center">-</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/sqrt(N)</td> <td align="center">yes</td>
<td align="center">10.6</td> <td align="center">19.6</td>
<td align="center">0.61</td> <td align="center">212</td>
<td align="center">-</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1</td> <td align="center">no</td>
<td align="center">10.1</td> <td align="center">19.5</td>
<td align="center">1.29</td> <td align="center">202</td>
<td align="center">-</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1</td> <td align="center">yes</td>
<td align="center">10.8</td> <td align="center">19.3</td>
<td align="center">1.57</td> <td align="center">216</td>
<td align="center">-</td> </tr>

<tr> <td align="center" colspan="9"> <hr> </td> </tr>

</table>

<p> A busy server with a completely full connection queue. N is
the client delivery concurrency. Failed deliveries time out after
30s without completing the TCP handshake. See text for a discussion
of results. </p>

</blockquote>

<h4> Impact of the 300s SMTP greeting timeout </h4>

<p> The next table shows results for a Fedora Core 8 server (results
for RedHat 7.3 are identical). In this case, the artificially small
listen(2) backlog argument does not impact our measurement. The
table shows that practically all deferred deliveries fail after the
300s SMTP greeting timeout. As these timeouts were 10x longer than
with the first measurement, we increased the recipient count (and
thus the running time) by a factor of 10 to keep the results
comparable. The deferred mail percentages are a factor of 10 lower
than with the first measurement, because the 1s per-recipient delay
was 1/300th of the greeting timeout instead of 1/30th of the
connection timeout. </p>

<blockquote>

<table>

<tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
style</th> <th>connection<br> caching</th> <th>percentage<br>
deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
<th colspan="2">timed-out in<br> connect/greeting </th> </tr>

<tr> <td align="center" colspan="9"> <hr> </td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/N</td> <td align="center">no</td>
<td align="center">1.16</td> <td align="center">19.8</td>
<td align="center">0.37</td> <td align="center">-</td>
<td align="center">230</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/N</td> <td align="center">yes</td>
<td align="center">1.36</td> <td align="center">19.8</td>
<td align="center">0.36</td> <td align="center">-</td>
<td align="center">272</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/sqrt(N)</td> <td align="center">no</td>
<td align="center">1.21</td> <td align="center">19.9</td>
<td align="center">0.23</td> <td align="center">4</td>
<td align="center">238</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/sqrt(N)</td> <td align="center">yes</td>
<td align="center">1.36</td> <td align="center">20.0</td>
<td align="center">0.23</td> <td align="center">-</td>
<td align="center">272</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1</td> <td align="center">no</td>
<td align="center">1.18</td> <td align="center">20.0</td>
<td align="center">0.16</td> <td align="center">-</td>
<td align="center">236</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1</td> <td align="center">yes</td>
<td align="center">1.39</td> <td align="center">20.0</td>
<td align="center">0.16</td> <td align="center">-</td>
<td align="center">278</td> </tr>

<tr> <td align="center" colspan="9"> <hr> </td> </tr>

</table>

<p> A busy server with a non-full connection queue. N is the client
delivery concurrency. Failed deliveries complete at the TCP level,
but time out after 300s while waiting for the SMTP greeting. See
text for a discussion of results. </p>

</blockquote>
<h4> Impact of an active server concurrency limiter </h4>

<p> The final concurrency-limited result shows what happens when
SMTP connections don't time out, but are rejected immediately with
the Postfix server's <a href="postconf.5.html#smtpd_client_connection_count_limit">smtpd_client_connection_count_limit</a> feature
(the server replies with a 421 status and disconnects immediately).
Similar results can be expected with concurrency-limiting features
built into other MTAs or firewalls. For this measurement we specified
a server concurrency limit and a client initial destination concurrency
of 5, and a server process limit of 10; all other conditions were
the same as with the first measurement. The same result would be
obtained with a FreeBSD or Linux server, because the "pushing back"
is done entirely by the receiving side. </p>

<blockquote>

<table>

<tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
style</th> <th>connection<br> caching</th> <th>percentage<br>
deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
<th>theoretical<br>defer rate</th> </tr>

<tr> <td align="center" colspan="9"> <hr> </td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/N</td> <td align="center">no</td>
<td align="center">16.5</td> <td align="center">5.17</td>
<td align="center">0.38</td> <td align="center">1/6</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/N</td> <td align="center">yes</td>
<td align="center">16.5</td> <td align="center">5.17</td>
<td align="center">0.38</td> <td align="center">1/6</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/sqrt(N)</td> <td align="center">no</td>
<td align="center">24.5</td> <td align="center">5.28</td>
<td align="center">0.45</td> <td align="center">1/4</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1/sqrt(N)</td> <td align="center">yes</td>
<td align="center">24.3</td> <td align="center">5.28</td>
<td align="center">0.46</td> <td align="center">1/4</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1</td> <td align="center">no</td>
<td align="center">49.7</td> <td align="center">5.63</td>
<td align="center">0.67</td> <td align="center">1/2</td> </tr>

<tr> <td align="center">20</td> <td align="center">5</td>
<td align="center">1</td> <td align="center">yes</td>
<td align="center">49.7</td> <td align="center">5.68</td>
<td align="center">0.70</td> <td align="center">1/2</td> </tr>

<tr> <td align="center" colspan="9"> <hr> </td> </tr>

</table>
<p> A server with an active per-client concurrency limiter that replies
with 421 and disconnects. N is the client delivery concurrency.
The theoretical defer rate is 1/(1+roundup(1/feedback)). This is
always 1/2 with the fixed +/-1 feedback per delivery; with the
concurrency-dependent feedback variants, the defer rate decreases
with increasing concurrency. For example, with 1/N feedback at the
server limit N=5, roundup(1/feedback) is 5 and the theoretical defer
rate is 1/6, matching the first two rows. See text for a discussion
of results. </p>

</blockquote>
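
<p> For illustration, the defer rate formula from the table caption
can be written as a small C helper (hypothetical, not part of
Postfix): </p>

<blockquote>
<pre>
#include &lt;math.h&gt;

/*
 * Theoretical defer rate 1/(1 + roundup(1/feedback)) for a server
 * with an active concurrency limiter. E.g. feedback = 1.0/5 (1/N at
 * N=5) gives 1/6, feedback = 1/sqrt(5) gives 1/4, and feedback = 1
 * gives the old scheduler's 1/2.
 */
double theoretical_defer_rate(double feedback)
{
    return 1.0 / (1.0 + ceil(1.0 / feedback));
}
</pre>
</blockquote>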
<h3> <a name="concurrency_discussion"> Discussion of concurrency-limited server results </a> </h3>

<p> All results in the previous sections are based on the first
delivery runs only; they do not include any second or later delivery
attempts. It's also worth noting that the measurements look at
steady-state behavior only. They don't show what happens when the
client starts sending at a much higher or lower concurrency.
</p>

<p> The first two examples show that the effect of feedback
is negligible when concurrency is limited due to congestion. This
is because the initial concurrency is already at the client's
concurrency maximum, and because there is 10-100 times more positive
than negative feedback. Under these conditions, it is no surprise
that the contribution from SMTP connection caching is also negligible.
</p>

<p> In the last example, the old +/-1 feedback per delivery will
defer 50% of the mail when confronted with an active (anvil-style)
server concurrency limit, where the server hangs up immediately
with a 421 status (a TCP-level RST would have the same result).
Less aggressive feedback mechanisms fare better than more aggressive
ones. Concurrency-dependent feedback fares even better at higher
concurrencies than shown here, but has limitations as discussed in
the next section. </p>

<h3> <a name="concurrency_limitations"> Limitations of less-than-1 per-delivery feedback </a> </h3>

<p> Less-than-1 feedback is of interest primarily when sending large
amounts of mail to destinations with active concurrency limiters
(servers that reply with 421, or firewalls that send RST). When
sending small amounts of mail per destination, less-than-1 per-delivery
feedback won't have a noticeable effect on the per-destination
concurrency, because the number of deliveries to the same destination
is too small. You might just as well use zero per-delivery feedback
and stay with the initial per-destination concurrency. And when
mail deliveries fail due to congestion instead of active concurrency
limiters, the measurements above show that per-delivery feedback
has no effect. With large amounts of mail you might just as well
use zero per-delivery feedback and start with the maximal per-destination
concurrency. </p>

<p> The scheduler with less-than-1 concurrency
feedback per delivery solves a problem with servers that have active
concurrency limiters. This works only because feedback is handled
in a peculiar manner: positive feedback will increment the concurrency
by 1 at the <b>end</b> of a sequence of events of length 1/feedback,
while negative feedback will decrement concurrency by 1 at the
<b>beginning</b> of such a sequence. This is how Postfix adjusts
quickly for overshoot without causing lots of mail to be deferred.
Without this difference in feedback treatment, less-than-1 feedback
per delivery would defer 50% of the mail, and would be no better
in this respect than the old +/-1 feedback per delivery. </p>
<p> Unfortunately, the same feature that corrects quickly for
concurrency overshoot also makes the scheduler more sensitive to
noisy negative feedback. The reason is that one lonely negative
feedback event has the same effect as a complete sequence of length
1/feedback: in both cases delivery concurrency is dropped by 1
immediately. As a worst-case scenario, consider multiple servers
behind a load balancer on a single IP address, and no backup MX
address. When 1 out of K servers fails to complete the SMTP handshake
or drops the connection, a scheduler with 1/N (N = concurrency)
feedback stops increasing its concurrency once it reaches a concurrency
level of about K, even though the good servers behind the load
balancer are perfectly capable of handling more traffic. </p>

<p> This noise problem gets worse as the amount of positive feedback
per delivery gets smaller. A compromise is to use fixed less-than-1
positive feedback values instead of concurrency-dependent positive
feedback. For example, to tolerate 1 of 4 bad servers in the above
load balancer scenario, use positive feedback of 1/4 per "good"
delivery (no connect or handshake error), and use an equal or smaller
amount of negative feedback per "bad" delivery. The downside of
using concurrency-independent feedback is that some of the old +/-1
feedback problems will return at large concurrencies. Sites that
must deliver mail at non-trivial per-destination concurrencies will
require special configuration. </p>

<h3> <a name="concurrency_config"> Concurrency configuration parameters </a> </h3>

<p> The Postfix 2.5 concurrency scheduler is controlled with the
following configuration parameters, where "<i>transport</i>_foo"
provides a transport-specific parameter override. All parameter
default settings are compatible with earlier Postfix versions. </p>
<blockquote>

<table border="0">

<tr> <th> Parameter name </th> <th> Postfix version </th> <th>
Description </th> </tr>

<tr> <td colspan="3"> <hr> </td> </tr>

<tr> <td> <a href="postconf.5.html#initial_destination_concurrency">initial_destination_concurrency</a><br>
<a href="postconf.5.html#transport_initial_destination_concurrency"><i>transport</i>_initial_destination_concurrency</a> </td> <td
align="center"> all<br> 2.5 </td> <td> Initial per-destination
delivery concurrency </td> </tr>

<tr> <td> <a href="postconf.5.html#default_destination_concurrency_limit">default_destination_concurrency_limit</a><br>
<a href="postconf.5.html#transport_destination_concurrency_limit"><i>transport</i>_destination_concurrency_limit</a> </td> <td align="center">
all<br> all </td> <td> Maximum per-destination delivery concurrency
</td> </tr>

<tr> <td> <a href="postconf.5.html#default_destination_concurrency_positive_feedback">default_destination_concurrency_positive_feedback</a><br>
<a href="postconf.5.html#transport_destination_concurrency_positive_feedback"><i>transport</i>_destination_concurrency_positive_feedback</a> </td>
<td align="center"> 2.5<br> 2.5 </td> <td> Per-destination positive
feedback amount, per delivery that does not fail with connection
or handshake failure </td> </tr>

<tr> <td> <a href="postconf.5.html#default_destination_concurrency_negative_feedback">default_destination_concurrency_negative_feedback</a><br>
<a href="postconf.5.html#transport_destination_concurrency_negative_feedback"><i>transport</i>_destination_concurrency_negative_feedback</a> </td>
<td align="center"> 2.5<br> 2.5 </td> <td> Per-destination negative
feedback amount, per delivery that fails with connection or handshake
failure </td> </tr>

<tr> <td> <a href="postconf.5.html#default_destination_concurrency_failed_cohort_limit">default_destination_concurrency_failed_cohort_limit</a><br>
<a href="postconf.5.html#transport_destination_concurrency_failed_cohort_limit"><i>transport</i>_destination_concurrency_failed_cohort_limit</a> </td>
<td align="center"> 2.5<br> 2.5 </td> <td> Number of failed
pseudo-cohorts after which a destination is declared "dead" and
delivery is suspended </td> </tr>

<tr> <td> <a href="postconf.5.html#destination_concurrency_feedback_debug">destination_concurrency_feedback_debug</a></td> <td align="center">
2.5 </td> <td> Enable verbose logging of concurrency scheduler
activity </td> </tr>

<tr> <td colspan="3"> <hr> </td> </tr>

</table>

</blockquote>
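
<p> As a hypothetical example only (the values are illustrative, not
recommendations), a transport-specific override for the smtp transport
could look like this in main.cf: </p>

<blockquote>
<pre>
# Illustrative example; values are not recommendations.
smtp_initial_destination_concurrency = 5
smtp_destination_concurrency_positive_feedback = 1/concurrency
smtp_destination_concurrency_negative_feedback = 1/concurrency
smtp_destination_concurrency_failed_cohort_limit = 10
</pre>
</blockquote>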
<h2> <a name="jobs"> Preemptive scheduling </a> </h2>

<p>
The following sections describe the new queue manager and its
preemptive scheduler algorithm. Note that the document was originally
written to describe the changes between the new queue manager (in
this text referred to as <tt>nqmgr</tt>, the name it was known by
before it became the default queue manager) and the old queue manager
(referred to as <tt>oqmgr</tt>). This is why it refers to <tt>oqmgr</tt>
every so often.
</p>

<p>
This document is divided into sections as follows:
</p>

<ul>

<li> <a href="#nqmgr_structures"> The structures used by
nqmgr </a>

<li> <a href="#nqmgr_pickup"> What happens when nqmgr picks
up the message </a> - how it is assigned to transports, jobs, peers,
entries

<li> <a href="#nqmgr_selection"> How the entry selection
works </a>

<li> <a href="#nqmgr_preemption"> How the preemption
works </a> - what messages may be preempted, and how and what messages
are chosen to preempt them

<li> <a href="#nqmgr_concurrency"> How destination concurrency
limits affect the scheduling algorithm </a>

<li> <a href="#nqmgr_memory"> Dealing with memory resource
limits </a>

</ul>

<h3> <a name="nqmgr_structures"> The structures used by
nqmgr </a> </h3>

<p>
Let's start by recapitulating the structures and terms used when
referring to the queue manager and how it operates. Many of these are
partially described elsewhere, but it is nice to have a coherent
overview in one place:
</p>
<ul>

<li> <p> Each message structure represents one mail message which
Postfix is to deliver. The message recipients specify to what
destinations the message is to be delivered and what transports are
going to be used for the delivery. </p>

<li> <p> Each recipient entry groups a batch of recipients of one
message which are all going to be delivered to the same destination.
</p>

<li> <p> Each transport structure groups everything that is going
to be delivered by delivery agents dedicated to that transport.
Each transport maintains a set of queues (describing the destinations
it shall talk to) and jobs (referencing the messages it shall
deliver). </p>

<li> <p> Each transport queue (not to be confused with the on-disk
<a href="QSHAPE_README.html#active_queue">active queue</a> or <a href="QSHAPE_README.html#incoming_queue">incoming queue</a>) groups everything that is going to be
delivered to a given destination (aka nexthop) by its transport. Each
queue belongs to one transport, so each destination may be referred
to by several queues, one for each transport. Each queue maintains
a list of all recipient entries (batches of message recipients)
which shall be delivered to the given destination (the todo list), and
a list of recipient entries already being delivered by the delivery
agents (the busy list). </p>

<li> <p> Each queue corresponds to multiple peer structures. Each
peer structure is like the queue structure, belonging to one transport
and referencing one destination. The difference is that it lists
only the recipient entries which all originate from the same message,
unlike the queue structure, whose entries may originate from various
messages. For messages with few recipients, there is usually just
one recipient entry for each destination, resulting in one recipient
entry per peer. But for large mailing-list messages the recipients
may need to be split into multiple recipient entries, in which case
the peer structure may list many entries for a single destination.
</p>

<li> <p> Each transport job groups everything it takes to deliver
one message via its transport. Each job represents one message
within the context of the transport. The job belongs to one transport
and one message, so each message may have multiple jobs, one for each
transport. The job groups all the peer structures, which describe
the destinations the job's message has to be delivered to. </p>

</ul>
<p>
The first four structures are common to both <tt>nqmgr</tt> and
<tt>oqmgr</tt>; the latter two were introduced by <tt>nqmgr</tt>.
</p>

<p>
These terms are used extensively in the text below; feel free to
look up the description above any time you feel you have lost
track of what is what.
</p>
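
<p>
To make the relations concrete, here is an illustrative C sketch of
how these structures could reference each other. The field names are
simplified assumptions for this overview; the real definitions live
in the qmgr sources (qmgr/qmgr.h).
</p>

<blockquote>
<pre>
/* Simplified sketch of the structure relations described above. */
struct message;                  /* one mail message */
struct entry;                    /* one batch of a message's recipients,
                                    all for the same destination */

struct transport {               /* one delivery agent type */
    struct queue *queues;        /* per-destination queues */
    struct job   *job_list;      /* one job per in-core message */
};

struct queue {                   /* one (transport, destination) pair */
    struct transport *transport;
    struct entry *todo_list;     /* entries waiting to be delivered */
    struct entry *busy_list;     /* entries being delivered right now */
};

struct job {                     /* one message within one transport */
    struct transport *transport;
    struct message   *message;
    struct peer      *peers;     /* this message's destinations */
};

struct peer {                    /* one job's entries for one queue */
    struct job   *job;
    struct queue *queue;
    struct entry *entries;       /* entries from this message only */
};
</pre>
</blockquote>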
<h3> <a name="nqmgr_pickup"> What happens when nqmgr picks
up the message </a> </h3>

<p>
Whenever <tt>nqmgr</tt> moves a queue file into the <a href="QSHAPE_README.html#active_queue">active queue</a>,
the following happens: It reads all necessary information from the
queue file as <tt>oqmgr</tt> does, and also reads as many recipients
as possible - more on that later; for now let's just pretend it
always reads all recipients.
</p>

<p>
Then it resolves the recipients as <tt>oqmgr</tt> does, which
means obtaining an (address, nexthop, transport) triple for each
recipient. For each triple, it finds the transport; if it does not
exist yet, it instantiates it (unless it's dead). Within the
transport, it finds the destination queue for the given nexthop; if it
does not exist yet, it instantiates it (unless it's dead). The
triple is then bound to the given destination queue. This happens in
qmgr_resolve() and is basically the same as in <tt>oqmgr</tt>.
</p>

<p>
Then, for each triple which was bound to some queue (and thus
transport), the program finds the job which represents the message
within that transport's context; if it does not exist yet, it
instantiates it. Within the job, it finds the peer which represents
the bound destination queue within this job's context; if it does
not exist yet, it instantiates it. Finally, it stores the address
from the resolved triple to the recipient entry which is appended
to both the queue entry list and the peer entry list. The addresses
for the same nexthop are batched in the entries up to the recipient_concurrency
limit for that transport. This happens in qmgr_assign(), and apart
from the fact that it operates with job and peer structures it is basically the
same as in <tt>oqmgr</tt>.
</p>

<p>
When the job is instantiated, it is enqueued on the transport's job
list based on the time its message was picked up by <tt>nqmgr</tt>.
For the first batch of recipients this means it is appended to the end
of the job list, but the ordering of the job list by the enqueue
time is important, as we will see shortly.
</p>

<p>
[Now you should have a pretty good idea of the state of
<tt>nqmgr</tt> after a couple of messages have been picked up, and of the
relation between all those job, peer, queue, and entry structures.]
</p>
<h3> <a name="nqmgr_selection"> How the entry selection
works </a> </h3>

<p>
Having prepared all the structures described above, the task of
the <tt>nqmgr</tt>'s scheduler is to choose the recipient entries
one at a time and pass them to the delivery agent for the corresponding
transport. Now how does this work?
</p>

<p>
The first approximation of the new scheduling algorithm is like this:
</p>

<blockquote>
<pre>
foreach transport (round-robin-by-transport)
    if transport busy continue
    if transport process limit reached continue
    foreach transport's job (in the order of the transport's job list)
        foreach job's peer (round-robin-by-destination)
            if peer-&gt;queue-&gt;concurrency &lt; peer-&gt;queue-&gt;window
                return next peer entry.
        done
    done
done
</pre>
</blockquote>
<p>
Now what is the "order of the transport's job list"? As we know
already, the job list is by default kept in the order the message
was picked up by the <tt>nqmgr</tt>. So by default we get the
top-level round-robin transport, and within each transport we get
FIFO message delivery. The round-robin of the peers by the
destination is perhaps of little importance in most real-life cases
(unless the recipient_concurrency limit is reached, there is only
one peer structure for each destination within one job), but theoretically
it makes sure that even within single jobs, destinations are treated
fairly.
</p>

<p>
[By now you should have a feeling that you really know how the scheduler
works, except for the preemption, under ideal conditions - that is,
no recipient resource limits and no destination concurrency problems.]
</p>
<h3> <a name="nqmgr_preemption"> How the preemption
works </a> </h3>

<p>
As you might perhaps expect by now, the transport's job list does
not remain sorted by the job's message enqueue time all the time.
The coolest thing about <tt>nqmgr</tt> is not the simple FIFO
delivery, but that it is able to slip mail with few recipients
past mailing-list bulk mail. This is what the job preemption
is about - shuffling the jobs on the transport's job list to get
the best message delivery rates. Now how is it achieved?
</p>

<p>
First I have to tell you that there are in fact two job lists in
each transport. One is the scheduler's job list, which the scheduler
is free to play with, while the other one keeps the jobs always
listed in the order of the enqueue time and is used for recipient
pool management, which we will discuss later. For now, we will deal with
the scheduler's job list only.
</p>

<p>
So, we have the job list, which is first ordered by the time the
jobs' messages were enqueued, oldest messages first, the most recently
picked one at the end. For now, let's assume that there are no
destination concurrency problems. Without preemption, we pick some
entry of the first (oldest) job on the queue, assign it to a delivery
agent, pick another one from the same job, assign it again, and so
on, until all the entries are used and the job is delivered. We
would then move on to the next job, and so on and on. Now how do we
manage to sneak in some entries from the recently added jobs when
the first job on the job list belongs to a message going to a
mailing list and has thousands of recipient entries?
</p>

<p>
The <tt>nqmgr</tt>'s answer is that we can artificially "inflate"
the delivery time of that first job by some constant for free - it
is basically the same trick you might remember as "accumulation of
potential" from amortized complexity lessons. For example,
instead of delivering the entries of the first job on the job list
every time a delivery agent becomes available, we can do it only
every second time. If you view the moments the delivery agent becomes
available on a timeline as "delivery slots", then instead of using
every delivery slot for the first job, we can use only every other
slot, and the overall delivery efficiency of the first job
still remains the same. So the delivery <tt>11112222</tt> becomes
<tt>1.1.1.1.2.2.2.2</tt> (1 and 2 are the imaginary job numbers, .
denotes a free slot). Now what do we do with the free slots?
</p>
<p>
As you might have guessed, we will use them for sneaking in the mail
with few recipients. For example, if we have one four-recipient
mail followed by four one-recipient mails, the delivery sequence
(that is, the sequence in which the jobs are assigned to the
delivery slots) might look like this: <tt>12131415</tt>. Hmm, fine
for sneaking in the single-recipient mail, but how do we sneak in
mail with more than one recipient? Say we have one four-recipient
mail followed by two two-recipient mails?
</p>

<p>
The simple answer would be to use the delivery sequence <tt>12121313</tt>.
But the problem is that this does not scale well. Imagine you have
a mail with a thousand recipients followed by a mail with a hundred
recipients.
It is tempting to suggest a delivery sequence like <tt>121212....</tt>,
but alas! Imagine that another mail arrives with, say, ten recipients.
But there are no free slots anymore, so it can't slip by, not even
if it had just one recipient. It will be stuck until the
hundred-recipient mail is delivered, which really sucks.
</p>

<p>
So, it becomes obvious that while inflating the message to get
free slots is a great idea, one has to be really careful about how the
free slots are assigned, otherwise one might corner oneself. So,
how does <tt>nqmgr</tt> really use the free slots?
</p>

<p>
The key idea is that one does not have to generate the free slots
in a uniform way. The delivery sequence <tt>111...1</tt> is no
worse than <tt>1.1.1.1</tt>; in fact, it is even better, as some
entries are in the first case selected earlier than in the second
case, and none is selected later! So it is possible to first
"accumulate" the free delivery slots and then use them all at once.
It is even possible to accumulate some, then use them, then accumulate
some more and use them again, as in <tt>11..1.1</tt> .
</p>

<p>
Let's get back to the one-hundred-recipient example. We now know
that we could first accumulate one hundred free slots, and only
then preempt the first job and sneak in the one-hundred-recipient
mail. Applying the algorithm recursively, we see the
hundred-recipient job can accumulate ten free delivery slots, and
then we could preempt it and sneak in the ten-recipient mail...
Wait wait wait! Could we? Aren't we overinflating the original
one-thousand-recipient mail?
</p>
<p>
Well, although it looks so at first glance, another trick will
allow us to answer "no, we are not!". If we had said that we will
inflate the delivery time at most twice, and then we consider
every other slot as a free slot, then we would overinflate in case
of recursive preemption. BUT! The trick is that if we use only
every n-th slot as a free slot for n&gt;2, there is always some worst
inflation factor which we can guarantee not to be breached, even
if we apply the algorithm recursively. To be precise, if for every
k&gt;1 normally used slots we accumulate one free delivery slot, then
the inflation factor is not worse than k/(k-1), no matter how many
recursive preemptions happen. And it's not worse than (k+1)/k if
only non-recursive preemption happens. For example, with the default
slot cost of k=5 described below, the worst-case inflation is 5/4
with recursive preemption and 6/5 without it. Now, having got through the
theory and the related math, let's see how <tt>nqmgr</tt> implements
this.
</p>
<p>
Each job has a so-called "available delivery slot" counter. Each
transport has a <a href="postconf.5.html#transport_delivery_slot_cost"><i>transport</i>_delivery_slot_cost</a> parameter, which
defaults to the <a href="postconf.5.html#default_delivery_slot_cost">default_delivery_slot_cost</a> parameter, which is set to 5
by default. This is the k from the paragraph above. Each time k
entries of the job are selected for delivery, this counter is
incremented by one. Once some slots are accumulated, a job which
requires no more than that number of slots to be fully delivered
can preempt this job.
</p>

<p>
[Well, the truth is, the counter is incremented every time an entry
is selected, and it is divided by k when it is used. Or more
precisely, there is no division; the other side of the equation is
multiplied by k. But for understanding, the above approximation
of the truth is good enough.]
</p>
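
<p>
The following C sketch illustrates this bookkeeping in its simple,
per-k-selections form. It is an assumed simplification for this text;
the real accounting in the qmgr sources (qmgr/qmgr_job.c) works with
the multiplied form described in the note above.
</p>

<blockquote>
<pre>
/* Illustrative delivery slot bookkeeping; not the actual qmgr code. */
typedef struct {
    int selected_entries;        /* entries selected for delivery so far */
    int entries_to_deliver;      /* recipient entries not yet selected */
    int slots_available;         /* accumulated free delivery slots */
} JOB;

/* Called whenever an entry of this job is selected for delivery. */
void job_entry_selected(JOB *job, int slot_cost /* the "k" above */)
{
    job-&gt;entries_to_deliver -= 1;
    job-&gt;selected_entries += 1;
    if (job-&gt;selected_entries % slot_cost == 0)
        job-&gt;slots_available += 1;    /* one free slot per k selections */
}

/* May "candidate" preempt "current" with the slots saved so far? */
int job_can_preempt(const JOB *current, const JOB *candidate)
{
    return candidate-&gt;entries_to_deliver &lt;= current-&gt;slots_available;
}
</pre>
</blockquote>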
<p>
OK, so now we know the conditions which must be satisfied for one
job to preempt another one. But which job gets preempted, how do
we choose the job that preempts it if there are several valid candidates,
and when exactly does all this happen?
</p>

<p>
The answer to the first part is simple. The job whose entry was
selected the last time is the so-called current job. Normally, it is
the first job on the scheduler's job list, but destination concurrency
limits may change this, as we will see later. It is always only the
current job which may get preempted.
</p>

<p>
Now for the second part. The current job has a certain number of
recipient entries, and as such may accumulate at most some number
of available delivery slots. It might have already accumulated some,
and perhaps even already used some when it was preempted before
(remember, a job can be preempted several times). In either case,
we know how many are accumulated and how many are left to deliver,
so we know how many it may yet accumulate at most. Every other
job which may be delivered with fewer than that number of slots is a
valid candidate for preemption. How do we choose among them?
</p>
<p>
The answer is: the one with the maximum enqueue_time/recipient_entry_count.
That is, the older the job is, the more we should try to deliver
it in order to get the best message delivery rates. These rates are of
course subject to how many recipients the message has, therefore
the division by the recipient (entry) count. No one shall be surprised
that a message with n recipients takes n times longer to deliver than
a message with one recipient.
</p>
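
<p>
In C, the scoring could look like the sketch below. This is an
assumed reading of the rule above, treating enqueue_time as the
job's age in the queue; the actual candidate choice is made by
qmgr_choose_candidate().
</p>

<blockquote>
<pre>
#include &lt;time.h&gt;

/* Illustrative preemption scoring; not the actual qmgr code. */
typedef struct {
    time_t enqueue_time;         /* when the job's message was picked up */
    int    entry_count;          /* number of recipient entries */
} JOB_INFO;

/* Higher score = better preemption candidate (older, fewer entries). */
double preemption_score(const JOB_INFO *job, time_t now)
{
    return difftime(now, job-&gt;enqueue_time) / job-&gt;entry_count;
}
</pre>
</blockquote>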
<p>
Now let's recap the previous two paragraphs. Isn't it too complicated?
Why aren't the candidates chosen only from the jobs which can be
delivered within the number of slots the current job has already
accumulated? Why do we need to estimate how much it has yet to
accumulate? If you found out the answer, congratulate yourself. If
we did it this simple way, we would always choose the candidate
with the fewest recipient entries. If there were enough single-recipient
mails coming in, they would always slip by the bulk mail as soon
as possible, and mail with two or more recipients would never get
a chance, no matter how long it had been sitting around in the
job list.
</p>

<p>
This candidate selection has an interesting implication - when
we choose the best candidate for preemption (this is done in
qmgr_choose_candidate()), it may happen that we cannot use it for
preemption immediately. This leads to an answer to the last part
of the original question - when does the preemption happen?
</p>

<p>
The preemption attempt happens every time the transport's next recipient
entry is to be chosen for delivery. To avoid needless overhead, the
preemption is not attempted if the current job could never accumulate
more than <a href="postconf.5.html#transport_minimum_delivery_slots"><i>transport</i>_minimum_delivery_slots</a> (defaults to
<a href="postconf.5.html#default_minimum_delivery_slots">default_minimum_delivery_slots</a>, which defaults to 3). If there are
already enough accumulated slots to preempt the current job by the
chosen best candidate, it is done immediately. This basically means
that the candidate is moved in front of the current job on the
scheduler's job list and that the accumulated slot counter is decreased
by the amount used by the candidate. If there are not enough slots...
well, I could say that nothing happens and another preemption
is attempted the next time. But that's not the complete truth.
</p>
<p>
The truth is that it turns out that it is not really necessary to
wait until the job's counter accumulates all the delivery slots in
advance. Say we have a ten-recipient mail followed by two two-recipient
mails. If preemption happened only when enough delivery slots accumulate
(assuming slot cost 2), the delivery sequence becomes
<tt>11112211113311</tt>. Now what would we get if we waited for
only 50% of the necessary slots to accumulate and promised to
wait for the remaining 50% later, after we get back
to the preempted job? If we use such a slot loan, the delivery sequence
becomes <tt>11221111331111</tt>. As we can see, this makes the
delivery of the ten-recipient mail not considerably worse, but
it allows the small messages to be delivered sooner.
</p>

<p>
The concept of these slot loans is where the
<a href="postconf.5.html#transport_delivery_slot_discount"><i>transport</i>_delivery_slot_discount</a> and
<a href="postconf.5.html#transport_delivery_slot_loan"><i>transport</i>_delivery_slot_loan</a> parameters come from (they default to
<a href="postconf.5.html#default_delivery_slot_discount">default_delivery_slot_discount</a> and <a href="postconf.5.html#default_delivery_slot_loan">default_delivery_slot_loan</a>, whose
values are by default 50 and 3, respectively). The discount (resp.
loan) specifies how many percent (resp. how many slots) one "gets
in advance" when the number of slots required to deliver the best
candidate is compared with the number of slots the current job has
accumulated so far.
</p>
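
<p>
One plausible reading of that comparison, with the discount and loan
applied, is sketched below; this is an assumption based on the
description above, not the exact qmgr arithmetic.
</p>

<blockquote>
<pre>
/*
 * Illustrative preemption check with slot discount and loan applied;
 * an assumed simplification, not the actual qmgr code.
 */
int job_can_preempt_with_loan(int slots_needed, int slots_accumulated,
                              int discount_percent, int loan_slots)
{
    /* Grant "discount" percent of the needed slots, plus "loan"
     * slots, in advance when comparing against what the current
     * job has accumulated so far. */
    int effective_need = slots_needed
        - slots_needed * discount_percent / 100 - loan_slots;

    return slots_accumulated &gt;= effective_need;
}
</pre>
</blockquote>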
<p>
And that pretty much concludes this chapter.
</p>

<p>
[Now you should have a feeling that you pretty much understand the
scheduler and the preemption, or at least that you will have it
after you read the last chapter a couple more times. You shall clearly
see the job list and the preemption happening at its head, under ideal
delivery conditions. The feeling of understanding shall last until
you start wondering what happens if some of the jobs are blocked,
which you might eventually figure out correctly from what has been
said already. But I would be surprised if your mental image of the
scheduler's functionality is not completely shattered once you
start wondering how it works when not all recipients may be read
in-core. More on that later.]
</p>
<h3> <a name="nqmgr_concurrency"> How destination concurrency
limits affect the scheduling algorithm </a> </h3>

<p>
The <tt>nqmgr</tt> uses the same algorithm for destination concurrency
control as <tt>oqmgr</tt>. Now what happens when the destination
limits are reached and no more entries for that destination may be
selected by the scheduler?
</p>

<p>
From the user's point of view it is all simple. If some of the peers
of a job can't be selected, those peers are simply skipped by the
entry selection algorithm (the pseudocode described before) and
only the selectable ones are used. If none of the peers may be
selected, the job is declared a "blocker job". Blocker jobs are
skipped by the entry selection algorithm and they are also excluded
from the candidates for preemption of the current job. Thus the scheduler
effectively behaves as if the blocker jobs didn't exist on the job
list at all. As soon as at least one of the peers of a blocker job
becomes unblocked (that is, the delivery agent handling the delivery
of the recipient entry for the given destination successfully finishes),
the job's blocker status is removed and the job again participates
in all further scheduler actions normally.
</p>

<p>
So the summary is that users don't really have to be concerned
about the interaction of the destination limits and the scheduling
algorithm. It works well on its own, and there are no knobs they
would need to control it.
</p>
<p>
From a programmer's point of view, the blocker jobs complicate the
scheduler quite a lot. Without them, the jobs on the job list would
normally be delivered in strict FIFO order. If the current job is
preempted, the job preempting it is completely delivered unless it
is preempted itself. Without blockers, the current job is thus
always either the first job on the job list, or the top of the stack
of jobs preempting the first job on the job list.
</p>

<p>
The visualization of the job list and the preemption stack without
blockers would be like this:
</p>

<blockquote>
<pre>
first job on  -&gt;  1--2--3--5--6--8--...  &lt;- job list
job list          |
                  4  &lt;- preemption stack
                  |
current job   -&gt;  7
</pre>
</blockquote>

<p>
In the example above we see that job 1 was preempted by job 4, and
then job 4 was preempted by job 7. After job 7 is completed, the remaining
entries of job 4 are selected, and once they are all selected, job
1 continues.
</p>
<p>
As we see, it's all very clean and straightforward. Now how does
this change because of blockers?
</p>

<p>
The answer is: a lot. Any job may become a blocker job at any time,
and also become a normal job again at any time. This has several
important implications:
</p>

<ol>

<li> <p>
The jobs may be completed in arbitrary order. For example, in the
example above, if the current job 7 becomes blocked, the next job
4 may complete before job 7 becomes unblocked again. Or if both
7 and 4 are blocked, then 1 is completed, then 7 becomes unblocked
and is completed, then 2 is completed, and only after that 4 becomes
unblocked and is completed... You get the idea.
</p>
<p>

[Interesting side note: even when jobs are delivered out of order,
from a single destination's point of view the jobs are still delivered
in the expected order (that is, FIFO unless there was some preemption
involved). This is because whenever a destination queue becomes
unblocked (the destination limit allows selection of more recipient
entries for that destination), all jobs which have peers for that
destination are unblocked at once.]

</p>
<li> <p>

The idea of the preemption stack at the head of the job list is
gone. That is, it must be possible to preempt any job on the job
list. For example, if jobs 7, 4, 1 and 2 in the example above all
become blocked, job 3 becomes the current job. And of course we
do not want the preemption to be affected by whether or not there
are blocked jobs. Therefore, if it turns out that job 3 might be
preempted by job 6, the implementation shall make it possible.

</p>
<li> <p>

The idea of the linear preemption stack itself is gone. It is no
longer true that one job is always preempted by only one job at a
time (that is, directly preempted, not counting the recursively
nested jobs). For example, in the example above, job 1 is directly
preempted only by job 4, and job 4 only by job 7. Now assume job 7
becomes blocked while job 4 is being delivered. If job 4 accumulates
enough delivery slots, it is natural that it might be preempted,
for example, by job 8. Now job 4 is preempted by both job 7 AND
job 8 at the same time.

</p>

</ol>
<p>

Now combine points 2) and 3) with point 1) again and you realize
that the relations on the once linear job list have become pretty
complicated. If we extend the point 3) example: jobs 7 and 8 preempt
job 4, then job 8 becomes blocked too, then job 4 completes. Tricky,
huh?

</p>
<p>

If we illustrate the relations after the above-mentioned examples
(except those in point 1)), the situation looks like this:

</p>
<blockquote>
<pre>
                          v- parent

adoptive parent -&gt;  1--2--3--5--...  &lt;- "stack" level 0

parent gone -&gt;      ?     6          &lt;- "stack" level 1

children -&gt;        7 8    ^- child   &lt;- "stack" level 2

                   ^- siblings
</pre>
</blockquote>
<p>

Now how does <tt>nqmgr</tt> deal with all these complicated relations?

</p>
<p>

Well, it maintains them all as described, but fortunately all these
relations are necessary only for the purpose of proper counting of
available delivery slots. For the purpose of ordering the jobs for
entry selection, the original rule still applies: "the job preempting
the current job is moved in front of the current job on the job
list". So for entry selection purposes, the job relations remain
as simple as this:

</p>
<blockquote>
<pre>
7--8--1--2--6--3--5--...  &lt;- scheduler's job list order
</pre>
</blockquote>
<p>

The job list order and the preemption parent/child/sibling relations
are maintained separately. And because the selection works only
with the job list, you can happily forget about those complicated
relations unless you want to study the <tt>nqmgr</tt> sources (the
sketch below shows this separation in miniature). In that case the
text above might provide some helpful introduction to the problem
domain. Otherwise I suggest you just forget about all this and
stick with the user's point of view: the blocker jobs are simply
ignored.

</p>
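<p>

The following is a simplified model - hypothetical Python, not the
actual C structures of <tt>nqmgr</tt> - of that separation: entry
selection walks only the job list, while the parent/child links
exist solely for delivery slot accounting:

</p>

<blockquote>
<pre>
# Hypothetical sketch; names and structure are illustrative only.

class Job:
    def __init__(self, name):
        self.name = name
        self.parent = None     # the job this job directly preempted
        self.children = []     # jobs currently preempting this job
        self.blocker = False

job_list = []                  # entry selection order, nothing else

def preempt(current, preempting):
    # Ordering rule: the preempting job moves in front of the
    # current job on the job list.
    job_list.remove(preempting)
    job_list.insert(job_list.index(current), preempting)
    # Slot accounting: parent/child/sibling relations are kept
    # separately and never affect the selection order.
    preempting.parent = current
    current.children.append(preempting)
</pre>
</blockquote>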
<p>

[By now, you should have a feeling that there are more things going
on under the hood than you ever wanted to know. You decide that
forgetting about this chapter is the best you can do for the sake
of your mind's health and you basically stick with the idea of how
the scheduler works in ideal conditions, when there are no blockers,
which is good enough.]

</p>
<h3> <a name="nqmgr_memory"> Dealing with memory resource
limits </a> </h3>
<p>

When discussing the <tt>nqmgr</tt> scheduler, we have so far assumed
that all recipients of all messages in the <a href="QSHAPE_README.html#active_queue">active queue</a> are completely
read into memory. This is simply not true. There is an upper
bound on the amount of memory <tt>nqmgr</tt> may use, and
therefore it must impose some limits on the information it may store
in memory at any given time.

</p>
<p>

First of all, not all messages may be read in-core at once. At any
time, at most <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a> messages may be read in-core.
When read into memory, the messages are picked from the
<a href="QSHAPE_README.html#incoming_queue">incoming</a> and deferred message queues and moved to the <a href="QSHAPE_README.html#active_queue">active queue</a>
(incoming having priority), so if more than
<a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a> messages are queued, the
rest will have to wait until (some of) the messages in the active
queue are completely delivered (or deferred).

</p>
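<p>

A minimal sketch of this intake policy - hypothetical Python with
made-up names - might look as follows; the incoming queue is drained
before the deferred queue, and the active queue never exceeds the
limit:

</p>

<blockquote>
<pre>
# Hypothetical sketch of active queue intake; not Postfix code.

def fill_active_queue(incoming, deferred, active, active_limit):
    while len(active) &lt; active_limit:
        if incoming:
            active.append(incoming.pop(0))   # incoming has priority
        elif deferred:
            active.append(deferred.pop(0))
        else:
            break                            # nothing left to move
</pre>
</blockquote>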
<p>

Even with the limited number of in-core messages, there is another
limit which must be imposed in order to avoid memory exhaustion.
Each message may contain a huge number of recipients (tens or
hundreds of thousands are not uncommon), so if <tt>nqmgr</tt> were
to read all recipients of all messages in the <a href="QSHAPE_README.html#active_queue">active queue</a>, it might
easily run out of memory. Therefore there must be some upper bound
on the number of message recipients which are read into memory at
the same time.

</p>
<p>

Before discussing how exactly <tt>nqmgr</tt> implements the recipient
limits, let's see how the mere existence of those limits
affects <tt>nqmgr</tt> and its scheduler.

</p>
<p>

The message limit is straightforward - it just limits the size of
the lookahead the <tt>nqmgr</tt> scheduler has when choosing which
message can preempt the current one. Messages not in the active
queue are simply not considered at all.

</p>
<p>

The recipient limit complicates things more. First of all, the
message reading code must support reading the recipients in batches,
which among other things means accessing the queue file several
times and continuing where the last recipient batch ended. This is
invoked by the scheduler whenever the current job has space for more
recipients, subject to the transport's refill_limit and refill_delay
parameters. It is also done any time all in-core recipients of the
message have been dealt with (which may also mean they were deferred)
but there are still more in the queue file.

</p>
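<p>

A rough sketch of when such a refill could be triggered - again
hypothetical Python with made-up field names, and with the
refill_limit/refill_delay throttling reduced to a simple check -
would be:

</p>

<blockquote>
<pre>
# Hypothetical sketch of recipient batch refills; not Postfix code.

def maybe_refill(job, now):
    if job.unread_recipients == 0:
        return                          # whole queue file already read
    # Case 1: all in-core recipients were dealt with, but more
    # recipients remain in the queue file.
    if job.in_core_recipients == 0:
        read_next_recipient_batch(job)  # assumed helper
    # Case 2: the job has room for more recipients; throttle refills
    # per the transport's refill_limit/refill_delay settings.
    elif (job.recipient_slots &gt;= job.transport.refill_limit and
          now - job.last_refill &gt;= job.transport.refill_delay):
        read_next_recipient_batch(job)
        job.last_refill = now
</pre>
</blockquote>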
<p>

The second complication is that with some recipients left unread
in the queue file, the scheduler can't operate with exact counts
of recipient entries. With unread recipients, it is not clear how
many recipient entries there will be, as they are subject to
per-destination grouping. It is not even clear to what transports
(and thus jobs) the recipients will be assigned. And with messages
coming from the <a href="QSHAPE_README.html#deferred_queue">deferred queue</a>, it is not even clear how many unread
recipients are still to be delivered. All this means that the
scheduler must work with mere estimates of how many recipient entries
there will be. Fortunately, it is possible to estimate the minimum
and maximum correctly, so the scheduler can always err on the safe
side. Obviously, the better the estimates, the better the results,
so it is best when we are able to read all recipients in-core and
turn the estimates into exact counts, or at least read as many
as possible to make the estimates as accurate as possible.

</p>
<p>

The third complication is that it is no longer true that the scheduler
is done with a job once all of its in-core recipients are delivered.
The job may be revived later, when another batch of recipients is
read in core. It is also possible that some jobs will be created
for the first time long after the first batch of recipients was
read in core. The <tt>nqmgr</tt> code must be ready to handle all
such situations.

</p>
<p>

And finally, the fourth complication is that the <tt>nqmgr</tt>
code must somehow impose the recipient limit itself. Now how does
it achieve that?

</p>
<p>

Perhaps the easiest solution would be to say that each message may
have at most X recipients stored in-core, but such a solution would
be poor for several reasons. With reasonable <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>
values, X would have to be quite low to maintain a reasonable
memory footprint, and with a low X lots of things would not work well.
<tt>nqmgr</tt> would have problems using the
<a href="postconf.5.html#transport_destination_recipient_limit"><i>transport</i>_destination_recipient_limit</a> efficiently. The
scheduler's preemption would be suboptimal as the recipient count
estimates would be inaccurate. The message queue file would have
to be accessed many times to read in more recipients again and
again.

</p>
<p>

Therefore it seems reasonable to have a solution which does not
impose a limit on a per-message basis, but which maintains a pool
of available recipient slots that can be shared among all messages
in the most efficient manner. And as we do not want separate
transports to compete for resources whenever possible, it seems
appropriate to maintain such a recipient pool for each transport
separately. This is the general idea; now how does it work in
practice?

</p>
<p>

First we have to solve a little chicken-and-egg problem. If we want
to use the per-transport recipient pools, we first need to know to
what transport(s) the message is assigned. But we will find that
out only after we have read in some recipients. So it is obvious
that we first have to read in some recipients, use them to find out
to what transports the message is to be assigned, and only after
that can we use the per-transport recipient pools.

</p>
<p>

Now how many recipients shall we read the first time? This is
what the <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> and <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
values control. The <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> value specifies
how many recipients of each message we will read the first time,
no matter what. It is necessary to read at least one recipient
before we can assign the message to a transport and create the first
job. However, reading only <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> recipients
even when there are only a few messages with few recipients in-core
would be wasteful. Therefore, if there are fewer than <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
recipients in-core so far, the first batch of recipients may be
larger than <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> - as large as is required
to reach the <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a> limit.

</p>
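<p>

In other words, the first batch size could be computed like this
(a hypothetical Python sketch; the numbers in the comments assume
the documented defaults of 10 and 20000 for the two parameters):

</p>

<blockquote>
<pre>
# Hypothetical sketch of first-batch sizing; not Postfix code.

def first_batch_size(in_core_total, recipient_minimum, recipient_limit):
    # read at least the minimum, and opportunistically up to the
    # limit while the total of in-core recipients leaves headroom
    headroom = max(0, recipient_limit - in_core_total)
    return max(recipient_minimum, headroom)

# e.g. first_batch_size(19990, 10, 20000) == max(10, 10) == 10,
# while first_batch_size(0, 10, 20000) == 20000 (one big batch)
</pre>
</blockquote>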
<p>

Once the first batch of recipients has been read in core and the
message jobs have been created, the size of the subsequent recipient
batches (if any - of course it's best when all recipients are read
in one batch) is based solely on the position of the message's jobs
on their corresponding transports' job lists. Each transport has a
pool of <a href="postconf.5.html#transport_recipient_limit"><i>transport</i>_recipient_limit</a> recipient slots which it can
distribute among its jobs (how this is done is described later).
A subsequent recipient batch may be as large as the sum of all
recipient slots of all jobs of the message permits (plus the
<a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> amount which always applies).

</p>
<p>

For example, if a message has three jobs, the first with 1 recipient
still in-core and 4 recipient slots, the second with 5 recipients
in-core and 5 recipient slots, and the third with 2 recipients
in-core and 0 recipient slots, it has 1+5+2=7 recipients in-core
and 4+5+0=9 recipient slots in total. This means that we could
immediately read 2+<a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> more recipients of
that message in core.

</p>
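<p>

The arithmetic of this example, spelled out as a hypothetical Python
fragment (the job tuples and the minimum value are just the numbers
from the example above):

</p>

<blockquote>
<pre>
# (recipients_in_core, recipient_slots) for each job of the message
jobs = [(1, 4), (5, 5), (2, 0)]
recipient_minimum = 10     # qmgr_message_recipient_minimum (default)

in_core = sum(r for r, s in jobs)   # 1+5+2 = 7
slots = sum(s for r, s in jobs)     # 4+5+0 = 9
batch = (slots - in_core) + recipient_minimum   # 2 + minimum = 12
</pre>
</blockquote>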
<p>

The above example illustrates several things which might be worth
mentioning explicitly: first, note that although the per-transport
slots are assigned to particular jobs, we can't guarantee that once
the next batch of recipients is read in core, the corresponding
amounts of recipients will be assigned to those jobs. The jobs lend
their slots to the message as a whole, so it is possible that some
jobs end up sponsoring other jobs of their message. For example,
if in the example above the 2 newly read recipients were assigned
to the second job, the first job sponsored the second job with 2
slots. The second notable thing is the third job, which has more
recipients in-core than it has slots. Apart from the sponsoring by
another job we just saw, this can be the result of the first recipient
batch, which is sponsored from the global recipient pool of
<a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a> recipients. It can also be sponsored from
the message recipient pool of <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a>
recipients.

</p>
<p>

Now how does each transport distribute the recipient slots among
its jobs? The strategy is quite simple. As most scheduler activity
happens at the head of the job list, it is our intention to make
sure that the scheduler has the best estimates of the recipient
counts for those jobs. As we mentioned above, this means that we
want to try to make sure that the messages of those jobs have all
their recipients read in-core. Therefore the transport distributes
the slots "along" the job list from start to end. In this case the
job list sorted by message enqueue time is used, because it doesn't
change over time as the scheduler's job list does.

</p>
<p>

More specifically, each time a job is created and appended to the
job list, it gets all unused recipient slots from its transport's
pool. It keeps them until all recipients of its message are read.
When this happens, all unused recipient slots are transferred to
the next job on the job list which still has some recipients unread
(which is in fact now the first such job), or eventually back to
the transport pool if there is no such job. Such a transfer also
happens whenever a recipient entry of that job is delivered.

</p>
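<p>

Condensed into a hypothetical Python sketch (with made-up names),
the hand-off of a fully read job's unused slots looks roughly like
this:

</p>

<blockquote>
<pre>
# Hypothetical sketch of the slot hand-off; not Postfix code.

def release_unused_slots(job, transport):
    slots, job.recipient_slots = job.recipient_slots, 0
    # Walk the job list sorted by message enqueue time and give the
    # slots to the first job whose message still has unread recipients.
    for next_job in transport.jobs_by_enqueue_time:
        if next_job.unread_recipients &gt; 0:
            next_job.recipient_slots += slots
            return
    transport.slot_pool += slots    # no unread job left: back to pool
</pre>
</blockquote>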
<p>

There is also a scenario where a job is not appended to the end of
the job list (for example, it was created as a result of a second or
later recipient batch). Then it works exactly as above, except that
if it was put in front of the first unread job (that is, the job
of a message which still has some unread recipients in its queue file),
that job is first forced to return all of its unused recipient slots
to the transport pool.

</p>
<p>

The algorithm just described leads to the following state: the first
unread job on the job list always gets all the remaining recipient
slots of that transport (if there are any). The jobs queued before
this job are completely read (that is, all recipients of their
message were already read in core) and have at most as many slots
as they still have recipients in-core (the maximum is there because
of the sponsoring mentioned before), and the jobs after this job get
nothing from the transport recipient pool (unless they got something
before and the first unread job was then created and enqueued in
front of them later - in such a case they also get at most as many
slots as they have recipients in-core).

</p>
<p>

Things work fine in this state most of the time, because the
current job is either completely read in-core or has as many recipient
slots as are available, but there is one situation which we still have
to take special care of. Imagine that the current job is preempted
by some unread job from the job list and there are no more recipient
slots available, so this new current job could read only batches
of <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> recipients at a time. This would
really degrade performance. For this reason, each transport has an
extra pool of <a href="postconf.5.html#transport_extra_recipient_limit"><i>transport</i>_extra_recipient_limit</a> recipient
slots, dedicated exactly to this situation. Each time an unread
job preempts the current job, it gets half of the remaining recipient
slots from the normal pool and this extra pool.

</p>
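<p>

As a hypothetical Python sketch (the order in which the two pools
are drained is an assumption made for the illustration):

</p>

<blockquote>
<pre>
# Hypothetical sketch of the preemption top-up; not Postfix code.

def slots_for_preempting_unread_job(transport):
    # half of whatever remains in the normal and extra pools combined
    remaining = transport.slot_pool + transport.extra_slot_pool
    grant = remaining // 2
    from_normal = min(grant, transport.slot_pool)
    transport.slot_pool -= from_normal
    transport.extra_slot_pool -= grant - from_normal
    return grant
</pre>
</blockquote>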
<p>

And that's it. It sure does sound pretty complicated, but fortunately
most people don't really have to care how exactly it works as long
as it works. Perhaps the only important things to know for most
people are the following upper bound formulas:

</p>
<p>

Each transport has at maximum

</p>
<blockquote>
<pre>
max(
    <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> * <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>
        + *_recipient_limit + *_extra_recipient_limit,
    <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
)
</pre>
</blockquote>
<p>

recipients in core.

</p>
<p>

The total number of recipients in core is

</p>
<blockquote>
<pre>
max(
    <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> * <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>
        + sum( *_recipient_limit + *_extra_recipient_limit ),
    <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
)
</pre>
</blockquote>
<p>

where the sum is over all used transports.

</p>
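<p>

To get a feel for the numbers, here is the total bound evaluated in
a hypothetical Python fragment, assuming the documented defaults
(10 and 20000 for the qmgr_message_* parameters) and two transports
with made-up per-transport limits:

</p>

<blockquote>
<pre>
qmgr_message_recipient_minimum = 10
qmgr_message_active_limit = 20000
qmgr_message_recipient_limit = 20000

# (recipient_limit, extra_recipient_limit) per transport; the values
# below are invented for the example, not Postfix defaults
transports = {"smtp": (20000, 1000), "local": (20000, 1000)}

total_bound = max(
    qmgr_message_recipient_minimum * qmgr_message_active_limit
        + sum(r + e for r, e in transports.values()),
    qmgr_message_recipient_limit,
)                                   # = 242000 recipients in core
</pre>
</blockquote>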
<p>

And this terribly complicated chapter concludes the documentation
of the <tt>nqmgr</tt> scheduler.

</p>
<p>

[By now you should theoretically know the <tt>nqmgr</tt> scheduler
inside out. In practice, you still hope that you will never have
to really understand the last chapter or two completely, and
fortunately most people really won't. Understanding how the scheduler
works in ideal conditions is more than good enough for the vast
majority of users.]

</p>
<h2> <a name="credits"> Credits </a> </h2>

<ul>

<li> Wietse Venema designed and implemented the initial queue manager
with per-domain FIFO scheduling, and per-delivery +/-1 concurrency
feedback.

<li> Patrik Rak designed and implemented preemption where mail with
fewer recipients can slip past mail with more recipients in a
controlled manner, and wrote up its documentation.

<li> Wietse Venema initiated a discussion with Patrik Rak and Victor
Duchovni on alternatives for the +/-1 feedback scheduler's aggressive
behavior. This is when K/N feedback was reviewed (N = concurrency).
The discussion ended without a good solution for both negative
feedback and dead site detection.

<li> Victor Duchovni resumed work on concurrency feedback in the
context of concurrency-limited servers.

<li> Wietse Venema then re-designed the concurrency scheduler in
terms of the simplest possible concepts: less-than-1 concurrency
feedback per delivery, forward and reverse concurrency feedback
hysteresis, and pseudo-cohort failure. At this same time, concurrency
feedback was separated from dead site detection.

<li> These simplifications, and their modular implementation, helped
to develop further insights into the different roles that positive
and negative concurrency feedback play, and helped to identify some
worst-case scenarios.

</ul>
</body>

</html>