1 <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
2 "http://www.w3.org/TR/html4/loose.dtd">
4 <html>
6 <head>
8 <title>Postfix Stress-Dependent Configuration</title>
10 <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
12 </head>
14 <body>
16 <h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
17 Stress-Dependent Configuration</h1>
19 <hr>
21 <h2>Overview </h2>
23 <p> This document describes the symptoms of Postfix SMTP server
24 overload. It presents permanent <a href="postconf.5.html">main.cf</a> changes to avoid overload
25 during normal operation, and temporary <a href="postconf.5.html">main.cf</a> changes to cope with
26 an unexpected burst of mail. This document makes specific suggestions
27 for Postfix 2.5 and later which support stress-adaptive behavior,
28 and for earlier Postfix versions that don't. </p>
30 <p> Topics covered in this document: </p>
32 <ul>
34 <li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>
36 <li><a href="#concurrency"> Service more SMTP clients at the same time </a>
38 <li><a href="#time"> Spend less time per SMTP client </a>
40 <li><a href="#hangup"> Disconnect suspicious SMTP clients </a>
42 <li><a href="#legacy"> Temporary measures for older Postfix releases </a>
44 <li><a href="#adapt"> Automatic stress-adaptive behavior </a>
46 <li><a href="#feature"> Detecting support for stress-adaptive behavior </a>
48 <li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>
50 <li><a href="#other"> Other measures to off-load zombies </a>
52 <li><a href="#credits"> Credits </a>
54 </ul>
56 <h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>
58 <p> Under normal conditions, the Postfix SMTP server responds
59 immediately when an SMTP client connects to it; the time to deliver
60 mail is noticeable only with large messages. Performance degrades
61 dramatically when the number of SMTP clients exceeds the number of
62 Postfix SMTP server processes. When an SMTP client connects while
63 all Postfix SMTP server processes are busy, the client must wait
64 until a server process becomes available. </p>
66 <p> SMTP server overload may be caused by a surge of legitimate
67 mail (example: a DNS registrar opens a new zone for registrations),
68 by mistake (mail explosion caused by a forwarding loop) or by malice
69 (worm outbreak, botnet, or other illegitimate activity). </p>
71 <p> Symptoms of Postfix SMTP server overload are: </p>
73 <ul>
75 <li> <p> Remote SMTP clients experience a long delay before Postfix
76 sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>
78 <ul>
80 <li> <p> NOTE: Broken DNS configurations can also cause lengthy
81 delays before Postfix sends "220 hostname.example.com ...". These
82 delays also exist when Postfix is NOT overloaded. </p>
84 <li> <p> NOTE: To avoid "overload" delays for end-user mail
85 clients, enable the "submission" service entry in <a href="master.5.html">master.cf</a> (present
86 since Postfix 2.1), and tell users to connect to this instead of
87 the public SMTP service. </p>
89 </ul>
91 <li> <p> The Postfix SMTP server logs an increased number of "lost
92 connection after CONNECT" events. This happens because remote SMTP
93 clients disconnect before Postfix answers the connection. </p>
95 <ul>
97 <li> <p> NOTE: A portscan for open SMTP ports can also result in
98 "lost connection ..." logfile messages. </p>
100 </ul>
102 <li> <p> Postfix 2.3 and later logs a warning that all server ports
103 are busy: </p>
105 <pre>
106 Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
107 (25) has reached its process limit "30": new clients may experience
108 noticeable delays
109 Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this
110 condition, increase the process count in <a href="master.5.html">master.cf</a> or reduce the
111 service time per client
112 </pre>
114 </ul>
116 <p> Legitimate mail that doesn't get through during an episode of
117 Postfix SMTP server overload is not necessarily lost. It should
118 still arrive once the situation returns to normal, as long as the
119 overload condition is temporary. </p>
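<p> The log-based symptoms above can be counted with a small script.
The following is a minimal sketch in Python, not part of Postfix. The
function name, the counter keys and the two "lost connection" sample
lines are illustrative assumptions; only the "process limit" warning
text is taken from the example above. </p>

```python
import re
from collections import Counter

# Substring logged by smtpd when a client hangs up before the greeting.
LOST_AFTER_CONNECT = "lost connection after CONNECT"

# Pattern for the master(8) warning shown in this document.
PROCESS_LIMIT = re.compile(
    r'service "([^"]+)" \(\d+\) has reached its process limit "(\d+)"'
)

def summarize(lines):
    """Count overload indicators in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if LOST_AFTER_CONNECT in line:
            counts["lost_after_connect"] += 1
        match = PROCESS_LIMIT.search(line)
        if match:
            # Key the counter on the service name, e.g. "smtp".
            counts["process_limit:" + match.group(1)] += 1
    return counts

# Hypothetical sample input; the smtpd lines are made up for illustration.
SAMPLE_LOG = [
    'Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp" '
    '(25) has reached its process limit "30": new clients may experience '
    'noticeable delays',
    'Oct 3 20:39:28 spike postfix/smtpd[28911]: lost connection after '
    'CONNECT from unknown[192.0.2.1]',
    'Oct 3 20:39:29 spike postfix/smtpd[28912]: lost connection after '
    'CONNECT from unknown[192.0.2.2]',
]

if __name__ == "__main__":
    print(dict(summarize(SAMPLE_LOG)))
```

<p> A rising count is only a hint: as noted above, a portscan also
produces "lost connection" messages. </p>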
121 <h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>
123 <p> One measure to avoid the "all server processes busy" condition
124 is to service more SMTP clients simultaneously. For this you need
125 to increase the number of Postfix SMTP server processes. This will
126 improve the
127 responsiveness for remote SMTP clients, as long as the server machine
128 has enough hardware and software resources to run the additional
129 processes, and as long as the file system can keep up with the
130 additional load. </p>
132 <ul>
134 <li> <p> You increase the number of SMTP server processes either
135 by increasing the <a href="postconf.5.html#default_process_limit">default_process_limit</a> in <a href="postconf.5.html">main.cf</a> (line 3 below),
136 or by increasing the SMTP server's "maxproc" field in <a href="master.5.html">master.cf</a>
137 (line 10 below). Either way, you need to issue a "postfix reload"
138 command to make the change effective. </p>
140 <li> <p> Process limits above 1000 require Postfix version 2.4 or
141 later, and an operating system that supports kernel-based event
142 filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
143 </p>
145 <li> <p> More processes use more memory. You can reduce the Postfix
146 memory footprint by using <a href="CDB_README.html">cdb</a>:
147 lookup tables instead of Berkeley DB's hash: or btree: tables. </p>
149 <pre>
150 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>:
151 2 # Raise the global process limit, 100 since Postfix 2.0.
152 3 <a href="postconf.5.html#default_process_limit">default_process_limit</a> = 200
154 5 /etc/postfix/<a href="master.5.html">master.cf</a>:
155 6 # =============================================================
156 7 # service type private unpriv chroot wakeup maxproc command
157 8 # =============================================================
158 9 # Raise the SMTP service process limit only.
159 10 smtp inet n - n - 200 smtpd
160 </pre>
162 <li> <p> NOTE: older versions of the <a href="SMTPD_POLICY_README.html">SMTPD_POLICY_README</a> document
163 contain a mistake: they configure a fixed number of policy daemon
164 processes. When you raise the SMTP server's "maxproc" field in
165 <a href="master.5.html">master.cf</a>, SMTP server processes will report problems when connecting
166 to policy server processes, because there aren't enough of them.
167 Examples of errors are "connection refused" or "operation timed
168 out". </p>
170 <p> To fix, edit <a href="master.5.html">master.cf</a> and specify a zero "maxproc" field
171 in all policy server entries; see line 6 in the example below.
172 Issue a "postfix reload" command to make the change effective. </p>
174 <pre>
175 1 /etc/postfix/<a href="master.5.html">master.cf</a>:
176 2 # =============================================================
177 3 # service type private unpriv chroot wakeup maxproc command
178 4 # =============================================================
179 5 # Disable the policy service process limit.
180 6 policy unix - n n - 0 spawn
181 7 user=nobody argv=/some/where/policy-server
182 </pre>
184 </ul>
186 <h2><a name="time"> Spend less time per SMTP client </a></h2>
188 <p> When increasing the number of SMTP server processes is not
189 practical, you can improve Postfix server responsiveness by eliminating
190 delays. When Postfix spends less time per SMTP session, the same
191 number of SMTP server processes can service more clients in a given
192 amount of time. </p>
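<p> The relationship between session time and capacity is simple
arithmetic. The sketch below, in Python, is an idealized upper bound
that assumes every process is always busy; the function name and the
example numbers are assumptions chosen for illustration, not
measurements. </p>

```python
def sessions_per_second(processes, seconds_per_session):
    """Idealized upper bound on SMTP sessions completed per second."""
    if not seconds_per_session > 0:
        raise ValueError("seconds_per_session must be positive")
    return processes / seconds_per_session

if __name__ == "__main__":
    # Same process count, shorter sessions: more clients served.
    print(sessions_per_second(100, 2.0))
    print(sessions_per_second(100, 0.5))
```

<p> With 100 processes, cutting the average session from 2 seconds to
0.5 seconds raises the bound from 50 to 200 sessions per second
without starting a single extra process. </p>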
194 <ul>
196 <li> <p> Eliminate non-functional RBL lookups (blocklists that are
197 no longer in operation). These lookups can degrade performance.
198 Postfix logs a warning when an RBL server does not respond. </p>
200 <li> <p> Eliminate redundant RBL lookups (people often use multiple
201 Spamhaus RBLs that include each other). To find out whether RBLs
202 include other RBLs, look up the websites that document the RBL's
203 policies. </p>
205 <li> <p> Eliminate <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a>, and keep just a few
206 emergency patterns to block the latest worm explosion or backscatter
mail. See <a href="BACKSCATTER_README.html">BACKSCATTER_README</a> for examples of the latter. </p>
209 <li> <p> Group your <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a> patterns to avoid
unnecessary pattern matching operations: </p>
212 <pre>
213 1 /etc/postfix/header_checks:
214 2 if /^Subject:/
215 3 /^Subject: virus found in mail from you/ reject
216 4 /^Subject: ..other../ reject
217 5 endif
219 7 if /^Received:/
220 8 /^Received: from (postfix\.org) / reject forged client name in received header: $1
221 9 /^Received: from ..other../ reject ....
222 10 endif
223 </pre>
225 </ul>
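<p> To see why grouping patterns helps, the following Python sketch
counts regular expression evaluations with and without an if/endif
style guard. It only imitates the idea and is not how Postfix
implements header_checks; the "hypothetical" patterns and all function
names are assumptions added for illustration. </p>

```python
import re

def compile_all(patterns):
    # Case-insensitive matching is used here for simplicity.
    return [re.compile(p, re.IGNORECASE) for p in patterns]

SUBJECT = compile_all([
    r"^Subject: virus found in mail from you",
    r"^Subject: hypothetical pattern A",
    r"^Subject: hypothetical pattern B",
])
RECEIVED = compile_all([
    r"^Received: from (postfix\.org) ",
    r"^Received: from hypothetical\.example ",
])

# Each entry pairs a guard with the patterns tried only if the guard matches.
GROUPS = [
    (re.compile(r"^Subject:", re.IGNORECASE), SUBJECT),
    (re.compile(r"^Received:", re.IGNORECASE), RECEIVED),
]

def match_flat(header):
    """Try every pattern in order; return (matched, evaluations)."""
    evaluations = 0
    for pattern in SUBJECT + RECEIVED:
        evaluations += 1
        if pattern.search(header):
            return True, evaluations
    return False, evaluations

def match_grouped(header):
    """Try guards first; return (matched, evaluations)."""
    evaluations = 0
    for guard, patterns in GROUPS:
        evaluations += 1
        if not guard.search(header):
            continue  # skip the whole group
        for pattern in patterns:
            evaluations += 1
            if pattern.search(header):
                return True, evaluations
    return False, evaluations
```

<p> For a header that matches no guard, such as "To:", the grouped
version evaluates two patterns instead of five. The saving grows with
the number of patterns inside each group. </p>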
227 <h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>
229 <p> Under conditions of overload you can improve Postfix SMTP server
230 responsiveness by hanging up on suspicious clients, so that other
231 clients get a chance to talk to Postfix. </p>
233 <ul>
235 <li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
237 RBLs (see next bullet) or that match selected non-RBL restrictions
238 such as SMTP access maps. The Postfix SMTP server will reject mail
239 and disconnect without waiting for the remote SMTP client to send
240 a QUIT command. </p>
242 <li> <p> To hang up connections from blacklisted zombies, you can
243 set specific Postfix SMTP server reject codes for specific RBLs,
244 and for individual responses from specific RBLs. We'll use
245 zen.spamhaus.org as an example; by the time you read this document,
246 details may have changed. Right now, their documents say that a
247 response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
248 address, which means that the machine is probably running a bot of
249 some kind. To give a 521 response instead of the default 554
250 response, use something like: </p>
252 <pre>
253 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>:
254 2 <a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> =
255 3 <a href="postconf.5.html#permit_mynetworks">permit_mynetworks</a>
256 4 <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org=127.0.0.10
257 5 <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org=127.0.0.11
258 6 <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org
260 8 <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = hash:/etc/postfix/rbl_reply_maps
262 10 /etc/postfix/rbl_reply_maps:
263 11 # With Postfix 2.3-2.5 use "421" to hang up connections.
264 12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
265 13 $rbl_class [$rbl_what] blocked using
266 14 $rbl_domain${rbl_reason?; $rbl_reason}
268 16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
269 17 $rbl_class [$rbl_what] blocked using
270 18 $rbl_domain${rbl_reason?; $rbl_reason}
271 </pre>
<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so the additional lookups
do not affect performance. </p>
277 <li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
278 cause Postfix to disconnect). The down-side of replying with 421
279 is that it works only for zombies and other malware. If the client
280 is running a real MTA, then it may connect again several times until
281 the mail expires in its queue. When this is a problem, stick with
282 the default 554 reply, and use "<a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1" as
283 described below. </p>
285 <li> <p> You can automatically turn on the above overload measure
286 with Postfix 2.5 and later, or with earlier releases that contain
287 the stress-adaptive behavior source code patch from the mirrors
listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. Simply replace line 8
above with: </p>
291 <pre>
292 8 <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = ${stress?hash:/etc/postfix/rbl_reply_maps}
293 </pre>
295 </ul>
297 <p> More information about automatic stress-adaptive behavior is
298 in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
299 </p>
301 <h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>
303 <p> See the next section, "<a href="#adapt">Automatic stress-adaptive
304 behavior</a>", if you are running Postfix version 2.5 or later, or
305 if you have applied the source code patch for stress-adaptive
306 behavior from the mirrors listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>.
307 </p>
309 <p> The following measures can be applied temporarily during overload.
310 They still allow <b>most</b> legitimate clients to connect and send
311 mail, but may affect some legitimate clients. </p>
313 <ul>
315 <li> <p> Reduce <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> (default: 300s). Experience on the
316 postfix-users list from a variety of sysadmins shows that reducing
317 the "normal" <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 60s is unlikely to affect legitimate
318 clients. However, it is unlikely to become the Postfix default
319 because it's not RFC compliant. Setting <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 10s (line
320 2 below) or even 5s under stress will still allow <b>most</b>
321 legitimate clients to connect and send mail, but may delay mail
322 from some clients. No mail should be lost, as long as this measure
323 is used only temporarily. </p>
325 <li> <p> Reduce <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> (default: 20). Setting this
326 to 1 under stress (line 3 below) helps by disconnecting clients
327 after a single error, giving other clients a chance to connect.
328 However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active
addresses whose owners didn't bother to unsubscribe. No mail should be lost,
331 as long as this measure is used only temporarily. </p>
333 <li> <p> Use an <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> of 1 instead of the default
334 100. This prevents clients from keeping idle connections open by
335 repeatedly sending NOOP or RSET commands. </p>
337 </ul>
339 <blockquote>
340 <pre>
341 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>:
342 2 <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = 10
343 3 <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1
344 4 <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = 1
345 </pre>
346 </blockquote>
348 <p> With these measures, no mail should be lost, as long
349 as these measures are used only temporarily. The next section of
350 this document introduces a way to automate this process. </p>
352 <h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>
354 <p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
355 This is also available as a source code patch for Postfix versions
356 2.4 and 2.3 from the mirrors listed at
357 <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. </p>
359 <p> It works as follows. When a "public" network service such as
360 the SMTP server runs into an "all server ports are busy" condition,
361 the Postfix <a href="master.8.html">master(8)</a> daemon logs a warning, restarts the service
362 (without interrupting existing network sessions), and runs the
363 service with "-o stress=yes" on the server process command line:
364 </p>
366 <blockquote>
367 <pre>
368 80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
369 </pre>
370 </blockquote>
372 <p> Normally, the Postfix <a href="master.8.html">master(8)</a> daemon runs such a service with
373 "-o stress=" on the command line (i.e. with an empty parameter
374 value): </p>
376 <blockquote>
377 <pre>
378 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
379 </pre>
380 </blockquote>
382 <p> Services that have local access only never have "-o stress"
383 parameters on the command line. This includes services internal to
384 Postfix such as the queue manager, and services that listen on a
385 loopback interface only, such as after-filter SMTP services. </p>
387 <p> The "stress" parameter value is the key to making <a href="postconf.5.html">main.cf</a>
388 parameter settings stress adaptive. The following settings are the
389 default with Postfix 2.6 and later. With earlier Postfix versions
390 that have stress-adaptive support, append the lines below to the
391 <a href="postconf.5.html">main.cf</a> file and issue a "postfix reload" command: </p>
393 <blockquote>
394 <pre>
395 1 <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = ${stress?10}${stress:300}s
396 2 <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = ${stress?1}${stress:20}
397 3 <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = ${stress?1}${stress:100}
398 </pre>
399 </blockquote>
<p> Translation: </p>
403 <ul>
405 <li> <p> Line 1: under conditions of stress, use an <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a>
406 value of 10 seconds instead of the default 300 seconds. Experience
407 on the postfix-users list from a variety of sysadmins shows that
408 reducing the "normal" <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 60s is unlikely to affect
409 legitimate clients. However, it is unlikely to become the Postfix
410 default because it's not RFC compliant. Setting <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to
10s or even 5s under stress will still allow most
412 legitimate clients to connect and send mail, but may delay mail
413 from some clients. No mail should be lost, as long as this measure
414 is used only temporarily. </p>
416 <li> <p> Line 2: under conditions of stress, use an <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a>
417 of 1 instead of the default 20. This helps by disconnecting clients
418 after a single error, giving other clients a chance to connect.
419 However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active
addresses whose owners didn't bother to unsubscribe. No mail should be lost,
422 as long as this measure is used only temporarily. </p>
424 <li> <p> Line 3: under conditions of stress, use an
425 <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> of 1 instead of the default 100. This
426 prevents clients from keeping idle connections open by repeatedly
427 sending NOOP or RSET commands. </p>
429 </ul>
431 <p> The syntax of ${name?value} and ${name:value} is explained at
432 the beginning of the <a href="postconf.5.html">postconf(5)</a> manual page. </p>
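<p> As a rough illustration of how the two conditional forms combine,
the Python sketch below imitates the expansion: ${name?value} yields
value when the parameter is non-empty, and ${name:value} yields value
when it is empty. This is a simplified model for the expressions used
in this document, not the Postfix parser; it does not handle nested
braces, and the function name is an assumption. </p>

```python
import re

# Matches ${name?value} and ${name:value}; nested braces are not handled.
CONDITIONAL = re.compile(r"\$\{(\w+)([?:])([^}]*)\}")

def expand(template, params):
    """Expand the conditional forms in template using the params dict."""
    def substitute(match):
        name, operator, value = match.groups()
        is_set = bool(params.get(name, ""))
        if operator == "?":
            # ${name?value}: value only when name is non-empty.
            return value if is_set else ""
        # ${name:value}: value only when name is empty.
        return "" if is_set else value
    return CONDITIONAL.sub(substitute, template)

if __name__ == "__main__":
    setting = "${stress?10}${stress:300}s"
    print(expand(setting, {"stress": "yes"}))
    print(expand(setting, {"stress": ""}))
```

<p> Exactly one of the two forms produces output for any given value
of "stress", which is why pairing them gives a setting with a normal
value and an overload value. </p>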
434 <p> NOTE: Please keep in mind that the stress-adaptive feature is
435 a fairly desperate measure to keep <b>some</b> legitimate mail
436 flowing under overload conditions. If a site is reaching the SMTP
437 server process limit when there isn't an attack or bot flood
438 occurring, then either the process limit needs to be raised or more
439 hardware needs to be added. </p>
441 <h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>
443 <p> To find out if your Postfix installation supports stress-adaptive
444 behavior, use the "ps" command, and look for the smtpd processes.
445 Postfix has stress-adaptive support when you see "-o stress=" or
446 "-o stress=yes" command-line options. Remember that Postfix never
447 enables stress-adaptive behavior on servers that listen on local
448 addresses only. </p>
450 <p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
451 and other System-V flavors, use "ps -ef" instead of "ps ax". </p>
453 <blockquote>
454 <pre>
455 $ ps ax|grep smtpd
456 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
457 84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
458 </pre>
459 </blockquote>
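<p> The same check can be scripted. The following Python sketch
inspects one line of "ps" output and reports whether a "-o stress"
option is present. The function name and return values are
assumptions made for this example; the sample lines are the ones
shown in this document. </p>

```python
def stress_state(ps_line):
    """Return "on", "off", or None for one line of ps output.

    "on"  : "-o stress=" followed by a non-empty value
    "off" : "-o stress=" with an empty value
    None  : no "-o stress" option on the command line
    """
    fields = ps_line.split()
    # Walk over adjacent pairs of command-line words.
    for option, argument in zip(fields, fields[1:]):
        if option == "-o" and argument.startswith("stress="):
            return "on" if argument[len("stress="):] else "off"
    return None

if __name__ == "__main__":
    print(stress_state(
        "83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress="))
```

<p> A result of None for an smtpd process means either that the
installation lacks stress-adaptive support, or that the service
listens on local addresses only. </p>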
461 <p> You can't use <a href="postconf.1.html">postconf(1)</a> to detect stress-adaptive support.
462 The <a href="postconf.1.html">postconf(1)</a> command ignores the existence of the stress parameter
463 in <a href="postconf.5.html">main.cf</a>, because the parameter has no effect there. Command-line
464 "-o parameter" settings always take precedence over <a href="postconf.5.html">main.cf</a> parameter
settings. </p>
467 <p> If you configure stress-adaptive behavior in <a href="postconf.5.html">main.cf</a> when it
468 isn't supported, nothing bad will happen. The processes will run
469 as if the stress parameter always has an empty value. </p>
471 <h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>
473 <p> You can manually force stress-adaptive behavior on, by adding
474 a "-o stress=yes" command-line option in <a href="master.5.html">master.cf</a>. This can be
475 useful for testing overrides on the SMTP service. Issue "postfix
476 reload" to make the change effective. </p>
478 <p> Note: setting the stress parameter in <a href="postconf.5.html">main.cf</a> has no effect for
479 services that accept remote connections. </p>
481 <blockquote>
482 <pre>
483 1 /etc/postfix/<a href="master.5.html">master.cf</a>:
484 2 # =============================================================
485 3 # service type private unpriv chroot wakeup maxproc command
486 4 # =============================================================
487 5 #
488 6 smtp inet n - n - - smtpd
489 7 -o stress=yes
490 8 -o . . .
491 </pre>
492 </blockquote>
494 <p> To permanently force stress-adaptive behavior off with a specific
495 service, specify "-o stress=" on its <a href="master.5.html">master.cf</a> command line. This
496 may be desirable for the "submission" service. Issue "postfix reload"
497 to make the change effective. </p>
499 <p> Note: setting the stress parameter in <a href="postconf.5.html">main.cf</a> has no effect for
500 services that accept remote connections. </p>
502 <blockquote>
503 <pre>
504 1 /etc/postfix/<a href="master.5.html">master.cf</a>:
505 2 # =============================================================
506 3 # service type private unpriv chroot wakeup maxproc command
507 4 # =============================================================
508 5 #
509 6 submission inet n - n - - smtpd
510 7 -o stress=
511 8 -o . . .
512 </pre>
513 </blockquote>
515 <h2><a name="other"> Other measures to off-load zombies </a> </h2>
517 <p> OpenBSD <a href="http://www.openbsd.org/spamd/">spamd</a>
518 implements a daemon that handles all connections from "new" clients.
519 Only well-behaved mail clients are allowed to talk to the mail
520 server. Other clients are tarpitted, and will never get a chance
521 to affect mail server performance. </p>
523 <p> At some point in the future, Postfix may come with a simple
524 front-end daemon that does basic greylisting and pipelining detection
525 to keep zombies and other ratware away from Postfix itself. This
526 would use the "pass" service type which has been available in
527 stable Postfix releases since Postfix 2.5. </p>
529 <h2><a name="credits"> Credits </a></h2>
531 <ul>
533 <li> Thanks to the postfix-users mailing list members for sharing
534 early experiences with the stress-adaptive feature.
536 <li> The RBL example and several other paragraphs of text were
537 adapted from postfix-users postings by Noel Jones.
539 <li> Wietse implemented stress-adaptive behavior as the smallest
540 possible patch while he should be working on other things.
542 </ul>
544 </body> </html>