FWIW: I've included the postfix-2.4-stress-patch as an option in NetBSD's
pkgsrc/mail/postfix package. It's disabled by default but can be enabled
with option "postfix-stress".

Geert

On Tue, Sep 04, 2007 at 05:10:21PM -0400, Wietse Venema wrote:
> This is an update of yesterday's stress patch for Postfix 2.3 and
> later.
>
> - This version triggers a "postfix reload" for the affected network
>   service, so that all servers switch to stress-aware mode. This
>   change also helped to make a few code simplifications.
>
> - The logfile warning message now points to a non-existent document,
>   http://www.postfix.org/STRESS_README.html, that will be based on
>   information from this document and on feedback from the Postfix
>   community. Planning for the future...
>
> Attached you will find a patch relative to yesterday's patch.
>
> The problem
> ===========
>
> Last week some ratware was causing trouble by connecting to SMTP
> servers and keeping server ports occupied for a long time.
>
> Symptoms:
>
> - Postfix logs ``service "smtp" (25) has reached its process limit''.
>
> - SMTP clients have to wait a long time before the server responds.
>
> - The maillog shows lots of "lost connection after CONNECT" messages.
>
> - netstat shows lots of SMTP connections in FIN_WAIT1/2 state.
>
> While Postfix will drop connections when a client hammers the server,
> until now it had no specific response against connections from a
> large number of different clients.
>
> Generic workarounds
> ===================
>
> As a first step, you can mitigate too many connections by specifying
> more smtpd(8) processes in master.cf (don't forget "postfix reload"),
> but you can do only so much without running out of memory, sockets,
> or something else that Postfix needs.
>
> When increasing the number of processes becomes impractical, you can
> try to make Postfix spend less time per SMTP client:
>
> - Eliminate useless or redundant RBL lookups (people often use
>   multiple Spamhaus RBLs that include each other)
>
> - Reject non-existent recipients early:
>       smtpd_recipient_restrictions =
>           reject_unlisted_recipient
>           permit_mynetworks
>           reject_unauth_destination ...
>
> - Use "421" reply codes for botnet-related RBLs or for selected
>   non-RBL restrictions. This causes Postfix 2.3 and later to
>   disconnect immediately without waiting for QUIT.
>
> - Don't use before-queue content filters or body_checks.
>
> - Reduce smtpd_timeout and smtpd_hard_error_limit. This may
>   interfere with legitimate mail.
>
> - Specify "smtpd_peername_lookup = no" (Postfix 2.3 and later).
>   Beware, this is a desperate measure; it can save a lot of time,
>   but it breaks all access controls that depend on client hostnames.
>
> The stress-mode workaround
> ==========================
>
> The idea is to change Postfix behavior under stress: terminate SMTP
> sessions sooner, so that Postfix can help more clients in the same
> amount of time, but do this only when really necessary. The stress
> patch provides a way to do that. It is simple enough that it can
> be adopted into the legacy and stable Postfix releases.
>
> The patch works as follows. When all SMTP server processes become
> busy, the Postfix master daemon logs a warning and requests that
> each running SMTP server process terminate as soon as its SMTP
> session ends. From this point on, Postfix creates SMTP server
> processes that have "-o stress=yes" on their command line, until
> the problematic condition has not happened for at least 1000 seconds.
>
> As shown below, the "stress=yes" parameter setting can be used to
> make main.cf configuration settings stress dependent.
>
> WARNING: the settings in the example are very aggressive and may
> affect legitimate mail delivery.
> But they will help you receive at least some mail while you're being
> flooded by worms, spam, or backscatter. Some mail is better than no
> legitimate mail at all.
>
>  1 /etc/postfix/main.cf:
>  2     smtpd_hard_error_limit = ${stress?1}${stress:20}
>  3     smtpd_timeout = ${stress?5}${stress:300}
>  4     smtpd_banner = $myhostname ESMTP $mail_name${stress? (condition RED)}
>
> NOTES:
>
> - The example looks ugly because main.cf does not implement ${name?x:y}
>   syntax. This can't be fixed without major incompatible changes.
>
> - Line 2 uses a reduced smtpd_hard_error_limit under conditions of
>   stress, causing Postfix to hang up quickly after rejecting a
>   command, without waiting for the client to send "QUIT". This
>   won't affect legitimate single-recipient mail, but will delay
>   mailing list traffic when you have subscriptions to accounts that
>   no longer exist.
>
> - Line 3 uses a reduced SMTP server read/write timeout under stress.
>   This causes Postfix to drop connections from ratware that would
>   otherwise keep your SMTP ports occupied. But it may cause delays
>   with mail from very slow client implementations.
>
> - Line 4 helps you to monitor your SMTP server's stress level
>   remotely. It doesn't address the overload condition itself.
>
> Testing
> =======
>
> To test, either specify "smtpd -o stress=yes" in master.cf, or
> create a test-only smtpd server on a non-default port. Note: the
> stress feature does not work for servers that listen only on localhost.
>
> There is no configuration parameter to permanently disable stress
> mode. This would greatly increase the footprint of the patch, and
> it would increase the likelihood of patch errors. A configuration
> parameter will likely be introduced in the Postfix 2.5 experimental
> release.
>
> Limitations
> ===========
>
> None of the above can provide the protection that you can get from
> a front-end daemon process that screens connections and keeps the
> suspicious ones away from the MTA.
> But that is a different project.
> See, for example, OpenBSD spamd at http://www.openbsd.org/spamd/.
>
> Under non-stress conditions, the Postfix master daemon creates SMTP
> server processes with "-o stress=", that is, an empty parameter
> value. Getting rid of this artifact would involve too many changes
> for the stable Postfix releases.
>
> The "-o stress=yes" argument has no effect on the cleanup server,
> the queue manager, and other processes. It works only for servers
> that receive mail from the network, and only when all processes for
> that service are busy.
>
> Wietse
>
> Patch relative to yesterday's patch. To apply:
> $ patch -p0 <this-message
>
> Don't forget to "postfix stop" and "postfix start".
>
> diff -cr /tmp/postfix-2.5-20070824/src/master/master_avail.c src/master/master_avail.c
> *** /tmp/postfix-2.5-20070824/src/master/master_avail.c Tue Sep 4 16:48:15 2007
> --- src/master/master_avail.c Tue Sep 4 16:16:36 2007
> ***************
> *** 76,81 ****
> --- 76,82 ----
>   static void master_avail_event(int event, char *context)
>   {
>       MASTER_SERV *serv = (MASTER_SERV *) context;
> +     time_t now;
>       int n;
>
>       if (event == 0) /* XXX Can this happen? */
> ***************
> *** 84,89 ****
> --- 85,116 ----
>           for (n = 0; n < serv->listen_fd_count; n++)
>               event_disable_readwrite(serv->listen_fd[n]);
>       } else {
> +
> +         /*
> +          * When all servers for a public internet service are busy, we log a
> +          * warning, suggest workarounds, and remain silent until the warning
> +          * expires, 1000 seconds later. At the same time, we start creating
> +          * server processes with "-o stress=yes" on the command line, and
> +          * keep creating such processes until the process count has stayed
> +          * below the limit for at least 1000 seconds. This provides a mimimal
> +          * solution that can be adopted into legacy and stable Postfix
> +          * releases.
> +          *
> +          * This is not the right place to update serv->stress_param_val in
> +          * response to stress level changes. Doing so would would contaminate
> +          * the code that implements "postfix reload" with stress management
> +          * implementation details, creating a source of future bugs. Instead,
> +          * we update simple counters or flags here, and use their values to
> +          * determine the proper serv->stress_param_val value when exec-ing a
> +          * server process.
> +          */
> +         if (serv->stress_param_val != 0
> +             && !MASTER_LIMIT_OK(serv->max_proc, serv->total_proc + 1)) {
> +             now = event_time();
> +             if (serv->stress_expire_time < now)
> +                 master_restart_service(serv);
> +             serv->stress_expire_time = now + 1000;
> +         }
>           master_spawn(serv);
>       }
>   }
> ***************
> *** 101,121 ****
>        * monitoring the socket for connection requests. All this under the
>        * restriction that we have sufficient resources to service a connection
>        * request.
> -      *
> -      * When all servers for a public internet service are busy, we log a
> -      * warning, suggest workarounds, and remain silent until the warning
> -      * expires, 1000 seconds later. At the same time, we start creating
> -      * server processes with "-o stress=yes" on the command line, and keep
> -      * creating such processes until the process count has stayed below the
> -      * limit for at least 1000 seconds. This provides a mimimal solution that
> -      * can be adopted into legacy and stable Postfix releases.
> -      *
> -      * This is not the right place to update serv->stress_param_val in response
> -      * to stress level changes. Doing so would would contaminate the code
> -      * that implements "postfix reload" with stress management implementation
> -      * details, creating a source of future bugs. Instead, we update simple
> -      * counters or flags here, and use their values to determine the proper
> -      * serv->stress_param_val value when exec-ing a server process.
>        */
>       if (msg_verbose)
>           msg_info("%s: avail %d total %d max %d", myname,
> --- 128,133 ----
> ***************
> *** 128,144 ****
>                   event_enable_read(serv->listen_fd[n], master_avail_event,
>                                     (char *) serv);
>           } else if (serv->stress_param_val != 0
> !                    && ((now = event_time()),
> !                        (serv->stress_expire_time = now + 1000),
> !                        (now > serv->busy_warn_time + 1000))) {
>               serv->busy_warn_time = now;
>               msg_warn("service \"%s\" (%s) has reached its process limit \"%d\": "
>                        "new clients may experience noticeable delays",
>                        serv->ext_name, serv->name, serv->max_proc);
>               msg_warn("to avoid this condition, increase the process count "
>                        "in master.cf or reduce the service time per client");
> !             msg_warn("you may also make main.cf options dependent on the "
> !                      "existence of a non-empty \"stress\" parameter value");
>           }
>       }
>   }
> --- 140,154 ----
>                   event_enable_read(serv->listen_fd[n], master_avail_event,
>                                     (char *) serv);
>           } else if (serv->stress_param_val != 0
> !                    && (now = event_time()) - serv->busy_warn_time > 1000) {
>               serv->busy_warn_time = now;
>               msg_warn("service \"%s\" (%s) has reached its process limit \"%d\": "
>                        "new clients may experience noticeable delays",
>                        serv->ext_name, serv->name, serv->max_proc);
>               msg_warn("to avoid this condition, increase the process count "
>                        "in master.cf or reduce the service time per client");
> !             msg_warn("see http://www.postfix.org/STRESS_README.html for "
> !                      "examples of stress-dependent configuration settings");
>           }
>       }
>   }
> diff -cr /tmp/postfix-2.5-20070824/src/master/master_spawn.c src/master/master_spawn.c
> *** /tmp/postfix-2.5-20070824/src/master/master_spawn.c Tue Sep 4 16:48:15 2007
> --- src/master/master_spawn.c Tue Sep 4 14:18:30 2007
> ***************
> *** 224,233 ****
>           vstring_sprintf(env_gen, "%s=%o", MASTER_GEN_NAME, master_generation);
>           if (putenv(vstring_str(env_gen)) < 0)
>               msg_fatal("%s: putenv: %m", myname);
> !         /* Enable stress mode WHILE forking the last process, not AFTER. */
> !         if (serv->stress_param_val
> !             && (serv->total_proc + 1 >= serv->max_proc
> !                 || serv->stress_expire_time > event_time()))
>               serv->stress_param_val[0] = CONFIG_BOOL_YES[0];
>
>           execvp(serv->path, serv->args->argv);
> --- 224,230 ----
>           vstring_sprintf(env_gen, "%s=%o", MASTER_GEN_NAME, master_generation);
>           if (putenv(vstring_str(env_gen)) < 0)
>               msg_fatal("%s: putenv: %m", myname);
> !         if (serv->stress_param_val && serv->stress_expire_time > event_time())
>               serv->stress_param_val[0] = CONFIG_BOOL_YES[0];
>
>           execvp(serv->path, serv->args->argv);
>
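
For readers who want to follow the timing logic without reading the diff,
here is a minimal standalone sketch of the 1000-second stress window. It is
not code from the patch. service_t, at_limit(), note_connection() and
stress_enabled() are hypothetical names made up for illustration, and
at_limit() is only my assumption of what !MASTER_LIMIT_OK(max_proc,
total_proc + 1) tests (a limit of 0 is assumed to mean "no limit").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define STRESS_WINDOW 1000      /* seconds; the value used in the patch */

/* Hypothetical stand-in for the few MASTER_SERV fields involved. */
typedef struct {
    int     max_proc;           /* process limit; 0 assumed to mean none */
    int     total_proc;         /* processes currently running */
    time_t  stress_expire_time; /* stress mode lasts until this time */
} service_t;

/* Assumed meaning of !MASTER_LIMIT_OK(max_proc, total_proc + 1). */
static bool at_limit(const service_t *svc)
{
    return svc->max_proc != 0 && svc->total_proc + 1 >= svc->max_proc;
}

/*
 * Bookkeeping done when a connection arrives, before a server is spawned.
 * Returns true when stress mode is newly entered, which is the point where
 * the patch calls master_restart_service(). Every arrival at the limit
 * pushes the expiry time out again.
 */
static bool note_connection(service_t *svc, time_t now)
{
    bool    entered = false;

    assert(svc != NULL);
    if (at_limit(svc)) {
        entered = svc->stress_expire_time < now;
        svc->stress_expire_time = now + STRESS_WINDOW;
    }
    return entered;
}

/* Decision made when exec-ing a server: pass "-o stress=yes" or not. */
static bool stress_enabled(const service_t *svc, time_t now)
{
    return svc->stress_expire_time > now;
}
```

With max_proc = 2 and one process already running, a connection at t=200
enters stress mode and servers started before t=1200 get "stress=yes". A
second connection at the limit only extends the window; it does not report
a new entry into stress mode.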
© 2004-2008 readlist.com