.TH mailvisa 1 2005-10-23 mailvisa "Mailvisa Documentation"
simple Bayesian spam filter
Mailvisa is a simple but effective Bayesian spam filter,
inspired by Paul Graham's \fIA Plan For Spam\fR. Its main features are
simplicity (so it's easy to tune), accuracy (high percentage of spam
caught, no false positives), and speed, listed in order of priority.
The basic usage of \fBmailvisa\fR is checking whether a message is spam
or not. By default, \fBmailvisa\fR reads a message on standard input,
and writes it out on standard output with an \fBX-Spam:\fR header
prepended that is set to \fBtrue\fR when Mailvisa thinks the message is
spam, and to \fBfalse\fR otherwise. In addition, the exit status is 0 for
non-spam, 1 if an error occurred, and 160 if the message is spam.
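.PP
The exit statuses described above lend themselves to scripting. The
following is a minimal sketch, not part of Mailvisa itself: the function
name \fBspam_verdict\fR is hypothetical, and it assumes that
\fBmailvisa\fR is on the search path and that the daemon is already
running. It relies on the \fB-q\fR option of \fBmailvisa check\fR, which
suppresses the message output.
.PP
.nf
```shell
# Hypothetical helper: read one message on standard input and print a
# verdict based on the documented exit statuses of "mailvisa check"
# (0 = ham, 160 = spam, anything else is treated as an error).
spam_verdict() {
    status=0
    # -q: only report the decision in the exit status
    mailvisa check -q || status=$?
    case $status in
        0)   echo ham ;;
        160) echo spam ;;
        *)   echo error; return 1 ;;
    esac
}
```
.fi
.PP
With this definition, \fBspam_verdict < foo\fR prints one of ham, spam,
or error for the message stored in \fBfoo\fR.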
Internally, Mailvisa works by maintaining a database of good words, a
database of bad words, and a score file listing the score for each word.
The score file is generated from the word databases by calculating scores
based on how often words occur in good and in bad messages. Messages are
classified as spam or ham based on the score file.
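.PP
To give a feel for how a per-word score can be derived from occurrence
counts, the sketch below follows the per-word formula from Graham's
essay in simplified form. It is an illustration only and not necessarily
the exact calculation Mailvisa performs. The function name
\fBword_score\fR, its argument order, and the fallback value 0.40 for
unseen words (Graham's choice) are assumptions. The optional multiplier
plays the same role as the multiplier option of \fBmailvisa calculate\fR
and defaults to 1.0. Both message totals must be non-zero.
.PP
.nf
```shell
# Hypothetical sketch: word_score GOOD BAD NGOOD NBAD [MULTIPLIER]
#   GOOD, BAD    occurrences of the word in good and bad messages
#   NGOOD, NBAD  number of good and bad messages (must be non-zero)
#   MULTIPLIER   weight applied to the good occurrences (default 1.0)
# Prints a spam probability between 0.01 and 0.99.
word_score() {
    awk -v g="$1" -v b="$2" -v ngood="$3" -v nbad="$4" \
        -v mult="${5:-1.0}" 'BEGIN {
        gf = g * mult / ngood      # weighted frequency in good mail
        if (gf > 1) gf = 1
        bf = b / nbad              # frequency in bad mail
        if (bf > 1) bf = 1
        if (gf + bf == 0) {        # word never seen: neutral-ish score
            print "0.40"
            exit
        }
        p = bf / (gf + bf)
        if (p < 0.01) p = 0.01     # clamp so no word is ever decisive
        if (p > 0.99) p = 0.99
        print sprintf("%.2f", p)
    }'
}
```
.fi
.PP
For example, \fBword_score 5 5 100 100\fR prints 0.50, while
\fBword_score 5 5 100 100 2.0\fR prints 0.33: a multiplier above 1.0
pulls the score towards ham.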
the score file is not loaded once for every invocation of
\fBmailvisa\fR, but rather loaded once by a daemon process. To score a
message, \fBmailvisa\fR connects to the daemon process, which does the
The following section details the commands you can use to manage the
word databases, generate the score file, start the daemon, and check a
To use \fBmailvisa\fR, you invoke it with a command indicating the
action to perform. Available commands are:
Add messages to a message database.
This command is used to add the words from a message to either the list
of good words or the list of bad words.
Calculate scores and update score file.
This command is used to calculate the scores for each word in the good
and bad databases, and store the scores in a score file.
Check if a message is spam.
This is probably the command you will end up using most often.
Display a help message.
Remove messages from a database.
This command scans the given messages for words, and decrements the
count for these words in the given database. This can be used to negate
the effects of a previous \fBadd\fR command (e.g. if you accidentally
added words to the wrong database).
Start the daemon process.
The daemon process must be started before the \fBcheck\fR command can be
View the spam scores associated with words.
This command can be used to find out which words have high spam scores,
and which ones have low spam scores.
Each of the commands can be followed by the \fB-h\fR option to get a
list of available options for that command.
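.PP
The \fBremove\fR and \fBadd\fR commands can be combined to correct a
message that was filed in the wrong database. The following is a sketch
under stated assumptions: the function name \fBreclassify\fR is
hypothetical, and it assumes that \fBremove\fR takes the same arguments
as \fBadd\fR (a database name followed by message files).
.PP
.nf
```shell
# Hypothetical helper: reclassify FROM TO FILE...
# Undo an earlier "add" to database FROM, add the same messages to
# database TO, then regenerate the score file.
# Assumption: "remove" accepts "DATABASE FILE..." just like "add".
reclassify() {
    from=$1
    to=$2
    shift 2
    mailvisa remove "$from" "$@" &&
    mailvisa add "$to" "$@" &&
    mailvisa calculate
}
```
.fi
.PP
For example, \fBreclassify good bad mail/inbox/42\fR moves the words of
one message from the good database to the bad database.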
This section lists the options that can be passed to \fBmailvisa\fR.
Some options are common to all commands, whereas others only apply to
Look for configuration files in \fIpath\fR. This includes the word
lists, the score file, the socket to connect to the daemon, and the pid
file. All these will be looked for in the directory specified by
\fIpath\fR, unless the specified filenames contain slashes.
List the options specific to the given command.
.SS Options to mailvisa add
Include \fBX-Spam:\fR headers in the analysis of messages. Normally,
these headers are skipped when analyzing messages.
Weed the wordlist every \fInum\fR words. This removes rare words from
the list, so that it doesn't become polluted with useless items (such as
message ids, for example). A value of \fB0\fR disables weeding.
Weed words that occur fewer than \fInum\fR times. The default is
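.PP
The lookup rule for the configuration path described above can be
sketched as follows. This only illustrates the documented rule and is
not taken from the Mailvisa source. The function name \fBconf_path\fR is
hypothetical.
.PP
.nf
```shell
# Hypothetical sketch: conf_path DIR NAME
# A file name that contains a slash is used as given; any other name
# is looked up inside the configuration directory DIR.
conf_path() {
    case $2 in
        */*) echo "$2" ;;
        *)   echo "$1/$2" ;;
    esac
}
```
.fi
.PP
For example, \fBconf_path /var/mailvisa scores\fR prints
/var/mailvisa/scores, whereas \fBconf_path /var/mailvisa ./scores\fR
prints ./scores.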
.SS Options to mailvisa calculate
Load good words from \fIfile\fR (the default is \fBgood\fR).
Load bad words from \fIfile\fR (the default is \fBbad\fR).
Write scores to \fIfile\fR (the default is \fBscores\fR).
Multiply the number of good occurrences by \fInum\fR. This can be used
to bias the scores towards judging a message ham (for multipliers > 1.0)
or spam (for multipliers < 1.0). The default is \fB1.0\fR.
.SS Options to mailvisa check
Do not output the message or \fBX-Spam:\fR header. Only indicate the
decision in the exit status (0 for ham, 160 for spam).
Do not indicate whether a message is spam in the exit status.
Read \fInum\fR bytes at a time. The default is 16384.
Threshold for flagging messages as spam. This can be used to bias the
check in favor of judging messages as spam (for values < 0.5) or ham
(for values > 0.5). Useful values range between 0.0 and 1.0, the default
\fB-m\fR \fIcommand\fR
Pipe the output to \fIcommand\fR (analogous to \fBfetchmail(1)\fR's
option of the same name).
Use \fIpath\fR to connect to the daemon. The default is
\fBmailvisad.sock\fR.
.SS Options to mailvisa remove
Include \fBX-Spam:\fR headers in the analysis of messages. Normally,
these headers are skipped when analyzing messages.
.SS Options to mailvisa start
Use \fIfile\fR as the score file. Defaults to \fBscores\fR.
Log to \fIfile\fR. Default: \fBmailvisad.log\fR.
Use \fIfile\fR to store the pid (process id) of mailvisad. Defaults to
Open a socket for \fBmailvisa check\fR at \fIpath\fR. The default is
\fBmailvisad.sock\fR.
.SS Options to mailvisa view
Use \fIfile\fR as the score file. Defaults to \fBscores\fR.
Add all messages from the directory \fBmail/inbox\fR to the database of
good words:
.B mailvisa add good mail/inbox/*
Add all messages from the directory \fBmail/spam\fR to the database of
bad words:
.B mailvisa add bad mail/spam/*
Calculate word scores and store them in the score file (using the
defaults of bad, good, and scores for the files containing bad words,
good words, and word scores, respectively):
.B mailvisa calculate
Check whether the message stored in \fBfoo\fR is spam:
.B mailvisa check < foo
The same, but suppressing the exit code:
.B mailvisa check -e < foo
The same, but suppressing the output (\fBX-Spam:\fR header and message)
.B mailvisa check -q < foo
Spam check a message from standard input and send it to
\fBprocmail(1)\fR for further processing (suppressing the exit code):
.B mailvisa check -e -m procmail
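.PP
The individual examples can be chained into a first-time setup. The
following is a sketch rather than a recommended procedure: the function
name \fBbootstrap_mailvisa\fR is hypothetical, all options are left at
their defaults, and the ordering rests on two statements made earlier in
this page, namely that the daemon loads the score file and that it must
be started before \fBcheck\fR can be used.
.PP
.nf
```shell
# Hypothetical helper: bootstrap_mailvisa GOOD_DIR BAD_DIR
# Train both word databases from two directories of messages, generate
# the score file, and start the daemon that "mailvisa check" talks to.
# Each step only runs if the previous one succeeded.
bootstrap_mailvisa() {
    good_dir=$1
    bad_dir=$2
    mailvisa add good "$good_dir"/* &&
    mailvisa add bad "$bad_dir"/* &&
    mailvisa calculate &&
    mailvisa start
}
```
.fi
.PP
For example, \fBbootstrap_mailvisa mail/inbox mail/spam\fR runs the four
steps in order for the directories used in the examples above.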
Mailvisa is open source, under the terms of the MIT license. A copy of
this license is contained in the file LICENSE in the source
distribution. Mailvisa was written by Robbert Haarman. See
\fIhttp://inglorion.net/\fR for contact information.