.TH mailvisa 1 2005-10-23 mailvisa "Mailvisa Documentation"
.SH NAME
.B mailvisa
\-
simple Bayesian spam filter
.SH SYNOPSIS
.PP
.B mailvisa
.I command
[\fIoptions\fR]
.SH DESCRIPTION
.PP
Mailvisa is a simple but effective Bayesian spam filter,
inspired by Paul Graham's \fIA Plan for Spam\fR. Its main features are
simplicity (so it's easy to tune), accuracy (high percentage of spam
caught, no false positives), and speed, listed in order of priority.
.SH OPERATION
.PP
The basic use of \fBmailvisa\fR is checking whether a message is spam.
By default, \fBmailvisa\fR reads a message on standard input
and writes it to standard output with an \fBX-Spam:\fR header
prepended. The header is set to \fBtrue\fR when Mailvisa thinks the
message is spam, and to \fBfalse\fR otherwise. In addition, the exit
status is 0 for non-spam, 1 if an error occurred, and 160 if the
message is spam.
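.PP
As an illustration of how a calling program might act on these exit
statuses, here is a minimal Python sketch. The function names are
hypothetical and not part of Mailvisa; only the statuses 0, 1, and 160
and the \fBcheck -q\fR invocation come from this manual. Any status
other than 0 or 160 is treated as an error, which is an assumption.
.PP
.nf
```python
import subprocess

# Exit statuses documented for "mailvisa check".
EXIT_HAM = 0
EXIT_ERROR = 1
EXIT_SPAM = 160


def verdict_from_status(status):
    """Map a mailvisa exit status to 'ham', 'spam', or 'error'."""
    if status == EXIT_HAM:
        return "ham"
    if status == EXIT_SPAM:
        return "spam"
    # 1 is the documented error status; anything unexpected is
    # treated the same way (an assumption of this sketch).
    return "error"


def check_message(raw_message, mailvisa="mailvisa"):
    """Run 'mailvisa check -q' on raw message bytes; return the verdict.

    The daemon must already be running (see the start command).
    """
    result = subprocess.run([mailvisa, "check", "-q"], input=raw_message)
    return verdict_from_status(result.returncode)
```
.fi
.PP
A caller would pass the raw message bytes to \fBcheck_message\fR and
branch on the returned string.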
.LP
Internally, Mailvisa maintains a database of good words, a database of
bad words, and a score file that lists the score for each word. The
score file is generated from the word databases by calculating scores
based on how often words occur in good and in bad messages. Messages are
classified as spam or ham based on the score file.
.LP
To gain performance,
the score file is not loaded for every invocation of
\fBmailvisa\fR, but rather loaded once by a daemon process. To score a
message, \fBmailvisa\fR connects to the daemon process, which does the
actual scoring.
.LP
The following section details the commands you can use to manage the
word databases, generate the score file, start the daemon, and check a
message.
.SH COMMANDS
.PP
To use \fBmailvisa\fR, you invoke it with a command indicating the
action to perform. Available commands are:
.TP
\fBadd\fR
Add messages to a word database.
This command is used to add the words from a message to either the list
of good words or the list of bad words.
.TP
\fBcalculate\fR
Calculate scores and update the score file.
This command is used to calculate the scores for each word in the good
and bad databases, and store the scores in a score file.
.TP
\fBcheck\fR
Check if a message is spam.
This is probably the command you will end up using most often.
.TP
\fBhelp\fR
Display a help message.
.TP
\fBremove\fR
Remove messages from a database.
This command scans the given messages for words, and decrements the
count for these words in the given database. This can be used to negate
the effects of a previous \fBadd\fR command (e.g. if you accidentally
added words to the wrong database).
.TP
\fBstart\fR
Start the daemon process.
The daemon process must be started before the \fBcheck\fR command can be
used successfully.
.TP
\fBview\fR
View the spam scores associated with words.
This command can be used to find out which words have high spam scores,
and which ones have low spam scores.
.LP
Each of the commands can be followed by the \fB-h\fR option to get a
list of available options for that command.
.SH OPTIONS
.PP
This section lists the options that can be passed to \fBmailvisa\fR.
Some options are common to all commands, whereas others only apply to
specific commands.
.SS Common Options
.TP
\fB-c\fR \fIpath\fR
Look for configuration files in \fIpath\fR. This includes the word
lists, the score file, the socket to connect to the daemon, and the pid
file. All these will be looked for in the directory specified by
\fIpath\fR, unless the specified filenames contain slashes.
.TP
.B -h
List the options specific to the given command.
.SS Options to mailvisa add
.TP
.B -i
Include \fBX-Spam:\fR headers in the analysis of messages. Normally,
these headers are skipped when analyzing messages.
.TP
\fB-w\fR \fInum\fR
Weed the wordlist every \fInum\fR words. This removes rare words from
the list, so that it doesn't become polluted with useless items (such as
message ids, for example). A value of \fB0\fR disables weeding.
.TP
\fB-t\fR \fInum\fR
Weed words that occur fewer than \fInum\fR times. The default is
\fB1\fR.
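.PP
The effect of the \fB-t\fR weeding threshold can be pictured with a
small Python sketch. It assumes, purely for illustration, that a word
list is a mapping from word to occurrence count; the real on-disk format
is not described here, and the function name \fBweed\fR is hypothetical.
.PP
.nf
```python
def weed(counts, threshold=1):
    """Return a copy of counts without words seen fewer than threshold times."""
    return {word: n for word, n in counts.items() if n >= threshold}


# Hypothetical word list: two real words and one message id seen once.
counts = {"viagra": 12, "meeting": 7, "<20051023.1234@example.org>": 1}
weeded = weed(counts, threshold=2)
```
.fi
.PP
With a threshold of 2 the one-off message id is dropped; with the
default of 1 this sketch keeps every word whose count is at least 1.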
.SS Options to mailvisa calculate
.TP
\fB-g\fR \fIfile\fR
Load good words from \fIfile\fR (the default is \fBgood\fR).
.TP
\fB-b\fR \fIfile\fR
Load bad words from \fIfile\fR (the default is \fBbad\fR).
.TP
\fB-f\fR \fIfile\fR
Write scores to \fIfile\fR (the default is \fBscores\fR).
.TP
\fB-m\fR \fInum\fR
Multiply the number of good occurrences by \fInum\fR. This can be used
to bias the scores towards judging a message ham (for multipliers > 1.0)
or spam (for multipliers < 1.0). The default is \fB1.0\fR.
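.PP
To show why the \fB-m\fR multiplier shifts scores towards ham or spam,
the following Python sketch computes a per-word score in the style of
Graham's essay. This is not confirmed to be Mailvisa's exact formula:
the function name, the clamping to 0.01 and 0.99, and the neutral value
of 0.5 for unseen words are all assumptions made for illustration.
.PP
.nf
```python
def word_score(good_count, bad_count, good_total, bad_total, multiplier=1.0):
    """Estimate the probability that a message containing the word is spam.

    good_total and bad_total are the numbers of good and bad messages
    (assumed normalisation, as in Graham's essay).
    """
    good = min(1.0, multiplier * good_count / max(good_total, 1))
    bad = min(1.0, bad_count / max(bad_total, 1))
    if good + bad == 0:
        return 0.5  # word never seen: no evidence either way (assumption)
    # Clamp so that no single word is treated as absolute proof.
    return max(0.01, min(0.99, bad / (good + bad)))
```
.fi
.PP
For a word seen equally often in good and bad mail, a multiplier of 1.0
gives a score of 0.5 in this sketch; a multiplier above 1.0 lowers the
score (towards ham) and a multiplier below 1.0 raises it (towards spam).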
.SS Options to mailvisa check
.TP
.B -q
Do not output the message or \fBX-Spam:\fR header. Only indicate the
decision in the exit status (0 for ham, 160 for spam).
.TP
.B -e
Do not indicate whether a message is spam in the exit status.
.TP
\fB-b\fR \fInum\fR
Read \fInum\fR bytes at a time. The default is 16384.
.TP
\fB-t\fR \fInum\fR
Threshold for flagging messages as spam. This can be used to bias the
check in favor of judging messages as spam (for values < 0.5) or ham
(for values > 0.5). Useful values range between 0.0 and 1.0; the default
is 0.5.
.TP
\fB-m\fR \fIcommand\fR
Pipe the output to \fIcommand\fR (analogous to \fBfetchmail(1)\fR's
option of the same name).
.TP
\fB-s\fR \fIpath\fR
Use \fIpath\fR to connect to the daemon. The default is
\fBmailvisad.sock\fR.
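.PP
The role of the \fB-t\fR threshold can be sketched in Python as follows.
The combining rule is the naive Bayes product from Graham's essay and is
only assumed to resemble what the daemon does; the function names and
the use of a strict "greater than" comparison are hypothetical.
.PP
.nf
```python
from math import prod


def combined_probability(scores):
    """Combine per-word spam scores into one probability for the message."""
    spam = prod(scores)
    ham = prod(1.0 - s for s in scores)
    if spam + ham == 0:
        return 0.5  # contradictory certainties: stay neutral (assumption)
    return spam / (spam + ham)


def is_spam(scores, threshold=0.5):
    """Flag the message when the combined probability exceeds the threshold."""
    return combined_probability(scores) > threshold
```
.fi
.PP
In this sketch, lowering the threshold flags more messages as spam and
raising it flags fewer, which matches the bias described for \fB-t\fR.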
.SS Options to mailvisa remove
.TP
.B -i
Include \fBX-Spam:\fR headers in the analysis of messages. Normally,
these headers are skipped when analyzing messages.
.SS Options to mailvisa start
.TP
\fB-f\fR \fIfile\fR
Use \fIfile\fR as the score file. Defaults to \fBscores\fR.
.TP
\fB-l\fR \fIfile\fR
Log to \fIfile\fR. Default: \fBmailvisad.log\fR.
.TP
\fB-p\fR \fIfile\fR
Use \fIfile\fR to store the pid (process id) of mailvisad. Defaults to
\fBmailvisad.pid\fR.
.TP
\fB-s\fR \fIpath\fR
Open a socket for \fBmailvisa check\fR at \fIpath\fR. The default is
\fBmailvisad.sock\fR.
.SS Options to mailvisa view
.TP
\fB-f\fR \fIfile\fR
Use \fIfile\fR as the score file. Defaults to \fBscores\fR.
.SH EXAMPLES
.PP
Add all messages from the directory \fBmail/inbox\fR to the database of
good words:
.IP
.B mailvisa add good mail/inbox/*
.PP
Add all messages from the directory \fBmail/spam\fR to the database of
bad words:
.IP
.B mailvisa add bad mail/spam/*
.PP
Calculate word scores and store them in the score file (using the
defaults of \fBbad\fR, \fBgood\fR, and \fBscores\fR for the files
containing bad words, good words, and word scores, respectively):
.IP
.B mailvisa calculate
.PP
Start the daemon:
.IP
.B mailvisa start
.PP
Check whether the message stored in \fBfoo\fR is spam:
.IP
.B mailvisa check < foo
.PP
The same, but suppressing the exit code:
.IP
.B mailvisa check -e < foo
.PP
The same, but suppressing the output (\fBX-Spam:\fR header and message)
instead:
.IP
.B mailvisa check -q < foo
.PP
Spam check a message from standard input and send it to
\fBprocmail(1)\fR for further processing (suppressing the exit code):
.IP
.B mailvisa check -e -m procmail
.SH COPYRIGHT
.PP
Mailvisa is open source, under the terms of the MIT license. A copy of
this license is contained in the file LICENSE in the source
distribution. Mailvisa was written by Robbert Haarman. See
\fIhttp://inglorion.net/\fR for contact information.
 219 distribution. Mailvisa was written by Robbert Haarman. See
 220 \fIhttp://inglorion.net/\fR for contact information.