urlwatch.1

.TH URLWATCH "1" "May 2010" "urlwatch 1.10" "User Commands"
.SH NAME
urlwatch \- Watch web pages and arbitrary URLs for changes
.SH SYNOPSIS
.B urlwatch
[\fIoptions\fR]
.SH DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified
diffs of the changes. You can filter always-changing parts of websites
by providing a "hooks.py" script.
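.PP
To illustrate what a unified diff of two saved versions of a page looks
like, here is a minimal sketch based on Python's standard difflib module.
The function name page_diff, the sample texts and the example.org URL are
assumptions made for this illustration; the sketch is not taken from the
urlwatch source.
.nf

```python
import difflib

def page_diff(old_text, new_text, url):
    """Return a unified diff between two saved versions of a page."""
    diff = difflib.unified_diff(
        old_text.splitlines(True),   # True keeps the line endings
        new_text.splitlines(True),
        fromfile=url + ' (old)',
        tofile=url + ' (new)')
    return ''.join(diff)

# Hypothetical page contents from two consecutive runs.
old = """Welcome
Version 1.9
"""
new = """Welcome
Version 1.10
"""

report = page_diff(old, new, 'http://example.org/')
print(report)
```

.fi
.PP
Lines starting with "-" were removed and lines starting with "+" were
added. Two identical versions produce an empty diff.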
.SH OPTIONS
.TP
\fB\-\-version\fR
Show the program's version number and exit
.TP
\fB\-h\fR, \fB\-\-help\fR
Show the help message and exit
.TP
\fB\-v\fR, \fB\-\-verbose\fR
Show debug/log output
.TP
\fB\-\-urls\fR=\fIFILE\fR
Read URLs from the specified file
.TP
\fB\-\-hooks\fR=\fIFILE\fR
Use the specified file as the hooks.py module
.TP
\fB\-e\fR, \fB\-\-display\-errors\fR
Include HTTP errors (404, etc.) in the output
.SH ADVANCED FEATURES
urlwatch includes some advanced features that you activate by creating a
hooks.py file that specifies which URLs each feature applies to. You can
also use the hooks.py file to filter out trivially varying elements of a
web page.
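.PP
As a sketch of filtering a trivially varying element, the hook below
removes a timestamp footer from the pages of one site before they are
compared. It uses the same filter(url, data) signature as the other
examples in this manual; the footer text, the TIMESTAMP pattern and the
example.org URL are hypothetical and need to be adapted to the page you
are watching.
.nf

```python
import re

# Hypothetical footer such as "Page generated at 12:34:56".
TIMESTAMP = re.compile('Page generated at [0-9]{2}:[0-9]{2}:[0-9]{2}')

def filter(url, data):
    # Only rewrite pages from the (made-up) site with the changing footer.
    if url.startswith('http://example.org/'):
        return TIMESTAMP.sub('', data)
    # Return every other page unchanged.
    return data
```

.fi
.PP
Restricting the substitution to one URL prefix keeps the filter from
hiding real changes on other pages.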
.SS ICALENDAR FILE PARSING
This module parses .ics files in iCalendar format and produces a much
simplified text-based format for the diffs. Use it like this in your
hooks.py file:

  from urlwatch import ical2txt

  def filter(url, data):
      if url.endswith('.ics'):
          # Prepend the simplified text to the original iCalendar data
          return ical2txt.ical2text(data).encode('utf-8') + data
      # ...you can add more hooks here...
      # Leave all other URLs unchanged
      return data
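.PP
To give an idea of what a simplified text form of an iCalendar file can
look like, here is a rough standard-library sketch that reduces each
VEVENT block to one line. It is not the ical2txt implementation; the
function name ics_summaries and the sample calendar are assumptions made
for this illustration, and the sketch ignores folded (continued) lines
and quoted parameter values.
.nf

```python
def ics_summaries(ics_text):
    """Return a list with one 'start: summary' string per VEVENT."""
    result = []
    event = None
    for raw in ics_text.splitlines():
        line = raw.strip()
        if line == 'BEGIN:VEVENT':
            event = {}
        elif line == 'END:VEVENT' and event is not None:
            result.append(event.get('DTSTART', '?') + ': ' +
                          event.get('SUMMARY', '?'))
            event = None
        elif event is not None and ':' in line:
            name, value = line.split(':', 1)
            # Drop parameters such as ";TZID=..." from the property name.
            event[name.split(';', 1)[0]] = value
    return result

# Hypothetical calendar with a single event.
sample = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20100501T100000
SUMMARY:Release party
END:VEVENT
END:VCALENDAR
"""

print(ics_summaries(sample))
```

.fi
.PP
Diffing such one-line summaries is usually easier to read than diffing
the raw iCalendar properties.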
.SS HTML TO TEXT CONVERSION
There are three methods of converting HTML to text in the current version of
urlwatch: "lynx" (default), "html2text" and "re". The first two use the
command-line utilities of the same name to convert HTML to text, and the last
one uses a simple regex-based tag-stripping method (needs no extra tools).
Here is an example of selecting a method in your hooks.py file:

  from urlwatch import html2txt

  def filter(url, data):
      if url.endswith('.html') or url.endswith('.htm'):
          return html2txt.html2text(data, method='lynx')
      # ...you can add more hooks here...
      # Leave all other URLs unchanged
      return data
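.PP
The idea behind regex-based tag stripping can be sketched as follows.
This is an illustration only and not the code behind the "re" method;
the function name strip_tags and the sample page are hypothetical. The
sketch does not decode entities such as &amp; and does not remove the
contents of script or style elements.
.nf

```python
import re

def strip_tags(html):
    """Remove anything that looks like an HTML tag; drop empty lines."""
    text = re.sub('<[^>]*>', '', html)
    lines = [line.strip() for line in text.splitlines()]
    return [line for line in lines if line]

# Hypothetical page used to try the function.
page = """<html><body><h1>Title</h1>
<p>Some <b>bold</b> text.</p>
</body></html>"""

print(strip_tags(page))
```

.fi
.PP
A regex approach needs no external tools, but the lynx and html2text
utilities generally cope better with complex markup.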
.SH "FILES"
.TP
.B ~/.urlwatch/urls.txt
A list of HTTP/FTP URLs to watch (one URL per line)
.TP
.B ~/.urlwatch/lib/hooks.py
A Python module that can be used to filter contents
.TP
.B ~/.urlwatch/cache/
The state of web pages is saved in this folder
.SH AUTHOR
Thomas Perl <thp@thpinfo.com>
.SH WEBSITE
http://thpinfo.com/2008/urlwatch/