.TH URLWATCH "1" "February 2011" "urlwatch 1.12" "User Commands"
urlwatch \- Watch web pages and arbitrary URLs for changes
urlwatch watches a list of URLs for changes and prints out unified
diffs of the changes. You can filter always-changing parts of websites
by providing a "hooks.py" script.
show program's version number and exit
\fB\-h\fR, \fB\-\-help\fR
show the help message and exit
\fB\-v\fR, \fB\-\-verbose\fR
\fB\-\-urls\fR=\fIFILE\fR
Read URLs from the specified file
\fB\-\-hooks\fR=\fIFILE\fR
Use specified file as hooks.py module
\fB\-e\fR, \fB\-\-display\-errors\fR
Include HTTP errors (404, etc.) in the output
urlwatch includes some advanced features that you have to activate by creating
a hooks.py file that specifies for which URLs to use a specific feature. You
can also use the hooks.py file to filter trivially-varying elements of a web
page.
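.PP
As a rough illustration of filtering a trivially-varying element, the sketch
below removes a timestamp footer before the diff is computed. The URL prefix
and the footer format are made up for this example, and the sketch assumes
that the data argument is a text string:

```python
import re

# Hypothetical footer, for example "Generated at 2011-02-01 12:34:56".
TIMESTAMP = re.compile('Generated at [0-9]{4}-[0-9]{2}-[0-9]{2} '
                       '[0-9]{2}:[0-9]{2}:[0-9]{2}')

def filter(url, data):
    # Only touch the (made-up) site whose footer changes on every request.
    if url.startswith('http://example.org/'):
        return TIMESTAMP.sub('', data)
    # Pass every other URL through unchanged.
    return data
```

Adjust the pattern to whatever part of the page changes on every request.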
.SS ICALENDAR FILE PARSING
This module allows you to parse .ics files that are in iCalendar format and
provide a very simplified text-based format for the diffs. Use it like this
in your hooks.py file:
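
  from urlwatch import ical2txt

  def filter(url, data):
      # Prepend a simplified text rendering to the raw iCalendar data
      if url.endswith('.ics'):
          return ical2txt.ical2text(data).encode('utf-8') + data
      # ...you can add more hooks here...
      # Pass all other URLs through unchanged
      return data
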
.SS HTML TO TEXT CONVERSION
There are three methods of converting HTML to text in the current version of
urlwatch: "lynx" (default), "html2text" and "re". The first two use
command-line utilities of the same name to convert HTML to text, and the last
one uses a simple regex-based tag stripping method (needs no extra tools).
Here is an example of using the "lynx" method in your hooks.py file:

  from urlwatch import html2txt

  def filter(url, data):
      # Convert HTML pages to plain text before diffing
      if url.endswith('.html') or url.endswith('.htm'):
          return html2txt.html2text(data, method='lynx')
      # ...you can add more hooks here...
      # Pass all other URLs through unchanged
      return data

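.PP
To give an idea of what regex-based tag stripping (as in the "re" method)
involves, here is a minimal standalone sketch. It is not the code urlwatch
itself uses; the function name strip_tags is made up for this example:

```python
import re

def strip_tags(html):
    # Drop everything that looks like a tag: "<", any non-">" text, ">".
    text = re.sub('<[^>]*>', '', html)
    # Collapse the runs of spaces that removed tags can leave behind.
    return re.sub(' +', ' ', text).strip()
```

A pattern this simple does not decode entities such as &amp; and will
misfire on a ">" inside an attribute value, which is why the external
converters usually give cleaner output.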
.B ~/.urlwatch/urls.txt
A list of HTTP/FTP URLs to watch (one URL per line)
.B ~/.urlwatch/lib/hooks.py
A Python module that can be used to filter contents
The state of web pages is saved in this folder
Thomas Perl <thp.io/about>
http://thp.io/2008/urlwatch/