Support for POST requests
author    Thomas Perl <m@thp.io>
          Sat, 20 Aug 2011 10:43:12 +0000 (12:43 +0200)
committer Thomas Perl <m@thp.io>
          Sat, 20 Aug 2011 10:43:12 +0000 (12:43 +0200)
This feature and the format in urls.txt were
proposed by Sébastien Fricker.
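
As a rough illustration of the urls.txt format this commit introduces, the
sketch below splits a line at the first space and builds a POST request when
data follows the URL. It is not the code from this commit (which targets
Python 2 and urllib2); it is a Python 3 approximation, and the helper names
split_job_line and build_request are hypothetical.

```python
import urllib.request


def split_job_line(line):
    """Split a urls.txt line into (url, post_data).

    post_data is None when the line holds only a URL (plain GET).
    Hypothetical helper; it returns new values instead of modifying a job object.
    """
    if ' ' in line:
        url, post_data = line.split(' ', 1)
        return url, post_data
    return line, None


def build_request(line, headers=None):
    """Build a Request object; passing data makes urllib issue a POST."""
    url, post_data = split_job_line(line)
    # urllib.request expects bytes for the request body in Python 3.
    data = post_data.encode('ascii') if post_data is not None else None
    return urllib.request.Request(url, data, headers or {})


# Building the request does not touch the network; urlopen() would send it.
request = build_request('http://example.org/script.cgi value=5&q=search&button=Go')
print(request.get_method(), request.full_url, request.data)
```

Because the split happens only at the first space, any space inside the URL
itself would be misread as the start of the POST data, which is why URLs must
use %20.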

README
examples/urls.txt.example
lib/urlwatch/handler.py
urlwatch

diff --git a/README b/README
index edb3505..1a01e3e 100644
--- a/README
+++ b/README
@@ -44,6 +44,13 @@ Q: Is there a way to make the output more human-readable?
 Q: Is there a way to turn it into a diff of parsed HTML perhaps?
 A: Of course. See the example hooks.py file -> use html2txt.html2text(data)
 
+Q: Why do I get an error with URLs with spaces in them?
+A: Please make sure to URL-encode the URLs properly. Use %20 for spaces.
+
+Q: The website I want to watch requires a POST request. How do I send one?
+A: Add the POST data in the same line, separated by a single space. The format
+   in urls.txt is: http://example.org/script.cgi value=5&q=search&button=Go
+
 
 CONTACT
 -------
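diff --git a/examples/urls.txt.example b/examples/urls.txt.example
index 209b86e..c9bfe57 100644
--- a/examples/urls.txt.example
+++ b/examples/urls.txt.example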
@@ -19,3 +19,10 @@ http://guckes.net/cal/
 # You can use the pipe character to "watch" the output of shell commands
 |ls -al ~
 
+# If you want to use spaces in URLs, you have to URL-encode them (e.g. %20)
+http://example.org/With%20Spaces/
+
+# You can do POST requests by writing the POST data behind the URL,
+# separated by a single space character. POST data is URL-encoded.
+http://example.com/search.cgi button=Search&q=something&category=4
+
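diff --git a/lib/urlwatch/handler.py b/lib/urlwatch/handler.py
index 686f53b..7ec89f1 100755
--- a/lib/urlwatch/handler.py
+++ b/lib/urlwatch/handler.py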
@@ -62,7 +62,7 @@ class JobBase(object):
         else:
             return sha.new(self.location).hexdigest()
 
-    def retrieve(self, timestamp=None, filter=None, headers=None):
+    def retrieve(self, timestamp=None, filter=None, headers=None, log=None):
         raise Exception('Not implemented')
 
 class ShellError(Exception):
@@ -90,7 +90,7 @@ def use_filter(filter, url, input):
 
 
 class ShellJob(JobBase):
-    def retrieve(self, timestamp=None, filter=None, headers=None):
+    def retrieve(self, timestamp=None, filter=None, headers=None, log=None):
         process = subprocess.Popen(self.location, \
                 stdout=subprocess.PIPE, \
                 shell=True)
@@ -105,12 +105,19 @@ class ShellJob(JobBase):
 class UrlJob(JobBase):
     CHARSET_RE = re.compile('text/(html|plain); charset=(.*)')
 
-    def retrieve(self, timestamp=None, filter=None, headers=None):
+    def retrieve(self, timestamp=None, filter=None, headers=None, log=None):
         headers = dict(headers)
         if timestamp is not None:
             timestamp = email.Utils.formatdate(timestamp)
             headers['If-Modified-Since'] = timestamp
-        request = urllib2.Request(self.location, None, headers)
+
+        if ' ' in self.location:
+            self.location, post_data = self.location.split(' ', 1)
+            log.info('Sending POST request to %s', self.location)
+        else:
+            post_data = None
+
+        request = urllib2.Request(self.location, post_data, headers)
         response = urllib2.urlopen(request)
         headers = response.info()
         content = response.read()
diff --git a/urlwatch b/urlwatch
index 04d0de7..b820591 100755
--- a/urlwatch
+++ b/urlwatch
@@ -229,7 +229,7 @@ if __name__ == '__main__':
                 timestamp = None
 
             # Retrieve the data
-            data = job.retrieve(timestamp, filter, headers)
+            data = job.retrieve(timestamp, filter, headers, log)
 
             if os.path.exists(filename):
                 log.info('%s exists - creating unified diff' % filename)
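
The README and example file in this commit ask for URL-encoded URLs and
URL-encoded POST data. One possible way to produce such a urls.txt line is
sketched below in Python 3; make_job_line is a hypothetical helper and not part
of urlwatch, and the set of characters left unescaped in the URL is an
assumption.

```python
from urllib.parse import quote, urlencode


def make_job_line(url, fields=None):
    """Return a urls.txt line: an encoded URL, optionally followed by POST data.

    Hypothetical helper. URL delimiters and '%' are left untouched so that
    already-encoded URLs are not encoded twice; spaces become %20.
    """
    encoded_url = quote(url, safe=':/?&=%#')
    if not fields:
        return encoded_url
    # urlencode() produces key=value pairs joined by '&', with spaces as '+'.
    return encoded_url + ' ' + urlencode(fields)


print(make_job_line('http://example.org/With Spaces/'))
print(make_job_line('http://example.com/search.cgi',
                    [('button', 'Search'), ('q', 'something'), ('category', 4)]))
```

The only literal space in the resulting line is the separator between the URL
and the POST data, which matches the format described in the README hunk.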