Batch jobs
==========

The system relies on a number of batch jobs that run on the webserver
or on another machine, feeding data into the system. These batch jobs
should generally be run under a user account that does *not* have write
permissions in the web directories; any exceptions are clearly noted
below. Most of the jobs should run regularly from cron, while others
should be run manually when required.
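
One way a job could enforce the permission rule above is to check its own
access at startup. The following is a minimal sketch, not code taken from
``tools/``: the ``WEB_DIRECTORIES`` list, its path and both helper names are
hypothetical, since this document does not name the web directories.

.. code-block:: python

   import os
   import sys

   # Hypothetical placeholder: the real web directories are not named here.
   WEB_DIRECTORIES = ["/var/www/pgweb"]


   def writable_web_directories(paths):
       """Return the existing entries of ``paths`` this user could write to."""
       # os.access() checks the real uid/gid of the process; W_OK tests write access.
       return [p for p in paths if os.path.exists(p) and os.access(p, os.W_OK)]


   def refuse_if_privileged(paths=WEB_DIRECTORIES):
       """Exit with an error message if any of ``paths`` is writable."""
       writable = writable_web_directories(paths)
       if writable:
           sys.exit("refusing to run: this user can write to %s" % ", ".join(writable))

A job would call ``refuse_if_privileged()`` before doing any work. Because
root can write almost anywhere, the check also catches a job accidentally
started as root. A job that is a documented exception would simply skip it.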

All batch jobs are located in the ``tools/`` directory.

docs/docload.py
---------------
This script loads a new set of documentation. Specify the version to
load and the tarball to load it from. The script automatically
decompresses the tarball as necessary, and also tidies the HTML of the
documentation, since the HTML generated by the PostgreSQL build system
is not particularly standards-conforming or nice-looking.
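
The "decompress as necessary" step can be sketched with the standard
``tarfile`` module, whose ``"r:*"`` read mode detects the compression
itself. This is an illustration under assumptions, not the actual
``docload.py`` code: the helper name ``read_html_files`` is hypothetical, and
the HTML tidying step is deliberately left out.

.. code-block:: python

   import tarfile


   def read_html_files(tarball_path):
       """Yield (name, bytes) for every .html member of a documentation tarball.

       Mode "r:*" lets tarfile detect gzip, bzip2 or xz compression on its
       own, so the caller does not need to know how the tarball was packed.
       """
       with tarfile.open(tarball_path, mode="r:*") as tar:
           for member in tar:
               # Only regular files can be read with extractfile().
               if member.isfile() and member.name.endswith(".html"):
                   yield member.name, tar.extractfile(member).read()

Each yielded document could then be passed through an HTML tidying step
before being stored.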

ftp/spider_ftp.py
-----------------
This script needs to be run on the *ftp server*, not on the
webserver. It generates a Python pickle file that is then automatically
uploaded to the webserver, which writes it to disk. The directory that
receives the pickle is therefore the one place where the webserver does
need write permissions. The IP addresses of the machines allowed to
upload the ftp pickle are defined in ``settings.FTP_MASTERS``.
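
The flow described above can be sketched as follows. This is a hedged
illustration, not the actual script: the tree layout inside the pickle, the
``spider``, ``build_pickle`` and ``upload_allowed`` helpers, and the
assumption that ``settings.FTP_MASTERS`` is a list of IP address strings are
all hypothetical. The upload transport is left out because this document
does not describe it.

.. code-block:: python

   import os
   import pickle


   def spider(root):
       """Return a nested dict describing the tree under ``root``.

       Hypothetical layout: directories map to dicts, files map to a
       (size, mtime) tuple. The real pickle format is not documented here.
       """
       tree = {}
       with os.scandir(root) as entries:
           for entry in entries:
               if entry.is_dir(follow_symlinks=False):
                   tree[entry.name] = spider(entry.path)
               elif entry.is_file(follow_symlinks=False):
                   st = entry.stat(follow_symlinks=False)
                   tree[entry.name] = (st.st_size, int(st.st_mtime))
       return tree


   def build_pickle(root):
       """Serialize the spidered tree into bytes ready for upload."""
       return pickle.dumps(spider(root))


   def upload_allowed(remote_addr, ftp_masters):
       """Webserver side: accept an upload only from a listed master address."""
       return remote_addr in ftp_masters

The address check matters because unpickling data from an untrusted source
can execute arbitrary code, so the webserver should only load pickles sent
from the machines listed in ``settings.FTP_MASTERS``.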

moderation/moderation_report.py
-------------------------------
This script enumerates all unmoderated objects in the database and, if
any are pending, sends an email to the ``NOTIFICATION_EMAIL`` address
to prompt the moderators to review them.
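
A minimal sketch of the report logic, assuming the unmoderated objects have
already been queried and grouped by type. The input shape, the helper name
``build_moderation_report``, the sender address and the subject line are
hypothetical; the database query itself is not shown.

.. code-block:: python

   from email.message import EmailMessage


   def build_moderation_report(pending, notification_email, sender):
       """Return an EmailMessage listing pending objects, or None if none wait.

       ``pending`` maps a description of each object type to the list of
       unmoderated objects of that type (hypothetical input shape).
       """
       lines = []
       for kind, objects in sorted(pending.items()):
           if objects:
               lines.append("%d unmoderated %s:" % (len(objects), kind))
               lines.extend("  %s" % obj for obj in objects)
       if not lines:
           return None  # nothing pending, so no mail should be sent
       msg = EmailMessage()
       msg["To"] = notification_email
       msg["From"] = sender
       msg["Subject"] = "Objects waiting for moderation"
       msg.set_content("\n".join(lines))
       return msg

Returning ``None`` when nothing is pending keeps the "only mail if there is
something to moderate" rule in one place. The message could then be sent
with the standard library, for example
``smtplib.SMTP("localhost").send_message(msg)``, where the SMTP host is an
assumption.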

rss/fetch_rss_feeds.py
----------------------
This script connects to all the RSS feeds registered in the RSS
application and fetches their articles into the database. It is not
very tolerant of unusual RSS feeds and requires them to be "nicely
formatted". This is usually not a problem, since only the headlines
are pulled in, not the contents. For a more complete RSS fetcher that
stores the data in a PostgreSQL database, see the "hamn" project, which
powers planet.postgresql.org and is also available on git.postgresql.org.
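
The strict, headline-only behaviour described above can be illustrated with
a small parser. This is a hedged sketch, not the code of
``rss/fetch_rss_feeds.py``: it assumes RSS 2.0 feeds with ``<item>``
elements, and the function names are hypothetical. Because
``xml.etree.ElementTree`` is a strict XML parser, a malformed feed raises
``ET.ParseError`` instead of being repaired.

.. code-block:: python

   import urllib.request
   import xml.etree.ElementTree as ET


   def parse_headlines(rss_document):
       """Return (title, link) pairs for each <item> in an RSS 2.0 document.

       Items without a title are skipped; a missing link becomes "".
       Malformed XML raises ET.ParseError instead of being repaired.
       """
       root = ET.fromstring(rss_document)
       headlines = []
       for item in root.iter("item"):
           title = item.findtext("title", default="").strip()
           link = item.findtext("link", default="").strip()
           if title:
               headlines.append((title, link))
       return headlines


   def fetch_headlines(url, timeout=30):
       """Download one feed and parse it; network errors propagate to the caller."""
       with urllib.request.urlopen(url, timeout=timeout) as response:
           return parse_headlines(response.read())

Only the title and link of each item are read, which is why a strict parser
is usually good enough here: the article bodies, where odd markup tends to
appear, are never looked at.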