   1 .. Hey emacs, switch to -*- rst-mode -*-
   2
   3 ============================================
   4 Downloading TV-schedules from a Swedb server
   5 ============================================
   6
   7 Background
   8 ----------
   9
  10 The site `tv.swedb.se <http://tv.swedb.se>`_ was started in 2004 to
  11 provide TV-schedules for Swedish channels. It was originally only used
  12 for providing data via the grabber tv_grab_se_swedb (part of the
  13 `XMLTV Project <http://www.xmltv.org>`_), but since then a number of
  14 other programs have been written that utilize the same data files.
  15
  16 The data-format is based on the xmltv-format. It is generic and is not
  17 specific to Sweden in any way. We are hoping that data will be
  18 provided for other countries in the same format in the future, so that
  19 the same applications can be used in several different countries.
  20
  21 This document describes how to write a program that downloads data
  22 from the tv.swedb.se servers. Since the tv.swedb.se project is run on
  23 a voluntary basis with no income generated from the service, it is
  24 important to us that all our users behave properly and don't put an
  25 unnecessarily high load on our servers. Please follow the rules below
  26 if you want to use our data.
  27
  28 Data Layout
  29 -----------
  30
  31 Data is stored in a number of separate gzipped xml-files that can be
  32 downloaded from an http-server. To download data, you should start by
  33 retrieving the root-url. For tv.swedb.se, the root-url is
  34 `http://tv.swedb.se/xmltv/channels.xml.gz
  35 <http://tv.swedb.se/xmltv/channels.xml.gz>`_. The root-url for Sweden will
  36 likely remain the same for the foreseeable future, but other
  37 data-sources might become available, so you should make the root-url
  38 user-configurable.
  39
  40 This file describes which channels are available and where data can be
  41 found for each channel. A typical entry looks like this::
  42
  43   <channel id="svt1.svt.se">
  44     <display-name lang="sv">SVT1</display-name>
  45     <base-url>http://xmltv.tvsajten.com/xmltv/</base-url>
  46     <icon src="http://xmltv.tvsajten.com/chanlogos/svt1.svt.se.png"/>
  47   </channel>
  48
  49 The contents of the channel-entry are the same as specified by the
  50 xmltv-dtd with the addition of the base-url element. The base-url
  51 specifies where data for this particular channel can be found. Note
  52 that one base-url is specified for each channel. Right now, all
  53 channels use the same base-url, but this might change in the future.
  54 If a channel-entry specifies more than one base-url for the channel,
  55 the grabber shall use the first base-url.
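
A minimal Python sketch of reading the channel list, assuming the
channel-entries sit under a single root element as in an ordinary xmltv
file. The function name ``parse_channels`` and the returned dictionary
layout are our own choices for illustration, not part of the format.

```python
import gzip
import xml.etree.ElementTree as ET


def parse_channels(gz_bytes):
    """Return {channel_id: {"name": ..., "base_url": ...}} from channels.xml.gz bytes."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    channels = {}
    for channel in root.iter("channel"):
        # findtext() looks at the first matching child only, which
        # implements the "use the first base-url" rule.
        base_url = channel.findtext("base-url")
        if base_url is None:
            continue  # assumption: skip entries that have no base-url
        channels[channel.get("id")] = {
            "name": channel.findtext("display-name", default=""),
            "base_url": base_url.strip(),
        }
    return channels
```

The bytes passed in are the raw, still gzipped contents downloaded from
the root-url.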
  56
  57 The actual programs for each channel are stored in one file per
  58 channel and day in the location specified by the base-url for the
  59 channel. The name of each file is <id>_<yyyy-mm-dd>.xml.gz. As an
  60 example, the data for SVT1 on July 2nd, 2006, can be found at
  61 http://xmltv.tvsajten.com/xmltv/svt1.svt.se_2006-07-02.xml.gz
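
The naming rule can be sketched in Python as follows. The helper name
``program_url`` is hypothetical, and the sketch assumes that the
base-url ends with a slash, as it does in the example entry above.

```python
import datetime


def program_url(base_url, channel_id, day):
    """Build the URL of the file holding one channel's programs for one day."""
    # isoformat() on a date yields yyyy-mm-dd, matching the file-name pattern.
    return f"{base_url}{channel_id}_{day.isoformat()}.xml.gz"


url = program_url(
    "http://xmltv.tvsajten.com/xmltv/", "svt1.svt.se", datetime.date(2006, 7, 2)
)
print(url)  # http://xmltv.tvsajten.com/xmltv/svt1.svt.se_2006-07-02.xml.gz
```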
  62
  63 Each of these files follows the xmltv dtd, with the exception that they
  64 don't contain any <channel> elements.
  65
  66 A valid xmltv file can be constructed from the above data by removing
  67 all base-url fields from channels.xml.gz and outputting the relevant
  68 channel-entries concatenated with the contents of all program-files
  69 with the first and last lines omitted.
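
As an illustration, the merge could look roughly like the Python sketch
below. Note that it parses the xml and copies elements instead of
dropping the first and last lines of each file, which should give an
equivalent result while not depending on line breaks. The function name
``build_xmltv`` and its parameters are assumptions made for this
example.

```python
import gzip
import xml.etree.ElementTree as ET


def build_xmltv(channels_gz, program_files_gz, wanted_ids):
    """Merge channel entries and per-day program files into one xmltv document."""
    tv = ET.Element("tv")
    channels_root = ET.fromstring(gzip.decompress(channels_gz))
    # list() so that editing the elements does not disturb the iteration.
    for channel in list(channels_root.iter("channel")):
        if channel.get("id") not in wanted_ids:
            continue
        # base-url is not part of the xmltv dtd, so drop it from the output.
        for base_url in channel.findall("base-url"):
            channel.remove(base_url)
        tv.append(channel)
    for data in program_files_gz:
        day_root = ET.fromstring(gzip.decompress(data))
        # Copy every child element of the day file after the channels.
        tv.extend(list(day_root))
    return ET.tostring(tv, encoding="unicode")
```

Here ``channels_gz`` is the downloaded channels.xml.gz,
``program_files_gz`` is a list of downloaded program-files, and
``wanted_ids`` is the set of channel ids the user has selected.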
  70
  71 HTTP Caching
  72 ------------
  73
  74 All http-requests against swedb-servers must use http-caching
  75 properly. The cache must be stored persistently. Each http-response
  76 from a swedb-server contains a Last-Modified field and/or an ETag
  77 field. These fields shall be used in subsequent requests for the same
  78 url as If-Modified-Since and If-None-Match respectively.
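
One possible way to do this with the Python standard library is
sketched below. The cache layout (one body file and one json file per
url, named after a hash of the url), the function names and the
``USER_AGENT`` value are assumptions for illustration only. A real
grabber would also want error handling and cache clean-up.

```python
import hashlib
import json
import os
import urllib.error
import urllib.request

USER_AGENT = "examplegrabber/0.1"  # hypothetical name; use your own


def conditional_headers(meta):
    """Translate stored validators into conditional request headers."""
    headers = {}
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    return headers


def cached_get(url, cache_dir):
    """Return the body for url, downloading it again only if it has changed."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    body_path = os.path.join(cache_dir, key + ".body")
    meta_path = os.path.join(cache_dir, key + ".json")
    meta = {}
    if os.path.exists(body_path) and os.path.exists(meta_path):
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)
    headers = {"User-Agent": USER_AGENT, **conditional_headers(meta)}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            meta = {
                "last_modified": response.headers.get("Last-Modified"),
                "etag": response.headers.get("ETag"),
            }
    except urllib.error.HTTPError as err:
        if err.code != 304:
            raise
        # 304 Not Modified: the copy on disk is still current.
        with open(body_path, "rb") as f:
            return f.read()
    # Store the new body and its validators for the next run.
    with open(body_path, "wb") as f:
        f.write(body)
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(meta, f)
    return body
```

Because the cache lives on disk, the validators survive between runs of
the grabber, which is what makes the cache persistent.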
  79
  80 For a tutorial on http-caching, see
  81 `http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers
  82 <http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers>`_.
  83
  84 The reason for these caching requirements is that programme data
  85 changes infrequently, so proper http-caching drastically reduces the
  86 bandwidth requirements for our servers.
  87
  88 Proper User-Agent
  89 -----------------
  90
  91 All http-requests must include a User-Agent value that is unique to
  92 this particular version of the grabbing application. The User-Agent
  93 shall consist of an alphanumeric string that is unique for the
  94 program, followed by "/" and an alphanumeric
  95 version-number. Optionally, more information may be added after the
  96 version-number: a space followed by an arbitrary string.
  97
  98 **Examples:**
  99
 100 - xmltv/0.5.44
 101 - AirTimes/0.9 (Symbian OS; MIDP-1.0 MIDP-2.0; CLDC-1.0; en)
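
A small Python sketch of building and checking such a value. The
function name and the regular expression are ours. In particular, the
pattern assumes that dots are acceptable inside the version-number,
since the examples above contain them.

```python
import re

# Assumption: dots are allowed inside the version, as in "xmltv/0.5.44".
USER_AGENT_RE = re.compile(r"^[A-Za-z0-9]+/[A-Za-z0-9.]+( .+)?$")


def make_user_agent(program, version, extra=None):
    """Build 'program/version' plus an optional free-form suffix."""
    agent = f"{program}/{version}"
    if extra:
        agent += f" {extra}"
    if not USER_AGENT_RE.match(agent):
        raise ValueError(f"malformed User-Agent: {agent!r}")
    return agent
```

Building the value from the program's real version at run time keeps it
unique for each release of the grabber.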
 102
 103 The User-Agent gives us two advantages:
 104
 105 - It allows us to gather statistics on which grabbers are in use. We
 106   can then share these statistics with the grabber authors.
 107 - It allows us to block non-conforming grabbers.
 108
 109 We will always work with grabber authors before we decide to block a
 110 grabber. The reason that we may want to block a grabber is primarily
 111 that the grabber contains a bug that leads to unnecessarily high
 112 bandwidth usage, e.g. if the grabber fails to implement http-caching
 113 properly or requests data too often.
 114
 115 Update Interval
 116 ---------------
 117
 118 A grabber should normally download data at most once a day. If you
 119 feel that your particular grabber needs to download data more often
 120 than that, please contact us.
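
A grabber can enforce this itself by remembering when it last
downloaded data. The sketch below assumes that the timestamp of the
last download is stored somewhere persistent and passed in by the
caller. The names are hypothetical.

```python
import datetime

MIN_INTERVAL = datetime.timedelta(days=1)


def may_fetch(last_fetch, now):
    """Return True if at least one day has passed since the last download."""
    return last_fetch is None or now - last_fetch >= MIN_INTERVAL
```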
 121
 122 Update Time
 123 -----------
 124
 125 If your application fetches data automatically, it must not have a
 126 hard-coded time at which it fetches data. The time must be
 127 user-configurable and it should be randomized by default. If a lot of
 128 users try to download data from our servers at the exact same time,
 129 our servers suffer a lot.
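
A sketch of one way to pick the time in Python. The configuration key
``fetch_time`` and the function names are assumptions for this example.

```python
import random


def default_fetch_time(rng=random):
    """Pick a random (hour, minute) to use when the user has not set one."""
    minute_of_day = rng.randrange(24 * 60)
    return divmod(minute_of_day, 60)


def fetch_time(config, rng=random):
    """Prefer the user's configured time; otherwise fall back to a random one."""
    configured = config.get("fetch_time")  # hypothetical config key
    return configured if configured is not None else default_fetch_time(rng)
```

An application would typically draw the random default once, at
installation or first start, and save it in its configuration so that
the time stays the same from day to day.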
 130
 131 Parallel Requests
 132 -----------------
 133
 134 An application may run up to two http-requests against the
 135 swedb-servers simultaneously, but not more than that.
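
In Python, one simple way to respect this limit is a thread pool with
two workers. The sketch below assumes a caller-supplied ``fetch``
function that downloads a single url. ``fetch_all`` and
``MAX_PARALLEL`` are names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 2  # the limit stated above


def fetch_all(urls, fetch):
    """Call fetch(url) for every url with at most MAX_PARALLEL calls in flight."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # map() returns the results in the same order as the input urls.
        return list(pool.map(fetch, urls))
```

The pool never runs more than two calls at once, so the limit holds
however many files are requested.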
 136