This describes the booki-zip format that Booki, Espri, and Objavi use
to communicate with each other.

A booki-zip file is a zip file[1], with certain restrictions. The
ultimate test of whether a zip is correctly encoded is whether its
contents can be extracted by the zipfile module[2] in Python 2.5 and
2.6. This means the contents must be either uncompressed or
deflate-compressed. ZIP64 extensions are OK (though unnecessary in
practical terms), but encryption and comments are not.

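These zip-level rules can be checked mechanically. The following is a minimal sketch in Python 3 (the spec itself targets the zipfile module of Python 2.5/2.6); the function name `zip_restriction_problems` is hypothetical, and it assumes that "comments" covers both the archive comment and per-entry comments. ZIP64 is not checked because it is allowed.

```python
import zipfile

# Only stored (uncompressed) and deflate-compressed entries are allowed.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def zip_restriction_problems(source):
    """Return a list of reasons why `source` breaks the booki-zip zip rules.

    `source` may be a path or a binary file object. Assumes "no comments"
    applies both to the archive comment and to per-entry comments.
    """
    problems = []
    with zipfile.ZipFile(source) as zf:
        if zf.comment:
            problems.append("archive has a comment")
        for info in zf.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(f"{info.filename}: unsupported compression method {info.compress_type}")
            if info.flag_bits & 0x1:  # bit 0 of the general purpose flags marks encryption
                problems.append(f"{info.filename}: entry is encrypted")
            if info.comment:
                problems.append(f"{info.filename}: entry has a comment")
    return problems
```

An empty list means the archive passes these particular checks; it says nothing yet about the booki-zip layout rules that follow.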
The first file in the zip should be uncompressed and named "mimetype".
It should contain only the 23 characters "application/x-booki+zip".
This string will end up in the first few bytes of the zip file,
allowing it to be identified without unzipping.

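Because the "mimetype" entry is stored first and uncompressed, its contents sit at a predictable place right after the zip's first local file header. A minimal Python 3 sketch of both sides follows; `start_booki_zip` and `looks_like_booki_zip` are hypothetical names. The sniffer reads the file name and extra-field lengths from the header, as laid out in the zip specification[1], rather than assuming a fixed offset.

```python
import struct
import zipfile

MIMETYPE = b"application/x-booki+zip"


def start_booki_zip(target):
    """Create a zip whose first entry is the uncompressed "mimetype" file."""
    zf = zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED)
    # Override the archive default so this one entry is stored, not deflated.
    zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
    return zf


def looks_like_booki_zip(head):
    """Check the first bytes of a file for a stored "mimetype" entry."""
    if len(head) < 30 or head[:4] != b"PK\x03\x04":  # local file header signature
        return False
    (method,) = struct.unpack("<H", head[8:10])
    name_len, extra_len = struct.unpack("<HH", head[26:30])
    name = head[30:30 + name_len]
    start = 30 + name_len + extra_len
    return method == 0 and name == b"mimetype" and head[start:start + len(MIMETYPE)] == MIMETYPE
```

Callers add the remaining entries to the returned ZipFile and close it. With no extra field, the 30-byte header, the 8-byte name and the 23-byte string mean the first 61 bytes are enough for the sniffer.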
As well as the just-mentioned "mimetype", the booki-zip must have a
file called "info.json" in its root directory, the contents of which
will be described shortly. Any other files in the root directory
should be html files intended for editing with Booki. Any associated
files that are not directly editable by Booki should be in a
subdirectory named 'static'. Here is an example structure:

        BookSprints-ott-adam-en.jpg

All references from the html to the files in 'static' should use
relative addresses. For example, an image should be linked thus:

    <img src="static/BookSprints-ott-adam-en.jpg" alt="" />

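One way to check this rule is to scan each html file for attributes that point into static/ and confirm they are relative paths naming a file in the zip. The sketch below uses Python 3's `html.parser`; `bad_static_refs` is a hypothetical helper, and treating only `src` and `href` as references is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _RefCollector(HTMLParser):
    """Collect src/href attribute values in document order."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here too.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append(value)


def bad_static_refs(html_text, zip_names):
    """Return references into static/ that are not relative paths to files in the zip."""
    collector = _RefCollector()
    collector.feed(html_text)
    collector.close()
    bad = []
    for ref in collector.refs:
        parts = urlsplit(ref)
        if "static/" not in parts.path:
            continue  # not a reference to an associated static file
        relative = not parts.scheme and not parts.netloc and not parts.path.startswith("/")
        if not relative or parts.path not in zip_names:
            bad.append(ref)
    return bad
```

Absolute paths such as "/static/x.jpg" and full URLs into a static directory are both reported, as are relative paths naming files the zip does not contain.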
It is recommended but not required that the file names have
conventional extensions (".html", ".jpg", etc.). File names should not
contain spaces, and must meet the restrictions imposed by the zip

There should be nothing in the root directory other than "mimetype",
"info.json", and the html files, and there should be no subdirectories
other than "static". Apart from starting with "mimetype", there is no
required order to the arrangement of entries within the zip file
itself. Other than "mimetype", files should be

All html files, and info.json, should be encoded as utf-8.

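The root-directory, subdirectory, file-name and encoding rules can be checked together. This Python 3 sketch (`layout_problems` is a hypothetical name) uses the manifest, described below, to decide which root files are html, since file extensions are only recommended; it assumes each manifest entry gives the file's path under the key "url", as in the example manifest.

```python
import json
import zipfile


def layout_problems(zf):
    """Check root-directory, naming and UTF-8 rules for an open ZipFile."""
    names = zf.namelist()
    if "info.json" not in names:
        return ["missing info.json"]
    problems = []
    if names[0] != "mimetype":
        problems.append("first entry is not 'mimetype'")
    info = json.loads(zf.read("info.json").decode("utf-8"))
    # Map each file path to the media type the manifest declares for it.
    types = {entry["url"]: entry["mimetype"] for entry in info["manifest"].values()}
    for name in names:
        if " " in name:
            problems.append(f"{name}: file name contains a space")
        if name in ("mimetype", "info.json"):
            continue
        if "/" in name:
            if not name.startswith("static/"):
                problems.append(f"{name}: only the 'static' subdirectory is allowed")
            continue
        if types.get(name) != "text/html":
            problems.append(f"{name}: root file is not an html file listed in the manifest")
            continue
        try:
            zf.read(name).decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{name}: not valid UTF-8")
    return problems
```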
The "info.json" file describes the structure of the document and
carries metadata. It is a JSON file[3] containing a single JSON
object with five members, as shown here:

Because these are members of a JSON object, their order is not
significant. The following order is for narrative purposes only.

This indicates which version of the booki-zip standard is being used.
This document describes version 1. If the version is not 1, nothing
else here necessarily applies.

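Reading info.json therefore starts with decoding it as UTF-8, confirming it holds a single object with the five members, and checking the version before relying on anything else. In this Python 3 sketch, `load_info` is a hypothetical name, and the member name "format" for the version number is an assumption; the other four names are those used in the following sections.

```python
import json

# "format" is assumed to be the version member's name; the others are
# the names used for the manifest, spine, TOC and metadata sections.
EXPECTED_MEMBERS = {"format", "manifest", "spine", "TOC", "metadata"}


def load_info(raw):
    """Parse info.json bytes and check its top-level shape and version."""
    info = json.loads(raw.decode("utf-8"))
    if not isinstance(info, dict):
        raise ValueError("info.json must contain a single JSON object")
    missing = EXPECTED_MEMBERS - set(info)
    if missing:
        raise ValueError(f"info.json is missing members: {sorted(missing)}")
    if info["format"] != 1:
        # Nothing in this version of the standard necessarily applies.
        raise ValueError(f"unsupported booki-zip version: {info['format']!r}")
    return info
```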
The manifest is a mapping of identifiers to file names and mime-types.
Each entry looks like:

        "mimetype": mimetype,
        "contributors": contributors,
        "rightsholders": rightsHolders,

The constraints on *identifier* match the XML name specification[4]
(in short, avoid spaces and most punctuation). In practice, the
*identifier* is often related to the *filename*.

*filename* locates the file within the zip, and must match a path in

*mimetype* is the IANA media type[5] of the file. Booki-editable
html files must be of type 'text/html', and other files should be
correctly identified.

*contributors* is a list of names of people who have contributed to this
file. It can be empty.

*rightsHolders* is a list of the people or organisations that manage
the rights for the file.

*license* is a list of licenses applicable to the file. If more than
one license is listed, the disjunction of these licenses applies: the
file may be used under any one of them. For the common licenses that
have abbreviations listed in the "license abbreviations" section
below, the abbreviation should be used. Other licenses should be
listed as a URL, which could be a relative URL to a file in the
booki-zip. Copyrighted files with no sharing license should have an
empty list ("[]"), and files out of copyright should have "public
domain" as their single member.

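The three cases for *license* (one or more licenses, an empty list, and "public domain") can be read as follows. This Python 3 sketch uses hypothetical helper names and compares license strings exactly; it does not try to relate different versions or URLs of the same license.

```python
def license_summary(licenses):
    """Describe a manifest *license* list in words."""
    if not licenses:
        return "copyrighted, no sharing license"
    if licenses == ["public domain"]:
        return "public domain"
    # Several entries form a disjunction: any one of them may be chosen.
    return " or ".join(licenses)


def usable_under(licenses, wanted):
    """True if `wanted` is one of the listed alternatives (exact string match)."""
    return wanted in licenses
```

For the example entry listing "FDL" and "CC-BY-SA", the summary reads "FDL or CC-BY-SA", and a reuser who needs CC-BY-SA terms may rely on that alternative alone.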
The manifest shouldn't list the 'mimetype' or 'info.json' files, just
the editable html and associated static files.

An example manifest, containing two html files and an image, is shown

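A manifest can be checked against these rules entry by entry. The Python 3 sketch below is illustrative: `manifest_problems` is a hypothetical name, it reads the file path from the "url" key used in the example manifest, it treats all five keys as required (an assumption), and its identifier pattern is a simplified, ASCII-only approximation of the XML Name production[4], which also permits ':' and many non-ASCII characters.

```python
import re

# ASCII-only approximation of the XML Name production (assumption).
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")
_KEYS = ("url", "mimetype", "contributors", "rightsholders", "license")


def manifest_problems(manifest, zip_names):
    """Return human-readable problems found in a manifest mapping."""
    problems = []
    for ident, entry in manifest.items():
        if not _NAME_RE.match(ident):
            problems.append(f"{ident}: not a valid identifier")
        for key in _KEYS:
            if key not in entry:
                problems.append(f"{ident}: missing {key!r}")
        url = entry.get("url")
        if url in ("mimetype", "info.json"):
            problems.append(f"{ident}: the manifest must not list {url}")
        elif url is not None and url not in zip_names:
            problems.append(f"{ident}: {url!r} is not in the zip")
    return problems
```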
        "url": "Introduction.html",
        "mimetype": "text/html",
        "contributors": ["Adam Hyde", "Aleksander Erkalovic"],
        "rightsholders": ["Adam Hyde"],
        "license": ["CC-BY-SA"]
    },
    "arbitrary-identifier_0005": {
        "url": "UseCases.html",
        "mimetype": "text/html",
        "rightsholders": ["Wikimedia Foundation"],
        "license": ["FDL", "CC-BY-SA"]
    },
    "BookSprints-ott-adam-en.jpg": {
        "url": "static/BookSprints-ott-adam-en.jpg",
        "mimetype": "image/jpeg",
        "contributors": ["Ansell Adams"],
        "rightsholders": ["Ansell Adams"],

The spine lists the identifiers of all the html files in the order
they appear in the book. It looks like:

    "spine": [ identifier, identifier,... ]

where each *identifier* is the manifest identifier for an editable

Here is a possible spine for the manifest used in the previous

    "spine": ["Introduction", "arbitrary-identifier_0005"]

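Resolving the spine through the manifest gives the book's reading order as a list of files. In this Python 3 sketch, `reading_order` is a hypothetical name; it reads paths from the "url" key used in the example manifest, and it also rejects spines that omit an html file, following the rule that the spine lists all of them.

```python
def reading_order(manifest, spine):
    """Return html file paths in book order, validating the spine."""
    files = []
    for ident in spine:
        entry = manifest.get(ident)
        if entry is None:
            raise ValueError(f"spine identifier {ident!r} is not in the manifest")
        if entry["mimetype"] != "text/html":
            raise ValueError(f"spine identifier {ident!r} is not an html file")
        files.append(entry["url"])
    # Every html file in the manifest must appear somewhere in the spine.
    html_ids = {i for i, e in manifest.items() if e["mimetype"] == "text/html"}
    left_out = html_ids - set(spine)
    if left_out:
        raise ValueError(f"html files missing from the spine: {sorted(left_out)}")
    return files
```

With the example manifest and spine this yields "Introduction.html" followed by "UseCases.html"; the image is never part of the reading order.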
The TOC (Table of Contents) specifies navigation points within the book.
It uses a nested structure, with less significant divisions being
contained within the "children" attribute of the greater division.

The "TOC" element itself is a list of objects with the following

        "title": division title (optional),
        "url": filename and possible fragment ID,
        "type": string indicating division type (optional),
        "role": epub guide type (optional),
        "children": list of TOC structures (optional)

*title* is a free string giving the division's title. It may be omitted.

*url* points to the start of the division. It should consist of a
filename as found in the manifest, optionally followed by a '#' and a

*type* is a string indicating what kind of navigation point it is.
This might be used to determine text styles.

*role*, if present, indicates the navigation point has a particular
structural role. It must be a keyword for "reference type" as
defined in the guide section of the epub OPF specification[6].

*children*, if present, contains a list of objects following this same
specification. These are subsections of this section.

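Because the TOC nests through *children*, a recursive walk is a natural way to process it. This Python 3 sketch (`flatten_toc` is a hypothetical name) yields each navigation point with its depth, splits *url* into a file name and an optional fragment at the first '#', and checks the file name against the manifest's "url" values.

```python
def flatten_toc(toc, manifest, depth=0):
    """Yield (depth, title, filename, fragment) for every TOC navigation point."""
    known = {entry["url"] for entry in manifest.values()}
    for point in toc:
        filename, _, fragment = point["url"].partition("#")
        if filename not in known:
            raise ValueError(f"TOC url {point['url']!r} is not in the manifest")
        # title is optional, so it may come back as None.
        yield depth, point.get("title"), filename, fragment or None
        yield from flatten_toc(point.get("children", []), manifest, depth + 1)
```

Being a generator, it reports an unknown file only when iteration reaches that entry; wrap it in `list()` to validate the whole TOC at once.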
        "title": "INTRODUCTION",
        "url": "Introduction.html",
        "type": "booki-section",
        "title": "WHAT IS GSoC?",
        "url": "Introduction.html",
        "title": "WHY GSOC MATTERS",
        "url": "Testimonials.html",

The names in the metadata object are "namespaces" in which "keywords"
are defined. The objects referred to by keywords are further divided
by "scheme". Each scheme points to a list of values. If the keyword
is indivisible, there should be a single scheme identified by an empty
string (""). Further, if a scheme is the primary default for that
keyword, it may be identified by an empty string as well as by its

            scheme: [value, value,...],

Booki uses Dublin Core[7] metadata keywords wherever possible, which are
stored under the namespace "http://purl.org/dc/elements/1.1/".

An example metadata section is shown below:

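Looking up a value means walking namespace, keyword and scheme in turn. A minimal Python 3 sketch with hypothetical helper names follows; `all_values` removes duplicates because a default scheme may appear both under "" and under its own name, so the same values can be listed twice.

```python
DC = "http://purl.org/dc/elements/1.1/"


def get_values(metadata, namespace, keyword, scheme=""):
    """Return the value list for one scheme of a keyword, or [] if absent."""
    return metadata.get(namespace, {}).get(keyword, {}).get(scheme, [])


def all_values(metadata, namespace, keyword):
    """Return every value of a keyword across its schemes, without duplicates."""
    seen = []
    for values in metadata.get(namespace, {}).get(keyword, {}).values():
        for value in values:
            if value not in seen:
                seen.append(value)
    return seen
```

Passing no scheme asks for the indivisible or default value, which is how single-valued keywords such as a title would normally be read.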
    "http://purl.org/dc/elements/1.1/": {
            "": ["FLOSS Manuals http://flossmanuals.net"]
            "": ["The Contributors"]
            "": ["Jennifer Redman", "Bart Massey", "Alexander Pico",
                 "selena deckelmann", "Anne Gentle", "adam hyde", "Olly Betts",
                 "Jonathan Leto", "Google Inc And The Contributors",
            "": ["GSoC Mentoring"]
            "start": ["2009-10-23"],
            "last-modified": ["2009-10-30"]
            "flossmanuals.net": ["http://en.flossmanuals.net/epub/GSoCMentoring/2009.10.23-19.49.01"],
            "archive.org": ["gsocmentoring00fm"]
            "": ["Copyright The Contributors. Licensed under the GPLv2. See Appendix.html in this zip file or http://www.gnu.org/licenses/gpl-2.0.txt for details"]
    "http://booki.cc/": {
            "": ["en.flossmanuals.net"]
            "": ["GSoCMentoring"]

The "language", "creator", "identifier", and "title" Dublin Core
elements must be present. The "contributor" Dublin Core element should
list all contributors to individual files, as listed in the manifest,
unless those contributors are already identified in the "creator"

The Dublin Core "rights" element should contain a human-readable
summary of the book's copyright and license.

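The requirements on Dublin Core elements can be checked as below. In this Python 3 sketch the helper names are hypothetical, an element counts as present if any of its schemes has at least one value, and names are compared exactly, so "adam hyde" and "Adam Hyde" would be treated as different people.

```python
DC = "http://purl.org/dc/elements/1.1/"
REQUIRED_DC = ("language", "creator", "identifier", "title")


def _dc_values(metadata, keyword):
    """All values of a Dublin Core keyword, across every scheme."""
    values = []
    for scheme_values in metadata.get(DC, {}).get(keyword, {}).values():
        values.extend(scheme_values)
    return values


def missing_dc_elements(metadata):
    """Required Dublin Core elements that have no values."""
    return [k for k in REQUIRED_DC if not _dc_values(metadata, k)]


def contributors_to_add(metadata, manifest):
    """Manifest contributors not yet named as a creator or contributor."""
    listed = set(_dc_values(metadata, "creator")) | set(_dc_values(metadata, "contributor"))
    needed = []
    for entry in manifest.values():
        for name in entry.get("contributors", []):
            if name not in listed and name not in needed:
                needed.append(name)
    return needed
```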
The "http://booki.cc/" namespace can contain the following elements:

*server* is the Booki or FLOSS Manuals server on which the book is

*book* is the book's identifier on that server.

*dir* is the primary text direction ('LTR' or 'RTL'). If unspecified,
'LTR' is assumed, though software may determine the text direction
by inspecting the contents.

*license* lists the licenses used in the book, using the abbreviations
in the next section where possible. Multiple licenses listed here do
not necessarily mean that their disjunction applies to each file;
rather, each license might apply to a different subset of the files.
All licenses used should be included in the

Other namespaces are permitted but will not be used by Booki. They
will, as far as possible, be preserved through Booki edits and be
exported to other formats.

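Reading the "http://booki.cc/" elements mostly means applying the stated default for *dir*. In this Python 3 sketch the helper names are hypothetical, only the "" scheme is consulted (as in the example metadata), and a missing or unrecognised *dir* falls back to 'LTR'; guessing the direction from the contents is not attempted.

```python
BOOKI = "http://booki.cc/"


def booki_value(metadata, keyword):
    """First value of a booki.cc keyword under the "" scheme, or None."""
    values = metadata.get(BOOKI, {}).get(keyword, {}).get("", [])
    return values[0] if values else None


def text_direction(metadata):
    """Primary text direction: 'LTR' or 'RTL', defaulting to 'LTR'."""
    direction = booki_value(metadata, "dir")
    return direction if direction in ("LTR", "RTL") else "LTR"
```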
license abbreviations
=====================

The following abbreviated license identifiers should be used for the
licenses defined at the corresponding URLs.

    GPL             http://www.gnu.org/licenses/gpl.txt
    GPLv2           http://www.gnu.org/licenses/gpl-2.0.txt
    GPLv2+          http://www.gnu.org/licenses/gpl-2.0.txt [or greater version]
    GPLv3           http://www.gnu.org/licenses/gpl-3.0.txt
    GPLv3+          http://www.gnu.org/licenses/gpl-3.0.txt [or greater version]
    LGPL            http://www.gnu.org/licenses/lgpl.txt
    LGPLv2.1        http://www.gnu.org/licenses/lgpl-2.1.txt
    LGPLv2.1+       http://www.gnu.org/licenses/lgpl-2.1.txt [or greater version]
    LGPLv3          http://www.gnu.org/licenses/lgpl-3.0.txt
    BSD             http://www.debian.org/misc/bsd.license
    MIT             http://www.opensource.org/licenses/mit-license.html
    Artistic        http://dev.perl.org/licenses/artistic.html
    CC-BY           http://creativecommons.org/licenses/by/3.0/
    CC-BY-SA        http://creativecommons.org/licenses/by-sa/3.0/
    public domain   [no copyright]

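In code, the table maps naturally onto a dictionary. This Python 3 sketch (`LICENSE_URLS` and `normalise_license` are hypothetical names) turns a license URL back into its abbreviation; because the "+" variants share a URL with the fixed-version licenses, it assumes a bare URL means the fixed version and never yields a "+" abbreviation. Values that match nothing, such as URLs of other licenses or relative paths into the booki-zip, are returned unchanged.

```python
# Abbreviation -> license URL, copied from the table above.
LICENSE_URLS = {
    "GPL": "http://www.gnu.org/licenses/gpl.txt",
    "GPLv2": "http://www.gnu.org/licenses/gpl-2.0.txt",
    "GPLv2+": "http://www.gnu.org/licenses/gpl-2.0.txt",  # or any later version
    "GPLv3": "http://www.gnu.org/licenses/gpl-3.0.txt",
    "GPLv3+": "http://www.gnu.org/licenses/gpl-3.0.txt",  # or any later version
    "LGPL": "http://www.gnu.org/licenses/lgpl.txt",
    "LGPLv2.1": "http://www.gnu.org/licenses/lgpl-2.1.txt",
    "LGPLv2.1+": "http://www.gnu.org/licenses/lgpl-2.1.txt",  # or any later version
    "LGPLv3": "http://www.gnu.org/licenses/lgpl-3.0.txt",
    "BSD": "http://www.debian.org/misc/bsd.license",
    "MIT": "http://www.opensource.org/licenses/mit-license.html",
    "Artistic": "http://dev.perl.org/licenses/artistic.html",
    "CC-BY": "http://creativecommons.org/licenses/by/3.0/",
    "CC-BY-SA": "http://creativecommons.org/licenses/by-sa/3.0/",
}
PUBLIC_DOMAIN = "public domain"  # has no URL: no copyright applies

# Reverse lookup; "+" variants are skipped, so a bare URL maps to the
# fixed-version abbreviation (an assumption).
_URL_TO_ABBREV = {url: abbrev for abbrev, url in LICENSE_URLS.items() if not abbrev.endswith("+")}


def normalise_license(value):
    """Return the standard abbreviation for `value` when one is known."""
    if value in LICENSE_URLS or value == PUBLIC_DOMAIN:
        return value
    return _URL_TO_ABBREV.get(value, value)
```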
Licenses not shown here should be listed as a URL that points to their
text. It is possible for the URL to point to a local file within the
booki-zip, but it is preferable to use a stable external link if one

[1] Zip specification: http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[2] zipfile module: http://docs.python.org/library/zipfile.html
[3] JSON specification: http://json.org/
[4] XML name specification: http://www.w3.org/TR/REC-xml/#NT-Name
[5] Media types: http://www.iana.org/assignments/media-types/
[6] Guides in epub: http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#Section2.6
[7] Dublin Core metadata elements: http://dublincore.org/documents/2004/12/20/dces/