The basic format is one or more field names followed by a colon, followed by
one or more actions. Some actions take an optional or required parameter.
The actions are applied in the specified order to each field listed, and
a field can be listed on more than one line.

Here's an example::

    desc1 : unhtml index truncate=200 field=sample
    desc2 desc3 desc4 : unhtml index
    name : field=caption weight=3 index
    ref : field=ref boolean=Q unique=Q
    type : field=type boolean=XT

Don't put spaces around the ``=`` separating an action and its argument.
Current versions allow spaces here (though this was never documented as
supported), but a missing argument then quietly swallows the next action
instead of producing an empty value or an error. For example, this takes
``hash`` as the field name, which is unlikely to be what was intended::

    url : field= hash boolean=Q unique=Q

Since 1.4.6, a deprecation warning is emitted for spaces before or after the
``=``.

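
The pitfall is easier to see in a small Python sketch of the tokenizing.
This is not scriptindex's real parser: ``parse_line`` and ``parse_actions``
are hypothetical names, and the lenient mode only models a space *after*
the ``=``:

.. code-block:: python

    def parse_actions(spec, allow_spaces=False):
        """Split an action list into (name, argument) pairs.

        The argument is None for actions written without "=".  With
        allow_spaces=True, an action ending in "=" takes the *next* token
        as its argument, mimicking the lenient behaviour described above.
        """
        tokens = spec.split()
        actions = []
        i = 0
        while i < len(tokens):
            name, eq, arg = tokens[i].partition("=")
            if eq and not arg and allow_spaces and i + 1 < len(tokens):
                # "field= hash": the following action is swallowed.
                i += 1
                arg = tokens[i]
            actions.append((name, arg if eq else None))
            i += 1
        return actions


    def parse_line(line, allow_spaces=False):
        """Split 'field names : actions' into (field names, actions)."""
        fields, sep, spec = line.partition(":")
        if not sep:
            raise ValueError("expected 'field names : actions'")
        return fields.split(), parse_actions(spec, allow_spaces)

With ``allow_spaces=True``, ``parse_line("url : field= hash boolean=Q")``
returns ``(["url"], [("field", "hash"), ("boolean", "Q")])``, so ``hash``
is silently lost as an action; the strict mode instead gives
``("field", "")`` followed by ``("hash", None)``.
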
The actions are:

boolean[=PREFIX]
    index the text as a single boolean term (with prefix PREFIX). If
    there's no text, no term is added. Omega expects certain prefixes to
    be used for certain purposes: those starting "X" are reserved for user
    applications, and Q is reserved for a unique ID term.

date=FORMAT
    generate terms for date range searching. If FORMAT is "unix", the
    value is interpreted as a Unix time_t (seconds since 1970). If
    FORMAT is "yyyymmdd", the value is interpreted as an 8-digit
    string, e.g. 20021221 for 21st December 2002. Unknown formats
    and invalid values are currently ignored.

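
    To illustrate the two formats, here is a Python sketch that only
    *interprets* a value; it does not generate any terms. The name
    ``parse_date_value`` is hypothetical, and treating "unix" times as UTC
    is an assumption:

    .. code-block:: python

        from datetime import date, datetime, timezone


        def parse_date_value(value, fmt):
            """Return a datetime.date, or None when the value would be ignored."""
            try:
                if fmt == "unix":
                    # Seconds since 1970-01-01 00:00:00 UTC (assumed timezone).
                    return datetime.fromtimestamp(int(value), tz=timezone.utc).date()
                if fmt == "yyyymmdd":
                    if len(value) != 8 or not value.isdigit():
                        return None
                    return date(int(value[:4]), int(value[4:6]), int(value[6:]))
            except (ValueError, OverflowError, OSError):
                return None  # invalid value
            return None  # unknown format

    Both ``parse_date_value("20021221", "yyyymmdd")`` and
    ``parse_date_value("1040428800", "unix")`` give 21st December 2002,
    while an impossible date such as ``"20021341"`` gives ``None``.
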
field[=FIELDNAME]
    add as a field to the Xapian record. FIELDNAME defaults to the field
    name in the dump file. It is valid to have more than one instance of
    a given field: all instances will be processed and stored in the
    Xapian record.

hash[=LENGTH]
    Xapian has a limit on the length of a term. To handle arbitrarily
    long URLs as terms, omindex implements a scheme where the end of
    a long URL is hashed (short URLs are left as-is). You can use this
    same scheme in scriptindex. LENGTH defaults to 239, which, if you
    index with prefix "U", produces URL terms compatible with omindex.

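
    The shape of the scheme can be sketched in Python. The digest
    (truncated SHA-1 hex) and its 6-byte length are assumptions made for
    illustration, so the output is *not* compatible with omindex's URL
    terms; ``hash_long_term`` is a hypothetical name:

    .. code-block:: python

        import hashlib


        def hash_long_term(term, length=239, digest_len=6):
            """Return TERM as UTF-8 bytes, unchanged if it fits in LENGTH
            bytes; otherwise keep the first LENGTH - digest_len bytes and
            replace the rest with a short digest of that tail."""
            data = term.encode("utf-8")
            if len(data) <= length:
                return data  # short URLs are left as-is
            head, tail = data[: length - digest_len], data[length - digest_len :]
            # Illustrative digest only; omindex uses its own hash function.
            digest = hashlib.sha1(tail).hexdigest()[:digest_len].encode("ascii")
            return head + digest  # exactly LENGTH bytes

    Short URLs come back unchanged, and a long URL keeps its readable start,
    so only the part beyond the length limit is replaced by a digest.
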
index[=PREFIX]
    split text into words and index probabilistically (with prefix PREFIX
    if specified).

indexnopos[=PREFIX]
    split text into words and index probabilistically (with prefix PREFIX
    if specified), but don't include positional information in the
    database. This makes the database smaller, but phrase searching won't
    work.

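
    Why phrase searching needs positions can be shown with a toy inverted
    index in Python. Real Xapian databases store postings very differently;
    ``build_index`` and ``phrase_match`` are hypothetical names, and the
    whitespace word splitting is a simplification:

    .. code-block:: python

        from collections import defaultdict


        def build_index(docs, positional=True):
            """Map each word to {docid: [positions]} (positional=True) or to
            {docid: occurrence count} (positional=False)."""
            index = defaultdict(dict)
            for docid, text in docs.items():
                for pos, word in enumerate(text.lower().split(), start=1):
                    postings = index[word]
                    if positional:
                        postings.setdefault(docid, []).append(pos)
                    else:
                        postings[docid] = postings.get(docid, 0) + 1
            return index


        def phrase_match(index, phrase):
            """Return the docids containing PHRASE as consecutive words."""
            words = phrase.lower().split()
            if not words or any(w not in index for w in words):
                return set()
            candidates = set.intersection(*(set(index[w]) for w in words))
            hits = set()
            for docid in candidates:
                starts = index[words[0]][docid]
                if not isinstance(starts, list):
                    # Only counts were stored: adjacency can't be checked.
                    raise ValueError("no positional information for phrase search")
                if any(all(s + i in index[w][docid] for i, w in enumerate(words))
                       for s in starts):
                    hits.add(docid)
            return hits

    With ``positional=False`` only per-document counts are kept, so this toy
    ``phrase_match`` has nothing to check word adjacency against and refuses
    the query.
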
load
    read the contents of the file named by the current text, and set the
    current text to those contents. If the file can't be loaded (not
    found, wrong permissions, etc.), a diagnostic message is sent to
    stderr and the current text is set to empty. If the next action is
    ``truncate``, scriptindex is smart enough to load only the start of
    a large file.

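
    A Python sketch of this behaviour follows. ``load_text`` and its
    ``limit`` parameter are hypothetical; reading ``limit + 1`` bytes (so a
    following truncation can tell whether its cut lands on a word boundary),
    decoding as UTF-8, and the message wording are assumptions, not
    documented behaviour:

    .. code-block:: python

        import sys


        def load_text(filename, limit=None):
            """Return the file's contents, or "" (with a message on stderr)
            if it can't be read.  With LIMIT, read only the start."""
            try:
                with open(filename, "rb") as f:
                    data = f.read() if limit is None else f.read(limit + 1)
            except OSError as e:
                # Hypothetical message format.
                print(f"load: can't read {filename!r}: {e}", file=sys.stderr)
                return ""
            return data.decode("utf-8", errors="replace")

    For an action list such as ``load truncate=200``, calling
    ``load_text(name, limit=200)`` avoids reading the whole of a large file.
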
lower
    lowercase the text (useful for generating boolean terms)

spell
    generate spelling correction data for any ``index`` or ``indexnopos``
    actions in the remainder of this list of actions.

truncate=LENGTH
    truncate to at most LENGTH bytes, but avoid chopping off a word (useful
    for sample and title fields)

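
    One way to implement this rule is sketched below in Python. It is not
    scriptindex's code: ``truncate_at_word`` is a hypothetical name, only
    spaces count as word breaks, and the hard cut used when a single word
    exceeds LENGTH is an assumption:

    .. code-block:: python

        def truncate_at_word(text, length):
            """Return a prefix of TEXT at most LENGTH UTF-8 bytes long,
            cut at a space so that no word is chopped."""
            data = text.encode("utf-8")
            if len(data) <= length:
                return text
            cut = data[:length]
            # If the next byte is a space, the cut already ends a word.
            if data[length:length + 1] == b" ":
                return cut.decode("utf-8").rstrip()
            space = cut.rfind(b" ")
            if space == -1:
                # One over-long word: hard cut, dropping any partial character.
                return cut.decode("utf-8", errors="ignore")
            return cut[:space].decode("utf-8").rstrip()

    For example, ``truncate_at_word("hello world foo", 13)`` returns
    ``"hello world"`` rather than ``"hello world f"``.
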
unique[=PREFIX]
    use the value in this field for a unique ID. If the value is empty,
    a warning is issued but nothing else is done. Only one record with
    each value of the ID may be present in the index: adding a new record
    with an ID which is already present will cause the old record to be
    replaced (or deleted if the new record is otherwise empty). You should
    also index the field as a boolean field using the same prefix so that
    the old record can be found. In Omega, Q is reserved for use as the
    prefix of a unique term. You can use ``unique`` at most once in each
    index script (this is only enforced since Omega 1.4.5, but older
    versions didn't handle multiple instances usefully).

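
    The replace-or-delete rule can be modelled with a Python dict standing
    in for the database. This is a hypothetical model, not scriptindex code:
    keys play the role of the boolean ID term (prefix Q plus the ID), the ID
    is assumed to be non-empty, and ignoring an empty record with a new ID
    is an assumption:

    .. code-block:: python

        def apply_record(index, uid, doc):
            """Apply one record with unique ID UID to INDEX (a dict).

            Returns "added", "replaced", "deleted" or "ignored".
            """
            term = "Q" + uid  # the boolean term the old record is found by
            if term in index:
                if doc:
                    index[term] = doc  # the old record is replaced
                    return "replaced"
                del index[term]  # an otherwise-empty record deletes it
                return "deleted"
            if doc:
                index[term] = doc
                return "added"
            return "ignored"  # assumption: nothing to add or delete

    Applying ``{"title": "A"}`` and then ``{"title": "B"}`` under the same
    ID leaves a single entry holding ``{"title": "B"}``; applying an empty
    record under that ID afterwards removes it.
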
value=VALUESLOT
    add as a Xapian document value in slot VALUESLOT. Values can be used
    for collapsing equivalent documents, sorting the MSet, etc. If you
    want to perform numeric sorting, use the ``valuenumeric`` action
    instead.

valuenumeric=VALUESLOT
    like ``value=VALUESLOT``, this adds the text as a Xapian document value
    in slot VALUESLOT, but encodes it for numeric sorting using
    ``Xapian::sortable_serialise()``. Values set with this action can be
    used for numeric sorting of the MSet.

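
    The difference matters because document values are compared as byte
    strings. The Python sketch below contrasts plain string order with an
    order-preserving numeric encoding. ``numeric_sort_key`` is a
    hypothetical stand-in that shares the key property of
    ``Xapian::sortable_serialise()`` (byte order follows numeric order) but
    not its actual encoding:

    .. code-block:: python

        import struct


        def numeric_sort_key(text):
            """Encode numeric TEXT so that byte-string order matches numeric order."""
            (bits,) = struct.unpack(">Q", struct.pack(">d", float(text)))
            if bits >> 63:
                bits ^= 0xFFFFFFFFFFFFFFFF  # negative: invert every bit
            else:
                bits ^= 0x8000000000000000  # non-negative: set the sign bit
            return struct.pack(">Q", bits)

    ``sorted(["9", "10", "100", "-2.5"])`` gives
    ``["-2.5", "10", "100", "9"]``, whereas sorting with
    ``key=numeric_sort_key`` gives ``["-2.5", "9", "10", "100"]``.
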
weight=FACTOR
    set the weighting factor to FACTOR (an integer) for any ``index`` or
    ``indexnopos`` actions in the remainder of this list of actions. The
    default is 1. Use this to add extra weight to titles, keyword fields,
    etc., so that words in them are regarded as more important by searches.

The data to be indexed is read in from one or more files. Each file has
records separated by a blank line. Each record contains one or more fields of
the form "name=value". If a value contains newlines, these must be escaped by
inserting an equals sign ('=') after each newline. Here's an example record::

    value=This is a multi-line
    =value. Note how each newline
    =is escaped.

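
For programs that generate dump files, the escaping rule can be sketched in
Python. ``format_record`` and ``parse_records`` are hypothetical helpers
written from the description above, not part of Omega, and the parser treats
only a completely empty line as a record separator:

.. code-block:: python

    def escape_value(value):
        # Insert "=" after every newline, as the dump file format requires.
        return value.replace("\n", "\n=")


    def format_record(fields):
        """Serialise (name, value) pairs as one dump-file record."""
        return "".join(f"{name}={escape_value(value)}\n" for name, value in fields)


    def parse_records(text):
        """Parse dump-file TEXT into a list of records of (name, value) pairs."""
        records, current = [], []
        for line in text.split("\n"):
            if line == "":
                if current:  # a blank line ends the record
                    records.append(current)
                    current = []
            elif line.startswith("=") and current:
                # Continuation: restore the escaped newline.
                name, value = current[-1]
                current[-1] = (name, value + "\n" + line[1:])
            else:
                name, _, value = line.partition("=")
                current.append((name, value))
        if current:
            records.append(current)
        return records

A value containing a blank line stays inside its record, because the escaped
form of an empty line is a lone ``=``.
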
See ``mbox2omega`` and ``mbox2omega.script`` for an example of how you can
generate a dump file from an external source and write an index script to be
used with it. Try ``mbox2omega --help`` for more information.