1 Omega 1.2.24 (2016-09-16):
5 * Drop unused configure check for symbol visibility.
7 Omega 1.2.23 (2016-03-28):
11 * Update links to Xapian website and trac to use https, which is now supported,
12 thanks to James Aylett.
16 * Fix HTML/XML entity decoding to be O(n) not O(n²) - processing HTML/XML with
17 a lot of entities is now much faster.
21 * Remove unused country code to name maps. These were intended as examples,
22 but they aren't very useful as such, and really just bloat the templates
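
The entity decoding entry above is about algorithmic cost. As a rough illustration only (this is not Omega's C++ code; `decode_entities`, `NAMED` and `MAX_ENTITY_LEN` are names made up for this sketch), a decoder stays O(n) if it scans the input once and appends pieces to an output buffer, rather than erasing and re-inserting inside the string for every entity:

```python
# Hypothetical sketch of single-pass entity decoding; not Omega's implementation.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}  # tiny illustrative table
MAX_ENTITY_LEN = 10  # assumed bound on the text between '&' and ';'

def _decode_one(name):
    """Return the decoded character for an entity name, or None if unknown."""
    if name in NAMED:
        return NAMED[name]
    if name[:2] in ("#x", "#X"):
        digits, base, allowed = name[2:], 16, "0123456789abcdefABCDEF"
    elif name[:1] == "#":
        digits, base, allowed = name[1:], 10, "0123456789"
    else:
        return None
    if not digits or any(c not in allowed for c in digits):
        return None
    code = int(digits, base)
    return chr(code) if code <= 0x10FFFF else None

def decode_entities(text):
    out = []  # pieces are appended; the input is never modified in place
    i = 0
    while True:
        amp = text.find("&", i)
        if amp == -1:
            out.append(text[i:])
            break
        out.append(text[i:amp])
        # Bounded look-ahead keeps the work per '&' constant.
        semi = text.find(";", amp + 1, amp + 2 + MAX_ENTITY_LEN)
        decoded = _decode_one(text[amp + 1:semi]) if semi != -1 else None
        if decoded is None:
            out.append("&")  # not an entity we know: keep the '&' literally
            i = amp + 1
        else:
            out.append(decoded)
            i = semi + 1
    return "".join(out)
```

Each input character is looked at a bounded number of times, so the total work grows linearly with the document size.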
25 Omega 1.2.22 (2015-12-29):
29 * Stop maintaining ChangeLog files. They make merging patches harder, and stop
30 'git cherry-pick' from working as it should. The git repo history should be
31 sufficient for complying with GPLv2 2(a).
33 * Clarify help text for omindex --mime-type option.
35 * docs/omegascript.rst:
37 + Fix documentation of $last to say it's the MSet index *one beyond* the end
38 of the current page. Reported by Andrew Chilton.
40 + Clarify that $split and $substr work in bytes. Previously we said
41 "characters" which could be taken as meaning they work with UTF-8
44 + Update documentation for $filters - it was missing these CGI parameters
45 from the list of those serialised: COLLAPSE, DOCIDORDER, SORT, SORTREVERSE,
48 + Explicitly note user can use $setmap to create their own maps.
52 + SVG extraction is built-in too.
54 + Expand paragraph about command `false`. Note the versions where explicit
55 support was added, and that this will also work with any version on Unix,
56 where `false` is a command.
60 * docs/cgiparams.rst: Document behaviour if xDB is not set.
62 * Change "characters" to "bytes" in a few places to clarify that we don't mean
69 + Add '--title-size' option.
71 + Handle .oft the same way as .msg - it's some sort of template email, and
72 has essentially the same format.
76 * Make $querydescription ensure the match has been run, so that it includes
79 * Avoid $allterms, $cgilist, $filterterms and $terms being O(n²) in the number
80 of items in the returned list.
82 * If xFILTERS is not set, don't force the first page as that's unhelpful if
83 someone fails to set it in their template.
85 * When environment variable SERVER_PROTOCOL is set to INCLUDED (as it is when
86 we're being included in a page), we already suppress the HTTP headers, but
87 now we suppress the blank line after the header too.
89 * Support option flag_cjk_ngram if built against xapian-core >= 1.2.22.
93 * Add test coverage for parsing of HTML entities.
97 * Fix error reporting if PCRE isn't installed. Fixes #693, reported by lhz7370.
101 * Avoid warning when building with glibc >= 2.21.
103 * Don't provide our own implementation of sleep() under __WIN32__ if there
104 already is one - mingw provides one, and in some situations it seems to clash
105 with ours. Reported to xapian-discuss by John Alveris.
107 * Stop trying to use O_STREAMING - the patch to implement it was never merged
108 into the Linux kernel, and I can't find any evidence that other platforms
109 implement it. The constant value which O_STREAMING used now seems to be used for
110 the part of O_SYNC which isn't covered by O_DSYNC, which seems likely to hurt
111 performance if anything.
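
Several entries in this release replace "characters" with "bytes". The difference only shows up with non-ASCII text. A short Python illustration (not OmegaScript; `substr_bytes` is a hypothetical helper written for this example) of how a byte offset can fall inside a multi-byte UTF-8 sequence:

```python
def substr_bytes(s, start, length):
    """Take a substring by byte offsets into the UTF-8 encoding of s."""
    return s.encode("utf-8")[start:start + length]

word = "caf\u00e9"                 # "cafe" with an acute accent on the e
chars = len(word)                  # number of characters
octets = len(word.encode("utf-8")) # number of bytes; the accented e takes two
cut = substr_bytes(word, 0, 4)     # stops halfway through the accented e
```

Here `cut` is no longer valid UTF-8, which is why a template that slices text by bytes needs to be careful with non-ASCII data.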
113 Omega 1.2.21 (2015-05-20):
117 * docs/overview.rst: Document 'E' prefixed boolean terms for filtering by
118 extension (see #668, reported by bramvdh).
120 * docs/encodings.rst: Add a document about character encoding, as suggested by
121 James Aylett in #550.
127 + outlookmsg2html: Fix handling of message/rfc822 subparts.
131 * $prettyurl now decodes valid UTF-8 sequences, and some additional ASCII
132 characters in the path part: []@!$&'()*+.;= (Fixes #550 and #644, reported by
135 * $prettyurl now leaves the query and fragment parts of the URL alone and won't
136 decode an escaped "/" (omindex doesn't create URLs with any of these, so we
137 only risk breaking other URLs which have them).
139 * Drop compilation date and time from output when run from the command line -
140 they prevent reproducible builds and the version number is sufficient
145 * templates/query: When listing matching terms, don't make the commas italic.
147 * templates/query: Eliminate blank line before <html>.
149 * templates/xml: Add XML declaration.
151 * templates/godmode: Specify charset utf-8 in the content-type.
155 * Link test programs with libtool's '-no-install' or '-no-fast-install', like
156 we already do in xapian-core, which means that libtool doesn't need to
157 generate shell script wrappers for them on most platforms.
161 * Add spaces between literal strings and macros which expand to literal strings
162 for C++11 compatibility.
164 * Remove 'register' as it's deprecated and clang spits out warnings because of
165 that. Any modern compiler likely just ignores it as an optimisation hint
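
The two `$prettyurl` entries above describe which escapes get decoded. The following is only a sketch of that behaviour under stated assumptions, not Omega's code: `pretty_url` is a hypothetical name, the ASCII list is copied from the entry as written, and everything before the first '?' or '#' is treated as the path for simplicity.

```python
import re

# ASCII characters to unescape, copied from the entry above.
SAFE_ASCII = set("[]@!$&'()*+.;=")
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def _decode_run(match):
    text = match.group(0)
    raw = bytes.fromhex(text.replace("%", ""))
    out = []
    i = 0
    while i < len(raw):
        if raw[i] < 0x80:
            ch = chr(raw[i])
            # Only the listed ASCII characters are decoded; "%2F" etc. stay.
            out.append(ch if ch in SAFE_ASCII else text[3 * i:3 * i + 3])
            i += 1
            continue
        # Try to decode one valid UTF-8 sequence of 2 to 4 bytes.
        for size in (2, 3, 4):
            try:
                out.append(raw[i:i + size].decode("utf-8"))
            except UnicodeDecodeError:
                continue
            i += size
            break
        else:
            out.append(text[3 * i:3 * i + 3])  # invalid byte: keep the escape
            i += 1
    return "".join(out)

def pretty_url(url):
    # Leave the query and fragment parts untouched.
    m = re.search(r"[?#]", url)
    cut = m.start() if m else len(url)
    return ESCAPE_RUN.sub(_decode_run, url[:cut]) + url[cut:]
```

For example, `pretty_url("http://example.com/a%2Fb/%5Bx%5D?q=%5B")` keeps `%2F` and the query as they are but turns `%5Bx%5D` into `[x]`.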
168 Omega 1.2.20 (2015-03-04):
172 * docs/cgiparams.rst: Improve wording of docs for SORT parameter.
174 * docs/omegascript.rst: Update documentation references to DATE1, DATE2, and
175 DAYSMINUS which were renamed in 0.6.x and the compatibility aliases removed
182 + Ignore extensions .msi and .msp, which are Microsoft installer files, but
183 which libmagic sometimes incorrectly identifies as application/msword.
185 + Interpret a command of "false" in "--filter" as meaning to ignore files
190 * Handle CGI parameter [=0 as [=1.
194 * templates/xml: Update handling of DATE1, DATE2 and DAYSMINUS which were
195 renamed in 0.6.x and the compatibility aliases removed in 1.0.0.
199 * configure: Use pkg-config in preference to determine flags needed to
200 compile and link with PCRE, as this will just work when cross-compiling
201 (at least under MXE).
203 * configure: Define MINGW_HAS_SECURE_API under mingw to get _putenv_s()
204 declared in stdlib.h.
206 * Enable automake option 'subdir-objects' to avoid warning from newer automake.
210 * Avoid doing link tests with libmagic in configure as they fail on mingw due
211 to not automatically picking up libraries which libmagic itself depends on.
213 Omega 1.2.19 (2014-10-21):
217 * docs/overview.rst: Note that pdftotext is part of poppler as well as xpdf.
220 Omega 1.2.18 (2014-06-22):
226 + Work around libmagic returning a MIME content-type of "Composite Document
227 File V2 Document[...]" or "application/CDFV2-corrupt" by returning a more
228 suitable filetype based on looking at the file's extension.
230 + The starting URL wasn't previously URL encoded. In 1.3.2, this will be
231 fixed by URL encoding it as we do for the rest of the path, for the 1.2
232 branch we only URL encode it if it contains a character <= 31 or at least
233 one of '#', '%', ':' or '?'. This avoids a one-off reindex of every
234 document in the database in cases which work OK in practice.
236 + When we skip a file because it exceeds the configured size limit, include
237 that size limit in the message.
241 * Add support for setting the query expansion scheme to use.
245 * Don't compile in unixperm.cc - it isn't currently used, and it fails to build
246 with mingw. (fixes #635, reported by Alexis Denis)
248 * Fix warning when built with GCC 4.7.2 using -Os.
250 * Removed unused inline function, fixing compiler warning.
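
The starting URL rule described above for the 1.2 branch can be written down compactly. This is a sketch of the stated condition only; `needs_encoding` and `encode_start_url` are hypothetical names, and Python's `urllib.parse.quote` stands in for Omega's own URL encoder, which may escape a different set of characters.

```python
from urllib.parse import quote

def needs_encoding(start_url):
    """True if the URL contains a character <= 31 or one of '#', '%', ':', '?'."""
    return any(ord(c) <= 31 or c in "#%:?" for c in start_url)

def encode_start_url(start_url):
    # URLs which work OK in practice are left alone, so documents already
    # indexed under them keep the same URL and don't need reindexing.
    if needs_encoding(start_url):
        return quote(start_url, safe="/")
    return start_url
```

Note that a space (character 32) does not trigger encoding under this rule, so `encode_start_url("/my docs/")` returns its input unchanged.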
252 Omega 1.2.17 (2014-01-29):
256 * docs/overview.html: Add Abiword as an example use of --filter, based on patch
257 from Frank J Bruzzaniti (fixes #383).
261 * Fix "no previous declaration" warning on platforms which don't have
264 Omega 1.2.16 (2013-12-04):
270 + Fix off-by-one when finding documents to delete which would sometimes cause
271 omindex to fail to delete documents from the database when they weren't
272 refound during an index update.
274 + Decode dates in xlsx files.
276 + Ignore extensions 'adm', 'cur', and 'ico' by default.
278 + Group-readable files which are owner-readable but not world-readable should
279 still get a "readable by owner" term added. Reported by Emmanuel Garette.
283 * Compress source tarballs with xz instead of gzip.
285 * configure: Sync compiler warning flag machinery against xapian-core. The
286 changes are special handling for clang, passing -fshow-column where
287 supported, and handling for new warning flags in GCC 4.6 and 4.7.
289 Omega 1.2.15 (2013-04-16):
293 * Don't pointlessly link utf8convert.o into the omega CGI.
295 Omega 1.2.14 (2013-03-14):
301 + Correct "max" -> "min" when reserving space for shared strings in .xlsx
302 files. This just means we now reserve a more appropriate amount of space
305 + Ignore .com files by default.
307 Omega 1.2.13 (2013-01-09):
313 + Extracting text using external filters now works for filenames containing a
314 newline character - previously the newline got lost during escaping for the
317 + Fix segfault when -F option without a ':' is passed.
319 + Skip a file if we get a read error while calculating the MD5 checksum (used
320 for duplicate detection) - previously we used a checksum of the file up to
323 + Avoid rereading SVG and Atom files when we calculate their MD5 checksums.
325 + Improve --help output and man page, most notably:
327 - Say explicitly that --sample-size accepts the same formats as --max-size.
329 - Note default size limit on files to index is unlimited.
331 + When generating a sample for a CSV file, limit the size we pre-allocate to
332 the CSV file size if that's smaller than the requested sample size, in case
333 the user sets that limit very high.
337 * Fix to decode %-encoded character at the end of the query string.
341 * INCLUDES is now deprecated in automake, so use AM_CPPFLAGS instead.
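
The query string fix above concerns a %-encoded character in the last three bytes of the input. The entry does not say what the bug was; one plausible form is an off-by-one bounds check. A minimal decoder sketch (hypothetical `url_decode`, not Omega's code) with the boundary handled:

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def url_decode(s):
    """Decode '+' and %XX escapes in a query string component."""
    out = bytearray()
    i = 0
    n = len(s)
    while i < n:
        c = s[i]
        # "i + 3 <= n" accepts an escape whose last hex digit is the final
        # character; writing "i + 3 < n" here would skip a trailing "%41".
        if (c == "%" and i + 3 <= n
                and s[i + 1] in HEX_DIGITS and s[i + 2] in HEX_DIGITS):
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
            continue
        out.extend(b" " if c == "+" else c.encode("utf-8"))
        i += 1
    return out.decode("utf-8", errors="replace")
```

Truncated escapes such as a lone trailing `%` are passed through unchanged rather than read past the end of the input.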
343 Omega 1.2.12 (2012-06-27):
345 No changes since 1.2.11 except to bump the version - this release was made to
346 fix an incorrect library version information update in xapian-core 1.2.11.
348 Omega 1.2.11 (2012-06-26):
352 * Change HTML parser's handling of multiple <body> tags and of text outside of
353 <body> to match the behaviour of modern web browsers. (ticket#599)
357 + Add command line option to control the size of the document sample stored.
358 Patch from Mihai Bivol.
360 + Rework .xlsx parsing to substitute the shared strings into the positions
361 they are used in, so that the sample actually matches what appears in the
362 spreadsheet, and to index calculated cell contents.
364 + Improve handling of headers and footers in OpenDocument documents.
366 + pdftotext outputs a formfeed between each page, which messes up our "empty
367 body" check, so trim any trailing formfeeds before this check.
371 * Don't explicitly link indirect shared library dependencies on FreeBSD,
372 OpenBSD, and Solaris.
374 Omega 1.2.10 (2012-05-09):
378 * Add support for CDATA to HTML/XML parser.
382 + Add --max-size option, based on patch from ndaley in ticket#587.
384 + Add support for atom feed files, patch from Mihai Bivol in ticket#595.
386 + If the document with the highest existing docid before the run was updated,
387 we were reporting it as "added", but now we correctly report it as
388 "updated". (Backported from 1.3.0).
390 + Catch and report std::exception explicitly, so failing to allocate memory
391 is no longer reported as "Unknown exception". (Backported from 1.3.0).
397 * Fix to build with GCC 4.7 by adding cast to rlim_t to fix error about C++11
398 compatibility (reported by Gaurav Arora).
400 Omega 1.2.9 (2012-03-08):
404 * docs/overview.html:
406 + Document that libmagic is used to determine the MIME type if the extension
407 isn't known. Partly addresses ticket#569.
409 + We now limit time as well as CPU and memory for external filters.
413 * Our HTML parser now ignores sections bracketed by <!--UdmComment--> and
414 <!--/UdmComment-->, like we already do for <!--htdig_noindex-->.
416 * omindex: Add more extensions to the default ignore list: bin dat db fon jar
417 lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf
419 Omega 1.2.8 (2011-12-13):
423 * scriptindex.cc: Add link to http://xapian.org/docs/omega/scriptindex.html to
424 --help output (and so also to the man page which is generated from this).
426 * omegascript.html: Add note to discourage use of percentage scores.
432 + If we don't get any data from an external filter for 5 minutes, give up -
433 it has probably ended up blocked indefinitely.
435 + Improve --help output (and man page which is generated from it). Closes
440 + If no rules are found in the index script, report an error and give up -
441 this is inevitably the result of a mistake, and adding empty documents to
442 the database isn't helpful.
446 + Add new $prettyurl{} command which undoes RFC3986 URL escaping which
447 doesn't affect semantics in practice. Partly addresses ticket#550.
449 + Replace URL decoder with new implementation which handles various corner
450 cases better. Fixes bug#578.
452 + If CGI parameter P has trailing spaces, we now remove them all rather than
457 * templates/query: HTML escape topterms.
459 * templates/godmode: HTML escape the contents of document values.
461 * templates/query: Don't show the percentage score in the default template.
465 * Add new urlenctest unit test of URL encoding and decoding.
469 * configure: Sync changes from xapian-core: Don't pass -Wshadow for GCC < 4.1;
470 don't pass -Wstrict-null-sentinel for GCC 4.0.x; only enable symbol
471 visibility on platforms where it is supported.
475 * xapian-omega.spec: Package outlookmsg2html helper.
477 Omega 1.2.7 (2011-08-10):
481 * docs/termprefixes.html: Document how to map a user prefix to multiple term
484 * docs/overview.html: Improve documentation of htdig_noindex.
488 * Improve $version output from "Xapian - xapian-omega 1.2.7" to "xapian-omega
493 * xapian-omega.spec: We're ABI compatible within a release series so make
494 dependency on xapian-core-libs >= rather than =.
496 Omega 1.2.6 (2011-06-12):
500 * docs/omegascript.html: Correct the documentation of the colours used by
503 * docs/overview.html: Add using unoconv as more complex example of using
504 --filter (ticket#324).
510 + Make search query input type=search.
512 + Autofocus the search query input (using HTML autofocus attribute with
513 Javascript fallback for older browsers). (ticket#544)
517 * Fix a compiler warning.
519 Omega 1.2.5 (2011-04-04):
523 * Add index page which links to all the other documentation pages.
525 * INSTALL: Copy new Multi-Arch section from xapian-core/INSTALL. Replace VPATH
526 section with better equivalent from xapian-core/INSTALL.
528 * docs/omegascript.html: Minor improvements.
532 * The HTML parser no longer uses an exception to signify it has finished in
533 the normal case as exceptions are typically costly to handle. In tests,
534 this made omindex ~0.23% faster when indexing a lot of HTML files.
538 + Add --ignore-exclusions option, which will index HTML files despite meta
539 robots tags, etc - omindex is often used in environments where such
540 exclusions aren't relevant.
542 + Fix to compile with older versions of libmagic which don't have
543 MAGIC_MIME_TYPE (e.g. on Ubuntu hardy).
545 + Tell xls2csv to separate fields with spaces rather than commas, and not to
546 quote them. Fixes indexing of numeric fields, and means we don't need to
547 use our CSV parser to get a sample.
549 + Add whitespace between chunks of text extracted from Microsoft Office 2007
550 formats to prevent words in adjacent chunks from being run together.
552 + Encode reserved characters in URLs - links to files with names containing
553 '#' and '?' now work.
555 + Handle .xlr extension the same way as .xls (later Microsoft Works versions
556 apparently produce such files which are really the same format).
558 + Index filename extension with new standard prefix E.
560 + Just report the mimetype as unknown instead of saying "unknown Office 2007
563 + Ignore *.css and *.js by default too.
565 + Messages reporting skipping files are now more consistent and always report
568 + New --empty-docs option to allow documents we extract no body text from to
569 be indexed (existing behaviour), skipped, or reported and then indexed.
573 * Fix double Content-Type header in some error reporting situations (regression
574 introduced in 1.2.4).
576 * Update $url's URL encoding to follow RFC3986.
578 * Allow QueryParser flags to be set from OmegaScript (ticket#418). The
579 FLAG_SPELLING_CORRECTION flag can now be set using
580 $opt{flag_spelling_correction,1} - the old $opt{spelling,true} way to
581 enable this flag still works, but it is now deprecated.
585 * templates/emptydocs,templates/godmode,templates/opensearch,templates/query,
586 templates/xml: Add missing escaping. Some of these instances may allow
587 cross-site scripting, so upgrading your templates is recommended, especially
588 if you have any sensitive cookies set on the domain Omega is running on.
592 + Try $field{caption} (which is what omindex sets) before $field{title} when
593 getting a value for the hit tag's title attribute - this is consistent with
594 how the query template gets the title.
596 + Add new 'type' attribute which gives $field{type}.
598 + Add 'DBSize' attribute to <result> element.
600 + Fix double escaping of matching terms. This is only likely to affect cases
601 where a matching term contains '&'.
603 + Remove support for undocumented HILITECLASS CGI variable. There's no
604 evidence I can find using Google code search or web search that this has
605 been used anywhere, and it's difficult to handle escaping it properly in
606 the face of all the ways it could reasonably be used.
610 * Fix to compile on Microsoft Windows (ticket#350).
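
The template escaping fixes in this release guard against cross-site scripting. The general idea, shown here in Python rather than OmegaScript (`render_hit` is a hypothetical function for illustration), is that any value taken from a document or from the request is escaped at the point where it is written into HTML:

```python
from html import escape

def render_hit(title, url):
    # Both values may contain markup or quotes, so both are escaped;
    # quote=True also escapes '"' and "'" for use inside attributes.
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(title))

snippet = render_hit("<script>alert(1)</script>", 'http://example.com/?a=1&b="x"')
```

The resulting `snippet` contains `&lt;script&gt;` as text rather than an executable script element.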
612 Omega 1.2.4 (2010-12-19):
616 * Minor documentation improvements.
620 * Some iconv implementations (such as that on Mac OS X) don't handle many of
621 the commonly seen mis-punctuated charset names (e.g. UTF16, UTF_16). We now
622 check for this if iconv fails, fix up the charset name, and retry.
624 * The built-in character encoding converter now handles spaces in charset
627 * Use O_NOATIME if available and either the file is owned by the current euid,
628 or the current euid is 0 (i.e. we're running as root). This avoids updating
629 the access time of files we index which saves time. Fixes ticket#222.
631 * Report get_description() for Xapian exceptions, which provides additional
632 information above get_msg().
634 * Add boolean terms with add_boolean_term() so they get wdf of 0 and don't
635 contribute to document length.
639 + Escape wildcard patterns being passed to unzip - in the unlikely event that
640 one of these matched files in or under the current directory, we might fail
641 to extract all the files we wanted to.
643 + Add explicit support for indexing CSV files (better samples than from
644 using '-Mcsv:text/plain').
646 + Add support for indexing .msg files from Microsoft Outlook (using the Perl
647 module Email::Outlook::Message). (ticket#334)
649 + Improve --help for --mime-type option.
651 + Optionally use libmagic to detect MIME types for files for which we have no
652 extension mapping, which allows us to handle files with a misleading
653 extension, or no extension at all. (ticket#114)
655 + Add new --filter option which allows the user to specify new filters
656 provided they return UTF-8 text on stdout.
658 + If a filter command isn't installed, previously we wouldn't try it again
659 for the same file extension - now we won't try it again for the same
662 + Index the leafname of the file (without any extension) as extra keywords.
664 + Extract author from HTML, OpenDocument, and PDF files. Index it with an A
665 prefix, and add it as a field.
667 + Add support for indexing text and metadata from SVG files.
669 + Extract metadata from Microsoft Office 2007 file formats.
671 + Index text in headers and footers for .odt and .docx files.
673 + Use the CSV parser to generate a nicer sample for files of type
674 application/vnd.ms-excel.
676 + Add support for indexing Debian and RPM package files (ticket#493).
678 + Make the memory limit for filter processes the size of physical memory,
679 which is a little less arbitrary than 7/8 of this value (ticket#424).
681 + Under --duplicate=ignore, fix so that old documents which aren't seen get
682 deleted, which wasn't implemented before (to suppress this deletion, pass
685 + Rename the short option for --version from -v to -V for consistency with
686 scriptindex and many other packages, and to free up -v as the short option
687 for --verbose. For backward compatibility, "omindex -v" is handled
688 specially and still reports the version.
690 + Add --verbose option, and disable the less interesting output unless it is
693 + Deprecate "--preserve-nonduplicates" in favour of new long option
694 "--no-delete" which does the same thing, but has a clearer name.
696 + The deletion of documents pass at the end of indexing is now more
697 efficient. We track how many documents in the database we haven't seen so
698 we can stop once we've found them all (a particularly big improvement if
699 there are no documents to delete), and we now use a PostingIterator over
700 all documents which avoids needing to catch an exception for every gap in
701 the used document ids.
703 + Quietly ignore files with mimetype set to "ignore". The initial list of
704 extensions set to ignore is: .a .dll .dylib .exe .lib .o .obj .so
706 + Index file owner and read permissions, to allow finding documents with a
707 particular owner, and so searches can be restricted to documents a user is
710 + Add file size as a document value, so you can sort on it and filter by it.
714 + Fix file descriptor leak if the LOADFILE action is used on something which
719 * Make sure we write out HTTP headers when reporting an error early on.
721 * Extend $field to take an optional DOCID argument, rather than always using
722 the context from $hitlist.
724 * Add new $emptydocs command which returns a list of documents with doclength
727 * Add support for size: range filtering. Currently the end points of the range
728 have to be specified in bytes (e.g. size:102400..204800 for 100-200KB).
732 * templates/emptydocs: New template which lists documents with doclength zero.
736 * configure: Probe for any options needed to enable large file support.
737 Handling files >= 2GB isn't especially useful, but more importantly this is
738 needed to allow omindex to index files on filing systems with 64 bit inodes
739 on some platforms (e.g. 32-bit Linux).
741 * Use -no-undefined on platforms which need it to dynamically link such as
742 cygwin (the need to do this was noted in ticket#282).
746 * Fix to compile with Sun C++.
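
The O_NOATIME entry above gives the conditions under which the flag is used. A sketch of that logic in Python for Unix-like systems (`open_for_indexing` is a hypothetical name; Omega itself is C++, and the fallback on a permission error is an assumption added here for robustness):

```python
import os

def open_for_indexing(path):
    """Open path read-only, avoiding an access time update where permitted."""
    flags = os.O_RDONLY
    noatime = getattr(os, "O_NOATIME", 0)  # the constant only exists on Linux
    if noatime:
        euid = os.geteuid()
        # The kernel only allows O_NOATIME for the file's owner or for root.
        if euid == 0 or os.stat(path).st_uid == euid:
            try:
                return os.open(path, flags | noatime)
            except PermissionError:
                pass  # assumed fallback: retry without the flag
    return os.open(path, flags)
```

The function returns a raw file descriptor, so the caller is responsible for closing it with `os.close()`.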
748 Omega 1.2.3 (2010-08-24):
752 * docs/termprefixes.html: Update "flint and quartz" to "flint and chert" as
753 quartz is no longer supported. Give exact term length limit for flint and
758 * xapian-omega.spec: Don't run autoreconf - it's no longer required.
760 Omega 1.2.2 (2010-06-27):
764 * Apply getopt portability fixes from xapian-core 1.2.0, fixing build failures
765 on Mac OS X (and probably some other platforms with non-GNU getopt
766 implementations). (ticket#469)
768 Omega 1.2.1 (2010-06-22):
770 This release includes all changes from 1.0.21 which are relevant.
772 Omega 1.2.0 (2010-04-28):
774 This release includes all changes from 1.0.20 which are relevant.
778 * configure: Tell libtool not to link in deplibs on platforms where we know
781 * configure: On Linux, extract the library search path from ldconfig which
782 gives us the default entries reliably.
784 Omega 1.1.5 (2010-04-15):
786 This release includes all changes from 1.0.19 which are relevant.
788 Omega 1.1.4 (2010-02-15):
790 This release includes all changes from 1.0.18 which are relevant.
794 * Use the optimised integer to string conversion routines from xapian-core.
796 Omega 1.1.3 (2009-11-18):
798 This release includes all changes from 1.0.15-1.0.17 which are relevant.
802 * templates/query: If JavaScript is available, convert $field{modtime} to a
803 string on the client-side so that the timezone is correct. If JavaScript
804 isn't available, fall back to the existing behaviour of using UTC.
809 * configure: Default to looking for xapian-config-1.1 unless XAPIAN_CONFIG is
812 Omega 1.1.2 (2009-07-23):
814 This release includes all changes from 1.0.14 which are relevant.
820 + Handle the "macroenabled" versions of MS Office 2007 files too
823 + Extract pptx notesSlides and comments, if present. (ticket#290).
825 Omega 1.1.1 (2009-06-09):
827 This release includes all changes from 1.0.13 which are relevant.
833 + Check the last modification time of files before reindexing (ticket#342).
835 + Add "--spelling" option to index spelling correction data.
839 + Add new "spell" action for indexing spelling correction data (ticket#296).
843 * Add $suggestion and $opt{spelling} to provide access to spelling correction
846 * Add $opt{weighting} to allow the weighting scheme and parameters to be
847 specified (ticket#298).
849 * If SERVER_PROTOCOL in the environment is set to INCLUDED, then our output is
850 being included in another page (e.g. using SSI) so suppress the output of any
855 * templates/query: Offer any spelling correction QueryParser gives.
859 * configure: Sync warning flags used with GCC with xapian-core apart from
860 -Woverloaded-virtual which fires for MyHtmlParser::parse_html(). That
861 probably should be tidied up at some point, but not right now.
863 Omega 1.1.0 (2009-04-23):
869 + Make deprecated "index=nopos" an error.
873 * New OmegaScript command $transform{} which performs regular expression
874 substitutions using the PCRE library (which is now required to build Omega).
879 * The build system is now bootstrapped with newer versions of autoconf and
880 libtool which should produce smaller files and speed up configure and
883 Omega 1.0.23 (2011-01-14):
889 + Escape wildcard patterns being passed to unzip - in the unlikely event that
890 one of these matched files in or under the current directory, we might fail
891 to extract all the files we wanted to when indexing document formats like
892 OpenDocument which use a zip file container.
894 + The parser for OpenDocument metadata wasn't initialising its "state" field.
895 Often you'd be lucky and it would be initialised to zero, but this could
896 have caused misparsing of metadata in some cases.
898 * scriptindex: Fix file descriptor leak if the LOADFILE action is used on
899 something that isn't a file.
901 * If fstat() fails when trying to load a file, preserve the errno value from
902 the fstat call to report to the user.
906 * configure: Probe for any options needed to enable large file support.
907 Handling files >= 2GB isn't especially useful, but more importantly this is
908 needed to allow omindex to index files on filing systems with 64 bit inodes
909 on some platforms (e.g. 32-bit Linux).
911 * Add -no-undefined to AM_LDFLAGS on platforms which need it to dynamically
912 link such as cygwin (the need to do this was noted in ticket#282).
914 Omega 1.0.22 (2010-10-03):
918 * Fix to compile with Sun C++.
920 Omega 1.0.21 (2010-05-18):
924 * Fix build failure in freemem.cc on Microsoft Windows.
926 Omega 1.0.20 (2010-04-27):
930 * Fix build failure on Mac OS X and possibly some other platforms (regression
931 caused by fix for getopt-related warnings on Cygwin in 1.0.19).
933 Omega 1.0.19 (2010-04-15):
937 * Fix getopt-related warning on Cygwin.
939 Omega 1.0.18 (2010-02-14):
943 * Make the default charset "utf-8" not "UTF-8" as we lower case explicitly
944 specified character sets to compare to see if we need to reparse. Previously
945 XML documents which explicitly specified their character set as UTF-8 would
946 cause a needless restart of the parser.
950 + Increase the wdf boost for the document title from 2 to 5, since 2 isn't
955 + Don't abort with "Unknown Exception" if indexing is disallowed or we hit
956 </body> for a document which had an overridden character set. Fixes
959 Omega 1.0.17 (2009-11-18):
965 + On Linux, change the memory limit on external filters to use _SC_PHYS_PAGES
966 since _SC_AVPHYS_PAGES excludes pages used by the OS cache and so will
967 often report a really low value. Fixes Debian bug#548987 and ticket#358.
969 + Fix likely crash when reading output from external filter program if read()
970 is interrupted by a signal.
972 + Fix potential crash when indexing PostScript files (fixed by using delete[]
973 (not delete) for array allocated by new[]).
977 * utf8converttest: Charset "8859_1" isn't understood by Solaris libiconv, and
978 isn't a standard charset name, so just test it when using our built-in
979 converter and GNU libc.
983 * Fix build failure on Mac OS X 10.6.
985 * Also check for socketpair() in -lxnet if it isn't found without, which
986 enables resource limits on external filter programs called by omindex on
987 Solaris, and possibly some other platforms. Fixes ticket#412.
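
The memory limit entry above contrasts two sysconf values. As an illustration (Python on Linux, not Omega's code; both function names are hypothetical), total physical memory and currently free memory can be compared like this:

```python
import os

def _pages_to_bytes(name):
    """Return sysconf page count times page size, or None if unsupported."""
    try:
        return os.sysconf(name) * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None

def physical_memory_bytes():
    # Total RAM: a stable basis for a filter memory limit.
    return _pages_to_bytes("SC_PHYS_PAGES")

def available_memory_bytes():
    # Free pages only; this excludes memory used by the OS cache, so it can
    # be very small on a busy machine even though the cache is reclaimable.
    return _pages_to_bytes("SC_AVPHYS_PAGES")
```

On a machine which has been running for a while, the second value is often a small fraction of the first, which is the problem the entry describes.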
989 Omega 1.0.16 (2009-09-10):
991 * omega: Fix cross-site scripting vulnerability in reporting of exceptions
994 Omega 1.0.15 (2009-08-26):
998 * omegascript.vim: The list of OmegaScript commands in the vim mode was rather
999 out of date, and a few commands were misclassified. Fix both problems and
1000 avoid future recurrences by automatically generating those lists from the
1001 command list in query.cc.
1005 * omegascript.html: Document that $date uses UTC. (ticket#314)
1009 * query: Link to "xapian.org" rather than "www.xapian.org".
1011 * inc/toptermsjs: Use double-quotes rather than single quotes for parameter
1012 values on the <script> tag.
1016 * omindex: Implement correct handling of paths when calling external filter
1017 programs on Microsoft Windows.
1019 Omega 1.0.14 (2009-07-21):
1023 * omindex: Make sure that output is flushed after every message, not just after
1028 * Avoid infinite loop in omindex and scriptindex when reading files under
1029 Cygwin with automatic end of line translation enabled. This same bug can
1030 also manifest on Unix platforms if the file is truncated by another process
1033 Omega 1.0.13 (2009-05-23):
1039 + If the filter program needed for a file format isn't installed, report this
1040 explicitly when skipping subsequent files with the extension instead of
1041 misleadingly reporting "Unknown extension".
1043 + Make -s actually work as a short-form for --stemmer (as documented by
1044 "omindex --help" and "man omindex").
1046 + Drop the copyright info from the output of --version as it's perennially
1047 out of date and we don't report it for any other Xapian programs.
1051 + Add new "valuenumeric" action to add a document value using
1052 Xapian::sortable_serialise() to allow numeric sorting (ticket#260).
1056 * configure: Enable more GCC warnings - "-Wstrict-null-sentinel" for 4.0+,
1057 "-Wlogical-op -Wmissing-declarations" for 4.3+.
1059 Omega 1.0.12 (2009-04-19):
1063 * $log now retries a partial write, or one interrupted by a system call.
1067 * configure: Fix iconv parameter type probe not to implicitly cast a string
1068 literal to char* - this is a warning under GCC currently, but the user could
1069 pass -Werror explicitly in CXXFLAGS, and this could be promoted to an error
1070 in future GCC versions, and may already be so for some other compilers.
1072 * Overriding CXXFLAGS at make-time (e.g. "make CXXFLAGS=-Os") no longer
1073 overrides any flags required for building with Xapian.
1075 * We now actually use the compiler warning flags which configure detects.
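
The `$log` entry above refers to the usual pattern for robust writes on Unix: a write may accept only part of the data, or be interrupted before writing anything. A sketch of the retry loop (`write_all` is a hypothetical helper, not Omega's code):

```python
import os

def write_all(fd, data):
    """Write all of data to fd, retrying partial and interrupted writes."""
    view = memoryview(data)
    while len(view) > 0:
        try:
            written = os.write(fd, view)
        except InterruptedError:
            # Python 3.5+ normally retries EINTR itself; this branch is kept
            # to show the shape of the loop as it would be written in C or C++.
            continue
        view = view[written:]  # drop what was accepted and try the rest
```

Without such a loop, a log line can be silently truncated when the write is cut short.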
1077 Omega 1.0.11 (2009-03-15):
1081 * cgiparams.html: Note the technique of using a stub database file to allow a
1082 default of searching over multiple databases.
1088 + Add support for indexing Microsoft Office 2007 formats and XPS files
1091 + Fix the extraction of metadata from OpenDocument formats.
1093 + Fix "-l" which would previously always cause a segmentation fault if used
1094 ("--depth-limit" wasn't affected).
1098 * configure: The output of g++ --version changed format (again) with GCC 4.3
1099 which meant configure got "g++" for the version. Instead use the (hopefully)
1100 more robust technique of using g++ -E to pull out __GNUC__ and
1103 * configure: Turn on _FORTIFY_SOURCE where available (as we do in xapian-core).
1107 * Fix to compile when RLIMIT_AS isn't available (as on NetBSD and OpenBSD).
1108 Instead use RLIMIT_VMEM or RLIMIT_DATA if either is available, else don't try
1109 to limit the memory the filter process can use.
1111 Omega 1.0.10 (2008-12-23):
1115 * This release now uses newer versions of the autotools (autoconf 2.62 ->
1116 2.63; automake 1.10.1 -> 1.10.2). The newer autoconf fixes a regression
1117 in autoconf 2.62 (and so Omega 1.0.7) with detecting the endian-ness of some
1120 Omega 1.0.9 (2008-10-31):
1124 * docs/overview.html: Document HTML parsing a bit, including robots
1125 meta and htdig_noindex.
1129 * omega: Catch std::exception and report what its what() method returns.
1131 * omega: Remove undocumented and non-functional support for numeric sorting
1132 via CGI parameter SORT=#<slot> (SORT=<slot> works as before).
1136 * configure: Sync warning flag handling changes from xapian-core to eliminate
1137 many warnings from GCC 4.3.
1139 Omega 1.0.8 (2008-09-04):
1143 * Fix a few typos and improve wording in a few places.
1149 + If the character encoding is specified using <meta http-equiv=...> in an
1150 HTML document then reparse the document if it isn't the encoding we're
1151 already using so that any preceding <title> is converted correctly
1154 + Convert text from meta tag parameters to UTF-8 (bug#293).
1156 + Handle <meta charset="..."> (new in HTML 5).
1158 + Fix bug in HTML tag parameter parsing which was probably just a small
1159 performance penalty in real world cases, but could perhaps result in
1160 parsing bogus extra parameters in carefully contrived situations.
1164 * Add missing <signal.h>, noted on FreeBSD by Henrik Brix Andersen.
1166 Omega 1.0.7 (2008-07-14):
1170 * omegascript.html,scriptindex.html: Fix empty titles.
1176 + When indexing text files, handle UCS-2 and UTF-16 text files with a
1177 byte-order mark (BOM), and ignore any UTF-8 "byte-order" mark.
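  A rough Python illustration of the BOM handling described above. The
  function name decode_text() and the UTF-8 fallback for input without a BOM
  are assumptions made for this sketch, not a description of omindex's code.

```python
import codecs


def decode_text(raw):
    """Decode raw bytes, honouring a leading byte-order mark if present."""
    if raw.startswith(codecs.BOM_UTF8):
        # A UTF-8 "BOM" carries no byte-order information, so just skip it.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le")
    # Assumption for this sketch: no BOM means the text is UTF-8.
    return raw.decode("utf-8", errors="replace")
```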
1179 + The built-in conversion code (used when iconv isn't available) now handles
1180 UCS-2/UTF-16 with and without a BOM, and also the explicit BE and LE forms.
1184 * Overhaul the $highlight colour combinations since some were rather
1185 unreadable (Debian bug 484456).
1189 * configure: Synchronise code for working out warning flags used for builds
1190 with that used for xapian-core, which in particular handles different
1191 output formats from "gcc --version".
1195 * configure: Fix header checks to pre-include <sys/types.h> which Mac OS X
1196 needs for some other headers to work.
1198 * configure: Fix probing for iconv to work better when iconv isn't found
1199 (previously this only worked on Mac OS X with fink).
1201 * Fix compilation error on FreeBSD, introduced in 1.0.5.
1203 * In omega, cast size to unsigned before division to avoid a warning about
1208 * xapian-omega.spec: Remove "www." from xapian.org and oligarchy.co.uk URLs.
1210 Omega 1.0.6 (2008-03-17):
1214 * docs/omegascript.html: Improve formatting.
1220 + Add support for DjVu files.
1222 + If we get an error trying to read a directory entry, report it to the user
1223 rather than ignoring it.
1227 * New OmegaScript commands $addfilter, $lower, $upper.
1231 * Check "defined HAVE_SYSMP" rather than just "HAVE_SYSMP". This doesn't
1232 change behaviour, but fixes a compile warning on platforms other than Linux
1235 Omega 1.0.5 (2007-12-21):
1239 * Convert .txt docs to reStructuredText which we process to produce HTML.
1241 * Add a note inviting suggestions for additional reliable filter programs.
1243 * overview.html: omindex hasn't generated "W"-prefix terms since 0.9.7, so
1244 remove the documentation saying it does.
1250 + If a file's extension isn't found in the mime_map and contains uppercase
1251 ASCII characters, check for the lower cased extension (so .PDF and .Pdf
1252 behave the same way as .pdf, unless you deliberately add different mappings
1255 + '-f' is documented by --help as a short option for '--follow', but wasn't
1256 previously actually recognised.
1258 + Limit filter programs to 7/8 of free physical memory on platforms where we
1259 know how to determine this statistic (currently at least Linux, FreeBSD,
1260 IRIX, HP-UX; probably Solaris and a few others too). This helps to prevent
1261 runaway filters from causing a denial of service (bug#111).
1263 + Avoid rereading uncompressed AbiWord documents in order to calculate their
1268 + Now inserts a ':' between prefix and term, using the same criteria which
1269 Xapian::QueryParser does.
1271 + The 'BOOLEAN' action now ignores an empty input rather than adding just the
1274 + The 'UNIQUE' action now issues a warning for empty input but otherwise
1279 * Add explicit includes of C headers needed to build with the latest snapshots
1282 Omega 1.0.4 (2007-10-30):
1286 * If an OmegaScript template specifies the same field name as both a boolean
1287 and a probabilistic term prefix then previously the boolean setting would
1288 be ignored (e.g. $setmap{prefix,foo,A}$setmap{boolprefix,foo,H}). Now this
1289 generates an error. If you set prefixes in your templates, you may wish to
1290 check them over before upgrading.
1292 Omega 1.0.3 (2007-09-28):
1296 * Distribution tarballs are now in the POSIX "ustar" format since it saves
1297 a few KB and we need to use it for xapian-core anyway.
1301 * Expand the output of 'mbox2omega --help' and refer the reader to it from
1302 docs/scriptindex.txt.
1308 + Add support for indexing AbiWord documents and TeX DVI files.
1310 + Impose a 5 minute CPU time limit on filter programs to prevent problems if
1311 a filter program goes into an infinite loop on a malformed input. Partly
1316 + Fix line number tracking in dump files.
1320 * Add $muldiv{A,B,C} which calculates int(A*B/C).
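  For illustration, the arithmetic int(A*B/C) can be sketched in Python as
  below. The function name muldiv() is hypothetical, and whether Omega
  evaluates this with integer or floating point arithmetic is not stated
  here; this sketch uses exact integers and truncates towards zero, as int()
  does. Behaviour for C equal to zero is not covered by the entry above (this
  sketch simply raises ZeroDivisionError).

```python
def muldiv(a, b, c):
    """Return int(a * b / c), truncating towards zero."""
    product = a * b
    quotient = abs(product) // abs(c)
    # The result is negative only when product and c have opposite signs.
    return quotient if (product < 0) == (c < 0) else -quotient
```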
1322 * Fix bug in decimal fraction in $size for files >= 1M in size.
1328 + Set HTML charset to utf-8 since that's what databases now are by default.
1330 + Restyle to use CSS to draw a "score bar" instead of using images.
1332 + Rework the layout of each hit.
1334 + Add popup hints on mouse-over for various items.
1336 + Tidy up some HTML gremlins.
1338 Omega 1.0.2 (2007-07-05):
1342 * scriptindex.txt: Fix typo.
1348 + If --url isn't passed, default to "/", but print a warning noting that this
1349 default has been used (at least for now).
1351 + Report files that aren't indexed because their extensions aren't
1356 * Value of XAPIAN_CONFIG supplied to configure is now passed to distcheck,
1357 to ensure that it works with uninstalled copies of Xapian.
1361 * Fix test programs to build with a development snapshot of GCC 4.3.
1363 Omega 1.0.1 (2007-06-11):
1367 * overview.txt: As of 1.0.0, we no longer use pstotext for PostScript, but
1368 instead use ps2pdf followed by pdftotext (since this works for Unicode).
1370 * scriptindex.txt: Document that you can delete a document by supplying a new
1371 document which only contains the unique term.
1375 * Fix bug in HTML parser - if the text between two tags consisted entirely of
1376 whitespace it would just be ignored which could run words together if
1377 the tags didn't produce implicit whitespace. This bug dates back to at least
1380 * omindex: Under Linux (and probably some other platforms) struct dirent can
1381 tell us the type of a directory entry for some filing systems, so make use of
1382 this to avoid calling stat() (or lstat()) unnecessarily - when indexing
1383 /usr/share/doc on my Linux box, this saves about 14000 explicit calls to
1384 stat() (leaving about 7000).
1388 * Fix handling of query parsing errors (broken by changes in 1.0.0).
1392 * The required automake version has been lowered to 1.8.3, so RPMs can now be
1393 built on RHEL 4 and SLES 9.
1395 Omega 1.0.0 (2007-05-17):
1399 * Omega and the indexers now work in UTF-8. If iconv() is available, omindex
1400 will use it to convert documents from other formats, otherwise it has
1401 built-in support for UTF-8 and ISO-8859-1; omindex knows how to run the
1402 various external filter programs to generate UTF-8 output; scriptindex
1403 assumes input is already in UTF-8.
1405 * Change the project name (used to name tarballs, and default installation
1406 paths) to "xapian-omega" since that's what the RPMs and Debian packages
1407 already use (there's a Rogue-like game called Omega).
1411 * docs/overview.txt: Document what each of the OmegaScript templates does.
1413 * docs/quickstart.txt: Assorted minor improvements.
1415 * docs/termprefixes.txt: Document new 'Z' prefix, and that the 'R' and 'W'
1416 prefixes are no longer used by Xapian.
1418 * docs/cgiparams.txt: FMT isn't limited to just `a-z' - the actual restriction
1419 is that it may not contain `..'.
1421 * docs/scriptindex.txt: Explicitly note that index=nopos is deprecated
1422 (scriptindex already emits a warning).
1424 * NEWS: Add note that Omega < 0.8.0 NEWS entries are in the xapian-core NEWS
1431 * Updated to use the new Xapian::TermGenerator class. This means that the
1432 indexing strategy has changed.
1434 * "--help" now reports the default stemming language (i.e. "english").
1436 * Implement new sample generating function which normalises all runs of
1437 whitespace to a single space, and fixes invalid UTF-8 in the sample.
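  A small Python sketch of the sample clean-up described above. The name
  make_sample() is hypothetical, and two details are assumptions: invalid
  UTF-8 is replaced with U+FFFD, and only ASCII whitespace is collapsed.
  Omega's actual function may differ on both points.

```python
import re


def make_sample(raw):
    """Return text with invalid UTF-8 repaired and whitespace runs collapsed."""
    # Assumption: invalid byte sequences become U+FFFD replacement characters.
    text = raw.decode("utf-8", errors="replace")
    # Collapse each run of ASCII whitespace to a single space.
    return re.sub(r"[ \t\r\n\f\v]+", " ", text)
```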
1441 + We now index PostScript by converting to PDF with ps2pdf and then indexing
1442 that. This allows us to index PostScript files containing Unicode
1443 characters outside of ISO-8859-1, and also means we now get metadata from
1444 PostScript files. The downside is it is quite a bit slower.
1446 + Add support for indexing MS Works documents using wps2text (part of
1449 + Don't index empty files.
1453 + Fix optimisation of "load truncate=N" to actually work!
1455 + The "truncate" action knows not to chop off a multibyte UTF-8 character.
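  The idea of not splitting a multibyte UTF-8 character can be sketched as
  follows. This is an illustrative Python version with a hypothetical name,
  truncate_utf8(); it assumes the input is valid UTF-8 and ignores any other
  rules the real "truncate" action applies.

```python
def truncate_utf8(data, max_bytes):
    """Truncate UTF-8 bytes to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes have the form 0b10xxxxxx. Back up until data[end]
    # is the first byte of a character, so the cut falls between characters.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```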
1457 + Update short option list for scriptindex to match documented usage (-h, -V
1458 and -s were not working).
1460 + Remove -q and -u options - they no longer do anything and are only accepted
1461 for compatibility with really old versions (0.6.1 and earlier for -q; 0.7.5
1462 and earlier for -u).
1466 * Add an alternative implementation of date range filtering which uses a
1467 MatchDecider. This allows everything that the existing implementation does,
1468 plus you can support sorting on a choice of dates (e.g. first published or
1469 last updated), and filtering works to a resolution of a minute rather than a
1470 day. Set CGI parameter DATEVALUE to enable this, and to specify the value to
1471 use. Since omindex now adds the last modified date as value 0, this will
1474 * Enhance $substr{} to accept a negative length (meaning to count back from the
1477 * New CGI parameters to allow finer control of sorting and ranking - SORTAFTER
1480 * The sorting options are now encoded in $filters so Omega can automatically
1481 reset to page 1 if they are changed.
1483 * Add new OmegaScript $weight command which returns the raw document weight -
1484 mostly useful for debugging purposes.
1486 * $topterms{} now generates unstemmed terms.
1488 * $prettyterm{TERM} has been updated to fit with changes to the term generation
1491 * Add 'you' and 'your' as stopwords.
1493 * $filesize{SIZE} enhanced to return a decimal point for K, M, and G (e.g.
1494 "2.1K" and "4.0M" rather than "2K" and "4M"); $filesize{0} is now "0 bytes";
1495 $filesize{1} is now "1 byte"; $filesize{SIZE} where SIZE is negative is now
1498 * Remove $freqs as it has been deprecated for ages.
1500 * Remove support for xB, xDATE1, xDATE2, xDAYSMINUS, and xDEFAULTOP which were
1501 deprecated in favour of xFILTER in 0.7.5 (over 3 years ago).
1503 * Remove deprecated aliases for CGI parameters (deprecated in 0.6.3 or 0.6.5,
1504 more than 3.5 years ago): RAW_SEARCH (now RAWSEARCH), DATE1 (now START),
1505 DATE2 (now END), DAYSMINUS (now SPAN but with slightly different semantics),
1506 and MIN_HITS (now MINHITS).
1508 * Remove "bias_weight" and "bias_halflife" CGI parameters since they rely on
1509 Enquire::set_bias() which has been removed.
1513 * The 'query' template no longer uses $topterms by default.
1515 * New 'topterms' template provides a query template with $topterms support.
1517 * Template fragments which aren't intended for direct use have been moved to
1518 an "inc" subdirectory.
1522 * md5test: Add tests for MD5 code.
1526 * `./configure --enable-quiet' already allows you to specify at configure time
1527 to pass `--quiet' to libtool. Now you can override this at make-time by
1528 using `make QUIET=' (to turn off `--quiet') or `make QUIET=y' (to turn on
1531 * configure: Disable probes for f77, gcj, and rc completely by preventing
1532 the probe code from even appearing in configure - this reduces the size of
1533 configure by 29% and should speed it up significantly.
1537 * Fixed to build with GCC 4.3 snapshot.
1539 * We now make use of the safe*.h portability headers from xapian-core.
1541 * Ensure that the result of snprintf is zero terminated since MSVC's snprintf
1542 is broken (by design it seems).
1544 * configure: xapian-config --cxxflags now includes -ptused for SGI's C++
1545 compiler, so we don't need to probe for it here.
1547 * configure: Perform a link test for posix_fadvise to fix misdetection on
1550 Omega 0.9.10 (2007-03-04):
1554 * docs/omegascript.txt: Rewrite introductory paragraph. Note that
1555 whitespace is significant, and add explicit warning to $setmap.
1557 * docs/termprefixes.txt: Expand section on boolean prefixes, showing
1558 how to generate them using scriptindex, and how to allow them to be
1559 selected in an HTML form.
1563 * omindex: Generate correct MD5 checksums on big-endian platforms.
1567 * Fix $substr{} with negative start to actually work.
1569 * Fix $substr{} to never cause a C++ exception.
1573 * omega.spec.in: Remove "." from the end of the Summary.
1575 Omega 0.9.9 (2006-11-09):
1579 * Ship our custom INSTALL file rather than the generic one from autoconf which
1580 we've accidentally been shipping instead since 0.9.5.
1584 * scriptindex: The "date" action no longer modifies the value it operates on
1585 (it was never meant to!)
1589 * Report an error if $setmap is called with an even number of parameters.
1590 An incorrect example in the documentation used to suggest this, so it's
1591 particularly useful to catch this case.
1595 * RPMs: Prevent binaries getting an rpath for /usr/lib64 on FC6.
1597 Omega 0.9.8 (2006-11-02):
1601 * $substr where the start is negative and longer than the string (e.g.
1602 $substr{abcd,-5,1}) wasn't working as intended.
1606 * configure: Tell AC_CHECK_HEADERS to suppress its backward compatibility mode,
1607 so it only checks headers with the compiler. This speeds up configure a
1608 little, and is what we do elsewhere.
1610 * configure: Warning flags for GCC weren't actually getting used. Fix this to
1611 work and use the same warning flags for GCC and Intel C++ as xapian-core does.
1612 Fix all the warnings this uncovered!
1614 * omega,omindex,scriptindex: Remove some old unused code.
1618 * Ensure that we always pass an unsigned char value to isupper(), toupper(),
1619 etc as they are undefined on other values (glibc makes them work for signed
1620 char values too, but this is an extension).
1622 * configure: Pass magic options to SGI's C++ compiler to allow linking of
1625 * configure: IRIX doesn't allow stdint.h to be included from C++ so we need
1626 a smarter configure test than AC_CHECK_HEADERS.
1628 * Fix warnings from SGI's C++ compiler.
1630 Omega 0.9.7 (2006-10-10):
1634 * omegascript.txt: Note that (by design) an omegascript template can't
1635 contain an infinite loop.
1637 * termprefixes.txt: "$setmap{title,S}" should be "$setmap{prefix,title,S}".
1639 * Use the default paths to the database directories and the omega CGI binary in
1642 * README: Update reference to "CVS" to say "SVN".
1646 * Don't get confused by "a<b" in Javascript in a <script> tag. Fixes bug#91.
1648 * Support htdig's "ignore this bit" comments.
1650 * Don't generate terms with more than 3 trailing symbols ('-', '+', or '#').
1654 + Add the file last modified time as value #0.
1656 + Generate an MD5 checksum of each file indexed and store it in value #1
1657 to allow duplicates to be collapsed.
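  To show why a per-file MD5 checksum makes duplicates easy to collapse, here
  is a short Python sketch. The helper names md5_of_file() and
  group_duplicates() are hypothetical; omindex stores the checksum in a
  document value and leaves collapsing to search time, which this sketch does
  not attempt to reproduce.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the 16-byte MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def group_duplicates(paths):
    """Return lists of paths whose contents have the same MD5 digest."""
    groups = {}
    for path in paths:
        groups.setdefault(md5_of_file(path), []).append(path)
    return [group for group in groups.values() if len(group) > 1]
```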
1659 + Store the file's last modified time in the document data as "modtime" so it
1660 shows up in search results (and tweak the query template so the display of
1661 this information looks nicer). Don't add "modtime" field if the timestamp
1664 + Run pdfinfo once and pull out the fields we want using string operations,
1665 instead of running it twice filtered through sed.
1667 + Parse the XML from OpenDocument and OpenOffice using new subclasses of
1668 HtmlParser. Only extract meta.xml once.
1670 + Add "size" field to document data.
1672 + Run xls2csv on MS Excel files, run catppt on MS Powerpoint files, and also
1673 index MS Word templates (.dot) the same way as .doc files.
1675 + Don't generate 'W' terms since omega doesn't use them.
1677 + If a filter program isn't installed, then don't try it again for the same
1678 extension (not perfect but an improvement - previously we indexed an empty
1681 + If popen() fails, treat it as a read error.
1685 + Add new "load" action to allow the contents of an external file to be
1688 + Fix check for whether a record has content in the case where the same field
1689 is processed more than once.
1693 * Add $pack and $unpack OmegaScript commands to allow big endian binary values
1694 to be encoded and decoded (for use with omindex's lastmod in value #0).
1696 * omega.conf: Fix code which reads omega.conf to be line based as documented
1697 rather than the wacky whitespace based scheme that was actually implemented.
1698 Also we now allow "#" comments and blank lines in omega.conf.
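  A line-based reader of the kind described above might look like this in
  Python. The name parse_config() is hypothetical, and the layout assumed
  here (a setting name, whitespace, then the value on the same line) is an
  assumption for illustration rather than a specification of omega.conf.

```python
def parse_config(text):
    """Parse line-based settings, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed layout: NAME <whitespace> VALUE, one setting per line.
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings
```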
1700 * Fix $highlight{} to work with capitalised words (it used to work but
1701 regressed in 0.8.2).
1703 * Use '\t' to separate terms in xP since filter terms might contain '.'. Fixes
1708 * Add htmlparsetest which tests the MyHtmlParser class.
1712 * Makefile.am: Make use of the dist_ prefix to avoid having to list files in
1713 EXTRA_DIST as well as in *_SCRIPTS, *_DATA, and man_MANS.
1715 * Makefile.am: Prefer $(sysconfdir) to @sysconfdir@ since the former can be
1716 overridden on the "make" command line.
1720 * xapian-config will now switch Sun's C++ compiler into ANSI C++ compliant
1721 mode, so remove all the special case bits of code added for just this one
1724 * omindex: Fix escaping of filenames to cast characters to "unsigned char" so
1725 that isalnum() works correctly everywhere. Not a security hole as dangerous
1726 characters were still being escaped.
1728 * Call pclose() not fclose() on a FILE* obtained from popen(). This bug could
1729 cause us to run out of file descriptors on some platforms.
1731 * configure: Check for strftime.
1735 * omega.spec.in: Include documentation in the RPM package.
1737 Omega 0.9.6 (2006-05-15):
1741 * docs/omegascript.txt: Clarified description of $now.
1745 * scriptindex: Fix "index" and "indexnopos" without a prefix to set the weight
1746 correctly (bug introduced in 0.9.5).
1750 * Added new OmegaScript commands $filterterms and $substr.
1754 * configure: Update snprintf detection to match xapian-core.
1756 * Fix MSVC warnings.
1760 * omega.spec.in: Create and package /var/lib/omega/cdb and /var/log/omega.
1762 Omega 0.9.5 (2006-04-08):
1766 * README: Add pointer to documentation.
1768 * Added man pages for omindex and scriptindex, generated using help2man.
1774 + If we fail to open the index script, die with an error (previously we
1775 acted as if an empty file was specified).
1777 + Warn about a useless "weight" action, even if it's followed by another
1778 non-useless action (e.g. "field") - previously we only warned if it
1779 was last or followed only by other useless actions.
1781 + Warn if "unique=<prefix>" is used without a corresponding
1782 "boolean=<prefix>" on the same line.
1784 + Warn that "index=nopos" is deprecated and should be replaced by
1787 + Add explanatory text "(note that actions are executed from left to right)"
1788 when reporting useless actions.
1790 + Added new "hash" command to allow hashed terms to be generated from long
1791 URLs like omindex does.
1793 * htdig2omega.script,mbox2omega.script: Make use of the new scriptindex "hash"
1796 * dbi2omega: Check DBIDRIVER environmental variable to allow a driver other
1797 than mysql to be specified without modifying the script.
1801 * Fix $opt[fieldnames] handling. Previously it would try to kick in if you
1802 didn't set fieldnames but set any alphabetically later option! The symptom
1803 was that $field{} would stop working (bug#72).
1807 * omindex,omega: Tweaks for MSVC compilation.
1809 Omega 0.9.4 (2006-02-21):
1813 * COPYING: Updated FSF address.
1815 Omega 0.9.3 (2006-02-16):
1819 * overview.txt: The U prefix (URL term) was grouped with the date searching
1820 prefixes, but it makes more sense to group it with the prefixes relating to
1821 parts of the URL (H for hostname, P for path, etc).
1823 * overview.txt: Add pointer to documentation of the supported query syntax.
1825 * omegascript.txt: Improve descriptions of $cgi, $collapsed, $value, $version.
1827 * termprefixes.txt: Fix typo.
1831 * omindex: add --preserve-nonduplicates / -p option to not delete any documents
1832 that aren't updated, in replace duplicates mode (so that multiple runs of
1833 omindex on different subsites don't stomp on each other).
1835 * omindex,scriptindex: Add "--stemmer" option to omindex and scriptindex
1836 to allow the stemming language to be set. Fixes bug#11.
1838 * omindex,scriptindex: More consistent --help and --version output.
1840 * omindex: Add support for OpenDocument format mimetypes and extensions out of
1841 the box. Previously you could index them but had to pass a "-m" option for
1842 each OpenDocument filename extension you wanted to handle.
1844 * scriptindex: The "-q" option no longer actually controls anything. Just
1845 ignore it for backwards compatibility (and don't document it in --help).
1849 * If executing an OmegaScript command causes a Xapian exception to be thrown,
1850 catch it and copy the error message into error_msg (which is read by the
1851 $error command). This allows such errors to be reported in a nicer way.
1853 * Added "SORTREVERSE" CGI parameter which allows the sort order to be reversed
1854 when sorting on a value. Removed "SORTBANDS" CGI parameter since it no
1855 longer does anything.
1857 * Added $find{LIST,STRING} to return the subscript of the first occurrence of
1858 string STRING in list LIST.
1860 * Added $lookup{CDBFILE,KEY} OmegaScript command to perform a lookup in a CDB
1863 * Added new feature which allows you to avoid storing fieldnames in every
1864 document. Instead you just store the field values, one per line, and add
1865 something like "$set{fieldnames,$split{caption sample url}}" to the
1866 OmegaScript template to specify the fieldnames to use. This can save a lot
1867 of disk space for a large database.
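  The pairing of stored values with template-supplied field names can be
  pictured with the Python sketch below. The function name
  parse_document_data() and the sample values are made up for illustration;
  how Omega treats a mismatch between the number of names and lines is not
  covered by the entry above.

```python
def parse_document_data(data, fieldnames):
    """Pair each line of document data with the field name at that position."""
    values = data.split("\n")
    # zip() stops at the shorter input; real behaviour on a mismatch may differ.
    return dict(zip(fieldnames, values))


fields = parse_document_data(
    "A caption\nSome sample text\nhttp://example.com/",
    "caption sample url".split(),
)
```

  Here fields["url"] is "http://example.com/", without the name "url" ever
  being stored in the document itself.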
1869 * Add new "$split{}" OmegaScript command which splits a string to give an
1872 * Fix $url{} to escape "+" to "%2b". Also fix encoding of top-bit-set
1873 characters on platforms where char is signed by default.
1875 * Speed up $highlight{} - only compare terms which are the same length.
1877 * Reduce memory usage if a lot of documents are marked as relevant.
1881 * query: Make the page title shorter so there's more chance it will fit on icon
1884 * opensearch: Add missing escaping.
1886 * godmode: If a non-existent docid is specified, report the error and prompt
1887 the user to enter another docid. Fixes bug#60.
1891 * omega: Fix printf type mismatch on 64 bit platforms.
1893 * omega: Cast time_t to unsigned long to avoid problems on 64bit platforms.
1895 * Use snprintf where available.
1897 * Write top-bit set characters using \xXX notation to avoid warnings from
1898 Intel's C++ compiler.
1900 Omega 0.9.2 (2005-07-15):
1902 * omega: Changed $highlight so if OPEN and CLOSE aren't specified, they default
1903 to highlighting each word from the query with a different background colour
1904 like gmane does (previous default was to use '<strong>' and '</strong>').
1906 * omega: Call QueryParser::set_database() as this is now used to decide what to
1907 do for terms like "C#".
1909 * omega: Added the ability to set boolean prefixes for the QueryParser by
1910 setting a "boolprefix" map in the omegascript template.
1912 * omega: Added $length{} and $stoplist{} commands to OmegaScript.
1914 * scriptindex: Fix infinite loop if there's no newline at the end of a dumpfile.
1916 * docs/termprefixes.txt: Explain how to use termprefixes with scriptindex and
1917 omega, since that's what most people will want to know.
1919 * docs/omegascript.txt: Use standard "S" prefix for title in example for
1920 $setmap, rather than "XT".
1922 Omega 0.9.1 (2005-06-06):
1924 * Releases are now created using libtool 1.5.18 and automake 1.9.5.
1926 * Updated RPM packaging.
1928 Omega 0.9.0 (2005-05-13):
1930 * Updated for 0.9.0 API changes.
1932 * omindex/scriptindex: Generate terms like "c#".
1934 * Added mbox2omega script which allows a mail folder to be indexed using
1935 scriptindex. Mostly it's an example as there's no mechanism included to show
1936 the full original message.
1940 * The configuration file is now looked for differently - you can now set
1941 the environmental variable OMEGA_CONFIG_FILE. See docs/overview.txt for
1944 * $highlight can now highlight terms like "C#".
1946 * Add new template 'opensearch' to implement basic opensearch feeds of search
1951 * URL hashing previously depended on sizeof(long) so databases weren't totally
1952 portable between platforms. This is now fixed, but to do so we've had to
1953 break compatibility with databases built on platforms with 64 bit longs
1954 with URLs > 228 bytes.
1956 * Removed useless "DUPE_duplicate" option.
1958 * Added support for indexing Perl "pod" documentation using pod2text.
1960 * Replaced -l/--no-recurse with -l/--depth-limit which takes an argument
1961 allowing recursion to be restricted to any depth, not just 0 or infinity!
1963 * Extend -M/--mime-type to allow an existing mapping to be removed by omitting
1966 * Fixed code so that we get lstat() prototype on Linux systems where we have
1971 * Improved handling of extra blank lines in dump file.
1973 * Strip multiple \r characters from end of line.
1975 * Complain if a dump file doesn't appear to have been = escaped correctly.
1977 * Flush database after each input file to ensure all changes from a file
1982 * docs/omegascript.txt: Clarify $field description slightly.
1984 * docs/cgiparams.txt,docs/omegascript.txt: Fixed 3 references to OmXxxx classes.
1986 * docs/termprefixes.txt: Added a single document covering all aspects of term
1989 * docs/omegascript.txt: Moved $collapsed into correct place alphabetically!
1991 * docs/cgiparams.txt,docs/overview.txt: Improved description of how B filters
1992 are handled when building the query.
1994 * docs/scriptindex.txt: Note that actions are applied in the specified order.
1996 Omega 0.8.5 (2004-12-23):
1998 * README,INSTALL: Proper installation instructions.
2000 * omega: If an exception is thrown, make sure that the HTTP headers
2001 get written so that we don't cause "500 Internal Server Error".
2002 This problem was introduced by the change to allow a user specified
2003 Content-Type in 0.8.0. Partly addresses bug#60.
2005 * scriptindex: Fixed "Unknown Exception" when trying to "unhtml" text which
2006 contains "</body>" (bug#61). This bug was introduced in 0.8.4.
2008 * omindex/scriptindex: <h1> - <h6> and </h1> - </h6> now leave a space in the
2009 dumped HTML. This bug was introduced in 0.8.4 - before that any tag left
2010 a space in the dumped HTML.
2012 * omindex: Only try to delete removed documents in "replace duplicates" mode
2013 (which is the default).
2015 * omindex: Change behaviour of crawler such that it doesn't follow symbolic
2016 links any more. The new "--follow" command line option turns following of
2019 * dbi2omega: Add a comment to the start of the file detailing what
2022 Omega 0.8.4 (2004-12-08):
2024 * omindex,scriptindex: Improved HTML to text conversion - now we strip
2025 leading and trailing whitespace and convert all other consecutive groups of
2026 whitespace to a single space. Also the parser now knows that some tags
2027 should be regarded as word breaks and some shouldn't (previously all tags
2028 were treated as word breaks).
2030 * omindex: Removed bogus extra line from code which was meant to
2031 truncate samples, titles, etc at a word boundary, but has never actually
2034 * omindex: Added hooks for indexing the following formats: OpenOffice (requires
2035 unzip), MS Word (requires antiword), Wordperfect (requires wpd2text), RTF
2038 * omindex: If a filename to be passed to a filter program has a leading "-",
2039 protect it from possible interpretation as an option by prepending "./".
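  The protection described above amounts to a one-line check. A Python
  sketch, with the hypothetical name protect_filename():

```python
def protect_filename(filename):
    """Prefix './' so a leading '-' cannot be read as a command-line option."""
    if filename.startswith("-"):
        return "./" + filename
    return filename
```

  For example, a file called "-rf.doc" would be passed to the filter program
  as "./-rf.doc", which names the same file but cannot be mistaken for an
  option.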
2041 * omega: When there's only a boolean query we promote it to be the query.
2042 Tweaked so we use boolean weights in this case.
2044 * omega: Use Query::empty() instead of the now deprecated Query::is_empty().
2046 * omega,omindex,scriptindex: Use the new Database/WritableDatabase
2049 * templates/godmode: Finished off godmode template.
2051 * Compile everything as C++.
2053 * Check snprintf actually works - some older versions don't implement C99
2056 * XAPIAN_FLAGS already links with xapianqueryparser so remove
2057 -lxapianqueryparser from omega_LDADD as it was causing link errors on cygwin.
2059 Omega 0.8.3 (2004-09-20):
2061 * scriptindex: --version now actually reports the version. --help now exits
2062 with status 0 rather than status 1.
2064 * RPM packaging: Updated. The most notable change is that the RPM is now
2065 called xapian-omega because there's already an omega RPM (in Fedora Core at
2066 least) which is a game. Also htdig2omega and htdig2omega.script are now
2067 included in the RPM.
2069 * Install htdig2omega.script in ${prefix}/share/omega/ rather than
2072 Omega 0.8.2 (2004-09-13):
2074 * omega: $highlight now handles accented characters (bug#9).
2076 * omega: Use new checkatleast parameter to Enquire::get_mset to implement
2079 * omindex: When running with "replace duplicates" mode (the default), detect
2080 documents removed since the last indexing run and delete them from the
2083 * omindex: Use the new WritableDatabase::replace_document(term, doc) method.
2085 * scriptindex: Report index script file name and line number when
2086 reporting errors in it. Added warning for redundant actions,
2087 such as "truncate" as the last action in a rule.
2089 * templates/query: Always report if the database is not found - previously we
2090 only did so if there was a query.
2092 * templates/query: Fixed missing </center> tag which happened in certain cases.
2094 * docs/omegascript.txt: Added note that $add{$hit,1} gives
2097 * Now includes htdig2omega and htdig2omega.script which allow you to crawl
2098 remote websites with ht://dig, then build a searchable index of them with
2101 * Link with -lxapianqueryparser, not -lomqueryparser.
2103 Omega 0.8.1 (2004-06-30):
2105 * omindex: Renamed hash() to hash_string() to avoid colliding with something
2108 * omega: Changed MORELIKE to pick up to 40 terms, rather than up to 6 (feedback
2109 on the mailing list suggests this gives much better results).
2111 * scriptindex: Added explicit catch for std::bad_alloc.
2113 Omega 0.8.0 (2004-04-19):
2115 * scriptindex: Change default to *not* overwriting the database (use
2116 --overwrite if you really want to do this); -u is now accepted but ignored.
2118 * scriptindex: Use getopt for option parsing.
2120 * omindex: Added --overwrite option which forces an existing database to be
2121 deleted before indexing begins.
2123 * templates/xml: Correct spelling of `relavence' to `relevance'. NB: if you're
2124 parsing the XML output, you'll need to fix this spelling in your parser!
2126 * templates/xml: Now set HTTP header: "Content-Type: application/xml".
2128 * templates/xml: Remove unused OmegaScript code:
2129 `$set{topterms,$or{$ne{$msize,0},$query}}'.
2131 * indextext.cc,omindex.cc,scriptindex.cc: Updated to use add_term() instead of
2134 * omega: Added $httpheader Omegascript to allow arbitrary HTTP headers and
2135 alternative Content-Type headers to be specified.
2137 * omega: If the probabilistic query was bad, don't try to run the match.
2139 * omega: Don't crash if there's a date filter but no probabilistic query.
2141 * omindex/scriptindex: Raw terms with a multicharacter prefix are now indexed
2142 with a : inserted (e.g. as XFOO:Rterm). This matches what the query parser
2145 * omindex/scriptindex: Don't create R terms for terms which start with a digit.
2147 * omindex: Use O_STREAMING and/or posix_fadvise() when reading files to be
2148 indexed (if available). This helps to keep the Xapian database in cache,
2149 and should greatly improve indexing throughput.
2151 * docs/scriptindex.txt: Make more explicit that boolean produces a *single*
2154 * docs/cgiparams.txt: Note that START and END should be in the format YYYYMMDD.
2156 For NEWS entries for Omega versions prior to 0.8.0, see the xapian-core NEWS