Public Git Hosting - xapian.git/commit

commit	caca4e6999620c207584f061647a3ec8a5c96aaf
author	Olly Betts <olly@survex.com>
	Tue, 13 Mar 2018 02:46:25 +0000 (15:46 +1300)
committer	Olly Betts <olly@survex.com>
	Tue, 13 Mar 2018 03:57:11 +0000 (16:57 +1300)
tree	45909647b7203ff354a6ec9b627d0bd8eac2d306
parent	eaf81734cbcab99b54320d683d90bdad237c40b7

[honey] Better per-term wdf upper bound

Previously we used min(cf(term), wdf_upper_bound(db)) for the
per-term upper bound - that's tight for any terms which attain that
upper bound, and for terms with termfreq == 1, which are common
in the database (e.g. 66% for a database of Wikipedia), but probably
much less common in searches.

We now use max(first_wdf(term), cf(term) - first_wdf(term)) when
termfreq > 1, which means terms with termfreq == 2 will also attain
their bound (another 11% for the same database) while terms with higher
termfreq but below the global bound will get a tighter bound.
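
The two bounds described above can be compared with a small sketch. This
is not the code from the commit: the type alias, function names and
parameter names are hypothetical, and clamping the new bound to the
database-wide bound is an assumption made here (the minimum of two valid
upper bounds is still a valid upper bound). The new bound holds because
the wdf values of a term sum to cf(term), so every document other than
the first has wdf of at most cf(term) - first_wdf(term).

```cpp
#include <cstdint>

// Hypothetical stand-in for Xapian::termcount.
using termcount = std::uint32_t;

// Old per-term bound from the commit message:
// min(cf(term), wdf_upper_bound(db)).
constexpr termcount old_wdf_bound(termcount cf, termcount db_wdf_bound) {
    return cf < db_wdf_bound ? cf : db_wdf_bound;
}

// New per-term bound from the commit message, used when termfreq > 1:
// max(first_wdf(term), cf(term) - first_wdf(term)).
// Assumes first_wdf <= cf, which holds for consistent statistics.
// Assumption of this sketch: the result is also clamped to the
// database-wide bound, and termfreq <= 1 falls back to the old bound
// (which is exact when termfreq == 1, since the single wdf equals cf).
constexpr termcount new_wdf_bound(termcount termfreq, termcount cf,
                                  termcount first_wdf,
                                  termcount db_wdf_bound) {
    return termfreq <= 1
        ? old_wdf_bound(cf, db_wdf_bound)
        : old_wdf_bound(first_wdf > cf - first_wdf ? first_wdf
                                                   : cf - first_wdf,
                        db_wdf_bound);
}
```

For example, a term with termfreq == 2, cf == 10 and first_wdf == 3 has
wdf values 3 and 7, so the new bound of 7 is attained, whereas the old
bound would be 10 (given a database-wide bound of at least 10).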

xapian-core/backends/honey/honey_database.cc
xapian-core/backends/honey/honey_postlisttable.cc
xapian-core/backends/honey/honey_postlisttable.h