From c0714a99f23bd61bcde14a34091b1e5c55a4ffd0 Mon Sep 17 00:00:00 2001 From: Jared Camins-Esakov Date: Thu, 17 May 2012 20:05:29 -0400 Subject: [PATCH] Bug 6557: Record bib popularity in totalissues Because updating the total issues count associated with a bibliographic record on issue could cause a significant load on the server, this commit adds the syspref UpdateTotalIssuesOnCirc (which defaults to OFF to match existing behavior). The syspref has the following description: Do/Do not update a bibliographic record's total issues count whenever an item is issued (WARNING! This increases server load significantly; if performance is a concern, use the update_totalissues.pl cron job to update the total issues count). Bug 6557: automatically increment totalissues Adds the ability to automatically increment biblioitems.totalissues whenever an item is issued. To test: 1) Choose a record with at least one item that can circulate 2) Check the value of 942$0 (you may need to look at the plain MARC view on the OPAC). Most likely there won't be any 942$0 at all 3) Enable UpdateTotalIssuesOnCirc 4) Check out the item you selected 5) Check the value of 942$0 (you may need to look at the plain MARC view on the OPAC). That value should now be one greater than before 6) Discharge the item 7) Disable UpdateTotalIssuesOnCirc 8) Check out the item you selected again 9) Check the value of 942$0 (you may need to look at the plain MARC view on the OPAC). That value should not have changed Bug 6557: add script to update totalissues from stats NAME update_totalissues.pl SYNOPSIS update_totalissues.pl --use-stats update_totalissues.pl --use-items update_totalissues.pl --commit=1000 update_totalissues.pl --since='2012-01-01' update_totalissues.pl --interval=30d DESCRIPTION This batch job populates bibliographic records' total issues count based on historical issue statistics. --help Prints this help -v|--verbose Provide verbose log information (list every bib modified). --use-stats Use the data in the statistics table for populating total issues. --use-items Use items.issues data for populating total issues. Note that issues data from the items table does not respect the --since or --interval options, by definition. Also note that if both --use-stats and --use-items are specified, the count of biblios processed will be misleading. -s|--since=DATE Only process issues recorded in the statistics table since DATE. -i|--interval=S Only process issues recorded in the statistics table in the last N units of time. The interval should consist of a number with a one-letter unit suffix. The valid suffixes are h (hours), d (days), w (weeks), m (months), and y (years). The default unit is days. --incremental Add the number of issues found in the statistics table to the existing total issues count. Intended so that this script can be used as a cron job to update popularity information during low-usage periods. If neither --since or --interval are specified, incremental mode will default to processing the last twenty-four hours. --commit=N Commit the results to the database after every N records are processed. --test Only test the popularity population script. WARNING If the time on your database server does not match the time on your Koha server you will need to take that into account, and probably use the --since argument instead of the --interval argument for incremental updating. === TESTING PLAN === NOTE: in order to test this script, you will need to have some sort of circulation data already existing in your Koha installation. 1) Disable UpdateTotalIssuesOnCirc 2) Run: misc/cronjobs/update_totalissues.pl --use-items -t -v 3) If you have total checkout data in your item records (i.e. anything in 952$l), you should see messages like "Processing bib 43 (1 issues)" 4) Choose one of the lines that shows more than 0 issues, and view the record with that biblionumber in the staff client, choosing the "Items" tab (moredetail.pl). Add up the "Total checkouts" listed for each item, and confirm it matches what the script reported 5) Run: misc/cronjobs/update_totalissues.pl --use-stats -t -v 6) If you have any circulation statistics in your database (i.e. any 'issue' entries in your statistics table), you should see messages like "Processing bib 43 (1 issues)"; 7) Choose one of the lines and view the record with that biblionumber in the staff client, choosing the "Items" tab (moredetail.pl). If you count the number of checkouts listed in each item's checkout history, the total should match what the script reported. 8) Check out an item 9) Run: misc/cronjobs/update_totalissues.pl --use-stats --incremental --interval=1h -t -v 10) You should see one line reporting a single circ for the bib record associated with the item you just checked out (there may be more if you checked out any books in the hour prior to running these tests 11) If the results in steps 4, 7, and 10 match the predictions, the script worked This patch to Koha was sponsored by the Arcadia Public Library and the Arcadia Public Library Foundation in honor of Jackie Faust-Moreno, late director of the Arcadia Public Library. Signed-off-by: Liz Rea Tested this with my test data - numbers are correct and updated appropriately. More importantly - if I do a popularity search, the most popular items *come up first*. Amazing. --- C4/Biblio.pm | 48 ++++ C4/Circulation.pm | 3 + installer/data/mysql/sysprefs.sql | 2 + installer/data/mysql/updatedatabase.pl | 7 + .../en/modules/admin/preferences/circulation.pref | 6 + misc/cronjobs/crontab.example | 3 + misc/cronjobs/update_totalissues.pl | 297 +++++++++++++++++++++ 7 files changed, 366 insertions(+) create mode 100755 misc/cronjobs/update_totalissues.pl diff --git a/C4/Biblio.pm b/C4/Biblio.pm index 1ab4fff247..f8ba09924d 100644 --- a/C4/Biblio.pm +++ b/C4/Biblio.pm @@ -104,6 +104,7 @@ BEGIN { &ModBiblio &ModBiblioframework &ModZebra + &UpdateTotalIssues ); # To delete something @@ -3828,6 +3829,53 @@ sub prepare_host_field { return; } + +=head2 UpdateTotalIssues + + UpdateTotalIssues($biblionumber, $increase, [$value]) + +Update the total issue count for a particular bib record. + +=over 4 + +=item C<$biblionumber> is the biblionumber of the bib to update + +=item C<$increase> is the amount to increase (or decrease) the total issues count by + +=item C<$value> is the absolute value that total issues count should be set to. If provided, C<$increase> is ignored. + +=back + +=cut + +sub UpdateTotalIssues { + my ($biblionumber, $increase, $value) = @_; + my $totalissues; + + my $data = GetBiblioData($biblionumber); + + if (defined $value) { + $totalissues = $value; + } else { + $totalissues = $data->{'totalissues'} + $increase; + } + my ($totalissuestag, $totalissuessubfield) = GetMarcFromKohaField('biblioitems.totalissues', $data->{'frameworkcode'}); + + my $record = GetMarcBiblio($biblionumber); + + my $field = $record->field($totalissuestag); + if (defined $field) { + $field->update( $totalissuessubfield => $totalissues ); + } else { + $field = MARC::Field->new($totalissuestag, '0', '0', + $totalissuessubfield => $totalissues); + $record->insert_grouped_field($field); + } + + ModBiblio($record, $biblionumber, $data->{'frameworkcode'}); + return; +} + 1; diff --git a/C4/Circulation.pm b/C4/Circulation.pm index 96ae0d963f..ca2e94d9c1 100644 --- a/C4/Circulation.pm +++ b/C4/Circulation.pm @@ -1063,6 +1063,9 @@ sub AddIssue { CartToShelf( $item->{'itemnumber'} ); } $item->{'issues'}++; + if ( C4::Context->preference('UpdateTotalIssuesOnCirc') ) { + UpdateTotalIssues($item->{'biblionumber'}, 1); + } ## If item was lost, it has now been found, reverse any list item charges if neccessary. if ( $item->{'itemlost'} ) { diff --git a/installer/data/mysql/sysprefs.sql b/installer/data/mysql/sysprefs.sql index 947433631d..6a3735d584 100644 --- a/installer/data/mysql/sysprefs.sql +++ b/installer/data/mysql/sysprefs.sql @@ -373,3 +373,5 @@ INSERT INTO systempreferences (variable,value,explanation,options,type) VALUES(' INSERT INTO systempreferences (variable,value,explanation,options,type) VALUES('OPACShowBarcode','0','Show items barcode in holding tab','','YesNo'); INSERT INTO systempreferences (variable,value,options,explanation,type) VALUES ('OPACShowUnusedAuthorities','1','','Show authorities that are not being used in the OPAC.','YesNo'); INSERT INTO systempreferences (variable,value,explanation,type) VALUES('EnableBorrowerFiles','0','If enabled, allows librarians to upload and attach arbitrary files to a borrower record.','YesNo'); +INSERT INTO systempreferences (variable,value,explanation,options,type) VALUES('UpdateTotalIssuesOnCirc','0','Whether to update the totalissues field in the biblio on each circ.',NULL,'YesNo'); + diff --git a/installer/data/mysql/updatedatabase.pl b/installer/data/mysql/updatedatabase.pl index f3acaad9ec..4da22e71f5 100755 --- a/installer/data/mysql/updatedatabase.pl +++ b/installer/data/mysql/updatedatabase.pl @@ -5445,6 +5445,13 @@ CREATE TABLE IF NOT EXISTS borrower_files ( SetVersion($DBversion); } +$DBversion = "3.09.00.021"; +if ( C4::Context->preference("Version") < TransformToNum($DBversion) ) { + $dbh->do("INSERT INTO systempreferences (variable,value,explanation,options,type) VALUES('UpdateTotalIssuesOnCirc','0','Whether to update the totalissues field in the biblio on each circ.',NULL,'YesNo');"); + print "Upgrade to $DBversion done (Add syspref UpdateTotalIssuesOnCirc)\n"; + SetVersion($DBversion); +} + =head1 FUNCTIONS =head2 TableExists($table) diff --git a/koha-tmpl/intranet-tmpl/prog/en/modules/admin/preferences/circulation.pref b/koha-tmpl/intranet-tmpl/prog/en/modules/admin/preferences/circulation.pref index a54e3c5fd7..e07fc0ded1 100644 --- a/koha-tmpl/intranet-tmpl/prog/en/modules/admin/preferences/circulation.pref +++ b/koha-tmpl/intranet-tmpl/prog/en/modules/admin/preferences/circulation.pref @@ -103,6 +103,12 @@ Circulation: - pref: NoticeCSS class: url - on Notices. (This should be a complete URL, starting with http://) + - + - pref: UpdateTotalIssuesOnCirc + choices: + yes: Do + no: "Do not" + - update a bibliographic record's total issues count whenever an item is issued (WARNING! This increases server load significantly; if performance is a concern, use the update_totalissues.pl cron job to update the total issues count). Checkout Policy: - - pref: AllowNotForLoanOverride diff --git a/misc/cronjobs/crontab.example b/misc/cronjobs/crontab.example index 3c5030dedc..ed8f117ea9 100644 --- a/misc/cronjobs/crontab.example +++ b/misc/cronjobs/crontab.example @@ -67,6 +67,9 @@ KOHA_CRON_PATH = /usr/share/koha/bin/cronjobs # Cancel expired holds 0 1 * * * $KOHA_CRON_PATH/holds/cancel_expired_holds.pl >/dev/null 2>&1 +# Update popularity counts for biblio records +0 2 * * * $KOHA_CRON_PATH/update_totalissues.pl --commit=1000 --use-stats --incremental --interval=1d >/dev/null 2>&1 + # ZEBRA INDEX UPDATES with -z option, incremental index updates throughout the day # for both authorities and bibs */10 * * * * $KOHA_CRON_PATH/../migration_tools/rebuild_zebra.pl -b -a -z >/dev/null diff --git a/misc/cronjobs/update_totalissues.pl b/misc/cronjobs/update_totalissues.pl new file mode 100755 index 0000000000..71da3aa869 --- /dev/null +++ b/misc/cronjobs/update_totalissues.pl @@ -0,0 +1,297 @@ +#!/usr/bin/perl + +# Copyright 2012 C & P Bibliography Services +# +# This file is part of Koha. +# +# Koha is free software; you can redistribute it and/or modify it under the +# terms of the GNU General Public License as published by the Free Software +# Foundation; either version 2 of the License, or (at your option) any later +# version. +# +# Koha is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR +# A PARTICULAR PURPOSE. See the GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with Koha; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +use strict; +use warnings; + +BEGIN { + + # find Koha's Perl modules + # test carefully before changing this + use FindBin; + eval { require "$FindBin::Bin/../kohalib.pl" }; +} + +use Getopt::Long; +use Pod::Usage; +use C4::Context; +use C4::Biblio; +use DateTime; +use DateTime::Format::MySQL; +use Time::HiRes qw/time/; +use POSIX qw/strftime ceil/; + +sub usage { + pod2usage( -verbose => 2 ); + exit; +} + +$| = 1; + +# command-line parameters +my $verbose = 0; +my $test_only = 0; +my $want_help = 0; +my $since; +my $interval; +my $usestats = 0; +my $useitems = 0; +my $incremental = 0; +my $commit = 100; +my $unit; + +my $result = GetOptions( + 'v|verbose' => \$verbose, + 't|test' => \$test_only, + 's|since=s' => \$since, + 'i|interval=s' => \$interval, + 'use-stats' => \$usestats, + 'use-items' => \$useitems, + 'incremental' => \$incremental, + 'c|commit=i' => \$commit, + 'h|help' => \$want_help +); + +binmode( STDOUT, ":utf8" ); + +if ( defined $since && defined $interval ) { + print "The --since and --interval options are mutually exclusive.\n\n"; + $want_help = 1; +} + +if ( $useitems && $incremental ) { + print + "The --use-items and --incremental options are mutually exclusive.\n\n"; + $want_help = 1; +} + +if ( $incremental && !( defined $since || defined $interval ) ) { + $interval = '24h'; +} + +unless ( $usestats || $useitems ) { + print "You must specify either --use-stats and/or --use-items.\n\n"; + $want_help = 1; +} + +if ( not $result or $want_help ) { + usage(); +} + +my $dbh = C4::Context->dbh; +$dbh->{AutoCommit} = 0; + +my $num_bibs_processed = 0; + +my $starttime = time(); + +process_items() if $useitems; +process_stats() if $usestats; + +report(); + +exit 0; + +sub process_items { + my $query = +"SELECT items.biblionumber, SUM(items.issues) FROM items GROUP BY items.biblionumber;"; + process_query($query); +} + +sub process_stats { + if ($interval) { + my $dt = DateTime->now; + + my %units = ( + h => 'hours', + d => 'days', + w => 'weeks', + m => 'months', + y => 'years' + ); + + $interval =~ m/([0-9]*)([hdwmy]?)$/; + $unit = $2 || 'd'; + $since = DateTime::Format::MySQL->format_datetime( + $dt->subtract( $units{$unit} => $1 ) ); + } + my $limit = ''; + $limit = " AND statistics.datetime >= ?" if ( $interval || $since ); + + my $query = +"SELECT biblio.biblionumber, COUNT(statistics.itemnumber) FROM biblio LEFT JOIN items ON (biblio.biblionumber=items.biblionumber) LEFT JOIN statistics ON (items.itemnumber=statistics.itemnumber) WHERE statistics.type = 'issue' $limit GROUP BY biblio.biblionumber;"; + process_query( $query, $limit ); + + unless ($incremental) { + $query = +"SELECT biblio.biblionumber, 0 FROM biblio LEFT JOIN items ON (biblio.biblionumber=items.biblionumber) LEFT JOIN statistics ON (items.itemnumber=statistics.itemnumber) WHERE statistics.itemnumber IS NULL GROUP BY biblio.biblionumber;"; + process_query( $query, '' ); + + $query = +"SELECT biblio.biblionumber, 0 FROM biblio LEFT JOIN items ON (biblio.biblionumber=items.biblionumber) WHERE items.itemnumber IS NULL GROUP BY biblio.biblionumber;"; + process_query( $query, '' ); + } + + $dbh->commit(); +} + +sub process_query { + my $query = shift; + my $uselimit = shift; + my $sth = $dbh->prepare($query); + + if ( $since && $uselimit ) { + $sth->execute($since); + } + else { + $sth->execute(); + } + + while ( my ( $biblionumber, $totalissues ) = $sth->fetchrow_array() ) { + $num_bibs_processed++; + $totalissues = 0 unless $totalissues; + print "Processing bib $biblionumber ($totalissues issues)\n" + if $verbose; + if ( not $test_only ) { + if ( $incremental && $totalissues > 0 ) { + UpdateTotalIssues( $biblionumber, $totalissues ); + } + else { + UpdateTotalIssues( $biblionumber, 0, $totalissues ); + } + } + if ( not $test_only and ( $num_bibs_processed % $commit ) == 0 ) { + print_progress_and_commit($num_bibs_processed); + } + } + + $dbh->commit(); +} + +sub report { + my $endtime = time(); + my $totaltime = ceil( ( $endtime - $starttime ) * 1000 ); + $starttime = strftime( '%D %T', localtime($starttime) ); + $endtime = strftime( '%D %T', localtime($endtime) ); + + my $summary = <<_SUMMARY_; + +Update total issues count script report +======================================================= +Run started at: $starttime +Run ended at: $endtime +Total run time: $totaltime ms +Number of bibs modified: $num_bibs_processed +_SUMMARY_ + $summary .= "\n**** Ran in test mode only ****\n" if $test_only; + print $summary; +} + +sub print_progress_and_commit { + my $recs = shift; + $dbh->commit(); + print "... processed $recs records\n"; +} + +=head1 NAME + +update_totalissues.pl + +=head1 SYNOPSIS + + update_totalissues.pl --use-stats + update_totalissues.pl --use-items + update_totalissues.pl --commit=1000 + update_totalissues.pl --since='2012-01-01' + update_totalissues.pl --interval=30d + +=head1 DESCRIPTION + +This batch job populates bibliographic records' total issues count based +on historical issue statistics. + +=over 8 + +=item B<--help> + +Prints this help + +=item B<-v|--verbose> + +Provide verbose log information (list every bib modified). + +=item B<--use-stats> + +Use the data in the statistics table for populating total issues. + +=item B<--use-items> + +Use items.issues data for populating total issues. Note that issues +data from the items table does not respect the --since or --interval +options, by definition. Also note that if both --use-stats and +--use-items are specified, the count of biblios processed will be +misleading. + +=item B<-s|--since=DATE> + +Only process issues recorded in the statistics table since DATE. + +=item B<-i|--interval=S> + +Only process issues recorded in the statistics table in the last N +units of time. The interval should consist of a number with a one-letter +unit suffix. The valid suffixes are h (hours), d (days), w (weeks), +m (months), and y (years). The default unit is days. + +=item B<--incremental> + +Add the number of issues found in the statistics table to the existing +total issues count. Intended so that this script can be used as a cron +job to update popularity information during low-usage periods. If neither +--since or --interval are specified, incremental mode will default to +processing the last twenty-four hours. + +=item B<--commit=N> + +Commit the results to the database after every N records are processed. + +=item B<--test> + +Only test the popularity population script. + +=back + +=head1 WARNING + +If the time on your database server does not match the time on your Koha +server you will need to take that into account, and probably use the +--since argument instead of the --interval argument for incremental +updating. + +=head1 CREDITS + +This patch to Koha was sponsored by the Arcadia Public Library and the +Arcadia Public Library Foundation in honor of Jackie Faust-Moreno, late +director of the Arcadia Public Library. + +=head1 AUTHOR + +Jared Camins-Esakov + +=cut -- 2.11.4.GIT