From f9f5fb9e3b79c8e4e76c3cf3d0650cf8749652ea Mon Sep 17 00:00:00 2001 From: "Kyle J. McKay" Date: Wed, 22 Apr 2015 09:58:59 -0700 Subject: [PATCH] t/htmlcache/age-epoch: create TopGit-controlled branch --- .topdeps | 3 +- .topmsg | 251 +++------------------------------------------------------------ 2 files changed, 12 insertions(+), 242 deletions(-) rewrite .topmsg (99%) diff --git a/.topdeps b/.topdeps index b0b2695195..f1225bea3a 100644 --- a/.topdeps +++ b/.topdeps @@ -1,2 +1 @@ -t/frontpage/separate -t/projlist-cache/lastactivity +t/projlist-cache/caching diff --git a/.topmsg b/.topmsg dissimilarity index 99% index 1dcb7559f9..820e5cd55b 100644 --- a/.topmsg +++ b/.topmsg @@ -1,240 +1,11 @@ -Subject: [PATCH] gitweb: support caching projects list - -On repo.or.cz (permanently I/O overloaded and hosting 1050 project + -forks), the projects list (the default gitweb page) can take more than -a minute to generate. This naive patch adds simple support for caching -the projects list data structure so that all the projects do not need -to get rescanned at every page access. - -$projlist_cache_lifetime gitweb configuration variable is introduced, -by default set to zero. If set to non-zero, it describes the number of -minutes for which the cache remains valid. Only single project root -per system can use the cache. Any script running with the same uid as -gitweb can change the cache trivially - this is for secure -installations only. - -The cache itself is stored in $cache_dir/$projlist_cache_name using -Storable to store() Perl data structure with the list of project -details. When reusing the cache, the data is retrieve()'d back into -@projects. - -To prevent contention when multiple accesses coincide with cache -expiration, the timeout is postponed to time()+120 when we start -refreshing. When showing cached version, a disclaimer is shown -at the top of the projects list. - -[jn: moved from Data::Dumper to Storable for serialization of data] - -$cache_grpshared gitweb configuration variable can be set to 1 to -create the cache file group-readable and group-writable to facilitate -external re-generation of the cache. - -Signed-off-by: Petr Baudis -Signed-off-by: Jakub Narebski -Signed-off-by: Kyle J. McKay - ---- - -This version uses Storable (which gives binary output), instead of a -bit unsafe, because of need to 'eval' to unserialize, Data::Dumper. - -It should also be a bit faster: unscientific comparison gives that -Storable is 3-4 times faster than using Data::Dumper on totally -artificial data of 1050 elements table of hashes. - -Below replies to comments for first (or second if you count original -Pasky version) version of this patch. - -No interdiff there... on the other hand it includes gitweb/README - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -[The message below didn't made it to git mailing list due to not -quoting J.H. name in email: it should be - "J.H." -not - J.H. ] - -On Mon, 17 Mar 2008, Rafael Garcia-Suarez wrote: -> On 17/03/2008, Jakub Narebski wrote: ->> ->> The cache itself is stored in /tmp/gitweb.index.cache as a ->> Data::Dumper dump of the perl data structure with the list of project ->> details. When reusing the cache, the file is simply eval'd back into ->> @projects. -> -> Correct me if I'm wrong (I haven't read the full source code...) Doesn't -> that mean that anyone who can write this file can make gitweb eval -> arbitrary perl code? And anyone can write a file in /tmp, usually... If -> so, I would strongly advise against including this patch in its current -> state, not without at least checking the ownership of the cache file. - -First, this patch is an RFC, mainly meant as better version of Pasky's -patch with (the nearly) the same subject, to help repo.or.cz, when it -can be assumed that installation is secure. - -Second, it was changed in this version of patch: gitweb now uses -Storable to write Perl data to disk and read it back, so there is no -need to eval to read back the data from cache, as it was when using -Data::Dumper package. Additionally you can now define where to store -cache; by default it is /tmp/gitweb/, and you can make gitweb write -only for projects with gitweb uid (if directory does not exists, -gitweb would create it if possible with appropriate permissions). - ->> Note: instead of using Data::Dumper to serialize data we could use ->> Storable module (distributed with Perl like Data::Dumper). From what ->> I've checked it has larger initial cost, but might be better for ->> larger number of projects, exactly the situation when projects list ->> caching is needed. -> -> Storable would be more secure. (But check file ownership!) - -This version uses Storable, which should be more secure (but see -below)... and additionally bit faster. - -For the simplicity of patch gitweb does not check cache file -ownership, nor permissions on cache file and on directory it is in, -but this should be fairly easy to add. - -BTW. store() and retrieve() from Storable "die" on many errors; -in unsafe environment we would probably want to protect gitweb from -them failing due to incorrect file format, or nonreadable cache file, -etc. - -> I don't know how complex are the data structures you're going to -> save. - -It is array (list) or array reference (but ordering of items doesn't -matter) of records (hashes / hash references). Fields can be integer, -string (possibly containing quite characters and other strangeness), -or undef. - -> My paranoid preference would be to use a text-only format. If CPAN -> modules are out, devise your own dump/load routines, maybe? - -What format should it be? Data::Dumper format looks like this: - - [ - { - 'owner' => 'Jakub Narebski', - 'descr_long' => './t9500-gitweb-standalone-no-errors.sh test repository', - 'mtime' => undef, - 'age_string' => '7 days ago', - 'path' => 'trash.git', - 'age' => 652787, - 'descr' => './t9500-gitweb-standalone... ' - }, - { - 'owner' => 'Jakub Narebski', - 'descr_long' => "git with Jakub Narebski modifications, local copy.", - 'mtime' => undef, - 'age_string' => '80 min ago', - 'path' => 'git.git', - 'age' => 4813, - 'descr' => 'git with Jakub Narebski modif... ' - } - ] - -Another solution would be to use YAML [Tiny], but although Perl -packages for YAML are in contrib repositories, they are not installed -by default with Perl, and most probably are not present on the -system. Additonally at least YAML::Tiny is slower than even -Data::Dumper when retrieving data, around 4-5 times slower... on the -other hand YAML::Tiny is Pure Perl implementation; other solutions -which use C libraries should be much faster. - -YAML output looks like this: - - --- - - - age: 652787 - age_string: '7 days ago' - descr: './t9500-gitweb-standalone... ' - descr_long: './t9500-gitweb-standalone-no-errors.sh test repository' - mtime: ~ - owner: 'Jakub Narebski' - path: trash.git - - - age: 4813 - age_string: '80 min ago' - descr: 'git with Jakub Narebski modif... ' - descr_long: 'git with Jakub Narebski modifications, local copy.' - mtime: ~ - owner: 'Jakub Narebski' - path: git.git - -Yet another solution would be to use [much] restricted version of -ini-like git config format, and perhaps reuse git-cvsserver parsing of -this format, or simply call "git-config --file -z -l" and -gitweb code to parse git-config output. - -The output whould then look like this: - - [gitweb "trash.git"] - age = 652787 - age_string = "7 days ago" - descr = './t9500-gitweb-standalone... ' - descr_long = "./t9500-gitweb-standalone-no-errors.sh test repository" - mtime - owner = Jakub Narebski - - [gitweb "git.git"] - age = 4813 - age_string = "80 min ago" - descr = "git with Jakub Narebski modif... " - descr_long = "git with Jakub Narebski modifications, local copy." - mtime - owner = Jakub Narebski - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -On Mon, 17 Mar 2008, Frank Lichtenheld wrote: -> On Mon, Mar 17, 2008 at 04:09:29PM +0100, Jakub Narebski wrote: ->> ->> From: Petr Baudis ->> $projlist_cache_lifetime gitweb configuration variable is introduced, ->> by default set to zero. If set to non-zero, it describes the number of ->> minutes for which the cache remains valid. Only single project root ->> per system can use the cache. Any script running with the same uid as ->> gitweb can change the cache trivially - this is for secure ->> installations only. -> -> The more subtle threat is the fact that anyone with writing -> rights to /tmp can give gitweb any data he wants if the file doesn't -> exist yet. - -Right. - -> At the very least you should: -> -> - Allow to override /tmp (via ENV{TMPDIR} or via a configuration -> variable) - -Done. - -> - Advise people to change that to something that is not -> world-writable - -I wrote a bit about this in gitweb/README, feel free to add to it. -Additionally gitweb can create cache directory if it does not exist, -and it does so with restrictive premissions. - -> - Check if the file is owned by the uid gitweb is running under and -> not word-writable. - -Currently not done, to make patch simpler; should be fairly easy to -add. I guess it would be best to create is_cache_safe() or something -like that subroutine. - -> [...] ->> + my @projects; ->> + my $stale = 0; ->> + my $now = time(); ->> + if ($cache_lifetime && -f $cache_file && ->> + stat($cache_file)->mtime + $cache_lifetime * 60 > $now && ->> + open(my $fd, '<', $cache_file)) { ->> + $stale = $now - stat($cache_file)->mtime; -> -> One stat() call instead of three would be better for performance. - -Thanks. Done. +Subject: [PATCH] gitweb: refactor age_string computation + +To facilitate caching, the age_string function needs to take +as its arguments not the actual age in seconds, but the +actual unix epoch creation time and the unix epoch to compute +the age string against (which defaults to now) instead. + +To get the old behavior, passing in 0 for the first argument +and the age in seconds as the second will suffice. + +Signed-off-by: Kyle J. McKay -- 2.11.4.GIT