1 From: Petr Baudis <pasky@suse.cz>
2 Subject: [PATCH] gitweb: Support caching projects list
4 On repo.or.cz (permanently I/O overloaded and hosting 1050 project +
5 forks), the projects list (the default gitweb page) can take more than
6 a minute to generate. This naive patch adds simple support for caching
7 the projects list data structure so that all the projects do not need
8 to get rescanned at every page access.
10 $projlist_cache_lifetime gitweb configuration variable is introduced,
11 by default set to zero. If set to non-zero, it describes the number of
12 minutes for which the cache remains valid. Only single project root
13 per system can use the cache. Any script running with the same uid as
14 gitweb can change the cache trivially - this is for secure
17 The cache itself is stored in $cache_dir/$projlist_cache_name using
18 Storable to store() Perl data structure with the list of project
19 details. When reusing the cache, the data is retrieve()'d back into
22 To prevent contention when multiple accesses coincide with cache
23 expiration, the timeout is postponed to time()+120 when we start
24 refreshing. When showing cached version, a disclaimer is shown
25 at the top of the projects list.
27 [jn: moved from Data::Dumper to Storable for serialization of data]
29 Signed-off-by: Petr Baudis <pasky@suse.cz>
30 Signed-off-by: Jakub Narebski <jnareb@gmail.com>
34 This version uses Storable (which gives binary output), instead of a
35 bit unsafe, because of need to 'eval' to unserialize, Data::Dumper.
37 It should also be a bit faster: unscientific comparison gives that
38 Storable is 3-4 times faster than using Data::Dumper on totally
39 artificial data of 1050 elements table of hashes.
41 Below replies to comments for first (or second if you count original
42 Pasky version) version of this patch.
44 No interdiff there... on the other hand it includes gitweb/README
46 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
48 [The message below didn't made it to git mailing list due to not
49 quoting J.H. name in email: it should be
50 "J.H." <warthog19@eaglescrag.net>
52 J.H. <warthog19@eaglescrag.net>]
54 On Mon, 17 Mar 2008, Rafael Garcia-Suarez wrote:
55 > On 17/03/2008, Jakub Narebski <jnareb@gmail.com> wrote:
57 >> The cache itself is stored in /tmp/gitweb.index.cache as a
58 >> Data::Dumper dump of the perl data structure with the list of project
59 >> details. When reusing the cache, the file is simply eval'd back into
62 > Correct me if I'm wrong (I haven't read the full source code...) Doesn't
63 > that mean that anyone who can write this file can make gitweb eval
64 > arbitrary perl code? And anyone can write a file in /tmp, usually... If
65 > so, I would strongly advise against including this patch in its current
66 > state, not without at least checking the ownership of the cache file.
68 First, this patch is an RFC, mainly meant as better version of Pasky's
69 patch with (the nearly) the same subject, to help repo.or.cz, when it
70 can be assumed that installation is secure.
72 Second, it was changed in this version of patch: gitweb now uses
73 Storable to write Perl data to disk and read it back, so there is no
74 need to eval to read back the data from cache, as it was when using
75 Data::Dumper package. Additionally you can now define where to store
76 cache; by default it is /tmp/gitweb/, and you can make gitweb write
77 only for projects with gitweb uid (if directory does not exists,
78 gitweb would create it if possible with appropriate permissions).
80 >> Note: instead of using Data::Dumper to serialize data we could use
81 >> Storable module (distributed with Perl like Data::Dumper). From what
82 >> I've checked it has larger initial cost, but might be better for
83 >> larger number of projects, exactly the situation when projects list
86 > Storable would be more secure. (But check file ownership!)
88 This version uses Storable, which should be more secure (but see
89 below)... and additionally bit faster.
91 For the simplicity of patch gitweb does not check cache file
92 ownership, nor permissions on cache file and on directory it is in,
93 but this should be fairly easy to add.
95 BTW. store() and retrieve() from Storable "die" on many errors;
96 in unsafe environment we would probably want to protect gitweb from
97 them failing due to incorrect file format, or nonreadable cache file,
100 > I don't know how complex are the data structures you're going to
103 It is array (list) or array reference (but ordering of items doesn't
104 matter) of records (hashes / hash references). Fields can be integer,
105 string (possibly containing quite characters and other strangeness),
108 > My paranoid preference would be to use a text-only format. If CPAN
109 > modules are out, devise your own dump/load routines, maybe?
111 What format should it be? Data::Dumper format looks like this:
115 'owner' => 'Jakub Narebski',
116 'descr_long' => './t9500-gitweb-standalone-no-errors.sh test repository',
118 'age_string' => '7 days ago',
119 'path' => 'trash.git',
121 'descr' => './t9500-gitweb-standalone... '
124 'owner' => 'Jakub Narebski',
125 'descr_long' => "git with Jakub Narebski modifications, local copy.",
127 'age_string' => '80 min ago',
130 'descr' => 'git with Jakub Narebski modif... '
134 Another solution would be to use YAML [Tiny], but although Perl
135 packages for YAML are in contrib repositories, they are not installed
136 by default with Perl, and most probably are not present on the
137 system. Additonally at least YAML::Tiny is slower than even
138 Data::Dumper when retrieving data, around 4-5 times slower... on the
139 other hand YAML::Tiny is Pure Perl implementation; other solutions
140 which use C libraries should be much faster.
142 YAML output looks like this:
147 age_string: '7 days ago'
148 descr: './t9500-gitweb-standalone... '
149 descr_long: './t9500-gitweb-standalone-no-errors.sh test repository'
151 owner: 'Jakub Narebski'
155 age_string: '80 min ago'
156 descr: 'git with Jakub Narebski modif... '
157 descr_long: 'git with Jakub Narebski modifications, local copy.'
159 owner: 'Jakub Narebski'
162 Yet another solution would be to use [much] restricted version of
163 ini-like git config format, and perhaps reuse git-cvsserver parsing of
164 this format, or simply call "git-config --file <file> -z -l" and
165 gitweb code to parse git-config output.
167 The output whould then look like this:
171 age_string = "7 days ago"
172 descr = './t9500-gitweb-standalone... '
173 descr_long = "./t9500-gitweb-standalone-no-errors.sh test repository"
175 owner = Jakub Narebski
179 age_string = "80 min ago"
180 descr = "git with Jakub Narebski modif... "
181 descr_long = "git with Jakub Narebski modifications, local copy."
183 owner = Jakub Narebski
185 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
187 On Mon, 17 Mar 2008, Frank Lichtenheld wrote:
188 > On Mon, Mar 17, 2008 at 04:09:29PM +0100, Jakub Narebski wrote:
190 >> From: Petr Baudis <pasky@suse.cz>
191 >> $projlist_cache_lifetime gitweb configuration variable is introduced,
192 >> by default set to zero. If set to non-zero, it describes the number of
193 >> minutes for which the cache remains valid. Only single project root
194 >> per system can use the cache. Any script running with the same uid as
195 >> gitweb can change the cache trivially - this is for secure
196 >> installations only.
198 > The more subtle threat is the fact that anyone with writing
199 > rights to /tmp can give gitweb any data he wants if the file doesn't
204 > At the very least you should:
206 > - Allow to override /tmp (via ENV{TMPDIR} or via a configuration
211 > - Advise people to change that to something that is not
214 I wrote a bit about this in gitweb/README, feel free to add to it.
215 Additionally gitweb can create cache directory if it does not exist,
216 and it does so with restrictive premissions.
218 > - Check if the file is owned by the uid gitweb is running under and
221 Currently not done, to make patch simpler; should be fairly easy to
222 add. I guess it would be best to create is_cache_safe() or something
223 like that subroutine.
228 >> + my $now = time();
229 >> + if ($cache_lifetime && -f $cache_file &&
230 >> + stat($cache_file)->mtime + $cache_lifetime * 60 > $now &&
231 >> + open(my $fd, '<', $cache_file)) {
232 >> + $stale = $now - stat($cache_file)->mtime;
234 > One stat() call instead of three would be better for performance.