1 This repository is for testing cvs2svn's pruning choices when filling
2 symbolic names. The layout is:
11 Every file was imported in a standard way, then revisions 1.2 and 1.3
12 were committed on every file. Then this branch was made on four
13 files (/one.txt, /two.txt, /foo/three.txt, /foo/four.txt):
15 BRANCH: sprouts from 1.3 (so branch number 1.3.0.2)
17 Then a revision was committed on that branch, creating revision
18 1.3.2.1 on those files.
20 No branch was made on /bar/five.txt nor /bar/six.txt. Thus, when
21 BRANCH is created, subdir /bar should be deleted entirely. But if you
22 revert r1125 of cvs2svn.py, /bar won't be deleted.
24 Search below for the word "Suppose" for more details. (Note that we
25 still don't have a test for the proposed 'if not prune_ok:'
28 --------------------8-<-------cut-here---------8-<-----------------------
30 From nobody Wed Jun 2 18:24:01 2004
31 Sender: kfogel@newton.ch.collab.net
33 Cc: commits@cvs2svn.tigris.org
34 Subject: Re: cvs2svn commit: r1035 - branches/may-04-redesign
35 References: <200405290539.i4T5dmM10064@morbius.ch.collab.net>
36 From: kfogel@collab.net
37 Reply-To: kfogel@collab.net
38 X-Windows: putting new limits on productivity.
39 Date: 02 Jun 2004 18:24:00 -0500
40 In-Reply-To: <200405290539.i4T5dmM10064@morbius.ch.collab.net>
41 Message-ID: <85vfi9czn3.fsf@newton.ch.collab.net>
43 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1.50
45 Content-Type: text/plain; charset=us-ascii
47 fitz@tigris.org writes:
48 > Modified: branches/may-04-redesign/cvs2svn.py
49 > ==============================================================================
50 > --- branches/may-04-redesign/cvs2svn.py (original)
51 > +++ branches/may-04-redesign/cvs2svn.py Sat May 29 00:39:43 2004
52 > @@ -4802,15 +4802,16 @@
55 > def _fill(self, symbol_fill, key, name,
56 > - parent_path_so_far=None, preferred_revnum=None):
57 > + parent_path_so_far=None, preferred_revnum=None, prune_ok=None):
58 > """Descends through all nodes in SYMBOL_FILL.node_tree that are
59 > rooted at KEY, which is a string key into SYMBOL_FILL.node_tree.
60 > Generates copy (and delete) commands for all destination nodes
61 > that don't exist in NAME.
63 > - PARENT_PATH_SO_FAR is the parent directory of the path(s) that may
64 > - be copied in this invocation of the method. If None, that means
65 > - that our source path starts from the root of the repository.
66 > + PARENT_PATH_SO_FAR is the parent directory of the source path(s)
67 > + that may be copied in this invocation of the method. If None,
68 > + that means that our source path starts from the root of the
71 > PREFERRED_REVNUM is an int which is the source revision number
72 > that the caller (who may have copied KEY's parent) used to
73 > @@ -4818,9 +4819,15 @@
74 > is preferable to any other (which probably means that no copies
77 > - PARENT_PATH_SO_FAR and PREFERRED_REVNUM should only be passed in
78 > - by recursive calls."""
79 > + PRUNE_OK means that a copy has been made in this recursion, and
80 > + it's safe to prune directories that are not in SYMBOL_FILL.node_tree.
82 This documentation of PRUNE_OK might want to be a little more
83 detailed, and make clear the cause-effect relationship between the
84 first and second clauses. I.e., PRUNE_OK doesn't mean two unrelated
85 things, it means one thing that happens to have an implication.
87 Also, the pruning behavior applies to files as well as directories,
88 no? (Maybe you were tempted to say "directories" because that's what
89 the unrelated --prune flag to cvs2svn affects... Which is one reason
90 not to call this new param 'prune_ok'; see later for another reason.)
92 Anyway, I'm thinking something like this for the doc:
94 PRUNE_OK means that a copy has already been made higher in this
95 recursion, and that therefore it's safe to prune directories and
96 files that do not appear in the part of SYMBOL_FILL.node_tree()
97 governing this recursion.
99 If that were the doc string, would it be accurate?
101 > @@ -4839,7 +4846,7 @@
103 > # if our destination path doesn't already exist, then we may
104 > # have to make a copy.
105 > - if (dest_path is not None and not self.path_exists(dest_path)):
106 > + if not self.path_exists(dest_path):
107 > src_revnum = symbol_fill.get_best_revnum(key, preferred_revnum)
109 > # If the revnum of our parent's copy (src_revnum) is the same
110 > @@ -4849,6 +4856,7 @@
111 > if src_revnum != preferred_revnum:
113 > new_entries = self.copy_path(src_path_so_far, dest_path, src_revnum)
114 > + prune_ok = peer_path_unsafe_for_pruning = 1
115 > # Delete invalid entries that got swept in by the copy.
116 > valid_entries = symbol_fill.node_tree[key]
117 > bad_entries = self._get_invalid_entries(valid_entries, new_entries)
118 > @@ -4857,8 +4865,25 @@
119 > del_path = dest_path + '/' + entry
120 > self.delete_path(del_path)
122 > - self._fill(symbol_fill, key, name, src_path_so_far, src_revnum)
123 > + self._fill(symbol_fill, key, name, src_path_so_far, src_revnum, prune_ok)
125 > + if peer_path_unsafe_for_pruning:
127 > + # Any entries still present in the dest directory that we've just
128 > + # created by copying, but not in
129 > + # symbol_fill.node_tree[parent_key], don't belong and should be
130 > + # deleted as well. If we haven't actually made a copy, do
132 > + if parent_path_so_far and prune_ok:
133 > + this_path = self._dest_path_for_source_path(name, parent_path_so_far)
134 > + ign, this_contents = self._node_for_path(this_path, self.youngest)
135 > + expected_contents = symbol_fill.node_tree[parent_key]
136 > + bad_entries = self._get_invalid_entries(expected_contents, this_contents)
137 > + for entry in bad_entries:
138 > + del_path = this_path + '/' + entry
140 > + print "FITZ: deleting path", del_path
141 > + self.delete_path(del_path)
143 Hmmm. Okay, I get the general idea here, and like it.
145 There's something weird about 'peer_path_unsafe_for_pruning',
146 though... Let me see if I can put my finger on it.
148 Suppose we first make this copy:
150 cp /trunk/foo/ @ r4 --> /branches/MYBRANCH/foo/
152 That sets 'peer_path_unsafe_for_pruning' to 1 in the stack frame for
153 "/" (that is, the frame in which "/foo" is a child entry). Great.
154 Suppose that foo@4 had two subdirs, "/foo/bar/" and "/foo/baz/", only
155 the first of which is on this branch at all, albeit from r6 not r4.
157 So, now we recurse down and copy "bar/" from r6:
159 del /branches/MYBRANCH/foo/bar/
160 cp /trunk/foo/bar/ @ r6 --> /branches/MYBRANCH/foo/bar/
162 That sets 'peer_path_unsafe_for_pruning' to 1 in the stack frame for
163 "/foo", of course, since "/foo/bar" is a child entry of that. But
164 remember, earlier when we copied "/foo", we accidentally got
165 "/foo/baz/" as well, which needs to be pruned out.
167 We used to have code to prune that, but it's commented out now, see
168 the comment that begins "###TODO OPTIMIZE: If we keep a list
171 So the question is, who *will* prune "/foo/baz/"? All of the stack
172 frames we've created in this fill have 'peer_path_unsafe_for_pruning'
173 set to 1, so all of them will return early, without entering the blob
174 of code at the end that is supposed to clean up these bad entries.
176 Thus, I don't think anyone can prune "/foo/baz/", even though it
177 clearly must be pruned. Another way of saying it is, that commented
178 out loop that's supposed to be just an optimization is actually not an
179 optimization -- it's correctness code.
181 But I'm not saying the answer here is just to uncomment that loop. I
182 think also 'peer_path_unsafe_for_pruning' should not get set *if*
183 'prune_ok' is already set! In other words
185 prune_ok = peer_path_unsafe_for_pruning = 1
190 prune_ok = peer_path_unsafe_for_pruning = 1
192 Because if 'prune_ok' is already set, then we're already in a
193 recursive call after a copy, so peer paths are, in fact, safe for
194 pruning, even though they are peer paths! (One way to think of it is
195 that the name 'prune_ok' should be 'we_are_now_under_a_copy', or
196 'copy_happened', or something like that. What the param really
197 indicates is that all dest paths being generated in this leg of the
198 recursion are underneath a copy -- the fact that pruning is okay is
199 merely one consequence of that situation.)
201 Is this making any sense, or am I off my rocker?
203 The comment right before the final blob of code confused me for a
206 # Any entries still present in the dest directory that we've just
207 # created by copying, but not in
208 # symbol_fill.node_tree[parent_key], don't belong and should be
209 # deleted as well. If we haven't actually made a copy, do
212 ... because it talks about "any entries still present in the dest
213 directory that we've just copied", yet 'parent_path_so_far' is *not* a
214 directory we just copied, rather it's the parent of things we may or
215 may not have copied. IOW, this code (the last bit of code in the
216 function, right below the above comment)...
218 if parent_path_so_far and prune_ok:
219 this_path = self._dest_path_for_source_path(name, parent_path_so_far)
220 ign, this_contents = self._node_for_path(this_path, self.youngest)
221 expected_contents = symbol_fill.node_tree[parent_key]
222 bad_entries = self._get_invalid_entries(expected_contents, this_contents)
223 for entry in bad_entries:
224 del_path = this_path + '/' + entry
225 self.delete_path(del_path)
227 ... cannot be talking about a directory we've just copied in this
228 stack frame, but rather about a directory copied in some previous
229 frame, for which we're now in a recursive call (that is, an old
230 parent_path_so_far got extended by one or more entry components,
233 Perhaps this is what is meant by "the dest directory that we've just
234 copied", and all we need to do is clarify things by saying "the dest
235 directory at or under a copy made in an call higher up in this
236 recursion". Well that's clumsy, but you know what I mean.
238 Anyway, those are my thoughts for now. Are they anything like yours,
239 or have I fanned off into my own private Oort cloud of insanity?
241 Btw, can't say I share the feeling that _fill() has gotten overly
242 complex. I mean, apparently it has a bug or two right now, but in
243 general it seems to be about as complex as the problem it solves -- I
244 wouldn't want to break it up any further.