This file contains a description of the RevisionCollector /
RevisionReader mechanism.

cvs2svn now includes hooks to make it possible to avoid having to
invoke CVS or RCS zillions of times in OutputPass (which is otherwise
the most expensive part of the conversion). Here is a brief
description of how the hooks work.

Most conversions [1] require an instance of RevisionReader, whose
responsibility is to produce the text contents of CVS revisions on
demand during OutputPass. The RevisionReader can read the CVS
revision contents directly out of the RCS files during OutputPass.
But additional hooks support the construction of different kinds of
RevisionReader that record the CVS file revisions' contents during
FilterSymbolsPass and then output the contents during OutputPass.

The interface that is used during FilterSymbolsPass to allow the
collection of revision information is:

    RevisionCollector -- can collect information during
        FilterSymbolsPass to help the RevisionReader produce RCS file
        revision contents during OutputPass.

The type of RevisionCollector/RevisionReader to be used for a run of
cvs2svn can be set using --use-internal-co, --use-rcs, or --use-cvs,
or via the --options file with lines like:

    ctx.revision_collector = MyRevisionCollector()
    ctx.revision_reader = MyRevisionReader()

The following RevisionCollectors are supplied with cvs2svn:

    NullRevisionCollector -- does nothing (for RevisionReaders that
        don't need anything to happen in FilterSymbolsPass).

    InternalRevisionCollector -- records the delta text and
        dependencies for required revisions in FilterSymbolsPass, for
        use with the InternalRevisionReader.

    GitRevisionCollector -- uses another RevisionReader to reconstruct
        the revisions' fulltext during FilterSymbolsPass, then writes
        the fulltexts to a blobfile in git-fast-import format. This
        file, combined with the dumpfile created in OutputPass, can be
        loaded into git.

    ExternalBlobGenerator -- uses an external Python program to
        reconstruct the revision fulltexts in FilterSymbolsPass and
        write them to a blobfile in git-fast-import format. This
        option is very fast because (1) it uses code similar to that
        used by InternalRevisionCollector/InternalRevisionReader, and
        (2) it processes all revisions from a file at once, thereby
        avoiding a lot of disk seeking.
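
As a rough illustration of what a blobfile "in git-fast-import format"
contains, the following Python sketch writes all revisions of one file
as consecutive blob records. The helper names (write_blob,
write_file_blobs) and the mark-numbering scheme are assumptions made
for this example; they are not taken from GitRevisionCollector or
ExternalBlobGenerator. Only the blob/mark/data record layout comes
from the git fast-import stream format.

```python
import io


def write_blob(stream, mark, data):
    """Write one blob record in git fast-import stream format.

    stream -- a binary file-like object
    mark   -- positive integer used later to refer to this blob
    data   -- the revision fulltext, as bytes
    """
    stream.write(b"blob\n")
    stream.write(b"mark :%d\n" % mark)
    stream.write(b"data %d\n" % len(data))
    stream.write(data)
    stream.write(b"\n")  # optional LF after the raw data


def write_file_blobs(stream, revisions, first_mark=1):
    """Write every revision of a single file back to back.

    revisions -- iterable of (revision_number, fulltext_bytes) pairs
    Returns a dict mapping revision_number to the mark that was used.
    (Hypothetical helper; the numbering scheme is an assumption.)
    """
    marks = {}
    for offset, (rev, data) in enumerate(revisions):
        mark = first_mark + offset
        write_blob(stream, mark, data)
        marks[rev] = mark
    return marks


# Example: two revisions of one file written in a single pass.
blobfile = io.BytesIO()
marks = write_file_blobs(
    blobfile, [("1.1", b"hello\n"), ("1.2", b"hello world\n")]
)
```

The marks returned here are what a later dumpfile would use to refer
to each blob when assembling commits, so the blobs themselves can be
written in whatever order is cheapest to produce.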

The following RevisionReaders are supplied with cvs2svn:

    InternalRevisionReader -- reconstitutes the revisions' contents
        during OutputPass from the data recorded by
        InternalRevisionCollector. This is by far the fastest option
        for cvs2svn conversions, but it requires a substantial amount
        of temporary disk space for the duration of the conversion.

    RCSRevisionReader -- uses RCS's "co" command to extract the
        revision text during OutputPass. This is slower than
        InternalRevisionReader because "co" has to be executed very
        many times, but is better tested and does not require any
        temporary disk space. RCSRevisionReader does not use a
        RevisionCollector.

    CVSRevisionReader -- uses the "cvs" command to extract the
        revision text during OutputPass. This is even slower than
        RCSRevisionReader, but it can handle some CVS file quirks that
        stymie RCSRevisionReader (see the cvs2svn HTML documentation).
        CVSRevisionReader does not use a RevisionCollector.
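
To make the cost difference concrete: an RCS-based reader has to start
one "co" process for every revision it needs. The Python sketch below
builds such a command line. The function names are hypothetical, and
the exact options that RCSRevisionReader passes to "co" are not
described in this document; the sketch uses only the standard -q
(quiet) and -p<rev> (print the revision to standard output) options.

```python
import subprocess


def rcs_co_command(rcs_path, revision, co_executable="co"):
    """Build a command that prints one revision of an RCS file to stdout."""
    return [co_executable, "-q", "-p" + revision, rcs_path]


def read_revision(rcs_path, revision):
    """Run "co" once and return the revision text as bytes.

    Requires RCS to be installed; one process is started per call.
    """
    return subprocess.check_output(rcs_co_command(rcs_path, revision))


def commands_for(revisions):
    """Return one command per (rcs_path, revision) pair, in the order given."""
    return [rcs_co_command(path, rev) for path, rev in revisions]


# Three revisions mean three separate "co" invocations.
cmds = commands_for([
    ("proj/foo.c,v", "1.1"),
    ("proj/bar.c,v", "1.1"),
    ("proj/foo.c,v", "1.2"),
])
```

A conversion that outputs revisions in changeset order alternates
between files like this, which is why the per-process overhead adds up.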

It is possible to write your own RevisionCollector and RevisionReader
if you would like to do things differently. A RevisionCollector, with
callback methods that are invoked as the CVS files are parsed, can be
used to collect information during FilterSymbolsPass. Its
process_file() method is allowed to set an arbitrary token (for
example, a content hash) in CVSItem.revision_reader_token. This token
is carried along by cvs2svn for use by the RevisionReader in
OutputPass.

Later, when OutputPass requires the file contents, it calls
RevisionReader.get_content(), which is passed a CVSRevision instance
and has to return the file revision's contents. A custom
RevisionReader can use the token to retrieve the pre-stored file
contents without having to call CVS or RCS at all.
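
The token hand-off described above can be sketched in Python as
follows. This is a self-contained model, not code that plugs into
cvs2svn: CVSRevisionStub, HashingRevisionCollector, and
StoredRevisionReader are hypothetical names, the in-memory dict stands
in for whatever on-disk store a real implementation would use, and the
stub carries its fulltext directly, which a real CVSRevision is not
assumed to do. Only process_file(), get_content(), and
revision_reader_token are names taken from the description above.

```python
import hashlib


class CVSRevisionStub:
    """Hypothetical stand-in for a cvs2svn CVSRevision."""

    def __init__(self, path, rev, text):
        self.path = path
        self.rev = rev
        self.text = text  # used only by the collector in this sketch
        self.revision_reader_token = None


class HashingRevisionCollector:
    """Stores each revision's text under its content hash."""

    def __init__(self, store):
        self.store = store

    def process_file(self, cvs_revisions):
        # Called once per file; all of its revisions are handled together.
        for cvs_rev in cvs_revisions:
            token = hashlib.sha1(cvs_rev.text).hexdigest()
            self.store[token] = cvs_rev.text
            cvs_rev.revision_reader_token = token


class StoredRevisionReader:
    """Returns contents from the store, using only the token."""

    def __init__(self, store):
        self.store = store

    def get_content(self, cvs_rev):
        return self.store[cvs_rev.revision_reader_token]


# The collector runs first (FilterSymbolsPass), the reader later (OutputPass).
store = {}
revisions = [
    CVSRevisionStub("foo.c", "1.1", b"int x;\n"),
    CVSRevisionStub("foo.c", "1.2", b"int x = 1;\n"),
]
HashingRevisionCollector(store).process_file(revisions)
reader = StoredRevisionReader(store)
content = reader.get_content(revisions[1])
```

Using a content hash as the token has the side effect that identical
revisions share one stored copy; any other token that lets the reader
find the data again would work equally well.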


[1] The exception is cvs2git conversions, which need a
    RevisionCollector but not a RevisionReader. The reason is that
    "git fast-import" allows file revision contents to be written as
    "blobs" in arbitrary order, to be hooked together later into
    proper changesets. This feature is very beneficial to the
    performance of cvs2git, because it allows all revisions of a
    single file to be generated at the same time (with good disk
    locality) rather than having to jump around from file to file
    getting single revisions in changeset order. Unfortunately,
    neither "bzr fast-import" nor "hg fastimport" supports separate