# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License").  You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright (c) 1995 Sun Microsystems, Inc.  All Rights Reserved
#
# design notes that are likely to be of general (rather than
# merely historical) interest.

Overview                        what filesync does

Primary Data Structures
    general principles          why they exist
    key concepts                what they represent
    data structures             major structures and their contents

Overview of Passes              main phases of program execution

Modules                         list and descriptions of files

Comments on studying the code
    active ingredients          a reading list of high points
    the whole thing             a suggested order for everything

Gross calling structure         who calls whom

Helpful hints                   good things to know

Overview

The purpose of this program is to compare pairs of directory
trees with a baseline snapshot, to determine which files have
changed, and to propagate the changes in order to bring the
trees back into congruency.  The baseline snapshot describes
size, ownership, ... for all files that filesync is managing
WHEN THEY WERE LAST IN SYNC.

The files and directory trees to be compared are determined
by a relatively flexible (user editable) rules file, whose
format (packingrules.4) permits files and/or trees to be
specified explicitly, implicitly, or with wild cards.
There are also provisions for filtering out unwanted files
and for running programs to generate lists of files and
directories to be included or excluded.

The comparisons begin by comparing the structured name
spaces.  For names that appear in both trees, the files
are then compared on the basis of type, size, contents,
ownership and protections.  For files that are already
in the baseline snapshot, if the sizes and modification
times have not changed, we do not bother to recheck the
contents.
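
The size and modification time short cut can be sketched as follows.
This is a Python illustration only (filesync itself is written in C);
the FileInfo class and the needs_content_check function are hypothetical
names chosen for this note and do not come from the filesync source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical subset of the stat data remembered for one file."""
    size: int
    mtime: int


def needs_content_check(baseline: Optional[FileInfo], current: FileInfo) -> bool:
    """Return True if the file contents have to be compared again."""
    if baseline is None:
        # the file is not in the baseline, so there is nothing to trust
        return True
    # unchanged size and mod time: skip the expensive content comparison
    return (baseline.size, baseline.mtime) != (current.size, current.mtime)
```

A file whose size and modification time both match the baseline entry
is taken to be unchanged; anything else falls back to a full comparison.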

The reconciliation process (resolving the differences)
will only propagate a change if it is obvious what should
be done (one side has changed relative to the snapshot,
while the other has not).  If there are conflicting changes,
the file is flagged and the user is asked to reconcile the
differences manually.  There are, however, a few switches
that can be used to constrain the analysis or reconciliation,
or to force one particular side to win in case of a conflict.
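
The "only propagate when it is obvious" rule can be sketched as a three
way comparison.  This is a Python illustration under stated assumptions,
not the logic of reconcile in recon.c: the decide function, its string
results and the force parameter (a stand-in for the command line
switches mentioned above) are all hypothetical, and treating two sides
that already agree as "in sync" is an assumption of the sketch.

```python
def decide(baseline, src, dst, force=None):
    """Pick an action for one file from three opaque state values.

    baseline is None when the two sides have never agreed.
    force is hypothetical: None, "src" or "dst".
    """
    if src == dst:
        # assumption: sides that already agree need no propagation
        return "in sync"
    src_changed = src != baseline
    dst_changed = dst != baseline
    if src_changed and not dst_changed:
        return "propagate src"
    if dst_changed and not src_changed:
        return "propagate dst"
    # both sides differ from the snapshot and from each other
    if force in ("src", "dst"):
        return "propagate " + force
    return "conflict"
```

For example, decide("v1", "v2", "v1") chooses the source side, while
decide("v1", "v2", "v3") reports a conflict unless force names a winner.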

Primary Data Structures

general principles

we will build up an in-memory tree that represents
the union of the name spaces found in the baseline
and on the source and destination sides.

keep in mind that the baseline recalls the state of
files THE LAST TIME THEY WERE IN AGREEMENT.  If files
have disagreed for a long time, the baseline still
remembers what they were like when they agreed.  If
files have never agreed, the baseline has no notion
of how they "used to be".

key concepts

a "base pair" is a pair of directories whose
contents (or a subset of whose contents) are to
be synchronized.  The "base pairs" to be managed
are specified in the packing rules file.

associated with each "base pair" is a set of rules
that describe which files (under those directories)
are to be kept in sync.  Each rule is a list of:
    files and/or directories to be included
    wild cards for files or directories to be included
    programs to generate lists of names for inclusion
    file names to be ignored
    wild cards for file names to be ignored
    programs to generate lists of names for ignoring
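
How include and ignore wild cards combine can be sketched as follows.
This is a Python illustration, not the rules or ignore code of
filesync: is_selected is a hypothetical name, the precedence of ignores
over includes is an assumption, and rules that run programs to generate
names are left out.

```python
import fnmatch
import posixpath


def is_selected(path, include, ignore):
    """Return True if path matches an include pattern and no ignore pattern.

    Ignore patterns are checked against the last path component only
    (an assumption).  Python's fnmatch lets "*" match "/" as well,
    which a shell would not.
    """
    name = posixpath.basename(path)
    if any(fnmatch.fnmatchcase(name, pat) for pat in ignore):
        return False
    return any(fnmatch.fnmatchcase(path, pat) for pat in include)
```

With include = ["src/*"] and ignore = ["*.o"], the name src/main.c is
selected while src/main.o and doc/notes are not.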

as a result of the "evaluation" process we build up
(under each base pair) a tree that represents all of
the files that we are supposed to keep in sync, and
contains everything we need to know about each one
of those files.  The structure of the tree mirrors
the directory hierarchy ... actually the union of the
three hierarchies (baseline, source and destination).
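
The union of the three name spaces can be sketched with nested
dictionaries.  This is a Python illustration only; build_union_tree and
the "seen" and "children" keys are hypothetical and say nothing about
how the real tree is laid out in memory.

```python
def build_union_tree(baseline, source, destination):
    """Merge three lists of slash separated paths into one tree.

    Each node records on which sides its full path was listed.
    """
    tree = {}
    sides = (("baseline", baseline), ("src", source), ("dst", destination))
    for side, paths in sides:
        for path in paths:
            level = tree
            for part in path.strip("/").split("/"):
                entry = level.setdefault(part, {"seen": set(), "children": {}})
                level = entry["children"]
            # only the final component is marked as seen on this side
            entry["seen"].add(side)
    return tree
```

A name that exists on only one side still gets a node, which is what
lets later passes notice creations and deletions.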

for each file, we record interesting information (type,
size, owner, protection, mod time) and keep separate
note of what these values were:
    in the baseline             the last time the two sides agreed
    on the source side          as we just examined it
    on the destination side     as we just examined it

data structures

there is an ordered list of "base" structures

for each base, we maintain
    three lists of associated "rule" descriptions:
        restriction rules (from the command line)
    a "file" tree, representing all files below the bases
    a list of statistics to be printed as a summary

for each "rule", we maintain
    some flags describing the type of rule
    the character string that is the rule

for each "file", we maintain
    sibling and child pointers to give them tree structure
    flags to describe what we have done/should do
    "fileinfo" information from the src, dest, and baseline

    in addition there are some fields that are used
    to add the file to a list of files requiring
    reconciliation and record what happened to it.

a "fileinfo" structure contains a subset of the information
that we obtain from a stat call:
    ownership, protection, and acls

there is also, built up during analysis, a reconciliation
list.  This is an ordered list of "file" structures which
are believed to describe files that have changed and require
reconciliation.  The ordering is important both for correctness
and to preserve relative modification times.
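
The shape of a "file" node and its three "fileinfo" records can be
sketched as follows.  This is a Python illustration, not the
declarations in database.h: every class, field and function name here
is hypothetical, and the fields are limited to those named in the notes
above.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FileInfo:
    """Hypothetical subset of stat data for one side of one file."""
    ftype: str
    size: int
    owner: int
    mode: int
    mtime: int


@dataclass
class FileNode:
    """One name in the tree, with child and sibling links."""
    name: str
    flags: int = 0
    baseline: Optional[FileInfo] = None
    src: Optional[FileInfo] = None
    dst: Optional[FileInfo] = None
    child: Optional["FileNode"] = None
    sibling: Optional["FileNode"] = None


def add_child(parent: FileNode, child: FileNode) -> None:
    """Append child to the end of parent's sibling chain."""
    if parent.child is None:
        parent.child = child
        return
    last = parent.child
    while last.sibling is not None:
        last = last.sibling
    last.sibling = child


def walk(node: FileNode, prefix: str = "") -> Iterator[str]:
    """Yield the path of node and of everything below it, depth first."""
    path = prefix + node.name
    yield path
    child = node.child
    while child is not None:
        yield from walk(child, path + "/")
        child = child.sibling
```

A reconciliation list would then be an ordinary ordered list of
references to FileNode objects that live in this tree.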

Overview of Passes

pass I (evaluation)

stat every file that we might be interested in
(on both src/dest sides).  This includes walking
the trees under all directories in order to
find out what files exist and stat'ing all of
them.

the main trick in this pass is that there may be
files we don't want to evaluate (because we are
limiting our attention to specific files and trees).
There is a LISTED flag kept in the database that
tells us whether or not we need to stat/descend any
given node.

all restrictions and ignores take effect during this pass.
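
The walking and stat'ing done in this pass can be sketched for a single
side.  This is a Python illustration built on os.walk, not the code in
eval.c: the evaluate and ignored functions here are hypothetical, only
ignore patterns are modelled, and the LISTED flag and restriction rules
are left out.

```python
import fnmatch
import os


def ignored(name, patterns):
    """Return True if name matches any of the ignore wild cards."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


def evaluate(root, ignore_patterns=()):
    """Return {relative path: (size, mtime)} for everything under root."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # pruning dirnames in place keeps os.walk out of ignored trees
        dirnames[:] = [d for d in dirnames if not ignored(d, ignore_patterns)]
        for name in dirnames + filenames:
            if ignored(name, ignore_patterns):
                continue
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            found[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return found
```

Running this once per side gives the current information that the next
pass compares against the baseline.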

pass II (analysis)

given the baseline and all of the current stat information
gained during pass I, figure out what might conceivably
have changed and queue it for pass III.  This pass doesn't
try to figure out what happened or who should win ... it
merely identifies candidates for pass III.  This pass
ignores any nodes that were not evaluated during pass I.

the queueing process, however, determines the order in
which the files will be processed in pass III, and the
order is very important.
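
Candidate selection can be sketched as a filter over the evaluated
nodes.  This is a Python illustration, not the code in anal.c: the
find_candidates function and the dictionary keys are hypothetical, and
the queue simply keeps the input order because the real ordering rule
is not spelled out in these notes.

```python
def find_candidates(nodes):
    """Return the paths of nodes that may need reconciliation.

    nodes is an iterable of dicts with the hypothetical keys
    "path", "evaluated", "baseline", "src" and "dst".
    """
    queue = []
    for node in nodes:
        if not node["evaluated"]:
            # nodes that pass I did not look at are ignored here
            continue
        if node["src"] != node["baseline"] or node["dst"] != node["baseline"]:
            queue.append(node["path"])
    return queue
```

Note that the function only nominates files; deciding what happened and
which side wins is left to the following pass.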

pass III (reconciliation)

process the list of candidates, figuring out what has
actually changed and which versions deserve to win.  If
it is clear what needs doing, we actually do it in this
pass.

Modules

defines for limits, sizes and return codes
declarations for global variables (mostly cmd-line parms)
defines for default file names
declarations for routines of general interest

data-structures for recording rules
data-structures for recording information about files
declarations for routines that operate on/with those structures

the text of all localizable messages

definitions and declarations for routines for error
simulation and bit-map display.

routines to get, set, compare, and display Access Control Lists

routines to do the real work of copying, deleting, or
changing ownership in order to make one side agree
with the other.

routines to examine the in-core list of files and
determine what has changed (and therefore what
files are candidates for reconciliation).  This
analysis includes figuring out which files should
be links rather than copies.

routines to read and write the baseline file
routines to search and manipulate the in-core base list

data structures and routines, used to simulate errors
and produce debug output, that map between bits (as found
in various flag words) and character string names for their

routines to build up the internal tree that describes
the status of all of the files that are described
by the current rules.

routines to manipulate file name arguments, including
wild cards and embedded environment variables.

routines to maintain a list of names or patterns for
files to be ignored, and to check file names against

global variables, cmd-line parameter processing,
parameter validation, error reporting, and the

routines to examine a list of files that appear to
have changed, and figure out what the appropriate
reconciliation course of action is.

routines to search the tree to determine whether
or not any creates/deletes are actually renames.

routines to read and write the rules file
routines to add rules and enumerate in-core rules

not really a part of filesync, but rather a utility
program that is used in the test suite.  It extracts
information about files that is not readily available
from other unix commands.

Comments on studying the code

if you are only interested in the "active ingredients":

    read the above notes on data structures and then
    read the structure declarations in database.h

    read the above notes overviewing the passes

    in recon.c: read reconcile

        this routine almost makes sense on its own,
        and it is unquestionably the most important
        routine in the entire program.  Everything
        else just gathers data for reconcile to use,
        or updates the books to reflect the changes.

    in eval.c: read evaluate, eval_file, walker, and note_info

        this is the main guts of pass I

    in anal.c: read analyze, check_file, check_changes & queue_file

        this is the main guts of pass II

if you want to read the whole thing:

    the following routines do fundamentally simple things
    in simple ways, and can (for the most part) be understood
    in vacuo.  The things they do are sufficiently obvious
    that you can probably understand the more interesting
    code without having read them at all.

    the following routines constitute the real meat of the
    program, and while they are broken into specialized
    modules, they probably need to be understood as an

        main.c          setup and control
        action.c        execution and book-keeping
        rename.c        a special case for a common situation

Gross calling structure / flow of control

Helpful hints

the "file" structure contains a bunch of flags.  Many of them
just summarize what we know about the file (e.g. where it was
found).  Others are more subtle and control the evaluation
process or the writing out of the baseline file.  You can't
really understand the processing unless you understand what
these flags mean:

    F_NEW           added by a new rule

    F_LISTED        this name was generated by a rule

    F_SPARSE        this directory is an intermediate on
                    the way to a name generated by a rule
                    and should not be recursively walked.

    F_EVALUATE      this node was found in evaluation and
                    has up-to-date stat information

    F_CONFLICT      there is a conflict on this node so
                    baseline should remain unchanged

    F_REMOVE        this node should be purged from the baseline

    F_STAT_ERROR    it was impossible to stat this file
                    (and anything below it)
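
The flag word and the mapping from bits to names (the kind of bit-map
display used for debug output) can be sketched as follows.  This is a
Python illustration only: the flag names are the ones listed above, but
the numeric values and the flag_names helper are hypothetical and are
not taken from the filesync headers.

```python
from enum import IntFlag


class FileFlags(IntFlag):
    """Flag names from the notes; the bit values are made up."""
    F_NEW = 0x01
    F_LISTED = 0x02
    F_SPARSE = 0x04
    F_EVALUATE = 0x08
    F_CONFLICT = 0x10
    F_REMOVE = 0x20
    F_STAT_ERROR = 0x40


def flag_names(word):
    """Return the names of all flags set in word, in declaration order."""
    return [flag.name for flag in FileFlags if word & flag]
```

For instance, flag_names(FileFlags.F_LISTED | FileFlags.F_EVALUATE)
names exactly those two flags, which is handy when tracing why a node
was or was not processed.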

the implications of these flags on processing are:

    F_NEW, F_LISTED, F_SPARSE

        affect whether or not a particular node should
        be included in the evaluation pass.

        in some situations, only new rules are interpreted.

        listed files and directories should be evaluated
        and analyzed.  sparse directories should not be
        recursively enumerated.

    F_EVALUATE

        determines whether or not a node is included
        in the analysis pass.  Only nodes that have
        been evaluated will be analyzed.

    F_CONFLICT, F_REMOVE, F_EVALUATE

        affect how a node should be written back into the baseline file.

        if there is a conflict or we haven't evaluated
        a node, we won't update the baseline.

        if a node is marked for removal, it will be
        excluded from the baseline when it is written out.

    F_STAT_ERROR

        if we could not get proper status information
        about a file (or the tree under it) we cannot,
        with any confidence, determine what its state
        is or do anything about it.  Such files are
        flagged as "in conflict".

        it is somewhat kinky that we put error flagged
        files on the reconciliation list.  We do this
        because this is the easiest way to pull them
        out for reporting as conflicts.
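
The effect of F_CONFLICT, F_REMOVE, F_EVALUATE and F_STAT_ERROR on
writing the baseline can be sketched as follows.  This is a Python
illustration, not the code that writes the baseline: the flag values
and the baseline_entry function are hypothetical, and letting F_REMOVE
take precedence over the other flags is an assumption of the sketch.

```python
# hypothetical bit values, chosen only for this sketch
F_EVALUATE = 0x08
F_CONFLICT = 0x10
F_REMOVE = 0x20
F_STAT_ERROR = 0x40


def baseline_entry(flags, old_entry, new_entry):
    """Return what to write to the baseline for one node.

    None means the node is left out of the baseline entirely.
    """
    if flags & F_REMOVE:
        # assumption: removal wins over every other flag
        return None
    if flags & F_STAT_ERROR:
        # stat errors are treated as conflicts
        return old_entry
    if (flags & F_CONFLICT) or not (flags & F_EVALUATE):
        # conflict or never evaluated: leave the baseline untouched
        return old_entry
    return new_entry
```

Only a node that was evaluated, is free of conflicts and is not marked
for removal has its baseline entry replaced with the new information.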