<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter SYSTEM "../../dtd/dblite.dtd">
<!-- This chapter is copied over from the Git book and slightly
modified for the TortoiseGit client. -->
<chapter id="tsvn-basics">
<title>Basic Concepts</title>
<indexterm>
	<primary>Git book</primary>
</indexterm>
<para>
	This chapter is a slightly modified version of the same chapter
	in the Git book. An online version of the Git book is available at
	<ulink url="http://svnbook.red-bean.com/">
		<citetitle>http://svnbook.red-bean.com/</citetitle>
	</ulink>.
</para>
<para>
	This chapter is a short, casual introduction to Git.
	If you're new to version control, this chapter is definitely for
	you. We begin with a discussion of general version control
	concepts, work our way into the specific ideas behind
	Git, and show some simple examples of Git in use.
</para>
<para>
	Even though the examples in this chapter show people sharing
	collections of program source code, keep in mind that Git
	can manage any sort of file collection; it's not limited to
	helping computer programmers.
</para>
<sect1 id="tsvn-basics-repository">
<title>The Repository</title>
<indexterm>
	<primary>repository</primary>
</indexterm>
<para>
	Git is a centralized system for sharing information.
	At its core is a <firstterm>repository</firstterm>, which is a
	central store of data. The repository stores information in the
	form of a <firstterm>filesystem tree</firstterm>, a typical
	hierarchy of files and directories. Any number of
	<firstterm>clients</firstterm> connect to the repository, and
	then read or write to these files. By writing data, a client
	makes the information available to others; by reading data, the
	client receives information from others.
</para>
<figure id="tsvn-basics-dia1">
	<title>A Typical Client/Server System</title>
	<graphic fileref="images/ch02dia1.png"/>
</figure>
<para>
	So why is this interesting? So far, this sounds like the
	definition of a typical file server. And indeed, the repository
	<emphasis>is</emphasis> a kind of file server, but it's not your
	usual breed. What makes the Git repository special is
	that <emphasis>it remembers every change</emphasis> ever written
	to it: every change to every file, and even changes to the
	directory tree itself, such as the addition, deletion, and
	rearrangement of files and directories.
</para>
<para>
	When a client reads data from the repository, it normally
	sees only the latest version of the filesystem tree. But the
	client can also view <emphasis>previous</emphasis> states of
	the filesystem. For example, a client can ask historical
	questions like <quote>what did this directory contain last
	Wednesday?</quote> or <quote>who was the last person to change
	this file, and what changes did they make?</quote>
	These are the sorts of questions that are at the heart of any
	<firstterm>version control system</firstterm>: a system
	designed to record and track changes to data over time.
</para>
</sect1>
<!-- the philosophical section -->
<sect1 id="tsvn-basics-versioning">
<title>Versioning Models</title>
<para>
	All version control systems have to solve the same
	fundamental problem: how will the system allow users to share
	information, but prevent them from accidentally stepping on
	each other's toes? It's all too easy for users to
	accidentally overwrite each other's changes in the repository.
</para>
<sect2 id="tsvn-basics-versioning-filesharing">
<title>The Problem of File-Sharing</title>
<para>
	Consider this scenario: suppose we have two co-workers,
	Harry and Sally. They each decide to edit the same repository
	file at the same time. If Harry saves his changes to the
	repository first, then it's possible that (a few moments
	later) Sally could accidentally overwrite them with her own
	new version of the file. While Harry's version of the file
	won't be lost forever (because the system remembers every
	change), any changes Harry made <emphasis>won't</emphasis> be
	present in Sally's newer version of the file, because she
	never saw Harry's changes to begin with. Harry's work is
	still effectively lost, or at least missing from the
	latest version of the file, and probably by accident.
	This is definitely a situation we want to avoid!
</para>
<figure id="tsvn-basics-dia2">
	<title>The Problem to Avoid</title>
	<graphic fileref="images/ch02dia2.png"/>
</figure>
</sect2>
<sect2 id="tsvn-basics-versioning-lockmodifyunlock">
<title>The Lock-Modify-Unlock Solution</title>
<para>
	Many version control systems address this problem with a
	<firstterm>lock-modify-unlock</firstterm> model, which is a
	very simple solution. In such a
	system, the repository allows only one person to change a file
	at a time. Harry must first <emphasis>lock</emphasis> the file
	before he can begin
	making changes to it. Locking a file is a lot like borrowing
	a book from the library; if Harry has locked a file, then Sally
	cannot make any changes to it. If she tries to lock the file,
	the repository will deny the request. All she can do is read
	the file, and wait for Harry to finish his changes and release
	his lock. After Harry unlocks the file, his turn is over, and
	now Sally can take her turn by locking and editing.
</para>
<figure id="tsvn-basics-dia3">
	<title>The Lock-Modify-Unlock Solution</title>
	<graphic fileref="images/ch02dia3.png"/>
</figure>
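<para>
	The turn-taking described above can be sketched in a few lines of
	Python. This is only a toy model under stated assumptions: the
	class <literal>LockingRepository</literal> and its methods are
	hypothetical names invented for this illustration and do not
	correspond to any real client or server API.
</para>
<programlisting><![CDATA[
```python
class LockingRepository:
    """Toy lock-modify-unlock model: at most one lock holder per file."""

    def __init__(self):
        self._locks = {}  # file name -> user currently holding the lock

    def lock(self, user, filename):
        """Return True if the lock is granted, False if someone else holds it."""
        holder = self._locks.get(filename)
        if holder is not None and holder != user:
            return False
        self._locks[filename] = user
        return True

    def unlock(self, user, filename):
        """Release the lock; only the current holder may do so."""
        if self._locks.get(filename) != user:
            raise PermissionError(f"{user} does not hold the lock on {filename}")
        del self._locks[filename]


repo = LockingRepository()
harry_got_lock = repo.lock("Harry", "A")     # Harry locks first
sally_first_try = repo.lock("Sally", "A")    # denied while Harry holds the lock
repo.unlock("Harry", "A")                    # Harry's turn is over
sally_second_try = repo.lock("Sally", "A")   # now Sally can take her turn
```
]]></programlisting>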
<para>
	The problem with the lock-modify-unlock model is that it's
	a bit restrictive, and often becomes a roadblock for users:
</para>
<itemizedlist>
	<listitem>
		<para>
			<emphasis>Locking may cause administrative problems.</emphasis>
			Sometimes Harry will lock a file and then forget about it.
			Meanwhile, because Sally is still waiting to edit the file,
			her hands are tied. And then Harry goes on vacation. Now
			Sally has to get an administrator to release Harry's lock.
			The situation ends up causing a lot of unnecessary delay
			and wasted time.
		</para>
	</listitem>
	<listitem>
		<para>
			<emphasis>Locking may cause unnecessary serialization.</emphasis>
			What if Harry is editing the beginning of a text file,
			and Sally simply wants to edit the end of the same file?
			These changes don't overlap at all. They could easily
			edit the file simultaneously, and no great harm would
			come, assuming the changes were properly merged together.
			There's no need for them to take turns in this situation.
		</para>
	</listitem>
	<listitem>
		<para>
			<emphasis>Locking may create a false sense of security.</emphasis>
			Pretend that Harry locks and edits file A, while
			Sally simultaneously locks and edits file B. But suppose
			that A and B depend on one another, and the changes made
			to each are semantically incompatible. Suddenly A and B
			don't work together anymore. The locking system was
			powerless to prevent the problem, yet it somehow
			provided a false sense of security. It's easy for Harry and
			Sally to imagine that by locking files, each is beginning a
			safe, insulated task, and this discourages them from
			discussing their incompatible changes early on.
		</para>
	</listitem>
</itemizedlist>
</sect2>
<sect2 id="tsvn-basics-versioning-copymodifymerge">
<title>The Copy-Modify-Merge Solution</title>
<para>
	Git, CVS, and other version control systems use a
	<firstterm>copy-modify-merge</firstterm> model as an
	alternative to locking. In this model, each user's client
	reads the repository and creates a personal <firstterm>working
	copy</firstterm> of the file or project. Users then work in
	parallel, modifying their private copies. Finally, the
	private copies are merged together into a new, final version.
	The version control system often assists with the merging, but
	ultimately a human being is responsible for making it happen
	correctly.
</para>
<para>
	Here's an example. Say that Harry and Sally each create
	working copies of the same project, copied from the
	repository. They work concurrently, and make changes to the
	same file <filename>A</filename> within their copies. Sally saves her changes to
	the repository first. When Harry attempts to save his changes
	later, the repository informs him that his file A is
	<firstterm>out-of-date</firstterm>. In other words, file
	A in the repository has changed since he last copied
	it. So Harry asks his client to <firstterm>merge</firstterm>
	any new changes from the repository into his working copy of
	file A. Chances are that Sally's changes don't overlap with
	his own; so once he has both sets of changes integrated, he
	saves his working copy back to the repository.</para>
<figure id="tsvn-basics-dia4">
	<title>The Copy-Modify-Merge Solution</title>
	<graphic fileref="images/ch02dia4.png"/>
</figure>
<figure id="tsvn-basics-dia5">
	<title>...Copy-Modify-Merge Continued</title>
	<graphic fileref="images/ch02dia5.png"/>
</figure>
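<para>
	The out-of-date check in this scenario can be sketched as follows.
	This is a minimal illustration in Python, not how any real client
	is implemented: <literal>ToyRepository</literal>, its methods, and
	the file contents are all assumptions made up for the example, and
	Harry's merge is done by hand because his change and Sally's
	touch different lines.
</para>
<programlisting><![CDATA[
```python
class ToyRepository:
    """Holds one file (a list of lines) and a counter of accepted commits."""

    def __init__(self, content):
        self.revision = 0
        self.content = list(content)

    def checkout(self):
        """Hand out a private working copy that remembers its base revision."""
        return {"base_revision": self.revision, "content": list(self.content)}

    def commit(self, working_copy):
        """Accept the commit only if the working copy is based on the latest state."""
        if working_copy["base_revision"] != self.revision:
            raise RuntimeError("out-of-date: merge the latest changes first")
        self.revision += 1
        self.content = list(working_copy["content"])
        working_copy["base_revision"] = self.revision
        return self.revision


repo = ToyRepository(["intro", "body", "end"])
harry = repo.checkout()
sally = repo.checkout()

sally["content"][2] = "END (Sally)"
repo.commit(sally)                       # Sally saves first

harry["content"][0] = "INTRO (Harry)"
try:
    repo.commit(harry)
    harry_rejected = False
except RuntimeError:
    harry_rejected = True                # his copy is out-of-date

# Harry merges: take the latest repository state and re-apply his own
# non-overlapping change, then commit the integrated result.
merged = repo.checkout()
merged["content"][0] = harry["content"][0]
final_revision = repo.commit(merged)
```
]]></programlisting>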
<indexterm>
	<primary>conflict</primary>
</indexterm>
<para>
	But what if Sally's changes <emphasis>do</emphasis> overlap
	with Harry's changes? What then? This situation is called a
	<firstterm>conflict</firstterm>, and it's usually not much of
	a problem. When Harry asks his client to merge the latest
	repository changes into his working copy, his copy of file A
	is flagged as being in a state of conflict: he'll be
	able to see both sets of conflicting changes, and manually
	choose between them. Note that software can't automatically
	resolve conflicts; only humans are capable of understanding
	and making the necessary intelligent choices. Once Harry has
	manually resolved the overlapping changes (perhaps by
	discussing the conflict with Sally!), he can safely save the
	merged file back to the repository.
</para>
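<para>
	To make the difference between a clean merge and a conflict
	concrete, here is a deliberately simplified three-way merge in
	Python. It is a sketch under strong assumptions: the function
	<literal>merge_lines</literal> is hypothetical, and it only
	compares files line by line when all three versions have the same
	number of lines. Real merge tools use diff algorithms that also
	cope with inserted and deleted lines.
</para>
<programlisting><![CDATA[
```python
def merge_lines(base, mine, theirs):
    """Line-by-line three-way merge for versions with equal line counts.

    Returns (merged_lines, conflicting_line_numbers). On a conflict the
    local line is kept and the 1-based line number is reported.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this toy merge only handles equal-length files")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:          # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:        # only the other side changed this line
            merged.append(t)
        elif t == b:        # only the local side changed this line
            merged.append(m)
        else:               # both changed it differently: a conflict
            merged.append(m)
            conflicts.append(number)
    return merged, conflicts


base = ["a", "b", "c"]
harry = ["A", "b", "c"]            # Harry edits line 1

clean = merge_lines(base, harry, ["a", "b", "C"])      # Sally edits line 3
overlap = merge_lines(base, harry, ["X", "b", "c"])    # Sally also edits line 1
```
]]></programlisting>
<para>
	In the first call the two sets of changes combine automatically;
	in the second, line 1 is reported as a conflict and a human has
	to choose.
</para>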
<para>
	The copy-modify-merge model may sound a bit chaotic, but
	in practice, it runs extremely smoothly. Users can work in
	parallel, never waiting for one another. When they work on
	the same files, it turns out that most of their concurrent
	changes don't overlap at all; conflicts are infrequent. And
	the amount of time it takes to resolve conflicts is far less
	than the time lost by a locking system.
</para>
<para>
	In the end, it all comes down to one critical factor: user
	communication. When users communicate poorly, both syntactic
	and semantic conflicts increase. No system can force users to
	communicate perfectly, and no system can detect semantic
	conflicts. So there's no point in being lulled into a false
	promise that a locking system will somehow prevent conflicts;
	in practice, locking seems to inhibit productivity more than
	anything else.
</para>
<para>
	There is one common situation where the lock-modify-unlock
	model comes out better, and that is where you have unmergeable
	files. For example, if your repository contains some graphic
	images, and two people change the same image at the same time,
	there is no way for those changes to be merged together. Either
	Harry or Sally will lose their changes.
</para>
</sect2>
<sect2 id="tsvn-basics-versioning-4">
<title>What Does Git Do?</title>
<para>
	Git uses the copy-modify-merge solution by default,
	and in many cases this is all you will ever need. However,
	as of Version 1.2, Git also supports file locking,
	so if you have unmergeable files, or if you are simply
	forced into a locking policy by management, Git
	will still provide the features you need.
</para>
</sect2>
</sect1>
<!-- How svn implements the philosophy -->
<sect1 id="tsvn-basics-svn">
<title>Git in Action</title>
<sect2 id="tsvn-basics-svn-workingcopy">
<title>Working Copies</title>
<indexterm>
	<primary>working copy</primary>
</indexterm>
<para>
	You've already read about working copies; now we'll
	demonstrate how the Git client creates and uses them.
</para>
<para>
	A Git working copy is an ordinary directory tree on
	your local system, containing a collection of files. You can
	edit these files however you wish, and if they're source code
	files, you can compile your program from them in the usual
	way. Your working copy is your own private work area:
	Git will never incorporate other people's changes, nor
	make your own changes available to others, until you
	explicitly tell it to do so.
</para>
<para>
	After you've made some changes to the files in your
	working copy and verified that they work properly, Git
	provides you with commands to <emphasis>publish</emphasis> your changes to the
	other people working with you on your project (by writing to
	the repository). If other people publish their own changes,
	Git provides you with commands to merge those changes
	into your working directory (by reading from the repository).
</para>
<para>
	A working copy also contains some extra files, created and
	maintained by Git, to help it carry out these commands.
	In particular, each directory in your working copy contains a
	subdirectory named <filename>.svn</filename>, also known as
	the working copy <firstterm>administrative
	directory</firstterm>. The files in each administrative
	directory help Git recognize which files contain
	unpublished changes, and which files are out-of-date with
	respect to others' work.
</para>
<para>
	A typical Git repository often holds the files (or
	source code) for several projects; usually, each project is a
	subdirectory in the repository's filesystem tree. In this
	arrangement, a user's working copy will usually correspond to
	a particular subtree of the repository.
</para>
<para>
	For example, suppose you have a repository that contains
	two software projects.
</para>
<figure id="tsvn-basics-dia6">
	<title>The Repository's Filesystem</title>
	<graphic fileref="images/ch02dia6.png"/>
</figure>
<para>
	In other words, the repository's root directory has two
	subdirectories: <filename>paint</filename> and
	<filename>calc</filename>.
</para>
<para>
	To get a working copy, you must <firstterm>check
	out</firstterm> some subtree of the repository. (The term
	<emphasis>check out</emphasis> may sound like it has something to do with locking
	or reserving resources, but it doesn't; it simply creates a
	private copy of the project for you.)
</para>
<para>
	Suppose you make changes to <filename>button.c</filename>.
	Since the <filename>.svn</filename> directory remembers the
	file's modification date and original contents, Git can
	tell that you've changed the file. However, Git does
	not make your changes public until you explicitly tell it to.
	The act of publishing your changes is more commonly known as
	<firstterm>committing</firstterm> (or <firstterm>checking
	in</firstterm>) changes to the repository.
</para>
<para>
	To publish your changes to others, you can use
	Git's <command>commit</command> command.
</para>
<para>
	Now your changes to <filename>button.c</filename> have
	been committed to the repository; if another user checks out a
	working copy of <filename>/calc</filename>, they will see
	your changes in the latest version of the file.
</para>
<para>
	Suppose you have a collaborator, Sally, who checked out a
	working copy of <filename>/calc</filename> at the same time
	you did. When you commit your change to
	<filename>button.c</filename>, Sally's working copy is left
	unchanged; Git only modifies working copies at the
	user's request.
</para>
<para>
	To bring her project up to date, Sally can ask
	Git to <firstterm>update</firstterm> her working copy,
	by using the Git <command>update</command> command.
	This will incorporate your changes into her working copy, as
	well as any others that have been committed since she checked
	it out.
</para>
<para>
	Note that Sally didn't need to
	specify which files to update; Git uses the information
	in the <filename>.svn</filename> directory, and further
	information in the repository, to decide which files need to
	be brought up to date.
</para>
</sect2>
<sect2 id="tsvn-basics-svn-urls">
<title>Repository URLs</title>
<para>
	Git repositories can be accessed through many
	different methods: on local disk, or through various
	network protocols. A repository location, however, is
	always a URL. The URL scheme indicates the access method:
</para>
<table id="tsvn-basics-svn-table-1">
	<title>Repository Access URLs</title>
	<tgroup cols="2">
		<colspec colnum="1" colwidth="2*"/>
		<colspec colnum="2" colwidth="5*"/>
		<thead>
			<row>
				<entry>Scheme</entry>
				<entry>Access Method</entry>
			</row>
		</thead>
		<tbody>
			<row>
				<entry><literal>file://</literal></entry>
				<entry>Direct repository access on local or network drive.</entry>
			</row>
			<row>
				<entry><literal>http://</literal></entry>
				<entry>Access via WebDAV protocol to Git-aware Apache server.</entry>
			</row>
			<row>
				<entry><literal>https://</literal></entry>
				<entry>Same as <literal>http://</literal>, but with SSL encryption.</entry>
			</row>
			<row>
				<entry><literal>svn://</literal></entry>
				<entry>
					Unauthenticated TCP/IP access via custom protocol
					to an <literal>svnserve</literal> server.
				</entry>
			</row>
			<row>
				<entry><literal>svn+ssh://</literal></entry>
				<entry>
					Authenticated, encrypted TCP/IP access via custom protocol
					to an <literal>svnserve</literal> server.
				</entry>
			</row>
		</tbody>
	</tgroup>
</table>
<para>
	For the most part, Git's URLs use the standard
	syntax, allowing for server names and port numbers to be
	specified as part of the URL.
	The <literal>file://</literal> access method is normally used
	for local access, although it can be used with UNC paths to
	a networked host. The URL therefore takes the form
	<systemitem class="url">file://hostname/path/to/repos</systemitem>. For the
	local machine, the <literal>hostname</literal> portion of the URL is required
	to be either absent or <literal>localhost</literal>. For
	this reason, local paths normally appear with three slashes,
	<systemitem class="url">file:///path/to/repos</systemitem>.
</para>
<para>
	Also, users of the <literal>file://</literal> scheme on
	Windows platforms will need to use an unofficially
	<quote>standard</quote> syntax for accessing repositories
	that are on the same machine, but on a different drive than
	the client's current working drive. Either of the two
	following URL path syntaxes will work, where
	<literal>X</literal> is the drive on which the repository
	is located:
</para>
<screen>
file:///X:/path/to/repos
file:///X|/path/to/repos
</screen>
<para>
	Note that a URL uses ordinary slashes even though the native
	(non-URL) form of a path on Windows uses backslashes.
</para>
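<para>
	If you need to build such a URL in a script, the conversion from
	a native Windows path can be sketched with Python's standard
	<literal>pathlib</literal> module. The helper name
	<literal>repository_url</literal> and the example path are
	assumptions chosen for illustration; the sketch produces only the
	first of the two syntaxes shown above.
</para>
<programlisting><![CDATA[
```python
from pathlib import PureWindowsPath


def repository_url(native_path):
    """Convert an absolute Windows path into a file:/// URL.

    Backslashes become ordinary slashes and the drive letter is kept
    as the first path component after the three leading slashes.
    """
    return PureWindowsPath(native_path).as_uri()


url = repository_url(r"X:\path\to\repos")
```
]]></programlisting>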
<para>
	You can safely access an FSFS repository via a network share,
	but you <emphasis>cannot</emphasis> access a BDB repository
	in this way.
</para>
<warning>
	<para>
		Do not create or access a Berkeley DB repository on a network share.
		It <emphasis>cannot</emphasis> exist on a remote filesystem,
		not even if you have the network drive mapped to a drive letter.
		If you attempt to use Berkeley DB on a network share,
		the results are unpredictable: you may see mysterious errors
		right away, or it may be months before you discover that your
		repository database is subtly corrupted.
	</para>
</warning>
</sect2>
<sect2 id="tsvn-basics-svn-revisions">
<title>Revisions</title>
<indexterm>
	<primary>revision</primary>
</indexterm>
<para>
	A <command>svn commit</command> operation can publish
	changes to any number of files and directories as a single
	atomic transaction. In your working copy, you can change
	files' contents, create, delete, rename and copy files and
	directories, and then commit the complete set of changes as a
	unit.
</para>
<para>
	In the repository, each commit is treated as an atomic
	transaction: either all the commit's changes take place, or
	none of them take place. Git retains this
	atomicity in the face of program crashes, system crashes,
	network problems, and other users' actions.
</para>
<para>
	Each time the repository accepts a commit, this creates a
	new state of the filesystem tree, called a
	<firstterm>revision</firstterm>. Each revision is assigned a
	unique natural number, one greater than the number of the
	previous revision. The initial revision of a freshly created
	repository is numbered zero, and consists of nothing but an
	empty root directory.
</para>
<para>
	A nice way to visualize the repository is as a series of
	trees. Imagine an array of revision numbers, starting at 0,
	stretching from left to right. Each revision number has a
	filesystem tree hanging below it, and each tree is a
	<quote>snapshot</quote> of the way the repository looked after
	each commit.
</para>
<figure id="tsvn-basics-dia7">
	<title>The Repository</title>
	<graphic fileref="images/ch02dia7.png"/>
</figure>
<sidebar>
	<title>Global Revision Numbers</title>
	<para>
		Unlike those of many other version control systems,
		Git's revision numbers apply to <emphasis>entire
		trees</emphasis>, not individual files. Each revision
		number selects an entire tree, a particular state of the
		repository after some committed change. Another way to
		think about it is that revision N represents the state of
		the repository filesystem after the Nth commit. When a
		Git user talks about <quote>revision 5 of
		<filename>foo.c</filename></quote>, they really mean
		<quote><filename>foo.c</filename> as it appears in revision 5.</quote>
		Notice that in general, revisions N and M of a file do
		<emphasis>not</emphasis> necessarily differ!
	</para>
</sidebar>
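<para>
	The array-of-trees picture can be modelled in a few lines of
	Python. This is a sketch for illustration only:
	<literal>RevisionStore</literal>, its methods, and the file names
	and contents are assumptions made up for the example, and each
	tree is reduced to a flat mapping from path to text.
</para>
<programlisting><![CDATA[
```python
class RevisionStore:
    """Toy repository: a list of whole-tree snapshots indexed by revision."""

    def __init__(self):
        self.revisions = [{}]          # revision 0 is an empty tree

    @property
    def youngest(self):
        return len(self.revisions) - 1

    def commit(self, changes):
        """Apply {path: new_text or None-to-delete} as one unit.

        The new tree is built on a copy, so if any change fails
        nothing is stored and no revision number is consumed.
        """
        tree = dict(self.revisions[-1])
        for path, text in changes.items():
            if text is None:
                del tree[path]         # raises KeyError for an unknown path
            else:
                tree[path] = text
        self.revisions.append(tree)
        return self.youngest

    def cat(self, path, revision):
        """Return the file as it appears in the given revision's tree."""
        return self.revisions[revision][path]


store = RevisionStore()
r1 = store.commit({"foo.c": "int main;", "bar.c": "v1"})
r2 = store.commit({"bar.c": "v2"})     # foo.c is untouched in revision 2

try:
    store.commit({"bar.c": "v3", "missing.c": None})
except KeyError:
    pass                               # the failed commit left no trace
```
]]></programlisting>
<para>
	Here <literal>foo.c</literal> is identical in revisions 1 and 2,
	which is what is meant by saying that two revisions of a file do
	not necessarily differ.
</para>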
<para>
	It's important to note that working copies do not always
	correspond to any single revision in the repository; they may
	contain files from several different revisions. For example,
	suppose you check out a working copy from a repository whose
	most recent revision is 4:
</para>
<screen>
calc/Makefile:4
     integer.c:4
     button.c:4
</screen>
<para>
	At the moment, this working directory corresponds exactly
	to revision 4 in the repository. However, suppose you make a
	change to <filename>button.c</filename>, and commit that
	change. Assuming no other commits have taken place, your
	commit will create revision 5 of the repository, and your
	working copy will now look like this:
</para>
<screen>
calc/Makefile:4
     integer.c:4
     button.c:5
</screen>
<para>
	Suppose that, at this point, Sally commits a change to
	<filename>integer.c</filename>, creating revision 6. If you
	use <command>svn update</command> to bring your working copy
	up to date, then it will look like this:
</para>
<screen>
calc/Makefile:6
     integer.c:6
     button.c:6
</screen>
<para>
	Sally's changes to <filename>integer.c</filename> will
	appear in your working copy, and your change will still be
	present in <filename>button.c</filename>. In this example,
	the text of <filename>Makefile</filename> is identical in
	revisions 4, 5, and 6, but Git will mark your working
	copy of <filename>Makefile</filename> with revision 6 to
	indicate that it is still current. So, after you do a clean
	update at the top of your working copy, it will generally
	correspond to exactly one revision in the repository.
</para>
</sect2>
<sect2 id="tsvn-basics-svn-wcrepository">
<title>How Working Copies Track the Repository</title>
<para>
	For each file in a working directory, Git records
	two essential pieces of information in the
	<filename>.svn/</filename> administrative area:</para>
<itemizedlist>
	<listitem>
		<para>
			what revision your working file is based on
			(this is called the file's <firstterm>working
			revision</firstterm>), and
		</para>
	</listitem>
	<listitem>
		<para>
			a timestamp recording when the local copy was
			last updated by the repository.
		</para>
	</listitem>
</itemizedlist>
<para>
	Given this information, by talking to the repository,
	Git can tell which of the following four states a
	working file is in:
</para>
<variablelist>
	<varlistentry>
		<term>Unchanged, and current</term>
		<listitem>
			<para>
				The file is unchanged in the working
				directory, and no changes to that file have been committed
				to the repository since its working revision. A
				<command>commit</command> of the file will do nothing,
				and an <command>update</command> of the file will do
				nothing.
			</para>
		</listitem>
	</varlistentry>
	<varlistentry>
		<term>Locally changed, and current</term>
		<listitem>
			<para>
				The file has been changed in the working
				directory, and no changes to that file have been committed
				to the repository since its working revision. There are local
				changes that have not been committed to the repository, thus
				a <command>commit</command> of the file will succeed in
				publishing your changes, and an <command>update</command>
				of the file will do nothing.
			</para>
		</listitem>
	</varlistentry>
	<varlistentry>
		<term>Unchanged, and out-of-date</term>
		<listitem>
			<para>
				The file has not been changed in the working
				directory, but it has been changed in the repository. The
				file should eventually be updated, to make it current with
				the public revision. A <command>commit</command> of the
				file will do nothing, and an <command>update</command> of
				the file will fold the latest changes into your working
				copy.
			</para>
		</listitem>
	</varlistentry>
	<varlistentry>
		<term>Locally changed, and out-of-date</term>
		<listitem>
			<para>
				The file has been changed both in the
				working directory, and in the repository. A <command>commit</command>
				of the file will fail with an <emphasis>out-of-date</emphasis>
				error. The file should be updated first; an <command>update</command>
				command will attempt to merge the public
				changes with the local changes. If Git can't
				complete the merge in a plausible way automatically, it
				leaves it to the user to resolve the conflict.
			</para>
		</listitem>
	</varlistentry>
</variablelist>
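<para>
	The four states amount to a small decision table over two yes/no
	questions: has the file been changed locally, and has it been
	changed in the repository since its working revision? The Python
	sketch below encodes that table. The function name
	<literal>describe</literal> and the wording of the returned
	outcomes are assumptions for illustration; how a real client
	answers the two questions is not shown.
</para>
<programlisting><![CDATA[
```python
def describe(locally_changed, out_of_date):
    """Return (state, what commit does, what update does) for one file."""
    if not locally_changed and not out_of_date:
        return ("Unchanged, and current",
                "does nothing", "does nothing")
    if locally_changed and not out_of_date:
        return ("Locally changed, and current",
                "publishes your changes", "does nothing")
    if not locally_changed and out_of_date:
        return ("Unchanged, and out-of-date",
                "does nothing", "folds in the latest changes")
    return ("Locally changed, and out-of-date",
            "fails with an out-of-date error", "attempts a merge")


# Enumerate all four combinations of the two questions.
table = {
    (changed, stale): describe(changed, stale)
    for changed in (False, True)
    for stale in (False, True)
}
```
]]></programlisting>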
</sect2>
</sect1>
<sect1 id="tsvn-basics-summary">
<title>Summary</title>
<para>
	We've covered a number of fundamental Git concepts in
	this chapter:
</para>
<itemizedlist>
	<listitem>
		<para>
			We've introduced the notions of the central repository,
			the client working copy, and the array of repository
			revision trees.</para>
	</listitem>
	<listitem>
		<para>
			We've seen some simple examples of how two collaborators
			can use Git to publish and receive changes from one
			another, using the <quote>copy-modify-merge</quote> model.
		</para>
	</listitem>
	<listitem>
		<para>
			We've talked a bit about the way Git tracks and
			manages information in a working copy.
		</para>
	</listitem>
</itemizedlist>
</sect1>
</chapter>
<!--
local variables:
sgml-parent-document: ("book.xml" "chapter")
end:
-->