Sverre Rabbelier [Thu, 26 Mar 2009 17:25:14 +0000 (26 18:25 +0100)]
gitstats: Licensed GitStats under the Apache License, 2.0
In order to facilitate other projects using and/or incorporating
GitStats, it is convenient to explicitly license it.
Sverre Rabbelier [Tue, 27 Jan 2009 22:35:09 +0000 (27 23:35 +0100)]
gitstats: Teach 'stats.py author -f' to sort output by commit count
The main usage of author -f is to find out who has been working on a
certain file, so it makes sense to sort the output.
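A minimal sketch of that sort, assuming the per-file activity has already been gathered into a dict mapping author name to commit count. `sort_by_commit_count` is a hypothetical helper, not the function in stats.py.

```python
def sort_by_commit_count(activity):
    """Return (author, commits) pairs, most active author first.

    Ties are broken alphabetically by author so the output is stable.
    """
    return sorted(activity.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for author, commits in sort_by_commit_count({"alice": 3, "bob": 7, "carol": 3}):
        print("%5d %s" % (commits, author))
```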
Sverre Rabbelier [Thu, 31 Jul 2008 04:35:38 +0000 (31 06:35 +0200)]
gitstats: Bugfix for the config parsing mechanism
When it was refactored to make testing easier, a typo slipped
in that left readDefaultConfigs broken.
Bug-introduced-in:
9126a29e13ca515cb63849f480965c86c88b5d5d
Sverre Rabbelier [Thu, 31 Jul 2008 04:34:15 +0000 (31 06:34 +0200)]
gitstats: Added some example values to gitstats-bug.txt
Also added an explanation of how the *_rating settings are
used, and what they should be set to.
Sverre Rabbelier [Thu, 31 Jul 2008 04:19:33 +0000 (31 06:19 +0200)]
gitstats: Ran ispell on all gitstats-* files in doc/
With this, all files in doc/ have been spellchecked; the
source files are probably too bothersome to spellcheck.
Sverre Rabbelier [Thu, 31 Jul 2008 04:19:33 +0000 (31 06:19 +0200)]
gitstats: Ran ispell on all non gitstats-* files in doc/
The same should be done for all the gitstats-* files in the
doc/ directory as well.
Sverre Rabbelier [Wed, 30 Jul 2008 21:50:52 +0000 (30 23:50 +0200)]
gitstats: Expanded the documentation of the bug module
After having to explain on the mailing list exactly what
the command does, it made sense to put that explanation in
the documentation.
Sverre Rabbelier [Tue, 29 Jul 2008 23:00:24 +0000 (30 01:00 +0200)]
gitstats: Renamed Memory->GitCache and Options->OptionList
The current names were somewhat questionable; these renames
give the classes names that better explain their purpose.
Sverre Rabbelier [Mon, 28 Jul 2008 12:49:28 +0000 (28 14:49 +0200)]
gitstats: Add a description about the bug module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Tue, 29 Jul 2008 12:36:22 +0000 (29 14:36 +0200)]
gitstats: Renamed the GitStats documentation to include a gitstats- prefix
To make it easier to see which files are the actual
documentation, prefix those with gitstats-.
Sverre Rabbelier [Mon, 28 Jul 2008 12:49:28 +0000 (28 14:49 +0200)]
gitstats: Add a description about the tests module
This gives a very brief description, not really giving a
lot of information. It should probably be extended later
to give some more information.
Sverre Rabbelier [Mon, 28 Jul 2008 12:49:28 +0000 (28 14:49 +0200)]
gitstats: Add a description about the matcher module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Mon, 28 Jul 2008 12:49:28 +0000 (28 14:49 +0200)]
gitstats: Add a description about the index module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Mon, 28 Jul 2008 12:49:28 +0000 (28 14:49 +0200)]
gitstats: Add a description about the diff module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Sat, 26 Jul 2008 16:52:08 +0000 (26 18:52 +0200)]
gitstats: Add a general note about 'stats.py'
This briefly describes how the stats.py command works and
what is required to run it.
Sverre Rabbelier [Mon, 28 Jul 2008 12:49:28 +0000 (28 14:49 +0200)]
gitstats: Add a description about the commit module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Sun, 27 Jul 2008 18:45:56 +0000 (27 20:45 +0200)]
gitstats: Add a description about the author module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Sun, 27 Jul 2008 18:26:17 +0000 (27 20:26 +0200)]
gitstats: Add a description about the branch module
This briefly lists the most useful functions it exports;
it should later be made to include a more detailed
description of what metrics the user should expect this
subcommand to contain.
Sverre Rabbelier [Sat, 26 Jul 2008 16:29:57 +0000 (26 18:29 +0200)]
gitstats: Convert matcher.py to use optparse
Since we wrap matcher.py with stats.py, the matcher.py
module should provide a usage message when passed '--help';
as such, it makes sense to use optparse.
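A sketch of what an optparse-based entry point could look like. The `-d/--debug` option and the `parse_args` wrapper are hypothetical stand-ins for matcher.py's real options; optparse builds the '--help' text from the `usage` and `help=` strings. (In current Python, argparse is the usual replacement for optparse, which is deprecated but still shipped.)

```python
from optparse import OptionParser


def parse_args(argv):
    """Parse matcher-style arguments; '--help' output is generated by optparse."""
    parser = OptionParser(usage="%prog [options] <commit>")
    parser.add_option("-d", "--debug", action="store_true", dest="debug",
                      default=False, help="print debug information")
    # Returns an (options, positional_args) pair.
    return parser.parse_args(argv)


if __name__ == "__main__":
    import sys
    options, args = parse_args(sys.argv[1:])
    print(options.debug, args)
```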
Sverre Rabbelier [Sat, 26 Jul 2008 10:45:04 +0000 (26 12:45 +0200)]
gitstats: Added a README describing GitStats purpose
The README of a project usually contains information on the
project itself, for example how it was made, what its
purpose is, and what its layout is. Follow this custom by
adding one for GitStats.
Sverre Rabbelier [Sat, 26 Jul 2008 10:43:01 +0000 (26 12:43 +0200)]
gitstats: Expanded the INSTALL file to include usage information
Previously it only contained information on the external
requirements of GitStats, now it also explains how to set
it up and then use it.
Sverre Rabbelier [Sat, 26 Jul 2008 10:27:31 +0000 (26 12:27 +0200)]
gitstats: Renamed README to INSTALL
On Unix, the README file usually contains information on
the program itself, whereas the INSTALL file is used to
provide the user with information on how to install the
program (makes sense, doesn't it). The current README file
is more of an INSTALL file anyway, so rename it to make
room for a proper README.
Sverre Rabbelier [Thu, 24 Jul 2008 23:31:25 +0000 (25 01:31 +0200)]
gitstats: Don't require stats.py to be in $PATH when running the regression tests
Instead, assume we run from src/t and that stats.py is in
the src/ directory. This way we can run the regression
tests without requiring that the user changes their PATH.
Sverre Rabbelier [Thu, 24 Jul 2008 22:07:44 +0000 (25 00:07 +0200)]
gitstats: Wrote unittests for the config module and hooked them up
It is now possible to run the tests for the config module
by either "stats.py test -a", "stats.py test -s config",
"stats.py test -m config_test", or by specifying it as one
of the modules to be tested with "stats.py test -t".
Sverre Rabbelier [Thu, 24 Jul 2008 19:06:23 +0000 (24 21:06 +0200)]
gitstats: Refactor config.py to make it more testable
Split up the 'readAll' method into a 'default config paths'
gathering part, and a 'multiple paths reading' part.
Sverre Rabbelier [Thu, 24 Jul 2008 16:52:30 +0000 (24 18:52 +0200)]
gitstats: Do not require lines to end with a '\n' in the config parser
There is no reason to require this, and it makes testing
easier if it is not required.
Sverre Rabbelier [Thu, 24 Jul 2008 16:39:00 +0000 (24 18:39 +0200)]
gitstats: Hooked up the test suite in stats.py
This way it is possible to run the unit tests using the
stats.py command itself.
Sverre Rabbelier [Thu, 24 Jul 2008 16:36:41 +0000 (24 18:36 +0200)]
gitstats: Added a module to dispatch unit-testing commands
This should be the last part needed for a fully working
unit-testing framework around GitStats that shows its
output in a format like the git regression test suite.
Sverre Rabbelier [Thu, 24 Jul 2008 15:58:09 +0000 (24 17:58 +0200)]
gitstats: Improved testing.py output and removed manual parsing of arguments
Instead of parsing manually, leave it to the calling
function to provide us with an option object. Also, the
output when a test failed was sub-optimal; we should show
the error message even when --verbose is not set.
Sverre Rabbelier [Thu, 24 Jul 2008 12:24:18 +0000 (24 14:24 +0200)]
gitstats: Made config.read take a bunch of strings instead of a path
This makes it easier to test, and makes it possible to
pass input other than just a file to the method.
Sverre Rabbelier [Wed, 23 Jul 2008 16:19:14 +0000 (23 18:19 +0200)]
gitstats: Use dashed_form for variable names instead of camelCase
The overall style is to use dashed_form for variable names;
this fixes up the exceptions.
Sverre Rabbelier [Wed, 23 Jul 2008 15:43:36 +0000 (23 17:43 +0200)]
gitstats: Replace check_file with checkFile
To be consistent with the naming conventions everywhere
the name should be camelCase and not in dashed_form.
Sverre Rabbelier [Wed, 23 Jul 2008 15:42:41 +0000 (23 17:42 +0200)]
gitstats: Remove unneeded executable bit on setupRepo.py
The testcases now run it through the python interpreter
explicitly, so we don't require it to be executable
anymore.
Sverre Rabbelier [Wed, 23 Jul 2008 15:41:36 +0000 (23 17:41 +0200)]
gitstats: Don't assume setupRepo.py is executable
Instead run it explicitly through the python interpreter
which is more portable anyway.
Sverre Rabbelier [Sun, 20 Jul 2008 21:45:38 +0000 (20 23:45 +0200)]
gitstats: Replace backslashes with parens to do line continuation
In python one can use a backslash at the end of a line to
do line continuations. A better alternative, however, is to
surround the long lines in parens and let python's implicit
line joining inside brackets do its magic.
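An illustration of the two continuation styles; the values are arbitrary. It also shows that parentheses alone do not create a tuple, only a trailing comma does.

```python
# Backslash continuation is fragile: any whitespace after the backslash is a SyntaxError.
total_backslash = 1 + \
    2

# Inside parentheses Python joins the lines implicitly, no backslash needed.
total_parens = (1 +
                2)

# Adjacent string literals inside parens are concatenated at compile time.
message = ("a long message that is "
           "split across two lines")

# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = (3)
one_tuple = (3,)
```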
Sverre Rabbelier [Sun, 20 Jul 2008 21:33:28 +0000 (20 23:33 +0200)]
gitstats: Use format specifiers instead of appending to a string
Instead of using '"foo" + str(myvar) + "bar"' it reads
a lot more naturally to use '"foo %s bar" % myvar', plus it
prevents whitespace 'bugs'.
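A small example of the whitespace 'bug' that concatenation invites, next to the format-specifier version; the variable names and values are made up.

```python
count = 3
author = "alice"

# Concatenation: every value needs str() and it is easy to drop a space.
concatenated = "found " + str(count) + "commits"

# Format specifier: the spacing is visible in a single literal.
formatted = "found %s commits" % count

# Several values are passed as a tuple.
summary = "%s made %d commits" % (author, count)
```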
Sverre Rabbelier [Sun, 20 Jul 2008 20:37:02 +0000 (20 22:37 +0200)]
gitstats: Make use of getattr instead of just trying
The idea was to check if an option was set, but if it was
not set, we would die because 'not foo.bar' raises an
AttributeError if foo doesn't have a bar attribute. Instead, we want to
use 'not getattr(foo, "bar", None)'.
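A sketch of the difference, using an optparse `Values` object that has a `debug` attribute but no `branch_filter`; both option names are hypothetical.

```python
from optparse import Values

# An options object with 'debug' set but no 'branch_filter' attribute at all.
options = Values({"debug": True})

try:
    options.branch_filter
    missing = False
except AttributeError:
    # This is the failure that 'not foo.bar' runs into.
    missing = True

# getattr with a default never raises; an absent option reads as None.
filter_set = bool(getattr(options, "branch_filter", None))
debug_set = bool(getattr(options, "debug", None))
```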
Sverre Rabbelier [Sun, 20 Jul 2008 19:31:11 +0000 (20 21:31 +0200)]
gitstats: Bugfix for diff.py, don't die on empty diffs
With the rewrite of the _splitFileDiff routine the default
was changed. The original behavior, when a diff with no
hunks was passed, was to 'eat' the entire content and return
an empty diff. This commit restores that behavior by setting
the size initially to the entire content.
Bug-introduced-in:
00a3c465c01e335317857e47f02fdeb93c79a052
Sverre Rabbelier [Sun, 20 Jul 2008 19:29:15 +0000 (20 21:29 +0200)]
gitstats: Renamed fileDiff to FileDiff
Class names consistently start with a capital letter and
are camelCased, with the exception of this class. This
commit fixes that.
Sverre Rabbelier [Sun, 20 Jul 2008 14:59:14 +0000 (20 16:59 +0200)]
gitstats: Use 'key in dict' instead of 'dict.has_key(key)'
It is both more Pythonic and more readable, enough reason
to warrant this 'style fix'.
Sverre Rabbelier [Fri, 18 Jul 2008 20:33:52 +0000 (18 22:33 +0200)]
gitstats: Added a unit-test framework for GitStats
This custom implementation tries to have its output look
similar to the output of git's regression test suite.
Sverre Rabbelier [Thu, 17 Jul 2008 20:02:42 +0000 (17 22:02 +0200)]
gitstats: Make use of the 'sorted' built-in
This increases readability: instead of first extracting the
keys of the dictionary, sorting them, and then iterating over
those, iterate over sorted(dict).
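A before/after comparison of the two idioms on a made-up dictionary of per-file counts; both produce the same ordered pairs.

```python
counts = {"b.py": 2, "a.py": 5, "c.py": 1}

# Before: extract the keys, sort them, then iterate over the sorted list.
keys = list(counts.keys())
keys.sort()
before = [(name, counts[name]) for name in keys]

# After: sorted() accepts the dict directly and returns its keys in order.
after = [(name, counts[name]) for name in sorted(counts)]
```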
Sverre Rabbelier [Thu, 17 Jul 2008 19:48:57 +0000 (17 21:48 +0200)]
gitstats: Don't assume the temp path is /tmp
Instead, use the value returned by the script when running
it with the 'path' argument by passing it that value as a
third argument on a second run.
Sverre Rabbelier [Thu, 17 Jul 2008 19:46:55 +0000 (17 21:46 +0200)]
gitstats: Added a way to specify the path to use instead of the default
Now it is possible to specify the path to use as a third
argument to setupRepo.py.
Sverre Rabbelier [Thu, 17 Jul 2008 19:04:59 +0000 (17 21:04 +0200)]
gitstats: Added an option to print the path for setupRepo.py
In order to locate the created repository a second argument
is now understood. When running 'setupRepo.py test path'
the path of the test repo is printed. Operation is similar
with 'metrics'.
Sverre Rabbelier [Thu, 17 Jul 2008 18:29:46 +0000 (17 20:29 +0200)]
gitstats: Removed the redundant setupRepo.sh script
The setupRepo.sh script is not used anymore and as such
may be removed. Anyone interested in it can find it in the
repo history.
Sverre Rabbelier [Thu, 17 Jul 2008 17:19:47 +0000 (17 19:19 +0200)]
gitstats: Official 0.1.4 release of git-python
Updated the readme and added the location of git-python on
PyPI for ease of download.
Sverre Rabbelier [Thu, 17 Jul 2008 17:17:04 +0000 (17 19:17 +0200)]
gitstats: Updated the tests to match the new output of 'author -a'
Made the tests match the new output of 'author -a' now that
it also lists a column with the 'net changes'.
Sverre Rabbelier [Thu, 17 Jul 2008 00:17:54 +0000 (17 02:17 +0200)]
gitstats: Added a 'net loc' to 'author -a'
This adds a column with the net changes to the result of
'author -a'.
Sverre Rabbelier [Wed, 16 Jul 2008 23:58:02 +0000 (17 01:58 +0200)]
gitstats: Updated the README to include specific installation instructions
Instruct the user to use tag 0.1.4-pre, which has been
tested to work with GitStats.
Sverre Rabbelier [Wed, 16 Jul 2008 19:53:36 +0000 (16 21:53 +0200)]
gitstats: Refactoring, cleanups and documentation
Checked all files and added documentation where needed.
Also added comments where appropriate as well as some
minor cleanups and refactoring.
Sverre Rabbelier [Tue, 15 Jul 2008 16:00:49 +0000 (15 18:00 +0200)]
gitstats: Introduce a bugfixRating to bug.py
Each commit that has one of the 'bug' properties now has a
bugfixRating. This rating is done by adding the values of
all the applicable <property>_rating settings.
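A sketch of that summation. The property names in the usage are made up, and treating a property with no configured `<property>_rating` as contributing 0 is an assumption rather than documented behavior.

```python
def bugfix_rating(properties, config):
    """Add up the '<property>_rating' setting of every property a commit has.

    properties: names of the 'bug' properties that apply to the commit.
    config: dict of settings, e.g. as returned by the config parser.
    A property without a '<property>_rating' entry contributes 0 (assumption).
    """
    return sum(config.get("%s_rating" % prop, 0) for prop in properties)


if __name__ == "__main__":
    settings = {"revert_rating": 3, "message_rating": 1}  # hypothetical keys
    print(bugfix_rating(["revert", "message"], settings))
```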
Sverre Rabbelier [Tue, 15 Jul 2008 15:58:59 +0000 (15 17:58 +0200)]
gitstats: Teach config.py to read multiple verses
When there are multiple GitStats verses in one file we
should read them all as one. This commit does so.
Sverre Rabbelier [Tue, 15 Jul 2008 15:10:37 +0000 (15 17:10 +0200)]
gitstats: Make config.py more versatile
There is now a readAll function that tries to read from the
common places. Support for integer values has been added,
as well as converting CamelCase keys to dashed_form.
This also makes the parser properly stop at both unindented
and empty lines.
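A sketch of the CamelCase to dashed_form key conversion using a regular expression. `to_dashed_form` is a hypothetical name and the regex is one possible approach, not necessarily the one in config.py; runs of capitals such as 'HTTPProxy' are not split by this version.

```python
import re


def to_dashed_form(key):
    """Convert a CamelCase or camelCase key to dashed_form (lowercase with '_')."""
    # Put '_' in front of each capital that follows a lowercase letter or digit.
    with_underscores = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key)
    return with_underscores.lower()
```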
Sverre Rabbelier [Tue, 15 Jul 2008 14:05:31 +0000 (15 16:05 +0200)]
gitstats: Refactored bug.py to use a Memory and Options class
The number of options passed to determineType was getting
out of hand; instead of passing around individual arguments
we now pass a newly created Memory and a newly created
Options object. This also factors out some common code into
the Memory constructor.
Sverre Rabbelier [Tue, 15 Jul 2008 14:03:45 +0000 (15 16:03 +0200)]
gitstats: Provide a default empty dict to pretty names
The dictionary is not required for normal operation and as
such should not be mandatory.
Sverre Rabbelier [Tue, 15 Jul 2008 00:14:41 +0000 (15 02:14 +0200)]
gitstats: Make use of the ignore_parents option to belongsTo
We now automatically turn it on when there are more than
1000 commits in the rev-list of the starting point. There
is also a command-line option and a config option for this.
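A sketch of how an explicit setting and the automatic 1000-commit threshold might combine. Letting an explicit True/False override the heuristic, using None for 'not specified', and the helper name are all assumptions.

```python
AUTO_THRESHOLD = 1000  # commits in the rev-list of the starting point


def resolve_ignore_parents(setting, rev_list_length):
    """Decide whether belongsTo should ignore parents.

    setting: True/False from the command line or config, or None if unset.
    rev_list_length: number of commits in the starting point's rev-list.
    """
    if setting is not None:
        # An explicit choice wins over the heuristic (assumption).
        return setting
    return rev_list_length > AUTO_THRESHOLD
```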
Sverre Rabbelier [Tue, 15 Jul 2008 00:06:04 +0000 (15 02:06 +0200)]
gitstats: Bugfix for branchContains Fix the branch
When re-using the code to get all branch names and then
storing it as branchname/hash, the %(objectname) was not
changed to %(refname); as a result the stored 'name' for
the hash was the hash itself.
This fixes that and also introduces a way to specify
multiple branches as branch-filter. In the process it fixes
the bug caused by passing a string to set() where a list
with just one string should have been passed.
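A short illustration of the set() pitfall described above; 'master' is just an example branch name.

```python
branch = "master"

# set() iterates over its argument, so a bare string yields its characters.
wrong = set(branch)

# Wrapping the string in a list gives a set holding the one branch name.
right = set([branch])
```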
Bug-introduced-in:
af2fff8d9d6a7973a6b86ddac26188683a78a51a
Sverre Rabbelier [Mon, 14 Jul 2008 22:43:06 +0000 (15 00:43 +0200)]
gitstats: General cleanups in bug.py
Main changes are renaming variables with camelCase names
to dashed_form names and refactoring extractKwargs to use
a list of options to be extracted.
Updated the sample config file to match the rename changes.
Sverre Rabbelier [Mon, 14 Jul 2008 22:22:51 +0000 (15 00:22 +0200)]
gitstats: Allow specifying True, False, or None on the command line
This teaches parse.py to translate True, False, and None
into the actual Python objects for the new type 'bool'.
Sverre Rabbelier [Mon, 14 Jul 2008 22:10:46 +0000 (15 00:10 +0200)]
gitstats: Allow specifying True, False, or None in the config file
This teaches config.py to translate True, False, and None
into the actual Python objects.
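A sketch of that translation as a standalone helper; `convert_value` is hypothetical. It also parses integers, which the 'Make config.py more versatile' commit in this log mentions, and leaves every other value as a string.

```python
def convert_value(raw):
    """Translate a raw config value into a Python object.

    'True', 'False' and 'None' become the corresponding objects, anything
    int() accepts becomes an int, and everything else stays a string.
    """
    value = raw.strip()
    special = {"True": True, "False": False, "None": None}
    if value in special:
        return special[value]
    try:
        return int(value)
    except ValueError:
        return value
```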
Sverre Rabbelier [Mon, 14 Jul 2008 17:11:28 +0000 (14 19:11 +0200)]
gitstats: Added an option to limit the number of commits checked
When gathering bug information on commits it is useful on
large repositories to gather stats on only the most recent
commits instead of all commits. Defaults to 10, which is a
small value, but makes 'bug -a' usable on git.git.
Sverre Rabbelier [Mon, 14 Jul 2008 16:44:01 +0000 (14 18:44 +0200)]
gitstats: Add an option to ignore parents in the 'belongs to' metric
For repositories with long history it is faster to first
call 'git rev-list' for the target commit and then ignore
those commits. However, for smaller repositories the
overhead of all these rev-lists makes it slower.
Sverre Rabbelier [Sat, 12 Jul 2008 22:26:37 +0000 (13 00:26 +0200)]
gitstats: Added a proof of concept matcher
It tries to match hunks that were added with hunks that
were removed in an intelligent manner by taking the diff
and comparing the sizes.
Sverre Rabbelier [Sat, 12 Jul 2008 22:23:48 +0000 (13 00:23 +0200)]
gitstats: Added an option to disable line numbering in diff parsing
This makes it easier to reuse the parsed diff for other
purposes than visual presentation. Also restored the use of
the context variable, in which the unparsed context line
is stored for usage by others.
Sverre Rabbelier [Sat, 12 Jul 2008 18:50:09 +0000 (12 20:50 +0200)]
gitstats: Print authors sorted in 'author -e'
Per request, sort the authors before listing their activity
so that it is easier to browse for a specific author.
Sverre Rabbelier [Sat, 12 Jul 2008 18:11:14 +0000 (12 20:11 +0200)]
gitstats: Two more memories were added and some refactoring
Both a memory for files and the commits that touched them and
one for raw diffs were added. Together with some refactoring
this speeds up the entire bug detection process so much that
it is now actually feasible to run it on a regular repository.
Sverre Rabbelier [Sat, 12 Jul 2008 16:04:46 +0000 (12 18:04 +0200)]
gitstats: Add a 'commits that touch' memory to commit.py
This saves a lot of calls to the git binary since often
there are only relatively few files and relatively many
commits.
Sverre Rabbelier [Sat, 12 Jul 2008 13:33:19 +0000 (12 15:33 +0200)]
gitstats: Don't look before we leap in getting the commit diff
It's ok if that means that occasionally we hit a root
commit and have to retry with --root instead. Since most
commits -do- have a parent this gives a speed improvement
of almost 1.5x!
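A sketch of the 'leap first' approach. The exact git commands GitStats used are not stated here; this assumes `git diff <commit>^ <commit>` for the optimistic path and `git diff-tree -p --root <commit>` as the root-commit fallback. `run_git`, `GitError` and `get_commit_diff` are hypothetical names, and the `git` parameter exists only so a fake runner can be injected for testing. Requires Python 3.7+ for `subprocess.run(capture_output=..., text=...)`.

```python
import subprocess


class GitError(Exception):
    """Raised when a git command exits with a non-zero status."""


def run_git(*args):
    """Run 'git <args>' and return its standard output as text."""
    proc = subprocess.run(["git"] + list(args), capture_output=True, text=True)
    if proc.returncode != 0:
        raise GitError(proc.stderr.strip())
    return proc.stdout


def get_commit_diff(commit, git=run_git):
    """Return the patch introduced by commit, trying the common case first."""
    try:
        # Most commits have a parent, so diff against it without checking first.
        return git("diff", "%s^" % commit, commit)
    except GitError:
        # '<commit>^' does not exist for a root commit; --root makes
        # diff-tree show the root commit's whole content as added.
        return git("diff-tree", "-p", "--root", commit)
```

Any other failure (a bad hash, say) also falls through to the second call, which then raises `GitError` itself.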
Sverre Rabbelier [Sat, 12 Jul 2008 12:55:40 +0000 (12 14:55 +0200)]
gitstats: Add a diff memory to diff.py and use it in bug.py
This way, when we try to find reverts for many commits we
don't do so much double work. This results in a speedup
of about 15 times.
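A minimal memoization sketch of such a memory, assuming diffs are looked up by commit hash. `DiffMemory` is a hypothetical name; in modern Python, `functools.lru_cache` gives the same behavior for a plain function.

```python
class DiffMemory(object):
    """Cache diffs by commit hash so each one is computed only once."""

    def __init__(self, compute):
        self._compute = compute  # function: commit hash -> diff
        self._cache = {}

    def get(self, commit):
        if commit not in self._cache:
            self._cache[commit] = self._compute(commit)
        return self._cache[commit]
```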
Sverre Rabbelier [Fri, 11 Jul 2008 20:27:28 +0000 (11 22:27 +0200)]
gitstats: Check for empty diffs
When a diff introduces no changes we need to check this
explicitly or the rest of the code will die because it
is missing data. This adds checks for that.
Also, when finding reverts, skip the commit we are finding
reverts for.
Sverre Rabbelier [Fri, 11 Jul 2008 20:25:05 +0000 (11 22:25 +0200)]
gitstats: Allow for checking deleted files in commitsThatTouched
By adding the '--' infix it is possible to specify files
that are not present in the working directory.
Sverre Rabbelier [Fri, 11 Jul 2008 15:45:58 +0000 (11 17:45 +0200)]
gitstats: Refactored stats.py to use a main function
This makes it easier to call stats.py's main through exec
and as such to profile it with programs such as cProfile.
Sverre Rabbelier [Fri, 11 Jul 2008 13:07:25 +0000 (11 15:07 +0200)]
gitstats: Refactored bug.py to allow for a trivial aggregation function
By moving some logic into a separate function (namely the
part that figures out what options we want) it is now easy
to loop over all commits and determine their type.
This is very slow though; some optimization will have to be
done to make it reasonably fast.
Sverre Rabbelier [Fri, 11 Jul 2008 13:02:12 +0000 (11 15:02 +0200)]
gitstats: Restructured the commit and diff module so that diff depends on commit
After finding a buggy 'getCommitDiff' in diff.py, while we
already have a very well functioning one in commit.py, it
made sense to make diff.py depend on commit.py. Previously
commit.py used one method from diff.py (namely findReverts),
which meant that diff.py could not depend on commit.py (since that
would create a circular dependency). To fix this the
findReverts function was moved to diff.py (together with
'commit -v' => 'diff -r').
Sverre Rabbelier [Thu, 10 Jul 2008 11:55:14 +0000 (10 13:55 +0200)]
gitstats: Moved the commit information into a separate class
Instead of returning a string with info in determineType
we now return an object that has a __str__ method defined
to do the same thing.
Sverre Rabbelier [Thu, 10 Jul 2008 11:33:26 +0000 (10 13:33 +0200)]
gitstats: Don't die when an option is not specified at all
In bug.py we died when an option (such as the diff filter)
was not specified at all (i.e., neither in the config file
nor on the command line). Instead we should just ignore
it.
Sverre Rabbelier [Thu, 10 Jul 2008 11:31:19 +0000 (10 13:31 +0200)]
gitstats: When the specified path was not found, return an empty dict
Instead of dying on open() in config.py when the specified
path is not a file (or does not exist), just return an
empty dict.
Sverre Rabbelier [Thu, 10 Jul 2008 11:20:20 +0000 (10 13:20 +0200)]
gitstats: Made use of the new config parser in bug.py
With the new config parser it is now possible to just run
'stats.py bug -t HEAD' and get a report on that commit.
Sverre Rabbelier [Thu, 10 Jul 2008 11:14:56 +0000 (10 13:14 +0200)]
gitstats: Bugfix to _parseFileDiff so that it handles diffs with mode changes only
Before it would die on patches that contain only mode
changes, this commit fixes that.
Bug-introduced-in:
bc9d1040f323a0b09674f0a1c6ec7048e31580ab
Sverre Rabbelier [Thu, 10 Jul 2008 11:12:30 +0000 (10 13:12 +0200)]
gitstats: Add a config parser and an example configuration
A simple configuration parser that should be able to parse
the default git config format without too much trouble.
Included a sample configuration file with some sane default
values.
Sverre Rabbelier [Wed, 9 Jul 2008 12:40:41 +0000 (9 14:40 +0200)]
gitstats: Added aggregation and showing all activity
Previously one could only see activity for one specific
developer with the 'author -d' switch. Now one can see the
activity of all developers with 'author -e'.
The total activity can now be seen with 'author -a', which
was previously unimplemented.
In order to get this to work the activityInArea function
was modified to remember the activity not only per file,
but also per author.
Sverre Rabbelier [Tue, 8 Jul 2008 13:02:47 +0000 (8 15:02 +0200)]
gitstats: Improved the output of 'branch -b' with '-d'
The main difference is that the intermediate results (i.e.
the per-branch dilution) are now also shown, instead of only
showing the metrics of the best branch. As a bonus the
branch names are now printed using prettyName, instead of
printing their hashes.
Sverre Rabbelier [Sun, 6 Jul 2008 22:43:41 +0000 (7 00:43 +0200)]
gitstats: Optimization in the 'belongs to' metric
Retrieve the parentage information in 'one big go' instead
of on a 'per branch' basis. This is now possible because we
no longer need the individual rev-lists since the
previous commit. This has two advantages: the rev-lists
mechanism cuts down the amount of parentage-information we
need to parse by not listing doubles (for example, we
always get the parentage information of the first commit
multiple times when there are multiple branches that branch
off from it). A second advantage is that this saves us a
lot of calls to the git binary (1 call, instead of one for
each branch).
Sverre Rabbelier [Sun, 6 Jul 2008 22:40:44 +0000 (7 00:40 +0200)]
gitstats: Don't filter out subsets in the belongs to metric
Not only is it expensive to calculate this, it is also
possible that it is not desired. Most likely too much
output is preferable to too little output.
Sverre Rabbelier [Sun, 6 Jul 2008 22:12:52 +0000 (7 00:12 +0200)]
gitstats: Optimization to the 'belongs to' metric
Introduce a global 'memory' that allows for early
elimination of branches that will not end up being the best
fit anyway. By remembering if a commit was already seen in
other branches with a lower dilution, we can stop
traversing the current path. We cannot replace the normal
memory with this global memory however, since that would
prevent us from finding all branches when a commit belongs
most to multiple branches.
Sverre Rabbelier [Sun, 6 Jul 2008 21:53:56 +0000 (6 23:53 +0200)]
gitstats: Bugfix to only ignore if dilution was 'worse'
In the case that a commit has already been seen, it should
also be checked whether the dilution it was seen with was
lower. If this is not checked, the result is that when a
'bad' path is walked first, later on when trying the
'better' path, it will be aborted due to the fact that this
commit was already examined. A simple solution is to store
by which dilution the commit was seen in a dictionary.
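A sketch of that check, assuming `seen` is the per-walk dictionary mapping a commit to the lowest dilution it has been reached with. The helper name is hypothetical, and treating an equal dilution as 'already covered' is an assumption.

```python
def should_skip(seen, commit, dilution):
    """Return True if commit was already reached with an equal or lower dilution.

    Otherwise record the new, better dilution and return False so the caller
    keeps walking this path.
    """
    best = seen.get(commit)
    if best is not None and best <= dilution:
        return True
    seen[commit] = dilution
    return False
```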
This also fixes the t8101-metrics test, the order of the
output had changed, but the test wasn't updated to reflect
this change.
Bug-introduced-in:
556fe5b883c97ad5ea8143d987d3799673e17001
Sverre Rabbelier [Sun, 6 Jul 2008 21:19:43 +0000 (6 23:19 +0200)]
gitstats: Optimizations to the 'belongs to' metric
On the git.git repository there are a lot of 'strange' tree
structures, in which the same commits are often referred to
by multiple different other commits. As a result, if we do
not take measures against this, a lot of the same history
is checked multiple times. Especially in the git.git repo
this means the algorithm takes an insanely long time to
finish. This commit introduces a 'memory' that
stores which commits have already been checked, and as such
should not be checked again.
To help debugging the information printed when the '-d'
flag is specified has been increased, the '-d' flag now
doubles as a '--verbose' flag. Not only debug information
is printed, but also information on which branch is
currently being examined.
Sverre Rabbelier [Sun, 6 Jul 2008 19:43:30 +0000 (6 21:43 +0200)]
gitstats: Disable debug output, print minimum dilution
It may be useful to know that the minimum dilution is not
zero, as this means that such a commit was merged in from
a branch that is not a branch head.
Sverre Rabbelier [Fri, 4 Jul 2008 20:58:19 +0000 (4 22:58 +0200)]
gitstats: Added a simple 'stats.py bug' command
Currently it examines a specific commit using the
appropriate metrics. With the use of flags it is possible
to specify how to use these sub-metrics, and what arguments
to run them with.
Sverre Rabbelier [Fri, 4 Jul 2008 20:54:08 +0000 (4 22:54 +0200)]
gitstats: Use a stack-based approach instead of a recursive algorithm
Because python has a rather low recursion limit, we run
into this limit when running the 'belongs to' metric on a
non-trivial repo. This commit removes the recursion and
replaces it by a stack-based algorithm.
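A sketch of a stack-based ancestry walk, assuming the parentage is available as a dict from commit to its list of parents. `walk_ancestors` is hypothetical, and its visited set corresponds to the 'memory' described in the later optimization commits rather than to this commit.

```python
def walk_ancestors(parents, start):
    """Yield start and all of its ancestors without recursion.

    An explicit stack replaces the call stack, so deep histories cannot hit
    Python's recursion limit.
    """
    stack = [start]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        yield commit
        stack.extend(parents.get(commit, []))
```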
Sverre Rabbelier [Fri, 4 Jul 2008 19:40:12 +0000 (4 21:40 +0200)]
gitstats: Print the usage help sorted and added some documentation
Instead of printing the command list in semi-random order,
sort it first and then print the sorted list in order.
Sverre Rabbelier [Fri, 4 Jul 2008 19:36:49 +0000 (4 21:36 +0200)]
gitstats: When parsing the parent listing take into account parentless commits
In the case that a commit doesn't have a parent we should
record this in the dictionary as a key with no value.
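A sketch of such a parser, assuming one line per commit in the form '<hash> <parent> <parent>...', as produced by e.g. `git log --pretty=format:'%H %P'`. `parse_parents` is a hypothetical name; "no value" is represented here as an empty list.

```python
def parse_parents(output):
    """Map each commit hash to its list of parent hashes.

    Root commits have no parents and map to an empty list instead of
    being dropped from the dictionary.
    """
    parents = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        parents[fields[0]] = fields[1:]
    return parents
```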
Sverre Rabbelier [Mon, 30 Jun 2008 12:00:03 +0000 (30 14:00 +0200)]
gitstats: Added a switch that lists only reverts of a specified commit
Now it is possible to find all commits reverted by a
specific commit with 'commit -v'. It is possible that
multiple commits are returned, or even future commits.
This should be changed so that only past commits, and only
the most recent one is returned.
Sverre Rabbelier [Mon, 30 Jun 2008 11:30:41 +0000 (30 13:30 +0200)]
gitstats: Minor cleanups, added documentation
Removed a few redundant newlines, made the 'commit -t'
option take a value instead of it being a boolean switch.
This way we do not have to enforce only one file being
specified. Also, added a switch to commitdiffEqual to make
it less verbose. A simple 'isRevert' method makes use of
this to provide a convenient wrapper.
Sverre Rabbelier [Sun, 29 Jun 2008 21:59:20 +0000 (29 23:59 +0200)]
gitstats: Added a "author -f" aggregation, and a test case.
With this switch one can see who is active in the specified
file. This activity is currently measured in number of
commits. By making use of 'author -d' it could be made to
use the 'lines of change' instead.
Sverre Rabbelier [Sun, 29 Jun 2008 21:50:44 +0000 (29 23:50 +0200)]
gitstats: Moved isUnique into parse.py so that other modules may use it
Since commit.py is not really the place where isUnique
belongs, moving it to parse.py makes it more sensible for
other modules to use it. Renamed the function to isUnique
(from _isUnique) in the move.
Sverre Rabbelier [Sun, 29 Jun 2008 00:14:00 +0000 (29 02:14 +0200)]
gitstats: First gather all parentage information, then run metrics
With this patch the test suite runs 3 times faster. It is
probably even more than 3, since the 'setup time' (running
the generation script) is constant.
Sverre Rabbelier [Sun, 29 Jun 2008 00:03:48 +0000 (29 02:03 +0200)]
gitstats: Added tests for the 'belongs to' metric
This will help when improving the metric's speed later on.
Instead of having to test manually, this test suite will
not only provide for a quick way to see if everything still
works, it also allows for a controlled test case. This way
an easy 'before/after' comparison can be made by running
the test with the improvements, and without.
Sverre Rabbelier [Sat, 28 Jun 2008 23:40:31 +0000 (29 01:40 +0200)]
gitstats: Made the 'belongs to' metric recursive
Note: next time, try the recursive approach first; it works
like a charm! The big downside here is that it comes at the
cost of having to manually walk the entire tree, which is
very expensive; especially on Windows this will be
impossibly slow for large repositories. (This is mainly due
to the high number of calls to 'git log --pretty=format:%P',
each of which requires a fork.) An optimization would be
to first get all the parents, and then query a hashtable.