Implement a commit filtering API for RevWalk
commit 80e4d15045c6fe69da4c53e56d47e36ff0849cf9
author Shawn O. Pearce <spearce@spearce.org>
Sun, 9 Mar 2008 22:22:19 +0000 (9 18:22 -0400)
committer Shawn O. Pearce <spearce@spearce.org>
Mon, 7 Apr 2008 03:40:45 +0000 (6 23:40 -0400)
tree 78b92ae080519df0ae4d28a558e7eb89d178e06e
parent 10bcc200fefb6496738df3bd06443972eaf42caa

C Git supports options on its log family of commands that reduce the
result set to only those commits matching one or more filters.
Commonly used filters are --no-merges (skip all merge commits),
--author and --committer (search within the author or committer
lines), and --grep (search within the commit message).

We now support these filters through a generic RevFilter API.  Any
implementation of the filter can be registered with a RevWalk before
it starts walking, and only the commits that the filter selects are
returned.
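
As a rough illustration of the idea, the sketch below shows how a
filter contract and a few combinators might fit together.  Every name
in it (SketchCommit, SketchFilter, SketchFilters) is hypothetical and
chosen for this example only; it is not the RevFilter or RevWalk
signature introduced by this change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed commit; not the real RevCommit.
final class SketchCommit {
    final String author;
    final int parentCount;
    final String message;

    SketchCommit(String author, int parentCount, String message) {
        this.author = author;
        this.parentCount = parentCount;
        this.message = message;
    }
}

// Hypothetical filter contract: return true to keep the commit.
interface SketchFilter {
    boolean include(SketchCommit c);
}

final class SketchFilters {
    private SketchFilters() {
    }

    // In the spirit of --no-merges: keep commits with fewer than two parents.
    static SketchFilter noMerges() {
        return c -> c.parentCount < 2;
    }

    // In the spirit of --author: plain substring match on the author field.
    static SketchFilter author(String needle) {
        return c -> c.author.contains(needle);
    }

    static SketchFilter and(SketchFilter a, SketchFilter b) {
        return c -> a.include(c) && b.include(c);
    }

    static SketchFilter or(SketchFilter a, SketchFilter b) {
        return c -> a.include(c) || b.include(c);
    }

    static SketchFilter not(SketchFilter a) {
        return c -> !a.include(c);
    }

    // Stand-in for the walk: return only the commits the filter selects.
    static List<SketchCommit> walk(List<SketchCommit> commits, SketchFilter f) {
        List<SketchCommit> out = new ArrayList<>();
        for (SketchCommit c : commits) {
            if (f.include(c)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

With these assumed helpers, something like
SketchFilters.and(SketchFilters.noMerges(), SketchFilters.author("Alice"))
would keep only non-merge commits whose author field contains "Alice".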

While filtering, the time needed to discard an uninteresting commit is
critical, as this is typically the most expensive operation occurring
during the walk.  For the string-based matches we work against the raw
byte[] of the decompressed commit, as this is faster than converting
the commit to a proper java.lang.String and scanning the string.
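
To make the byte[] approach concrete, here is a minimal sketch that
locates the author header in a raw commit buffer and searches it for a
needle without ever building a String.  The class and method names
(RawScanSketch, authorStart, authorContains) are assumptions made for
this example, and the naive scan is not meant to reflect the optimized
code in this change.

```java
import java.nio.charset.StandardCharsets;

final class RawScanSketch {
    private static final byte[] AUTHOR =
            "author ".getBytes(StandardCharsets.US_ASCII);

    private RawScanSketch() {
    }

    // Index just past "author " at the start of a header line, or -1.
    static int authorStart(byte[] raw) {
        int ptr = 0;
        while (ptr < raw.length) {
            if (raw[ptr] == '\n') {
                return -1; // blank line: headers ended, message begins
            }
            if (startsWith(raw, ptr, AUTHOR)) {
                return ptr + AUTHOR.length;
            }
            ptr = nextLine(raw, ptr);
        }
        return -1;
    }

    // Index of the first byte after the current line.
    static int nextLine(byte[] raw, int ptr) {
        while (ptr < raw.length && raw[ptr] != '\n') {
            ptr++;
        }
        return ptr < raw.length ? ptr + 1 : ptr;
    }

    static boolean startsWith(byte[] raw, int ptr, byte[] prefix) {
        if (ptr + prefix.length > raw.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (raw[ptr + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // True if needle occurs somewhere in raw[start, end).
    static boolean contains(byte[] raw, int start, int end, byte[] needle) {
        for (int i = start; i + needle.length <= end; i++) {
            if (startsWith(raw, i, needle)) {
                return true;
            }
        }
        return false;
    }

    // True if the author header line contains needle, byte for byte.
    static boolean authorContains(byte[] raw, byte[] needle) {
        int start = authorStart(raw);
        if (start < 0) {
            return false;
        }
        int end = start;
        while (end < raw.length && raw[end] != '\n') {
            end++;
        }
        return contains(raw, start, end, needle);
    }
}
```

A commit that fails the test is rejected after touching only the
header bytes, so no String or char[] is allocated for it.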

Two types of string matching are used.  An optimized version handles
plain substring matching when the needle contains no regexp syntax.
Such searches are very common and should be as fast as we can make
them.  An unoptimized version is also supported to execute a generic
regular expression through the java.util.regex package.
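
One way such a choice between the two matchers could be made is
sketched below.  The metacharacter set, the class name
MatcherChoiceSketch, and the use of String inputs (rather than raw
bytes) are all assumptions for the sake of a short example; the
detection logic in this change may differ.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class MatcherChoiceSketch {
    // Characters treated here as regex syntax; this set is an assumption.
    private static final String META = "\\.[]{}()*+?^$|";

    private MatcherChoiceSketch() {
    }

    static boolean looksLikeRegex(String needle) {
        for (int i = 0; i < needle.length(); i++) {
            if (META.indexOf(needle.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Pick the cheap substring test unless the needle needs a regex engine.
    static Predicate<String> create(String needle) {
        if (looksLikeRegex(needle)) {
            Pattern p = Pattern.compile(needle);
            return s -> p.matcher(s).find();
        }
        return s -> s.contains(needle);
    }
}
```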

In the java.util.regex implementation we recycle the same Matcher, as
resetting an existing Matcher is faster than creating a new Matcher
per candidate string.  This recycling is why the RevFilter API is
declared as not thread-safe: the Matcher class itself is not
thread-safe.
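
The recycling pattern can be sketched with the standard
Matcher.reset(CharSequence) call.  The wrapper class
ReusedMatcherSketch is a hypothetical name for this example and is not
one of the classes added by this change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Not thread-safe: the single Matcher holds per-search state.
final class ReusedMatcherSketch {
    private final Matcher matcher;

    ReusedMatcherSketch(String regex) {
        // Create the Matcher once, against an empty placeholder input.
        matcher = Pattern.compile(regex).matcher("");
    }

    boolean matches(CharSequence candidate) {
        // reset() rebinds the existing Matcher to the new input,
        // avoiding a fresh Matcher allocation per candidate.
        return matcher.reset(candidate).find();
    }
}
```

Callers that need concurrent walks would use one instance per thread
under this design, since two threads sharing the Matcher could
overwrite each other's input.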

Performance of the substring variant is about the same as that of
C Git, even though C Git uses a regular expression to execute the
match.  Performance of the regex pattern variant is about 2x slower
than C Git/Cygwin on a Windows x86 system.  Since this type of search
is usually less frequent, the lower performance may be tolerated by an
end user application.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
15 files changed:
org.spearce.jgit/src/org/spearce/jgit/revwalk/DateRevQueue.java
org.spearce.jgit/src/org/spearce/jgit/revwalk/RevCommit.java
org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/AndRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/AuthorRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/CommitterRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/MessageRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/NotRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/OrRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/PatternMatchRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/RawParseUtils.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/RevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/revwalk/filter/SubStringRevFilter.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/util/RawCharSequence.java [new file with mode: 0644]
org.spearce.jgit/src/org/spearce/jgit/util/RawSubStringPattern.java [new file with mode: 0644]