description: Programs to feed git-fast-import
owner: frej.drejhammar@gmail.com
last change: Sat, 2 Mar 2024 19:25:29 +0000 (2 20:25 +0100)
README.md

hg-fast-export.sh - mercurial to git converter using git-fast-import

Most hg-* scripts are licensed under the MIT license and were written by Rocco Rutte <pdmef@gmx.net> with hints and help from the git list and #mercurial on freenode. hg-reset.py is licensed under GPLv2 since it copies some code from the mercurial sources.

The current maintainer is Frej Drejhammar <frej.drejhammar@gmail.com>.

Support

If you have problems with hg-fast-export or have found a bug, please create an issue at the GitHub issue tracker. Before creating a new issue, check that your problem has not already been addressed in a closed issue. Do not contact the maintainer directly unless you want to report a security bug; keeping the discussion in the issue tracker lets the next person with the same problem benefit from the time spent solving it the first time.

System Requirements

This project depends on Python (>=3.7) and the Mercurial package (>= 5.2). If Python is not installed, install it before proceeding. The Mercurial package can be installed with pip install mercurial.

On Windows, the bash that comes with "Git for Windows" is known to work well.

Usage

Using hg-fast-export is quite simple for a local mercurial repository <local-repo>:

git init repo-git # or whatever
cd repo-git
hg-fast-export.sh -r <local-repo>
git checkout

Please note that hg-fast-export does not automatically check out the newly imported repository. You probably want to follow up the import with a git checkout command.

Incremental imports to track hg repos are supported, too.

Using hg-reset is similarly simple. Within a git repository that was hg-fast-export'ed from mercurial:

hg-reset.sh -R <revision>

will give hints on which branches need adjustment for starting over again.

When a mercurial repository does not use utf-8 for encoding author strings and commit messages, the -e <encoding> command line option can be used to force fast-export to convert incoming metadata from <encoding> to utf-8. This encoding option is also applied to file names.

In some locales Mercurial uses different encodings for commit messages and file names. In that case, you can use the --fe <encoding> command line option, which overrides the -e option for file names.

As mercurial appears to be much less picky about the syntax of the author information than git, an author mapping file can be given to hg-fast-export to fix up malformed author strings. The file is specified using the -A option. The file should contain lines of the form "<key>"="<value>". Inside the key and value strings, all escape sequences understood by the python unicode_escape encoding are supported; strings are otherwise assumed to be UTF-8 encoded. (Versions of fast-export prior to v171002 had a different syntax; the old syntax can be enabled with the flag --mappings-are-raw.)

The example authors.map below will translate User <garbage<tab><user@example.com> to User <user@example.com>.

-- Start of authors.map --
"User <garbage\t<user@example.com>"="User <user@example.com>"
-- End of authors.map --
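
The mapping syntax can be illustrated with a minimal Python sketch. This is not hg-fast-export's own parser: the helper name parse_mapping_line and the regular expression are assumptions made for illustration, and the sketch only handles ASCII input. It shows how a line of the form "<key>"="<value>" could be split and how the unicode_escape codec turns \t into a real tab.

```python
import re

# Hypothetical helper, not part of hg-fast-export: splits one mapping line
# of the form "<key>"="<value>" and decodes its escape sequences.
_LINE = re.compile(r'^"(.*)"\s*=\s*"(.*)"$')


def parse_mapping_line(line):
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError("not a mapping line: %r" % line)
    # unicode_escape turns the two characters \t into a real tab, and so on.
    # Assumption: the line is pure ASCII; other text needs extra care.
    key, value = (part.encode("ascii").decode("unicode_escape")
                  for part in match.groups())
    return key, value


key, value = parse_mapping_line(
    r'"User <garbage\t<user@example.com>"="User <user@example.com>"')
print(repr(key))
print(repr(value))
```

After decoding, the key contains an actual tab character, which is what a malformed author string coming from Mercurial would contain.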

If you have many Mercurial repositories, Chris J Billington's hg-export-tool allows you to batch convert them.

Tag and Branch Naming

As Git and Mercurial differ in what is a valid branch and tag name, the -B and -T options allow a mapping file to be specified to rename branches and tags (respectively). The syntax of the mapping file is the same as for the author mapping.

When the -B and -T flags are used, you will probably want to use the -n flag to disable the built-in (broken in many cases) sanitizing of branch/tag names. In the future -n will become the default, but in order to not break existing incremental conversions, the default remains with the old behavior.

By default, the default mercurial branch is renamed to the master branch on git. If your mercurial repo contains both default and master branches, you'll need to override this behavior. Use -M <newName> to specify what name to give the default branch.

Content filtering

hg-fast-export supports filtering the content of exported files. The filter is supplied to the --filter-contents option. hg-fast-export runs the filter for each exported file, pipes its content to the filter's standard input, and uses the filter's standard output in place of the file's original content. The prototypical use of this feature is to convert line endings in text files from CRLF to git's preferred LF:

-- Start of crlf-filter.sh --
#!/bin/sh
# $1 = pathname of exported file relative to the root of the repo
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"

# POSIX sh compares strings with a single "=" ("==" is a bash extension)
if [ "$3" = "1" ]; then cat; else dos2unix -q; fi
# The -q option keeps dos2unix from returning an error code when it
# handles text files that are not ASCII based (like UTF-16 encoded
# text files)
-- End of crlf-filter.sh --
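
A filter along the same lines could also be written in Python. The following is a rough sketch, not something shipped with hg-fast-export: the function names are made up, it assumes the same three arguments as the shell script above, and the plain byte replacement is only a stand-in for dos2unix (it does not handle UTF-16 text, for example).

```python
#!/usr/bin/env python3
import sys


def convert(data, is_binary):
    # Leave files that Mercurial reports as binary untouched;
    # otherwise turn every CRLF pair into a single LF.
    if is_binary == "1":
        return data
    return data.replace(b"\r\n", b"\n")


def main(argv):
    # argv[1] = pathname, argv[2] = Mercurial's hash, argv[3] = binary flag
    content = sys.stdin.buffer.read()
    sys.stdout.buffer.write(convert(content, argv[3]))


# Only act as a filter when called with the three expected arguments.
if __name__ == "__main__" and len(sys.argv) > 3:
    main(sys.argv)
```

The script reads and writes bytes rather than text so that file content passes through without any implicit decoding.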

Plugins

hg-fast-export supports plugins to manipulate the file data and commit metadata. The plugins are enabled with the --plugin option. The value of said option is a plugin name (the name of a folder in the plugins directory), optionally followed by an equals sign and an initialization string.

There is a readme accompanying each of the bundled plugins, with a description of the usage. To create a new plugin, one must simply add a new folder under the plugins directory, with the name of the new plugin. Inside, there must be an __init__.py file, which contains at a minimum:

def build_filter(args):
    return Filter(args)

class Filter:
    def __init__(self, args):
        pass
        #Or don't pass, if you want to do some init code here

Beyond the boilerplate initialization, you can see the two different defined filter methods in the dos2unix and branch_name_in_commit plugins.

commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': committer, 'extra': extra}

def commit_message_filter(self,commit_data):

The commit_message_filter method is called for each commit, after parsing from hg, but before outputting to git. The dictionary commit_data contains the above attributes about the commit, and can be modified by any filter. The values in the dictionary after filters have been run are used to create the git commit.
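
As a rough illustration, a plugin that prefixes every commit message with its branch name might look like the sketch below. It is a hypothetical example, not one of the bundled plugins, and it assumes that the 'branch' and 'desc' values have the same type (both bytes or both str); check the bundled plugins for the types actually passed in.

```python
def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, args):
        # args is the optional initialization string given after the
        # equals sign; this sketch does not use it.
        pass

    def commit_message_filter(self, commit_data):
        # Assumption: 'branch' and 'desc' share one type (bytes or str).
        branch = commit_data['branch']
        desc = commit_data['desc']
        sep = b': ' if isinstance(desc, bytes) else ': '
        # Modify the dictionary in place; skip messages already prefixed.
        if not desc.startswith(branch + sep):
            commit_data['desc'] = branch + sep + desc
```

The filter returns nothing: it changes commit_data in place, which is how the text above describes filters passing their result on.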

file_data = {'filename':filename,'file_ctx':file_ctx,'data':file_contents}

def file_data_filter(self,file_data):

The file_data_filter method is called for each file within each commit. The dictionary file_data contains the above attributes about the file, and can be modified by any filter. file_ctx is the filecontext from the mercurial python library. After all filters have been run, the values are used to add the file to the git commit.

The file_data_filter method is also called when files are deleted, but in this case the data and file_ctx keys map to None. This is so that a filter which modifies file names can apply the same name transformations when files are deleted.
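
The sketch below shows one way a renaming filter could treat added and deleted files alike. It is a hypothetical plugin that moves every file under a directory prefix; using the initialization string as the prefix, the fallback name 'imported', and the handling of both bytes and str file names are all assumptions made for illustration.

```python
def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, args):
        # Assumption: the initialization string names the target directory.
        self.prefix = args or 'imported'

    def file_data_filter(self, file_data):
        name = file_data['filename']
        prefix = self.prefix
        # Make the prefix match the type of the file name (bytes or str).
        if isinstance(name, bytes) and isinstance(prefix, str):
            prefix = prefix.encode('utf-8')
        elif isinstance(name, str) and isinstance(prefix, bytes):
            prefix = prefix.decode('utf-8')
        sep = b'/' if isinstance(name, bytes) else '/'
        # Only the name changes, so deletions (where 'data' and
        # 'file_ctx' are None) get the same rename as additions.
        file_data['filename'] = prefix + sep + name
```

Because the method never touches 'data' or 'file_ctx', it needs no special case for deleted files.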

Submodules

See README-SUBMODULES.md for how to convert subrepositories into git submodules.

Notes/Limitations

hg-fast-export supports multiple branches but only named branches with exactly one head each. Otherwise commits to the tip of these heads within the branch will get flattened into merge commits. There are a few options to deal with this:

  1. Chris J Billington's hg-export-tool can help you to handle branches with duplicate heads.
  2. Use the head2branch plugin to create a new named branch from an unnamed head.
  3. You can ignore unnamed heads with the --ignore-unnamed-heads option, which is appropriate in situations such as the extra heads being close commits (abandoned, unmerged changes).

hg-fast-export will ignore any files or directories tracked by mercurial called .git, and will print a warning if it encounters one. Git cannot track such files or directories. This is not to be confused with submodules, which are described in README-SUBMODULES.md.

As each git-fast-import run creates a new pack file, it may be required to repack the repository quite often for incremental imports (especially when importing a small number of changesets per incremental import).

Because of the way the hg API and remote access protocol are designed, it is not possible to use hg-fast-export on remote repositories (http/ssh). First clone the repository, then convert it.

Design

hg-fast-export was designed in a way that doesn't require a 2-pass mechanism or any prior repository analysis: it just feeds what it finds into git-fast-import. This also implies that it heavily relies on strictly linear ordering of changesets from hg, i.e. its append-only storage model so that changesets hg-fast-export already saw never get modified.

Submitting Patches

Please create a pull request at GitHub to submit patches.

When submitting a patch make sure the commits in your pull request:

Please do not submit a pull request if you are not willing to spend the time required to address review comments or revise the patch until it follows the guidelines above. A take it or leave it approach to contributing wastes both your and the maintainer's time.

Frequent Problems

shortlog
2024-03-02  Frej Drejhammar  CI: Remove run-tests script  [master]
2024-02-23  Frej Drejhammar  Drop manual CodeQL actions
2024-02-23  Frej Drejhammar  Merge branch 'gh/321'
2024-02-23  Frej Drejhammar  Merge branch 'gh/320'
2024-02-23  Stephan Hohe  Add tests for plugins setting file content to None
2024-02-20  Stephan Hohe  Don't add file if plugin sets content to `None`
2024-02-19  Stephan Hohe  Fix escape in regular expression
2024-02-16  Frej Drejhammar  Merge branch 'frej/gh318'
2024-02-16  Frej Drejhammar  Run file_data_filter on deleted files
2024-02-16  Frej Drejhammar  Make plugin loader look in directories relative to cwd
2023-12-28  Frej Drejhammar  Merge branch 'frej/run-tests-with-different-python...
2023-12-28  Frej Drejhammar  Run tests with multiple Python versions
2023-12-28  Frej Drejhammar  Check for a supported Python version on startup
2023-12-28  Frej Drejhammar  Update required version of Python to 3.7
2023-12-28  Frej Drejhammar  Add command line flag to dump found versions
2023-12-28  Frej Drejhammar  Merge branch 'frej/fix-314'
...
tags
5 months ago v231118 v231118
18 months ago v221024 v221024
19 months ago v220921 v220921
2 years ago v210917 New features in the v210917 release:
3 years ago v201029 New features in the v201029 release:
4 years ago v200213 v200213
4 years ago v190913 v190913
5 years ago v190107 v190107
5 years ago v180610 v180610
6 years ago v180317 v180317
6 years ago v180126 v180126
6 years ago v171002 v171002
6 years ago v170826 v170826
6 years ago v170818 v170818
6 years ago v170624 v170624
6 years ago v170617 v170617
...
heads
7 weeks ago master
16 years ago git-p4
forks
fast-export/rorcz.git Local repo.or.cz fork with needed fixes mackyle@gmail.com 7 years ago
fast-export/benizi.git repo.or.cz@benizi.com 11 years ago
fast-export/fast-export-unix-compliant.git dereckson@espace... 11 years ago
fast-export/dharding.git dharding@gmail.com 11 years ago
fast-export/barak.git tweaks and maybe debian packaging info barak@cs.nuim.ie 12 years ago