
Moving Git Repository Files Preserving Change History

  • 21 Oct 2023
  • 1375 words
  • 7 min read

tool git

Figure 1. A visual concept of merging things into one

Sometimes, you realize that a common library is being used in a single project but resides in its own Git repository. At other times, you may decide to merge a few Git repositories into one to reduce maintenance issues. There are many other similar scenarios that involve moving a portion of one Git repository into another. A common problem that unites all these cases is how to preserve the Git history for the moved files.

The problem can be described as follows: given two Git repositories, source and target, the task is to move content from source to target, either wholly or partially, while preserving the Git change history.

How to preserve change history

The main idea, as described in this source of wisdom, is to create a patch file that includes all commits for the files you want to move. Then, apply that patch to a target repository, as shown in the code snippet below.

# Go into the source git repo and create a patch.
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- <path-to-file-or-directory> > history.patch

# Go into the target git repo and apply the patch.
git am --committer-date-is-author-date \
       < <path-to-source-repo-with-patch>/history.patch

Options used in the aforementioned commands:

  • --pretty=email - shows commit logs in the email format, which is required by git am to apply the patch.
  • --reverse - lists commits oldest first, so that git am applies them in their original order.
  • --binary - includes a binary diff in the patch file.
  • --full-index - shows the full pre- and post-image blob object names on the “index” line of each diff, instead of abbreviated ones.
  • --patch-with-stat - generates a patch with diff statistics, which is a synonym for -p --stat.
  • --first-parent - follows only the first parent commit when encountering a merge commit.
  • -m - shows a diff for merge commits as well. Without it, git log -p normally lists merge commits without their changes, so those changes would be missing from the patch.
  • -- <path-to-file-or-directory> - shows only commits related to a specified file or directory.
  • --committer-date-is-author-date - keeps the committer date the same as the author date to preserve the original timestamps of the changes.
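Before applying a patch, it can be worth checking that it contains exactly the commits you expect. The sketch below is an illustration rather than part of the original recipe: it builds a throwaway repository in a temporary directory, creates the patch for a hypothetical lib directory, and compares the number of commits in the patch with the number that git rev-list reports for the same path. The directory name, file contents, and author identity are made up for the demo.

```shell
#!/bin/sh
set -eu

# Hypothetical identity, so the demo also works without a global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Throwaway source repository: two commits touch lib/, one does not.
work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"
mkdir lib
echo "v1" > lib/util.txt
git add lib
git commit -q -m "add util"
echo "v2" > lib/util.txt
git commit -q -am "update util"
echo "notes" > README.md
git add README.md
git commit -q -m "add readme"

# Same command as in the post, limited to lib/.
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- lib > "$work/history.patch"

# Every commit in the email format starts with a "From <hash> <fixed date>" line.
expected=$(git rev-list --count --first-parent HEAD -- lib)
actual=$(grep -c '^From [0-9a-f]* Mon Sep 17 00:00:00 2001' "$work/history.patch")
echo "commits touching lib: $expected, commits in patch: $actual"
```

If the two numbers differ, the patch is probably not what you intended to apply, and it is cheaper to find that out before running git am.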

You may encounter some variations depending on the resulting directory structure where you want the files to be moved. However, the approach is almost the same. To gain hands-on experience, let’s use two Git repositories and explore the main possible scenarios for moving files along with their change history.

Setting the stage

I have created two Git repositories with their initial history. Here’s the structure of the source-git-repo repository:

source-git-repo
└── app
    ├── docs
    │   └── HOWTO.txt
    └── src
        ├── main
        │   └── java
        │       └── App.java
        └── test
            └── java
                └── AppTest.java

And here’s the structure of the target-git-repo repository:

target-git-repo
├── README.md
└── app
    └── src
        └── main
            ├── cpp
            │   └── app.cpp
            └── headers
                └── app.h

Moving files into the same directory

Let’s move source-git-repo/app/src into target-git-repo/app/src.

Create a patch specifically for a directory to be moved:

cd source-git-repo
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- app/src > history.patch

Now, go to the target repository and apply the patch:

cd ../target-git-repo
git am --committer-date-is-author-date < ../source-git-repo/history.patch

# Output of the previous command:
Applying:  source-git-repo -> add simple App
Applying:  source-git-repo -> remove redundant comments
Applying:  source-git-repo -> add tests
Applying:  source-git-repo -> remove java comments from tests
Applying:  source-git-repo -> move to default package
Applying:  source-git-repo -> fix compilation errors

If you check the target repository’s structure, you’ll see new files have appeared under the correct structure:

target-git-repo
├── README.md
└── app
    └── src
        ├── main
        │   ├── cpp
        │   │   └── app.cpp
        │   ├── headers
        │   │   └── app.h
        │   └── java
        │       └── App.java
        └── test
            └── java
                └── AppTest.java

Let’s also check the Git history for the target repository before and after the patch is applied:

# Before patch
git log --pretty=oneline

238ebaae28d50f0a5598ea6c82a2f10f47552a66 (HEAD -> main)  target-git-repo -> remove gradle and simplify repo
a00279e8ab44f3623e9b79f8915ffe1592709e38  target-git-repo -> init cpp app skeleton

# After patch
git log --pretty=oneline

21d445aab6ed57cda334021edac818f9d694d078 (HEAD -> main)  source-git-repo -> fix compilation errors
16509a5d165e42637d4d34705a3d90f7087d2016  source-git-repo -> move to default package
051f615172de04db74869882d39183c8f2a5d2fd  source-git-repo -> remove java comments from tests
47b06a0a1df7cd252fa6d7904224081a40a82ced  source-git-repo -> add tests
c24f2d1faa637781d5e171904d3c78764dc1cee7  source-git-repo -> remove redundant comments
46220bc67b4a7121b396cd61e5650ca74862e776  source-git-repo -> add simple App
238ebaae28d50f0a5598ea6c82a2f10f47552a66  target-git-repo -> remove gradle and simplify repo
a00279e8ab44f3623e9b79f8915ffe1592709e38  target-git-repo -> init cpp app skeleton

You can see that the commits from the source repository were applied on top of the existing history. Therefore, we have successfully moved the files and preserved their change history in the target repository.
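Note that the commit hashes in the target repository differ from those in the source, so "preserved" here means the authors, dates, messages, and changes. One way to check this is to compare the two logs for the moved path. The following is a minimal sketch on throwaway repositories. The file contents, dates, and identity are hypothetical, and the repository layout only loosely mirrors the one used in this post.

```shell
#!/bin/sh
set -eu

# Hypothetical identity, so the demo also works without a global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q "$work/source"
git init -q "$work/target"

# Source history with explicit (made-up) author dates.
cd "$work/source"
mkdir -p app/src
echo "class App {}" > app/src/App.java
git add app
GIT_AUTHOR_DATE="2020-01-02 03:04:05 +0000" git commit -q -m "add simple App"
echo "// fixed" >> app/src/App.java
GIT_AUTHOR_DATE="2021-06-07 08:09:10 +0000" git commit -q -am "fix compilation errors"

git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- app/src > "$work/history.patch"

# Target repository with its own initial commit.
cd "$work/target"
echo "# target" > README.md
git add README.md
git commit -q -m "init target"
git am -q --committer-date-is-author-date < "$work/history.patch"

# Author name, author email, author date, and subject should match one to one.
fmt='%an|%ae|%aI|%s'
src_log=$(git -C "$work/source" log --format="$fmt" -- app/src)
dst_log=$(git -C "$work/target" log --format="$fmt" -- app/src)
if [ "$src_log" = "$dst_log" ]; then
    echo "history matches"
else
    echo "history differs"
fi

# With --committer-date-is-author-date both dates of a commit are equal.
author_date=$(git log -1 --format=%aI)
committer_date=$(git log -1 --format=%cI)
echo "author date: $author_date, committer date: $committer_date"
```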

Moving files into a different directory

Now let’s move source-git-repo/app/* into target-git-repo/app-java. The app directory itself should not appear inside app-java; instead, all files and subdirectories of app should sit directly under app-java.

Let’s create a patch for this scenario:

cd source-git-repo
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- app > history.patch

Now, to apply it to the target repository and properly organize the new files, you need to use two additional options -p2 and --directory:

cd ../target-git-repo
git am -p2 --committer-date-is-author-date --directory app-java \
       < ../source-git-repo/history.patch
       
# Output of the previous command:
Applying:  source-git-repo -> add simple App
Applying:  source-git-repo -> remove redundant comments
Applying:  source-git-repo -> add build.gradle
Applying:  source-git-repo -> add tests
Applying:  source-git-repo -> remove java comments from tests
Applying:  source-git-repo -> move to default package
Applying:  source-git-repo -> fix compilation errors
Applying:  source-git-repo -> fix build errors
Applying:  source-git-repo -> simplify repo

Here, the -p2 option strips two leading components from every path in the patch: the a/ or b/ prefix that Git adds to diff paths, and the app directory. The --directory option then prepends app-java to what remains, so app/docs/HOWTO.txt from the source repository ends up as app-java/docs/HOWTO.txt in the target repository.
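To see the path rewriting in isolation, here is a small sketch that runs the same two options against throwaway repositories and lists the resulting files. It assumes a single hypothetical file, app/docs/HOWTO.txt, and a made-up identity. It also spells the option as --directory=app-java, which should be equivalent to the form used above.

```shell
#!/bin/sh
set -eu

# Hypothetical identity, so the demo also works without a global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q "$work/source"
git init -q "$work/target"

# The patch will contain paths such as a/app/docs/HOWTO.txt.
cd "$work/source"
mkdir -p app/docs
echo "how to" > app/docs/HOWTO.txt
git add app
git commit -q -m "add HOWTO"
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- app > "$work/history.patch"

cd "$work/target"
echo "# target" > README.md
git add README.md
git commit -q -m "init target"

# -p2 drops "a/" and "app/"; --directory puts "app-java/" in front.
git am -q -p2 --committer-date-is-author-date --directory=app-java \
       < "$work/history.patch"

files=$(git ls-files)
echo "$files"
```

With the default of -p1, only the a/ or b/ prefix would be stripped, and the file would land in app-java/app/docs/HOWTO.txt instead.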

Take a look at the resulting structure of the target repository:

target-git-repo
├── README.md
├── app
│   └── src
│       └── main
│           ├── cpp
│           │   └── app.cpp
│           └── headers
│               └── app.h
└── app-java
    ├── docs
    │   └── HOWTO.txt
    └── src
        ├── main
        │   └── java
        │       └── App.java
        └── test
            └── java
                └── AppTest.java

As expected, all the content from source-git-repo/app is now available in target-git-repo/app-java. The Git history contains all changes related to the moved files:

git log --pretty=oneline

a770f378bced672290fc62da66145a9e3b073f37 (HEAD -> main)  source-git-repo -> simplify repo
b16dd23121d56e9d0703a2ec5628f2c743207500  source-git-repo -> fix build errors
ae07d962403660b493d9a6454b0f8d0813015fb4  source-git-repo -> fix compilation errors
a2d1fe162ec9009f218031a3f49b9087683c6046  source-git-repo -> move to default package
fda2ef3b4d1c559f0ebb39eda3f61c7f62c68d81  source-git-repo -> remove java comments from tests
fd158342bc14c25db7aa7faba881e555012102a7  source-git-repo -> add tests
c7e626fafd5ad552f8f4ca0b0a038e8753498d89  source-git-repo -> add build.gradle
058a0331d922654b0c5209c3b3e7c227226a763f  source-git-repo -> remove redundant comments
6b85c90fbfd5ae978946e117b01188b99a487a86  source-git-repo -> add simple App
238ebaae28d50f0a5598ea6c82a2f10f47552a66  target-git-repo -> remove gradle and simplify repo
a00279e8ab44f3623e9b79f8915ffe1592709e38  target-git-repo -> init cpp app skeleton

Moving a directory while excluding specific files or subdirectories

Let’s say you want to move source-git-repo/app into target-git-repo/app, but you don’t want source-git-repo/app/src/test to be included in the target repository. This can be achieved by adding an exclude pathspec during patch creation, as follows:

cd source-git-repo
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- app ':!app/src/test' > history.patch

Here, -- app ':!app/src/test' means: take all changes under the app directory, but leave out all changes under the app/src/test path. The ':!' prefix is the short form of the ':(exclude)' pathspec magic.
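It is easy to preview which commits and files an exclude pathspec selects before generating the patch. The sketch below does that on a throwaway repository with three hypothetical commits: one touching only main code, one touching only tests, and one touching both. File names and the identity are made up for the demo.

```shell
#!/bin/sh
set -eu

# Hypothetical identity, so the demo also works without a global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"

mkdir -p app/src/main app/src/test
echo "class App {}" > app/src/main/App.java
git add app/src/main
git commit -q -m "add simple App"

echo "class AppTest {}" > app/src/test/AppTest.java
git add app/src/test
git commit -q -m "add tests"

echo "// fix" >> app/src/main/App.java
echo "// fix" >> app/src/test/AppTest.java
git commit -q -am "fix compilation errors"

# Commits selected by the pathspec, oldest first, in short and long form.
subjects=$(git log --reverse --format=%s -- app ':!app/src/test')
subjects_long=$(git log --reverse --format=%s -- app ':(exclude)app/src/test')

# Files whose changes would end up in the patch.
files=$(git log --pretty=format: --name-only -- app ':!app/src/test' \
        | grep . | sort -u)

echo "commits:"
echo "$subjects"
echo "files:"
echo "$files"
```

The test-only commit is skipped entirely, while the commit that touches both directories is kept with only its non-excluded changes.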

Apply the patch:

cd ../target-git-repo
git am --committer-date-is-author-date \
       < ../source-git-repo/history.patch

The structure of the target repository includes everything from source-git-repo/app, excluding all files and subdirectories under source-git-repo/app/src/test:

target-git-repo
├── README.md
└── app
    ├── docs
    │   └── HOWTO.txt
    └── src
        └── main
            ├── cpp
            │   └── app.cpp
            ├── headers
            │   └── app.h
            └── java
                └── App.java

Summary

The approach to move files while preserving their change history, as described in this post, is one of many. It is not ideal and has its own drawbacks:

  • It may be challenging to create a patch for files with a very large change history.
  • The change history for moved files is applied on top of the existing history in the target repository. The imported commits therefore appear as the most recent ones, regardless of their original dates, and a long run of them can make the target repository's own history harder to navigate.
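One possible way to soften the second drawback, which is a variation and not part of the approach described in this post, is to apply the patch on a separate branch and merge that branch with --no-ff. The imported commits then sit behind a single merge commit, and git log --first-parent shows a compact view. The sketch below assumes throwaway repositories, a hypothetical branch name import-app, and a made-up identity.

```shell
#!/bin/sh
set -eu

# Hypothetical identity, so the demo also works without a global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q "$work/source"
git init -q "$work/target"

cd "$work/source"
mkdir app
echo "v1" > app/App.java
git add app
git commit -q -m "add simple App"
echo "v2" > app/App.java
git commit -q -am "fix compilation errors"
git log --pretty=email --reverse --binary --full-index \
        --patch-with-stat --first-parent -m \
        -- app > "$work/history.patch"

cd "$work/target"
echo "# target" > README.md
git add README.md
git commit -q -m "init target"
main_branch=$(git symbolic-ref --short HEAD)

# Apply the patch on a side branch (the name import-app is hypothetical).
git checkout -q -b import-app
git am -q --committer-date-is-author-date < "$work/history.patch"

# Merge it back with an explicit merge commit.
git checkout -q "$main_branch"
git merge -q --no-ff -m "import app history from source" import-app

# Compact view: only the merge commit and the target's own commits.
first_parent_log=$(git log --first-parent --format=%s)
full_count=$(git rev-list --count HEAD)
echo "$first_parent_log"
echo "total commits reachable from HEAD: $full_count"
```

The full history of the moved files is still reachable through the merge commit, for example with git log -- app.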

However, it gets the job done and can be considered a good starting point. If you know of a better approach, I would be eager to learn more. Please share in the comments below.