Tuesday, July 5, 2016

How to do an initial push of a very large git repo

Since yesterday I have been trying to mirror the Linux source repository from one git server (e.g. Gerrit) to another (e.g. GitHub Enterprise). In theory this is nothing more than cloning from A and pushing to B. What makes the case worth a blog post is that Linux is a project with decades of history (I just learned that the git repository does not even contain all of it) and hundreds of thousands of commits. GHE, on the other hand, limits the upload size. So when I got to the push step (after cloning the original repo, renaming the original remote to upstream, and adding the new repo as origin), I ran into the following error:

fatal: The remote end hung up unexpectedly
error: pack-objects died of signal 13
error: failed to push some refs to 'git@github.xxxx/XXX.git'
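For reference, the preparation steps mentioned before the error (clone, rename the remote, add the new one) could look roughly like the sketch below. The function name prepare_mirror and its three parameters are made up for this example; the URLs you pass in are your own.

```shell
# prepare_mirror SOURCE_URL TARGET_URL DIR  (hypothetical helper)
# Clones SOURCE_URL into DIR, then renames the clone's default remote
# to "upstream" and registers TARGET_URL as the new "origin".
prepare_mirror() (
    source_url=$1 target_url=$2 dir=$3
    git clone "$source_url" "$dir" || exit 1
    cd "$dir" || exit 1
    git remote rename origin upstream || exit 1
    git remote add origin "$target_url"
)
```

Called as prepare_mirror SOURCE_URL TARGET_URL linux, it leaves a working copy in ./linux that is ready for the push step.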

I then Google'd a bit and found a possible solution. The nice one-liner that javabrett mentioned splits the whole commit history into numerous pieces, each OK to be pushed, then pushes them one by one.

The paging parameter, i.e. the number of commits each piece contains, needs to be chosen carefully. Set it too high and you end up with fewer pushes, but one of them might exceed the size limit. Set it too low and you have to put up with more pushes and therefore more waiting. Expect some trial and error before you settle on a value. The silver lining is that you can pick up where the last attempt failed, without deleting the repository and starting all over again: git only sends objects the remote does not already have. The value I finally settled on is 5000 (commits per push):

git log --reverse --oneline | sed -n '0~5000p' | awk '{print "git push origin "$1":refs/heads/master"}' | while read i; do eval $i; sleep 2; done
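The same idea can be written as a small function. This is a sketch under a few assumptions, not a drop-in replacement for the one-liner: it walks only first-parent commits so that each pushed commit descends from the previous one, it uses awk for the "every Nth line" step because the sed address 0~5000 is a GNU extension, and it skips commits that the remote-tracking branch already has. The name push_in_chunks and its parameters are invented for this example.

```shell
# push_in_chunks REMOTE BRANCH BATCH_SIZE  (hypothetical helper)
# Pushes BRANCH to REMOTE in steps of BATCH_SIZE first-parent commits.
push_in_chunks() (
    remote=$1 branch=$2 batch=$3
    # If an earlier run already pushed something, only look at what is new.
    if git show-ref --verify --quiet "refs/remotes/$remote/$branch"; then
        range="$remote/$branch..$branch"
    else
        range=$branch
    fi
    # Oldest first, first-parent only: the list is linear, so pushing
    # every Nth entry is always a fast-forward of the previous push.
    git rev-list --reverse --first-parent "$range" |
        awk -v n="$batch" 'NR % n == 0' |
        while read -r commit; do
            git push "$remote" "$commit:refs/heads/$branch" || exit 1
        done || exit 1
    # Push whatever is left after the last full batch.
    git push "$remote" "$branch:refs/heads/$branch"
)
```

With the numbers from this post the call would be push_in_chunks origin master 5000. Because the range is recomputed from the remote-tracking branch, running it again after a failed push continues from the last successful one.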

Not every push in the whole process was a success. Sometimes I ran into:

To git@github.ibm.com:workload/linux.git
 ! [rejected]        fdc657c -> master (non-fast-forward)
error: failed to push some refs to 'git@github.ibm.com:workload/linux.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

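I suspect this was because I did everything in a loop and the next push came in before the server had fully settled the previous one (even with a 2-second sleep). Another possible cause is that git log also lists commits from merged side branches, so a commit picked from the list is not necessarily a descendant of the one pushed just before it. I didn't actually look into the errors, because the commits in a failed push are also carried by subsequent pushes, as long as one of those succeeds.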
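If you do want to look into such a rejection, it can be checked locally: a push is a fast-forward only when the commit currently on the remote branch is an ancestor of the commit being pushed. A minimal sketch of that check follows; the helper name is_fast_forward is made up for this example.

```shell
# is_fast_forward OLD NEW  (hypothetical helper)
# Succeeds when NEW descends from (or equals) OLD, i.e. when moving a
# branch from OLD to NEW would be a fast-forward.
is_fast_forward() {
    git merge-base --is-ancestor "$1" "$2"
}
```

For the rejected push above that would be something like: is_fast_forward origin/master fdc657c || echo "not a fast-forward". This assumes the remote-tracking branch origin/master reflects the last successful push.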

After the one-liner loop was done, the remaining commits still needed to be pushed:

git push origin master

Also, if you have other branches, push them directly (maybe with a --dry-run before the real deal): they share most of their history with the master branch, so the push should not be too large.

git checkout branchA
git push origin branchA
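With more than a couple of branches, the rehearsal and the real push can be wrapped together. The sketch below assumes each branch already exists locally (for example after a git checkout as above); push_branch is a name invented for this example.

```shell
# push_branch REMOTE BRANCH  (hypothetical helper)
# Rehearses the push with --dry-run and only pushes for real if the
# rehearsal reported no problem.
push_branch() {
    git push --dry-run "$1" "$2:refs/heads/$2" &&
        git push "$1" "$2:refs/heads/$2"
}
```

A loop such as: for b in branchA branchB; do push_branch origin "$b"; done then handles them one at a time. The branch names here are placeholders.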

To double-check that everything is in place:

git push --all origin

You should see Everything up-to-date.

Also don't forget tags:

git push --tags origin
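For a stricter check than reading the push output, the branch and tag lists on both sides can be compared directly. The following is only a sketch: refs_match is a made-up helper, and it reports a mismatch whenever the remote has fewer or more branches and tags than the local repository.

```shell
# refs_match REMOTE  (hypothetical helper)
# Succeeds when the local branches and tags and those on REMOTE are the
# same set, each pointing at the same object id.
refs_match() (
    remote=$1
    local_refs=$(git for-each-ref --format='%(objectname) %(refname)' \
        refs/heads refs/tags | sort)
    # ls-remote separates id and name with a tab and also lists peeled
    # annotated tags (names ending in ^{}); normalise both.
    remote_refs=$(git ls-remote --heads --tags "$remote" |
        grep -v '\^{}$' | tr '\t' ' ' | sort)
    [ "$local_refs" = "$remote_refs" ]
)
```

Usage: refs_match origin && echo "remote is complete".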

Last but not least, you can approach the problem from a completely different angle if only recent commits matter to you, say the last year's worth, and one from 2008 does not. In that case a shallow clone with a reasonable depth should be just fine.
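A shallow clone is a one-line command; the sketch below just wraps it so the depth is explicit. The helper name shallow_clone is made up, and the right depth for "about a year" is something you would have to estimate yourself. One caveat I have not tested against GHE: a git server may refuse a push from a shallow clone unless it is configured to accept shallow updates (the receive.shallowUpdate setting).

```shell
# shallow_clone URL DIR DEPTH  (hypothetical helper)
# Clones only the newest DEPTH commits. Note that --depth implies
# --single-branch, so only the default branch is fetched.
shallow_clone() {
    git clone --depth "$3" "$1" "$2"
}
```

Usage: shallow_clone SOURCE_URL linux-shallow 50000, where both the directory name and the depth are placeholders.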
