
Rejecting large files in git

  •  7
  • Chris Jefferson  ·  15 years ago

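    Edit: the problem comes from accidentally committed large files, in this case a big dump of program output. The aim is to prevent accidental commits, because if a developer does accidentally commit a large file, getting it back out of the repository costs an afternoon in which nobody can do any work and everyone has to fix up all of their local branches.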

    4 replies  |  up to 14 years ago
        1
  •  2
  •   araqnid    15 years ago

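    When exactly does the problem occur? When they originally commit the file, or when it gets pushed elsewhere? If you have a staging repo that everyone pushes to, you could implement an update hook that scans the changed refs for large files, alongside other permission checks and so on.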

    A very rough-and-ready example:

    # $1 = ref name, $2 = old rev, $3 = new rev (the update hook's arguments)
    git --no-pager log --pretty=oneline --name-status "$2..$3" -- | \
      perl -MGit -lne 'if (/^[0-9a-f]{40}/) { ($rev, $message) = split(/\s+/, $_, 2) }
         else { ($action, $file) = split(/\s+/, $_, 2); next unless $action eq "A"; 
           $filesize = Git::command_oneline("cat-file", "-s", "$rev:$file");
           print "$rev added $file ($filesize bytes)"; die "$file too big" if ($filesize > 1024*1024*1024) }';
    

    Called the way $GIT_DIR/hooks/update is called (the arguments are the ref name, old rev, and new rev; e.g. "refs/heads/master master~2 master"), this lists the files added and aborts if any of them is too big.
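    For readers who want a complete hook file rather than a one-liner, here is a minimal sketch of the same check in plain sh. It is not the code above: it swaps the Perl parsing of `git log --name-status` for `git diff-tree --diff-filter=A`, and the `MAX_BYTES` override, the helper names, and the decision to let brand-new or deleted refs through are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a hooks/update script: reject a push that adds an oversized file.
# Assumptions (not from the original post): the 1 GiB default can be overridden
# through a hypothetical MAX_BYTES environment variable, and new or deleted
# refs are simply let through.

MAX_BYTES=${MAX_BYTES:-$((1024 * 1024 * 1024))}
ZERO=0000000000000000000000000000000000000000

# exceeds_limit SIZE: succeeds when SIZE is larger than MAX_BYTES.
exceeds_limit() {
    [ "$1" -gt "$MAX_BYTES" ]
}

# check_range OLDREV NEWREV: fail if any file added in the range is too big.
check_range() {
    revs=$(git rev-list "$1..$2") || return 1
    for rev in $revs; do
        # --diff-filter=A keeps only added paths; -r descends into subdirectories.
        git diff-tree -r --no-commit-id --name-only --diff-filter=A "$rev" |
        while IFS= read -r file; do
            size=$(git cat-file -s "$rev:$file") || exit 1
            echo "$rev added $file ($size bytes)"
            if exceeds_limit "$size"; then
                echo "$file too big" >&2
                exit 1   # leaves the pipeline subshell with a failing status
            fi
        done || return 1
    done
}

# git calls the update hook with exactly three arguments.
if [ "$#" -eq 3 ]; then
    oldrev=$2
    newrev=$3
    if [ "$oldrev" = "$ZERO" ] || [ "$newrev" = "$ZERO" ]; then
        exit 0
    fi
    check_range "$oldrev" "$newrev" || exit 1
fi
```

    Saved as hooks/update in the staging repo and made executable, a non-zero exit rejects the update of that one ref. A real deployment would probably also want to check newly created branches, which this sketch skips.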

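    Note that if you are going to police this sort of thing, you need a centralised point at which to do it. If you trust your team to simply exchange changes with each other, you should also trust them to learn that adding giant binary files is a bad thing.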

        2
  •  2
  •   Anthony Geoghegan    8 years ago

        3
  •  1
  •   Thomas L Holaday    15 years ago

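    If you have control over your committers' toolchain, it may be straightforward to modify git commit so that it runs a sanity check on file sizes before the "real" commit. Because such a change in core git would burden every git user on every commit, and because the alternative strategy of "banish anyone who would commit a 1.5GB change" has an appealing simplicity, I suspect such a check will never be accepted into core. I suggest you weigh the burden of maintaining a local fork of git (nannygit) against the burden of repairing a crashed git after an overambitious commit.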

    I must admit I am curious how a 1.5GB commit came to be. Are video files involved?
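    A size check before the "real" commit does not strictly require a patched git: a client-side pre-commit hook can approximate it. The following is only a sketch under stated assumptions, not what this answer proposes. The 1 MiB default, the `MAX_BYTES` override, and the function names are hypothetical, and every developer would have to install the hook themselves.

```shell
#!/bin/sh
# Sketch of a client-side .git/hooks/pre-commit size check.
# Assumption: the 1 MiB default and the MAX_BYTES override are illustrative.

MAX_BYTES=${MAX_BYTES:-$((1024 * 1024))}

# report_oversized: read "SIZE PATH" lines on stdin, print the offenders,
# and return non-zero if there was at least one.
report_oversized() {
    status=0
    while read -r size path; do
        if [ "$size" -gt "$MAX_BYTES" ]; then
            echo "$path is $size bytes (limit $MAX_BYTES)"
            status=1
        fi
    done
    return $status
}

# staged_sizes: print "SIZE PATH" for every file added or modified in the index.
staged_sizes() {
    git diff --cached --name-only --diff-filter=AM |
    while IFS= read -r path; do
        # ":PATH" names the staged copy of the file, not the working-tree copy.
        echo "$(git cat-file -s ":$path") $path"
    done
}

# Run the check only when this file is installed as the hook itself.
case "$0" in
    */pre-commit)
        staged_sizes | report_oversized >&2 || {
            echo "Commit aborted; bypass with: git commit --no-verify" >&2
            exit 1
        }
        ;;
esac
```

    Because `git commit --no-verify` skips the hook, this only guards against accidents. Enforcement still needs a server-side check.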

        4
  •  0
  •   ddub    10 years ago
    Here is my solution. I must admit it doesn't look like others I have seen, but to me it makes the most sense. It only checks the inbound commit. It does detect when a new file is too large, or an existing file becomes too big. It is a pre-receive hook. Since tags are size 0, it does not check them.
    
    #!/usr/bin/env bash
    #
    # pre-receive hook: receive-pack runs this script once the pushed objects
    # have arrived, just before any ref is updated.  It is passed arguments in
    # through stdin in the form
    #  <oldrev> <newrev> <refname>
    # For example:
    #  aa453216d1b3e49e7f6f98441fa56946ddcd6a20 68f7abf4e6f922807889f52bc043ecd31b79f814 refs/heads/master
    #
    # A non-zero exit status rejects the whole push.
    #
    
    set -e
    
    max=$((1024 * 1024))  # size limit in bytes (1 MiB)
    count=0
    echo "Checking file sizes..."
    while read oldrev newrev refname
    do
    #   echo $oldrev $newrev $refname
        # skip the size check for tag refs
        if [[ ${refname} =~ ^refs/tags/ ]]
        then
            continue
        fi
    
        if [[ ${newrev} =~ ^[0]+$ ]]
        then
            continue
        fi
    
        # find all refs we don't care about and exclude them from diff
        if [[ ! ${oldrev} =~ ^[0]+$ ]]
        then
            excludes=( "^${oldrev}" )
        else
            excludes=( $(git for-each-ref --format '^%(refname:short)' refs/heads/) )
        fi
    #   echo "excludes " ${excludes}
        commits=$(git rev-list $newrev "${excludes[@]}")
        for commit in ${commits};
        do
    #       echo "commit " ${commit}
            # get a list of the file changes in this commit
            # (-r recurses into subdirectories, --root also covers a root commit)
            rawdiff=$(git diff-tree -r --root --no-commit-id ${commit})
            while read -r oldmode newmode oldsha newsha code fname
            do
    #           echo "reading " ${oldmode} ${newmode} ${oldsha} ${newsha} ${code} ${fname}
                # if diff-tree returns anything, new sha is not all 0's, and it is a file (blob)
                if [[ "${newsha}" != "" ]] && [[ ! ${newsha} =~ ^[0]+$ ]] && [[ $(git cat-file -t ${newsha}) == "blob" ]]
                then
                    echo -n "${fname} "
                    newsize=$(git cat-file -s ${newsha})
                    if (( ${newsize} > ${max} ))
                    then
                        echo " size ${newsize}B > ${max}B"
                        let "count+=1"
                    else
                        echo "ok"
                    fi
                fi
            done <<< "${rawdiff}"
        done
    done
    
    # exit statuses wrap at 256, so report any failure as a plain 1
    exit $(( count > 0 ))
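    On large pushes, calling `git cat-file` once per changed file can get slow. The following is a sketch of a variation, not the script above: it lists every object the push introduces with `git rev-list --objects` and sizes them in a single `git cat-file --batch-check` pass. The `MAX_BYTES` override and the function names are assumptions for illustration. Unlike the script above, it reports each oversized blob once, whichever commit introduced it.

```shell
#!/bin/sh
# Sketch of an alternative pre-receive size check that sizes each new object
# once instead of walking every commit's diff.
# Assumption: the 1 MiB default and the MAX_BYTES override are illustrative.

MAX_BYTES=${MAX_BYTES:-$((1024 * 1024))}

# large_blobs: read "TYPE SIZE PATH" lines and print "SIZE PATH" for each
# blob over the limit.
large_blobs() {
    awk -v max="$MAX_BYTES" '$1 == "blob" && $2 + 0 > max + 0 {
        size = $2
        sub(/^[^ ]+ [^ ]+ /, "")   # what is left of the line is the path
        print size, $0
    }'
}

# new_objects NEWREV: list "TYPE SIZE PATH" for objects reachable from NEWREV
# but from no ref that already exists in the repository.
new_objects() {
    git rev-list --objects "$1" --not --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)'
}

# Only consume stdin when this file is installed as the hook itself.
case "$0" in
    */pre-receive)
        status=0
        while read -r oldrev newrev refname; do
            # an all-zero newrev means the ref is being deleted
            case "$newrev" in *[!0]*) ;; *) continue ;; esac
            offenders=$(new_objects "$newrev" | large_blobs)
            if [ -n "$offenders" ]; then
                echo "$refname: files over $MAX_BYTES bytes:" >&2
                echo "$offenders" >&2
                status=1
            fi
        done
        exit $status
        ;;
esac
```

    Either version can be tried by hand in a test repository by piping one `<oldrev> <newrev> <refname>` line into the installed hook and checking its exit status.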