
How to extract a single file from a remote archive file?

  •  10
  • Oak  · Tech Community  · 14 years ago

    Given

    1. URL of an archive (e.g. a zip file)
    2. Full name (including path) of a file inside that archive

    I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.

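    From my (limited) understanding it should be possible, though I have no idea how to do it. I've been playing with TrueZip, since it seems to support a wide variety of archive types, but I have doubts about its ability to work this way. Does anyone have experience with this sort of thing?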

    Edit: being able to do the same with tarballs and gzipped tarballs is also important to me.

    4 answers  |  last activity 9 years ago
        1
  •  9
  •   David Z    14 years ago

    Open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() (optionally preceded by closeEntry()) to iterate through the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...). Note that this still downloads everything that precedes the entry in the archive, but nothing after it.

    The Java code would look something like this:

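    // Needs java.io.FileNotFoundException, java.net.URL, java.util.zip.ZipEntry and java.util.zip.ZipInputStream.
    URL url = new URL("http://example.com/path/to/archive");
    try (ZipInputStream zin = new ZipInputStream(url.openStream())) {
        ZipEntry ze = zin.getNextEntry();
        // getNextEntry() closes the previous entry itself, so closeEntry() is optional
        while (ze != null && !ze.getName().equals(pathToFile)) {
            ze = zin.getNextEntry();
        }
        if (ze == null) {
            throw new FileNotFoundException(pathToFile + " not found in archive");
        }
        // getSize() can be -1 and read() may return early, so read to the end of the entry
        byte[] bytes = zin.readAllBytes(); // Java 9+
    }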
    

    This is, of course, untested.
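
    Since the question asks for a local copy, the same loop can stream the matching entry straight to disk instead of into a byte array. The following is only a sketch of that idea: the class name ZipEntryFetcher and the method extractEntry are made up for illustration, and it assumes Java 7 or later. The part of the archive that precedes the entry is still downloaded.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
public class ZipEntryFetcher {

    /** Copies the entry called entryName from a ZIP stream to dest; returns the bytes written. */
    public static long extractEntry(InputStream archive, String entryName, Path dest)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry ze;
            while ((ze = zin.getNextEntry()) != null) {
                if (ze.getName().equals(entryName)) {
                    // Files.copy reads until the end of the current entry, then we return,
                    // which closes the stream without reading the rest of the archive.
                    return Files.copy(zin, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        throw new FileNotFoundException("No entry named " + entryName);
    }

    /** Convenience wrapper that opens the URL itself. */
    public static long extractEntry(URL archiveUrl, String entryName, Path dest)
            throws IOException {
        try (InputStream in = archiveUrl.openStream()) {
            return extractEntry(in, entryName, dest);
        }
    }
}
```

    A call might look like ZipEntryFetcher.extractEntry(new URL("http://example.com/path/to/archive"), pathToFile, Paths.get("local-copy")), where the destination name is a placeholder.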

        2
  •  5
  •   Adam Crume    14 years ago

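    Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the directory and the entry itself. The server would need to support the Range HTTP header for this to work.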

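    The standard Java API only supports reading ZIP files from local files and input streams. As far as I know, it has no provision for reading from random-access remote files.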
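
    To make the Range idea more concrete, here is a rough sketch of the first step only: fetching the tail of the archive and locating the central directory from the End of Central Directory record. The class and method names (RemoteZipDirectory, fetchRange, locateCentralDirectory) are hypothetical, Java 9+ is assumed for readAllBytes(), and ZIP64 archives are not handled. Parsing the directory itself, fetching the entry's own byte range, and inflating it are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of any library.
public class RemoteZipDirectory {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int EOCD_MIN_SIZE = 22;
    /** Largest possible EOCD record: 22 fixed bytes plus a 65535-byte comment. */
    public static final int EOCD_MAX_SIZE = EOCD_MIN_SIZE + 0xFFFF;

    /** Fetches part of a remote file; range is e.g. "bytes=-65557" or "bytes=100-199". */
    public static byte[] fetchRange(URL url, String range) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", range);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                // A plain 200 means the server ignored the Range header.
                throw new IOException("No partial content (HTTP " + status + ")");
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes(); // Java 9+
            }
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Scans the tail of a ZIP file backwards for the End of Central Directory
     * record and returns {offset, size} of the central directory, both
     * relative to the start of the whole archive.
     */
    public static long[] locateCentralDirectory(byte[] tail) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        for (int pos = tail.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE) {
                long size = buf.getInt(pos + 12) & 0xFFFFFFFFL;   // unsigned 32-bit
                long offset = buf.getInt(pos + 16) & 0xFFFFFFFFL; // unsigned 32-bit
                return new long[] {offset, size};
            }
        }
        throw new IOException("End of central directory record not found");
    }
}
```

    A caller would typically request "bytes=-" + RemoteZipDirectory.EOCD_MAX_SIZE first, pass the result to locateCentralDirectory, and then request exactly the directory's byte range.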

    Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with it.

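    This won't provide any advantage for compressed tar archives, because the entire archive is compressed as a whole (so the best you can do is read it as a single input stream and kill it once you have your entry).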
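
    For gzipped tarballs, the streaming fallback can be sketched with the standard library alone by walking the 512-byte tar headers by hand. This is only an illustration under several assumptions: the class name TarGzEntryReader and the method copyEntry are hypothetical, entry names are assumed to fit in the 100-byte name field (no ustar prefix, GNU long names, or pax headers), and sizes are assumed to be stored in octal.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of any library.
public class TarGzEntryReader {
    private static final int BLOCK = 512;

    /** Streams a .tar.gz and copies the entry called entryName to out; returns the bytes copied. */
    public static long copyEntry(InputStream tarGz, String entryName, OutputStream out)
            throws IOException {
        // Closing the wrapper also closes tarGz, which abandons the rest of the download.
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz))) {
            byte[] header = new byte[BLOCK];
            while (true) {
                try {
                    in.readFully(header);
                } catch (EOFException e) {
                    break; // archive ended without an end-of-archive marker
                }
                if (header[0] == 0) {
                    break; // a zero block marks the end of the archive
                }
                String name = field(header, 0, 100);
                long size = Long.parseLong(field(header, 124, 12).trim(), 8);
                if (name.equals(entryName)) {
                    transfer(in, out, size);
                    return size;
                }
                // Skip the entry data, which is padded to a multiple of 512 bytes.
                transfer(in, null, (size + BLOCK - 1) / BLOCK * BLOCK);
            }
        }
        throw new FileNotFoundException("No entry named " + entryName);
    }

    /** Reads a NUL-terminated text field from a tar header. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.UTF_8);
    }

    /** Copies count bytes from in to out, or discards them when out is null. */
    private static void transfer(InputStream in, OutputStream out, long count)
            throws IOException {
        byte[] buf = new byte[8192];
        while (count > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, count));
            if (n < 0) {
                throw new EOFException("Archive ended unexpectedly");
            }
            if (out != null) {
                out.write(buf, 0, n);
            }
            count -= n;
        }
    }
}
```

    Everything before the wanted entry still has to be downloaded and decompressed; the only saving is that the transfer stops as soon as the entry has been copied.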

        3
  •  2
  •   Christian Schlichtherle    13 years ago

    Since TrueZIP 7.2, there is a new client API in the module TrueZIP Path. This is an implementation of an NIO.2 FileSystemProvider for Java SE 7. Using this API, you can access an HTTP URI as follows:

    Path path = new TPath(new URI("http://acme.com/download/everything.tar.gz/README.TXT"));
    try (InputStream in = Files.newInputStream(path)) {
        // Read archive entry contents here.
        ...
    }
    
        4
  •  0
  •   Michael    14 years ago

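    I'm not sure there's a way to extract a single file from a ZIP without downloading the whole thing first. However, if you are the one hosting the ZIP file, you could create a Java servlet that reads the ZIP file and returns the requested file in the response: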

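    import java.io.*;
    import java.util.zip.*;
    import javax.servlet.ServletException; // assumed; jakarta.servlet on newer containers
    import javax.servlet.http.*;

    public class GetFileFromZIPServlet extends HttpServlet{
      @Override
      public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException{
        String pathToFile = request.getParameter("pathToFile");

        byte fileBytes[];
        //get the bytes of the file from the ZIP (the archive path is a placeholder)
        try (ZipFile zip = new ZipFile("/path/to/archive.zip")) {
          ZipEntry entry = pathToFile == null ? null : zip.getEntry(pathToFile);
          if (entry == null) {
            response.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
          }
          try (InputStream in = zip.getInputStream(entry)) {
            fileBytes = in.readAllBytes(); // Java 9+
          }
        }

        //set the appropriate content type, maybe based on the file extension
        response.setContentType("...");

        //write file to the response
        response.getOutputStream().write(fileBytes);
      }
    }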