
How to decompress large files using zstd-jni and ByteBuffers

  • Jay Askren  · 6 years ago

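    I am trying to decompress a large number of files of 40 MB or more as I download them in parallel, using ByteBuffers and channels. I get better throughput with channels than with streams, and this has to be a very high-throughput system: we need to process 40 TB of files per day, and this step is currently the bottleneck. The files are compressed with zstd-jni. zstd-jni has APIs for decompressing ByteBuffers, but I get an error when I use them. How can I use zstd-jni to decompress one ByteBuffer at a time?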

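    I found these examples in their tests, but unless I am missing something, the examples that use ByteBuffers seem to assume that the entire input file fits in a single ByteBuffer: https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala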

    public static long compressFile(String inFile, String outFolder, ByteBuffer inBuffer, ByteBuffer compressedBuffer, int compressionLevel) throws IOException {
        File file = new File(inFile);
        File outFile = new File(outFolder, file.getName() + ".zs");
        long numBytes = 0l;
    
        try (RandomAccessFile inRaFile = new RandomAccessFile(file, "r");
            RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
                    FileChannel inChannel = inRaFile.getChannel();
                    FileChannel outChannel = outRaFile.getChannel()) {
            inBuffer.clear();
            while(inChannel.read(inBuffer) > 0) {
                inBuffer.flip();
                compressedBuffer.clear();
    
                long compressedSize = Zstd.compressDirectByteBuffer(compressedBuffer, 0, compressedBuffer.capacity(), inBuffer, 0, inBuffer.limit(), compressionLevel);
                numBytes+=compressedSize;
                compressedBuffer.position((int)compressedSize);
                compressedBuffer.flip();
                outChannel.write(compressedBuffer);
                inBuffer.clear(); 
            }
        }
    
        return numBytes;
    }
    
    public static long decompressFile(String originalFilePath, String inFolder, ByteBuffer inBuffer, ByteBuffer decompressedBuffer) throws IOException {
        File outFile = new File(originalFilePath);
        File inFile = new File(inFolder, outFile.getName() + ".zs");
        outFile = new File(inFolder, outFile.getName());
    
        long numBytes = 0l;
    
        try (RandomAccessFile inRaFile = new RandomAccessFile(inFile, "r");
            RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
                    FileChannel inChannel = inRaFile.getChannel();
                    FileChannel outChannel = outRaFile.getChannel()) {
    
            inBuffer.clear();
    
            while(inChannel.read(inBuffer) > 0) {
                inBuffer.flip();
                decompressedBuffer.clear();
                long compressedSize = Zstd.decompressDirectByteBuffer(decompressedBuffer, 0, decompressedBuffer.capacity(), inBuffer, 0, inBuffer.limit());
                System.out.println(Zstd.isError(compressedSize) + " " + compressedSize);
                numBytes+=compressedSize;
                decompressedBuffer.position((int)compressedSize);
                decompressedBuffer.flip();
                outChannel.write(decompressedBuffer);
                inBuffer.clear(); 
            }
        }
    
        return numBytes;
    }
    
1 answer

  • karavelov  · 6 years ago

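    Yes, the static methods you use in your example assume that the whole compressed file fits in one ByteBuffer. As far as I understand, what you need is streaming decompression with ByteBuffers. ZstdDirectBufferDecompressingStream already provides this: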

    https://static.javadoc.io/com.github.luben/zstd-jni/1.3.7-1/com/github/luben/zstd/ZstdDirectBufferDecompressingStream.html

    https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala#L261-L302

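    But you also have to subclass it and override the "refill" method, which is called when the source buffer has been fully consumed and must return a buffer holding the next chunk of compressed input.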

    https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala#L540-L586
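
    To make the subclassing step concrete, here is a minimal sketch of how a channel-backed subclass might look in Java. The names ZstdChannelDecompressor, ChannelSource, reachedEof and this decompressFile signature are hypothetical and only for illustration. The sketch assumes both buffers are direct ByteBuffers and a zstd-jni version in which refill is a protected method returning the buffer to read from, as in the linked test.

```java
import com.github.luben.zstd.ZstdDirectBufferDecompressingStream;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZstdChannelDecompressor {

    /** Hypothetical subclass: refills the source buffer from a channel once zstd has consumed it. */
    static final class ChannelSource extends ZstdDirectBufferDecompressingStream {
        private final ReadableByteChannel channel;
        private boolean eof = false;

        ChannelSource(ReadableByteChannel channel, ByteBuffer source) {
            super(source); // source must be a direct buffer, already flipped for reading
            this.channel = channel;
        }

        @Override
        protected ByteBuffer refill(ByteBuffer toRefill) {
            toRefill.compact(); // keep any unread bytes and switch to write mode
            try {
                if (channel.read(toRefill) < 0) {
                    eof = true; // remember that the channel is exhausted
                }
            } catch (IOException e) {
                // refill declares no checked exceptions, so wrap the failure
                throw new UncheckedIOException(e);
            }
            toRefill.flip(); // back to read mode for the decompressor
            return toRefill;
        }

        boolean reachedEof() {
            return eof;
        }
    }

    /** Streams the file at {@code in} through zstd into {@code out}; returns the decompressed byte count. */
    public static long decompressFile(Path in, Path out, ByteBuffer inBuffer, ByteBuffer outBuffer)
            throws IOException {
        long total = 0L;
        try (FileChannel inCh = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel outCh = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            // Load the first chunk before handing the buffer to the stream.
            inBuffer.clear();
            inCh.read(inBuffer);
            inBuffer.flip();
            try (ChannelSource stream = new ChannelSource(inCh, inBuffer)) {
                while (stream.hasRemaining()) {
                    outBuffer.clear();
                    int produced = stream.read(outBuffer); // advances outBuffer's position
                    outBuffer.flip();
                    while (outBuffer.hasRemaining()) {
                        outCh.write(outBuffer);
                    }
                    total += produced;
                    // No output, no more input, frame still open: the file is truncated.
                    if (produced == 0 && stream.reachedEof() && stream.hasRemaining()) {
                        throw new IOException("Truncated zstd input: " + in);
                    }
                }
            }
        }
        return total;
    }
}
```

    Because the stream keeps its decompression state between read calls, the input buffer no longer has to line up with frame boundaries, so a fixed-size direct buffer can be reused across files. The end-of-file check is an addition of this sketch to avoid looping forever on a truncated file; it is not part of the library.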