代码之家  ›  专栏  ›  技术社区  ›  Dean Meehan

如何将JDF文件转换为PDF(从多编码文档中删除文本)

  •  0
  • Dean Meehan  · 技术社区  · 5 年前

    我正在尝试使用C#将JDF文件转换为PDF文件。

    看了照片之后 JDF format

    我试过用 StreamWriter / StreamReader

    using (StreamReader reader = new StreamReader(_jdf.FullName, Encoding.Default))
    {
        using (StreamWriter writer = new StreamWriter(_pdf.FullName, false, Encoding.Default))
        {
    
            writer.NewLine = "\n"; //Tried without this and with \r\n
    
            bool IsStartOfPDF = false;
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
    
                if (line.IndexOf("%PDF-") != -1)
                {
                    IsStartOfPDF = true;
                }
    
                if (!IsStartOfPDF)
                {
                    continue;
                }
    
                writer.WriteLine(line);
            }
        }
    }
    
    1 回复  |  直到 5 年前
        1
  •  0
  •   Dean Meehan    5 年前

    我自己回答这个问题,因为这可能是一个有点常见的问题,解决方案可以提供信息给其他人。

    StreamWriter 将二进制文件写回另一个文件。即使你使用 要读取一个文件,然后将所有内容写入另一个文件,您将意识到文档之间的差异。

    BinaryWriter 为了搜索一个由多个部分组成的文档,并将每个字节准确地写入另一个文档中。

    //Using a Binary Reader/Writer as the PDF is multitype
    using (var reader = new BinaryReader(File.Open(_file.FullName, FileMode.Open)))
    {
        using (var writer = new BinaryWriter(File.Open(tempFileName.FullName, FileMode.CreateNew)))
        {
    
            //We are searching for the start of the PDF 
            bool searchingForstartOfPDF = true;
            var startOfPDF = "%PDF-".ToCharArray();
    
            //While we haven't reached the end of the stream
            while (reader.BaseStream.Position != reader.BaseStream.Length)
            {
                //If we are still searching for the start of the PDF
                if (searchingForstartOfPDF)
                {
                    //Read the current Char
                    var str = reader.ReadChar();
    
                    //If it matches the start of the PDF signiture
                    if (str.Equals(startOfPDF[0]))
                    {
                        //Check the next few characters to see if they match
                        //keeping an eye on our current position in the stream incase something goes wrong
                        var currBasePos = reader.BaseStream.Position;
                        for (var i = 1; i < startOfPDF.Length; i++)
                        {
                            //If we found a char that isn't in the PDF signiture, then resume the while loop
                            //to start searching again from the next position
                            if (!reader.ReadChar().Equals(startOfPDF[i]))
                            {
                                reader.BaseStream.Position = currBasePos;
                                break;
                            }
                            //If we've reached the end of the PDF signiture then we've found a match
                            if (i == startOfPDF.Length - 1)
                            {
                                //Success
                                //Set the Position to the start of the PDF signiture 
                                searchingForstartOfPDF = false;
                                reader.BaseStream.Position -= startOfPDF.Length;
                                //We are no longer searching for the PDF Signiture so 
                                //the remaining bytes in the file will be directly wrote
                                //using the stream writer
                            }
                        }
                    }
                }
                else
                {
                    //We are writing the binary now
                    writer.Write(reader.ReadByte());
                }
            }
    
        }
    }
    

    此代码示例使用 BinaryReader 读取每个字符1乘1,如果它找到字符串的匹配项 %PDF- % 然后使用 writer.Write(reader.ReadByte()) .