
PIL's Image.save('Image.pdf') creates a very large PDF

python-imaging-library python-3.x

user36800 · 4 years ago

screenBWsmall.png:

I converted it to PDF using the Python Imaging Library:

#!python
from PIL import Image 
im = Image.open('screenBWsmall.png')
im.save('screenBWsmall.pdf')

Compare that with ImageMagick's convert, issued from the Bash command line:

convert screenBWsmall.png screenBWsmall_IM.pdf

The file sizes are:

  11093 screenBWsmall.png
1050994 screenBWsmall.pdf
  16999 screenBWsmall_IM.pdf

This puzzles me, especially since the larger file, screenBWsmall.pdf, uses 1 bit per pixel (1 bit per component, or bpc), compared with 8 bpc for the smaller file, screenBWsmall_IM.pdf:

$ pdfimages.exe -list screenBWsmall.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     960   540  gray    1   1  image  no         1  0    72    72 1025K 1621%

$ pdfimages.exe -list screenBWsmall_IM.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
-------------------------------------------------------------------------------------------- 
   1     0 image     960   540  gray    1   8  image  no         8  0    72    72 14.9K 2.9%
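The ratio column can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming pdfimages computes ratio as the stored stream size divided by the packed raw size (width × height × comp × bpc / 8, with rows padded to whole bytes); the helper names raw_image_bytes and stored_bytes_per_pixel are hypothetical. At 1 bpc the 960×540 image packs into 64,800 bytes, so 1025K stored works out to about 2 bytes per pixel. That is consistent with, though not proof of, each pixel being written out as an 8-bit sample in two-character hex text rather than as packed bits.

```python
# Sanity-check the pdfimages "ratio" column.
# Assumption: ratio = stored stream size / packed raw size.

def raw_image_bytes(width: int, height: int, comp: int, bpc: int) -> int:
    """Uncompressed size of an image whose rows are packed and padded to whole bytes."""
    row_bytes = (width * comp * bpc + 7) // 8  # each row is padded to a byte boundary
    return row_bytes * height


def stored_bytes_per_pixel(width: int, height: int, stored_kib: float) -> float:
    """Bytes actually stored per pixel, given pdfimages' size column in KiB."""
    return stored_kib * 1024 / (width * height)


# screenBWsmall.pdf (Pillow): 960x540, gray, 1 bpc, 1025K stored
raw_pil = raw_image_bytes(960, 540, comp=1, bpc=1)
print(raw_pil)                                           # 64800
print(round(1025 * 1024 / raw_pil * 100))                # 1620 (%), close to the reported 1621%
print(round(stored_bytes_per_pixel(960, 540, 1025), 2))  # 2.02

# screenBWsmall_IM.pdf (convert): 960x540, gray, 8 bpc, 14.9K stored
raw_im = raw_image_bytes(960, 540, comp=1, bpc=8)
print(raw_im)                                            # 518400
print(round(14.9 * 1024 / raw_im * 100, 1))              # 2.9 (%)
```

Plugging in the 1512×2016 pages of IMG_077x_PIL.pdf (6030K each) gives the same factor of about 16, matching the identical 1621% that pdfimages reports for them.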

The Image.save documentation doesn't give much information from which I could guess the reason for the large file size.

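How can I get a PDF as compact as the one from convert? I'd like to do this in Python, because I will be performing more complex steps on many files.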

My Python version is:

Python 3.8.8 (default, Mar  4 2021, 21:24:42) 
[GCC 10.2.0] on cygwin
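Since the goal is to stay in Python, one way to investigate is to measure how Pillow's PDF output size depends on the image mode, entirely in memory. This is a minimal sketch, assuming Pillow is installed; the striped synthetic image is a hypothetical stand-in for screenBWsmall.png, and pdf_size_by_mode is a made-up helper name. Exact byte counts depend on the Pillow version and on whether it was built with JPEG support, so none are quoted here.

```python
# Compare the PDF size Pillow produces for the same picture in different modes.
# Assumption: Pillow is installed; the synthetic image stands in for screenBWsmall.png.
import io

from PIL import Image, ImageDraw


def pdf_size_by_mode(im: Image.Image, modes=("1", "L", "RGB")) -> dict:
    """Save `im` to an in-memory PDF once per mode and return {mode: byte count}."""
    sizes = {}
    for mode in modes:
        buf = io.BytesIO()
        im.convert(mode).save(buf, format="PDF")
        sizes[mode] = len(buf.getvalue())
    return sizes


# Hypothetical stand-in: black text-like bars on white, bi-level like the screenshot
demo = Image.new("1", (960, 540), 255)
draw = ImageDraw.Draw(demo)
for y in range(20, 520, 30):
    draw.rectangle((40, y, 900, y + 12), fill=0)

for mode, size in pdf_size_by_mode(demo).items():
    print(f"mode {mode!r}: {size} bytes")
```

Running it on a real file is a matter of replacing demo with Image.open('screenBWsmall.png').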

Comparison with convert

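Thanks to his suggestion, I systematically tried 3 methods of shrinking 2 JPG images and combining them into 1 PDF file, with the aim of reducing the file size. The three methods are as follows.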

Method #1: Generate the PDF with Python's PIL (see jpg2pdf.py below). Along the way, the shrunken versions of both images are saved to separate PNG files for use in Method #3.

Method #2: Generate it with ImageMagick's convert:

convert -sample 50% -type Bilevel +dither IMG_077[45].JPG IMG_077x_IMcvt.pdf

Method #3: Run convert on the shrunken PNG files from PIL:

convert IMG_077[45]small.png IMG_077x_PIL+IMcvt.pdf
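Because the plan is to process many files from Python, the convert step of Method #3 can be driven from a script. This is a minimal sketch, assuming ImageMagick's convert is on PATH (ImageMagick 7 uses magick as its main command, hence the program parameter); the helper names build_convert_cmd and pngs_to_pdf are hypothetical.

```python
# Drive ImageMagick from Python so Method #3 can be applied to many files.
# Assumptions: `convert` (or `magick` for ImageMagick 7) is on PATH;
# the helper names are hypothetical.
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_convert_cmd(pngs: Iterable[str], pdf: str, program: str = "convert") -> List[str]:
    """Return the argument list equivalent to `convert a.png b.png out.pdf`."""
    return [program, *map(str, pngs), str(pdf)]


def pngs_to_pdf(pngs: Iterable[str], pdf: str, program: str = "convert") -> int:
    """Run the command (raising on failure) and return the PDF size in bytes."""
    subprocess.run(build_convert_cmd(pngs, pdf, program), check=True)
    return Path(pdf).stat().st_size


print(build_convert_cmd(["IMG_0774small.png", "IMG_0775small.png"], "IMG_077x_PIL+IMcvt.pdf"))
```

Passing an argument list rather than a shell string avoids quoting problems with unusual filenames. To reproduce the shell's IMG_077[45]small.png pattern, sorted(glob.glob("IMG_077[45]small.png")) can supply the pngs argument, since glob supports bracket character classes but does not sort its results.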

Output PDF file sizes (in the same order as the methods above):

12350481 IMG_077x_PIL.pdf
 1234076 IMG_077x_IMcvt.pdf
  149782 IMG_077x_PIL+IMcvt.pdf

The 2 original JPGs are several MB each:

 2526685 IMG_0774.JPG
 2699515 IMG_0775.JPG

The 2 shrunken PNGs, written in Method #1 and used in Method #3, are only tens of KB each:

   67283 IMG_0775small.png
   61968 IMG_0774small.png

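Observations:

Method #1: Very good at shrinking the images, but very poor at producing the PDF, which is two orders of magnitude larger than it needs to be.
Method #2: The middle road and the most convenient, but the PDF is an order of magnitude larger than it needs to be.
Method #3: Requires Python, PIL, and convert. It is the least convenient but the most byte-efficient. The resulting PDF is only slightly larger than the two PNG images combined.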
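The observations can be put into numbers by dividing each PDF's size by the combined size of the two shrunken PNGs. This small sketch uses only the sizes listed above. Note that Method #2 embeds JPEG data derived from the original JPGs, so for that method the PNG total is only a yardstick, not the same content.

```python
# Quantify the observations: each PDF's size relative to the two shrunken PNGs.
png_total = 67283 + 61968  # IMG_0775small.png + IMG_0774small.png

pdf_sizes = {
    "Method #1 (PIL)": 12350481,
    "Method #2 (convert from JPG)": 1234076,
    "Method #3 (PIL PNGs + convert)": 149782,
}

for method, size in pdf_sizes.items():
    print(f"{method}: {size / png_total:.2f}x the PNG total")
```

The ratios come out at roughly 96×, 9.5× and 1.16×, in line with the "two orders of magnitude", "one order of magnitude" and "slightly larger" descriptions.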

Characteristics of the output PDF files:

page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
$pdfimages -list IMG_077x_PIL.pdf
   1     0 image    1512  2016  gray    1   1  image  no         1  0    72    72 6030K 1621%
   2     1 image    1512  2016  gray    1   1  image  no         4  0    72    72 6030K 1621%

$pdfimages -list IMG_077x_IMcvt.pdf
   1     0 image    2016  1512  gray    1   8  jpeg   no         8  0    72    72  576K  19% 
   2     1 image    2016  1512  gray    1   8  jpeg   no        22  0    72    72  626K  21% 

$pdfimages -list IMG_077x_PIL+IMcvt.pdf
   1     0 image    1512  2016  gray    1   8  image  no         8  0    72    72 68.9K 2.3% 
   2     1 image    1512  2016  gray    1   8  image  no        22  0    72    72 74.6K 2.5%

#!python

# jpg2pdf.py
#-----------
# Use PIL to subsample, rotate, and convert 2 JPGs to B&W.
# Save each to small PNGs.
# Combine both into a PDF.

import os
from PIL import Image

ims = [] # Stores the 2 images
fns=('IMG_0774.JPG','IMG_0775.JPG') # Filenames of the 2 images

for fn in fns:
   
   # Read, resize, rotate, convert to B&W, add to list `ims`
   im = Image.open(fn)
   im = im.resize((im.width//2, im.height//2))
   im = im.rotate(-90,expand=True)
   im = im.convert(mode="1", dither=Image.NONE)
   ims.append(im)

   # Write IMG_077[45]small.png
   fnBase = os.path.splitext(fn)[0]
   im.save( fnBase+'small.png' )

# Write both to a single PDF
ims[0].save( 'IMG_077x_PIL.pdf' , save_all=True , append_images=ims[1:] )

Test input file

This dummy JPEG image file can be saved as both IMG_0774.JPG and IMG_0775.JPG. Methods #1 to #3 should then work exactly as in the code posted above. The JPG is uploaded with this question.
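If the linked dummy file is unavailable, stand-ins with the same names can be generated. This is a minimal sketch, assuming Pillow is installed; the drawn content is hypothetical, and the default size of 4032×3024 is inferred from the 2016×1512 pages that convert produced at 50%. The helper name make_dummy_jpegs is made up.

```python
# Create two stand-in JPEGs named like the originals.
# Assumptions: Pillow is installed; content is hypothetical, size inferred from the
# 2016x1512 pages that `convert -sample 50%` produced.
from pathlib import Path

from PIL import Image, ImageDraw


def make_dummy_jpegs(directory: str = ".", size=(4032, 3024)) -> list:
    """Write IMG_0774.JPG and IMG_0775.JPG with dark bars on a light background."""
    paths = []
    for name in ("IMG_0774.JPG", "IMG_0775.JPG"):
        im = Image.new("RGB", size, (235, 235, 235))
        draw = ImageDraw.Draw(im)
        for y in range(100, size[1] - 100, 150):
            draw.rectangle((200, y, size[0] - 200, y + 40), fill=(20, 20, 20))
        path = Path(directory) / name
        im.save(path, quality=90)  # format inferred from the .JPG extension
        paths.append(path)
    return paths
```

Calling make_dummy_jpegs() in the working directory before running jpg2pdf.py gives Methods #1 to #3 something to chew on, although the sizes will of course differ from those reported above.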


Mark Setchell · 4 years ago

I think it is due to your image's mode, and PIL feeling constrained to keep it as a bi-level image.

#!python
from PIL import Image

im = Image.open('lorem.png')

# Check type of image - it is bi-level, i.e. mode=1
print(im)
# <PIL.PngImagePlugin.PngImageFile image mode=1 size=960x540 at 0x7F9E08A65100>

# Save and check size
im.save('lorem.pdf')
# -rw-r--r--     1 mark  staff  1050978 18 Apr 10:01 lorem.pdf   <--- YIKES

# Convert to RGB first, then save again
im = Image.open('lorem.png').convert('RGB')
im.save('lorem.pdf')
# -rw-r--r--     1 mark  staff    99556 18 Apr 10:02 lorem.pdf  <--- THAT'S BETTER

There's a clue in the documentation here, which says that how it is written depends on the mode and on whether a JPEG encoder is available.
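The same idea can be carried over to the two-page case from jpg2pdf.py: convert the bi-level pages to another mode just before saving, so Pillow does not take its mode "1" path. This is a minimal sketch, assuming Pillow with JPEG support; it uses "L" (8-bit grayscale) instead of the "RGB" shown above, on the assumption that a single gray channel is enough for black-and-white pages. Because Pillow typically writes "L" and "RGB" pages as JPEG, the result is lossy, and faint artifacts may appear around sharp black and white edges. The helper name save_pages_as_pdf is hypothetical.

```python
# Convert bi-level pages to 8-bit grayscale before writing a multi-page PDF.
# Assumptions: Pillow with JPEG support; "L" chosen over "RGB" for B&W content;
# the helper name is hypothetical.
from typing import List

from PIL import Image


def save_pages_as_pdf(pages: List[Image.Image], pdf_path: str, mode: str = "L") -> None:
    """Write `pages` to one multi-page PDF after converting each page to `mode`."""
    converted = [page.convert(mode) for page in pages]
    converted[0].save(pdf_path, save_all=True, append_images=converted[1:])


# Usage with the `ims` list built by jpg2pdf.py (mode "1" images):
# save_pages_as_pdf(ims, "IMG_077x_PIL_L.pdf")
```

The PNGs written by jpg2pdf.py can stay in mode "1", since PNG compresses bi-level data well. Only the images handed to the PDF writer need converting.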