PDF Version and file size

The PDF format has evolved over the years as Adobe have released new versions of their Acrobat and Reader software. New ideas have been added to the file format, and as a result there are different versions of the PDF format. If you take a look at a PDF in Adobe Reader, you can see which version the file is using in the Document Properties information. Of course, files using the newer versions of the PDF format need a suitable viewer, be that Reader or something else.

This matters to TeX users because, for many of them, PDF is the target format, produced either directly or via DVI files. Tools such as pdfTeX are not tied to one version of the PDF specification. For example, when creating a PDF directly with pdfTeX, the \pdfminorversion primitive sets the minor part of the version number, so values of 3, 4 or 5 give PDF 1.3, 1.4 or 1.5.

Why would you want to do this? Well, obviously the newer versions bring new features. A particularly significant one is the compression of non-stream objects. The detail of these objects is not really important, but they relate to items such as links within documents. Version 1.5 of the PDF specification allows these to be compressed, which can make quite a difference to the resulting file size. For example, I did a trial run with the siunitx manual, and by adding the lines

\pdfminorversion=5
\pdfobjcompresslevel=2

to the preamble, I reduced the file size from around 700 KiB to around 550 KiB, a saving of roughly 20 %.
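If the same source might also be compiled with an engine that does not provide these pdfTeX primitives (XeTeX, for example), setting them unconditionally will cause an error. The following is a minimal sketch of a guarded setup, assuming a reasonably recent iftex package that provides the \ifpdftex switch. The settings go before \documentclass so that they are in place before anything is typeset or written to the PDF.

```latex
% Minimal sketch: enable PDF 1.5 object compression only under pdfTeX.
% Assumption: a reasonably recent iftex package providing \ifpdftex.
\RequirePackage{iftex}
\ifpdftex
  % Minor version 5 means PDF 1.5, which allows object streams
  \pdfminorversion=5
  % Collect non-stream objects (items such as links) into compressed object streams
  \pdfobjcompresslevel=2
\fi
\documentclass{article}
\begin{document}
Hello, compressed world.
\end{document}
```

With pdfLaTeX this requests a PDF 1.5 file with object compression; under XeLaTeX the guarded block is skipped and the engine's own defaults apply.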

There is some discussion going on at the moment on the TeX Live mailing list about possibly changing the default PDF version produced by tools such as pdfLaTeX, XeLaTeX, etc. The current standard setting is version 1.4, which produces larger files but has the advantage of being readable by a wider range of viewers. On the other hand, PDF version 1.5 was first released in 2003, and it is well supported by most of the well-known readers. As long as switching to version 1.5 also enables the compression, this looks like a good idea: just moving to version 1.5 without using the features it makes available seems a bit odd to me.

There are times when you need to use PDF version 1.4 (for example for archive-type PDFs), but in those cases you also need to check other features of the PDF. So I feel that the change is a good idea, provided there is an easy way to set the version to something else.
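With pdfTeX in direct PDF mode, going back to version 1.4 uses the same two primitives. This is a minimal sketch, assuming pdfLaTeX: a minor version of 4 asks for PDF 1.4, and an object compression level of 0 turns off object streams, which are a PDF 1.5 feature. It only sets the version; it does not by itself make a file suitable for archiving.

```latex
% Minimal sketch: request a PDF 1.4 file from pdfTeX (direct PDF mode).
% Minor version 4 means PDF 1.4
\pdfminorversion=4
% Object streams need PDF 1.5, so switch them off explicitly
\pdfobjcompresslevel=0
\documentclass{article}
\begin{document}
A PDF 1.4 document.
\end{document}
```

Ordinary stream compression, controlled separately by \pdfcompresslevel, is unaffected by this and is allowed in PDF 1.4.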

9 thoughts on “PDF Version and file size”

  1. Hello Sivaram,

    These are both pdfTeX primitives, and they need to be set before anything is typeset. So they go in the preamble of a LaTeX file, or “near the beginning” of a plain TeX file. Something like

    \documentclass{article}
    \pdfminorversion=5
    \pdfobjcompresslevel=2
     ...
    

    Joseph

  2. In ConTeXt, I have noticed that files produced by MkIV are generally significantly smaller than those produced by MkII. For example, in a sample document (with no hyperlinks and only embedded MetaPost figures), the file sizes produced by MkII are 563 kB (PDF 1.4) and 356 kB (PDF 1.5), while those produced by MkIV are 258 kB (PDF 1.4) and 243 kB (PDF 1.5). I don’t understand the details, but the smaller size with LuaTeX is supposed to be due to better font handling. Interestingly, in MkIV the file size does not change significantly between PDF 1.4 and 1.5. (Agreed, this is just one sample.)

  3. Hello Aditya,

    Interesting: I guess this might be due to the graphics (I’ve not tried any graphics-heavy files). On the other hand, you may well be right about font inclusion, which I suppose may come down to how the format handles fonts.

    Joseph

  4. Hello Raúl,

    There are a few different takes on the PDF format that have been formalised as standards. For example, PDF/A has been defined as an archival storage format for digital documents, while things like PDF/X-1 and PDF/X-3 are favoured by some printers. In all cases, the key point about a standard is that it lays down what is (and is not) acceptable to meet it. This of course depends both on the aim of the standard and when it was conceived.

    In the case of PDF/A (strictly, PDF/A-1), the standard is based on version 1.4 of the PDF format, so PDF/A is essentially a subset of PDF 1.4. Various things that are allowed in PDF 1.4 are not acceptable in PDF/A, whereas any valid PDF/A file will also be a valid PDF 1.4 file.

    Similar considerations apply to PDF/X-1, PDF/X-3 and so on. There are of course newer standards, which are subsets of later versions of the PDF file format. The crucial point about PDF/A is that it has been carefully designed and agreed to work well as an archive format. That does not mean other PDFs are suddenly going to fail, just that if you are looking at preserving digital information for the long term (decades), you need to make sure you can still read it in many years’ time.

    Joseph

  5. Hi,
    I’ve been reading this discussion with interest. However, I don’t understand how pdfTeX produces the PDF file. I have TeX Live 2008 and WinEdt. LaTeXing the document uses the following:

    pdfTeXk, Version 3.1415926-1.40.9 (Web2C 7.5.7)

    and then I get a DVI file. To generate the PDF I go through the tex–dvi–ps–pdf route, and it is Ghostscript that decides the PDF version.

    So how does one use pdfTeX to arrive at a PDF without going through a PS-to-PDF converter such as Ghostscript?
    I presume it is not a matter of the dvi2pdf command.

    Your help would be highly appreciated.

    YC

  6. Hello YC,

    The key thing to remember here is that pdfTeX (the program) is used by modern TeX installations for both DVI and direct PDF output. The systems are set up so that “latex” calls pdfTeX in DVI mode, while “pdflatex” calls pdfTeX in PDF mode (in both cases with the LaTeX format loaded). The primitives under discussion here only work in direct PDF mode (although you can do the same thing at the ps2pdf stage if using the traditional DVI route).

    In WinEdt, there are two typesetting buttons (both for TeX and for LaTeX), one for DVI output and one for direct PDF output. So it’s a question of picking the right button to press!

  7. Hi,
    Thanks for the clarifications. I was able to change my PDF TeXify method to Default (PDF TeXify) under Options > Execution Modes > TeX Options in WinEdt, and it works now. As a quick check, I set \pdfminorversion=3 and then confirmed in Acrobat that the correct PDF version was produced.

    But now pdfTeX does not handle the EPS files in my LaTeX document. So I guess I now have to use a package like epstopdf to convert all the EPS files to PDF. Am I right?

    You mentioned: “although you can do the same thing at the ps2pdf stage if using the traditional DVI route”. I used ps2pdf because I can add many options, such as embedding the fonts or setting the PDF type to prepress, screen, etc. I read somewhere that using -dUseFlatCompression=true in ps2pdf can halve the size of the resulting PDF document! Do you think this is in some way equivalent to \pdfobjcompresslevel=2? If so, I’d prefer to stick to the traditional DVI route with the EPS files.

    Many thanks again for your kind help.
