PDFasStandardPrintJobFormat

Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/pdf-as-standard-print-job-format
Created: 2008-05-14 by Till Kamppeter https://launchpad.net/~till-kamppeter
Contributors:
Packages affected: CUPS, CUPS PDF filters, foomatic-filters, GNOME/GTK and KDE/Qt libraries, OpenOffice.org, Thunderbird, other desktop applications

Summary

One of the decisions made at the OSDL Printing Summit in Atlanta in 2006, and widely accepted by all participants, was to switch the standard print job transfer format from PostScript to PDF. PDF has many important advantages, especially:

PDF is the common platform-independent web format for printable documents
Portable
Easy post-processing (N-up, booklets, scaling, ...)
Easy color management support
Easy support for high color depths (> 8 bits/channel)
Easy transparency support
Smaller files
Linux workflow gets closer to Mac OS X

Most important here is the post-processing. In contrast to PostScript, every PDF file makes it easy to tell which part of the data belongs to which page. So one can easily take the pages apart and do things like printing selected pages, printing 2, 4, ... pages per sheet, printing even/odd pages for manual duplex, scaling, ... PostScript files must be strictly DSC-conforming to allow this kind of page management. By using PDF we ensure that page management always works.
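The dependence on DSC comments can be made concrete with a small sketch. It is not the code of pstops; it only illustrates the idea that page boundaries in PostScript are visible solely through "%%Page:" comments, so a file without them cannot be taken apart. The function name split_dsc_pages is an assumption made for this illustration, and real DSC parsing has to handle many more cases (prolog, setup, embedded documents).

```python
import re

# A DSC page marker looks like "%%Page: <label> <ordinal>".
PAGE_RE = re.compile(r"^%%Page:\s*(.+?)\s+(\d+)\s*$")


def split_dsc_pages(ps_text):
    """Split PostScript text into pages using DSC comments only.

    Returns a list of (label, ordinal, body) tuples. Raises ValueError
    when the file gives no usable page structure, which is the situation
    in which PostScript-based page management breaks down.
    """
    if not ps_text.startswith("%!PS-Adobe-"):
        raise ValueError("no DSC header, page boundaries are unknown")

    pages = []
    current = None
    for line in ps_text.splitlines():
        match = PAGE_RE.match(line)
        if match:
            # A new page starts here.
            current = [match.group(1), int(match.group(2)), []]
            pages.append(current)
        elif line.startswith("%%Trailer"):
            # Everything after the trailer belongs to no page.
            current = None
        elif current is not None:
            current[2].append(line)

    if not pages:
        raise ValueError("no %%Page: comments, pages cannot be separated")
    return [(label, ordinal, "\n".join(body)) for label, ordinal, body in pages]
```

With a conforming file the pages come back as separate chunks which can be selected or reordered; with a non-conforming file the function can only give up, whereas a PDF file always carries its page structure.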

PDF as standard print job format is one of the main projects of OpenPrinting at the Linux Foundation, coordinated by Till Kamppeter, manager of OpenPrinting.

Rationale

Many users report problems with post-processing options like 2, 4, ... pages per sheet, booklets, or printing only selected pages. The problem is that this kind of post-processing does not work on PostScript as most applications produce it, which leads to these complaints. Also, advanced graphical techniques used in documents, like transparency or high color depths, are much better supported by PDF. This leads to smaller job files and faster rendering. The overall printing experience will improve a lot.

Use cases

The LaserStar QuickPrint G3000 can print booklets by stitching the sheets in the middle and folding them, but the printer does not rearrange the pages appropriately itself. To print an 8-page booklet correctly, the pages have to be ordered as 8, 1, 2, 7, 6, 3, 4, 5 and then printed 2 pages per sheet with duplex onto two sheets. Lea uses a tool for rearranging PostScript files, but what comes out of the 10,000-dollar printer are two blank sheets, nicely stitched together and folded. She complains to the support line of the big Japanese manufacturer and is told that many users have already called because of this. The manufacturer sees the high number of support calls from Linux and Unix users as a big problem and is thinking about dropping Linux support.
Paul is a student and does not have much money, so he buys a printer for 30 EUR in the supermarket. Naturally, you do not get a duplex unit for that price, but you can print duplex manually by printing the even pages first, turning the stack of paper over and then printing the odd pages. CUPS even has options for that. On the last night before the deadline for his seminar report he sends it to the printer, but instead of only the even pages the whole document comes out, every page on the front side of a sheet. And the university only accepts double-sided prints of the reports. It takes him the whole night to find a trick involving many file conversions, back and forth, and 60 EUR of paper and ink to get the report printed correctly. Why did he not buy the duplex-capable printer for 90 EUR?
Angela makes artwork for marketing with Scribus and Inkscape. These sophisticated applications use advanced graphical techniques like transparency and 16-bit color depth. She sends the graphics to a color laser printer (20 pages per minute). The computer needs 30 minutes at 100% CPU load to render one page and then the printer prints it in 3 seconds. Exporting the page to a PDF file and sending the file to a native PDF printer does the job in 5 seconds.
George has a network of Linux and Mac OS X boxes. In the beginning he did not manage to print from Linux on the Macs. This seemed strange to him, as everything uses CUPS. He found out on the CUPS mailing list that the Macs did not understand PostScript, as they already use PDF as their print job format. So he had to install Ghostscript on the Macs in order to print from his Linux applications.
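The page arithmetic behind the first two use cases is simple once the pages can be addressed individually. The following sketch reproduces the booklet order quoted above and the two passes of manual duplex. The function names are assumptions made for this illustration, not part of any CUPS filter, and how the stack has to be turned between the passes depends on the printer.

```python
def booklet_order(num_pages):
    """Page order for a stitched booklet printed 2 pages per sheet, duplex.

    The page count is padded to a multiple of 4; padding pages are
    returned as None and stand for blank pages.
    """
    total = -(-num_pages // 4) * 4  # round up to a multiple of 4
    pages = [p if p <= num_pages else None for p in range(1, total + 1)]

    order = []
    low, high = 0, total - 1
    while low < high:
        # One sheet: the front side takes (last, first),
        # the back side takes (first + 1, last - 1).
        order += [pages[high], pages[low], pages[low + 1], pages[high - 1]]
        low += 2
        high -= 2
    return order


def manual_duplex_passes(num_pages):
    """Return (even_pages, odd_pages): even pages are printed first,
    then the stack is turned over and the odd pages are printed."""
    even_pages = list(range(2, num_pages + 1, 2))
    odd_pages = list(range(1, num_pages + 1, 2))
    return even_pages, odd_pages


print(booklet_order(8))         # [8, 1, 2, 7, 6, 3, 4, 5]
print(manual_duplex_passes(6))  # ([2, 4, 6], [1, 3, 5])
```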

Scope

All POSIX-style operating systems using the CUPS printing system.

Design

On the server (printing system) side we make use of the configurable filter system of CUPS. We add filters for a PDF workflow to the filter collection of CUPS (/usr/lib/cups/filter/) and define additional file conversion rules (/etc/cups/*.convs) so that the filters get used. In the file conversion rules we give priority to filter paths which process the job as a PDF data stream. Currently the processing and page management is done by the pstops filter, on PostScript data. We will use the new pdftopdf filter instead, which does the page management and other processing steps on PDF data.

Examples of filter chains executed by CUPS:

OLD: JPG --imagetops-> PS --pstops-> PS (processed) --pstoraster-> Raster --rastertohp-> PCL

NEW: JPG --imagetopdf-> PDF --pdftopdf-> PDF (processed) --pdftoraster-> ...

OLD: PS --pstops-> PS (processed) (for PostScript printer)

NEW: PS --pstopdf-> PDF --pdftopdf-> PDF (processed) --pdftops-> PS (for PostScript printer)

At first sight the new PDF workflow looks much more awkward in the second example. But imagine the incoming PostScript is not DSC-conforming. Then the page management steps done by pstops will break and the printout will not be satisfactory. The new chain converts the document temporarily into PDF, so that the page management is done on PDF data (pdftopdf). This way the page management will always work correctly. And in the future, when applications send PDF, the second example will be:

OLD: PDF --pdftops-> PS --pstops-> PS (processed) (for PostScript printer)

NEW: PDF --pdftopdf-> PDF (processed) --pdftops-> PS (for PostScript printer)
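How conversion rules can give priority to one chain over another may be sketched as a shortest-path search: every rule carries a relative cost, and the chain with the lowest total cost is used. The sketch below is a simplified model, not the CUPS implementation. The format labels are the ones used in the examples above, and all cost values are hypothetical, chosen only so that the PDF chains come out cheaper.

```python
import heapq

# (source format, destination format, cost, filter). Costs are invented.
OLD_RULES = [
    ("JPG", "PS", 50, "imagetops"),
    ("PDF", "PS", 50, "pdftops"),
    ("PS", "PS-processed", 50, "pstops"),
]
NEW_RULES = [
    ("JPG", "PDF", 10, "imagetopdf"),
    ("PS", "PDF", 10, "pstopdf"),
    ("PDF", "PDF-processed", 10, "pdftopdf"),
    ("PDF-processed", "PS-processed", 10, "pdftops"),
]


def cheapest_chain(source, destination, rules):
    """Return (total_cost, [filters]) for the cheapest chain, or None."""
    graph = {}
    for src, dst, cost, filter_name in rules:
        graph.setdefault(src, []).append((dst, cost, filter_name))

    queue = [(0, source, [])]
    seen = set()
    while queue:
        total, fmt, chain = heapq.heappop(queue)
        if fmt == destination:
            return total, chain
        if fmt in seen:
            continue
        seen.add(fmt)
        for dst, cost, filter_name in graph.get(fmt, []):
            if dst not in seen:
                heapq.heappush(queue, (total + cost, dst, chain + [filter_name]))
    return None


# With both rule sets installed the PDF chain wins ...
print(cheapest_chain("PS", "PS-processed", OLD_RULES + NEW_RULES))
# ... and with only the old rules the PostScript chain is still found.
print(cheapest_chain("PS", "PS-processed", OLD_RULES))
```

Because the old rules stay in the table, the search still returns the pstops chain whenever the new rules are absent, which is what makes the fallback to the PostScript workflow possible.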

On the client (application) side we will let the applications generate PDF instead of PostScript when the user prints a document. For KDE and GNOME applications probably only some libraries (those which provide the printing functionality) need to be modified. OpenOffice.org already has a well-working "Export to PDF" function. Code from this function can also be used for printing.

Implementation

On the server side most of the implementation is already done and will make it into Intrepid soon. The PDF-capable foomatic-rip 4.0 is already uploaded. The Japanese OpenPrinting workgroup will package their PDF filters in the next few days. So the PDF workflow will soon become reality in Intrepid and will have several months of testing before Intrepid is released. The still-missing CUPS filters texttopdf and pdftoijs are under development by Tobias Hoffmann as a Google Summer of Code project. The Google Summer of Code will end in time for the feature freeze of Intrepid, so that the new filters will be included.

Due to the modular filter system of CUPS the CUPS daemon itself does not need to be modified. All filters are separate pieces of code in /usr/lib/cups/filter/. The filter chains which convert the many file formats into the format the printer needs are determined by the file type definitions and file conversion rules in the /etc/cups/*.types and /etc/cups/*.convs files. Making PDF the standard print job format is therefore possible by simply adding filters (pdftopdf, pdfto..., ...topdf) and file conversion rules (to give priority to workflows which use the pdftopdf filter instead of the pstops filter). As none of the original filters and conversion rules are removed, fallback to the old PostScript workflow (via pstops) is always possible. Backward compatibility with applications which emit print jobs in PostScript is provided by a pstopdf filter.

The filters to be added (via a separate package, as long as they are not taken into upstream CUPS) will be: pdftopdf, imagetopdf, texttopdf, pstopdf, pdftoraster, pdftoijs, pdftoopvp. op-pdf.types and op-pdf.convs files will also be added, with file detection and file conversion rules which prioritize filter chains going through pdftopdf over filter chains going through pstops. If the new filters become part of upstream CUPS, these rules will be incorporated into the CUPS files (mime.types, mime.convs).
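A conversion rule in the CUPS *.convs format is a single line with four fields: source MIME type, destination MIME type, relative cost and filter name. The sketch below parses such lines so that the shape of a rules file becomes visible. The sample rules, their MIME type names and their cost values are hypothetical and do not show the contents of the actual files shipped with the filters; parse_convs and ConvRule are names invented for this illustration.

```python
from typing import NamedTuple


class ConvRule(NamedTuple):
    source: str       # MIME type the filter reads
    destination: str  # MIME type the filter writes
    cost: int         # relative cost, lower totals are preferred
    program: str      # filter name


def parse_convs(text):
    """Parse conversion rules, skipping blank lines and '#' comment lines."""
    rules = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError("malformed rule: " + raw_line)
        source, destination, cost, program = fields
        rules.append(ConvRule(source, destination, int(cost), program))
    return rules


# Hypothetical rules, for illustration only.
SAMPLE = """
# source                destination               cost  filter
application/postscript  application/pdf           10    pstopdf
image/jpeg              application/pdf           10    imagetopdf
application/pdf         application/vnd.cups-pdf  10    pdftopdf
"""

for rule in parse_convs(SAMPLE):
    print(rule.source, "->", rule.destination, "via", rule.program)
```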

Also foomatic-rip (foomatic-filters package) needs to be made PDF-aware.

As Ghostscript also understands PDF, it can be used as the renderer for all drivers for which it was used before.

On the client side we must make the applications emit PDF instead of PostScript. Completing this is not urgent for Intrepid, as CUPS can always convert PostScript to PDF with the pstopdf filter, but having the applications send PDF already will improve the rendering capabilities of the applications a lot.

This is the part of this blueprint which will require much more work. Depending on the internal architecture of the applications, perhaps most of it gets done by switching the printing libraries of KDE/Qt and GNOME/GTK to PDF output when the "Print" command is issued. According to Thomas Zander (Qt) and Lars Uebernickel (Common Printing Dialog API) one only needs to change an option setting in these libraries to make them print in PDF.

Some applications need to be treated individually, like OpenOffice.org or Thunderbird. Implementation will get easier for applications with "Export to PDF" functionality, as then there is already the code to generate a PDF from the document.

Patches to applications and libraries in the Ubuntu distribution should always be considered a temporary solution. Therefore we must stay in contact with the upstream developers so that they implement the changes. We could use backports of these patches in Intrepid in case the new versions of the applications and libraries do not make it into Intrepid before Feature Freeze.

Code

On the server side no existing code needs to be modified. Everything gets done by adding filters and filter rules as shown above. The PDF-capable foomatic-rip is already uploaded into Intrepid. Most of the CUPS filters are ready to use on the OpenPrinting site at sourceforge.jp. At the last OpenPrinting Steering Committee phone meeting the developers in Japan promised to package these filters soon for various distros including Ubuntu. The texttopdf and pdftoijs filters are currently under development by a Google Summer of Code student. These are also hosted at sourceforge.jp, so they will appear in a later version of the filter package.

On the client side modifications of applications and GUI libraries are needed. These changes need to be done in cooperation with upstream. For KDE/Qt and GNOME/GTK only simple option settings need to be changed in the library packages.

The current Intrepid still uses the PostScript printing workflow. The actual switchover will happen by adding the /etc/cups/*.convs file(s) with the new file conversion rules.

Data preservation and migration

Servers and clients can be switched over to the PDF printing workflow independently. If applications still produce PostScript, a pstopdf filter will do the first step of converting the incoming PostScript to PDF. If PDF-producing applications print on a server which is still PostScript-based, the pdftops filter will kick in to convert the job to PostScript. As the PostScript is then generated by CUPS from a PDF source, it will always be DSC-conforming, so page management with pstops will also work in this case.
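The four combinations of client and server can be written down as a small table. The sketch below is only a model of the behavior described above, with invented names (page_management_chain, "ps" and "pdf" as labels); it lists the filters up to and including the page management step.

```python
def page_management_chain(job_format, server_workflow):
    """Filters run up to and including page management.

    job_format: "ps" or "pdf", what the application sends.
    server_workflow: "ps" or "pdf", what the server processes internally.
    """
    chains = {
        ("ps", "ps"): ["pstops"],               # old client, old server
        ("ps", "pdf"): ["pstopdf", "pdftopdf"],  # old client, new server
        ("pdf", "ps"): ["pdftops", "pstops"],    # new client, old server
        ("pdf", "pdf"): ["pdftopdf"],            # new client, new server
    }
    try:
        return chains[(job_format, server_workflow)]
    except KeyError:
        raise ValueError("unknown combination: %r, %r" % (job_format, server_workflow))


for job in ("ps", "pdf"):
    for server in ("ps", "pdf"):
        print(job, "job on", server, "server:", page_management_chain(job, server))
```

Only the first combination hands PostScript written directly by an application to pstops; in the other three the page management sees either PDF or PostScript generated by CUPS from PDF.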

This means that the server side and the client side can be developed and released completely independently without any loss of printing functionality. For some of the advantages of using PDF as standard print job format (like page manipulation/selection/reordering) it is enough to implement it on one side, the server or the client.

Outstanding issues

BoF agenda and discussion

At the BoF the PDF workflow will be presented to application developers and maintainers, and the conversion of the applications to output print jobs as PDF will be coordinated.

Further discussion

Acknowledgments

We especially thank HP and Konica Minolta for the financial support of OpenPrinting. We thank Google for funding the student project.

CategorySpec