How to Generate Printable Documents from Java Web Applications

Users of your Web applications can print the pages you generate, but webpages don’t usually look quite right on paper: they look like webpages in a browser rather than like real documents. There are many ways to control how information appears on the page. In this article, I show you a few methods, from using CSS to change the look of the content, to leveraging tools like Apache FOP and iText to generate PDF documents.

Before you start working on getting your webpage or document to print just right, make sure you have a way to view the output without wasting paper, because it often takes a few tries to get it right. For example, on my Mac, I can generate a PDF file from the regular print dialog. On Windows, this is a little harder, but the Chrome Web browser provides a way to export PDFs. Other PDF generators can also help you test your webpages in other browsers.

Media Queries

If you’re already familiar with HTML and CSS, which is likely if you’ve been writing Web applications, the easiest approach is to use a tool you already know. With CSS, you can specify different presentation rules for the document when it’s rendered on the screen (in a browser) and when it’s rendered in print. The technique is called media queries, because media queries encapsulate rules that are applied to the document only if the medium (screen, paper) matches the “query.” If you’ve done “responsive design,” you know exactly what I’m talking about.

You can add media blocks to an existing stylesheet and specify the print- and screen-specific CSS for your page in each block; or, if your stylesheet is already pretty complex, you can create a separate print stylesheet. When using media blocks, you declare each one with:

    @media print {
      /* print-specific directives go here */
    }

Then you put your style declarations inside that block. So, to hide the side navigation of your site, which is enclosed in a div with class sidenav, you can add this to your stylesheet:

    @media print {
      .sidenav {
        display: none
      }
    }

To use a separate stylesheet to specify the print layout, include it only for print media:

    <link rel="stylesheet" media="print" href="site-print.css">

CSS directives outside of a media block or in a file referenced without a media attribute are always applied, regardless of the display media.  This is also true if the media type requested is all. To indicate that a directive or stylesheet applies only when rendering the page in the browser window, use the media type screen.  The CSS specification includes a number of other media types such as braille, speech, and tv, but they are rarely used.

You can further refine when CSS directives should apply using the “feature” part of the media queries. These expressions cause the browser to obey the CSS directives therein when the expression evaluates to true; that is, when the feature requested is available. Here are a few examples:

    @media (orientation: portrait) {/*…*/}
    @media (orientation: landscape) {/*…*/}
    @media print and (min-height: 14in) {
        /* rules for “legal size” paper */
    }
    @media print and (max-resolution: 300dpi) {
        /* rules for lower resolution printers (or printer settings) */
    }
    @media print and (min-resolution: 1000dpcm) {
        /* higher resolution printers (1000dpcm is about 2540dpi) */
    }

Set up your print layout. Figure out which CSS rules are common to both screen and print and which apply to only one or the other. Use media blocks in your CSS files, or use multiple CSS files and only include the appropriate ones. That’s it. You’re done!

Well, almost.

Next, you test on the various browsers and platforms, of course. Depending on the complexity of your layout and the CSS involved, you may well find that the results are good enough. But in some circumstances you may want the printed output to be more predictable or more uniform. You may also want to generate the PDFs for your users, rather than have them print a webpage. In those cases, you have a few more tools you can use, namely FOP and iText.

Apache FOP

Apache FOP (Formatting Objects Processor) is a Java library you can use to generate PDF files from XML data. It can generate other file types too, but for printing, PDF is pretty much what users expect.

To get started with Apache FOP, first download the distribution and include fop.jar as well as all the dependent libraries packaged with it; or use Maven with groupId org.apache.xmlgraphics and artifactId fop.

Next, set up a bare-bones servlet (taken almost verbatim from the documentation on the Apache FOP site) that generates a PDF from a file called simple.fo. The servlet’s doGet method looks like this:

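    // fopFactory and tFactory are fields created once, e.g. in init():
    //   fopFactory = FopFactory.newInstance(new File(".").toURI()); // FOP 2.x
    //   tFactory = TransformerFactory.newInstance();
    response.setContentType("application/pdf");
    Fop fop =
      fopFactory.newFop(MimeConstants.MIME_PDF,
        response.getOutputStream());
    // Identity transformer: passes the XSL-FO through unchanged
    Transformer transformer = tFactory.newTransformer();
    InputStream in =
      getServletContext().getResourceAsStream("simple.fo");
    Source src = new StreamSource(in);
    // FOP consumes the SAX events and renders the PDF to the response
    Result res = new SAXResult(fop.getDefaultHandler());
    transformer.transform(src, res);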

We first set the response type to “application/pdf” to tell the browser what kind of content we are generating. The Transformer object is responsible for shuffling the data: it pulls it from the “simple.fo” file (through the InputStream wrapped in a StreamSource) and pushes it, as SAX events, to the handler provided by the Fop object. Think of its job as pulling data on one side and pushing it on the other. The Fop object handles what happens to the data once it is pushed: it lays out the pages and writes the resulting PDF to the output stream (response.getOutputStream()) that takes the data to the browser.
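To see that pull/push split without FOP on the classpath, the sketch below swaps in a hypothetical CountingHandler (a plain SAX DefaultHandler) where the servlet uses fop.getDefaultHandler(). It relies only on the JDK: an identity Transformer pulls a tiny inline XSL-FO string and pushes start-element events into the handler, which simply counts them instead of laying out pages. The class names and the sample input are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for fop.getDefaultHandler(): instead of laying out
// pages, it counts the start-element events the Transformer pushes to it.
final class CountingHandler extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum); // keyed by prefixed name, e.g. "fo:block"
    }
}

final class PullPushDemo {
    static Map<String, Integer> run(String xml) {
        try {
            // Identity transformer, like tFactory.newTransformer() in the servlet
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            CountingHandler handler = new CountingHandler();
            // Pull: parse the source. Push: deliver SAX events to the handler.
            transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
            return handler.counts;
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
            + "<fo:block>one</fo:block><fo:block>two</fo:block></fo:root>";
        System.out.println(run(fo));
    }
}
```

Swapping CountingHandler back for fop.getDefaultHandler() is all it takes to turn the same plumbing into PDF output.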

The hard part of working with Apache FOP is producing the “.fo” file that gives you a PDF that looks the way you want. The input file is written in XSL-FO, an XML vocabulary for describing page layout. There are a number of examples in the distribution, and I strongly recommend you start by modifying one of them. The XSL-FO namespace, http://www.w3.org/1999/XSL/Format, is usually bound to the prefix fo:.

A basic file has <fo:root> as its root element. Inside it, the <fo:layout-master-set> declares how the pages are laid out. This in turn contains one or more <fo:simple-page-master> elements. Each simple-page-master must be named with the master-name attribute, so you can refer to it from the content. In the page master, you set up the page size, orientation, margins, header, footer, and so on, defining how the page should look.

After the <fo:layout-master-set> comes the <fo:page-sequence>, which describes the actual content of the document (about time!). You can have one or more page-sequences, and each may refer to a different page master. This allows you to have, for example, a layout for the title page, a different layout for the regular pages, another layout for the table of contents, and one for the index. This is very useful if you’re writing a book. It’s a little cumbersome if you’re writing a “lost cat” flyer.

In your page-sequence, you have one or more <fo:flow> elements. A flow is a collection of content items: mostly <fo:block>s, which correspond to paragraphs, but also <fo:list-block>s, which represent lists (think of <ul>s and <ol>s), and tables (<fo:table>).
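Putting those pieces together, a minimal XSL-FO document might look like the string below. This is a sketch, not a file from the FOP distribution: the master name letter, the page dimensions, and the FoSkeleton class are illustrative assumptions. The page-sequence points at the page master through master-reference, and the flow targets the page body via the xsl-region-body flow name. The outline helper parses the text with the JDK’s namespace-aware DOM parser and lists element names in document order, which makes the nesting visible (and fails loudly if the markup is not well-formed).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class FoSkeleton {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // One US Letter page master, one page-sequence that refers to it by
    // name, one flow into the body region, one block (paragraph).
    static final String MINIMAL_FO =
        "<fo:root xmlns:fo=\"" + FO_NS + "\">"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"letter\""
      + " page-width=\"8.5in\" page-height=\"11in\" margin=\"1in\">"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"letter\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<fo:block font-size=\"12pt\">Hello, XSL-FO!</fo:block>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>";

    /** Parses the FO text and lists element local names in document order. */
    static List<String> outline(String fo) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(
                new ByteArrayInputStream(fo.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed XSL-FO", e);
        }
    }

    // Depth-first walk over element children only (text nodes are skipped)
    private static void collect(Element e, List<String> out) {
        out.add(e.getLocalName());
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                collect((Element) n, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(outline(MINIMAL_FO));
    }
}
```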

From there, you can include links, images, tables, and lists. You can decorate the text in any of those. For instance, you can change the font, alter the weight and size, pick a color, or strike through the text.

Apache FOP, clearly, is very powerful if you can handle XSL-FO. If the content to format comes to you as XML, but using a different schema or layout, you can use XSLT to restructure your document into something Apache FOP accepts. You only have to pass your XSLT source to the TransformerFactory when you create the Transformer above:

    // The stylesheet is applied before the result reaches FOP
    Source xsltSrc =
      new StreamSource(getServletContext()
        .getResourceAsStream("myTransform.xsl"));
    Transformer transformer = tFactory.newTransformer(xsltSrc);
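As a sketch of what such a stylesheet might contain, assuming a made-up input format of <note> elements holding <para> children, the template below wraps the whole note in a one-page XSL-FO skeleton and turns each <para> into an <fo:block>. The stylesheet, the NoteToFo class, and the input format are hypothetical. To stay runnable with just the JDK, the sketch writes the FO text to a StringWriter; in the servlet you would pass new SAXResult(fop.getDefaultHandler()) as the result instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NoteToFo {
    // Hypothetical stylesheet: <note><para>...</para></note> to XSL-FO,
    // one fo:block per para, inside a minimal page layout.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/note\">"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"page\"><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"page\"><fo:flow flow-name=\"xsl-region-body\">"
      + "<xsl:apply-templates select=\"para\"/>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match=\"para\">"
      + "<fo:block><xsl:value-of select=\".\"/></fo:block>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String toFo(String noteXml) {
        try {
            // Same call as tFactory.newTransformer(xsltSrc) in the servlet
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(noteXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo("<note><para>First paragraph</para><para>Second paragraph</para></note>"));
    }
}
```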

But you know the joke: Now you have two problems. This is particularly dangerous if you think that, well, HTML is just like XML, so it should be easy to translate HTML to XSL-FO. Remember that HTML does not need to be well-formed. Browsers happily accept HTML with unclosed tags and render it correctly. The HTML5 specification even spells out ways in which an HTML5 document need not be well-formed; for example, attribute values don’t have to be quoted if they contain no spaces (or a few other special characters such as quotes, =, <, and >).

You should therefore generate the XSL-FO directly, through your templating engine.
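No particular templating engine is assumed here; as a plainly swapped-in alternative, the sketch below uses the JDK’s StAX XMLStreamWriter. Whatever tool you use, the key requirement is the same: dynamic text must be escaped so the FO stays well-formed, and writeCharacters takes care of escaping & and < for you. The FoWriter class and its fixed one-page layout are assumptions for illustration.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class FoWriter {
    static final String FO = "http://www.w3.org/1999/XSL/Format";

    /** Writes a one-page-master FO document with one fo:block per paragraph. */
    static String render(List<String> paragraphs) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("fo", "root", FO);
            w.writeNamespace("fo", FO); // declare the prefix once, on the root
            w.writeStartElement("fo", "layout-master-set", FO);
            w.writeStartElement("fo", "simple-page-master", FO);
            w.writeAttribute("master-name", "page");
            w.writeEmptyElement("fo", "region-body", FO);
            w.writeEndElement(); // simple-page-master
            w.writeEndElement(); // layout-master-set
            w.writeStartElement("fo", "page-sequence", FO);
            w.writeAttribute("master-reference", "page");
            w.writeStartElement("fo", "flow", FO);
            w.writeAttribute("flow-name", "xsl-region-body");
            for (String p : paragraphs) {
                w.writeStartElement("fo", "block", FO);
                w.writeCharacters(p); // user text is escaped, never pasted raw
                w.writeEndElement();  // block
            }
            w.writeEndDocument(); // closes flow, page-sequence, and root
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

For example, FoWriter.render(Arrays.asList("Cats & dogs <3")) produces a block containing Cats &amp; dogs &lt;3, which FOP can parse safely.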

iText

The last tool I introduce in this article is iText, a library for generating PDF files programmatically, on the fly. It does not require an input document, and you can even draw with it. You have complete control over how each element of the page is rendered. The flip side is that using iText means you have to hand-render every element on the page.

iText is a free/open source software (FOSS) project covered by the GNU Affero General Public License, which means that you have to obtain a commercial license if you use it in a commercial product. If that puts you in a bind, older versions were released under the LGPL, which gives you more freedom in how you use them, but obviously not all features are available.

To get started, download iText from SourceForge and add the jar (or jars, if you want the added functionality they offer) to your classpath, or use Maven (see the documentation for a pom snippet).

You can start with a simple servlet that generates a PDF as follows:

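    // iText 5: com.itextpdf.text.{Document, PageSize, Paragraph}
    // and com.itextpdf.text.pdf.PdfWriter
    response.setContentType("application/pdf");
    Document doc = new Document(PageSize.LETTER);
    // Keep a reference to the writer; drawing on the page goes through it
    PdfWriter writer = PdfWriter.getInstance(doc, response.getOutputStream());

    doc.open();
    doc.add(new Paragraph("hello world!"));
    doc.close();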

The Document object, unsurprisingly, is your PDF document. By setting up the PdfWriter, you say where the document is to be written as you make various calls to write/draw on the document. Once the document is closed, you’re done, and the browser gets to render it.

For a typical document of text and images, you use Document.add() to add content. This requires the use of various Element implementations. A bit of text is a Chunk. One or more Chunks make up a Paragraph. You can then put Paragraphs in Chapters or Sections. You can also add Images and Lists. You control the appearance of the text by passing a Font object when you create a Chunk or Paragraph. The Font represents the font family, size, style, and color. So to display one word of a paragraph in red, you need to create a chunk for the content before that word (using a font with the base color), then a chunk for the highlighted word with a red font, and one more chunk for the rest.
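Deciding where each chunk starts and ends is plain string work. As a sketch, the hypothetical helper below (not part of iText) computes the three runs around the first occurrence of a word; the comments show, assuming the iText 5 API, roughly how each run would become a Chunk with its own Font.

```java
import java.util.Arrays;
import java.util.List;

final class Highlight {
    /**
     * Hypothetical helper: splits text into [before, word, after] around the
     * first occurrence of word. If the word is absent, everything is "before".
     */
    static List<String> splitAround(String text, String word) {
        int start = text.indexOf(word);
        if (start < 0) {
            return Arrays.asList(text, "", "");
        }
        int end = start + word.length();
        return Arrays.asList(text.substring(0, start), word, text.substring(end));
    }

    public static void main(String[] args) {
        List<String> parts = splitAround("the quick red fox", "red");
        // With iText 5 (com.itextpdf.text.*), roughly:
        //   Font red = new Font(Font.FontFamily.HELVETICA, 12, Font.NORMAL, BaseColor.RED);
        //   Paragraph p = new Paragraph();
        //   p.add(new Chunk(parts.get(0), normalFont));
        //   p.add(new Chunk(parts.get(1), red));
        //   p.add(new Chunk(parts.get(2), normalFont));
        System.out.println(parts);
    }
}
```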

Where iText really lets you go wild is drawing directly on your document. You get the drawing context from the PdfWriter with:

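    PdfContentByte cb = writer.getDirectContent();

And call any of the drawing methods on cb while the document is open (after doc.open() and before doc.close()):

    Rectangle pageSize = doc.getPageSize(); // 612 x 792 points for LETTER
    cb.setColorStroke(BaseColor.CYAN);
    cb.setColorFill(BaseColor.RED);
    cb.circle(300, 400, 100);        // center (300, 400), radius 100
    cb.ellipse(100, 100, 400, 600);  // inscribed in the box (100,100)-(400,600)
    cb.fillStroke();                 // paint the path: fill, then outline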

The coordinate system is set up so that the lower left corner is at (0, 0), and values increase toward the top and the right of the page. The units are “points,” that is, 1/72 of an inch. In my example, the page size of my “letter” document is 612 by 792 points. All values are floats.
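The arithmetic behind those numbers is simple: multiply inches by 72. Because the origin is at the bottom, a position measured down from the top edge (as you would on paper, or in CSS) has to be flipped by subtracting it from the page height. The PdfUnits helper below is a hypothetical sketch of both conversions, not part of iText.

```java
final class PdfUnits {
    static final float POINTS_PER_INCH = 72f;

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    /** Converts a distance measured down from the top edge to PDF's bottom-up y. */
    static float fromTop(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float width = inchesToPoints(8.5f);  // letter width
        float height = inchesToPoints(11f);  // letter height
        float margin = inchesToPoints(1f);
        System.out.println(width + " x " + height);
        // A point one inch in from the left and one inch below the top edge
        System.out.println(margin + ", " + fromTop(margin, height));
    }
}
```

On a letter page, for instance, a point one inch below the top edge sits at y = 792 - 72 = 720.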

Remember to set the color for your stroke (the virtual pen drawing the shapes and lines) and for the fill, if you want your circle, ellipse, or path created with moveTo()/lineTo()/closePath() to be filled. You must also finish your drawing with a painting call such as closePathFillStroke(), closePathStroke(), fillStroke(), or stroke(); otherwise, all your drawing efforts are ignored.

With iText, you can also modify existing PDFs, for example to fill out an existing form with dynamic data. You can insert pages into a document, or extract certain pages into a new one. If you have extensive document manipulation to perform, iText is your best approach.

How you go about supporting printing on your site and in your application depends on what needs to be printed or delivered as a printable document, and how much control you want or need over the result. While CSS media queries are the easiest, you have to contend with browser idiosyncrasies. Apache FOP gives you output that is consistent across platforms since it is generated on your server, and it works well for longer documents where you can dictate the layout at a higher level. iText gives you the most control over the complete presentation of the PDF, but requires more work on your part to style each element. It is often more suitable for shorter documents with various pieces that do not follow the usual flow of a page of text.

Did you find this How-To useful?  Need tips on something else?  Got an insatiable curiosity about other methodologies?  Leave us a comment and let us know.

About the author
Nancy Deschênes (@ndeschenes) has been developing for the Web for more than 15 years. In that time, she has worn many hats, acting at times as a front-end developer, database specialist, and (her favorite) application architect. She has used various technologies, mostly Java and the Spring MVC framework, but has recently spent most of her time using the Grails framework. She is the technical co-founder of myTurn.com, a platform for online rental of physical goods.

Comments

  1. SillentTroll says:

    I would like to recommend a cool library, flying-saucer (https://code.google.com/p/flying-saucer/). It can generate PDFs from XML/XHTML using iText. I have used it in a big project, and it saved us a lot of time!
