<<

Document Handling

cfpdf

By Tim Cunningham (Bio)
gist

Whereas cfdocument is used to create PDFs, the cfpdf tag is used to manipulate existing PDFs. With cfpdf, you can read an existing PDF, write meta-data to it, merge PDFs together, delete pages, create thumbnails of the pages, extract text & images, add or remove watermarks, manipulate headers & footers, create PDF portfolios, and deal with PDF passwords, permissions and Encryption.

Reading a PDF

The cfpdf tag accepts an action property, which currently has 18 possible values. Choosing the 'read' action will take an existing PDF, read it to memory, and store it in a variable name of your choosing.

<cfpdf action="read" name="myDoc" source="C:\docs\mypdf.pdf" />
<cfdump var="#myDoc#" />

The above code will dump out the metadata for the chosen PDF such as the author, date created, keywords, etc. This is the same information you would receive if you used the action="getinfo" argument; however, with the getInfo action, you can access the values of resulting operation. If you wanted to display the PDF to the browser, the following code could be used:

<cfpdf action="read" name="myDoc" source="C:\docs\mypdf.pdf" />
<cfcontent variable="#toBinary(myDoc)#" type="application/pdf" />

Creating Thumbnails of PDF Pages

<cfpdf action="read" name="myDoc" source="C:\docs\mypdf.pdf" />
<cfpdf action="thumbnail" source="myDoc" destination="C:\docs\" overwrite="yes" />
<cfimage action="read" name="img" source="C:\docs\thumbnail_page_1.jpg" format="jpg" />
<cfcontent reset="true" variable="#imageGetBlob(img)#" type="image/jpeg" />

The above code will read a PDF and create 25% scale JPG thumbnails for all the pages in the C:\Docs folder. The default naming convention in CF 10 is thumbnail_page_1, thumbnail_page_2, and so on. Format can be set to JPG, TIFF, or PNG. The thumbnail action also has other arguments that work along with it to determine the scale, max breadth, resolution, and naming scheme for the generated thumbnail.

Extracting Images

Most images embedded in a PDF can be extracted and saved to a folder of your choice using a file prefix of your choice. By default, the file prefix is "cfimage-" and the image number. The default destination is in the same folder as the ColdFusion page calling the cfpdf tag.

<cfpdf action="extractimage" source="./mypdf.pdf"  overwrite="yes" /> 
<img src="cfimage-0.jpg" />
<br />
<img src="cfimage-1.jpg" />

Extracting Text

If you desire to extract the text of a PDF, such as if you wanted to search and catalog words used in the PDF, you can do so with the extractText action. The extractText action will return a XML document; each page of text is in its own XML node name with a corresponding page number. You can then use ColdFusion's powerful XML parsing tags to interact with the XML document. The code below will output the XML from a PDF to the browser for you.

<cfpdf action="extracttext" source="./mypdf.pdf" name="myXML" />
<cfcontent type="text/xml" />
<cfoutput>#myXML#</cfoutput>

Merging PDF Documents

Merging Using a List

There are a variety of ways ColdFusion can merge different PDF documents together. The simplest is to pass a comma delimited list of PDF files, which will append them in the order you list them.

<cfpdf action="merge" source="mypdf.pdf,beer.pdf" destination="mergedPDF.pdf" overwrite="yes" />
<cfpdf action="read" name="myPDF" source="mergedPDF.pdf" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

Merging using cfpdfparam

You can also use cfpdfparam tags nested within an opening and closing cfpdf tag to more accurtely control the final PDF. The cfpdfparam tag accepts source, pages, and password arguments. The pages attribute can choose what page(s) you want to merge into the final document. You can choose one page or a list of page numbers. Password is used for password protected PDFs. The code below will combine two PDFs: mypdf.pdf and beer.pdf. However, rather than appending it, we will take the first page of mypdf.pdf, then all of beer.pdf, and finally the second page of mypdf.pdf.

<cfpdf action="merge"  destination="mergedPDF.pdf" overwrite="yes">
    <cfpdfparam source="myPDF.pdf" pages="1" password="test" />
    <cfpdfparam source="beer.pdf" />
    <cfpdfparam source="myPDF.pdf" pages="2" password="test" />
</cfpdf>
<cfpdf action="read" name="myPDF" source="mergedPDF.pdf" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

Merging a Directory of PDFs

If you use the directory argument on a cfpdf merge action, you can specify a folder that contains multiple PDFs and merge them into one PDF. The order strategy is either by name (ascending alphabetical) or time (ascending by file time stamp). Each file in the directory is evaluated if it is a valid PDF readable by ColdFusion; if you have stoponerror="yes", then the cfpdf tag will error if any of the files in the directory are not valid PDFs. If you have stoponerror="no", then any non-PDF files will be skipped.

<cfpdf action="merge"  directory="./" orde ascending="yes" stoponerror="false" overwrite="yes" destination="mergedPDf.pdf" />
<cfpdf action="read" name="myPDF" source="mergedPDF.pdf" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

Deleting Pages

You can remove a single page or a range of pages using the action="delete" on the cfpdf tag. Specifying a destination is optional for delete actions. If you do not specify a destination, the page or pages will be deleted from the original source PDF.

<cfpdf action="deletepages" pages="2-3" source="mergedPDF.pdf" />
<cfpdf action="read" name="myPDF" source="mergedPDF.pdf" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

Creating and Removing Watermarks

To add a watermark to a PDF, you can either use an existing watermark from an image or another PDF. If you specify a destination, a new PDF will be created with the watermark; otherwise the watermark will be added to the source PDF. By default, the watermark is placed in the center of the page.

<cfpdf action="addwatermark" source="mypdf.pdf"  image="draft.png" foreground="yes" overwrite="yes" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

To modify the positioning of the watermark, you can use the rotation and position arguments. Rotation is the number of degrees the watermark image will be turned. The position argument takes Cartesian coordinates, with the top-left corner being the 0,0 position. The code below will put a watermark at a 45 degree angle in the top left corner of the PDF:

<cfpdf action="addwatermark" source="mypdf.pdf"  image="draft.png" foreground="yes" overwrite="yes" name="mypdf" rotation="45" position="0,700" opacity="2" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />	

Removing a watermark is very simple:

<cfpdf action="removewatermark" source="mypdf.pdf" />	 
<cfpdf action="read" name="mypdf" source="mypdf.pdf" />	
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />	

Optimizing PDFs

PDFs can accumulate a lot of baggage. To help slim down the file size and speed up the rendering of a PDF, you can use the optimize action on the cfpdf tag. The result can be saved to a new file, the original source, or a variable. You must also supply the algorithm used to compress the PDF; your choices are Nearest_Neighbour (notice the non-American spelling), bicubic, bilinear.

  • Nearest_Neighbour (default): Lossy image quality, fast processing.
  • bicubic: High quality image, slowest processing.
  • bilinear: Mid-range image quality, mid-range processing speed.

The above guidelines are very general as both the resultant compression is highly dependent on the actual content of your PDF. In the example PDF built for this chapter, its original PDF was 221 KB. Nearest_Neighbour took it down to 49 KB, bicubic to 47 KB, and bilinear to 48 KB.

<cfpdf action="optimize" source="mypdf.pdf"  overwrite="yes" algo="bilinear" />
<cfpdf action="read" name="mypdf" source="mypdf.pdf" /> 
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

There are also 12 other arguments which can be passed to cfpdf to remove other PDF features, such as attachments, bookmarks, font styling, javascript, etc.

Headers and Footers

cfpdf also has actions called addHeader and addFooter which will add a header and a footer to an existing PDF. The header or footer can be simple text or an image.

<cfpdf action="addheader" source="mypdf.pdf" name="mypdf" text="Header Here" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />

Using the text argument, you have access to four dynamic options: _LASTPAGELEBEL, _LASTPAGENUMBER, _PAGELABEL, and _PAGENUMBER. For example, if you wanted your footer to have 'Page # of #' on each page, you could use the following code:

<cfpdf action="addFooter" source="mypdf.pdf" name="mypdf" text="Page _PAGENUMBER of _LASTPAGENUMBER" />
<cfcontent variable="#toBinary(myPDF)#" type="application/pdf" />