iText RUPS

RUPS is an acronym for Reading/Updating PDF Syntax. RUPS is a GUI application (currently still in beta) that allows you to look inside a PDF. Future versions will also allow you to change the internal syntax of the PDF file. This tool is written for debugging purposes. You shouldn't change the syntax of a PDF manually.

PDF Objects

A PDF file is a standalone string of bytes (binary) that syntactically forms a series of objects. Some objects are numbered, and can refer to each other by number. Other objects are part of a parent object directly. This model used to be called the Carousel Object System (Carousel was an early code name for PDF).

There are eight basic types of objects. Five primitive object types: boolean, number (integer or real), string, name, and null.

Three compound object types: array, dictionary, and stream.

More complex data types are built on top of these: for instance a rectangle is an array consisting of four number objects. A date-time object is a special type of string, and so on. The numbered objects, aka indirect objects, can occur in any order in the file. The object numbers are mapped to the actual place in the file by the cross reference (xref) table.
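As a rough illustration of the mapping from object numbers to file positions, the Python sketch below scans a small hand-written body for `N G obj` markers and records where each numbered object starts. The sample bytes, `build_offsets`, and `read_object` are hypothetical names invented for this example; a real reader would take the offsets from the xref table instead of scanning, because stream data may contain bytes that look like object markers.

```python
import re

# Hypothetical, hand-written sample: the numbered objects are deliberately
# out of numeric order, which the PDF syntax allows.
BODY = (
    b"%PDF-1.4\n"
    b"3 0 obj\n[0 0 595 842]\nendobj\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 /MediaBox 3 0 R >>\nendobj\n"
)

# "<object number> <generation number> obj" at the start of a line.
OBJ_START = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)


def build_offsets(data: bytes) -> dict[int, int]:
    """Map each object number to the byte offset where the object starts."""
    return {int(m.group(1)): m.start() for m in OBJ_START.finditer(data)}


def read_object(data: bytes, offsets: dict[int, int], number: int) -> bytes:
    """Return the raw bytes between 'N G obj' and 'endobj' for one object."""
    start = offsets[number]
    header_end = data.index(b"obj", start) + len(b"obj")
    end = data.index(b"endobj", header_end)
    return data[header_end:end].strip()


offsets = build_offsets(BODY)

# A rectangle is just an array of four numbers: here object 3.
rect = [float(tok) for tok in read_object(BODY, offsets, 3).strip(b"[]").split()]
```

Here `rect` holds the four numbers of the array stored as object 3, looked up by number and not by its position in the file.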

The Structure of a PDF file

The cross-reference table is located near the end of the file, after the body and before the trailer.

header (%PDF-1.4)
body (objects)
cross-reference table (xref)
trailer (dictionary, startxref, %%EOF)
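The four parts listed above can be sketched in Python as follows. This is a minimal illustration, not how iText or RUPS work internally: `build_pdf` and `read_xref` are hypothetical helpers, the objects are assumed to be numbered contiguously from 1, and only a classic single-section xref table is handled (no incremental updates, no xref streams).

```python
def build_pdf(objects: dict[int, bytes]) -> bytes:
    """Assemble header, body, xref table and trailer (objects numbered 1..n)."""
    out = bytearray(b"%PDF-1.4\n")                       # header
    offsets = {}
    for num in sorted(objects):                          # body
        offsets[num] = len(out)
        out += b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
    xref_pos = len(out)                                  # cross-reference table
    size = max(objects) + 1
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"                      # entry 0 is the free-list head
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]        # each entry is 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size   # trailer dictionary
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos        # pointer to xref, end marker
    return bytes(out)


def read_xref(data: bytes) -> dict[int, int]:
    """Follow startxref from the end of the file and parse the xref table."""
    tail = data[data.rindex(b"startxref"):]
    xref_pos = int(tail.split()[1])
    lines = data[xref_pos:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":                                 # "n" = in use, "f" = free
            table[first + i] = int(offset)
    return table


pdf = build_pdf({
    1: b"<< /Type /Catalog /Pages 2 0 R >>",
    2: b"<< /Type /Pages /Kids [] /Count 0 >>",
})
table = read_xref(pdf)
```

A reader works backwards: it finds `startxref` at the end of the file, jumps to the xref table, and only then knows where each numbered object in the body begins.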

Starting with the trailer dictionary, you can create a cross-linked set of objects of the complete document. Each document contains an ordered set of fixed and independent pages. Each page may contain an arbitrary combination of text, graphics and images; how these objects interact is described in the content stream. The information to find the objects referred to in the content stream can be found in the resources dictionary.
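To make the idea of a cross-linked object graph concrete, here is a toy model in Python. Everything in it is an assumption made for illustration: `Ref`, `OBJECTS`, `TRAILER`, `pages`, and `reachable` are hypothetical names, and dictionaries, lists, and strings stand in for PDF dictionaries, arrays, and streams. It is not the data model used by RUPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as '2 0 R' (generation number omitted)."""
    num: int


# Hypothetical two-page document; both pages share one content stream and font.
OBJECTS = {
    1: {"Type": "Catalog", "Pages": Ref(2)},
    2: {"Type": "Pages", "Kids": [Ref(3), Ref(5)], "Count": 2},
    3: {"Type": "Page", "Parent": Ref(2), "Contents": Ref(4),
        "Resources": {"Font": {"F1": Ref(6)}}},
    4: "BT /F1 12 Tf (Hello) Tj ET",          # content stream, shown as text
    5: {"Type": "Page", "Parent": Ref(2), "Contents": Ref(4),
        "Resources": {"Font": {"F1": Ref(6)}}},
    6: {"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica"},
}
TRAILER = {"Size": 7, "Root": Ref(1)}


def pages(node_ref: Ref):
    """Yield the object numbers of the pages in document order."""
    node = OBJECTS[node_ref.num]
    if node["Type"] == "Page":
        yield node_ref.num
    else:
        for kid in node["Kids"]:
            yield from pages(kid)


def reachable(value, seen=None) -> set:
    """Collect the numbers of all indirect objects reachable from value."""
    seen = set() if seen is None else seen
    if isinstance(value, Ref):
        if value.num in seen:                 # /Parent links create cycles
            return seen
        seen.add(value.num)
        value = OBJECTS[value.num]
    if isinstance(value, dict):
        for child in value.values():
            reachable(child, seen)
    elif isinstance(value, list):
        for child in value:
            reachable(child, seen)
    return seen


catalog = OBJECTS[TRAILER["Root"].num]
page_numbers = list(pages(catalog["Pages"]))
all_objects = reachable(TRAILER)
```

The walk starts at the trailer, reaches the catalog through `Root`, and from there finds the page tree, each page's content stream, and the font named in its resources dictionary.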

Whereas PDF viewers such as Adobe Reader, Evince, PDF Renderer, JPedal, and many others allow you to view these pages, RUPS allows you to look inside the PDF: it shows the graph of cross-linked objects and reveals the type and content of each object. With RUPS, you can also look at the internal syntax of each page. This gives you an interesting view on the Adobe Imaging Model (note that the same model is used by the Apple Operating System).

The Imaging Model

The Adobe Imaging Model describes the marks on a page. Selected zones on the page are 'stroked' and/or 'filled' with 'paint'. These zones can be closed or open paths. Closed paths describe geometric shapes; they can be filled with different kinds of paint: solid color, gradients, shades, patterns, and so forth. Text consists of a series of special shapes named 'glyphs', organized in fonts. Open paths (lines) are composed of straight or curved segments. They have a thickness, a dash pattern, and many other attributes.

A page content stream consists of a sequence of operators and operands that define these paths and their parameters. These operators often use implicit parameters contained in the current 'graphics state' (GS) and all geometric information is relative to the 'current transformation matrix' (CTM).
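The operand-then-operator layout and the role of the CTM can be sketched with a very small interpreter in Python. This is a simplified illustration under several assumptions: tokens are separated by whitespace, only the operators `q`, `Q`, `cm`, `m`, and `l` are acted on, the CTM is the only part of the graphics state that is saved, and `concat`, `apply`, and `trace` are hypothetical helper names. A matrix is written as the six numbers `a b c d e f`, as in the `cm` operator.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m, ctm):
    """Return m x ctm: the new matrix m is applied to coordinates first."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def apply(ctm, x, y):
    """Transform a point from user space using the matrix a b c d e f."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


def trace(content: str):
    """Return the moveto/lineto points of a content stream after applying the CTM."""
    ctm, stack, operands, points = IDENTITY, [], [], []
    for token in content.split():
        try:
            operands.append(float(token))     # operands precede their operator
            continue
        except ValueError:
            pass                              # not a number, so it is an operator
        if token == "q":                      # save graphics state (CTM only here)
            stack.append(ctm)
        elif token == "Q":                    # restore graphics state
            ctm = stack.pop()
        elif token == "cm":                   # concatenate matrix onto the CTM
            ctm = concat(tuple(operands), ctm)
        elif token in ("m", "l"):             # moveto / lineto
            points.append((token, apply(ctm, *operands)))
        operands.clear()                      # other operators (S, w, ...) are ignored
    return points


stream = "q 1 0 0 1 10 20 cm 2 0 0 2 0 0 cm 5 5 m 10 5 l S Q 0 0 m 1 1 l S"
points = trace(stream)
```

Inside the `q` ... `Q` pair the coordinates are first scaled by 2 and then translated by (10, 20); after `Q` the saved CTM is restored, so the second path is left untransformed.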

 
Copyright © 2008 by 1T3XT BVBA