Object types:

- array -- 1D collection of objects arranged sequentially and implicitly numbered starting at 0
- boolean -- either true or false
- dictionary -- an associative table of pairs of objects (key-value pairs)
- integer -- one or more decimal digits, optionally preceded by a sign
- name -- an atomic symbol uniquely defined by a sequence of characters introduced by a `/`
- null
- real -- approximates a mathematical real number; written as decimal digits with an optional sign and a leading, trailing, or embedded period
- stream -- a dictionary followed by zero or more bytes bracketed between the keywords `stream` and `endstream`
- string -- a series of bytes (not integer objects)

Rectangle -- a specific array object used to describe locations on a page and bounding boxes for a variety of objects: `[lower-left-x, lower-left-y, upper-right-x, upper-right-y]`

FDF (Forms Data Format, 12.7.7)

Characters:

- A PDF file is represented as a sequence of 8-bit bytes, some of which are in the ASCII character set and some of which are binary data
- The contents of a string or stream can be PDFDocEncoding or UTF-16

PDF syntax has 4 parts:

1. Objects -- "A PDF document is a data structure composed from a small set of basic types of data objects."
2. File Structure -- determines how objects are stored in a PDF file, how they are accessed, and how they are updated
3. Document Structure -- how the basic object types are used to represent components of a PDF document
4. Content Streams -- a content stream contains a sequence of instructions describing the appearance of a page or other graphical entity

Structure of a PDF file:

- Header
- Body
- Cross-Reference Table
- Trailer

By convention, tokens in a PDF file are arranged into lines.

The first line of a PDF file should be a header consisting of the line `%PDF-1.N` (N=0..7)
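
As a rough illustration of the header rule, here is a minimal Python sketch that reads the first line of a byte string and extracts the minor version. The function name `parse_pdf_header` is hypothetical, and the sketch assumes that N is a single digit from 0 to 7 and that the header starts at byte 0; real-world readers are often more lenient than this.

```python
import re

# Assumption: the header is exactly "%PDF-1.N" with N a single digit 0..7.
_HEADER_RE = re.compile(rb"^%PDF-1\.([0-7])$")


def parse_pdf_header(data: bytes):
    """Return the minor version N from a '%PDF-1.N' first line, or None.

    Hypothetical helper for these notes, not part of any PDF library.
    """
    # The first line ends at the first CR or LF (lines may end in CR, LF, or CRLF).
    first_line = re.split(rb"[\r\n]", data, maxsplit=1)[0]
    match = _HEADER_RE.match(first_line.rstrip())
    return int(match.group(1)) if match else None
```

For example, `parse_pdf_header(b"%PDF-1.4\r\n1 0 obj")` returns `4`, while input that does not start with a matching header line returns `None`. The input is kept as `bytes` rather than `str` because a PDF file is a sequence of bytes, not all of which are ASCII.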