Object types:
- array -- 1D collection of objects arranged sequentially and implicitly numbered starting at 0
- boolean -- either true or false
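- dictionary -- an associative table of key-value pairs; each key is a name object and each value may be any object
- integer -- one or more decimal digits, optionally preceded by a sign (e.g. `123`, `-98`, `+17`)
- name -- an atomic symbol uniquely defined by a sequence of characters introduced by a solidus (`/`), e.g. `/Type`; the `/` is not part of the name itself
- null -- the single null object, written as the keyword `null`; its type and value are unequal to those of any other object
- real -- approximates a mathematical real number; written as one or more decimal digits with an optional sign and a leading, trailing, or embedded period (e.g. `34.5`, `-.002`, `4.`), with no exponent notation
- stream -- a dictionary followed by zero or more bytes bracketed between the keywords `stream` and `endstream`
- string -- a series of zero or more bytes (stored compactly, not as integer objects)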
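To make the object types above concrete, here is a minimal sketch that renders Python values in PDF object syntax. The names `serialize` and `Name` are hypothetical helpers, not part of any library. It assumes Python `None`/`bool`/`int`/`float`/`str`/`list`/`dict` map to null/boolean/integer/real/literal string/array/dictionary, and it deliberately skips streams, indirect references, hexadecimal strings, and `#` escapes in names. Reals are rounded to six decimal places.

```python
class Name(str):
    """Hypothetical marker type: a str that should be written as a PDF name."""


def serialize(obj) -> str:
    """Render a Python value in PDF object syntax (minimal sketch)."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # checked before int: bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        # Reals are written with a period and no exponent; keep one digit after it
        text = f"{obj:.6f}".rstrip("0")
        return text + "0" if text.endswith(".") else text
    if isinstance(obj, Name):  # checked before str: Name is a subclass of str
        return "/" + obj
    if isinstance(obj, str):
        # Literal string: escape backslashes first, then parentheses
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(obj, list):
        return "[" + " ".join(serialize(item) for item in obj) + "]"
    if isinstance(obj, dict):
        # Keys are written as names; values may be any supported object
        pairs = " ".join(f"/{key} {serialize(value)}" for key, value in obj.items())
        return "<< " + pairs + " >>"
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

For example, `serialize({"Type": Name("Page"), "MediaBox": [0, 0, 612, 792]})` yields `<< /Type /Page /MediaBox [0 0 612 792] >>`. A `Name` subclass is used because a plain Python `str` cannot say whether it means a string or a name.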
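Rectangle -- a specific array object used to describe locations on a page and bounding boxes for a variety of objects, written as `[llx lly urx ury]` (lower-left x, lower-left y, upper-right x, upper-right y); any two diagonally opposite corners are acceptable, so readers should normalize the values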
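Although rectangles are conventionally given by their lower-left and upper-right corners, PDF also accepts any two diagonally opposite corners, so a reader typically normalizes them before use. A minimal sketch of that normalization follows; `Rect`, `normalize_rect`, and `width_height` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    llx: float  # lower-left x
    lly: float  # lower-left y
    urx: float  # upper-right x
    ury: float  # upper-right y


def normalize_rect(corners: Sequence[float]) -> Rect:
    """Return [llx lly urx ury] from any two diagonally opposite corners."""
    if len(corners) != 4:
        raise ValueError("a rectangle array has exactly four numbers")
    x1, y1, x2, y2 = corners
    return Rect(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def width_height(rect: Rect) -> Tuple[float, float]:
    """Width and height of a normalized rectangle."""
    return rect.urx - rect.llx, rect.ury - rect.lly
```

With this sketch, `normalize_rect([612, 792, 0, 0])` gives the same rectangle as `[0 0 612 792]`, a US Letter page of 612 by 792 units.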
FDF -- Forms Data Format (section 12.7.7)
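Characters:
- A PDF file is represented as a sequence of 8-bit bytes, some of which are in the ASCII character set and some of which are binary data
- Text strings (strings meant to hold human-readable text) are encoded in PDFDocEncoding or in UTF-16BE with a leading byte order mark (`FE FF`); other strings and stream contents may hold arbitrary bytes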
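For text strings, a reader can distinguish the two encodings by the UTF-16BE byte order mark `FE FF` at the start of the string. The sketch below shows that check. `decode_text_string` is a hypothetical name, and it makes one loud simplification: Latin-1 stands in for PDFDocEncoding. The two agree on printable ASCII and most of `0xA1`-`0xFF` but differ elsewhere (for example `0x18`-`0x1F` and `0x80`-`0xA0`), so a real reader needs the full PDFDocEncoding table.

```python
import codecs


def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string's bytes (PDF 1.7 rules, simplified)."""
    if raw.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff" marks UTF-16BE
        return raw[2:].decode("utf-16-be")
    # ASSUMPTION: Latin-1 approximates PDFDocEncoding; they are NOT identical
    # (e.g. 0x18-0x1F and 0x80-0xA0 differ), so swap in a real table for production.
    return raw.decode("latin-1")
```

For instance, `decode_text_string(b"\xfe\xff\x00H\x00i")` returns `"Hi"`, while plain ASCII bytes fall through to the single-byte branch.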
PDF syntax is best understood in four parts:
1. Objects -- "A PDF document is a data structure composed from a small set of basic types of data objects."
2. File Structure -- file structure determines how objects are stored in a PDF file, how they are accessed, and how they are updated
3. Document Structure -- how the basic object types are used to represent components of a PDF document
4. Content Streams -- a content stream contains a sequence of instructions describing the appearance of a page or other graphical entity
Structure of a PDF file:
- Header
- Body
- Cross-Reference Table
- Trailer
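The four parts of the file structure can be seen together in a small generator. This is a sketch, not a production writer. `build_minimal_pdf` is a hypothetical name, and the one-page document (catalog, page tree, empty page) is an assumption made only to give the body something to hold. The header is `%PDF-1.7`. The body holds numbered `N 0 obj ... endobj` indirect objects. Each cross-reference entry is exactly 20 bytes and records the byte offset of one object. The trailer names the root object and ends with `startxref`, the offset of the `xref` keyword, and `%%EOF`.

```python
def build_minimal_pdf() -> bytes:
    """Assemble header, body, cross-reference table, and trailer (sketch)."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.7\n")  # header
    offsets = []
    for number, body in enumerate(objects, start=1):  # body: indirect objects
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)  # cross-reference table starts here
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset  # "%%%%" renders as "%%"
    return bytes(out)
```

Writing the result with `open("minimal.pdf", "wb").write(build_minimal_pdf())` should give a file that PDF viewers open as a single blank page. Offsets are computed while writing because the cross-reference table must point at exact byte positions.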
By convention, the tokens in a PDF file are arranged into lines.
The first line of a PDF file should be a header consisting of `%PDF-` followed by a version number of the form `1.N`, where N is a digit from 0 to 7:
`%PDF-1.N` (N = 0..7)
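A reader can check this header before parsing anything else. The sketch below is strict: it requires the header at the very first byte and accepts only versions `1.0` through `1.7`, matching the rule above (PDF 2.0 files, which begin with `%PDF-2.0`, are outside its scope). `pdf_header_version` is a hypothetical name.

```python
import re

# "%PDF-1." followed by a single digit 0-7, anchored at the start of the data
HEADER_RE = re.compile(rb"%PDF-1\.([0-7])")


def pdf_header_version(data: bytes) -> str:
    """Return the version from a '%PDF-1.N' header, or raise ValueError."""
    match = HEADER_RE.match(data)  # match() only succeeds at offset 0
    if match is None:
        raise ValueError("missing %PDF-1.N header on the first line")
    return "1." + match.group(1).decode("ascii")
```

For example, `pdf_header_version(b"%PDF-1.7\n...")` returns `"1.7"`. A more lenient reader might instead search the first few hundred bytes for the header, but that tolerance is a design choice and is not modeled here.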