Class MLA (Media Library Assistant) PDF extracts legacy and XMP metadata from PDF files

package Media Library Assistant
since 2.10

Methods

Build an array of indirect object definitions

_build_pdf_indirect_objects(string $string) : void

Creates the array of indirect object offsets and lengths

since 2.10

Parameters

$string

string

The entire PDF document, passed by reference

Extract dictionary from traditional cross-reference + trailer documents

_extract_pdf_trailer(string $file_name, integer $file_offset) : mixed

since 2.10

Parameters

$file_name

string

full path to the desired file

$file_offset

integer

offset within file of the cross-reference table

Returns

mixed

array of "PDF dictionary arrays", newest first, or NULL on failure

Find the offset, length and contents of an indirect object containing a dictionary

_find_pdf_indirect_dictionary(string $file_name, integer $object, integer $generation) : mixed

The function searches the entire file, if necessary, to find the last/most recent copy of the object. This is required because Adobe Acrobat does NOT increment the generation number when it reuses an object.

since 2.10

Parameters

$file_name

string

full path and file name

$object

integer

The object number

$generation

integer

The object generation number; default zero (0)

Returns

mixed

NULL on failure, else array( 'start' => offset in the file, 'length' => object length, 'content' => dictionary contents )
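The "last/most recent copy" search can be sketched as follows. This is a minimal illustration, not the class's implementation: the function name example_find_last_object is hypothetical, it works on an in-memory string rather than a file, and it returns the whole object body rather than only the dictionary contents. It also assumes the text "endobj" does not occur inside stream data.

```php
<?php
// Hypothetical helper: find the last definition of "N G obj" in a PDF string.
function example_find_last_object( $pdf, $object, $generation = 0 ) {
    // The lookbehind keeps a search for object 2 from matching "12 0 obj".
    $pattern = '/(?<![0-9])' . (int) $object . '\s+' . (int) $generation . '\s+obj\b/';
    if ( ! preg_match_all( $pattern, $pdf, $matches, PREG_OFFSET_CAPTURE ) ) {
        return NULL;
    }

    // Later copies supersede earlier ones, so take the final match.
    $last  = end( $matches[0] );
    $start = $last[1];
    $end   = strpos( $pdf, 'endobj', $start );
    if ( false === $end ) {
        return NULL;
    }
    $end += strlen( 'endobj' );

    return array(
        'start'   => $start,
        'length'  => $end - $start,
        'content' => substr( $pdf, $start, $end - $start ),
    );
}
```

Taking the final match is what makes the result independent of the generation number, which matters when a writer reuses an object number without incrementing it.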

Parse a PDF Linearization Parameter Dictionary object

_parse_pdf_LPD_dictionary(string $source_string, integer $filesize) : mixed

Returns an array of dictionary contents, classified by object type: boolean, numeric, string, hex (string), indirect (object), name, array, dictionary, stream, and null. The array also has a '/length' element containing the number of bytes occupied by the dictionary in the source string, excluding the enclosing delimiters, if passed in.

since 2.10

Parameters

$source_string

string

data within which the object occurs, typically the start of a PDF document

$filesize

integer

filesize of the PDF document, for validation purposes, or zero (0) to ignore filesize

Returns

mixed

array of dictionary objects on success, false on failure

Parse a PDF Unicode (16-bit Big Endian) object

_parse_pdf_UTF16BE(string $source_string) : string

since 2.10

Parameters

$source_string

string

PDF string of 16-bit characters

Returns

string

UTF-8 encoded string
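As an illustration of the conversion, the sketch below decodes 16-bit big-endian code units and re-encodes them as UTF-8 without relying on any extension. The function name example_utf16be_to_utf8 is hypothetical and the code is not taken from the class. Skipping a leading byte order mark and combining surrogate pairs are assumptions about the desired behavior; an unpaired surrogate is passed through unchecked.

```php
<?php
// Hypothetical helper: convert a UTF-16BE byte string to UTF-8.
function example_utf16be_to_utf8( $source_string ) {
    $output = '';
    $length = strlen( $source_string );

    for ( $index = 0; $index + 1 < $length; $index += 2 ) {
        $code = ( ord( $source_string[ $index ] ) << 8 ) | ord( $source_string[ $index + 1 ] );

        // Skip a leading byte order mark (0xFE 0xFF).
        if ( 0 === $index && 0xFEFF === $code ) {
            continue;
        }

        // Combine a high/low surrogate pair into one code point.
        if ( $code >= 0xD800 && $code <= 0xDBFF && $index + 3 < $length ) {
            $low = ( ord( $source_string[ $index + 2 ] ) << 8 ) | ord( $source_string[ $index + 3 ] );
            if ( $low >= 0xDC00 && $low <= 0xDFFF ) {
                $code   = 0x10000 + ( ( $code - 0xD800 ) << 10 ) + ( $low - 0xDC00 );
                $index += 2;
            }
        }

        // Encode the code point as one to four UTF-8 bytes.
        if ( $code < 0x80 ) {
            $output .= chr( $code );
        } elseif ( $code < 0x800 ) {
            $output .= chr( 0xC0 | ( $code >> 6 ) ) . chr( 0x80 | ( $code & 0x3F ) );
        } elseif ( $code < 0x10000 ) {
            $output .= chr( 0xE0 | ( $code >> 12 ) ) . chr( 0x80 | ( ( $code >> 6 ) & 0x3F ) )
                . chr( 0x80 | ( $code & 0x3F ) );
        } else {
            $output .= chr( 0xF0 | ( $code >> 18 ) ) . chr( 0x80 | ( ( $code >> 12 ) & 0x3F ) )
                . chr( 0x80 | ( ( $code >> 6 ) & 0x3F ) ) . chr( 0x80 | ( $code & 0x3F ) );
        }
    }

    return $output;
}
```

Where the mbstring extension is available, mb_convert_encoding( $source_string, 'UTF-8', 'UTF-16BE' ) performs the same conversion in one call.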

Parse a PDF dictionary object

_parse_pdf_dictionary(string $source_string, integer $offset) : array

Returns an array of dictionary contents, classified by object type: boolean, numeric, string, hex (string), indirect (object), name, array, dictionary, stream, and null. The array also has a '/length' element containing the number of bytes occupied by the dictionary in the source string, excluding the enclosing delimiters.

since 2.10

Parameters

$source_string

string

data within which the dictionary occurs

$offset

integer

offset within the source string of the opening '<<' characters or the first content character.

Returns

array( '/length' => length, key => array( 'type' => type, 'value' => value ) ) for each dictionary field
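To make the return shape concrete, the sketch below shows a hand-written array in the documented layout and a small accessor. Everything in it is illustrative: the sample values, the key spelling without a leading slash, and the helper name example_dictionary_value are assumptions, not output captured from the class.

```php
<?php
// Hypothetical result for the source "<</Title (Example)/Count 3>>".
// '/length' is the byte count between the "<<" and ">>" delimiters.
$dictionary = array(
    '/length' => 24,
    'Title'   => array( 'type' => 'string', 'value' => 'Example' ),
    'Count'   => array( 'type' => 'numeric', 'value' => 3 ),
);

// Hypothetical helper: return a field's value only if it has the expected type.
function example_dictionary_value( $dictionary, $key, $type ) {
    // The '/length' element is a plain integer, so it fails the is_array() check.
    if ( isset( $dictionary[ $key ] ) && is_array( $dictionary[ $key ] )
        && $type === $dictionary[ $key ]['type'] ) {
        return $dictionary[ $key ]['value'];
    }

    return NULL;
}

$title = example_dictionary_value( $dictionary, 'Title', 'string' );
```

Checking the 'type' element before using 'value' avoids treating, for example, an indirect reference as if it were the string it points to.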

Parse a PDF string object

_parse_pdf_string(string $source_string, integer $offset) : array

Returns an array with one dictionary entry. The array also has a '/length' element containing the number of bytes occupied by the string in the source string, including the enclosing parentheses.

since 2.10

Parameters

$source_string

string

data within which the string occurs

$offset

integer

offset within the source string of the opening '(' character.

Returns

array( key => array( 'type' => type, 'value' => value, '/length' => length ) ) for the string
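A PDF literal string may contain balanced, unescaped parentheses and backslash escapes, which is why its length has to be found by scanning rather than by searching for the first ')'. The sketch below shows one way to do that. It is an assumption-laden illustration, not the class's code: the name example_parse_pdf_string is hypothetical, it returns only the inner array( 'type', 'value', '/length' ) element, and it does not treat a backslash followed by CRLF as a single line continuation.

```php
<?php
// Hypothetical helper: parse the literal string that starts at $offset.
function example_parse_pdf_string( $source_string, $offset ) {
    if ( '(' !== substr( $source_string, $offset, 1 ) ) {
        return false;
    }

    $escapes = array( 'n' => "\n", 'r' => "\r", 't' => "\t", 'b' => "\x08", 'f' => "\x0C" );
    $value   = '';
    $depth   = 0;
    $length  = strlen( $source_string );

    for ( $index = $offset; $index < $length; $index++ ) {
        $char = $source_string[ $index ];

        if ( '\\' === $char && $index + 1 < $length ) {
            $next = $source_string[ ++$index ];
            if ( preg_match( '/\G[0-7]{1,3}/', $source_string, $octal, 0, $index ) ) {
                // Octal escape of one to three digits; high-order overflow is ignored.
                $value .= chr( octdec( $octal[0] ) & 0xFF );
                $index += strlen( $octal[0] ) - 1;
            } elseif ( isset( $escapes[ $next ] ) ) {
                $value .= $escapes[ $next ];
            } elseif ( "\n" !== $next && "\r" !== $next ) {
                // \( \) \\ and unknown escapes keep the character itself.
                $value .= $next;
            }
            continue;
        }

        if ( '(' === $char ) {
            // The outermost '(' is a delimiter; nested ones are content.
            if ( 0 < $depth++ ) {
                $value .= $char;
            }
        } elseif ( ')' === $char ) {
            if ( 0 === --$depth ) {
                // '/length' includes both enclosing parentheses.
                return array( 'type' => 'string', 'value' => $value, '/length' => $index - $offset + 1 );
            }
            $value .= $char;
        } else {
            $value .= $char;
        }
    }

    return false; // Unbalanced parentheses.
}
```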

Parse a cross-reference table section into the array of indirect object definitions

_parse_pdf_xref_section(string $file_name, integer $file_offset) : integer

Creates the array of indirect object offsets and lengths

since 2.10

Parameters

$file_name

string

full path and file name

$file_offset

integer

offset within the file of the xref id and count entry

Returns

integer

length of the section

Parse a cross-reference stream into the array of indirect object definitions

_parse_pdf_xref_stream(string $file_name, integer $file_offset, string $entry_parms_string) : integer

Creates the array of indirect object offsets and lengths

since 2.10

Parameters

$file_name

string

full path and file name

$file_offset

integer

offset within the file of the xref id and count entry

$entry_parms_string

string

"/W" entry, representing the size of the fields in a single entry

Returns

integer

length of the stream
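The role of the "/W" entry can be illustrated with a short sketch. In a cross-reference stream, each entry consists of three big-endian integer fields whose byte widths are given by "/W"; a zero-width first field means the entry type defaults to 1. The function name example_decode_xref_stream_entries is hypothetical, and the sketch assumes the stream data has already been decompressed and had any predictor removed; it is not the class's implementation.

```php
<?php
// Hypothetical helper: split decoded xref stream data into ( type, field 2, field 3 ) triples.
function example_decode_xref_stream_entries( $data, $entry_parms_string ) {
    // Pull the three field widths out of a "/W" value such as "[ 1 2 1 ]".
    preg_match_all( '/\d+/', $entry_parms_string, $matches );
    $widths = array_map( 'intval', $matches[0] );
    if ( 3 !== count( $widths ) ) {
        return array();
    }

    $entry_size = array_sum( $widths );
    if ( 0 === $entry_size ) {
        return array();
    }

    $entries = array();
    $length  = strlen( $data );
    for ( $offset = 0; $offset + $entry_size <= $length; $offset += $entry_size ) {
        $fields   = array();
        $position = $offset;
        foreach ( $widths as $index => $width ) {
            // A zero-width type field defaults to 1; other zero-width fields default to 0.
            $value = ( 0 === $index && 0 === $width ) ? 1 : 0;
            for ( $byte = 0; $byte < $width; $byte++ ) {
                $value = ( $value << 8 ) | ord( $data[ $position++ ] );
            }
            $fields[] = $value;
        }
        $entries[] = $fields;
    }

    return $entries;
}
```

For a type 1 entry the second field is the byte offset of the object and the third is its generation number; for a type 2 entry they are the number of the containing object stream and the index within it.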

Parse a cross-reference table subsection into the array of indirect object definitions

_parse_pdf_xref_subsection(string $xref_section, integer $offset, integer $object_id, integer $count) : void

A cross-reference subsection is a sequence of 20-byte entries, each with offset and generation values.

since 2.10

Parameters

$xref_section

string

buffer containing the subsection

$offset

integer

offset within the buffer of the first entry

$object_id

integer

number of the first object in the subsection

$count

integer

number of entries in the subsection
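The 20-byte entry layout lends itself to a short sketch. Each entry holds a 10-digit byte offset, a space, a 5-digit generation number, a space, the letter "n" (in use) or "f" (free), and a two-character end-of-line sequence. The function below is hypothetical and differs from the documented method in that it returns the collected entries instead of storing them in the class property; the string keys 'number', 'generation' and 'start' are assumed names.

```php
<?php
// Hypothetical helper: collect the in-use entries of one xref subsection.
function example_parse_xref_subsection( $xref_section, $offset, $object_id, $count ) {
    $objects = array();

    for ( $index = 0; $index < $count; $index++ ) {
        $entry = substr( $xref_section, $offset + ( 20 * $index ), 20 );
        if ( 20 !== strlen( $entry ) ) {
            break; // Truncated buffer.
        }

        // Byte 17 holds the entry type; free ("f") entries are skipped.
        if ( 'n' !== $entry[17] ) {
            continue;
        }

        $number     = $object_id + $index;
        $generation = (int) substr( $entry, 11, 5 );
        $objects[ ( $number * 1000 ) + $generation ] = array(
            'number'     => $number,
            'generation' => $generation,
            'start'      => (int) substr( $entry, 0, 10 ),
        );
    }

    return $objects;
}

// Three entries for objects 0, 1 and 2; object 0 is the head of the free list.
$xref = "0000000000 65535 f \n0000000017 00000 n \n0000000081 00000 n \n";
$objects = example_parse_xref_subsection( $xref, 0, 0, 3 );
```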

Properties

Array of PDF indirect objects

$pdf_indirect_objects : array

This array contains all of the indirect object offsets and lengths. The array key is ( object ID * 1000 ) + object generation. The array value is array( number, generation, start, optional /length )

since 2.10
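The key scheme described above can be sketched as follows. The helper name example_pdf_object_key, the sample values, and the use of string keys for the value array are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: compose the array key from object ID and generation.
function example_pdf_object_key( $object_id, $generation = 0 ) {
    return ( $object_id * 1000 ) + $generation;
}

// Hypothetical entry in the shape array( number, generation, start, optional /length ).
$pdf_indirect_objects = array();
$pdf_indirect_objects[ example_pdf_object_key( 12, 0 ) ] = array(
    'number'     => 12,
    'generation' => 0,
    'start'      => 4096,
    '/length'    => 210,
);

// Look up object 12, generation 0.
$entry = $pdf_indirect_objects[ example_pdf_object_key( 12, 0 ) ];
```

Because the generation occupies the low three decimal digits, keys stay unique only while generation numbers are below 1000.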