\MLAPDF

Class MLA (Media Library Assistant) PDF extracts legacy and XMP metadata from PDF files

Summary

Public
Methods: mla_extract_pdf_metadata()
Properties: No public properties found
Constants: No constants found

Protected
Methods: No protected methods found
Properties: No protected properties found
Constants: N/A

Private
Methods:
_parse_pdf_xref_subsection()
_parse_pdf_xref_section()
_parse_pdf_xref_stream()
_build_pdf_indirect_objects()
_find_pdf_indirect_dictionary()
_parse_pdf_UTF16BE()
_parse_pdf_string()
_parse_pdf_LPD_dictionary()
_parse_pdf_dictionary()
_extract_pdf_trailer()
Properties: $pdf_indirect_objects
Constants: N/A

Properties

$pdf_indirect_objects

$pdf_indirect_objects : array

Array of PDF indirect objects

This array holds the offset and length of every indirect object. The array key is ( object ID * 1000 ) + object generation. Each array value is array( number, generation, start, optional /length ).

Type

array
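
The key scheme described for $pdf_indirect_objects can be sketched as follows. The helper name example_object_key(), the associative key names and the sample offsets are assumptions made for illustration; the description above only lists the value as array( number, generation, start, optional /length ).

```php
<?php
// Hypothetical helper, not part of MLAPDF: builds the documented array key.
function example_object_key( int $object_id, int $generation ): int {
	return ( $object_id * 1000 ) + $generation;
}

// Assumed entry layout; the key names are illustrative only.
$pdf_indirect_objects = array();
$pdf_indirect_objects[ example_object_key( 12, 0 ) ] = array(
	'number'     => 12,
	'generation' => 0,
	'start'      => 4096, // made-up byte offset
	'/length'    => 215,  // optional, made-up length
);

// Looking up object 12, generation 0 repeats the same key computation.
$entry = $pdf_indirect_objects[ example_object_key( 12, 0 ) ] ?? null;
```

Under this scheme keys stay unique only while generation numbers are below 1000, since a larger generation would spill into the key range of the next object ID.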

Methods

mla_extract_pdf_metadata()

mla_extract_pdf_metadata(  $file_name) : array

Extract Metadata from a PDF file

Parameters

$file_name

Returns

array —

( 'xmp' => array( key => value ), 'pdf' => array( key => value ) ) for each metadata field, in string format
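
A minimal sketch of consuming the documented return shape. The sample array is hypothetical and stands in for the result of mla_extract_pdf_metadata( $file_name ); the helper example_flatten_metadata() and the "source:key" naming are assumptions, not part of the class.

```php
<?php
// Hypothetical sample in the documented shape; real values would come from
// mla_extract_pdf_metadata( $file_name ).
$metadata = array(
	'xmp' => array( 'Title' => 'Sample title' ),
	'pdf' => array( 'Title' => 'Sample title', 'Producer' => 'Sample producer' ),
);

// Hypothetical helper: flattens both groups into "source:key" => value pairs.
function example_flatten_metadata( array $metadata ): array {
	$flat = array();
	foreach ( array( 'pdf', 'xmp' ) as $source ) {
		// A missing group is treated as empty.
		foreach ( $metadata[ $source ] ?? array() as $key => $value ) {
			$flat[ $source . ':' . $key ] = (string) $value;
		}
	}
	return $flat;
}

$flat = example_flatten_metadata( $metadata );
```

Keeping the source prefix lets a caller tell apart a field that appears in both the 'pdf' and 'xmp' groups.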

_parse_pdf_xref_subsection()

_parse_pdf_xref_subsection(  $xref_section,   $offset,   $object_id,   $count) : void

Parse a cross-reference table subsection into the array of indirect object definitions

A cross-reference subsection is a sequence of 20-byte entries, each with offset and generation values.

Parameters

$xref_section
$offset
$object_id
$count
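
The 20-byte entry layout can be illustrated with a short sketch. In the PDF format each entry is a 10-digit byte offset, a space, a 5-digit generation number, a space, the keyword n (in use) or f (free), and a 2-byte end-of-line sequence. The function below is a hypothetical stand-in, not the MLAPDF code: it returns the parsed entries instead of storing them, and its array key names are assumed.

```php
<?php
// Hypothetical helper, not the MLAPDF implementation.
function example_parse_xref_entries( string $xref_section, int $offset, int $object_id, int $count ): array {
	$objects = array();
	for ( $index = 0; $index < $count; $index++ ) {
		$entry = (string) substr( $xref_section, $offset + ( 20 * $index ), 20 );
		// Layout: 10-digit offset, space, 5-digit generation, space, n|f, 2-byte EOL.
		if ( 1 !== preg_match( '/^(\d{10}) (\d{5}) ([nf])/', $entry, $matches ) ) {
			break; // truncated or malformed entry
		}
		if ( 'f' === $matches[3] ) {
			continue; // free entry: there is no object body to locate
		}
		$number     = $object_id + $index;
		$generation = (int) $matches[2];
		$objects[ ( $number * 1000 ) + $generation ] = array(
			'number'     => $number,
			'generation' => $generation,
			'start'      => (int) $matches[1],
		);
	}
	return $objects;
}

// Three made-up entries for objects 0, 1 and 2; object 0 is the free-list head.
$section = "0000000000 65535 f \n0000000017 00000 n \n0000000081 00000 n \n";
$objects = example_parse_xref_entries( $section, 0, 0, 3 );
```

The result is keyed the same way as $pdf_indirect_objects, so object 1 generation 0 lands under key 1000.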

_parse_pdf_xref_section()

_parse_pdf_xref_section(  $file_name,   $file_offset) : integer

Parse a cross-reference table section into the array of indirect object definitions

Creates the array of indirect object offsets and lengths

Parameters

$file_name
$file_offset

Returns

integer —

length of the section

_parse_pdf_xref_stream()

_parse_pdf_xref_stream(  $file_name,   $file_offset,   $entry_parms_string) : integer

Parse a cross-reference stream into the array of indirect object definitions

Creates the array of indirect object offsets and lengths

Parameters

$file_name
$file_offset
$entry_parms_string

Returns

integer —

length of the stream

_build_pdf_indirect_objects()

_build_pdf_indirect_objects(  $string) : void

Build an array of indirect object definitions

Creates the array of indirect object offsets and lengths

Parameters

$string

_find_pdf_indirect_dictionary()

_find_pdf_indirect_dictionary(  $file_name,   $object,   $generation,   $instance = NULL) : mixed

Find the offset, length and contents of an indirect object containing a dictionary

The function searches the entire file, if necessary, to find the last/most recent copy of the object. This is required because Adobe Acrobat does NOT increment the generation number when it reuses an object.

Parameters

$file_name
$object
$generation
$instance

Returns

mixed —

NULL on failure else array( 'start' => offset in the file, 'length' => object length, 'content' => dictionary contents )
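
The "last/most recent copy" rule can be sketched as a scan for every "N G obj" header that keeps only the final match. This is an illustration under assumptions: the function name is hypothetical, it works on a string already read into memory rather than on $file_name, and it returns only the offset, not the length or dictionary contents.

```php
<?php
// Hypothetical helper: returns the byte offset of the last "N G obj" header,
// or null when the object does not occur in the data.
function example_find_last_object( string $pdf_bytes, int $object, int $generation ): ?int {
	// The lookbehind keeps "1 0 obj" from matching inside "11 0 obj".
	$pattern = '/(?<![0-9])' . $object . '\s+' . $generation . '\s+obj\b/';
	if ( ! preg_match_all( $pattern, $pdf_bytes, $matches, PREG_OFFSET_CAPTURE ) ) {
		return null;
	}
	// Each match is array( text, offset ); later copies supersede earlier ones.
	$last = end( $matches[0] );
	return $last[1];
}

// Made-up data: object 1 generation 0 appears twice with the same generation.
$pdf = "1 0 obj\n<< /A 1 >>\nendobj\n11 0 obj\n<< >>\nendobj\n1 0 obj\n<< /A 2 >>\nendobj\n";
$offset = example_find_last_object( $pdf, 1, 0 );
```

Taking the last match reflects the behavior described above, where a reused object keeps its generation number and only its position in the file tells the copies apart.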

_parse_pdf_UTF16BE()

_parse_pdf_UTF16BE(  $source_string) : string

Parse a PDF Unicode (16-bit Big Endian) object

Parameters

$source_string

Returns

string —

UTF-8 encoded string
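
A minimal sketch of the conversion, assuming the mbstring extension is available. PDF text strings in this encoding begin with the byte order mark FE FF, which the sketch strips before converting. The function name is hypothetical, and the actual method may decode the bytes differently.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is loaded.
function example_utf16be_to_utf8( string $source_string ): string {
	// Drop the leading byte order mark (FE FF) if present.
	if ( "\xFE\xFF" === substr( $source_string, 0, 2 ) ) {
		$source_string = (string) substr( $source_string, 2 );
	}
	return mb_convert_encoding( $source_string, 'UTF-8', 'UTF-16BE' );
}

// "Hi" as UTF-16BE with a byte order mark.
$utf8 = example_utf16be_to_utf8( "\xFE\xFF\x00H\x00i" );
```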

_parse_pdf_string()

_parse_pdf_string(  $source_string,   $offset) : array

Parse a PDF string object

Returns an array with one dictionary entry. The array also has a '/length' element containing the number of bytes occupied by the string in the source string, including the enclosing parentheses.

Parameters

$source_string
$offset

Returns

array —

( key => array( 'type' => type, 'value' => value, '/length' => length ) ) for the string
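
The '/length' element can be illustrated with a sketch that scans a literal string, tracks nested parentheses, and counts the bytes consumed including the enclosing parentheses. The function is hypothetical and simplified: it returns only the inner array (no dictionary key), and it keeps an escaped character as-is rather than decoding sequences such as \n or octal codes.

```php
<?php
// Hypothetical helper, simplified relative to a full PDF string parser.
function example_parse_literal_string( string $source_string, int $offset ): ?array {
	if ( '(' !== ( $source_string[ $offset ] ?? '' ) ) {
		return null; // not positioned on a literal string
	}
	$depth  = 0;
	$value  = '';
	$length = strlen( $source_string );
	for ( $index = $offset; $index < $length; $index++ ) {
		$char = $source_string[ $index ];
		if ( '\\' === $char && $index + 1 < $length ) {
			$value .= $source_string[ ++$index ]; // keep the escaped byte literally
			continue;
		}
		if ( '(' === $char ) {
			$depth++;
			if ( $depth > 1 ) {
				$value .= $char; // nested parenthesis belongs to the value
			}
			continue;
		}
		if ( ')' === $char ) {
			$depth--;
			if ( 0 === $depth ) {
				return array(
					'type'    => 'string',
					'value'   => $value,
					'/length' => $index - $offset + 1, // includes both parentheses
				);
			}
		}
		$value .= $char;
	}
	return null; // unbalanced parentheses
}

$source = '/Title (A (nested) \) title) /Author';
$parsed = example_parse_literal_string( $source, 7 );
```

A caller can add '/length' to the starting offset to continue parsing just past the closing parenthesis.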

_parse_pdf_LPD_dictionary()

_parse_pdf_LPD_dictionary(  $source_string,   $filesize) : mixed

Parse a PDF Linearization Parameter Dictionary object

Returns an array of dictionary contents, classified by object type: boolean, numeric, string, hex (string), indirect (object), name, array, dictionary, stream, and null. The array also has a '/length' element containing the number of bytes occupied by the dictionary in the source string, excluding the enclosing delimiters, if passed in.

Parameters

$source_string
$filesize

Returns

mixed —

array of dictionary objects on success, false on failure

_parse_pdf_dictionary()

_parse_pdf_dictionary(  $source_string,   $offset) : array

Parse a PDF dictionary object

Returns an array of dictionary contents, classified by object type: boolean, numeric, string, hex (string), indirect (object), name, array, dictionary, stream, and null. The array also has a '/length' element containing the number of bytes occupied by the dictionary in the source string, excluding the enclosing delimiters.

Parameters

$source_string
$offset

Returns

array —

( '/length' => length, key => array( 'type' => type, 'value' => value ) ) for each dictionary field
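
The type classification can be sketched for a single, already isolated value token. This is an assumption-laden illustration: the function name is hypothetical, the type labels are shortened from the list above, and 'stream' is left out because detecting it needs the dictionary plus the following stream keyword.

```php
<?php
// Hypothetical helper: classifies one PDF value token by its leading syntax.
function example_classify_pdf_value( string $token ): string {
	$token = trim( $token );
	if ( '' === $token ) {
		return 'unknown';
	}
	if ( 'true' === $token || 'false' === $token ) {
		return 'boolean';
	}
	if ( 'null' === $token ) {
		return 'null';
	}
	if ( 1 === preg_match( '/^[+-]?(\d+\.?\d*|\.\d+)$/', $token ) ) {
		return 'numeric';
	}
	if ( 1 === preg_match( '/^\d+\s+\d+\s+R$/', $token ) ) {
		return 'indirect'; // "object generation R" reference
	}
	if ( '<<' === substr( $token, 0, 2 ) ) {
		return 'dictionary'; // must be tested before the single "<" of a hex string
	}
	switch ( $token[0] ) {
		case '<':
			return 'hex';
		case '(':
			return 'string';
		case '/':
			return 'name';
		case '[':
			return 'array';
	}
	return 'unknown';
}

$type = example_classify_pdf_value( '12 0 R' );
```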

_extract_pdf_trailer()

_extract_pdf_trailer(  $file_name,   $file_offset) : mixed

Extract dictionary from traditional cross-reference + trailer documents

Parameters

$file_name
$file_offset

Returns

mixed —

array of "PDF dictionary arrays", newest first, or NULL on failure