class SimHashUtils

SimHashUtils contains a set of static methods to serialize and normalize person names and titles. The function getSimHash1 calculates the interhash of a Resource.

Constants

SINGLE_LETTER

DEFAULT_LAST_FIRST_NAMES

By default, all author and editor names are in "Last, First" order.

PERSON_NAME_DELIMITER

The delimiter used to separate person names.

FIRSTNAME_LASTNAME_DELIMITER

The delimiter used to separate the first name from the last name.

Methods

static 
getSimHash1(Resource $resource)

No description

static 
getSimHash2(Resource $resource)

No description

static string|null
serializePersonNames(array $persons, bool $lastFirstNames = self::DEFAULT_LAST_FIRST_NAMES, string $delimiter = self::PERSON_NAME_DELIMITER)

No description

static null|string
serializePersonName(Person $person, bool $lastFirstName)

No description

static string
getNormalizedTitle(string $string)

No description

static string
getNormalizedPersons(array|ArrayList $persons)

No description

static string
normalizePersonList(array|ArrayList $persons)

Normalizes a collection of persons by normalizing their names and sorting them.

static string
normalizePerson(Person $person)

Used for "sloppy" hashes, i.e., the inter hash.

static string
getNormalizedYear(string $year)

No description

static string
getFirstPersonsLastName(array|ArrayList $persons)

No description

Details

at line 135
static getSimHash1(Resource $resource)

Parameters

Resource $resource

at line 144
static getSimHash2(Resource $resource)

Parameters

Resource $resource

at line 162
static string|null serializePersonNames(array $persons, bool $lastFirstNames = self::DEFAULT_LAST_FIRST_NAMES, string $delimiter = self::PERSON_NAME_DELIMITER)

Parameters

array $persons
bool $lastFirstNames
string $delimiter

Return Value

string|null

at line 186
static null|string serializePersonName(Person $person, bool $lastFirstName)

Parameters

Person $person
bool $lastFirstName

Return Value

null|string
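
To illustrate how the two serialization methods and the delimiter constants might fit together, here is a minimal sketch. It is not the library's implementation: the function names, the array-based stand-in for Person (keys 'first' and 'last'), the delimiter values ' and ' and ', ', and the choice to return null for empty input are all assumptions made for the example.

```php
<?php
// Assumed delimiter values; the real constants' values are not given in this reference.
const SKETCH_PERSON_NAME_DELIMITER = ' and ';
const SKETCH_FIRSTNAME_LASTNAME_DELIMITER = ', ';

// Hypothetical stand-in for serializePersonName(); $person is an array
// with optional 'first' and 'last' keys instead of a Person object.
function sketchSerializePersonName(array $person, bool $lastFirstName): ?string
{
    $first = trim($person['first'] ?? '');
    $last  = trim($person['last'] ?? '');

    if ($first === '' && $last === '') {
        return null; // assumption: nothing to serialize yields null
    }
    if ($first === '') {
        return $last;
    }
    if ($last === '') {
        return $first;
    }

    return $lastFirstName
        ? $last . SKETCH_FIRSTNAME_LASTNAME_DELIMITER . $first
        : $first . ' ' . $last;
}

// Hypothetical stand-in for serializePersonNames().
function sketchSerializePersonNames(
    array $persons,
    bool $lastFirstNames = true,
    string $delimiter = SKETCH_PERSON_NAME_DELIMITER
): ?string {
    $names = [];
    foreach ($persons as $person) {
        $name = sketchSerializePersonName($person, $lastFirstNames);
        if ($name !== null) {
            $names[] = $name; // skip persons without any name part
        }
    }

    return $names === [] ? null : implode($delimiter, $names);
}

$persons = [
    ['first' => 'Donald', 'last' => 'Knuth'],
    ['first' => 'Alan', 'last' => 'Turing'],
];
echo sketchSerializePersonNames($persons), "\n";        // "Last, First" order
echo sketchSerializePersonNames($persons, false), "\n"; // "First Last" order
```

Under these assumptions the first call joins the names in "Last, First" order and the second in "First Last" order, each pair separated by the person delimiter.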

at line 220
static string getNormalizedTitle(string $string)

Parameters

string $string

Return Value

string

at line 234
static string getNormalizedPersons(array|ArrayList $persons)

Parameters

array|ArrayList $persons – array of strings

Return Value

string [name1, name2, name3]

at line 247
static string normalizePersonList(array|ArrayList $persons)

Normalizes a collection of persons by normalizing their names and sorting them.

Parameters

array|ArrayList $persons – a list of persons.

Return Value

string A sorted set of normalized persons.
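
The normalize-then-sort step described above can be sketched as follows. This is an illustration only: the function name, the callable parameter used in place of normalizePerson, the removal of duplicates (suggested by the word "set"), and the bracketed output format (borrowed from the getNormalizedPersons return description) are assumptions rather than documented behavior.

```php
<?php
// Hypothetical sketch: normalize every name, drop duplicates, sort, and
// render the result as "[name1, name2, name3]".
function sketchNormalizePersonList(array $names, callable $normalize): string
{
    // Apply the normalizer to each name and remove duplicates.
    $normalized = array_unique(array_map($normalize, $names));

    // Sort as plain strings so the result does not depend on input order.
    sort($normalized, SORT_STRING);

    return '[' . implode(', ', $normalized) . ']';
}

// A trivial stand-in normalizer (lowercase and trim); the real class
// would use normalizePerson here.
$lowercase = function (string $name): string {
    return strtolower(trim($name));
};

echo sketchNormalizePersonList(['D.Knuth', 'a.turing ', 'd.knuth'], $lowercase), "\n";
```

Sorting after normalization means that two author lists containing the same people in a different order produce the same string, which is what makes the result usable as input to a hash.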

at line 282
static string normalizePerson(Person $person)

Used for "sloppy" hashes, i.e., the inter hash.

The person name is normalized according to the following scheme: x.last, where x is the first letter of the first name and last is the last name.

Example:

Donald E. Knuth        --> d.knuth
D.E.      Knuth        --> d.knuth
Donald    Knuth        --> d.knuth
          Knuth        --> knuth
Knuth, Donald          --> d.knuth
Knuth, Donald E.       --> d.knuth
Maarten de Rijke       --> m.rijke
Balby Marinho, Leandro --> l.marinho

Parameters

Person|string $person

Return Value

string
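
The x.last scheme documented for normalizePerson can be approximated for plain name strings as shown below. This is a sketch under stated assumptions, not the library's code: the function name is hypothetical, it takes a string rather than a Person, it handles ASCII letters only, and it treats the final whitespace-separated token of the last name as the last name, which is one reading that reproduces the documented examples.

```php
<?php
// Hypothetical sketch of the "x.last" scheme for plain name strings.
function sketchNormalizePerson(string $name): string
{
    $name = trim($name);

    if (strpos($name, ',') !== false) {
        // "Last, First" order: split at the first comma.
        [$last, $first] = array_map('trim', explode(',', $name, 2));
    } else {
        // "First Last" order: the final token is taken as the last name.
        $parts = preg_split('/\s+/', $name, -1, PREG_SPLIT_NO_EMPTY);
        $last  = (string) array_pop($parts);
        $first = implode(' ', $parts);
    }

    // Keep only the final token of a multi-word last name
    // ("Balby Marinho" becomes "Marinho").
    $lastTokens = preg_split('/\s+/', $last, -1, PREG_SPLIT_NO_EMPTY);
    $last = strtolower((string) end($lastTokens));

    // Prefix the first ASCII letter of the first name, if there is one.
    if (preg_match('/[A-Za-z]/', $first, $match) === 1) {
        return strtolower($match[0]) . '.' . $last;
    }

    return $last;
}

$examples = ['Donald E. Knuth', 'Knuth', 'Maarten de Rijke', 'Balby Marinho, Leandro'];
foreach ($examples as $example) {
    echo $example, ' --> ', sketchNormalizePerson($example), "\n";
}
```

Because middle names, initials, and name order are discarded, different spellings of the same author collapse to one key, which fits the "sloppy" matching the inter hash is meant to provide.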

at line 376
static string getNormalizedYear(string $year)

Parameters

string $year

Return Value

string

at line 386
static string getFirstPersonsLastName(array|ArrayList $persons)

Parameters

array|ArrayList $persons

Return Value

string first person's last name.