Parse a list of names from raw text into a dictionary of unique authors. [for use with LIS authority control]

lib-re/lib-name-parser

lib-name-parser

Description

short

Parse a list of names from raw text into a dictionary of unique authors. [for use with LIS authority control]

long

The lib-name-parser was originally intended for use with the dublin-core-text-parser. It will process a list of unformatted strings containing names (e.g. "Dr. Douglas Raymond Murrow III") and create a uniquely identified, combined object that can be used for matching.
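To make the parsing step concrete, here is a minimal Python sketch of splitting one raw string into parts. It assumes a salutation/first/middle/last/suffix layout of the kind name parsers in this family commonly use; the `parse_name` function, the `ParsedName` class, and the tiny `SALUTATIONS`/`SUFFIXES` lists are hypothetical illustrations, not this project's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical word lists for illustration; a real parser needs far larger ones.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr", "prof", "rev"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md", "esq"}


@dataclass
class ParsedName:
    salutation: str = ""
    first: str = ""
    middle: list[str] = field(default_factory=list)
    last: str = ""
    suffixes: list[str] = field(default_factory=list)


def _norm(token: str) -> str:
    """Lowercase a token and strip periods/commas so it can be looked up."""
    return token.lower().strip(".,")


def parse_name(raw: str) -> ParsedName:
    """Split a raw name string into salutation, first, middle, last, suffixes."""
    tokens = raw.replace(",", " ").split()
    result = ParsedName()
    # A leading title such as "Dr." becomes the salutation.
    if tokens and _norm(tokens[0]) in SALUTATIONS:
        result.salutation = tokens.pop(0)
    # Trailing generational/academic suffixes ("III", "PhD"); there may be several.
    while tokens and _norm(tokens[-1]) in SUFFIXES:
        result.suffixes.insert(0, tokens.pop())
    if not tokens:
        return result
    # First token is the first name, last token the surname, the rest middles.
    result.first = tokens[0]
    if len(tokens) > 1:
        result.last = tokens[-1]
        result.middle = tokens[1:-1]
    return result
```

For example, `parse_name("Dr. Douglas Raymond Murrow III")` yields salutation `Dr.`, first `Douglas`, middle `["Raymond"]`, last `Murrow`, and suffixes `["III"]`. Real-world input (particles like "van der", inverted "Murrow, Douglas", compound surnames) needs many more rules than this.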

This will (hopefully) enable linking similar names across multiple occurrences ("Douglas Murrow", "Douglas Ray Murrow", "Doug R. Murrow") and, prospectively, the detection of typos or misspellings ("Daug R. Marrow"). With some extra work, it could also automate the creation of authority records across issues and within collections.
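One possible way to link the variants above is to treat two names as compatible when their last names agree and each remaining part is either missing or a prefix of the other ("R." vs "Ray", "Doug" vs "Douglas"). The sketch below assumes titles and suffixes have already been stripped; `split_name`, `names_compatible`, and `group_variants` are hypothetical helpers, and prefix matching is a deliberately crude stand-in for a real nickname table.

```python
def split_name(raw: str) -> tuple[str, list[str], str]:
    """Naive split into (first, middles, last); assumes no titles or suffixes."""
    tokens = [t.strip(".").lower() for t in raw.split()]
    if not tokens:
        raise ValueError("empty name")
    return tokens[0], tokens[1:-1], tokens[-1]


def _part_compatible(a: str, b: str) -> bool:
    """Parts match if one is a prefix of the other.

    Covers initials ("r" vs "ray") and simple short forms ("doug" vs
    "douglas"), but not nicknames such as "bob" vs "robert".
    """
    return a.startswith(b) or b.startswith(a)


def names_compatible(a: str, b: str) -> bool:
    first_a, mid_a, last_a = split_name(a)
    first_b, mid_b, last_b = split_name(b)
    if last_a != last_b or not _part_compatible(first_a, first_b):
        return False
    # A missing middle name matches anything; otherwise compare position by position.
    return all(_part_compatible(x, y) for x, y in zip(mid_a, mid_b))


def group_variants(names: list[str]) -> list[list[str]]:
    """Greedily put each name into the first group whose members all match it."""
    groups: list[list[str]] = []
    for name in names:
        for group in groups:
            if all(names_compatible(name, member) for member in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

Running `group_variants(["Douglas Murrow", "Douglas Ray Murrow", "Doug R. Murrow", "Jane Doe"])` puts the three Murrow forms in one group and "Jane Doe" in another. The greedy grouping depends on input order, and "Daug R. Marrow" is not linked by this check alone; that needs a typo-tolerant comparison.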

The program follows the lineage of Josh Fraser's original php-name-parser and Garve Hays's Java port, NameParser. Its naming scheme is meant to reflect this genealogy.

Purpose

short

Automate the complex, time-consuming, and often mind-numbing process of manual authority control in an extensible, manageable way.

long

When going through any periodical or other run of similarly sourced items and logging metadata about its contributors, you often find small inconsistencies in how names are displayed.

Whether this comes in the form of typos/OCR errors (Doug Murrow <=> Dog Murrow), progressive revelation (Doug Murrow => Douglas Murrow => Douglas R. Murrow), or title acquisition (Douglas R. Murrow => Dr. Douglas R. Murrow, PhD), these minor changes over time can complicate searching and collocation in catalogs, databases, etc.
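Typos and OCR errors like "Dog Murrow" can be flagged with an edit-distance measure. The sketch below uses Levenshtein distance, which is a choice made here for illustration rather than something this project specifies; `levenshtein`, `possible_typo`, and its `max_distance=2` default are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def possible_typo(a: str, b: str, max_distance: int = 2) -> bool:
    """Flag two different name strings that are within a few edits of each other."""
    a_norm, b_norm = a.lower(), b.lower()
    return a_norm != b_norm and levenshtein(a_norm, b_norm) <= max_distance
```

`possible_typo("Doug Murrow", "Dog Murrow")` is `True` (one deleted letter), as is `possible_typo("Doug R. Murrow", "Daug R. Marrow")` (two substitutions). A fixed threshold produces false positives on short names, so flagged pairs are better treated as candidates for review than as automatic merges.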

This software is intended for use as an external library (in the software sense) by other applications, helping them combine these disparate references to the same author or individual under a single universal ID, object, and/or authority record (the authority record may or may not happen; we'll see), in a way that can be either monitored or fully automated.
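Here is a minimal sketch of the "dictionary of unique authors" idea, assuming each author gets a randomly generated UUID and a record that keeps every raw form seen. `AuthorRegistry` and `AuthorRecord` are hypothetical names, and the comparison function is passed in so that any matching rule (exact, compatibility-based, or typo-tolerant) can be plugged in.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuthorRecord:
    """Hypothetical authority record: one ID, a preferred form, every form seen."""
    author_id: str
    preferred: str
    variants: list[str] = field(default_factory=list)


class AuthorRegistry:
    """Dictionary of unique authors keyed by a generated ID (illustrative only)."""

    def __init__(self, matches: Callable[[str, str], bool]):
        self.matches = matches  # any name-comparison rule supplied by the caller
        self.records: dict[str, AuthorRecord] = {}

    def add(self, raw_name: str) -> str:
        """Attach raw_name to the first matching record, or start a new one."""
        for record in self.records.values():
            if self.matches(raw_name, record.preferred):
                if raw_name not in record.variants:
                    record.variants.append(raw_name)
                return record.author_id
        author_id = str(uuid.uuid4())
        # The first form seen becomes the preferred form in this sketch.
        self.records[author_id] = AuthorRecord(author_id, raw_name, [raw_name])
        return author_id
```

Keeping every variant alongside the preferred form means a merge never discards how a name originally appeared, which preserves provenance even when matching is automated.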

I intend to keep the program itself agnostic about the balance between standardization (fixing things so they match up) and provenance (retaining how they originally appeared) by providing the means to run it at varying levels of automation and with certain features turned off. Since this will be an open-source project, users can of course make more advanced modifications as needed.
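The varying levels of automation could be modeled as a small settings object. This is a hypothetical sketch: `Mode`, `ParserSettings`, `route_merges`, and `display_form` are illustrative names, and the similarity score is assumed to come from some earlier matching step.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    MANUAL = "manual"        # every proposed merge goes to a human
    ASSISTED = "assisted"    # confident merges are applied, the rest are queued
    AUTOMATIC = "automatic"  # every proposed merge is applied


@dataclass
class ParserSettings:
    """Hypothetical options; none of these names come from the project itself."""
    mode: Mode = Mode.ASSISTED
    auto_merge_score: float = 0.9  # minimum similarity for an ASSISTED auto-merge
    normalize_forms: bool = False  # True: show standardized form; False: keep original


@dataclass
class MergeDecisions:
    applied: list[tuple[str, str]] = field(default_factory=list)
    review_queue: list[tuple[str, str]] = field(default_factory=list)


def route_merges(candidates: list[tuple[str, str, float]],
                 settings: ParserSettings) -> MergeDecisions:
    """Sort (name_a, name_b, similarity) candidates into applied vs needs-review."""
    decisions = MergeDecisions()
    for a, b, score in candidates:
        auto = settings.mode is Mode.AUTOMATIC or (
            settings.mode is Mode.ASSISTED and score >= settings.auto_merge_score
        )
        (decisions.applied if auto else decisions.review_queue).append((a, b))
    return decisions


def display_form(original: str, standardized: str, settings: ParserSettings) -> str:
    """Standardization vs provenance: choose which form a record displays."""
    return standardized if settings.normalize_forms else original
```

With the default `ASSISTED` mode, a candidate pair scoring 0.95 is merged automatically while one scoring 0.7 goes to the review queue; `MANUAL` queues everything and `AUTOMATIC` applies everything.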

Background

Name parsing is a very common operation in software development. I wanted to find (or create) a standard algorithm for doing this with LIS software, but found that @joshfraser had already created one in PHP and JavaScript (links below).

Looking into these derivatives, I first forked @gkhays's NameParser, but found that the time/occurrences component would make this a much heftier implementation, and that it would be wiser to create my own from scratch than to submit a pull request with a gigantic alteration.

As mentioned, this software follows the authorial lineage from Josh to Garve to me. The application of this algorithm to library authority control is, to my knowledge, a novel and meaningful contribution.

External Links

names (general)

software/algorithm

authority control (LIS)

Note: I came up with a very similar algorithm and parsing scheme independently of these sources, but given the clarity of their documentation, the prior existence of their implementations, and their usefulness to me, it would be dishonest to claim full credit for it.
