repeated label on valid name format: Surname, Givenname MiddleName #90

Open
matthoskins1980 opened this issue Oct 5, 2019 · 1 comment

@matthoskins1980

ORIGINAL STRING: Bianchette, Michael David
PARSED TOKENS: [('Bianchette,', 'Surname'), ('Michael', 'GivenName'), ('David', 'Surname')]
UNCERTAIN LABEL: Surname

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

To report an error in labeling a valid name, open an issue at https://github.com/datamade/probablepeople/issues/new - it'll help us continue to improve probablepeople!
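For readers unfamiliar with this error, here is a minimal sketch of the kind of check that produces it, assuming the parser rejects any label that reappears after a different label has intervened. The function name `find_repeated_label` is hypothetical and this is not probablepeople's actual code; it only reproduces the reported behavior on the tokens above.

```python
from collections import OrderedDict


def find_repeated_label(parsed_tokens):
    """Return the first label that reappears after a different label, else None.

    Hypothetical helper: a simplified stand-in for the repeated-label check,
    not the library's implementation.
    """
    seen = OrderedDict()
    prev_label = None
    for token, label in parsed_tokens:
        if label == prev_label:
            # Consecutive tokens with the same label are merged, not an error.
            seen[label].append(token)
        elif label not in seen:
            seen[label] = [token]
        else:
            # The label was used earlier, then interrupted by another label.
            return label
        prev_label = label
    return None


# Tokens copied from the report above.
parsed = [("Bianchette,", "Surname"), ("Michael", "GivenName"), ("David", "Surname")]
print(find_repeated_label(parsed))  # prints: Surname
```

In this reading, the name fails because "David" was labeled Surname instead of MiddleName, so Surname appears on both sides of GivenName.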

@jbrezovan

Hi, all. Has there been any follow-up on this issue? I am seeing it as well. Out of a dataset of 320,000 names, probablepeople had trouble parsing about 19,000, and 11,000 of those failed because of this exact issue.

I tried following parserator's instructions for training the model with additional examples: I used parserator's label utility to create 11 examples and then trained my model with them. parserator says it wrote out an updated .crfsuite file, but I do not see an updated copy of this file anywhere, and the model's behavior has not changed. (The only .crfsuite files I see are the three that were installed with probablepeople, and they have retained their original last-modified timestamps.)
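One way to check whether training wrote the model file somewhere unexpected (for example, a working directory rather than the installed package directory) is to search a directory tree for `.crfsuite` files and compare their modification times. A minimal sketch using only the standard library; the helper name `find_crfsuite_files` is hypothetical, and where parserator actually writes its output is not confirmed here.

```python
import os
import time


def find_crfsuite_files(root):
    """Yield (path, last-modified timestamp) for every .crfsuite file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".crfsuite"):
                path = os.path.join(dirpath, name)
                yield path, os.path.getmtime(path)


if __name__ == "__main__":
    # Assumption: "." is the directory training was run from; change as needed.
    matches = sorted(find_crfsuite_files("."), key=lambda item: item[1], reverse=True)
    for path, mtime in matches:
        # Newest files are listed first.
        print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)), path)
```

Running it once from the training directory and once from the Python environment's package directory would show whether a newer file exists outside the installed copy.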
