Skip to content

Spacy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)

License

Notifications You must be signed in to change notification settings

jenojp/extractacy

Repository files navigation

extractacy - pattern extraction and named entity linking for spaCy

Build Status Built with spaCy Code style: black pypi Version DOI

spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results)

Installation and usage

Install the library.

pip install extractacy

Import library and spaCy.

import spacy
from spacy.pipeline import EntityRuler
from extractacy.extract import ValueExtractor

Load spacy language model. Set up an EntityRuler for the example.

nlp = spacy.load("en_core_web_sm")
# Set up entity ruler
ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temperature"}]},
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temp"}]},
    {
        "label": "DISCHARGE_DATE",
        "pattern": [{"LOWER": "discharge"}, {"LOWER": "date"}],
    },
    
]
ruler.add_patterns(patterns)

Define which entities you would like to link patterns to. Each entity needs 3 things:

  1. patterns to search for (list). This relies on spaCy token matching syntax.
  2. n_tokens to search around a named entity (int or sent)
  3. direction (right, left, both)
# Define ent_patterns for value extraction
ent_patterns = {
    "DISCHARGE_DATE": {"patterns": [[{"SHAPE": "dd/dd/dddd"}],[{"SHAPE": "dd/d/dddd"}]],"n": 2, "direction": "right"},
    "TEMP_READING": {"patterns": [[
                        {"LIKE_NUM": True},
                        {"LOWER": {"IN": ["f", "c", "farenheit", "celcius", "centigrade", "degrees"]}
                        },
                    ]
                ],
                "n": "sent",
                "direction": "both"
        },
}

Add ValueExtractor to spaCy processing pipeline

nlp.add_pipe("valext", config={"ent_patterns":ent_patterns}, last=True)

doc = nlp("Discharge Date: 11/15/2008. Patient had temp reading of 102.6 degrees.")
for e in doc.ents:
    if e._.value_extract:
        print(e.text, e.label_, e._.value_extract)
        
## Discharge Date DISCHARGE_DATE 11/15/2008
## temp reading TEMP_READING 102.6 degrees

Contributing

contributing

Authors

  • Jeno Pizarro

License

license

About

Spacy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages