
Document Markings in gerbil.nif.transfer

Michael Röder edited this page Nov 10, 2015 · 5 revisions

A NIF document object can contain 'Markings' that carry additional information about individual parts of the document, e.g., the position and URL of a named entity. This article presents the interfaces and classes that the gerbil.nif.transfer library offers. Where possible, it also describes the RDF triples into which an instance of an interface or a class is translated.

The following RDF prefixes are used in this article:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .

Overview

The following UML diagram shows the interfaces and classes described in this article, together with their relations.

[Figure: hierarchy of marking interfaces and classes]

Interfaces

The interfaces described in this section can be found in the package org.aksw.gerbil.transfer.nif.

Marking

This is the most general interface and defines no methods of its own. Implementing only this interface is of little use; implement one of the more specific interfaces described below instead.

Meaning

This is a marking that carries a meaning, expressed either as a single URI or as a set of URIs. Note that if a single meaning contains a set of URIs, GERBIL assumes that these URIs can be connected with owl:sameAs, i.e., that they all identify the same entity.

Span

Classes implementing this interface mark a certain part of the text of a document. They have a start position and a length, both measured in Java characters. Following the conventions in Java, the position end = start + length is the position of the first character after the span.
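The offset convention above can be illustrated with a minimal, self-contained sketch. The class SpanOffsets and its methods are hypothetical helpers written for this article and are not part of gerbil.nif.transfer. The sketch also shows that, because positions are counted in Java characters (UTF-16 code units), a character outside the Basic Multilingual Plane occupies two positions.

```java
// Hypothetical helper for illustration only; not part of gerbil.nif.transfer.
final class SpanOffsets {

    // Exclusive end position: the index of the first char after the span.
    static int end(int start, int length) {
        return start + length;
    }

    // Text covered by a span, using Java's char-based (UTF-16) indexing.
    static String coveredText(String text, int start, int length) {
        return text.substring(start, end(start, length));
    }

    public static void main(String[] args) {
        String text = "Visit Berlin today.";
        System.out.println(end(6, 6));               // 12
        System.out.println(coveredText(text, 6, 6)); // Berlin

        // The emoji U+1F600 is a surrogate pair and occupies two Java chars,
        // so "Berlin" starts at position 3 instead of 2.
        String withEmoji = "\uD83D\uDE00 Berlin";
        System.out.println(withEmoji.length());           // 9
        System.out.println(coveredText(withEmoji, 3, 6)); // Berlin
    }
}
```

Offsets produced by tools that count Unicode code points or bytes would therefore need to be converted before they are used as span positions.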

Instances of this interface are translated into RDF nodes with their own URI. They are connected to the RDF node of the document using the nif:referenceContext property. Following the NIF standard, the start and end positions are encoded in the node URI and are also attached to the node itself using the nif:beginIndex and nif:endIndex properties, respectively.
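As a rough sketch of this translation, the following self-contained Java snippet renders the triples of a span as Turtle text. It does not use the library's own writer. The class name, the example URIs, the "#char=begin,end" fragment scheme (the RFC 5147 style used by NIF) and the xsd:nonNegativeInteger datatype are assumptions made for illustration; the exact output of gerbil.nif.transfer may differ.

```java
// Illustrative sketch only; not the serializer used by gerbil.nif.transfer.
final class SpanTurtleSketch {

    // Assumption: the span URI is the document URI plus "#char=begin,end".
    static String spanUri(String documentUri, int start, int length) {
        int end = start + length; // exclusive end position
        return documentUri + "#char=" + start + "," + end;
    }

    // Renders the three properties named in the text for a single span.
    static String toTurtle(String documentUri, String contextNodeUri,
                           int start, int length) {
        int end = start + length;
        return "<" + spanUri(documentUri, start, length) + ">\n"
             + "    nif:referenceContext <" + contextNodeUri + "> ;\n"
             + "    nif:beginIndex \"" + start + "\"^^xsd:nonNegativeInteger ;\n"
             + "    nif:endIndex \"" + end + "\"^^xsd:nonNegativeInteger .";
    }

    public static void main(String[] args) {
        // Hypothetical document URI and context node for a 19-character text.
        System.out.println(toTurtle(
                "http://example.org/doc1",
                "http://example.org/doc1#char=0,19",
                6, 6));
    }
}
```

For a span with start 6 and length 6, the sketch yields a node named http://example.org/doc1#char=6,12 with nif:beginIndex 6 and nif:endIndex 12.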

TypedMarking

A typed marking contains a set of types. Note that there is no class that implements this interface without also implementing one of the other interfaces.

ScoredMarking

A scored marking contains a confidence score assigned by the annotator. The higher this score, the more confident the annotator is that the marking is correct. Note that there is no class that implements this interface without also implementing one of the other interfaces.

MeaningSpan

This interface is the combination of the Meaning and the Span interfaces.

TypedSpan

This interface is the combination of the TypedMarking and the Span interfaces.

ScoredSpan

This interface is the combination of the ScoredMarking and the Span interfaces.
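The way these interfaces combine can be sketched as follows. This is a simplified mirror of the hierarchy written for this article, not the library source: the interface names follow the text above, but all method names and the SimpleMeaningSpan class are assumptions.

```java
import java.util.Collections;
import java.util.Set;

// Simplified sketch of the hierarchy; method names are assumptions and
// may not match the real interfaces in org.aksw.gerbil.transfer.nif.
final class MarkingHierarchySketch {

    interface Marking { }

    interface Meaning extends Marking {
        Set<String> getUris();
    }

    interface Span extends Marking {
        int getStartPosition();
        int getLength();
    }

    interface TypedMarking extends Marking {
        Set<String> getTypes();
    }

    interface ScoredMarking extends Marking {
        double getConfidence();
    }

    // The combined interfaces add no methods; they only join two parents.
    interface MeaningSpan extends Meaning, Span { }
    interface TypedSpan extends TypedMarking, Span { }
    interface ScoredSpan extends ScoredMarking, Span { }

    // Hypothetical minimal implementation used for the demonstration below.
    static final class SimpleMeaningSpan implements MeaningSpan {
        private final int start;
        private final int length;
        private final Set<String> uris;

        SimpleMeaningSpan(int start, int length, String uri) {
            this.start = start;
            this.length = length;
            this.uris = Collections.singleton(uri);
        }

        @Override public Set<String> getUris() { return uris; }
        @Override public int getStartPosition() { return start; }
        @Override public int getLength() { return length; }
    }

    public static void main(String[] args) {
        Marking marking = new SimpleMeaningSpan(6, 6, "http://dbpedia.org/resource/Berlin");
        System.out.println(marking instanceof Span);          // true
        System.out.println(marking instanceof Meaning);       // true
        System.out.println(marking instanceof ScoredMarking); // false
    }
}
```

Code that consumes markings can thus test for a capability (position, meaning, types, score) independently of the concrete class.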

Classes

The classes described in this section can be found in the package org.aksw.gerbil.transfer.nif.data.

Annotation

This class implements the Meaning interface. It is used to add a general topic to a document, e.g., for the C2KB task. Instances of this class are added as RDF nodes with their own URI. The document references them using the nif:topic property.

ScoredAnnotation

This class extends the Annotation class by implementing the ScoredMarking interface and adding a confidence score to the annotation. In the RDF graph, the confidence score is added to the annotation using the itsrdf:taConfidence property.
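The triples described for Annotation and ScoredAnnotation can be sketched with the following self-contained snippet, which builds Turtle text by hand. The class name, the example URIs and the xsd:double datatype of the confidence value are assumptions made for illustration. Only the two properties named above are emitted; other triples that the library may write for an annotation are left out.

```java
// Illustrative sketch only; not the serializer used by gerbil.nif.transfer.
final class AnnotationTurtleSketch {

    // Pass confidence == null for a plain (unscored) annotation.
    static String toTurtle(String documentNodeUri, String annotationNodeUri,
                           Double confidence) {
        StringBuilder turtle = new StringBuilder();
        // The document references the annotation node via nif:topic.
        turtle.append("<").append(documentNodeUri).append("> nif:topic <")
              .append(annotationNodeUri).append("> .");
        if (confidence != null) {
            // Assumption: the score is written as an xsd:double literal.
            turtle.append("\n<").append(annotationNodeUri)
                  .append("> itsrdf:taConfidence \"").append(confidence)
                  .append("\"^^xsd:double .");
        }
        return turtle.toString();
    }

    public static void main(String[] args) {
        // Hypothetical node URIs.
        System.out.println(toTurtle(
                "http://example.org/doc1#char=0,19",
                "http://example.org/doc1#annotation0",
                0.9));
    }
}
```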

SpanImpl

This class implements the Span interface.

ScoredSpanImpl

This class extends the SpanImpl class and implements the ScoredSpan interface.

NamedEntity

This class represents a named entity inside the text. It extends the SpanImpl class and implements the MeaningSpan interface. In the RDF graph, the URI(s) of the named entity are added to the Span RDF node using the itsrdf:taIdentRef property.

ScoredNamedEntity

This class extends the NamedEntity class by implementing the ScoredMarking interface and adding a confidence score to the named entity. In the RDF graph, the confidence score is added to the named entity's RDF node using the itsrdf:taConfidence property.

TypedNamedEntity

This class extends the NamedEntity class by implementing the TypedMarking interface and adding a set of types to the named entity. In the RDF graph, the type URI(s) are added to the Span RDF node using the itsrdf:taClassRef property.

ScoredTypedNamedEntity

This class extends the TypedNamedEntity class by implementing the ScoredMarking interface and adding a confidence score to the named entity. In the RDF graph, the confidence score is added to the named entity's RDF node using the itsrdf:taConfidence property.
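To show how the properties of the named entity classes add up on a single Span RDF node, here is a minimal sketch that renders them as Turtle text. It is not the library's serializer. The class name, the example URIs and the xsd:double datatype are assumptions; the property names are the ones given in the descriptions above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; not the serializer used by gerbil.nif.transfer.
final class NamedEntityTurtleSketch {

    // typeUris may be empty and confidence may be null, which corresponds
    // to a plain NamedEntity without types or score.
    static String toTurtle(String spanNodeUri, List<String> entityUris,
                           List<String> typeUris, Double confidence) {
        if (entityUris.isEmpty()) {
            throw new IllegalArgumentException("a named entity needs at least one URI");
        }
        List<String> statements = new ArrayList<>();
        for (String uri : entityUris) {
            statements.add("itsrdf:taIdentRef <" + uri + ">");
        }
        for (String type : typeUris) {
            statements.add("itsrdf:taClassRef <" + type + ">");
        }
        if (confidence != null) {
            // Assumption: the score is written as an xsd:double literal.
            statements.add("itsrdf:taConfidence \"" + confidence + "\"^^xsd:double");
        }
        return "<" + spanNodeUri + ">\n    "
             + String.join(" ;\n    ", statements) + " .";
    }

    public static void main(String[] args) {
        // Scored and typed named entity on a hypothetical span node.
        System.out.println(toTurtle(
                "http://example.org/doc1#char=6,12",
                Arrays.asList("http://dbpedia.org/resource/Berlin"),
                Arrays.asList("http://dbpedia.org/ontology/City"),
                0.9));

        // Plain named entity: only itsrdf:taIdentRef is written.
        System.out.println(toTurtle(
                "http://example.org/doc1#char=6,12",
                Arrays.asList("http://dbpedia.org/resource/Berlin"),
                Collections.<String>emptyList(),
                null));
    }
}
```

These statements are attached to the same node that carries the span's position properties, so a complete entity node would also include nif:referenceContext, nif:beginIndex and nif:endIndex.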

TypedSpanImpl

This class represents a part of the text, for which no meaning (in terms of one or more URIs) but a list of types is available. In the RDF graph, the type URI(s) are added to the Span RDF node using the itsrdf:taClassRef property.