
GiulioC/REPOSUM-Classification


Semantically Aware Text Categorization for Metadata Annotation


This public repository stores the code for the article submitted to the 15th Italian Research Conference on Digital Libraries (IRCDL 2019, Pisa).

Abstract

In this paper we illustrate a system aimed at solving a long-standing and challenging problem: acquiring a classifier to automatically annotate bibliographic records, starting from a huge set of unbalanced and unlabelled data. We illustrate the main features of the dataset, the learning algorithm adopted, and how it was used to discriminate philosophical documents from documents of other disciplines. One strength of our approach lies in the novel combination of a standard learning approach with a semantic one: the results of the acquired classifier are improved by accessing a semantic network containing conceptual information. We describe the experimentation, including the rationale behind the construction of the training and test sets, report and discuss the results obtained, and conclude by outlining future work.

Keywords

Text categorization, Lexical resources, Semantics, NLP, Language models
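The abstract describes refining a standard classifier's output with conceptual information from a semantic network. The sketch below is only a toy illustration of that general idea, not the method implemented in this repository: it assumes a hypothetical `TOY_SEMANTIC_NETWORK` dictionary mapping terms to domains, and a hypothetical `combine` rule that blends a classifier probability with the share of document terms whose concepts fall in the target domain ("philosophy"). The weight and threshold values are arbitrary assumptions.

```python
from dataclasses import dataclass

# Hypothetical toy "semantic network": maps a term to the domains its
# concepts belong to. Illustrative only; NOT the resource used in the paper.
TOY_SEMANTIC_NETWORK: dict[str, set[str]] = {
    "epistemology": {"philosophy"},
    "ontology": {"philosophy", "computer_science"},
    "ethics": {"philosophy"},
    "protein": {"biology"},
    "algorithm": {"computer_science"},
}


@dataclass
class Decision:
    label: str
    score: float


def semantic_support(tokens: list[str], domain: str) -> float:
    """Fraction of tokens known to the network whose concepts belong to `domain`."""
    known = [t for t in tokens if t in TOY_SEMANTIC_NETWORK]
    if not known:
        return 0.0  # no conceptual evidence either way
    hits = sum(1 for t in known if domain in TOY_SEMANTIC_NETWORK[t])
    return hits / len(known)


def combine(classifier_prob: float, tokens: list[str],
            domain: str = "philosophy", weight: float = 0.3,
            threshold: float = 0.5) -> Decision:
    """Hypothetical rule: linearly blend the classifier probability with semantic support."""
    score = (1 - weight) * classifier_prob + weight * semantic_support(tokens, domain)
    label = domain if score >= threshold else "other"
    return Decision(label, score)


if __name__ == "__main__":
    # A borderline classifier probability pushed over the threshold by conceptual evidence.
    print(combine(0.45, ["epistemology", "ethics", "kant"]))
    # The same probability with no philosophical concepts stays below the threshold.
    print(combine(0.45, ["protein", "algorithm"]))
```

In this toy setting, a record the classifier scores at 0.45 is labelled "philosophy" when its terms map to philosophical concepts and "other" when they do not. A linear blend is used here only because it is simple to read; the paper's actual combination strategy may differ.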