I recently learned that the TEI XML format is becoming popular in the linguistics community. In this format, texts are stored in small chunks with associated metadata (e.g. the speaker) and, sometimes, POS tags.
This would be cool, not least because tools like GROBID can parse out things like references, headers, and footers and save the result as TEI XML. [I'm just starting to look into quanteda, so sorry if quanteda can already do this natively.]
See:
https://tei-c.org/
https://tei-c.org/activities/projects/
https://dracor.org/
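To make the "small chunks with metadata" idea concrete, here is a minimal sketch in Python that reads speaker-tagged chunks and POS tags from a TEI-style fragment. The sample document is hand-written for illustration and is not taken from any real corpus. It assumes the standard TEI namespace, `<sp who="...">` for speeches, and `<w pos="...">` for tagged tokens. The function name `read_speeches` is hypothetical, and this is not how quanteda or any other package handles TEI.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; elements must be looked up with this prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample. Real TEI files also carry a <teiHeader>
# and usually far more markup than shown here.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <sp who="#hamlet">
        <speaker>Hamlet</speaker>
        <l><w pos="PRON">I</w> <w pos="VERB">doubt</w> <w pos="PRON">it</w></l>
      </sp>
      <sp who="#horatio">
        <speaker>Horatio</speaker>
        <l><w pos="VERB">Look</w> <w pos="ADV">there</w></l>
      </sp>
    </body>
  </text>
</TEI>"""


def read_speeches(xml_text):
    """Return one dict per <sp> chunk: speaker id, tokens, and POS tags."""
    root = ET.fromstring(xml_text)
    speeches = []
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        # Collect every tagged token inside this speech, in document order.
        words = sp.findall(".//tei:w", TEI_NS)
        speeches.append({
            # The who attribute is a pointer such as "#hamlet"; drop the "#".
            "speaker": sp.get("who", "").lstrip("#"),
            "tokens": [w.text for w in words],
            "pos": [w.get("pos") for w in words],
        })
    return speeches


for speech in read_speeches(SAMPLE):
    print(speech["speaker"], list(zip(speech["tokens"], speech["pos"])))
```

Each chunk maps naturally onto one document plus document-level variables (here, the speaker), which is roughly the shape a corpus importer would need to produce.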