julsraemy/loud-consistency

LOUD Consistency

Initially, there were several ideas on how the consistency of Linked Open Usable Data (LOUD), namely the Linked Art and International Image Interoperability Framework (IIIF) data on the LUX platform, could be verified according to several factors (syntax, patterns, compliance with the APIs). This repository documents the actions undertaken on data validation and consistency in the context of a PhD thesis on LOUD for Cultural Heritage.

Linked Art

For the first action, a compressed slice of the Linked Art data from LUX: Yale Collections Discovery has been provided. For the second and third actions, which relate purely to LUX, a CSV has been provided.

Syntax

  • Syntax validation against the Linked Art Schema Definitions - Linked Art specification <-> Data [Automatable] - run on 1/24th of the LUX records (24 being the highest number of parallel processes). The JSONL file can be extracted using lux_jsonl_extractor.py. See lux-ectractor.

A fork of the Linked Art JSON Validator was leveraged.
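As a rough illustration of what a first syntax pass over the JSONL file could look like, the sketch below checks that each line parses as JSON and carries a few top-level keys. It is not the Linked Art JSON Validator and does not apply the Linked Art schemas; the `REQUIRED_KEYS` tuple, the `check_jsonl_lines` helper, and the sample records are assumptions made for this example.

```python
import json

# Assumed minimal set of top-level keys; the real Linked Art schemas are far stricter.
REQUIRED_KEYS = ("@context", "id", "type")


def check_jsonl_lines(lines):
    """Return (line_number, problem) tuples for records that fail the basic checks."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append((number, "missing keys: " + ", ".join(missing)))
    return problems


# Hypothetical records standing in for lines of the extracted JSONL file.
sample = [
    '{"@context": "https://linked.art/ns/v1/linked-art.json", '
    '"id": "https://example.org/object/1", "type": "HumanMadeObject"}',
    '{"id": "https://example.org/object/2", "type": "HumanMadeObject"}',
    '{"id": ',
]

for number, problem in check_jsonl_lines(sample):
    print(number, problem)
```

A pass like this only catches malformed lines and missing keys; conformance to the Linked Art Schema Definitions still requires a schema-aware validator such as the one mentioned above.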

Unit to unit consistency (YCBA and YUAG)

  • To some extent, first by hand with a couple of instances and through thumb drives.

Scripts and Outputs

  • cross_unit_terms.csv: This CSV file contains terms that are present in both the YCBA and YUAG digital collections.
  • getty-scraping.py: This Python script fetches term labels from the Getty API for terms present in both the YCBA and YUAG collections. The script also adds the Linked Art entity endpoint (e.g., place, person) to the CSV file.
  • intersection-ycba-yuag-vocab.csv: The output CSV file generated by the getty-scraping.py script, containing the terms, their labels from the Getty API, and their respective Linked Art entity endpoints.
  • venn.py: This Python script generates an UpSet plot to visualize the overlap and intersection of Getty terms in the YCBA and YUAG digital collections. It also includes the number of terms from AAT, ULAN, and TGN.
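The overlap described above can be sketched as a set intersection followed by a per-vocabulary count. This is a minimal illustration, not the code of getty-scraping.py or venn.py: the helper names (`getty_vocab`, `shared_terms`) and the term lists are assumptions, and the identifiers are only illustrative. It relies on Getty URIs having the form `http://vocab.getty.edu/<vocabulary>/<identifier>`.

```python
from collections import Counter

GETTY_VOCABS = ("aat", "ulan", "tgn")


def getty_vocab(uri):
    """Return 'aat', 'ulan' or 'tgn' for a Getty vocabulary URI, else None."""
    prefix = "vocab.getty.edu/"
    position = uri.find(prefix)
    if position == -1:
        return None
    vocab = uri[position + len(prefix):].split("/", 1)[0]
    return vocab if vocab in GETTY_VOCABS else None


def shared_terms(ycba_terms, yuag_terms):
    """Return the terms used by both units and a count per Getty vocabulary."""
    shared = set(ycba_terms) & set(yuag_terms)
    counts = Counter()
    for uri in shared:
        vocab = getty_vocab(uri)
        if vocab is not None:
            counts[vocab] += 1
    return shared, counts


# Illustrative term lists; the real ones would come from the collection data.
ycba = [
    "http://vocab.getty.edu/aat/300033618",
    "http://vocab.getty.edu/ulan/500012368",
    "http://vocab.getty.edu/tgn/7011781",
]
yuag = [
    "http://vocab.getty.edu/aat/300033618",
    "http://vocab.getty.edu/tgn/7011781",
    "http://vocab.getty.edu/aat/300015045",
]

shared, counts = shared_terms(ycba, yuag)
print(sorted(shared))
print(dict(counts))
```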

Consistency of concepts between LUX and the Getty Vocabularies

  • Initial idea: compare LUX with external Linked Art data describing the same concepts, i.e. the Getty JSON-LD representations of the AAT, ULAN, and TGN [To some extent, first by hand with a couple of instances]

Scripts

  • vocabs-lux-alignment-trimmed.csv: This CSV file contains terms aligned between LUX and the Getty Vocabularies (AAT, ULAN, TGN), specifically for concepts present in either the YCBA or YUAG collections.
  • lux_jsonl_extractor.py: A Python script to extract and analyze LUX data, focusing on JSON-LD representation of concepts and their alignment with the Getty Vocabularies.
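One way to find candidate alignments is to look for Getty URIs among the entities a record declares as equivalent. The sketch below assumes that LUX records list such links under the Linked Art `equivalent` property, each entry carrying an `id`; it is not the logic of lux_jsonl_extractor.py, and the `getty_equivalents` helper and the sample record are hypothetical.

```python
import json


def getty_equivalents(record):
    """Collect the Getty URIs listed under the record's `equivalent` property."""
    uris = []
    for entry in record.get("equivalent", []):
        uri = entry.get("id", "") if isinstance(entry, dict) else ""
        if "vocab.getty.edu/" in uri:
            uris.append(uri)
    return uris


# Hypothetical JSONL line standing in for a LUX concept record.
line = json.dumps({
    "id": "https://example.org/concept/1",
    "type": "Type",
    "_label": "Example concept",
    "equivalent": [
        {"id": "http://vocab.getty.edu/aat/300033618", "type": "Type"},
        {"id": "https://example.org/other/42", "type": "Type"},
    ],
})

record = json.loads(line)
for uri in getty_equivalents(record):
    # Each row pairs the local record with one Getty concept to compare against.
    print(record["id"], record.get("_label", ""), uri)
```

Each resulting pair could then be checked by hand or by script against the Getty JSON-LD representation of the same concept.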

IIIF

Testing how consistent the IIIF resources within LUX (V2.1 and 3.0) are, with the help of the Presentation API Validator and a dedicated shell script.

Scripts

  • validate_urls.sh: A shell script to validate IIIF Presentation API URLs using the Presentation API Validator.
  • lux-iiif-analysis.py: A Python script to create data visualisations based on the JSONL output file.
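Before a manifest is sent to a validator, it helps to know which major version of the Presentation API it declares. The sketch below reads that from the manifest's `@context`, which may be a single string or a list. It is an illustration only, not part of validate_urls.sh or lux-iiif-analysis.py; the `presentation_version` helper and the sample manifests are assumptions.

```python
# JSON-LD context URIs published for the IIIF Presentation API.
V2_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
V3_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def presentation_version(manifest):
    """Return '2' or '3' for the declared Presentation API major version, else None."""
    context = manifest.get("@context")
    if isinstance(context, str):
        contexts = [context]
    elif isinstance(context, list):
        contexts = context
    else:
        contexts = []
    if V3_CONTEXT in contexts:
        return "3"
    if V2_CONTEXT in contexts:
        return "2"
    return None


# Hypothetical, heavily truncated manifests.
manifest_v2 = {
    "@context": V2_CONTEXT,
    "@id": "https://example.org/iiif/1/manifest",
    "@type": "sc:Manifest",
}
manifest_v3 = {
    "@context": ["http://www.w3.org/ns/anno.jsonld", V3_CONTEXT],
    "id": "https://example.org/iiif/2/manifest",
    "type": "Manifest",
}

print(presentation_version(manifest_v2))
print(presentation_version(manifest_v3))
```

Note that the version 2 context does not distinguish 2.0 from 2.1, so only the major version can be inferred this way.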

Additional Information

For more detailed information on each script and its usage, refer to the respective README files located in the subdirectories.
