This repository contains code to finetune a variety of language models (LMs) with under one billion parameters on different datasets.
There are 4 evaluation datasets available right now:
- An organizational demo graph
- Sample data from CoyPu
- QALD10
- LC_QUAD
To run the training for 20 epochs on the CoyPu dataset, for example, execute the following:
python train.py --num-epochs 20 --dataset coypu
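The command-line interface suggested by this call could be declared with argparse roughly as below. This is only a sketch of the two flags shown above; the real train.py may define additional flags, defaults, or allowed values.

```python
import argparse

# Minimal sketch of the CLI implied by the invocation above;
# the actual train.py may differ in defaults and extra options.
parser = argparse.ArgumentParser(
    description="Finetune a small LM on a Text2SPARQL dataset"
)
parser.add_argument("--num-epochs", type=int,
                    help="number of training epochs")
parser.add_argument("--dataset",
                    help="name of the dataset to train on, e.g. coypu")

args = parser.parse_args(["--num-epochs", "20", "--dataset", "coypu"])
print(args.num_epochs, args.dataset)  # → 20 coypu
```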
The training will pause every five iterations to evaluate the model. Each evaluation creates JSON files under ./results
that contain the generated SPARQL query alongside the gold standard, using the following naming scheme:
f"{model_checkpoint}_{dataset_name}_{run_id}_{evaluation_step}.json"
So for example, Babelscape_mrebel-base_coypu_R01_12.json
contains queries generated by the model Babelscape/mREBEL-base, trained on the CoyPu dataset, during the 12th evaluation step of run R01. The run ID identifies which JSON files belong together during evaluation and can be chosen arbitrarily by the user.
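A result file name can be split back into its four components. The helper below is a sketch (not part of the repository): since a model checkpoint like Babelscape_mrebel-base may itself contain underscores, it splits from the right, assuming the dataset name, run ID, and evaluation step contain none.

```python
from pathlib import Path

def parse_result_filename(name):
    """Split a result file name into (model, dataset, run_id, step).

    Splits from the right because the model checkpoint may contain
    underscores; assumes the other three components do not.
    """
    stem = Path(name).stem  # drop the .json suffix
    model_checkpoint, dataset_name, run_id, step = stem.rsplit("_", 3)
    return model_checkpoint, dataset_name, run_id, int(step)

parse_result_filename("Babelscape_mrebel-base_coypu_R01_12.json")
# → ('Babelscape_mrebel-base', 'coypu', 'R01', 12)
```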
To actually evaluate the results, run the following:
python eval.py --num-epochs 20 --dataset coypu
The dataset parameter tells the eval script which dataset to run the queries against, and the number of epochs tells it how many JSON files to expect per model. The script prints some debug output that was only used during development; you can ignore it. After the script is done, you will find a file called total_{dataset}.json
in the results folder, for example ./results/total_coypu.json
which contains, for each model, the number of correctly translated questions.
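The totals file can then be consumed programmatically. The sketch below assumes total_{dataset}.json maps each model name to its count of correctly translated questions; the actual layout of the file may differ.

```python
import json

def best_model(totals_path):
    """Return the (model, count) pair with the most correctly
    translated questions.

    Assumes the totals file is a flat JSON object mapping model
    names to integer counts -- an assumption, not a documented format.
    """
    with open(totals_path) as f:
        totals = json.load(f)
    return max(totals.items(), key=lambda kv: kv[1])
```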
Running make
gives you an overview of all the available tasks.
First you need to build the Docker image locally:
docker build -t akws/lms4text2sparql .
Then you can run the benchmark for a given dataset, e.g. the organizational demo graph, with:
make run-orga
If you want to reference our work, please cite the related paper "Leveraging small language models for Text2SPARQL tasks to improve the resilience of AI assistance" by Felix Brei et al., accepted for publication at the workshop D2R2@ESWC2024:
@InProceedings{Brei2024LeveragingSmallLanguage,
author = {Brei, Felix and Frey, Johannes and Meyer, Lars-Peter},
booktitle = {Proceedings of the Third International Workshop on Linked Data-driven Resilience Research 2024 (D2R2'24), colocated with ESWC 2024},
title = {Leveraging small language models for Text2SPARQL tasks to improve the resilience of AI assistance},
year = {2024},
comment = {accepted for publication},
}