[New Feature] LLMs for Machine Translation of slot-annotated data #260

Open
j-hoscilowic opened this issue Apr 16, 2024 · 0 comments

Describe the feature
Expanding SLU to new languages requires a lot of manual data annotation. To significantly reduce this effort, LLMs can be used to machine-translate slot-annotated data, e.g.:
"play me <a> Dune <a> on <b> Youtube <b>" => "Spiele mir <a> Dune <a> auf <b> Youtube <b>"
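A minimal Python sketch of how outputs in this format could be sanity-checked, assuming tags are whitespace-separated tokens that open and close with the same marker (as in the example above). `extract_slots` and `same_slot_labels` are hypothetical helpers, not part of any existing library or of the linked repository:

```python
import re
from collections import Counter

# Assumption: a tag is a standalone whitespace-separated token such as "<a>" or "<b>".
TAG_RE = re.compile(r"^<(\w+)>$")


def extract_slots(tagged: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs from a sentence like 'play me <a> Dune <a>'."""
    slots: list[tuple[str, str]] = []
    current_label = None
    current_tokens: list[str] = []
    for token in tagged.split():
        match = TAG_RE.match(token)
        if match is None:
            # Ordinary word: collect it only if we are inside a slot.
            if current_label is not None:
                current_tokens.append(token)
            continue
        label = match.group(1)
        if current_label is None:
            # Opening tag: start a new slot.
            current_label = label
        elif label == current_label:
            # Matching closing tag: emit the finished slot.
            slots.append((label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
        else:
            raise ValueError(f"tag <{label}> found inside open slot <{current_label}>")
    if current_label is not None:
        raise ValueError(f"unclosed slot <{current_label}>")
    return slots


def same_slot_labels(source: str, translation: str) -> bool:
    """True if both sentences contain the same multiset of slot labels."""
    src = Counter(label for label, _ in extract_slots(source))
    tgt = Counter(label for label, _ in extract_slots(translation))
    return src == tgt


src = "play me <a> Dune <a> on <b> Youtube <b>"
tgt = "Spiele mir <a> Dune <a> auf <b> Youtube <b>"
print(extract_slots(tgt))          # [('a', 'Dune'), ('b', 'Youtube')]
print(same_slot_labels(src, tgt))  # True
```

A check like this could be used to filter out translations in which the LLM dropped or duplicated a slot tag.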

Such a feature is especially useful for expanding On-Device SLU to new languages, since high-quality multilingual transformers/LLMs cannot be used as the core SLU model in that setting.

Expected behavior
The MT-LLM pipeline expects English sentences annotated in a generic <> tag format (for example, "play me <a> Dune <a> on <b> Youtube <b>") and outputs the translated sentence in the same format ("Spiele mir <a> Dune <a> auf <b> Youtube <b>"). This format can easily be converted to BIO annotation and to other popular NLU formats.
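For illustration, a minimal sketch of the tag-to-BIO conversion, under the same assumption that tags are whitespace-separated tokens. The function name `tags_to_bio` is hypothetical, and the slot labels are kept as the raw tag letters (`a`, `b`) rather than mapped back to real slot names:

```python
import re

# Assumption: a tag is a standalone whitespace-separated token such as "<a>" or "<b>".
TAG_RE = re.compile(r"^<(\w+)>$")


def tags_to_bio(tagged: str) -> tuple[list[str], list[str]]:
    """Convert '<x> value <x>' markup into parallel token and BIO label lists."""
    tokens: list[str] = []
    labels: list[str] = []
    current_label = None   # slot currently open, if any
    at_slot_start = False  # True until the first word of the open slot is seen
    for token in tagged.split():
        match = TAG_RE.match(token)
        if match is not None:
            label = match.group(1)
            if current_label is None:
                current_label, at_slot_start = label, True  # opening tag
            elif label == current_label:
                current_label = None  # closing tag
            else:
                raise ValueError(f"tag <{label}> found inside open slot <{current_label}>")
            continue  # tags themselves produce no tokens
        tokens.append(token)
        if current_label is None:
            labels.append("O")
        else:
            labels.append(("B-" if at_slot_start else "I-") + current_label)
            at_slot_start = False
    if current_label is not None:
        raise ValueError(f"unclosed slot <{current_label}>")
    return tokens, labels


tokens, labels = tags_to_bio("Spiele mir <a> Dune <a> auf <b> Youtube <b>")
print(list(zip(tokens, labels)))
# [('Spiele', 'O'), ('mir', 'O'), ('Dune', 'B-a'), ('auf', 'O'), ('Youtube', 'B-b')]
```

Multi-word values get one `B-` label followed by `I-` labels, e.g. "play <a> the dark knight <a>" yields `O B-a I-a I-a`.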

Additional context
https://paperswithcode.com/paper/large-language-models-for-expansion-of-spoken

In our recent work, we fine-tuned an MT-LLM called BigTranslate for MT of slot-annotated NLU data, using the parallel Amazon MASSIVE dataset for fine-tuning. On the multiATIS++ benchmark, fine-tuning yields a significant performance improvement over zero-shot LLM-based machine translation.

The fine-tuned BigTranslate model is available here: https://huggingface.co/Samsung/BigTranslateSlotTranslator
The code for fine-tuning and for NLU training is available here: https://github.com/samsung/mt-llm-nlu

In summary, we are wondering how we could merge our work into this project, and which parts of it might be useful here (e.g., scripts for converting from BIO to the tag format?).
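One possible shape for a BIO-to-tag conversion script, as a hypothetical sketch rather than the code in the linked repository. It writes each BIO slot name directly into the tag, so mapping slot names to generic markers such as `<a>`/`<b>` would be a separate step:

```python
def bio_to_tags(tokens: list[str], labels: list[str]) -> str:
    """Convert parallel token/BIO label lists into '<x> value <x>' markup."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    out: list[str] = []
    current = None  # slot name of the span currently open, if any
    for token, label in zip(tokens, labels):
        if label == "O":
            if current is not None:
                out.append(f"<{current}>")  # close the open span
                current = None
        elif label.startswith("B-") or label.startswith("I-"):
            name = label[2:]
            # B- always starts a span; a stray I- (no open span of that name) does too.
            if label.startswith("B-") or name != current:
                if current is not None:
                    out.append(f"<{current}>")  # close the previous span
                out.append(f"<{name}>")  # open the new span
                current = name
        else:
            raise ValueError(f"unexpected BIO label: {label!r}")
        out.append(token)
    if current is not None:
        out.append(f"<{current}>")  # close a span that runs to the end
    return " ".join(out)


print(bio_to_tags(["play", "me", "Dune", "on", "Youtube"], ["O", "O", "B-a", "O", "B-b"]))
# play me <a> Dune <a> on <b> Youtube <b>
```

Treating a stray `I-` label as the start of a new span is a lenient choice; a stricter script could reject such input instead.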
