What is license for Language data? #100

Open
madkote opened this issue Mar 22, 2021 · 3 comments

Comments

madkote commented Mar 22, 2021

Hi

What is the license for the language data? MIT, Apache, ...?
It is very important to know whether one can use it in a commercial application.

Thanks!

Owner

barrust commented Mar 22, 2021

The data was originally pulled from opensubtitles.org but was heavily modified, and the dictionary itself is part of this code. I did not pull the dictionaries directly from any other project. Then again, I am not a lawyer. I gave credit to the original source of the text used to build the dictionaries in the README and in the script that pulled and parsed the data for the dictionary builds.

Author

madkote commented Mar 23, 2021

@barrust thanks for the reply. I am not sure what kind of license it is; it is kind of hard to find out on their homepage.

But if the license is not for commercial use, then that must be clearly stated in the README here. Not knowing the law does not free one from responsibility 8))

Anyhow, please keep the issue open - I will try to find out the license.

Can you list here which data you have used and modified? It will make it simpler.

Owner

barrust commented Mar 23, 2021

The scripts/build_dictionary.py script lists each item used, and the data itself can be found here: https://opus.nlpl.eu/OpenSubtitles2018.php

Per this page, the requirements for use are:

  1. Add a link to http://www.opensubtitles.org/
  2. Cite the following article if you use any part of the corpus in your own work:
    P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)

I have had both items in the README, but if this isn't enough, another data source could be found.
