ontolearn/nces_data_generator/generate_data.py #325
Comments
That's already the nces data generator. We can just move it somewhere else.
I can do that
…On Tue, 5 Dec 2023, 17:16 Caglar Demir, ***@***.***> wrote:
This is a standalone script and it shouldn't be part of ontolearn.
We should have a learning problem python package (e.g.
ontolearn/lp_generator ) that allows us to generate learning problems.
Would you like to take care of ontolearn/lp_generator @Jean-KOUAGOU
<https://github.com/Jean-KOUAGOU> ?
|
nces_data_generator is neither a Python package nor a Python module; rather, it is a folder containing two scripts, i.e.,
Ideally, we should have a Python module from which one can import a learning problem generator, say:
from ontolearn.lp_generator import CustomLPGen
gen = CustomLPGen(args)
# a list of learning problems
lp = gen.generate()
# generates a list of learning problems and saves it locally
gen.generate_and_save(path)
Can you still do that? |
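A minimal sketch of the interface requested above. The class and method names follow the snippet in this comment; the constructor parameters (`kb_path`, `num_problems`) and the placeholder generation step are assumptions for illustration only, not Ontolearn's actual implementation:

```python
import json
from pathlib import Path


class CustomLPGen:
    """Hypothetical learning problem generator (parameter names are assumptions)."""

    def __init__(self, kb_path, num_problems=3):
        self.kb_path = kb_path
        self.num_problems = num_problems

    def generate(self):
        # Placeholder: a real generator would derive class expressions and
        # their positive/negative examples from the knowledge base at kb_path.
        return [
            {"concept": f"Concept_{i}", "positive examples": [], "negative examples": []}
            for i in range(self.num_problems)
        ]

    def generate_and_save(self, path):
        # Generate the learning problems, write them as JSON, and return them.
        lps = self.generate()
        out = Path(path)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(lps, ensure_ascii=False, indent=2), encoding="utf-8")
        return lps
```

With this shape, `gen.generate()` returns the list in memory and `gen.generate_and_save(path)` additionally persists it, which matches the two calls asked for above.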
In simple terms, you mean creating a python class inside ontolearn?
…On Tue, 5 Dec 2023, 17:41 Caglar Demir, ***@***.***> wrote:
nces_data_generator
<https://github.com/dice-group/Ontolearn/tree/master/ontolearn/nces_data_generator>
is not a python package. It is rather a folder containing two scripts.
|
So I create a Python class LPGenerator in a script called learning_problem_generator, yes?
…On Tue, 5 Dec 2023, 18:10 N'dah Jean Kouagou, ***@***.***> wrote:
In simple terms, you mean creating a python class inside ontolearn?
|
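No
1. lp_generator should be a python module (https://docs.python.org/3/tutorial/modules.html) containing __init__.py and nces_data_generator.py
2. nces_data_generator.py implements a python class called CustomLPGen
3. __init__.py has the following line: from .nces_data_generator import CustomLPGen
If it is still not clear, @alkidbaci could you take care of it if my description makes sense to you? |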
Yes, it does. Thanks
…On Tue, 5 Dec 2023, 19:15 Caglar Demir, ***@***.***> wrote:
No
1. lp_generator should be a python module (https://docs.python.org/3/tutorial/modules.html) containing __init__.py and nces_data_generator.py
2. nces_data_generator.py implements a python class called CustomLPGen
3. __init__.py has the following line: from .nces_data_generator import CustomLPGen
If it is still not clear, @alkidbaci <https://github.com/alkidbaci> could you take care of it if my description makes sense to you?
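The three steps above can be sketched as follows. The script builds the layout in a temporary directory purely for illustration; the body of `CustomLPGen` is a placeholder, and the package is top-level here rather than nested under `ontolearn`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Step 1: a directory named lp_generator that will hold the two files.
root = Path(tempfile.mkdtemp())
pkg = root / "lp_generator"
pkg.mkdir()

# Step 2: nces_data_generator.py implements the class (placeholder body).
(pkg / "nces_data_generator.py").write_text(
    "class CustomLPGen:\n"
    "    def generate(self):\n"
    "        return []\n"
)

# Step 3: __init__.py re-exports the class with a relative import.
(pkg / "__init__.py").write_text("from .nces_data_generator import CustomLPGen\n")

# Make the temporary directory importable and use the short import path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from lp_generator import CustomLPGen

print(CustomLPGen().generate())
```

The re-export in `__init__.py` is what lets callers write `from lp_generator import CustomLPGen` without knowing which file defines the class.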
|
In which branch should we do this? |
Please create one and after merging into dev, we can delete it |
The dev branch has some issue with owlapy, see the error below:
|
I created the learning problem generator module but I encounter the error above. Would it make sense to start from a different branch? |
Your branch is not up-to-date. The error report shows that a dependency is missing. Please first merge dev into your branch. |
Ok, I will check.
…On Mon, 11 Dec 2023, 19:28 Caglar Demir, ***@***.***> wrote:
Your branch is not up-to-date. The error report shows that a dependency is
missing. Please firstly merge dev into your branch.
|
Thanks. It runs now. Is the following ok? The generator takes the path to the knowledge base and other optional parameters, including the output path. If the output path is not specified, it stores the generated data at the location of the input knowledge base. Below is an example:
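The default-path behaviour described here could look roughly like this. `resolve_storage_dir` is a hypothetical helper, not a function from the module; the parameter names mirror `kb_path` and `storage_dir` as used in the test further down this thread:

```python
from pathlib import Path


def resolve_storage_dir(kb_path, storage_dir=None):
    """Return the output directory, falling back to the knowledge base's folder."""
    if storage_dir is None:
        # No output path given: store next to the input knowledge base.
        return Path(kb_path).parent
    return Path(storage_dir)
```

For example, `resolve_storage_dir('KGs/Family/family-benchmark_rich_background.owl')` resolves to the `KGs/Family` directory, while passing `storage_dir='KGs/Family/new_dir'` returns that directory instead.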
|
Thank you. Looks great.
|
Sure |
Actually, we cannot fix the number of concepts to be generated; I intentionally did not set it up this way. We only specify the following hyperparameters, and they determine the number of concepts to be generated. This process is also stochastic unless we fix a random seed, which I did not find necessary for NCES.
|
I can add the test about storing the generated learning problems and loading them. But if we really need a specific number (or at least the maximum number) of learning problems we can enforce this |
We have to ensure that the data generating process is not a random process. Perhaps we can fix the random seed for the data generation process by introducing
Yes please do |
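One way to make the sampling step reproducible is to thread a seed into a local random number generator instead of relying on global state. A minimal sketch, assuming a hypothetical helper `sample_concepts` and a `seed` parameter (neither is confirmed by this thread), and assuming the candidates arrive as an ordered list:

```python
import random


def sample_concepts(candidates, k, seed=42):
    """Pick k candidates reproducibly using a local RNG (hypothetical helper)."""
    # A local Random instance is unaffected by other code calling random.seed().
    rng = random.Random(seed)
    return rng.sample(candidates, k)
```

Two calls with the same list, `k`, and `seed` return the same selection; changing the seed gives a different but equally repeatable one.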
But setting max_num_lps to a specific number won't necessarily speed up the generation process. The following are the ones that can reduce the amount of data to generate: max_num_lps will only speed up the part concerned with removing redundant concepts. Is this ok? |
If they ( |
The efficiency in the data generation process is not the concern here.
|
We can return a given number of concepts but they are not the same ones for different runs despite random.seed at the top level of the LPGen class call |
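One possible cause, offered as an assumption rather than a diagnosis of this code: if candidates pass through a `set` of strings anywhere in the pipeline, their iteration order depends on Python's per-process string hash randomization, so `random.seed` alone does not pin the result across runs. Sorting before sampling removes that dependence. The sketch below runs the same seeded snippet in two interpreters with different `PYTHONHASHSEED` values; the concept names are taken from the Family example in this thread:

```python
import os
import subprocess
import sys

# Seeded sampling over a *sorted* copy of a set of strings.
SNIPPET = (
    "import random; random.seed(0); "
    "names = {'Sister', 'Brother', 'Child', 'Parent', 'Daughter', 'Grandchild'}; "
    "print(random.sample(sorted(names), 3))"
)


def run_with_hash_seed(hash_seed):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Sorting makes the output independent of the hash seed.
print(run_with_hash_seed(1) == run_with_hash_seed(2))
```

Replacing `sorted(names)` with `list(names)` in the snippet may produce different selections under different hash seeds, which would look like the behaviour reported here.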
This is how I implemented the test:
import unittest
import json
from ontolearn.lp_generator import LPGen
from ontolearn.utils import setup_logging

setup_logging("ontolearn/logging_test.conf")

PATH_FAMILY = 'KGs/Family/family-benchmark_rich_background.owl'
STORAGE_DIR = 'KGs/Family/new_dir'
last_10_concepts_to_generate = ['∃ hasParent.(Parent ⊓ (∃ hasChild.(¬Granddaughter)))',
                                '∃ hasParent.(PersonWithASibling ⊓ (∀ hasParent.(¬Granddaughter)))',
                                '∃ hasParent.(PersonWithASibling ⊓ (∃ hasSibling.(¬Sister)))',
                                '∃ hasParent.(Granddaughter ⊓ (∃ hasChild.(¬Sister)))',
                                '∃ hasParent.(Granddaughter ⊓ (∀ hasChild.(¬Sister)))',
                                'Sister ⊔ (∃ hasParent.(Child ⊓ (¬Brother)))',
                                '∃ hasParent.(Daughter ⊓ (∃ hasSibling.(¬Sister)))',
                                'Sister ⊔ (∃ hasParent.(∃ hasSibling.(¬Grandparent)))',
                                '∃ hasParent.(Grandmother ⊓ (∀ hasParent.(¬Granddaughter)))',
                                'Sister ⊔ (∃ hasParent.(Child ⊓ (¬Grandchild)))']


class LPGen_Test(unittest.TestCase):
    def test_generate_load(self):
        lp_gen = LPGen(kb_path=PATH_FAMILY, storage_dir=STORAGE_DIR)
        lp_gen.generate()
        print("Loading generated data...")
        with open(f"{STORAGE_DIR}/triples/train.txt") as file:
            triples_data = file.readlines()
        print("Number of triples:", len(triples_data))
        with open(f"{STORAGE_DIR}/LPs.json") as file:
            lps = json.load(file)
        print("Number of learning problems:", len(lps))
        self.assertGreaterEqual(lp_gen.lp_gen.max_num_lps, len(lps))
        self.assertEqual(list(lps.keys())[-10:], last_10_concepts_to_generate)


if __name__ == '__main__':
    unittest.main() |
Everything else passes except self.assertEqual(list(lps.keys())[-10:], last_10_concepts_to_generate) |
This is a standalone script and it shouldn't be part of ontolearn.
We should have a learning problem python module (e.g. ontolearn/lp_generator) that allows us to generate learning problems.
Would you like to take care of ontolearn/lp_generator @Jean-KOUAGOU?