
ontolearn/nces_data_generator/generate_data.py #325

Open
Demirrr opened this issue Dec 5, 2023 · 24 comments
Demirrr (Member) commented Dec 5, 2023

This is a standalone script and it shouldn't be part of ontolearn.

We should have a learning-problem Python module (e.g., ontolearn/lp_generator) that allows us to generate learning problems.

Would you like to take care of ontolearn/lp_generator, @Jean-KOUAGOU?

Jean-KOUAGOU (Collaborator) commented Dec 5, 2023 via email

Demirrr (Member, Author) commented Dec 5, 2023

nces_data_generator is not a Python package or module; rather, it is a folder containing two scripts, i.e.,

  1. https://github.com/dice-group/Ontolearn/blob/master/ontolearn/nces_data_generator/generate_data.py
  2. https://github.com/dice-group/Ontolearn/blob/master/ontolearn/nces_data_generator/helper_classes.py

Ideally, we should have a Python module from which one can import a learning problem generator (say, CustomLPGen from ontolearn/lp_generator) to generate learning problems, e.g.,

from ontolearn.lp_generator import CustomLPGen

gen = CustomLPGen(args)
# a list of learning problems
lps = gen.generate()
# generate a list of learning problems and save them locally
gen.generate_and_save(path)

Can you still do that?

Jean-KOUAGOU (Collaborator) commented Dec 5, 2023 via email

Jean-KOUAGOU (Collaborator) commented Dec 5, 2023 via email

Demirrr (Member, Author) commented Dec 5, 2023

No

  1. lp_generator should be a Python module (https://docs.python.org/3/tutorial/modules.html) containing __init__.py and nces_data_generator.py
  2. nces_data_generator.py implements a Python class called CustomLPGen
  3. __init__.py contains the line from .nces_data_generator import CustomLPGen
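A minimal sketch of the layout described above; the class body below is a hypothetical stub standing in for the real generator, not the NCES implementation:

```python
# Hypothetical package layout:
#
#   ontolearn/
#     lp_generator/
#       __init__.py            # contains: from .nces_data_generator import CustomLPGen
#       nces_data_generator.py # defines CustomLPGen
#
# Minimal stub for nces_data_generator.py (illustrative only):

class CustomLPGen:
    """Stub learning-problem generator sketching the intended interface."""

    def __init__(self, kb_path: str):
        # Path to the knowledge base the generator would refine concepts from.
        self.kb_path = kb_path

    def generate(self):
        # The real implementation would produce a list of learning problems
        # by refining concepts over the knowledge base; the stub returns [].
        return []

    def generate_and_save(self, path: str):
        # The real implementation would serialize the generated problems to `path`.
        lps = self.generate()
        return lps
```

With this layout, `from ontolearn.lp_generator import CustomLPGen` works exactly as requested above.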

If it is still not clear, @alkidbaci, could you take care of it, provided my description makes sense to you?

Jean-KOUAGOU (Collaborator) commented Dec 5, 2023 via email

Jean-KOUAGOU (Collaborator) commented Dec 6, 2023

In which branch should we do this?

Demirrr (Member, Author) commented Dec 6, 2023

Please create one; after merging it into dev, we can delete it.

Jean-KOUAGOU (Collaborator) commented Dec 11, 2023

The dev branch has an issue with owlapy; see the error below:

from ontolearn.lp_generator import LPGen
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 from ontolearn.lp_generator import LPGen

File ~/Documents/Ontolearn/ontolearn/lp_generator/__init__.py:1
----> 1 from .generate_data import LPGen
      2 from .helper_classes import RDFTriples, KB2Data

File ~/Documents/Ontolearn/ontolearn/lp_generator/generate_data.py:1
----> 1 from .helper_classes import RDFTriples, KB2Data
      3 class LPGen:
      4     def __init__(self, kb_path, storage_path=None, depth=5, max_child_length=25, refinement_expressivity=0.6,
      5                  downsample_refinements=True, k=10, num_rand_samples=150, min_num_pos_examples=1):

File ~/Documents/Ontolearn/ontolearn/lp_generator/helper_classes.py:4
      2 import random
      3 from rdflib import graph
----> 4 from ontolearn.knowledge_base import KnowledgeBase
      5 from owlapy.render import DLSyntaxObjectRenderer
      6 from ontolearn.refinement_operators import ExpressRefinement

File ~/Documents/Ontolearn/ontolearn/knowledge_base.py:6
      4 import random
      5 from typing import Iterable, Optional, Callable, overload, Union, FrozenSet, Set, Dict
----> 6 from ontolearn.base import OWLOntology_Owlready2, OWLOntologyManager_Owlready2, OWLReasoner_Owlready2
      7 from ontolearn.base.fast_instance_checker import OWLReasoner_FastInstanceChecker
      8 from owlapy.model import OWLOntologyManager, OWLOntology, OWLReasoner, OWLClassExpression, \
      9     OWLNamedIndividual, OWLObjectProperty, OWLClass, OWLDataProperty, IRI, OWLDataRange, OWLObjectSomeValuesFrom, \
     10     OWLObjectAllValuesFrom, OWLDatatype, BooleanOWLDatatype, NUMERIC_DATATYPES, TIME_DATATYPES, OWLThing, \
     11     OWLObjectPropertyExpression, OWLLiteral, OWLDataPropertyExpression

File ~/Documents/Ontolearn/ontolearn/base/__init__.py:2
      1 """Implementations of owlapy abstract classes based on owlready2."""
----> 2 from owlapy._utils import MOVE
      3 from ontolearn.base._base import OWLOntologyManager_Owlready2, OWLReasoner_Owlready2, \
      4     OWLOntology_Owlready2, BaseReasoner_Owlready2
      5 from ontolearn.base.complex_ce_instances import OWLReasoner_Owlready2_ComplexCEInstances

ModuleNotFoundError: No module named 'owlapy'

Jean-KOUAGOU (Collaborator) commented:

I created the learning problem generator module but I encounter the error above. Would it make sense to start from a different branch?

Demirrr (Member, Author) commented Dec 11, 2023

Your branch is not up to date; the error report shows that a dependency is missing. Please first merge dev into your branch.

Jean-KOUAGOU (Collaborator) commented Dec 11, 2023 via email

Jean-KOUAGOU (Collaborator) commented Dec 11, 2023

Thanks. It runs now. Is the following ok? The generator takes the path to the knowledge base and other optional parameters, including the output path. If the output path is not specified, it stores the generated data at the location of the input knowledge base. Below is an example:

from ontolearn.lp_generator import LPGen
* Owlready2 * Warning: optimized Cython parser module 'owlready2_optimized' is not available, defaulting to slower Python implementation

Warning: SQLite3 version 3.40.0 and 3.41.2 have huge performance regressions; please install version 3.41.1 or 3.42!

lp_gen = LPGen(kb_path="./KGs/Family/family-benchmark_rich_background.owl")
lp_gen.generate()

*** Embedding triples exist ***


############################################################
Started generating data on the family-benchmark_rich_background knowledge base
############################################################

Number of individuals in the knowledge base: 202 

|Thing refinements|:  4760
Size of sample:  150
Refining roots...: 100%|██████████| 150/150 [00:03<00:00, 42.03it/s]
Filtering process...: 100%|██████████| 71790/71790 [00:12<00:00, 5779.63it/s] 
Concepts generation done!

Number of atomic concepts:  18
Longest concept length:  12 

Total number of concepts:  9332 

Data generation completed
Sample examples and save data...: 100%|██████████| 9332/9332 [00:03<00:00, 2335.82it/s]
Data saved at ./KGs/Family/LPs/
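Once generation finishes, the saved data can be loaded back. A self-contained sketch of loading an LPs.json file (the file name mirrors the test shown later in this thread; the toy directory and its contents here are made up for illustration):

```python
import json
import os
import tempfile

# Build a toy storage directory mimicking the generator's output layout.
# The concept names and example IDs below are invented placeholders.
storage_dir = tempfile.mkdtemp()
toy_lps = {
    "Sister": {"positive examples": ["F2F14"], "negative examples": ["F2M13"]},
    "Brother": {"positive examples": ["F2M13"], "negative examples": ["F2F14"]},
}
with open(os.path.join(storage_dir, "LPs.json"), "w") as f:
    json.dump(toy_lps, f)

# Load the learning problems back, as a downstream consumer would.
with open(os.path.join(storage_dir, "LPs.json")) as f:
    lps = json.load(f)

print("Number of learning problems:", len(lps))
```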

Demirrr (Member, Author) commented Dec 11, 2023

Thank you. Looks great.
Could you please add two tests for this class?

  1. One fixes the number of concepts and other parameters if any and creates an assertion to check the generated concepts ( list of objects)

  2. Saves concepts into a specified local file and loads it back.

Jean-KOUAGOU (Collaborator) commented:

Sure

Jean-KOUAGOU (Collaborator) commented:

Actually, we cannot fix the number of concepts to be generated; I intentionally did not set it up this way. We only specify the following hyperparameters, and they determine the number of concepts to be generated. The process is also stochastic unless we fix a random seed, which I did not find necessary for NCES.

kb_path, storage_dir=None, depth=5, max_child_length=25, refinement_expressivity=0.6, downsample_refinements=True, sample_fillers_count=10, num_sub_roots=150, min_num_pos_examples=1

Jean-KOUAGOU (Collaborator) commented:

I can add the test about storing the generated learning problems and loading them. But if we really need a specific number (or at least a maximum number) of learning problems, we can enforce this.

Demirrr (Member, Author) commented Dec 18, 2023

--this process is also stochastic unless we fix a random seed--which I did not find necessary for NCES

We have to ensure that the data-generating process is not random. Perhaps we can fix the random seed for the data generation process by introducing a random_seed=1 parameter.

But if we really need a specific number (or at least the maximum number) of learning problems we can enforce this

Yes please do

Jean-KOUAGOU (Collaborator) commented Dec 18, 2023

But setting max_num_lps to a specific number won't necessarily speed up the generation process. The following parameters are the ones that can reduce the amount of data to generate: depth, refinement_expressivity, sample_fillers_count, num_sub_roots.

max_num_lps will only speed up the part concerned with removing redundant concepts. Is this ok?
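As a rough illustration of that point (a hypothetical helper, not the actual NCES code), a max_num_lps cap can short-circuit the redundancy-removal pass without changing how many raw refinements were produced upstream:

```python
def dedup_and_cap(concepts, max_num_lps):
    """Drop duplicate concept strings, stopping once max_num_lps are kept."""
    seen = set()
    kept = []
    for c in concepts:
        if c in seen:
            continue  # redundant concept, skip it
        seen.add(c)
        kept.append(c)
        if len(kept) >= max_num_lps:
            break  # short-circuit: no need to scan the remaining refinements
    return kept

print(dedup_and_cap(["A", "B", "A", "C", "B", "D"], 3))  # → ['A', 'B', 'C']
```

The upstream refinement step still runs in full, which is why the cap speeds up only the deduplication pass.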

Jean-KOUAGOU (Collaborator) commented Dec 18, 2023

If they (depth, refinement_expressivity, sample_fillers_count, num_sub_roots) are set too low, the actual number of generated learning problems will be less than max_num_lps.

Demirrr (Member, Author) commented Dec 18, 2023

But setting max_num_lps to a specific number won't necessarily speed up the generation process

The efficiency in the data generation process is not the concern here.
The main concern is to fix the randomness.
Hence,

  1. With random_seed=1 and num_concepts=10, 10 concepts should be returned deterministically, i.e., no randomness.
  2. With random_seed=2 and num_concepts=5, 5 concepts should be returned deterministically, i.e., no randomness.
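A minimal sketch of the determinism being asked for, using a dedicated random.Random instance instead of the global RNG (the function and pool below are hypothetical):

```python
import random

def sample_concepts(concept_pool, num_concepts, random_seed=1):
    """Deterministically sample num_concepts items: same seed, same result."""
    rng = random.Random(random_seed)  # local RNG, isolated from other callers
    # Sorting first matters: set iteration order can vary across processes
    # (hash randomization), so seeding alone does not guarantee repeatability.
    return rng.sample(sorted(concept_pool), num_concepts)

pool = {"Sister", "Brother", "Parent", "Grandchild", "Daughter"}
run1 = sample_concepts(pool, 3, random_seed=1)
run2 = sample_concepts(pool, 3, random_seed=1)
assert run1 == run2  # deterministic for a fixed seed
```

The sorting step is one plausible reason that seeding alone did not make runs identical in the discussion below: any iteration over an unordered collection must be made order-stable before sampling.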

Jean-KOUAGOU (Collaborator) commented:

We can return a given number of concepts, but they are not the same ones across different runs despite calling random.seed at the top level of the LPGen class call.

Jean-KOUAGOU (Collaborator) commented Dec 18, 2023

This is how I implemented the test

import unittest
import json
from ontolearn.lp_generator import LPGen
from ontolearn.utils import setup_logging
setup_logging("ontolearn/logging_test.conf")

PATH_FAMILY = 'KGs/Family/family-benchmark_rich_background.owl'
STORAGE_DIR = 'KGs/Family/new_dir'
last_10_concepts_to_generate = ['∃ hasParent.(Parent ⊓ (∃ hasChild.(¬Granddaughter)))',
 '∃ hasParent.(PersonWithASibling ⊓ (∀ hasParent.(¬Granddaughter)))',
 '∃ hasParent.(PersonWithASibling ⊓ (∃ hasSibling.(¬Sister)))',
 '∃ hasParent.(Granddaughter ⊓ (∃ hasChild.(¬Sister)))',
 '∃ hasParent.(Granddaughter ⊓ (∀ hasChild.(¬Sister)))',
 'Sister ⊔ (∃ hasParent.(Child ⊓ (¬Brother)))',
 '∃ hasParent.(Daughter ⊓ (∃ hasSibling.(¬Sister)))',
 'Sister ⊔ (∃ hasParent.(∃ hasSibling.(¬Grandparent)))',
 '∃ hasParent.(Grandmother ⊓ (∀ hasParent.(¬Granddaughter)))',
 'Sister ⊔ (∃ hasParent.(Child ⊓ (¬Grandchild)))']

class LPGen_Test(unittest.TestCase):
    def test_generate_load(self):
        lp_gen = LPGen(kb_path=PATH_FAMILY, storage_dir=STORAGE_DIR)
        lp_gen.generate()
        print("Loading generated data...")
        with open(f"{STORAGE_DIR}/triples/train.txt") as file:
            triples_data = file.readlines()
            print("Number of triples:", len(triples_data))
        with open(f"{STORAGE_DIR}/LPs.json") as file:
            lps = json.load(file)
            print("Number of learning problems:", len(lps))
        self.assertGreaterEqual(lp_gen.lp_gen.max_num_lps, len(lps))
        self.assertEqual(list(lps.keys())[-10:], last_10_concepts_to_generate)

if __name__ == '__main__':
    unittest.main()

Jean-KOUAGOU (Collaborator) commented Dec 18, 2023

Everything else passes except

self.assertEqual(list(lps.keys())[-10:], last_10_concepts_to_generate)
