(Closes #2462) generalise ModuleManager #2564

arporter · 2024-04-30T20:46:32Z

No description provided.

arporter · 2024-04-30T21:06:52Z

@hiker commented on this implementation (in #2462):

I had a quick look. As far as I can see, you are basically reading in and storing all source files of all search directories (assuming that the file you are looking for is the last ;) ). That seems to be a huge overhead (e.g. in my driver creation I pass in the whole LFRic source tree, so potentially this would mean all of the LFRic sources would need to be read - over 6800 files). Wouldn't it be better to rely on the coding style to do a first filter, and only read them all if we can't find anything?

I just read the history of this ticket, and my assumption was always that there will be an option to use the filename to determine the module names (potentially verified by then doing a regex on that one file only) - with the idea that more rules to select filenames based on module names could be added (e.g. by a setting in the config file).

Also, the handling of src feels wrong. We read in the source code using a static method - why is that in ModuleInfo, and not the ModuleManager? That somehow doesn't feel right.

I am not sure if this would affect me. I would expect a performance (which could be an issue, though it needs to be measured) and memory usage impact (which probably doesn't matter much). But reading a significant(?) part of LFRic over and over for each file we process feels like a huge waste of resources.

arporter · 2024-04-30T21:22:35Z

My concern is not LFRic but everything else. However, I take your point about potentially, repeatedly reading 6800 files! I've taken a look at the NEMO source and the rule there appears to be that a file <some_name>.f90 contains module some_name. In WaveWatchIII that rule largely holds but there are exceptions. @LonelyCat124 and @sergisiso, could you comment on the other sources that you've been looking at? As @hiker said earlier in #2462, a good first test would be if module_name in file_name.

The "where to put the file reading" problem is a bit of a chicken-and-egg. ModuleManager has to read a file to confirm that it contains a given module (unless this is strictly enforced as it is in LFRic) and it is only then that it creates a ModuleInfo object for that module. However, currently it is the ModuleInfo that contains the functionality for reading (and parsing, processing) the source file.
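
For reference, the "module_name in file_name" first test could look something like the sketch below. It is only an illustration: candidate_files and the example paths are made up for this comment and are not part of ModuleManager. Names are compared in lower case because Fortran identifiers are case-insensitive.

```python
import os


def candidate_files(module_name, file_paths):
    """Return the paths whose basename (minus extension) contains module_name.

    Hypothetical helper: a cheap first filter that never opens a file.
    """
    name = module_name.lower()
    matches = []
    for path in file_paths:
        stem, _ = os.path.splitext(os.path.basename(path))
        if name in stem.lower():
            matches.append(path)
    return matches


# Made-up paths, purely for illustration.
paths = ["src/oce/dom_oce.f90", "src/oce/step.f90", "src/w3/w3gridmd.F90"]
print(candidate_files("dom_oce", paths))
```

Only the files that survive this filter would then need to be read to confirm that they really contain the module.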

arporter · 2024-04-30T21:31:29Z

Using:

from difflib import SequenceMatcher

score = SequenceMatcher(None, str1, str2).ratio()

gives a floating-point measure of how similar two strings are (1.0 for identical, 0.0 for nothing in common).
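
Applied to our problem, the comparison would presumably be between a module name and the basename of each file. A small sketch, assuming we strip the directory and extension first and ignore case (name_similarity is a hypothetical helper, not existing code):

```python
import os
from difflib import SequenceMatcher


def name_similarity(module_name, file_path):
    """Similarity between a module name and a file's basename (no extension)."""
    stem, _ = os.path.splitext(os.path.basename(file_path))
    return SequenceMatcher(None, module_name.lower(), stem.lower()).ratio()


# Made-up names, purely for illustration.
for candidate in ["field_mod.f90", "field.F90", "solver_mod.f90"]:
    print(candidate, name_similarity("field_mod", candidate))
```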

hiker · 2024-05-01T12:30:06Z

We could have several rules implemented, e.g. module 'xxx' is in file xxx.[fF]90, xxx_mod.[fF]90, and then only handle a greatly reduced number of files (to then either regex and/or parse them).

And/or additionally, we could provide a list of exceptions in the config file (since this is project specific)? When I tried the kernel extraction on the um physics, I had to rename a file or two, but being able to just add them to a config file would have been great.

I love ❤️ the similarity one! Can we try to test various module names with the basename of the files? We could define a threshold (again, in the config file?) for which files are similar enough.
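
To make the idea concrete, here is a rough sketch of how the naming rules, a list of exceptions and a similarity threshold might be combined. Everything in it is an assumption for discussion: rank_candidates, the 0.7 default and the shape of the exceptions mapping are invented here, and no such config-file options exist yet.

```python
import os
from difflib import SequenceMatcher


def rank_candidates(module_name, file_paths, threshold=0.7, exceptions=None):
    """Order the files most likely to contain module_name.

    exceptions is a hypothetical mapping (module name -> file basename)
    standing in for project-specific entries in a config file.
    """
    name = module_name.lower()
    if exceptions and name in exceptions:
        wanted = exceptions[name]
        return [p for p in file_paths if os.path.basename(p) == wanted]

    exact, scored = [], []
    for path in file_paths:
        stem, ext = os.path.splitext(os.path.basename(path))
        if ext.lower() != ".f90":      # accepts both .f90 and .F90
            continue
        stem = stem.lower()
        if stem in (name, name + "_mod"):
            exact.append(path)         # rule: xxx.[fF]90 or xxx_mod.[fF]90
            continue
        score = SequenceMatcher(None, name, stem).ratio()
        if score >= threshold:
            scored.append((score, path))
    scored.sort(key=lambda item: item[0], reverse=True)
    return exact + [path for _, path in scored]


# Made-up paths, purely for illustration.
paths = ["a/foo.f90", "a/foo_mod.F90", "a/food.f90", "a/bar.f90",
         "a/w3gridmd.F90", "a/foo.txt"]
print(rank_candidates("foo", paths))
```

Files that match a rule exactly come first, followed by the ones above the threshold in decreasing order of similarity, so only a short list would need to be checked with a regex or the parser.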

arporter · 2024-05-01T12:42:05Z

> I love ❤️ the similarity one! Can we try to test various module names with the basename of the files? We could define a threshold (again, in the config file?) for which files are similar enough.

Yes, I think that might be a good, general-purpose solution that doesn't require too much work. (I need this in a hurry really.)

LonelyCat124 · 2024-05-01T13:27:57Z

Socrates is ~ 700 files - it seems to be either module_name_mod.f90 or module_name.f90. There are some non-module containing files but I guess this just doesn't deal with those.

arporter · 2024-05-01T14:16:27Z

The problem at the moment is that the ModuleManager expects to process all files it encounters as it searches for a particular module (in get_module_info) whereas we really need it to search through all files and identify the most likely candidates before actually processing them. We may need to keep a record of all files we have seen but not processed in order to do this which is quite a big change to how it currently works?

LonelyCat124 · 2024-05-01T14:21:02Z

Can it (or indeed PSyclone at all) deal with multiple modules inside a single file?

hiker · 2024-05-01T14:24:00Z

> The problem at the moment is that the ModuleManager expects to process all files it encounters as it searches for a particular module (in get_module_info) whereas we really need it to search through all files and identify the most likely candidates before actually processing them. We may need to keep a record of all files we have seen but not processed in order to do this which is quite a big change to how it currently works?

Yes. ATM we immediately store a mapping from the (unverified/assumed) module names to a ModuleInfo object, which (initially) only stores the path to the file. So, if we can't deduce the module name (based on coding style), we should store this information to avoid having to go back to the file system again.
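
As a sketch of what "store this information" could mean: walk each search directory once, remember every Fortran file seen, and only open files (one at a time) when a requested module has not yet been confirmed. FileCache and its regex are hypothetical, not the current ModuleManager code, and the regex is only a rough check rather than a parse.

```python
import os
import re
import tempfile

# Matches "module <name>" at the start of a line, skipping "module procedure"
# and similar prefixes. A rough check only, not a Fortran parser.
_MODULE_RE = re.compile(
    r"^\s*module\s+(?!procedure\b|subroutine\b|function\b)(\w+)",
    re.IGNORECASE | re.MULTILINE)


class FileCache:
    """Hypothetical record of files seen and of modules confirmed in them."""

    def __init__(self):
        self._unread = []      # paths found on disk but not yet opened
        self._modules = {}     # confirmed module name -> path
        self.files_read = 0

    def add_search_path(self, directory):
        """Walk the directory once and remember every .f90/.F90 file."""
        for root, _, names in os.walk(directory):
            for name in sorted(names):
                if name.lower().endswith(".f90"):
                    self._unread.append(os.path.join(root, name))

    def find_module(self, module_name):
        """Return the path of the file containing module_name, or None."""
        target = module_name.lower()
        # Open pending files only until the requested module turns up.
        while target not in self._modules and self._unread:
            path = self._unread.pop(0)
            with open(path, encoding="utf-8") as handle:
                source = handle.read()
            self.files_read += 1
            for found in _MODULE_RE.findall(source):
                self._modules.setdefault(found.lower(), path)
        return self._modules.get(target)


# Small demonstration with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a_utils.f90"), "w", encoding="utf-8") as out:
        out.write("module first_mod\nend module first_mod\n"
                  "module second_mod\nend module second_mod\n")
    with open(os.path.join(tmp, "b_other.f90"), "w", encoding="utf-8") as out:
        out.write("module third_mod\nend module third_mod\n")
    cache = FileCache()
    cache.add_search_path(tmp)
    first = os.path.basename(cache.find_module("first_mod"))
    second = os.path.basename(cache.find_module("SECOND_MOD"))
    reads_after_two = cache.files_read
    third = os.path.basename(cache.find_module("third_mod"))
    total_reads = cache.files_read
    missing = cache.find_module("no_such_mod")
```

In a real implementation the pending list would presumably be ordered by the filename rules first, so that the most likely file is the first one opened.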

hiker · 2024-05-01T14:28:34Z

> Can it (or indeed PSyclone at all) deal with multiple modules inside a single file?

Somewhat. It can map several module info objects to the same file. But (from memory), if you then request the source code (or fparser or psyir info), each module info will independently read and parse the source code again. It shouldn't be hard to fix that though: each module info could keep a list of other modules (and then use the module manager to update the other objects). I feel there might be an even better solution ... maybe we can share some state info between these module info objects that come from the same file??

arporter · 2024-05-01T21:14:54Z

> I feel there might be an even better solution ... maybe we can share some state info between these module info objects that come from the same file??

Should we have an intermediate e.g. FileInfo object that then stores references to related ModuleInfo instances if they've been created for it (or an empty list if the contents of the file are yet to be examined)? These FileInfo objects would then keep track of the files we've found and their full path (+...?). I did wonder whether this should be handled in fparser but decided that fparser should probably just stick to parsing.
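
Roughly what I have in mind, as a minimal sketch. The class names carry a "Sketch" suffix because this is not the eventual FileInfo or ModuleInfo API, just an illustration of sharing one cached read between all of the modules found in a file.

```python
import os
import tempfile


class FileInfoSketch:
    """Hypothetical per-file object: full path, cached source, known modules."""

    def __init__(self, path):
        self.path = path
        self.modules = []      # empty until the file's contents are examined
        self._source = None
        self.read_count = 0

    def get_source(self):
        """Read the file on first request only; later calls use the cache."""
        if self._source is None:
            with open(self.path, encoding="utf-8") as handle:
                self._source = handle.read()
            self.read_count += 1
        return self._source

    def add_module(self, name):
        """Create and record a module-info object that refers back to this file."""
        info = ModuleInfoSketch(name, self)
        self.modules.append(info)
        return info


class ModuleInfoSketch:
    """Hypothetical per-module object that delegates file access."""

    def __init__(self, name, file_info):
        self.name = name
        self.file_info = file_info

    def get_source(self):
        return self.file_info.get_source()


# Two modules in one made-up file share a single read of that file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "two_mods.f90")
    with open(path, "w", encoding="utf-8") as out:
        out.write("module a_mod\nend module a_mod\n"
                  "module b_mod\nend module b_mod\n")
    finfo = FileInfoSketch(path)
    mod_a = finfo.add_module("a_mod")
    mod_b = finfo.add_module("b_mod")
    same_text = mod_a.get_source() == mod_b.get_source()
    reads = finfo.read_count
```

Parsed results (fparser tree, PSyIR) could be cached on the same object in the same way, which would avoid the repeated parsing mentioned above.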

hiker · 2024-05-01T22:52:52Z

> Should we have an intermediate e.g. FileInfo object that then stores references to related ModuleInfo instances if they've been created for it (or an empty list if the contents of the file are yet to be examined)? These FileInfo objects would then keep track of the files we've found and their full path (+...?). I did wonder whether this should be handled in fparser but decided that fparser should probably just stick to parsing.

That was actually my very first thought, but then I expected you to say that this should be done in the PSyIR - just make FileContainer store that info (and be populated later, at parsing time) :) That was pretty stupid reasoning, so yes, I think that would make sense, and it would be a very clean solution for handling multiple modules in one file.

arporter · 2024-05-24T15:27:43Z

Ready for review now from either @hiker or @sergisiso. It generalises the ModuleManager so that it no longer assumes a strict mapping between filename and the module that it contains. More significantly, it also changes the interface-resolving mechanism to use the ModuleManager. Integration tests are running.

hiker

Will continue later.

src/psyclone/parse/file_info.py

src/psyclone/parse/module_info.py

src/psyclone/parse/module_manager.py

src/psyclone/psyir/tools/call_tree_utils.py

src/psyclone/tests/domain/gocean/transformations/gocean1p0_transformations_test.py

src/psyclone/tests/parse/module_info_test.py

src/psyclone/tests/parse/module_manager_test.py

src/psyclone/tests/psyir/nodes/container_test.py

src/psyclone/tests/psyir/symbols/symbol_table_test.py

hiker

Nice PR, it is good to see the module manager becoming more useful.
There are a few minor issues mentioned in the comments.

I also realised that we now have two FileInfo classes (the other one in ./parse/algorithm.py), not exactly ideal :(

Looking at the documentation, I wonder if the new FileInfo object should be mentioned together with the module manager in the dev guide. That section currently sits in psy_data.rst; now that the module manager is used elsewhere, maybe it should go in a different section (not sure ... maybe modules.rst)?

arporter · 2024-05-28T13:44:29Z

Thanks for that Joerg. I've restructured the docs (which made me realise some doc strings were still wrong) by adding a new Module Manager section and referring to it appropriately. Good spot on the second (or first) FileInfo. I've renamed it AlgFileInfo as that seems appropriate and didn't require very much effort :-)

arporter · 2024-05-28T15:48:50Z

All green again. Ready for another look.

doc/developer_guide/module_manager.rst

src/psyclone/parse/module_manager.py

src/psyclone/tests/parse/module_info_test.py

hiker · 2024-05-31T02:42:07Z

The documentation builds fine and all issues were addressed. We have one CI failure, but that appears to be related to a missing/incorrect OpenMPI module:

Lmod has detected the following error: The following module(s) are unknown:
"openmpi/5.0.2"

Additionally, we have one question for @sergisiso.

@arporter, let me know if you are happy to go ahead with a merge anyway (and bring it up to master, there are some conflicts), or if you prefer me to wait for feedback and CI to be fixed.

arporter added 2 commits April 30, 2024 21:42

For #2462. Bring changes from 924 branch over.

dea312e

#2462 bring ModuleManager and ModuleInfo over from 924 work

40cdd4e

arporter self-assigned this Apr 30, 2024

arporter marked this pull request as draft April 30, 2024 20:46

arporter assigned hiker May 1, 2024

#2462 add import of SequenceMatcher [skip ci]

a2b2583

arporter added enhancement in progress labels May 1, 2024

arporter added 10 commits May 3, 2024 12:05

#2462 begin adding FileInfo class

53d7db6

Merge branch 'master' into 2462_generalise_mod_manager

ff7701b

#2564 WIP extending ModuleManager to use FileInfo [skip ci]

afef759

#2462 extend searching to check cached files and fix all mod manager …

10730ad

…tests [skip ci]

Merge branch 'master' into 2462_generalise_mod_manager

295be09

#2462 WIP fixing module_info tests [skip ci]

7f6593e

#2462 improve ModuleInfo test to use Container.get_routine_psyir() [s…

88911e4

…kip ci]

#2564 move resolve_routine from ModuleInfo to Container.

234f1c0

#2462 move resolve_routine into Container and update tests

e924f62

#2462 WIP fixing tests [skip ci]

4fd37c2

arporter added the ready for review label May 24, 2024

arporter requested review from hiker and sergisiso May 24, 2024 15:27

hiker added under review and removed ready for review labels May 27, 2024

hiker reviewed May 27, 2024

hiker requested changes May 27, 2024

hiker added reviewed with actions and removed under review labels May 27, 2024

arporter added 7 commits May 28, 2024 08:33

#2462 rm unnecessary error handler

a1644e5

#2462 rename parse.algorithm.FileInfo to AlgFileInfo

a3aa3b7

#2462 tidying for review

8feabe3

#2462 WIP restructuring dev guide [skip ci]

feb2f3d

Merge branch 'master' into 2462_generalise_mod_manager

36bcd62

#2462 post-merge tidying

2c122c5

#2462 update documentation

696de46

#2462 fix api-naming errors following merge with master

c511afe

arporter added ready for review and removed reviewed with actions labels May 28, 2024

hiker added under review and removed ready for review labels May 31, 2024

hiker temporarily deployed to integration May 31, 2024 02:13 — with GitHub Actions Inactive

hiker reviewed May 31, 2024

hiker added reviewed with actions and removed under review labels May 31, 2024
