Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Closes #2462) generalise ModuleManager #2564

Open
wants to merge 44 commits into
base: master
Choose a base branch
from

Conversation

arporter
Copy link
Member

No description provided.

@arporter arporter self-assigned this Apr 30, 2024
@arporter arporter marked this pull request as draft April 30, 2024 20:46
@arporter
Copy link
Member Author

@hiker commented on this implementation (in #2462):

I had a quick look. As far as I can see, you are basically reading in and storing all source files of all search directories (assuming that the file you are looking for is the last ;) ). That seems to be a huge overhead (e.g. in my driver creation I pass in the whole LFRic source tree, so potentially this would mean all of the LFRic sources would need to be read - over 6800 files). Wouldn't it be better to rely on the coding style to do a first filter, and only read them all if we can't find anything?

I just read the history of this ticket, and my assumption was always that there will be an option to use the filename to determine the module names (potentially verified by then doing a regex on that one file only) - with the idea that more rules to select filenames based on module names could be added (e.g. by a setting in the config file).

Also, the handling of src feels wrong. We read in the source code using a static method - why is that in ModuleInfo, and not the ModuleManager? That somehow doesn't feel right,

I am not sure if this would affect me. I would expect a performance (which could be an issue, though it needs to be measured) and memory usage impact (which probably doesn't matter much). But reading a significant(?) part of LFRic over and over for each file we process feels like a huge waste of resources.

@arporter
Copy link
Member Author

My concern is not LFRic but everything else. However, I take your point about potentially, repeatedly reading 6800 files! I've taken a look at the NEMO source and the rule there appears to be that a file <some_name>.f90 contains module some_name. In WaveWatchIII that rule largely holds but there are exceptions. @LonelyCat124 and @sergisiso, could you comment on the other sources that you've been looking at? As @hiker said earlier in #2462, a good first test would be if module_name in file_name.

The "where to put the file reading" problem is a bit of a chicken-and-egg. ModuleManager has to read a file to confirm that it contains a given module (unless this is strictly enforced as it is in LFRic) and it is only then that it creates a ModuleInfo object for that module. However, currently it is the ModuleInfo that contains the functionality for reading (and parsing, processing) the source file.

@arporter
Copy link
Member Author

Using:

from difflib import SequenceMatcher

score = SequenceMatcher(None, str1, str2).ratio()

gives a floating-point measure of how similar two strings are (1.0 for identical, 0.0 for nothing in common).

@hiker
Copy link
Collaborator

hiker commented May 1, 2024

We could have several rules implemented, e.g. module 'xxx' is in file xxx.[fF]90, xxx_mod.[fF]90, and then only handle a greatly reduced number of files (to then either regex and/or parse them).

And/or additionally, we could provide a list of exceptions in the config file (since this is project specific)? When I tried the kernel extraction on the um physics, I had to rename a file or two, but being able to just add them to a config file would have been great.

I love ❤️ the similarity one! Can we try to test various module names with the basename of the files? We could define a threshold (again, in the config file?) for which files are similar enough.

@arporter
Copy link
Member Author

arporter commented May 1, 2024

I love ❤️ the similarity one! Can we try to test various module names with the basename of the files? We could define a threshold (again, in the config file?) for which files are similar enough.

Yes, I think that might be a good, general-purpose solution that doesn't require too much work. (I need this in a hurry really.)

@LonelyCat124
Copy link
Collaborator

Socrates is ~ 700 files - it seems to be either module_name_mod.f90 or module_name.f90. There are some non-module containing files but I guess this just doesn't deal with those.

@arporter
Copy link
Member Author

arporter commented May 1, 2024

The problem at the moment is that the ModuleManager expects to process all files it encounters as it searches for a particular module (in get_module_info) whereas we really need it to search through all files and identify the most likely candidates before actually processing them. We may need to keep a record of all files we have seen but not processed in order to do this which is quite a big change to how it currently works?

@LonelyCat124
Copy link
Collaborator

Can it (or indeed PSyclone at all) deal with multiple modules inside a single file?

@hiker
Copy link
Collaborator

hiker commented May 1, 2024

The problem at the moment is that the ModuleManager expects to process all files it encounters as it searches for a particular module (in get_module_info) whereas we really need it to search through all files and identify the most likely candidates before actually processing them. We may need to keep a record of all files we have seen but not processed in order to do this which is quite a big change to how it currently works?

Yes. ATM we immediately store a mapping from the (unverified/assumed) module names to ModuleInfo object, which (initially) only stores the path to the file. So, if we can't deduce the mod name (based on coding style), we should store this information to avoid having to back to the file system again.

@hiker
Copy link
Collaborator

hiker commented May 1, 2024

Can it (or indeed PSyclone at all) deal with multiple modules inside a single file?

Somewhat. It can map several module information to the same file. But (from memory), if you then request the source code (or fparser or psyir info), each module info will independently read and parse the source code again. It shouldn't be hard to fix that though, each module info could keep a list of other modules (and then use the module manager to update the other objects. I feel there might be an even better solution ... maybe we can share some state info between these module info objects that come from the same file??

@arporter
Copy link
Member Author

arporter commented May 1, 2024

I feel there might be an even better solution ... maybe we can share some state info between these module info objects that come from the same file??

Should we have an intermediate e.g. FileInfo object that then stores references to related ModuleInfo instances if they've been created for it (or an empty list if the contents of the file are yet to be examined)? These FileInfo objects would then keep track of the files we've found and their full path (+...?). I did wonder whether this should be handled in fparser but decided that fparser should probably just stick to parsing.

@hiker
Copy link
Collaborator

hiker commented May 1, 2024

Should we have an intermediate e.g. FileInfo object that then stores references to related ModuleInfo instances if they've been created for it (or an empty list if the contents of the file are yet to be examined)? These FileInfo objects would then keep track of the files we've found and their full path (+...?). I did wonder whether this should be handled in fparser but decided that fparser should probably just stick to parsing.

That was actually my very first thought, but then I expected you to say that this should be done in the PSyIR - just make FileContaier to store that info (and be populated later on parsing time) :) That was pretty stupid reasoning, so yes, I think that would make sense, and the very clean solution to handle multiple modules in one file.

@arporter
Copy link
Member Author

Ready for review now from either @hiker or @sergisiso . It generalises the ModuleManager so that it no longer assumes a strict mapping between filename and the module that it contains. More significantly, it also changes the interface-resolving mechanism to use the ModuleManager. Integration tests are running.

Copy link
Collaborator

@hiker hiker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will continue later.

src/psyclone/parse/file_info.py Outdated Show resolved Hide resolved
src/psyclone/parse/file_info.py Outdated Show resolved Hide resolved
src/psyclone/parse/file_info.py Outdated Show resolved Hide resolved
src/psyclone/parse/module_info.py Show resolved Hide resolved
src/psyclone/parse/module_manager.py Outdated Show resolved Hide resolved
src/psyclone/psyir/tools/call_tree_utils.py Show resolved Hide resolved
src/psyclone/psyir/tools/call_tree_utils.py Show resolved Hide resolved
src/psyclone/psyir/tools/call_tree_utils.py Show resolved Hide resolved
src/psyclone/psyir/tools/call_tree_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@hiker hiker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR, it is good to see the module manager becoming more useful.
There are a few minor issues mentioned in the comments.

I also realised that we now have two FileInfo classes (the other one in ./parse/algorithm.py), not exactly ideal :(

Looking at the documentation, I wonder if the new FileInfo object should be mentioned together with the module manager in the dev guide. And that section sits atm in the psy_data.rst, now that the module manager is used elsewhere, maybe it should go in a different section (not sure ... maybe modules.rst)?

@arporter
Copy link
Member Author

Thanks for that Joerg. I've restructured the docs (which made me realise some doc strings were still wrong) by adding a new Module Manager section and referring to it appropriately. Good spot on the second (or first) FileInfo. I've renamed it AlgFileInfo as that seems appropriate and didn't require very much effort :-)

@arporter
Copy link
Member Author

All green again. Ready for another look.

@hiker
Copy link
Collaborator

hiker commented May 31, 2024

The documentation builds fine, all issues were addressed. We have one CI failures, but that appears to be related to a missing/incorrect OpenMPI module:

Lmod has detected the following error: The following module(s) are unknown:
"openmpi/5.0.2"

Additionally, we have one question for @sergisiso.

@arporter , let me know if you are happy to go ahead with a merge anyway (and bring it up to master, there are some conflicts), or if yuo prefer me to wait for feedback and CI to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants