(Closes #2462) generalise ModuleManager #2564
base: master
@hiker commented on this implementation (in #2462): I had a quick look. As far as I can see, you are basically reading in and storing all source files in all search directories (assuming that the file you are looking for is the last one ;) ). That seems like a huge overhead: e.g. in my driver creation I pass in the whole LFRic source tree, so potentially all of the LFRic sources (over 6800 files) would need to be read. Wouldn't it be better to rely on the coding style as a first filter, and only read them all if we can't find anything? I just read the history of this ticket, and my assumption was always that there would be an option to use the filename to determine the module names (potentially verified by running a regex on that one file only), with the idea that more rules for selecting filenames based on module names could be added (e.g. by a setting in the config file). Also, the handling of src feels wrong. We read in the source code using a static method: why is that in ModuleInfo, and not in the ModuleManager? That somehow doesn't feel right. I am not sure if this would affect me. I would expect a performance impact (which could be an issue, though it needs to be measured) and a memory-usage impact (which probably doesn't matter much). But reading a significant(?) part of LFRic over and over for each file we process feels like a huge waste of resources.
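A minimal sketch of the two-pass lookup suggested above, assuming Python and a simple regex for `module <name>` statements. The helper names `modules_in` and `find_module` are hypothetical and are not PSyclone's API. The first pass only opens files whose basename matches the module name; every Fortran file is scanned only if that fails.

```python
import re
from pathlib import Path

# Matches "module <name>" at the start of a line, but not the
# "module procedure" statements used inside interface blocks.
_MODULE_RE = re.compile(r"^\s*module\s+(?!procedure\b)(\w+)",
                        re.IGNORECASE | re.MULTILINE)


def modules_in(path):
    """Return the lower-cased names of all modules declared in a file."""
    text = Path(path).read_text(errors="replace")
    return {name.lower() for name in _MODULE_RE.findall(text)}


def find_module(name, search_dirs):
    """Return the path of the file declaring module `name`, or None."""
    name = name.lower()
    fortran_files = [p for d in search_dirs for p in sorted(Path(d).rglob("*"))
                     if p.is_file() and p.suffix in (".f90", ".F90")]
    # Pass 1: trust the coding style (file basename == module name) and
    # verify with a regex on that one file only.
    for path in fortran_files:
        if path.stem.lower() == name and name in modules_in(path):
            return path
    # Pass 2 (fallback): scan every Fortran file in the search path.
    for path in fortran_files:
        if name in modules_in(path):
            return path
    return None
```

In the worst case this still reads every file, but only when the naming convention fails. A regex like this does not cover every Fortran form (e.g. `module subroutine` inside submodule interfaces), so a real implementation might verify with a parser instead.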
My concern is not LFRic but everything else. However, I take your point about potentially reading 6800 files repeatedly! I've taken a look at the NEMO source and the rule there appears to be that a file <some_name>.f90 contains … The "where to put the file reading" problem is a bit of a chicken-and-egg one.
Using:
gives a floating-point measure of how similar two strings are (1.0 for identical, 0.0 for nothing in common).
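The snippet referred to above was not preserved. Python's standard-library `difflib.SequenceMatcher.ratio()` behaves as described, so here is a small sketch assuming that is what was meant (the wrapper name `similarity` is hypothetical):

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a float in [0.0, 1.0] measuring how similar two strings are.

    SequenceMatcher.ratio() computes 2*M/T, where M is the number of
    matching characters and T is the total length of both strings.
    """
    return SequenceMatcher(None, a, b).ratio()


print(similarity("constants_mod", "constants_mod"))        # 1.0
print(similarity("lfric_constants_mod", "constants_mod"))  # 0.8125
print(similarity("abc", "xyz"))                            # 0.0
```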
We could have several rules implemented, e.g. module 'xxx' is in file xxx.[fF]90 or xxx_mod.[fF]90, and then only handle a greatly reduced number of files (which we then regex and/or parse). And/or, additionally, we could provide a list of exceptions in the config file (since this is project-specific)? When I tried the kernel extraction on the UM physics, I had to rename a file or two, but being able to just add them to a config file would have been great. I love ❤️ the similarity one! Can we try testing various module names against the basenames of the files? We could define a threshold (again, in the config file?) for which files are similar enough.
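A sketch of how configurable naming rules, per-project exceptions and a similarity threshold might be combined to pick a small set of candidate files before any of them is opened. The rule patterns, the `exceptions` mapping, the 0.7 default threshold and the function name `candidate_files` are all hypothetical placeholders for values that could live in the config file.

```python
from difflib import SequenceMatcher
from pathlib import PurePath

# Hypothetical naming rules; "{name}" is replaced by the module name.
DEFAULT_RULES = ("{name}.f90", "{name}.F90", "{name}_mod.f90", "{name}_mod.F90")


def candidate_files(module, filenames, rules=DEFAULT_RULES,
                    exceptions=None, threshold=0.7):
    """Return the filenames most likely to contain `module`, best first.

    1. An explicit exception (lower-case module name -> filename) wins.
    2. Then files whose basename matches a naming rule, in rule order.
    3. Then other files whose stem is at least `threshold` similar to
       the module name, most similar first.
    """
    module = module.lower()
    exceptions = exceptions or {}
    if module in exceptions:
        return [exceptions[module]]
    wanted = [rule.format(name=module) for rule in rules]
    ruled = sorted((f for f in filenames if PurePath(f).name in wanted),
                   key=lambda f: wanted.index(PurePath(f).name))
    scored = []
    for f in filenames:
        if f in ruled:
            continue
        score = SequenceMatcher(None, module, PurePath(f).stem.lower()).ratio()
        if score >= threshold:
            scored.append((score, f))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return ruled + [f for _, f in scored]
```

Only the returned files would then be checked with a regex or parsed; an empty result would signal the fallback of scanning everything.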
Yes, I think that might be a good, general-purpose solution that doesn't require too much work. (I need this in a hurry really.)
Socrates is ~700 files, and they seem to be named either module_name_mod.f90 or module_name.f90. There are some files that don't contain a module, but I guess this approach just doesn't deal with those.
The problem at the moment is that the ModuleManager expects to process all files it encounters as it searches for a particular module (in |
Can it (or indeed PSyclone at all) deal with multiple modules inside a single file?
Yes. At the moment we immediately store a mapping from the (unverified/assumed) module names to ModuleInfo objects, which (initially) only store the path to the file. So, if we can't deduce the module name (based on coding style), we should store this information to avoid having to go back to the file system again.
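A minimal sketch of the lazy mapping described above: the index is built from file names alone, and each entry reads its file only when the source is first requested, then caches it. `LazyModuleInfo` and `build_index` are hypothetical simplified stand-ins, not PSyclone's actual ModuleInfo/ModuleManager code.

```python
from pathlib import Path


class LazyModuleInfo:
    """Hypothetical stand-in for a ModuleInfo that initially stores only
    the (assumed) module name and the path of the file."""

    def __init__(self, name, path):
        self.name = name.lower()
        self.path = Path(path)
        self._source = None
        self.disk_reads = 0  # for illustration: how often we hit the disk

    def get_source(self):
        """Read the file on first request and cache the text."""
        if self._source is None:
            self._source = self.path.read_text()
            self.disk_reads += 1
        return self._source


def build_index(filenames):
    """Map module names assumed from file basenames to LazyModuleInfo.

    No file is opened here; only the names are inspected.
    """
    return {Path(f).stem.lower(): LazyModuleInfo(Path(f).stem, f)
            for f in filenames}
```

If a later check shows that the assumed name was wrong, the cached text can be reused rather than going back to the file system.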
Somewhat. It can map several ModuleInfo objects to the same file. But (from memory) if you then request the source code (or the fparser or PSyIR info), each ModuleInfo will independently read and parse the source code again. It shouldn't be hard to fix that, though: each ModuleInfo could keep a list of the other modules in its file (and then use the module manager to update the other objects). I feel there might be an even better solution... maybe we can share some state between the ModuleInfo objects that come from the same file?
Should we have an intermediate, e.g. …
That was actually my very first thought, but then I expected you to say that this should be done in the PSyIR: just make FileContainer store that info (and have it populated later, at parse time) :) That was pretty stupid reasoning, so yes, I think that would make sense, and it would be a very clean solution for handling multiple modules in one file.
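A sketch of the intermediate per-file object discussed above, assuming one shared instance per source file: every module found in that file delegates to it, so the file is read once no matter how many modules request their source. These `FileInfo` and `ModuleInfo` classes are hypothetical simplifications, not PSyclone's actual classes.

```python
from pathlib import Path


class FileInfo:
    """Hypothetical sketch: one object per source file, shared by all the
    modules found in that file, so the file is read only once."""

    def __init__(self, path):
        self.path = Path(path)
        self._source = None
        self.disk_reads = 0

    @property
    def source(self):
        """Read the file on first access and cache the text."""
        if self._source is None:
            self._source = self.path.read_text()
            self.disk_reads += 1
        return self._source


class ModuleInfo:
    """Hypothetical sketch: delegates all file access to its FileInfo."""

    def __init__(self, name, file_info):
        self.name = name.lower()
        self.file_info = file_info

    def get_source(self):
        return self.file_info.source
```

A fuller version could cache the fparser parse tree (or the PSyIR) on the shared object in the same way, which would also remove the repeated parsing mentioned earlier in the thread.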
Ready for review now from either @hiker or @sergisiso. It generalises the ModuleManager so that it no longer assumes a strict mapping between a filename and the module that it contains. More significantly, it also changes the interface-resolving mechanism to use the ModuleManager. Integration tests are running.
Will continue later.
src/psyclone/tests/domain/gocean/transformations/gocean1p0_transformations_test.py
Nice PR, it is good to see the module manager becoming more useful.
There are a few minor issues mentioned in the comments.
I also realised that we now have two FileInfo classes (the other one is in ./parse/algorithm.py), which is not exactly ideal :(
Looking at the documentation, I wonder if the new FileInfo object should be mentioned together with the module manager in the dev guide. That section currently sits in psy_data.rst; now that the module manager is used elsewhere, maybe it should move to a different file (not sure... maybe modules.rst)?
Thanks for that, Joerg. I've restructured the docs (which made me realise some docstrings were still wrong) by adding a new …
All green again. Ready for another look.
The documentation builds fine and all issues have been addressed. We have one CI failure, but that appears to be related to a missing/incorrect OpenMPI module:
Additionally, we have one question for @sergisiso. @arporter, let me know if you are happy to go ahead with a merge anyway (and bring it up to date with master, as there are some conflicts), or if you would prefer me to wait for feedback and for CI to be fixed.