
Question about issues with subobjects when template expansion is left to a parser function #5584

Open
D-Groenewegen opened this issue Jan 25, 2024 · 0 comments

Dear developers and PHP wizards, I recently ran into an issue I hope you can help me with.

PHP: 8.0.30 / MediaWiki: 1.39.2 / SMW: 4.1.1

The situation

As part of an attempt to survive Parsoid in the future, I wrote a parser function that allows for the expansion of multiple wiki templates. Among other things, it reads the raw source content of the page, isolates and extracts the 'multiple-instance' templates it requires, transfers their data to a different wiki template, and expands instances of that template. For lack of established terms, I'll call them the input template (the source) and the output template (the template used and expanded through the parser function).
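To make the setup concrete, here is a minimal sketch of what such a page might look like. The parser function name (`#transfertemplates`), its parameters, and the template names are all hypothetical placeholders, not the actual implementation:

```wikitext
<!-- The page source contains several 'input' template calls: -->
{{InputTemplate|name=Alice|role=Author}}
{{InputTemplate|name=Bob|role=Editor}}

<!-- A hypothetical parser function reads the raw page source, extracts
     the InputTemplate instances, and expands OutputTemplate once per
     instance with the transferred data: -->
{{#transfertemplates: page={{FULLPAGENAME}}
 |from=InputTemplate
 |to=OutputTemplate
}}
```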

The problem

While it works, I ran into two issues:

1. Subobjects are always one revision behind

There is no issue with the HTML that is being output so template expansion is working as far as that goes. But if the output template contains a call to Semantic MediaWiki's #subobject, there is always a lag in updating semantic data. Basically, it is always one revision behind: after a change is made and the page is saved, the HTML gets updated but the semantic data represent a previous revision of the page. A new page purge or null edit is required to get the semantic data to represent the latest page content.

(The same issue would probably occur for #set. It does NOT occur if templates are called in the regular way.)
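For illustration, the output template might contain something like the following. The `#subobject` syntax is Semantic MediaWiki's; the template parameters and property names are made up for the example:

```wikitext
<!-- Hypothetical Template:OutputTemplate, expanded by the parser function.
     The subobject's data is what lags one revision behind. -->
{{#subobject:
 |Has name={{{name|}}}
 |Has role={{{role|}}}
}}
```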

2. The job queue for refreshLinks hangs

More seriously, it seems like this way of calling #subobject severely messes up job queue execution - more specifically, the refreshLinks group of jobs that is triggered because templates are changed to use the new parser function and pages using those templates need to be updated. At some point after runJobs is initiated, one of the jobs gets stuck and leads to an awful spike in resource usage - much to the displeasure of the provider, who decided to shut down the site (!).

I'm running tests on a different server where I can reproduce the problem. There is no Exception or Error thrown. The sore spot where things consistently hang is MediaWiki's RenderedRevision::getRevisionParserOutput(), where it uses call_user_func() - not that every job necessarily hangs on arriving at this point, but if/when one does, that's where we find our bottleneck.

Also separately reported here because the limit options set for runJobs.php don't guard against scripts becoming unresponsive.

But why?

For now, I can only do some guesswork. The semantic data belonging to subobjects are stored in 'associates' of the ParserCache. A page edit means a new revision with a new timestamp, which should trigger a series of events: (a) invalidate the cache, (b) create a new cache entry and (c) replace the old entry. The process works fine for HTML cache updates and the jobs running them, but for some reason the sequence does not finalise for the relevant SMW data until yet another revision is saved.

(As an aside, this is reminiscent of a relatively recent problem with the ParserOutputAccess class: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/972019 - "Fix local cache when page is edited within the process")

It is possible that some of the steps that SMW takes to act in time for the ParserOutput, maybe via the InternalParseBeforeLinks or LinksUpdateComplete hooks, are now bypassed precisely because template expansion is left to a parser function.
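As one possible way to intervene by hook, I have been considering something along these lines in LocalSettings.php. This is only a workaround sketch under the assumption that a forced re-parse after the links update would pick up the current SMW data (effectively automating the null edit); LinksUpdateComplete and Title::invalidateCache() are real MediaWiki APIs, but I have not verified that this resolves the lag, and it would need to be restricted to affected pages rather than run unconditionally:

```php
<?php
// LocalSettings.php - workaround sketch, not a verified fix.
$wgHooks['LinksUpdateComplete'][] = function ( $linksUpdate ) {
	// In a real setup you would first check that this page actually
	// uses the parser function, e.g. via its templatelinks.
	$title = $linksUpdate->getTitle();

	// invalidateCache() bumps page_touched so the next request
	// re-parses the page, which should let SMW re-register the
	// subobject data from the current revision.
	$title->invalidateCache();
};
```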

The job queue failures appear to be related, but it is unclear to me exactly how they tie in with the above. Again, outdated metadata may be polluting the parser cache, but how?

Questions

I'll add more diagnostics as soon as I get round to it. For now:

  • Any gotchas I should be aware of?
  • Are there measures that can be taken or tried? Ways to intervene by way of hooks?