Dear developers and PHP wizards, I recently ran into an issue that I hope you can help me with.
PHP: 8.0.30 / MediaWiki: 1.39.2 / SMW: 4.1.1
The situation
As part of an attempt to survive Parsoid in the future, I wrote a parser function that allows for the expansion of multiple wiki templates. Among other things, it reads the raw source content of the page, isolates and extracts the 'multiple-instance' templates it requires, transfers the data to a different wiki template, and expands instances of that template. For lack of established terms, I'll call them the input template (the source) and the output template (the template used and expanded through the parser function).
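To make the extraction step concrete, here is a minimal, self-contained sketch. The function name `extractTemplateInstances` and the template names are hypothetical, and it deliberately simplifies: it assumes non-nested template calls, whereas a real implementation must handle nesting, parser functions, and MediaWiki's case-insensitive first letter in template names.

```php
<?php
// Sketch: find all instances of a given 'multiple-instance' (input)
// template in raw wikitext. Simplification: matches only non-nested
// {{TemplateName|...}} calls.
function extractTemplateInstances( string $wikitext, string $templateName ): array {
	$pattern = '/\{\{\s*' . preg_quote( $templateName, '/' ) . '\s*(\|[^{}]*)?\}\}/';
	preg_match_all( $pattern, $wikitext, $matches );
	// $matches[0] holds the full template calls, in source order.
	return $matches[0];
}

// Example: two instances of a hypothetical input template 'Person'.
$source = "{{Person|name=Ada}} intro text {{Person|name=Grace}} {{Other|x=1}}";
print count( extractTemplateInstances( $source, 'Person' ) ); // 2
```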
The problem
While it works, I ran into two issues:
1. Subobjects are always one revision behind
There is no issue with the HTML that is being output so template expansion is working as far as that goes. But if the output template contains a call to Semantic MediaWiki's #subobject, there is always a lag in updating semantic data. Basically, it is always one revision behind: after a change is made and the page is saved, the HTML gets updated but the semantic data represent a previous revision of the page. A new page purge or null edit is required to get the semantic data to represent the latest page content.
(The same issue would probably occur for #set. It does NOT occur if templates are called in the regular way.)
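For reference, the workaround I mentioned can be scripted through MediaWiki's action API, which supports a purge with a forced links update (equivalent to a null edit for link tables). The URL and page title below are placeholders:

```shell
# Purge a page and force a links update so the semantic data
# catches up with the latest revision (workaround, not a fix).
curl -s -X POST "https://example.org/w/api.php" \
  --data-urlencode "action=purge" \
  --data-urlencode "forcelinkupdate=1" \
  --data-urlencode "titles=Some page" \
  --data-urlencode "format=json"
```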
2. The job queue for refreshLinks hangs
More seriously, it seems like this way of calling #subobject severely messes up job queue execution - more specifically, the refreshLinks group of jobs that is triggered when templates are changed to use the new parser function and pages using those templates need to be updated. At some point after runJobs is initiated, one of the jobs gets stuck and leads to an awful spike in resource usage - much to the displeasure of the provider, who decided to shut down the site (!).
I'm running tests on a different server where I can reproduce the problem. There is no Exception or Error thrown. The sore spot where things consistently hang is MediaWiki's RenderedRevision::getRevisionParserOutput(), where it uses call_user_func() - not that every job necessarily hangs on arriving at this point, but if/when one does, that is where we find our bottleneck.
Also reported separately here, because the limit options set for runJobs.php don't guard against scripts becoming unresponsive.
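For anyone trying to reproduce this, running the job queue in small bounded batches makes a hanging job easier to pinpoint; `--type`, `--maxjobs` and `--maxtime` are standard runJobs.php options (though, as noted, they don't stop a job that is already unresponsive):

```shell
# Run only the refreshLinks group, capped per invocation,
# so a stuck job can be isolated more easily.
php maintenance/runJobs.php --type refreshLinks --maxjobs 50 --maxtime 120
```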
But why?
For now, I can only do some guesswork. The semantic data belonging to subobjects are stored in 'associates' of the ParserCache. A page edit means a new revision with a new timestamp, which should trigger a series of events: (a) invalidate the cache, (b) create a new cache entry and (c) replace the old entry. The process works fine for HTML cache updates and the jobs running them, but for some reason the sequence does not finalise for the relevant SMW data until yet another revision is saved.
(As an aside, this is reminiscent of a relatively recent problem with the ParserOutputAccess class: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/972019 - "Fix local cache when page is edited within the process".)
It is possible that some of the steps SMW takes to act in time for the ParserOutput, maybe via the InternalParseBeforeLinks or LinksUpdateComplete hooks, are now bypassed precisely because template expansion is left to a parser function.
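If intervening via hooks turns out to be viable, registration would go through extension.json in the usual way; the extension namespace and handler names below are purely hypothetical placeholders:

```json
{
	"Hooks": {
		"InternalParseBeforeLinks": "MyExtension\\Hooks::onInternalParseBeforeLinks",
		"LinksUpdateComplete": "MyExtension\\Hooks::onLinksUpdateComplete"
	}
}
```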
The job queue failures appear to be related, but it is unclear to me exactly how they tie in with the above. Again, outdated metadata may be polluting the parser cache - but how?
Questions
I'll add more diagnostics as soon as I get round to it. For now:
Any gotchas I should be aware of?
Are there measures that can be taken or tried? Ways to intervene by way of hooks?