Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom documentclass for pgf backend #28167

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

voidstarstar
Copy link

@voidstarstar voidstarstar commented May 4, 2024

PR summary

This allows the user to set a custom documentclass for the LaTeX compiler to use. It is recommended to combine this with pgf.preamble and have both match the LaTeX document that the pgf will appear in. This helps fix consistency issues with fonts, sizes, positions, etc. These bugs were typically caused by differences when rendering the pgf with matplotlib versus the final rendering of the LaTeX document. Fixes #28119 and fixes #28213.

When combined with #26893, many consistency issues should be avoidable.

PR checklist

@voidstarstar
Copy link
Author

The goal is for this to work with any arbitrary documentclass. That means that any package may already be loaded with arbitrary options. I tried to reload the hyperref and geometry packages with forced options, but hopefully someone can let me know if there is a better way to do this and if there are any other packages I missed. It seems like some packages can be loaded multiple times without error, so I left them alone.

@voidstarstar voidstarstar force-pushed the change-documentclass branch 2 times, most recently from db62b70 to c6a80a5 Compare May 4, 2024 10:35
@github-actions github-actions bot added the Documentation: user guide files in galleries/users_explain or doc/users label May 4, 2024
@voidstarstar voidstarstar marked this pull request as ready for review May 4, 2024 16:13
@voidstarstar
Copy link
Author

voidstarstar commented May 5, 2024

Is there any reason why matplotlib inserts some LaTeX commands (e.g. for the graphicx, hyperref, geometry, or pgf packages) before calling _get_preamble()? I think it might be better for the pgf.preamble to come immediately after pgf.documentclass in order to contain all user code together. This would give matplotlib the "last word" in one place for resolving conflicts and the ability to redefine anything the user already defined. Otherwise, I think in theory a user's code could break the code that matplotlib inserts. Further, this would allow us to take a more DRY approach by moving pgf.documentclass to _get_preamble().

In other words I'm suggesting to:

  1. Make _get_preamble() contain mpl["pgf.documentclass"], mpl["pgf.preamble"], and additional commands added by matplotlib (in that order), and
  2. Always call _get_preamble() prior to the commands added by matplotlib in the 3 locations it is called.

If there are no objections, I'll make these changes.

@voidstarstar
Copy link
Author

This PR solves multiple issues. Please let me know if these should be split into multiple PRs:

  1. Fixes [Bug]: LaTeX option clash error when pgf.preamble uses certain packages #28213. Moves packages matplotlib needs to after the custom preamble.
  2. Fixes [ENH]: Allow a custom documentclass when using the pgf backend. #28119.
  3. Removes the now unnecessary workaround introduced by PGF: Consistently set LaTeX document font size #26893. This is removed and replaced by the new documentclass option as mentioned here.
  4. Updates documentation.
  5. Moves custom preamble to immediately follow custom documentclass.
  6. Adds tests.

@voidstarstar voidstarstar marked this pull request as ready for review May 13, 2024 01:11
@voidstarstar
Copy link
Author

The changes I made to the docs were fairly simple, so I don't understand why the build fails. If anyone knows, I would appreciate the insight.

@voidstarstar
Copy link
Author

This PR also has a known regression.

It fixes most custom preambles, but I know of at least one that it will break. The following works before this PR but breaks after:

mpl.rcParams['pgf.preamble'] = r'\usepackage[strings]{underscore}'

Matplotlib uses the underscore package with the strings option. Unlike other commands, this command is very fragile. The documentation says it should be loaded last to avoid conflicts with certain packages. The pgf package happens to be one such package that it conflicts with. Therefore, special care must be taken to load underscore after the other packages. If the above preamble is used, then underscore will be loaded before pgf which will trigger the conflict.

I assume this regression is much more rare than the bug it fixes, so I think this PR should still be considered a net improvement. If this gets merged, then a new issue should probably be opened for this regression. I'm not sure how to solve this (maybe something using \AtBeginDocument?), so if anyone has any ideas, please let me know. The best solution might be to just avoid using the underscore package entirely. This package seems to have been used to fix a problem with column names in pandas, but was a controversial fix at the time (See #20706).

@tacaswell
Copy link
Member

attn @pwuertz

@tacaswell
Copy link
Member

For better or worse, the use of underscore did go in and we can not pull it back out now (without replacing it with something else that does the same thing).

I have concerns about introducing a known regression to add a feature. Given that we only import a small number packages by default, can we detect if the user also used them and either move our invocation around or drop adding it?

@voidstarstar
Copy link
Author

voidstarstar commented May 14, 2024

To be clear, this PR both fixes a bug (#28213) and adds a new feature (#28119). The above regression is caused by the reordering of packages (specifically the pgf package) needed to fix the option clash bugs #28213.

Unfortunately, package ordering conflicts are a huge pain in LaTeX. Since we're taking an arbitrary user preamble, the current version of matplotlib (pre-PR) probably has plenty of them. I only mentioned underscore, but hyperref is probably an even more notorious mess.

This PR does essentially drop adding them if they're already loaded, but that only fixes the option clash bugs. It does not solve the problem where underscore with strings must come after pgf. I think simply reordering tricky packages will often/always cause a tradeoff. For any ordering of the packages and user preamble, I can probably create an "adversarial" user preamble that will break things. For example, the regression mentioned above was caused when I moved \usepackage{pgf}, but if we move it back, then the tradeoff is that this will have an option clash with the following (as of version 3.8.4):

mpl.rcParams['pgf.preamble'] = r'\usepackage[draft]{pgf}'

Solutions

The ideal solution would be a proper automated topological sorting of dependencies that would be transparent to us. There was an effort made to automatically resolve known conflicts, but sadly this has been abandoned. I think LaTeX3 was also supposed to help as well, but for the current LaTeX2e, we're stuck.

The most practical solution I can think of is to leave the responsibility up to the user if they choose to create a preamble. In other words, if the user preamble is non-empty, then matplotlib would not inject any packages that cause conflicts and the user must handle it themselves. Alternatively, yet another rcParam flag could be used to prevent injecting them. Basically, if in the worst case we can't guarantee it won't crash, then we should document the feature as experimental, but also give the user the tools they need to put out the fire they create.

@tacaswell
Copy link
Member

The above regression is caused by the reordering of packages (specifically

I see, bug-for-bug is possibly a better trade


Maybe the right fix here is to move the default preamble into the default rc (so at least the user can see what minimal working preamble is). For any of these changes though, how do effectively warn users during the transition period? Can we check if the preamble has been customized and if so if it contains the needed imports. If not put them in someplace and warn that we will stop doing that in the future. In the future we can either just warn critical things are missing or raise a helpful error message that has instructions of what needs to be added (and a link to an explanation of why we can not just add it our self?)

@voidstarstar
Copy link
Author

Your solution of moving the default preamble to the default value of pgf.preamble sounds good to me. 👍 (My "yet another rcParam flag" proposal would make warning the user a non-issue, but it seems to me like an uglier permanent solution than what you suggested.)

Below are my thoughts on some of the details for transitioning.

Warnings to User

I think we can check if the preamble loaded the necessary packages and options by using \@ifpackagewith.

I'm not quite sure the best way to warn the user. One option could be to run the user preamble without modification except for a subsequent check to see if the packages were loaded by using \@ifpackagewith. If they were not, then we force a crash which can be caught by python. If we detect a crash, we then re-run the LaTeX compiler a 2nd time, but with the "someplace" modification.

Before the future deprecation we would do the following:

  • If we get 0 LaTeX crashes, no warnings are issued.
  • If we get 1 LaTeX crash, we warn the user that we mitigated the issue, but will not in the future.
  • If we get 2 LaTeX crashes, we do exactly what we do now by raising an exception and crashing matplotlib so the user can see the LaTeX log. An additional helpful error message should be added to explain what is necessary.

After the future deprecation, the 2nd re-run of the LaTeX compiler should be removed and we would do the following:

  • If we get 0 LaTeX crashes, no warnings are issued.
  • If we get 1 LaTeX crashes, we do exactly what we do now by raising an exception and crashing matplotlib so the user can see the LaTeX log. An additional helpful error message should be added to explain what is necessary.

Temporary Mitigation

The "someplace" that is chosen will inherently sometimes fail for the reasons we previously discussed.
I think there are 2 options:

  1. We pick a place with a high chance of being reliable. This PR currently attempts this by using \PassOptionsToPackage before the user preamble and \usepackage after. This will (hopefully) cause fewer crashes, but may surprise some users due to regressions.
  2. We use the current (but probably more crash prone) locations. This won't break any current user code, but also won't have the better "bug-for-bug" trade. The goal here is to only warn the user. We're intending to fully give the user this responsibility in the future anyway, so any "bug-for-bug" trade will only be temporary and therefore we should avoid disrupting someone's workflow for little/temporary benefit.

I would probably opt for option 2 and abandon the "bug-for-bug" mitigations in this PR.

@tacaswell
Copy link
Member

I also like option 2 better.

I like the plan of checking if the package is loaded and forcing a crash (I did not know \@ifpackagewith existed until just now). That gets us out of the business of trying to parse latex which seemed like a risk for the intermediate state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: pgf Documentation: user guide files in galleries/users_explain or doc/users topic: rcparams
Projects
None yet
2 participants