Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't require that @inbounds depends only on local information #54270

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

LilithHafner
Copy link
Member

Delete "from the information locally available" from "Only use @inbounds when it is certain that all accesses are in bounds."

This reflects the fact that the compiler models the possibility of undefined behavior and therefore does not constant propagate functions with @inbounds (unless it can prove they are safe).

If one semantically executes and out of bounds @inbounds, that's UB. If it's not semantically executed, it's not UB.

Related: #54099. cc @Keno
Documents that @stevengj is right here

@LilithHafner LilithHafner added the domain:docs This change adds or pertains to documentation label Apr 26, 2024
@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 26, 2024

The question seems to be whether the compiler is permitted to speculatively construct and object (ignoring constructor invariants with abstract interpretation) and then use that object to speculate other queries (such as that getindex). I don't think anything currently disallows that, though @oscardssmith was previously examining that question in the context of SubString constructor/methods also.

@Keno
Copy link
Member

Keno commented Apr 26, 2024

Yes, there's different requirements that apply to @assume_effects :ub vs @inbounds. @inbounds only asserts that the inbounds access does not happen, so the user is allowed to place their own contracts on things (in this case, if you construct a Triangular while bypassing the inner constructor, or mutate something afterwards in a way that invalidates the assertions and then call this method that's UB). @assume_effects is stronger and is required for all arguments, no matter how constructed.

As @vtjnash mentioned, there are some plans to annotate types to allow inner constructor constraints to be valid for @assume_effects purposes, but this issue is a bit separate.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 26, 2024

if you construct a Triangular while bypassing the inner constructor, ... and then call this method that's UB

Isn't that what inference does? Or are you saying that is why inference poisons the UB flag on the method containing the inbounds so that it shouldn't attempt that second call?

@Keno
Copy link
Member

Keno commented Apr 26, 2024

Or are you saying that is why inference poisons the UB flag on the method containing the inbounds so that it shouldn't attempt that second call?

Yes

@Keno
Copy link
Member

Keno commented Apr 26, 2024

I think the most coherent thing we can say is that:

  1. @inbounds needs to be valid for all code paths reachable from "public api" (where the latter is somewhat deliberately vague)
  2. @assume_effects needs to be valid for all code path reachable from any call to the annotated function (for any element of the argument types).

So this PR should reflect 1) appropriately for inbounds.

`eachindex(A)` in a function like the one above is _not_ safely inbounds because
the first index of `A` may not be `1` for all user defined types that subtype
`AbstractArray`.
Only use `@inbounds` when it is certain that all accesses are in bounds. In
Copy link
Member

@stevengj stevengj Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Only use `@inbounds` when it is certain that all accesses are in bounds. In
Only use `@inbounds` when you are certain that all accesses are in bounds (as undefined behavior, e.g. crashes, might occur if this assertion is violated). In

Maybe just change it to "when you are certain". Basically, it's up to the code writer to decide what level of certainty they are comfortable with (e.g. "I'm certain as long as users don't break my API abstractions", or even "I'm certain because this is throwaway code and I'm only passing 1-based arrays today"). I think that subsumes @Keno's comment without opening a can of worms with phrasing about "public API".

LilithHafner and others added 2 commits April 27, 2024 14:28
Co-authored-by: Steven G. Johnson <stevenj@alum.mit.edu>
base/essentials.jl Outdated Show resolved Hide resolved
LilithHafner and others added 2 commits April 27, 2024 20:46
Co-authored-by: Steven G. Johnson <stevenj@alum.mit.edu>
@LilithHafner
Copy link
Member Author

It seems form the comments above that this PR gets the semantics right, so I'm going to go ahead and merge it unless folks object.

`eachindex(A)` in a function like the one above is _not_ safely inbounds because
the first index of `A` may not be `1` for all user defined types that subtype
`AbstractArray`.
Only use `@inbounds` when you are certain that all accesses are in bounds (as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are

The original "it is" achieves better clarity. Use of "you" could be interpreted as "am I certain? yes, then let's try my luck" -- but one can be wrong.

Instead "it is" implies that a truth exists outside of oneself, and, furthermore, that due to fallibility, one should provide a proof. Whether one actually provides a proof for each @inbounds is unlikely, but if one gets in the habit of thinking that way, the probability of flippant and unsound @inbounds should greatly decrease.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @stevengj's #54270 (comment). The crux is that there is no external truth outside of oneself. It's perfectly valid to try your luck. If you're wrong, you might get hard to trace problems, which this docstring indicates clearly.

I, for one, do use @inbounds "flippantly" and unsoundly at many points during early development, and sometimes provide proofs for usage in registered packages (e.g. https://github.com/LilithHafner/AliasTables.jl/blob/ca6eacd2c222e7e797ec446b5cf6343a733f756e/src/AliasTables.jl#L354). That whole range is allowed by the language semantic. It is the place of a style guide or code quality standard to require or encourage correctness proofs, not the place of this language.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, it's true: the nature of @inbounds implies acceptance of potential UB. I'm reading too much into "it is" / "you are", anyway.

It is the place of a style guide or code quality standard to require or encourage correctness proofs, not the place of this language.

Requirement, ok. However, I fail to see the harm caused by active discouragement of bad practices in any programming language manual.

The crux is that there is no external truth outside of oneself.

In the metaphysical sense? Perhaps. However, in a machine, the truth value of inbounds? can be determined, albeit, sometimes with a trawling of a great number of states, other people's code, etc.


To complement the tangent: discrete sampling based on tables necessitates @inbounds for performance. This comes from direct experience; incidentally, you may be interested in reading Marsaglia's article "Fast generation of discrete random variables."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, in a machine, the truth value of inbounds? can be determined, albeit, sometimes with a trawling of a great number of states, other people's code, etc.

Julia has an open world (i.e. you can always add more methods later) so it is impossible to make a universal truth determination without also trawling the future or having some alternative formalization of the language that prohibits piracy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without getting bogged down in metaphysical discussions, I think having a note/comment explaining why the @inbounds is safe next to the use is a good idea, because it provides a future reader with the same context the original author had when determining that the use was safe. As such, I also think the it is formulation is better, because it prompts a user of @inbounds to think about why their use is safe and to state that clearly to a future reader.

Rust has a similar culture around its uses of unsafe, where people put comments next to uses of unsafe to justify & document it for the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants