
Overlay methods disabling IR interpreter breaks const-prop of GPU-incompatible code #384

Closed
Tuebel opened this issue Jan 11, 2023 · 6 comments
Labels
bug Something isn't working

Tuebel commented Jan 11, 2023

Describe the bug

When comparing a CuArray against an irrational number, kernel compilation throws an InvalidIRError.
This happens on Julia 1.9 but not on Julia 1.8.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 2.82133   2.05078   3.8729
 3.14324   0.218199  3.05251
 0.227112  3.68884   3.91693

julia> A .< π
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to mpfr_signbit)

However, explicit conversion works:

julia> A .< Float32.(CUDA.fill(π,3,3))
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
 0  1  1
 1  1  0
 1  0  0
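A lighter workaround, not shown above but following the same idea, is to convert the irrational to a concrete float on the host, so the kernel only ever sees a `Float32` scalar:

```julia
# Float32(π) is evaluated on the CPU before the kernel is compiled; the device
# code then compares against a plain Float32 and never reaches the
# BigFloat/MPFR code path.
A .< Float32(π)
```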

I am using a temp environment where only CUDA is installed:

(jl_bFolVV) pkg> status
Status `/tmp/jl_bFolVV/Project.toml`
  [052768ef] CUDA v3.12.1

Expected behavior

I get the expected result without errors in Julia 1.8:

julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.459628  1.38646  3.23488
 3.10404   1.11073  3.11335
 1.95598   3.8854   2.79495

julia> A .< π
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
 1  1  0
 1  1  1
 1  0  1

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.9.0-beta2
Commit 7daffeecb8c (2022-12-29 07:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 24 virtual cores
Environment:
  LD_LIBRARY_PATH = /opt/ros/noetic/lib

Details on CUDA:

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.86.1, for CUDA 11.7
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.86.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.9.0-beta2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 1.470 GiB / 10.000 GiB available)

Additional context
Full Stacktrace:

  [1] signbit
    @ ./mpfr.jl:811
  [2] _cpynansgn
    @ ./mpfr.jl:338
  [3] Float32
    @ ./mpfr.jl:344
  [4] Float32
    @ ./mpfr.jl:346
  [5] #888
    @ ./irrationals.jl:70
  [6] #setprecision#25
    @ ./mpfr.jl:964
  [7] setprecision
    @ ./mpfr.jl:960
  [8] Type
    @ ./irrationals.jl:69
  [9] <
    @ ./irrationals.jl:96
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [11] _broadcast_getindex
    @ ./broadcast.jl:656
 [12] getindex
    @ ./broadcast.jl:610
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_rethrow)
Stacktrace:
 [1] rethrow
   @ ./error.jl:61
 [2] #setprecision#25
   @ ./mpfr.jl:966
 [3] setprecision
   @ ./mpfr.jl:960
 [4] Type
   @ ./irrationals.jl:69
 [5] <
   @ ./irrationals.jl:96
 [6] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [7] _broadcast_getindex
   @ ./broadcast.jl:656
 [8] getindex
   @ ./broadcast.jl:610
 [9] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_excstack_state)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:963
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to julia.except_enter)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:963
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_pop_handler)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:964
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_custom_get_size)
Stacktrace:
  [1] _
    @ ./mpfr.jl:112
  [2] #BigFloat#1
    @ ./irrationals.jl:209
  [3] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [4] #888
    @ ./irrationals.jl:70
  [5] #setprecision#25
    @ ./mpfr.jl:964
  [6] setprecision
    @ ./mpfr.jl:960
  [7] Type
    @ ./irrationals.jl:69
  [8] <
    @ ./irrationals.jl:96
  [9] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [10] _broadcast_getindex
    @ ./broadcast.jl:656
 [11] getindex
    @ ./broadcast.jl:610
 [12] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_alloc_string)
Stacktrace:
  [1] _string_n
    @ ./strings/string.jl:90
  [2] _
    @ ./mpfr.jl:115
  [3] #BigFloat#1
    @ ./irrationals.jl:209
  [4] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [5] #888
    @ ./irrationals.jl:70
  [6] #setprecision#25
    @ ./mpfr.jl:964
  [7] setprecision
    @ ./mpfr.jl:960
  [8] Type
    @ ./irrationals.jl:69
  [9] <
    @ ./irrationals.jl:96
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [11] _broadcast_getindex
    @ ./broadcast.jl:656
 [12] getindex
    @ ./broadcast.jl:610
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_const_pi)
Stacktrace:
  [1] #BigFloat#1
    @ ./irrationals.jl:210
  [2] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [3] #888
    @ ./irrationals.jl:70
  [4] #setprecision#25
    @ ./mpfr.jl:964
  [5] setprecision
    @ ./mpfr.jl:960
  [6] Type
    @ ./irrationals.jl:69
  [7] <
    @ ./irrationals.jl:96
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
  [9] _broadcast_getindex
    @ ./broadcast.jl:656
 [10] getindex
    @ ./broadcast.jl:610
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_get_flt)
Stacktrace:
  [1] Float32
    @ ./mpfr.jl:344
  [2] Float32
    @ ./mpfr.jl:346
  [3] #888
    @ ./irrationals.jl:70
  [4] #setprecision#25
    @ ./mpfr.jl:964
  [5] setprecision
    @ ./mpfr.jl:960
  [6] Type
    @ ./irrationals.jl:69
  [7] <
    @ ./irrationals.jl:96
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
  [9] _broadcast_getindex
    @ ./broadcast.jl:656
 [10] getindex
    @ ./broadcast.jl:610
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/utils.jl:83
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:347 [inlined]
  [8] LLVM.ThreadSafeContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
    @ LLVM ~/.julia/packages/LLVM/9gCXO/src/executionengine/ts_module.jl:14
  [9] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:74
 [10] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:346
 [11] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/cache.jl:90
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:299
 [13] cufunction
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:292 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:102 [inlined]
 [15] #launch_heuristic#248
    @ ~/.julia/packages/CUDA/Ey3w2/src/gpuarrays.jl:17 [inlined]
 [16] _copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
 [17] copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:46 [inlined]
 [18] copy
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:37 [inlined]
 [19] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(<), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Irrational{:π}}})
    @ Base.Broadcast ./broadcast.jl:873
 [20] top-level scope
    @ REPL[40]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/Ey3w2/src/initialization.jl:52
Tuebel added the bug label Jan 11, 2023
Tuebel changed the title from "1.9 InvalidIRError when numbers to irrational numbers" to "1.9 InvalidIRError when comparing numbers to irrational numbers" Jan 11, 2023
maleadt transferred this issue from JuliaGPU/CUDA.jl Jan 20, 2023
maleadt (Member) commented Jan 20, 2023

The conversion of Irrational to Float32 didn't meaningfully change, so I assume this is related to Float32(::AbstractIrrational, ::RoundingMode) (which uses BigFloat/MPFR) being changed from @pure to @assume_effects :total in JuliaLang/julia#44776. IIUC, where we used to const-prop this constructor without much scrutiny, we now detect that an overlayed method may be called somewhere down the chain, which renders this IR unsafe to interpret during compilation.
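For reference, the constructor in question looks roughly like this, paraphrased from Base's irrationals.jl as of 1.9 (not the exact source, but the stacktrace above shows the same `setprecision`/`BigFloat` structure):

```julia
# The rounded conversion goes through BigFloat, i.e. MPFR, which is why the
# GPU backend sees literal-pointer calls like mpfr_const_pi unless the whole
# constructor is const-propagated away during compilation.
Base.@assume_effects :total function (::Type{T})(x::AbstractIrrational, r::RoundingMode) where T<:Union{Float32,Float64}
    setprecision(BigFloat, 256) do
        T(BigFloat(x)::BigFloat, r)
    end
end
```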

MWE:

julia> f(x) = Float32(x, RoundDown)
f (generic function with 1 method)

julia> code_llvm(f, Tuple{Irrational{:π}})
;  @ REPL[6]:1 within `f`
define float @julia_f_175() #0 {
top:
  ret float 0x400921FB40000000
}

julia> CUDA.code_llvm(f, Tuple{Irrational{:π}})
;  @ REPL[6]:1 within `f`
define float @julia_f_496() local_unnamed_addr #0 {
top:
; ┌ @ irrationals.jl:69 within `Type`
; │┌ @ mpfr.jl:960 within `setprecision`
    %0 = call fastcc float @julia__setprecision_25_499() #0
; └└
  ret float %0
}
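One way to inspect what inference concludes here, assuming Julia ≥ 1.9 where `Base.infer_effects` is available (its printed output varies across versions):

```julia
f(x) = Float32(x, RoundDown)   # same f as in the MWE above

# Under the native interpreter the constructor is @assume_effects :total, so
# the call is foldable and code_llvm folds f to a constant. A GPU
# AbstractInterpreter with method overlays taints the :nonoverlayed bit,
# which blocks that concrete evaluation.
Base.infer_effects(f, Tuple{Irrational{:π}})
```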

Fixing this isn't trivial. We probably need a more subtle notion of method overlays, where we vouch that the replacement method has the same behavior, so that the Julia compiler is allowed to interpret this IR using the original methods.

@aviatesk Is my understanding correct here?

maleadt changed the title from "1.9 InvalidIRError when comparing numbers to irrational numbers" to "Overlay methods disabling IR interpreter breaks const-prop of GPU-incompatible code" Jan 20, 2023
maleadt (Member) commented Jan 20, 2023

Another example is JuliaLang/julia#48097, where a Core.throw_inexacterror overlay broke codegen for certain kwargs invocations.

aviatesk (Contributor) commented

Yeah, I guess most overlays used by the GPU stack are safe to ignore during compilation, assuming those overlays aren't supposed to change the semantics of the original methods.

aviatesk (Contributor) commented

Can we close this issue now?

maleadt (Member) commented Mar 18, 2023

The underlying issue is still there, right? We just worked around kwcall triggering the InexactError, but IIUC the fact remains that @overlay methods still inhibit certain parts of the optimizer, such as the IR interpreter and const-prop. I guess we want a way to express that an overlay should behave identically to the original method (so the compiler can evaluate either), or a way to indicate that the overlaid method is safe to interpret.

aviatesk added a commit to JuliaLang/julia that referenced this issue Aug 28, 2023
Previously we tainted the `:nonoverlayed` bit of the callers of overlay-ed
methods by looking at the method match results, rather than tainting the
overlay-ed methods' effects themselves. This is a bit confusing, since it
is not aligned with how the other effect bits are tainted.

Moreover, I am planning to allow a `Base.@assume_effects` override of the
`:nonoverlayed` effect bit in the future to solve issues like
JuliaGPU/GPUCompiler.jl#384, and for that solution to work, the
`:nonoverlayed` effect bit needs to be tainted on the callee side, as the
other effect bits are.

This commit refactors the compiler internals so that we taint the
`:nonoverlayed` bit of overlay-ed methods and propagate it to callers.
It turns out that this refactor simplifies the internal implementation a lot.
aviatesk added a commit to JuliaLang/julia that referenced this issue Aug 28, 2023
Certain external `AbstractInterpreter`s, like GPUCompiler.jl, have long
wanted to allow concrete evaluation of certain overlay-ed methods to get
the best inference accuracy, although it is currently prohibited. It
should be safe when an overlay-ed method has the same semantics as the
original method and its result can safely be replaced by the result of
the original method. See JuliaGPU/GPUCompiler.jl#384 for examples.

To address this issue, this commit allows a `Base.@assume_effects` override
of the `:nonoverlayed` effect bit. On top of #51078, the override of
`:nonoverlayed` works like the other effect bits, so an external
`AbstractInterpreter` can use it to allow concrete evaluation of annotated
overlay-ed methods:
```julia
@overlay OVERLAY_MT Base.@assume_effects :nonoverlayed f(x) = [...]
```

It does sound awkward to annotate `Base.@assume_effects :nonoverlayed` on
an `@overlay`-ed method, though. We likely want to rename `:nonoverlayed`
to `native_executable` or something more reasonable.
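Spelled out, the annotation pattern looks roughly like this; `OVERLAY_MT` and `f` are illustrative names, and the `:nonoverlayed` override is only honored on a Julia build containing this change:

```julia
using Base.Experimental: @MethodTable, @overlay

f(x) = x + 1             # the original method

@MethodTable OVERLAY_MT  # method table consulted by the external interpreter

# Vouch that the replacement behaves identically to the original, so an
# AbstractInterpreter using OVERLAY_MT may still concretely evaluate f.
@overlay OVERLAY_MT Base.@assume_effects :nonoverlayed f(x) = x + 1
```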

maleadt (Member) commented Jan 17, 2024

Moving the discussion upstream: JuliaLang/julia#52940

maleadt closed this as completed Jan 17, 2024