
Overlay methods disabling IR interpreter breaks const-prop of GPU-incompatible code #384

Closed
Tuebel opened this issue Jan 11, 2023 · 6 comments
Labels
bug Something isn't working

Tuebel commented Jan 11, 2023

Describe the bug

When comparing a CuArray against an irrational number, kernel compilation throws an InvalidIRError.
This happens on Julia 1.9 but not on Julia 1.8.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 2.82133   2.05078   3.8729
 3.14324   0.218199  3.05251
 0.227112  3.68884   3.91693

julia> A .< π
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to mpfr_signbit)

However, explicit conversion works:

julia> A .< Float32.(CUDA.fill(π,3,3))
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
 0  1  1
 1  1  0
 1  0  0
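A lighter workaround, not shown above but following the same idea, is to convert the irrational to a concrete float on the host, so the kernel only ever sees a `Float32` scalar:

```julia
# Float32(π) is evaluated on the CPU before the kernel is compiled; the device
# code then compares against a plain Float32 and never reaches the
# BigFloat/MPFR code path.
A .< Float32(π)
```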

I am using a temp environment where only CUDA is installed:

(jl_bFolVV) pkg> status
Status `/tmp/jl_bFolVV/Project.toml`
  [052768ef] CUDA v3.12.1

Expected behavior

I get the expected result without errors in Julia 1.8:

julia> A = 4 * CUDA.rand(3,3)
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.459628  1.38646  3.23488
 3.10404   1.11073  3.11335
 1.95598   3.8854   2.79495

julia> A .< π
3×3 CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}:
 1  1  0
 1  1  1
 1  0  1

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.9.0-beta2
Commit 7daffeecb8c (2022-12-29 07:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 24 virtual cores
Environment:
  LD_LIBRARY_PATH = /opt/ros/noetic/lib

Details on CUDA:

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.86.1, for CUDA 11.7
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.86.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.9.0-beta2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 1.470 GiB / 10.000 GiB available)

Additional context
Full Stacktrace:

  [1] signbit
    @ ./mpfr.jl:811
  [2] _cpynansgn
    @ ./mpfr.jl:338
  [3] Float32
    @ ./mpfr.jl:344
  [4] Float32
    @ ./mpfr.jl:346
  [5] #888
    @ ./irrationals.jl:70
  [6] #setprecision#25
    @ ./mpfr.jl:964
  [7] setprecision
    @ ./mpfr.jl:960
  [8] Type
    @ ./irrationals.jl:69
  [9] <
    @ ./irrationals.jl:96
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [11] _broadcast_getindex
    @ ./broadcast.jl:656
 [12] getindex
    @ ./broadcast.jl:610
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_rethrow)
Stacktrace:
 [1] rethrow
   @ ./error.jl:61
 [2] #setprecision#25
   @ ./mpfr.jl:966
 [3] setprecision
   @ ./mpfr.jl:960
 [4] Type
   @ ./irrationals.jl:69
 [5] <
   @ ./irrationals.jl:96
 [6] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [7] _broadcast_getindex
   @ ./broadcast.jl:656
 [8] getindex
   @ ./broadcast.jl:610
 [9] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_excstack_state)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:963
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to julia.except_enter)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:963
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call to an unknown function (call to ijl_pop_handler)
Stacktrace:
 [1] #setprecision#25
   @ ./mpfr.jl:964
 [2] setprecision
   @ ./mpfr.jl:960
 [3] Type
   @ ./irrationals.jl:69
 [4] <
   @ ./irrationals.jl:96
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_custom_get_size)
Stacktrace:
  [1] _
    @ ./mpfr.jl:112
  [2] #BigFloat#1
    @ ./irrationals.jl:209
  [3] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [4] #888
    @ ./irrationals.jl:70
  [5] #setprecision#25
    @ ./mpfr.jl:964
  [6] setprecision
    @ ./mpfr.jl:960
  [7] Type
    @ ./irrationals.jl:69
  [8] <
    @ ./irrationals.jl:96
  [9] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [10] _broadcast_getindex
    @ ./broadcast.jl:656
 [11] getindex
    @ ./broadcast.jl:610
 [12] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to ijl_alloc_string)
Stacktrace:
  [1] _string_n
    @ ./strings/string.jl:90
  [2] _
    @ ./mpfr.jl:115
  [3] #BigFloat#1
    @ ./irrationals.jl:209
  [4] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [5] #888
    @ ./irrationals.jl:70
  [6] #setprecision#25
    @ ./mpfr.jl:964
  [7] setprecision
    @ ./mpfr.jl:960
  [8] Type
    @ ./irrationals.jl:69
  [9] <
    @ ./irrationals.jl:96
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
 [11] _broadcast_getindex
    @ ./broadcast.jl:656
 [12] getindex
    @ ./broadcast.jl:610
 [13] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_const_pi)
Stacktrace:
  [1] #BigFloat#1
    @ ./irrationals.jl:210
  [2] BigFloat (repeats 2 times)
    @ ./irrationals.jl:208
  [3] #888
    @ ./irrationals.jl:70
  [4] #setprecision#25
    @ ./mpfr.jl:964
  [5] setprecision
    @ ./mpfr.jl:960
  [6] Type
    @ ./irrationals.jl:69
  [7] <
    @ ./irrationals.jl:96
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
  [9] _broadcast_getindex
    @ ./broadcast.jl:656
 [10] getindex
    @ ./broadcast.jl:610
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Reason: unsupported call through a literal pointer (call to mpfr_get_flt)
Stacktrace:
  [1] Float32
    @ ./mpfr.jl:344
  [2] Float32
    @ ./mpfr.jl:346
  [3] #888
    @ ./irrationals.jl:70
  [4] #setprecision#25
    @ ./mpfr.jl:964
  [5] setprecision
    @ ./mpfr.jl:960
  [6] Type
    @ ./irrationals.jl:69
  [7] <
    @ ./irrationals.jl:96
  [8] _broadcast_getindex_evalf
    @ ./broadcast.jl:683
  [9] _broadcast_getindex
    @ ./broadcast.jl:656
 [10] getindex
    @ ./broadcast.jl:610
 [11] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/utils.jl:83
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:347 [inlined]
  [8] LLVM.ThreadSafeContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
    @ LLVM ~/.julia/packages/LLVM/9gCXO/src/executionengine/ts_module.jl:14
  [9] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:74
 [10] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:346
 [11] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/cache.jl:90
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Bool, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(<), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Irrational{:π}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:299
 [13] cufunction
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:292 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:102 [inlined]
 [15] #launch_heuristic#248
    @ ~/.julia/packages/CUDA/Ey3w2/src/gpuarrays.jl:17 [inlined]
 [16] _copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
 [17] copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:46 [inlined]
 [18] copy
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:37 [inlined]
 [19] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(<), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Irrational{:π}}})
    @ Base.Broadcast ./broadcast.jl:873
 [20] top-level scope
    @ REPL[40]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/Ey3w2/src/initialization.jl:52
Tuebel added the bug label Jan 11, 2023
Tuebel changed the title from "1.9 InvalidIRError when numbers to irrational numbers" to "1.9 InvalidIRError when comparing numbers to irrational numbers" Jan 11, 2023
maleadt transferred this issue from JuliaGPU/CUDA.jl Jan 20, 2023
maleadt (Member) commented Jan 20, 2023

The conversion of Irrational to Float32 didn't meaningfully change, so I assume this is related to Float32(::AbstractIrrational, ::RoundingMode) (which uses BigFloat/MPFR) being changed from @pure to @assume_effects :total in JuliaLang/julia#44776. IIUC, where we used to const-prop this constructor without much scrutiny, we now detect that an overlayed method may be called somewhere down the chain, which renders this IR unsafe to interpret during compilation.
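For reference, the constructor in question looks roughly like this, paraphrased from Base's irrationals.jl as of 1.9 (not the exact source, but the stacktrace above shows the same `setprecision`/`BigFloat` structure):

```julia
# The rounded conversion goes through BigFloat, i.e. MPFR, which is why the
# GPU backend sees literal-pointer calls like mpfr_const_pi unless the whole
# constructor is const-propagated away during compilation.
Base.@assume_effects :total function (::Type{T})(x::AbstractIrrational, r::RoundingMode) where T<:Union{Float32,Float64}
    setprecision(BigFloat, 256) do
        T(BigFloat(x)::BigFloat, r)
    end
end
```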

MWE:

julia> f(x) = Float32(x, RoundDown)
f (generic function with 1 method)

julia> code_llvm(f, Tuple{Irrational{:π}})
;  @ REPL[6]:1 within `f`
define float @julia_f_175() #0 {
top:
  ret float 0x400921FB40000000
}

julia> CUDA.code_llvm(f, Tuple{Irrational{:π}})
;  @ REPL[6]:1 within `f`
define float @julia_f_496() local_unnamed_addr #0 {
top:
; ┌ @ irrationals.jl:69 within `Type`
; │┌ @ mpfr.jl:960 within `setprecision`
    %0 = call fastcc float @julia__setprecision_25_499() #0
; └└
  ret float %0
}
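One way to inspect what inference concludes here, assuming Julia ≥ 1.9 where `Base.infer_effects` is available (its printed output varies across versions):

```julia
f(x) = Float32(x, RoundDown)   # same f as in the MWE above

# Under the native interpreter the constructor is @assume_effects :total, so
# the call is foldable and code_llvm folds f to a constant. A GPU
# AbstractInterpreter with method overlays taints the :nonoverlayed bit,
# which blocks that concrete evaluation.
Base.infer_effects(f, Tuple{Irrational{:π}})
```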

Fixing this isn't trivial. We probably need a more subtle notion of method overlays, where we vouch that the replacement method has the same behavior, so that the Julia compiler is allowed to interpret this IR using the original methods.

@aviatesk Is my understanding correct here?

maleadt changed the title from "1.9 InvalidIRError when comparing numbers to irrational numbers" to "Overlay methods disabling IR interpreter breaks const-prop of GPU-incompatible code" Jan 20, 2023
maleadt (Member) commented Jan 20, 2023

Another example is JuliaLang/julia#48097, where a Core.throw_inexacterror overlay broke codegen for certain kwargs invocations.

aviatesk (Contributor) commented

Yeah, I guess most overlays used by the GPU stack are safe to ignore during compilation, assuming those overlays aren't supposed to change the semantics of the original methods.

aviatesk (Contributor) commented

Can we close this issue now?

maleadt (Member) commented Mar 18, 2023

The underlying issue is still there, right? We just worked around kwcall triggering the InexactError, but IIUC the fact remains that @overlay methods still inhibit certain parts of the optimizer, such as the IR interpreter and const-prop. I guess we want a way to express that an overlay should behave identically to the original method (so the compiler can evaluate either), or a way to indicate that the overlaid method is safe to interpret.

aviatesk added a commit to JuliaLang/julia that referenced this issue Aug 28, 2023
Previously we tainted the `:nonoverlayed` bit of the callers of overlay-ed
methods by looking at the method match results, rather than tainting the
overlay-ed methods' effects themselves. This is a bit confusing, since it
is not aligned with how the other effect bits are tainted.

Moreover, I am planning to allow a `Base.@assume_effects` override of the
`:nonoverlayed` effect bit in the future to solve issues like
JuliaGPU/GPUCompiler.jl#384, and for that solution to work, the
`:nonoverlayed` effect bit needs to be tainted on the callee side, as the
other effect bits are.

This commit refactors the compiler internals so that we taint the
`:nonoverlayed` bit of overlay-ed methods and propagate it to callers.
It turns out that this refactor simplifies the internal implementation a lot.
aviatesk added a commit to JuliaLang/julia that referenced this issue Aug 28, 2023
Certain external `AbstractInterpreter`s, like GPUCompiler.jl, have long
wanted to allow concrete evaluation of certain overlay-ed methods to get
the best inference accuracy, although it is currently prohibited. It
should be safe when an overlay-ed method has the same semantics as the
original method and its result can safely be replaced by the result of
the original method. See JuliaGPU/GPUCompiler.jl#384 for examples.

To address this issue, this commit allows a `Base.@assume_effects` override
of the `:nonoverlayed` effect bit. On top of #51078, the override of
`:nonoverlayed` works like the other effect bits, so an external
`AbstractInterpreter` can use it to allow concrete evaluation of annotated
overlay-ed methods:
```julia
@overlay OVERLAY_MT Base.@assume_effects :nonoverlayed f(x) = [...]
```

It does sound awkward to annotate `Base.@assume_effects :nonoverlayed` on
an `@overlay`-ed method, though. We likely want to rename `:nonoverlayed`
to `native_executable` or something more reasonable.
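Spelled out, the annotation pattern looks roughly like this; `OVERLAY_MT` and `f` are illustrative names, and the `:nonoverlayed` override is only honored on a Julia build containing this change:

```julia
using Base.Experimental: @MethodTable, @overlay

f(x) = x + 1             # the original method

@MethodTable OVERLAY_MT  # method table consulted by the external interpreter

# Vouch that the replacement behaves identically to the original, so an
# AbstractInterpreter using OVERLAY_MT may still concretely evaluate f.
@overlay OVERLAY_MT Base.@assume_effects :nonoverlayed f(x) = x + 1
```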

maleadt (Member) commented Jan 17, 2024

Moving the discussion upstream: JuliaLang/julia#52940

maleadt closed this as completed Jan 17, 2024